There are countless situations where you want to watch a directory and process files as they arrive, but only once they're "ready". In situations I've encountered, "ready" means that no other application has the file open exclusively, and it can also mean that the file has not been written to for a certain period of time.
Broadly speaking, there are two ways to get these files. You can periodically check the directory yourself (polling), or you can ask the operating system to inform you as files arrive.
Polling
The polling approach is simpler, but it has three drawbacks.
- It is wasteful. In many applications, the majority of polling events will turn up nothing new. You will have made a call for nothing.
- It is less responsive. If you correct for wastefulness by increasing the polling interval, you then must wait longer before learning that a new file has arrived. Even for a background or batch process, this can be a problem. One company I know uses a polling approach in an overnight batch process, and they just barely get through all the files that they must process each night.
- It is less fun. What's the point of being a software developer if you're not having fun?
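For comparison, here is roughly what a single poll involves. This is a minimal sketch, not code from this series: DirectoryPoller and PollOnce are hypothetical names, and the sketch assumes the caller keeps a set of the file paths it has already seen.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical polling helper, for illustration only.
public static class DirectoryPoller
{
    // Lists the directory and returns only the files not seen on an earlier poll.
    // Every call touches the file system, even when nothing has changed.
    public static List<string> PollOnce(string directory, HashSet<string> seen)
    {
        var newFiles = new List<string>();
        foreach (var path in Directory.GetFiles(directory))
        {
            // HashSet.Add returns false if we already knew about this file.
            if (seen.Add(path))
                newFiles.Add(path);
        }
        return newFiles;
    }
}
```

A real poller would call PollOnce in a loop with a sleep between calls. The length of that sleep is exactly the wastefulness-versus-responsiveness trade-off described above.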
Notifications from the Operating System (FileSystemWatcher)
A FileSystemWatcher can notify you right away when a new file arrives. However, using it properly requires some engineering. The good news is that once you've built a sound component based on FileSystemWatcher, you can reuse it forever.
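As a point of reference, the basic FileSystemWatcher wiring is short. The sketch below is an illustration only (CreatedFileQueue is a hypothetical name, not one of the classes in this series): it forwards Created events into a BlockingCollection<string> that another thread can consume. It does none of the engineering this series is about: it ignores files already in the directory, passes duplicate notifications straight through, and may report a file while it is still being written.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical helper: turns FileSystemWatcher.Created events into a queue.
public sealed class CreatedFileQueue : IDisposable
{
    private readonly FileSystemWatcher _watcher;

    public BlockingCollection<string> Files { get; } = new BlockingCollection<string>();

    public CreatedFileQueue(string directory)
    {
        _watcher = new FileSystemWatcher(directory);
        // Events arrive on thread-pool threads; BlockingCollection is thread-safe.
        _watcher.Created += (sender, e) =>
        {
            try { Files.Add(e.FullPath); }
            catch (InvalidOperationException) { } // Queue already completed: ignore late events.
        };
        _watcher.EnableRaisingEvents = true; // Start watching.
    }

    public void Dispose()
    {
        _watcher.Dispose();
        Files.CompleteAdding(); // Lets consumers finish draining, then stop.
    }
}
```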
The last few posts have described one way to meet the engineering challenges. We have seen a chain of IEnumerable<T>-based classes that started by wrapping a FileSystemWatcher and progressively refined the results until, with the present post, we finally have an IEnumerable<string> of "ready" files. The class at each stage takes an IEnumerable<T> in its constructor and implements IEnumerable<T>, so an instance of it can be fed to the next constructor in the chain.
- In How to Use FileSystemWatcher Instead of Polling, we obtained an IEnumerable<string> of file names that begins with files already in the directory of interest, and continues to yield files as a wrapped FileSystemWatcher brings them to our attention. All this was encapsulated in the CreatedFileCollection class.
- FileSystemWatcher (through no fault of its own) will sometimes notify you more than once that a file has been created. How to Filter Long-Running IEnumerables for Distinct Objects showed how to handle this. The result was the TimedDistinctCollection.
- The last post, How to Apply a Gate to an IEnumerable, was about a class that yields the items of an input IEnumerable<T> only as they become ready. This was the ReadyItemCollection.
- In this post, we'll derive from ReadyItemCollection for the special case where the items are files. The new class is ReadyFileCollection.
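To make the middle stage concrete, here is one way a time-limited Distinct could work. It is an independent sketch, not the TimedDistinctCollection from that post: it assumes "timed" means remembering each file name only for a fixed window, so memory does not grow without limit on an endless sequence, and the utcNow clock parameter is a hypothetical addition that makes the behavior testable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a time-windowed Distinct for long-running sequences.
public static class TimedDistinctSketch
{
    // Yields each item the first time it appears and suppresses repeats that arrive
    // within `window` of that first sighting. Entries older than the window are
    // forgotten, so the same name can be yielded again later.
    public static IEnumerable<T> TimedDistinct<T>(
        this IEnumerable<T> source, TimeSpan window, Func<DateTime> utcNow)
    {
        var firstSeen = new Dictionary<T, DateTime>();
        foreach (var item in source)
        {
            var now = utcNow();

            // Purge entries whose window has expired.
            var expired = new List<T>();
            foreach (var pair in firstSeen)
                if (now - pair.Value > window)
                    expired.Add(pair.Key);
            foreach (var key in expired)
                firstSeen.Remove(key);

            if (firstSeen.ContainsKey(item))
                continue; // Duplicate notification within the window: skip it.

            firstSeen[item] = now;
            yield return item;
        }
    }
}
```

With a 30-second window, a second notification for the same file one second later is suppressed, while the same name arriving a minute later is yielded again.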
Thus, we have a pipeline of IEnumerable<string>s:
CreatedFileCollection
--> TimedDistinctCollection
--> ReadyFileCollection (derived from ReadyItemCollection).
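The chaining works because every stage has the same shape. The stand-in stage below (WhereStage is a hypothetical name, not one of the series' classes) takes an IEnumerable<T> in its constructor and implements IEnumerable<T>, so stages nest, and nothing is pulled from upstream until the outermost stage is enumerated.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical pipeline stage with the same shape as the stages described above.
public class WhereStage<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _input;
    private readonly Func<T, bool> _keep;

    public WhereStage(IEnumerable<T> input, Func<T, bool> keep)
    {
        _input = input;
        _keep = keep;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Lazily pull from the upstream stage, one item at a time.
        foreach (var item in _input)
            if (_keep(item))
                yield return item;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

The real pipeline nests CreatedFileCollection, TimedDistinctCollection and ReadyFileCollection the same way; their constructor arguments are covered in the earlier posts.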
ReadyFileCollection.cs
As I said, ReadyFileCollection derives from ReadyItemCollection<string>. The generic string parameter is a file name. All ReadyFileCollection must do is say what it means for a file to be "ready", and how long we want to wait before re-checking a file that is not yet ready.
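To see how those two pieces of information can drive a gate, here is a much-simplified one. It is a hypothetical sketch, not the ReadyItemCollection from the previous post: it reads its whole input up front (which would never finish on an endless CreatedFileCollection), and it assumes the readiness function follows the same contract as FileIsReady below, where true means yield, false means check again later, and null means drop the item.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical, simplified gate; for illustration only.
public static class SimpleGate
{
    public static IEnumerable<T> YieldWhenReady<T>(
        IEnumerable<T> items,
        Func<T, bool?> isReady,          // true: yield; false: retry later; null: drop.
        Func<T, DateTime> nextCheckUtc,  // When to look at a not-yet-ready item again.
        CancellationToken cancellationToken)
    {
        // Every item starts out due for an immediate check.
        var pending = new List<KeyValuePair<T, DateTime>>();
        foreach (var item in items)
            pending.Add(new KeyValuePair<T, DateTime>(item, DateTime.MinValue));

        while (pending.Count > 0 && !cancellationToken.IsCancellationRequested)
        {
            // Pick the item whose check is due soonest.
            int next = 0;
            for (int i = 1; i < pending.Count; i++)
                if (pending[i].Value < pending[next].Value)
                    next = i;
            var entry = pending[next];
            pending.RemoveAt(next);

            // Sleep until it is due, but wake early if cancellation is requested.
            var wait = entry.Value - DateTime.UtcNow;
            if (wait > TimeSpan.Zero)
                cancellationToken.WaitHandle.WaitOne(wait);
            if (cancellationToken.IsCancellationRequested)
                yield break;

            var ready = isReady(entry.Key);
            if (ready == true)
                yield return entry.Key;
            else if (ready == false)
                pending.Add(new KeyValuePair<T, DateTime>(entry.Key, nextCheckUtc(entry.Key)));
            // null: the item is gone or inaccessible, so it is dropped.
        }
    }
}
```

A gate over an endless sequence must keep pulling from its input while it waits on pending items, which is one reason it needs more machinery than this.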
The comments in the source below should be sufficient to explain it from there.
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

namespace Fws.Collections
{
    /// <summary>
    /// Construct this class on an IEnumerable of file names. Its GetEnumerator method
    /// will then yield the files as they become ready, meaning they have not been
    /// written to for a given period of time, and are likely to be openable in
    /// the given mode.
    /// </summary>
    /// <remarks>
    /// Beware! It is possible that a file could become un-ready after this IEnumerable
    /// yields it, but before you try to open it. Always check for Exceptions when you
    /// open a file, even one that is supposedly 'ready'.
    /// </remarks>
    public class ReadyFileCollection : ReadyItemCollection<string>, IEnumerable<string>
    {
        // File.GetLastWriteTimeUtc returns this value, rather than throwing, when asked
        // about a file that does not exist. Capture it once by asking about a file name
        // that cannot exist.
        static readonly DateTime _lastWriteTimeUtcForFileThatDoesNotExist
            = File.GetLastWriteTimeUtc(Guid.NewGuid().ToString());

        /// <summary>
        /// Constructor.
        /// </summary>
        /// <param name="inputFiles">
        /// An IEnumerable of file names.</param>
        /// <param name="cancellationToken">
        /// The output enumeration will terminate if this token goes into the canceled state.</param>
        /// <param name="quiesceTime">
        /// For a file to be considered ready, it must not have been written to for this long.
        /// This is also how long to wait before re-checking a file that is not yet ready.</param>
        /// <param name="fileAccess">
        /// For a file to be considered ready, it must be openable with this access method.</param>
        /// <param name="fileShare">
        /// For a file to be considered ready, it must be openable with this sharing parameter.</param>
        public ReadyFileCollection(
            IEnumerable<string> inputFiles,
            CancellationToken cancellationToken,
            TimeSpan quiesceTime = default(TimeSpan),
            FileAccess fileAccess = FileAccess.Read,
            FileShare fileShare = FileShare.None)
            : base(
                inputFiles,
                cancellationToken,
                file => FileIsReady(file, quiesceTime, fileAccess, fileShare),
                file => GetNextCheckTime(quiesceTime))
        {
        }

        /// <summary>
        /// Tell whether the file is ready.
        /// </summary>
        /// <param name="fileName">The name of a file.</param>
        /// <returns>True if the file is ready; false if it is not ready yet;
        /// null if the file is no longer present or access is denied.</returns>
        /// <remarks>It is possible that another process will open the file after you have
        /// determined that it is available with this method, but before you have opened it.
        /// Always catch Exceptions when you try to open a file, even one that is supposedly
        /// 'ready'.</remarks>
        static internal bool? FileIsReady(
            string fileName, TimeSpan quiesceTime, FileAccess fileAccess, FileShare fileShare)
        {
            try
            {
                var lastWriteTime = File.GetLastWriteTimeUtc(fileName);
                // Surprisingly, File.GetLastWriteTimeUtc does not throw if the file does not
                // exist. To detect that situation, we compare to the known value it returns
                // for a missing file.
                if (lastWriteTime == _lastWriteTimeUtcForFileThatDoesNotExist)
                    return null;
                if (lastWriteTime.Add(quiesceTime) >= DateTime.UtcNow)
                    return false; // Written to too recently.
            }
            catch (UnauthorizedAccessException)
            {
                return null; // Not allowed to see this one.
            }
            catch (FileNotFoundException)
            {
                return null; // Nothing to see.
            }
            // Other problems are more serious and we want to let the Exception bubble up.
            try
            {
                using (File.Open(fileName, FileMode.Open, fileAccess, fileShare))
                {
                    return true;
                }
            }
            catch (FileNotFoundException)
            {
                return null; // Deleted since we checked its last write time.
            }
            catch (DirectoryNotFoundException)
            {
                return null; // Its directory has gone away.
            }
            catch (UnauthorizedAccessException)
            {
                return null; // We will never be allowed to open it in this mode.
            }
            catch (IOException)
            {
                return false; // Most likely still open in another process; check again later.
            }
        }

        /// <summary>
        /// Tell when we want to check on a file that is not yet ready.
        /// </summary>
        /// <param name="waitTime">How long to wait before checking again.</param>
        /// <returns>The UTC time at which to check the file again.</returns>
        static internal DateTime GetNextCheckTime(TimeSpan waitTime)
        {
            return DateTime.UtcNow.Add(waitTime);
        }
    }
}
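The file-system behavior that FileIsReady depends on is easy to check in isolation. This standalone sketch (ReadinessChecks and QuietLongEnough are hypothetical names) reproduces only the last-write-time half of the test: it confirms that a missing file yields the sentinel value rather than an exception, and it applies the quiesce window.

```csharp
using System;
using System.IO;

// Hypothetical standalone version of the last-write-time check in FileIsReady.
public static class ReadinessChecks
{
    // File.GetLastWriteTimeUtc returns a sentinel instead of throwing for a missing
    // file (documented as midnight, January 1, 1601 UTC). Capture it once.
    public static readonly DateTime MissingFileTime =
        File.GetLastWriteTimeUtc(Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString()));

    // null: the file is missing; false: written to within quiesceTime;
    // true: it has been quiet long enough.
    public static bool? QuietLongEnough(string path, TimeSpan quiesceTime)
    {
        var lastWrite = File.GetLastWriteTimeUtc(path);
        if (lastWrite == MissingFileTime)
            return null;
        return lastWrite.Add(quiesceTime) < DateTime.UtcNow;
    }
}
```

The sharing check (File.Open with the requested FileAccess and FileShare) is left out here because how sharing is enforced depends on the operating system; on Windows, a file that another process holds open exclusively makes File.Open throw an IOException.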
Tomorrow, I'm going to give a short talk based on these ideas. I'll incorporate any feedback I get and then post the complete Visual Studio 2012 solution for you to download. [Done! See the next post, How to Use FileSystemWatcher Instead of Polling - Source Code]