How to Use FileSystemWatcher Instead of Polling

by Larry Spencer Sunday, December 30, 2012 12:07 PM

I've been prototyping a workflow application, and one of the first requirements is to watch for files that arrive in a directory so they can participate in the configured workflows. All I have to do is use the .NET Framework's FileSystemWatcher, right? Well, maybe. Some people would be quick to say that FileSystemWatcher has several nearly fatal shortcomings.

Alleged Bugs in FileSystemWatcher

  • It does not tell you about files that are already in the directory when the system starts.
  • It can fire more than one Create event when we might expect only one.
  • People say it is so unreliable that they have resorted to old-fashioned polling. Yuck!

What a piece of junk!

They Are Not Bugs

Not so fast. When handled properly, FileSystemWatcher is a very useful tool. Compared to polling, it is more responsive and uses resources more sparingly.

The first problem -- not notifying you of files that are already there -- is, of course, by design. We can judiciously combine a call to Directory.GetFiles with our FileSystemWatcher and get everything we need. The code I'll share later in this post shows how.

The multiple-Create problem is not a bug in FileSystemWatcher. It is an artifact of how some programs create files. Even humble Notepad does this. We can filter out duplicate Create events in a subsequent step.

The third problem -- unreliability, which in practice means missed files -- can arise in two ways, both of which can be avoided if you know what you're dealing with.

Most commonly, "missed" files are the result of using non-thread-safe code in your event handler. FileSystemWatcher raises its events on thread-pool threads, so the handlers for several files can run at the same time. If those threads are updating a non-thread-safe object like a List<T> of files, then race conditions may occur and events seem to be dropped. When testing the code I am about to show you, I have created as many as 2,000 small files using 10 simultaneous threads, and FileSystemWatcher didn't miss any of them.
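To make the thread-safety point concrete, here is a minimal sketch that simulates many event handlers running at once. It is not part of the class presented later in this post, and the names HandlerSafetyDemo, RecordConcurrently and RecordWithLock are hypothetical. It only illustrates that a ConcurrentQueue<T> accepts concurrent writers, whereas a List<T> needs a lock around every Add.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical demo class; Parallel.For stands in for concurrent Created handlers.
public static class HandlerSafetyDemo
{
    // Records `count` simulated file arrivals in a thread-safe queue
    // and returns how many were recorded.
    public static int RecordConcurrently(int count)
    {
        var files = new ConcurrentQueue<string>();
        Parallel.For(0, count, i => files.Enqueue("file" + i + ".txt"));
        return files.Count;
    }

    // The same idea with a plain List<T>. It is safe only because every
    // Add is guarded by the lock; without it, items can be lost.
    public static int RecordWithLock(int count)
    {
        var files = new List<string>();
        var gate = new object();
        Parallel.For(0, count, i =>
        {
            lock (gate)
            {
                files.Add("file" + i + ".txt");
            }
        });
        return files.Count;
    }
}
```

Either approach keeps every simulated event. Removing the lock from the second method is the kind of change that makes events appear to be dropped.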

You can also miss files in extremely high-volume situations. If FileSystemWatcher falls behind in setting up its threads for event handlers, its buffer of events can fill up and fail to accept new events. By default, the buffer is 8 KB; as a rough guide, 4 KB is enough to hold about 80 events, so the watcher can fall a fair way behind with no problem. You can increase this with the InternalBufferSize property. Better yet, you can configure the watcher so it ignores the events that don't interest you anyway. For my needs (watching for the arrival of new files), I only care about the Create event, and I have a hard time envisioning a scenario where Create events would occur so rapidly as to overwhelm the FileSystemWatcher. Change events might be more of a concern. MSDN has more to say here: Considerations for File Changes on High-Volume Systems.
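As a rough sketch of those two knobs, the helper below restricts the watcher to file-name notifications and enlarges its buffer. The class name WatcherSetup, the method CreateForNewFiles and the 64 KB figure are illustrative assumptions, not recommendations. The sketch also subscribes to the watcher's Error event, which is raised when the internal buffer overflows, so that an overflow does not go unnoticed.

```csharp
using System;
using System.IO;

// Hypothetical helper; the caller owns (and must dispose) the returned watcher.
public static class WatcherSetup
{
    public static FileSystemWatcher CreateForNewFiles(string directory, Action<Exception> onError)
    {
        var watcher = new FileSystemWatcher(directory)
        {
            // Buffer only file-name changes (create, rename, delete),
            // not writes, attribute changes, and so on.
            NotifyFilter = NotifyFilters.FileName,

            // Enlarge the buffer from its default. 64 KB is an arbitrary example value.
            InternalBufferSize = 64 * 1024
        };

        // Report buffer overflows and other watcher failures to the caller.
        watcher.Error += (sender, e) => onError(e.GetException());

        return watcher;
    }
}
```

The watcher is returned with EnableRaisingEvents still false, so the caller can attach its Created handler before any events are raised.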

How to Watch a Directory

Now that I've pled my case for FileSystemWatcher, let's see how to use it to detect files arriving in a directory, as well as files that are already there. After the code, I'll explain why I made some of the choices I did.

 

using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.Contracts;
using System.IO;
using System.Threading;

namespace Fws.Collections
{
    /// <summary>
    /// Detects the arrival of files in a directory and makes them available to a client class
    /// as an IEnumerable of fully pathed file names. Unlike the .NET FileSystemWatcher, this 
    /// class yields files that exist when the object is constructed. Also, it is not an IDisposable.
    /// </summary>
    /// <remarks>
    /// <para>
    /// If a file arrives during the execution of this class's constructor, it may be reported more than
    /// once. Also, some programs write their files in such a way that the underlying FileSystemWatcher
    /// will fire a Create event more than once. In those cases, this class will yield the
    /// file multiple times.
    /// </para><para>
    /// Client code must account for these possibilities. It is envisioned that wrapping classes may
    /// refine the yielded files by waiting for them to quiesce, filtering out duplicates, etc.
    /// </para>
    /// <para>
    /// This class is thread-safe: more than one thread may enumerate the files presented by a 
    /// single instance of this class, and each thread will get all the files.
    /// </para>
    /// </remarks>
    public sealed class CreatedFileCollection : IEnumerable<string>
    {
        #region Fields
        readonly string _directory;
        readonly string _filePattern;
        readonly CancellationToken _cancellationToken;
        #endregion

        #region Nested Class to Collect Results
        /// <summary>
        /// A queue of files found within one GetEnumerator call.
        /// </summary>
        private sealed class CreatedFileQueue : IDisposable
        {
            readonly ConcurrentQueue<string> _queue = new ConcurrentQueue<string>();
            readonly SemaphoreSlim _fileEnqueued = new SemaphoreSlim(0);

            /// <summary>
            /// Get the next file from the queue, blocking until one arrives or cancellation is requested.
            /// </summary>
            /// <param name="fileName">The name of the file, or null if cancellation was requested first.</param>
            /// <returns>True if got a file; false if canceled.</returns>
            public bool TryDequeue(out string fileName, CancellationToken cancellationToken)
            {
                fileName = null;
                // Avoid the OperationCanceledException if we can.
                if (cancellationToken.IsCancellationRequested)
                    return false;
                try
                {
                    _fileEnqueued.Wait(cancellationToken);
                    return _queue.TryDequeue(out fileName);
                }
                catch (OperationCanceledException)
                {
                    return false;
                }
            }

            /// <summary>
            /// Handles the Created event of the enclosing class's FileSystemWatcher.
            /// </summary>
            /// <param name="sender">This object.</param>
            /// <param name="e">Args for the new file.</param>
            public void FileCreated(object sender, FileSystemEventArgs e)
            {
                _queue.Enqueue(e.FullPath);
                _fileEnqueued.Release();
            }

            public void Dispose()
            {
                _fileEnqueued.Dispose();
            }
        }
        #endregion

        #region Constructor
        /// <summary>
        /// Constructor. 
        /// </summary>
        /// <param name="cancellationToken">This class will terminate the enumeration of
        /// files when and only when the token enters the canceled state.</param>
        /// <param name="directory">The directory to watch.</param>
        /// <param name="filePattern">A pattern to match in the file name. Example: "*.txt".
        /// Null means all files.</param>
        /// <remarks>Duplicates may be returned on the queue. See remarks for the class.</remarks>
        public CreatedFileCollection(CancellationToken cancellationToken, string directory, string filePattern=null)
        {
            Contract.Requires(directory != null);
            // CancellationToken is a struct, so it can never be null and needs no check.

            if (!Directory.Exists(directory))
                throw new ArgumentException(String.Format("Directory '{0}' does not exist.", directory));

            _directory = directory;
            _filePattern = filePattern ?? "*";
            _cancellationToken = cancellationToken;
        }
        #endregion

        #region Methods
        /// <summary>
        /// Get an enumerator that will yield files until the CancellationToken is canceled.
        /// </summary>
        /// <returns>Fully pathed file names.</returns>
        /// <remarks>
        /// It is possible for a file name to be returned more than once.
        /// </remarks>
        public IEnumerator<string> GetEnumerator()
        {
            if (!_cancellationToken.IsCancellationRequested)
            {
                using (var watcher = new FileSystemWatcher(_directory, _filePattern))
                {
                    using (var queue = new CreatedFileQueue())
                    {
                        // Restrict the NotifyFilter to all that's necessary for Create events.
                        // This minimizes the likelihood that FileSystemWatcher's buffer will be overwhelmed.
                        watcher.NotifyFilter = NotifyFilters.FileName;

                        watcher.Created += queue.FileCreated;

                        watcher.EnableRaisingEvents = true;
                        // Note that if a file arrives during the following loop, it will be placed on the queue
                        // twice: once when the Create event is raised, and once by the loop itself.
                        foreach (var file in Directory.GetFiles(_directory, _filePattern, SearchOption.TopDirectoryOnly))
                        {
                            queue.FileCreated(this, new FileSystemEventArgs(WatcherChangeTypes.Created, _directory, Path.GetFileName(file)));
                        }

                        if (!_cancellationToken.IsCancellationRequested)
                        {
                            string fileName;
                            while (queue.TryDequeue(out fileName, _cancellationToken))
                                yield return fileName;
                        }
                    }
                }
            }
        }

        /// <summary>
        /// Required method for IEnumerable.
        /// </summary>
        /// <returns>The generic enumerator, but as a non-generic version.</returns>
        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
        #endregion
    }
}

 

CreatedFileCollection.cs

That was probably more complicated -- I mean, engineered -- than you expected. Why didn't I just inherit from FileSystemWatcher and be done with it? Let's look at some of the features and why they're there.

Implementing IEnumerable<string>

I've become a big fan of IEnumerable<T>. It is the universal medium of exchange between so many classes in the .NET Framework, especially the newest and best like the LINQ extension methods and the Parallel class. Because CreatedFileCollection implements IEnumerable<string>, it can automatically participate in all that.

One of the expectations of an IEnumerable<T> implementation is that it doesn't get all its data immediately. It defers the fetch of an object until the request for it. That exactly fits what we're trying to do with the CreatedFileCollection. We certainly don't want to collect all the files up-front. Furthermore, when we ask for the next one we want to sit there until it arrives -- also in the spirit of IEnumerable<T>.
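The deferred behavior is easy to see in a small iterator. The sketch below is illustrative only (DeferredDemo and Produce are hypothetical names): it logs the moment each item is actually produced, and nothing is logged until a consumer asks for an item.

```csharp
using System.Collections.Generic;

// Hypothetical demo class showing that an iterator does no work up front.
public static class DeferredDemo
{
    // Adds an entry to `log` at the moment each item is produced.
    public static IEnumerable<string> Produce(List<string> log)
    {
        for (int i = 1; i <= 3; i++)
        {
            log.Add("produced " + i);
            yield return "file" + i + ".txt";
        }
    }
}
```

Calling Produce runs none of the loop body. Each MoveNext call, which is what foreach issues behind the scenes, runs the code up to the next yield return. CreatedFileCollection relies on the same mechanism, except that its MoveNext blocks until a file arrives.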

Cancellable

We want to wait for the next file to arrive, but we don't necessarily want to wait forever. The CancellationToken, introduced with .NET 4, is the right tool for this situation. You can read all about it here: Cancellation in Managed Threads. As that MSDN article says, CancellationTokens and their partner class, CancellationTokenSource, present a "unified model for cooperative cancellation of asynchronous or long-running synchronous operations."

What's nice about this cancellation mechanism is that more than one thread can observe the same CancellationToken. As soon as one thread signals cancellation, they all know about it.
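A minimal sketch of that shared-token behavior follows. The class SharedCancellationDemo and its method CancelAll are hypothetical. Several workers block on the same token, and a single Cancel call releases all of them.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; not part of CreatedFileCollection.
public static class SharedCancellationDemo
{
    // Starts `workerCount` workers that all wait on one token, cancels it once,
    // and returns how many workers observed the cancellation.
    public static int CancelAll(int workerCount)
    {
        using (var source = new CancellationTokenSource())
        {
            CancellationToken token = source.Token;
            int observed = 0;
            var workers = new Task[workerCount];

            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Factory.StartNew(() =>
                {
                    // Blocks until the token is canceled (returns at once if it already is).
                    token.WaitHandle.WaitOne();
                    Interlocked.Increment(ref observed);
                });
            }

            source.Cancel();        // one signal...
            Task.WaitAll(workers);  // ...and every worker finishes
            return observed;
        }
    }
}
```

Keep workerCount small when trying this out, since each waiting worker occupies a thread-pool thread.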

In CreatedFileCollection, we observe the CancellationToken in the loop that waits for files (line 145 and in the CreatedFileQueue class on line 43). If the IsCancellationRequested property turns true, we bail out. More on this later.

Not an IDisposable

I have almost gotten to the point where I think an IDisposable is a code smell. How are we supposed to do true interface- and dependency-injection-based programming if the "real" injected object is an IDisposable in addition to implementing whatever interface we're injecting, but in unit testing the mock implementation is probably not? You'll see the wisdom of this in the next post, when we chain the IEnumerable<T>-based CreatedFileCollection to another stage in the IEnumerable<T> pipe. Because CreatedFileCollection does not implement IDisposable, I can easily inject any other IEnumerable<T> into the next class in the chain for unit-testing purposes.
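To illustrate the injection argument, here is a hypothetical downstream stage. TextFilesOnly is an invented name and is not the class from the next post. It depends only on IEnumerable<string>, so a unit test can hand it a plain array in place of a live directory watcher.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Hypothetical pipeline stage: passes through only the .txt files it is given.
public sealed class TextFilesOnly : IEnumerable<string>
{
    readonly IEnumerable<string> _source;

    // In production, `source` could be a CreatedFileCollection;
    // in a unit test, it can be an array or a List<string>.
    public TextFilesOnly(IEnumerable<string> source)
    {
        _source = source;
    }

    public IEnumerator<string> GetEnumerator()
    {
        foreach (var file in _source)
        {
            if (string.Equals(Path.GetExtension(file), ".txt", StringComparison.OrdinalIgnoreCase))
                yield return file;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

In a test, new TextFilesOnly(new[] { "a.txt", "b.log" }) is all the setup required. There is nothing to dispose and no file system to touch.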

A naive implementation of CreatedFileCollection might have inherited from FileSystemWatcher instead of from IEnumerable<T>, but that would have made CreatedFileCollection an IDisposable, not to mention making it more difficult to integrate with other classes.

Instead, the construction and disposal of the FileSystemWatcher are buried in the GetEnumerator method, to which we turn next.

GetEnumerator Does Everything

The Single Responsibility of this class is to enumerate files in a directory, present and future. The GetEnumerator method (line 120) is therefore all we need.

It begins by checking whether cancellation has already been requested. If it has, there's no point chewing up cycles and we can quit before we start.

It then creates the FileSystemWatcher in a using block to ensure it's disposed properly.

The rationale for the next part, where it creates an instance of the nested CreatedFileQueue class (line 126), may not be obvious. The CreatedFileQueue is a first-in-first-out queue of files into which files are added as they arrive, and from which files are dequeued as we request them. The first question you might ask is, "Why not make the queue a simple member variable of CreatedFileCollection?" The answer is that there might be more than one thread using the CreatedFileCollection, each peeling off files with its own call to GetEnumerator (that is, with its own foreach loop). We would want each thread to get a complete set of files, so each one needs its own queue. A single queue in a member variable, in the class instance both threads share, would not accomplish this.

The first files that are added to the queue are the ones that are already present, using a call to Directory.GetFiles (line 137).

Before that happens, however, we connect the FileSystemWatcher's Created event to the FileCreated method of our CreatedFileQueue instance. It is possible that a file could arrive through both means: through FileCreated and then, a moment later, in the directory listing. That's OK! Our class does not guarantee that a file will only be reported once in the enumeration. As I mentioned, we will take care of that down the IEnumerable<T> chain, in a class whose Single Responsibility is to eliminate duplicates.
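For a sense of what such a downstream stage could look like, here is one simple possibility. It is not the implementation from the next post, and the names DistinctFiles and FirstOccurrences are hypothetical. It lazily passes along each path the first time it is seen. The case-insensitive comparison is an assumption that suits Windows paths.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical de-duplicating stage for a stream of file paths.
public static class DistinctFiles
{
    // Yields each path the first time it appears and skips later repeats.
    public static IEnumerable<string> FirstOccurrences(IEnumerable<string> files)
    {
        // Assumption: paths that differ only by case refer to the same file.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        foreach (var file in files)
        {
            if (seen.Add(file))   // Add returns false if the path was already present
                yield return file;
        }
    }
}
```

This sketch remembers every path for as long as the enumeration runs, so a file that is deleted and later re-created under the same name would be suppressed. A production version would need some policy for forgetting old names.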

After the Created event is hooked up and the initial files are reported, we just keep trying to peel files off the queue (line 144):

 

string fileName;
while (queue.TryDequeue(out fileName, _cancellationToken))
    yield return fileName;

 

Notice how the CancellationToken is passed into the queue processing. Let's turn now to that nested CreatedFileQueue class.

CreatedFileQueue

The CreatedFileQueue class has two main methods that cooperate through a semaphore. We've already seen that the FileSystemWatcher will call FileCreated (line 75) whenever a file arrives in the directory. That method enqueues the file in the encapsulated ConcurrentQueue and then increments the semaphore.

 

public void FileCreated(object sender, FileSystemEventArgs e)
{
    _queue.Enqueue(e.FullPath);
    _fileEnqueued.Release();
}

 

Meanwhile, the TryDequeue method (line 53) waits for the semaphore to show that at least one file is enqueued before dequeueing it and returning it to the loop in CreatedFileCollection.GetEnumerator.

public bool TryDequeue(out string fileName, CancellationToken cancellationToken)
{
    fileName = null;
    // Avoid the OperationCanceledException if we can.
    if (cancellationToken.IsCancellationRequested)
        return false;
    try
    {
        _fileEnqueued.Wait(cancellationToken);
        return _queue.TryDequeue(out fileName);
    }
    catch (OperationCanceledException)
    {
        return false;
    }
}

 

An alternative to the semaphore would have been to do some sort of wait-and-loop process in TryDequeue until either a file was on the queue or cancellation was requested. But why do that? A semaphore is much more efficient, not to mention elegant.

Also, a semaphore, even a "slim" one, is able to wait for both the semaphore signal and a CancellationToken! (Remember what MSDN said about CancellationTokens being part of a unified model?)
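The following sketch isolates that dual wait. SemaphoreWaitDemo and WaitOrCancel are hypothetical names, and CancelAfter requires .NET 4.5 or later. A single call to SemaphoreSlim.Wait(CancellationToken) returns when the semaphore is signaled, or throws OperationCanceledException when the token is canceled, whichever happens first.

```csharp
using System;
using System.Threading;

// Hypothetical demo class isolating the wait used by TryDequeue.
public static class SemaphoreWaitDemo
{
    // Waits on a semaphore that is signaled only if `signal` is true.
    // Returns true if the wait ended because of the signal,
    // false if it ended because the token was canceled.
    public static bool WaitOrCancel(bool signal, int cancelAfterMs)
    {
        using (var semaphore = new SemaphoreSlim(0))
        using (var source = new CancellationTokenSource())
        {
            if (signal)
                semaphore.Release();            // simulate a file being enqueued

            source.CancelAfter(cancelAfterMs);  // requires .NET 4.5 or later

            try
            {
                semaphore.Wait(source.Token);   // one call waits for both conditions
                return true;
            }
            catch (OperationCanceledException)
            {
                return false;
            }
        }
    }
}
```

No polling loop or timeout arithmetic is needed, which is why TryDequeue in the listing can stay so short.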

Conclusion

So there you have it: a file-system watcher that has all the desired qualities.

  • It presents its results in the simplest manner possible: an IEnumerable of file names that you can plug right into a foreach loop.
  • It is thread-safe.
  • It is easily cancellable, using .NET 4's standard cancellation model.
  • It is not an IDisposable, and can therefore be easily mocked.

In the next post, we'll handle the next consideration: eliminating duplicate files. As with CreatedFileQueue, some sound engineering will make things simple for consumers of the class.
