
In short, I'm trying to "sort" the incoming results of a threadpool as the threads finish. I have a functional solution, but there's no way in the world it's the best way to do it (it's prone to huge pauses). So here I am! I'll try to hit the bullet points of what's going on/what needs to happen and then post my current solution.

  • The intent of the code is to get information about files in a directory, and then write that to a text file.

  • I have a list (Counter.ListOfFiles) of the file paths, sorted in a particular way. This is the guide that dictates the order in which I need to write to the text file.

  • I'm using a threadpool to collect the information about each file and build a stringbuilder with all of the text ready to write to the text file. I then call a procedure (SyncUpdateResource, included below), sending the stringbuilder (strBld) from that thread along with the path of the file that particular thread just wrote about (Xref).

  • The procedure includes a SyncLock to hold all the other threads until it finds a thread passing the correct information. The "correct" information is when the Xref passed by the thread matches the first item in my list (FirstListItem). When that happens, I write to the text file, delete the first item in the list, and do it again with the next thread.

The way I'm using the monitor is probably not great, in fact I have little doubt I'm using it in an offensively wanton manner. Basically while the xref (from the thread) <> the first item in my list, I'm doing a pulseall for the monitor. I originally was using monitor.wait, but it would eventually just give up trying to sort through the list, even when using a pulse elsewhere. I may have just been coding something awkwardly. Either way, I don't think it's going to change anything.

Basically the problem comes down to the fact that the monitor will pulse through all of the items it has in the queue, when there's a good chance the item I'm looking for was already passed to it somewhere earlier in the queue, so it churns through all of the items again before looping back around to find one that matches. The result is that my code will hit one of these and take a huge amount of time to complete.

I'm open to believing I'm just using the wrong tool for the job, or just not using the tool I have correctly. I would strongly prefer some sort of threaded solution (unsurprisingly, it's much faster!). I've been messing around a bit with the Parallel Task functionality today, and a lot of it looks promising, but I have even less experience with that vs. the threadpool, and you can see how I'm abusing that! Maybe something with a queue? You get the idea. I am directionless. Anything someone could suggest would be much appreciated. Thanks! Let me know if you need any additional information.

Private Sub SyncUpdateResource(strBld As Object, Xref As String)
    SyncLock (CType(strBld, StringBuilder))
        Dim FirstListitem As String = Counter.ListOfFiles.First
        Do While Xref <> FirstListitem
            FirstListitem = Counter.ListOfFiles.First
            'This makes the code much faster for reasons I can only guess at.
            Thread.Sleep(5)
            Monitor.PulseAll(CType(strBld, StringBuilder))
        Loop
        Dim strVol As String = Form1.Volname
        Dim strLFPPath As String = Form1.txtPathDir
        My.Computer.FileSystem.WriteAllText(strLFPPath & "\" & strVol & ".txt", strBld.ToString, True)
        Counter.ListOfFiles.Remove(Xref)
    End SyncLock
End Sub

1 Answer


This is a pretty typical multiple producer, single consumer application. The only wrinkle is that you have to order the results before they're written to the output. That's not difficult to do. So let's let that requirement slide for a moment.

The easiest way in .NET to implement a producer/consumer relationship is with BlockingCollection, which is a thread-safe FIFO queue. Basically, you do this:

In your case, the producer threads get items, do whatever processing they need to, and then put the item onto the queue. There's no need for any explicit synchronization--the BlockingCollection class implementation does that for you.

Your consumer pulls things from the queue and outputs them. You can see a really simple example of this in my article Simple Multithreading, Part 2. (Scroll down to the third example if you're just interested in the code.) That example just uses one producer and one consumer, but you can have N producers if you want.
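For concreteness, here's a minimal sketch of that basic pattern, ignoring the ordering requirement for now. This is my illustration rather than the code from the article; FileResult and the fake "processing" are placeholders:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class FileResult
{
    public string Path;
    public string Text;
}

class Example
{
    static void Main()
    {
        var queue = new BlockingCollection<FileResult>();
        var paths = new List<string> { "a.txt", "b.txt", "c.txt" };  // placeholder input

        // Producers: process the files in parallel and drop results onto the queue.
        var producers = Task.Run(() =>
        {
            Parallel.ForEach(paths, path =>
            {
                // stand-in for the real "collect information about the file" work
                queue.Add(new FileResult { Path = path, Text = "info about " + path });
            });
            queue.CompleteAdding();  // no more items are coming
        });

        // Consumer: GetConsumingEnumerable blocks until an item is available and
        // completes once CompleteAdding has been called and the queue is drained.
        foreach (var result in queue.GetConsumingEnumerable())
        {
            Console.WriteLine("{0}: {1}", result.Path, result.Text);
        }

        producers.Wait();
    }
}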

Your requirements have a little wrinkle in that the consumer can't just write items to the file as it gets them. It has to make sure that it's writing them in the proper order. As I said, that's not too difficult to do.

What you want is a priority queue of some sort onto which you can place an item if it comes in out of order. Your priority queue can be a sorted list or even just a sequential list if the number of items you expect to get out of order isn't very large. That is, if you typically have only a half dozen items at a time that could be out of order, then a sequential list could work just fine.

I'd use a heap because it performs well. The .NET Framework doesn't supply a heap, but I have a simple one that works well for jobs like this. See A Generic BinaryHeap Class.

So here's how I'd write the consumer (the code is in pseudo-C#, but you can probably convert it easily enough).

The assumption here is that you have a BlockingCollection called sharedQueue that contains the items. The producers put items on that queue. Consumers do this:

// expectedSequenceKey starts at the first sequence key you expect (0, for example)
var heap = new BinaryHeap<ItemType>();
foreach (var item in sharedQueue.GetConsumingEnumerable())
{
    if (item.SequenceKey == expectedSequenceKey)
    {
        // output this item
        // then check the heap to see if other items need to be output
        expectedSequenceKey = expectedSequenceKey + 1;
        while (heap.Count > 0 && heap.Peek().SequenceKey == expectedSequenceKey)
        {
            var heapItem = heap.RemoveRoot();
            // output heapItem
            expectedSequenceKey = expectedSequenceKey + 1;
        }
    }
    else
    {
        // item is out of order
        // put it on the heap
        heap.Insert(item);
    }
}
// if the heap contains items after everything is processed,
// then some error occurred.

One glaring problem with this approach as written is that the heap could grow without bound if one of your consumers crashes or goes into an infinite loop. But then, your other approach probably would suffer from that as well. If you think that's an issue, you'll have to add some way to skip an item that you think won't ever be forthcoming. Or kill the program. Or something.

If you don't have a binary heap or don't want to use one, you can do the same thing with a SortedList<ItemType>. SortedList will be faster than List, but slower than BinaryHeap if the number of items in the list is even moderately large (a couple dozen). Fewer than that and it's probably a wash.
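If you go that route with the framework's SortedList<TKey, TValue>, one way to apply it is to key the buffered items on their sequence number. Here's a rough, self-contained sketch of that buffering logic; Item, Output, and the field names are placeholders of mine, not part of the answer above:

using System;
using System.Collections.Generic;

class Item
{
    public int SequenceKey;
    public string Payload;
}

class OrderedWriter
{
    // Out-of-order items buffered by sequence key; SortedList keeps the keys sorted.
    private readonly SortedList<int, Item> pending = new SortedList<int, Item>();
    private int expectedSequenceKey = 0;

    // Call this for each item as the consumer pulls it off the queue.
    public void Handle(Item item)
    {
        if (item.SequenceKey != expectedSequenceKey)
        {
            pending.Add(item.SequenceKey, item);  // arrived early; hold it for later
            return;
        }

        Output(item);
        expectedSequenceKey++;

        // Flush any buffered items that are now next in line.
        while (pending.Count > 0 && pending.Keys[0] == expectedSequenceKey)
        {
            Output(pending.Values[0]);
            pending.RemoveAt(0);
            expectedSequenceKey++;
        }
    }

    private void Output(Item item)
    {
        Console.WriteLine(item.Payload);  // stand-in for the real write to the text file
    }
}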

I know that's a lot of info. I'm happy to answer any questions you might have.


3 Comments

Exceedingly helpful. I'll gladly take too much information any day. Thanks! I might end up having some sort of followup.
I've definitely had a bit of a challenge converting your Binary Heap code into vb.net. I'm still rather new to vb.net, so attempting to interpret something from C# is quite a challenge. Any tips?
@Finch042: No tips, really. I'm not terribly fluent in VB. I can muddle my way through. Best bet is, if you run into a stumbling block, post a question on SO with the relevant code snippet and your attempt at converting it. It's likely somebody can help you out.
