3

I have a huge (~100,000) collection of objects over which I have no control (let's call this masterList). They are simple with several fields

public class TheirObject{
public String GUID;
public int blah1;
public string blah2;
...
}

I have another collection of tens of thousands of GUIDs (as a List of Strings) I need to, for every GUID in my list, create a sublist of TheirObjects which contain whichever TheirObjects in the masterList have the same GUIDs.

Here is some simple code That does this:

 List<String> GUIDs;
 List<TheirObject> masterList;
 List<TheirObject> filteredList;
 foreach(String GUID in GUIDs)
 {
      filteredList = new List<TheirObject>();
      foreach(TheirObject tho in masterList)
           if(tho.GUID == GUID)
                filteredList.Add(tho);
      //do stuff with filteredList
 }

However, this takes hours! I am sure that there is a much faster way to do it, perhaphs involving sorted lists, then binary search lookups, but I can't figure out how to do it in c#. Several TheirObjects will have the same GUID in the masterList, so I don't think I can use SortedList. Help!

6
  • 2
    Don't know if this is a possibility for you, but this is exactly the thing that databases do well. Commented Sep 24, 2014 at 19:03
  • could you not do or use List<T>.foreach() and do it as a delegate or use a lambda ? Commented Sep 24, 2014 at 19:05
  • 2
    @DJKRAZE That would improve nothing at all about the program. It'd add a tiny bit of overhead without making any real functional change. Commented Sep 24, 2014 at 19:09
  • Then I would suggest that he do a Lookup or even easier use a Database Query.. Commented Sep 24, 2014 at 19:10
  • I understand that using a database would be ideal but I cannot use one in this situation (restricted toolset on this work system) Commented Sep 24, 2014 at 19:50

1 Answer 1

12

The straight forward code approach with LINQ would be something like:

var lookup = masterList.ToLookup(tho => tho.GUID);
// Now you have a hash-table based lookup containing the lists of TheirObject grouped by GUID
foreach(string GUID in GUIDs)
{
    filteredList = lookup[GUID].ToList();
    // Do your stuff with filteredList
}

The key here is not iterating the huge list multiple times, which is what kills performance. Instead, iterate through it once and build an efficient lookup. This initial build will take some time, subsequent lookups will take almost no time and (close to) O(1).

Now if the list is really huge and memory constraints does not allow you to build a datastructure more suited for lookups, I would probably try to offload the work to a database, as suggested in the comments.

Sign up to request clarification or add additional context in comments.

3 Comments

This looks great, going to try it!
WOW! this code was done in ~1min. What an improvement!
love this solution

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.