
I'm trying to sort a string List by an order defined in another array. I know it's possible in a variety of ways, but I'm not sure how to do it efficiently. I need this to be able to handle a large unsorted list, with thousands of items. Here's what I came up with:

List<string> sortStringListByArray(List<string> unsortedList, string[] order)
{
    List<string> sortedList = new List<string>();
    // For every entry in order, scan the whole list: O(order.Length * unsortedList.Count).
    for (int i = 0; i < order.Length; i++)
    {
        foreach (string s in unsortedList)
        {
            if (s.Equals(order[i]))
            {
                sortedList.Add(s);
            }
        }
    }
    return sortedList;
}

It works as expected, but it's definitely not efficient. Is there any way I can do this without iterating across both the list and the order?

Edit: Clarification

Thanks!

  • How do you define "efficient"? What type of measurement are you referring to? Time? Lines of code? Memory? Furthermore, are you sure that code will affect that measurement significantly? In other words: are you sure you gain much by optimizing the code above? Beware: premature optimization is the root of all evil. Commented Jun 30, 2019 at 18:35
  • I did mean efficiency in terms of time, though now that you say it, I'm honestly not sure if it's necessary to optimize time-wise. This is intended to be run on fairly large lists (10000+ items), so I was thinking it would be important to do it efficiently. Commented Jun 30, 2019 at 18:39
  • Just measure before doing any unnecessary optimization, as it will make your program more complex and thus harder to maintain. Use a Stopwatch and measure how long it takes to run that code. If it's only a few nanoseconds, why bother? Commented Jun 30, 2019 at 18:56
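A minimal sketch of the measurement suggested in the last comment, using System.Diagnostics.Stopwatch to time the question's method. The 10,000-item list of random keys is hypothetical test data, and a single run also includes JIT warm-up, so treat the printed number as a rough indication only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// The question's nested-loop method, reproduced as a local function.
List<string> sortStringListByArray(List<string> unsortedList, string[] order)
{
    var sortedList = new List<string>();
    for (int i = 0; i < order.Length; i++)
    {
        foreach (string s in unsortedList)
        {
            if (s.Equals(order[i]))
                sortedList.Add(s);
        }
    }
    return sortedList;
}

// Hypothetical test data: 10,000 items drawn at random from four keys.
string[] keyOrder = { "c", "d", "b", "a" };
var random = new Random(42);
List<string> data = Enumerable.Range(0, 10_000)
    .Select(_ => keyOrder[random.Next(keyOrder.Length)])
    .ToList();

// Time a single call (includes JIT compilation of the method).
var stopwatch = Stopwatch.StartNew();
List<string> result = sortStringListByArray(data, keyOrder);
stopwatch.Stop();

Console.WriteLine($"Sorted {result.Count} items in {stopwatch.Elapsed.TotalMilliseconds} ms");
```

For anything more rigorous, repeat the call several times and discard the first run.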

4 Answers


The simplest way to express it is with an inner join:

return order.Join(unsortedList, a => a, b => b, (a, b) => b).ToList();
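A self-contained sketch of the join above, using the example data from the comments (plus a hypothetical "z" that is not in order). Enumerable.Join keeps the order of the outer sequence and, for each outer element, the order of its matching inner elements, which is why the result follows order; strings absent from order are dropped, just as in the question's loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] order = { "c", "d", "b", "a" };
var unsortedList = new List<string> { "a", "b", "d", "z", "c", "a", "c", "b", "d" };

// Outer = order, inner = unsortedList; the result selector keeps the list item.
// "z" has no partner in order, so the inner join drops it.
List<string> sorted = order
    .Join(unsortedList, a => a, b => b, (a, b) => b)
    .ToList();

Console.WriteLine(string.Join(",", sorted)); // prints c,c,d,d,b,b,a,a
```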

The best time complexity is O(n + m), where n is the size of unsortedList and m the size of order, using a Lookup or a Dictionary:

var lookup = unsortedList.ToLookup(x => x);

return order.SelectMany(x => lookup[x]).ToList();
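The same Lookup approach wrapped in a reusable method, as a sketch; the name SortByOrderWithLookup and the sample data are hypothetical. Building the lookup is one pass over the list, and the SelectMany is one pass over order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper around the two lines above.
List<string> SortByOrderWithLookup(List<string> unsortedList, string[] order)
{
    // O(n): group every string in the list under its own value.
    ILookup<string, string> lookup = unsortedList.ToLookup(x => x);

    // O(m) lookups: emit each group in the sequence given by order.
    return order.SelectMany(x => lookup[x]).ToList();
}

string[] keyOrder = { "c", "d", "b", "a" };
var items = new List<string> { "a", "a", "b", "b", "c", "c", "d", "d" };
List<string> sorted = SortByOrderWithLookup(items, keyOrder);

Console.WriteLine(string.Join(",", sorted)); // prints c,c,d,d,b,b,a,a
```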

The above can be made a few times faster by using a Dictionary<string, int> to count the items in unsortedList, and then looping over order to generate the result from the corresponding counts in that Dictionary.
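A sketch of the counting variant described above; the method name SortByOrderWithCounts is hypothetical, and the speed-up is the answer's claim rather than something this snippet measures. Like the question's loop, it drops strings that are not in order, and it repeats a string's group if that string appears more than once in order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: count each string once, then emit it `count` times
// at the position dictated by order.
List<string> SortByOrderWithCounts(List<string> unsortedList, string[] order)
{
    // Pass 1, O(n): count occurrences of every string in the list.
    var counts = new Dictionary<string, int>();
    foreach (string s in unsortedList)
    {
        counts.TryGetValue(s, out int c); // c is 0 when s is not yet present
        counts[s] = c + 1;
    }

    // Pass 2, O(m + n): walk order and append each string as often as it occurred.
    var result = new List<string>(unsortedList.Count);
    foreach (string key in order)
    {
        if (counts.TryGetValue(key, out int count))
        {
            for (int i = 0; i < count; i++)
                result.Add(key);
        }
    }
    return result;
}

List<string> sorted = SortByOrderWithCounts(
    new List<string> { "a", "a", "b", "b", "c", "c", "d", "d" },
    new[] { "c", "d", "b", "a" });

Console.WriteLine(string.Join(",", sorted)); // prints c,c,d,d,b,b,a,a
```

Unlike the Lookup version, this stores only one int per distinct string instead of a list of every occurrence.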


Lookup and Dictionary store values in a hash table. To find an item in a hash table, a hash code is calculated from the value; it acts like an estimated location (index) of where the value sits in the table. This means only one or a few comparisons are needed to find (or fail to find) a value. So building the Lookup or Dictionary from unsortedList takes O(n) time, and because a hash table lookup takes O(1) time on average, generating the result needs only O(m) lookups (plus copying the n matched items), for a total time complexity of O(n + m).


3 Comments

Great! Thank you! Would you mind explaining why a Lookup is better? I haven't learned how to use them.
Somewhat. I understand that it reduces complexity by mapping values, but I don't know much more than that.
ToLookup basically builds a dictionary of string lists keyed on the unsorted strings, grouping all duplicates of a given string into a single entry of that dictionary.
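A small sketch of what ToLookup produces, with hypothetical sample data. It also shows the property that makes lookup[x] safe in the answer's SelectMany: the ILookup indexer returns an empty sequence for a key that is not present instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var words = new List<string> { "a", "b", "a", "c", "a" };

// One entry (an IGrouping<string, string>) per distinct string.
ILookup<string, string> lookup = words.ToLookup(x => x);

Console.WriteLine(lookup.Count);         // prints 3 (distinct keys a, b, c)
Console.WriteLine(lookup["a"].Count());  // prints 3 (all duplicates of "a" in one entry)
Console.WriteLine(lookup.Contains("z")); // prints False
Console.WriteLine(lookup["z"].Any());    // prints False (missing key gives an empty sequence)
```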

Building on @Ashkan's answer, you can do order.Distinct().ToList(), which removes duplicates. Since order is already in the desired sequence, you can simply process it and then return it.



Considering your comments, you can simply sort your list by each item's index in the order array:

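 // OrderBy returns an IOrderedEnumerable, so ToList() is needed to get a List<string>.
 // Strings that are not in order get index -1 and sort to the front.
 List<string> sortedList = unsortedList.OrderBy(x => Array.IndexOf(order, x)).ToList();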

5 Comments

This doesn't account for duplicates in the list, which is what I need. I should've clarified that, my bad.
@MattWaterman you mean you want to keep order in order array but eliminate the duplicates?
No, I want to sort the List in a specific order, defined in the order array. For example, the order could be {"c","d","b","a"} and the list of {a,a,b,b,c,c,d,d} would become {c,c,d,d,b,b,a,a}
Hmm, this would still have O(kn) time complexity where k is the size of order, and n the size of the unsorted strings, doesn't it? Or even worse, O(knlogn).
Yes, as @RobertBaron mentioned, this will have worse time complexity than the O(n*m) that OP has.

You could use an efficient sorting algorithm with each string's index in the order array as the sort key. This would be more efficient than your example solution.

E.g.,

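List<string> sortStringListByArray(List<string> unsortedList, string[] order)
{
    // Map each string to its position in order.
    var orders = new Dictionary<string, int>();

    for (var i = 0; i < order.Length; i++)
        orders[order[i]] = i;

    // Sort by position; throws KeyNotFoundException for a string that is not in order.
    return unsortedList
        .OrderBy(s => orders[s])
        .ToList();
}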
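A hypothetical variant of the method above, sketched under the assumption that strings missing from order should be dropped (as the question's original loop does) rather than throw. The position map is built in O(m) and the sort costs O(n log n); OrderBy is documented as a stable sort, so equal strings keep their relative order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical name: filters out unknown strings instead of throwing.
List<string> SortStringListByArraySafe(List<string> unsortedList, string[] order)
{
    // O(m): position of each string in order (the last index wins for duplicates).
    var positions = new Dictionary<string, int>();
    for (int i = 0; i < order.Length; i++)
        positions[order[i]] = i;

    // Drop strings with no position, then sort stably by position: O(n log n).
    return unsortedList
        .Where(s => positions.ContainsKey(s))
        .OrderBy(s => positions[s])
        .ToList();
}

List<string> sorted = SortStringListByArraySafe(
    new List<string> { "a", "z", "b", "c", "d", "a" },
    new[] { "c", "d", "b", "a" });

Console.WriteLine(string.Join(",", sorted)); // prints c,d,b,a,a
```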

2 Comments

Same solution as @Ashkan Mobayen Khiabani with time complexity worse than original problem.
Not anymore. Updated.
