3

I am trying to find an efficient way to sort an array of strings based on a numeric value within each string element of the array. I am currently using the Array.Sort(array, customComparer) static method (quick sort), with my custom comparer class (sorting in descending order) being:

class StringComparer : IComparer<string>
{
    public int Compare(string a, string b)
    {
        string s1 = a;
        string s2 = b;

        Match matchA = Regex.Match(s1, @"\d+$");
        Match matchB = Regex.Match(s2, @"\d+$");

        long numberA = long.Parse(matchA.Value);
        long numberB = long.Parse(matchB.Value);

        if (numberB - numberA < 0)
        {
            return -1;
        }
        else 
        {
            return 1;
        }
    }
}

This works very well, but sometimes it takes too much time to sort, with an array of 100 000 strings taking more than a minute on a 2.4Ghz processor. I wonder if there is a more efficient way to accomplish the same. For example, implementing a different sorting algorithm or taking another approach like using a dictionary and sorting on the value (the value being the numeric part of the string). Any suggestions? Thanks in advance!

3
  • 1
    "using a dictionary and sorting on the value" sounds very promising. Have you tried it? Commented Feb 12, 2012 at 18:04
  • you could try Radix Sort, which targets this kind of sorts: "sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value" Commented Feb 12, 2012 at 18:22
  • 1
    @Luiggi Mendoza: The sorting algorithm is not the bottleneck. Commented Feb 12, 2012 at 19:04

4 Answers 4

5

You're parsing the value for each comparison. I would suggest you parse once, to get a string/long pair, sort that, and then extract the string part afterwards.

Note that your existing code has a bug: it will never return 0, for two strings comparing as equal.

Here's an alternative approach using LINQ (which isn't in-place sorting, but is simple.)

var sorted = unsorted.OrderBy(x => long.Parse(Regex.Match(x, @"\d+$").Value));
                     .ToList();

(OrderBy projects once to get the keys, then compares keys.)

Sign up to request clarification or add additional context in comments.

5 Comments

LINQ sounds good and I will test it. However, I am targeting .NET 2.0. Thanks!
@cake: You can use LINQBridge or something similar to use LINQ to Objects in .NET 2.0. (In future, if you're going to require a very old version of the framework, it would be helpful to say so in the question.) You can basically do the same thing though: work out all the keys once, and then sort with those.
Sorry about forgetting to mention the required version. I opted for a custom string class which compares really fast as suggested by Jason. I was not aware of LinqBridge but I will definitely look into it and compare. Thanks!
just tried LinqBridge and it worked good but there is still too much parsing and regex so the code is still slow. Thanks for mentioning LinqBridge, I used it somewhere else.
@cake: Well that's only going to be doing the parsing and regex once per value... so it shouldn't be any slower than the accepted answer.
3

You are now performing the Regexes O(n log n) times.

Consider looping once over all strings, extracting the numerical value and adding it to a SortedDictionary<long, string>

This requires only O(n) executions of the Reg expression. The rest of the sorting should be comparable.

Comments

2

First, you're needlessly parsing the same string over and over (both matching with the regular expression and then parsing the matches). Instead, encapsulate what you have into a custom type so that you only have to parse once.

public class FooString {
    private readonly string foo;
    private readonly long bar;

    public FooString(string foo) {
        this.foo = foo;
        Match match = Regex.Match(foo, @"\d+$");
        this.bar = Int64.Parse(match.Value);
    }

    public string Foo { get { return this.foo; } }
    public long Bar { get { return this.bar; } }
}

I'd even add a Contract.Requires to this class that says that foo must satisfy the regular expression.

Second, you have an IComparer<T> that dies on certain values of T (in your case, strings that don't match the regular expression and can't be parsed to a long). This is generally a bad idea.

So, make the comparer for FooString:

public FooStringComparer : IComparer<FooString> {
    public int Compare(FooString a, FooString b) {
        Contract.Requires(a != null);
        Contract.Requires(b != null);
        return a.Bar.CompareTo(b.Bar);
    }
}

Now, your sorting will be blazingly fast because you've stopped parsing the same string over and over.

1 Comment

Your approach is great! I tested with 100 000 elements and the sorting process took just a few seconds! This used to take more than a minute before! The only part that takes time now is initiating the array of CustomStrings which I do with 'CustomString[] arrayOfCustomStrings = new CustomString[matchCollection.Count];' this takes about 12 seconds to create the array but after that it gets sorted in a matter of a second or two. Just for the record, I had to change 'this.bar = Int64.Parse(match)' to 'this.bar = Int64.Parse(match.value);' This was very helpful. Thanks Jason!
1

Create the Regex only once with the Compiled option. This will increase the speed.

class StringComparer : IComparer<string>
{
    private static Regex _regex = new Regex(@"\d+$", RegexOptions.Compiled);

    public int Compare(string a, string b)
    {
        long numberA = Int64.Parse(_regex.Match(a).Value);
        long numberB = Int64.Parse(_regex.Match(b).Value);
        return numberA.CompareTo(numberB);
    }
}

4 Comments

By default, Regex expressions are compiled and cached. This should not make any difference.
@HenkHolterman: The OP uses Regex.Match(s1, @"\d+$"); which will not be compiled. However, you are right by pointing out the regex will be compiled by default, when the pattern is passed in the constructor.
Ok, I see. I did not know that the static methods compile and cache the expressions.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.