0

I need a function that converts an array of strings into a string that is sortable in the same order as if you would sort the inputs (sort first input argument, if equal sort second etc...)

In native code, separating the strings with \0 would do, but somehow

("a" + char.MinValue + "2").CompareTo("a1") equals to 1!

What is going on and is it possible to create such a function?

public static string StringsToKey(params string[] values)

EDIT: This is the test I want to succeed:

Assert.IsTrue(MiscUtils.StringsToKey("a", "2").CompareTo(MiscUtils.StringsToKey("a1")) < 0);

I would like to avoid using CompareOrdinal because I'm not always in control of how the key would be sorted. Also, ordinal might yield incorrect sorting order on international sets...

2
  • 5
    post some sample inputs and the desired output Commented Feb 10, 2011 at 22:23
  • What I got from this discussion so far, is that there is no "clean" way to produce a requested string. At the same time, if your strings come from user input, and the correct sort order is not 100% critical, \t would be the best approach. Commented Feb 12, 2011 at 5:36

5 Answers 5

5
public static string StringsToKey(params string[] values)
{
    return string.Join(char.MinValue.ToString(), values);
}

usage should work:

var s1 = StringsToKey("abc", "def", "1234");
var s2 = StringsToKey("ab", "cde", "f1234");

var comparisonResult = string.Compare(s1, s2, StringComparison.Ordinal);

EDIT:

var s1 = StringsToKey("a", "2");
var s2 = StringsToKey("a1");

var r = string.Compare(s1, s2, StringComparison.Ordinal);

returns a negative value(-49).

Edit II:

as Joe mentioned in comments, this won't work if the string contains '\0' (or '\t' if tab used as delimiter). So, there is no such a delimiter which works in all cases. So, I rewrote the function, but now it has 2 parameters but still uses CompareTo method for comparison:

public static int CompareStringSequences(
    IEnumerable<string> first, 
    IEnumerable<string> second)
{
    var x = Enumerable.Zip(first, second, (s1, s2) => s1.CompareTo(s2))
        .FirstOrDefault(i => i != 0);

    return x == 0 ?
               x1.Length - x2.Length :
               x;
}
Sign up to request clarification or add additional context in comments.

5 Comments

How about if not using ordinal? Possible?
@Sergey: as you mentioned, the tab '\t' symbol may be used as separator instead of char.MinValue even without using ordinal(but this works for strings without tab char only).
@Sergey: yep, '\t' is the smallest value for this code to work without using an ordinal comparison.
But what about strings that contain the '\0' character? Illegal in native null-terminated strings, but perfectly valid in .NET strings. See my answer.
@Alex. That would work, but I still prefer my sample IComparer class for a number of reasons. 1. It is culture-sensitive. 2. It implements IComparer<T> so can be passed to a List<T>.Sort method. 3. It doesn't create any intermediate objects which gives a small performance benefit when called repeatedly in a loop (e.g. when sorting a large collection).
1

Unlike null-terminated C-style strings, a .NET string may contain null characters, so in the general case, there is no separator that will guarantee to sort lower than any valid character in a string.

Of course if you're sure your strings don't contain nulls and ordinal sorting is OK, some of the solutions already posted will work.

But I'd say don't even attempt it. Writing a comparer for your string arrays will be more efficient (avoiding creating the temporary strings) and better expresses your intent. There are lots of ways this could be done, including the following (compares IList<string> rather than string[] for more flexibility, and supports culture-sensitive comparisons):

public class StringListComparer : IComparer<IList<string>>
{
    private StringComparer comparer;

    public StringListComparer()
        : this(StringComparer.Ordinal)
    {
    }

    public StringListComparer(StringComparer comparer)
    {
        if (comparer == null) throw new ArgumentNullException("comparer");
        this.comparer = comparer;
    }

    public int Compare(IList<string> x, IList<string> y)
    {
        int result;
        for (int i = 0; i < Math.Min(x.Count, y.Count); i++)
        {
            result = comparer.Compare(x[i], y[i]);
            if (result != 0) return result;
        }
        return x.Count.CompareTo(y.Count);
    }
}

The above code is untested and possibly buggy, but I'm sure you get the idea.

1 Comment

"No" is an acceptable answer! I like your point about avoiding string allocation.
1

the String.CompareTo() method returns an int comparing the two values to the length of the shortest string. Given that '\0' is "less than" the character '1' it follows it in normal string order. You can use the String.Split() method to split on any character. including '\0'. You can then sort the strings using the Array.Sort() method.

Comments

1

Basically, if using "ordinal sorting", the operation you want is to simply concatenate all the strings. To illustrate, let's take two arrays of strings:

a={"My", "dog", "is",     "black"}
b={"My", "cat", "sleeps", "often"}

Comparing each string in these arrays until you find one that differs:

a={"My", "dog", "is",     "black"}
b={"My", "cat", "sleeps", "often"}
          ^
          first difference: b is "less than" a

... is exactly equal to comparing the concatenated strings:

a="Mydogisblack"
b="Mycatsleepsoften"
     ^
     First difference; again, b < a

Your method's code would simply be:

public static string StringsToKey(params string[] values)
{
    var builder = new StringBuilder();
    foreach(var s in values)
        builder.Append(s + "\0"); //the \0 is ASCII 0, used to delimit words
    return builder;
}

2 Comments

That doesn't work. Consider what happens if comparing {"abc", "def"} and {"ab", "xyz"}. "ab" should sort before "abc", but "abxyz" sorts after "abcdef".
Try the actual code; I didn't mention that point, but included it in my final answer.
1

This code seems to work for situations when input does not contain tabs:

    public static string StringsToKey(params string[] values)
    {
        return values.Aggregate("", (current, value) => current + value + '\t');
    }

1 Comment

But what about strings that contain other characters that sort lower than tab? E.g. '\0', '\1', ...? See my answer.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.