2

I was investigating a preconception I had that sorting strings in javascript would be slower than sorting integers. This is based on something I read (and now can't find), which appears to be erroneous, which stated that javascript stores strings as Array<Array<int>> instead of just Array<int>. The MDN documentation seems to contradict this:

JavaScript's String type is used to represent textual data. It is a set of "elements" of 16-bit unsigned integer values. Each element in the String occupies a position in the String. The first element is at index 0, the next at index 1, and so on. The length of a String is the number of elements in it.

If we define the "size" of an element (number or string) as the length of its textual representation (so size = String(x).length for either a numeric element or a string element), then for a large array of elements of the same size (one numeric, and one strings), I was expecting the sorting of the strings to be equal to or slightly slower than sorting the arrays, but when I ran a simple test (code below), it turned out that the strings were about twice as fast to sort.

I want to know what it is about strings and numbers, and how javascript does its sorting, that makes string sorting faster than numeric sorting. Perhaps there is something I am misunderstanding.

Results:

~/sandbox > node strings-vs-ints.js 10000 16
Sorting 10000 numbers of magnitude 10^16
Sorting 10000 strings of length 16
Numbers: 18
Strings: 9
~/sandbox > node strings-vs-ints.js 1000000 16
Sorting 1000000 numbers of magnitude 10^16
Sorting 1000000 strings of length 16
Numbers: 3418
Strings: 1529
~/sandbox > node strings-vs-ints.js 1000000 32
Sorting 1000000 numbers of magnitude 10^32
Sorting 1000000 strings of length 32
Numbers: 3634
Strings: 1474

Source:

"use strict";
const CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghjijklmnopqrstuvwxyz0123456789:.";

function generateString(L) {
    const chars = [];
    while(chars.length < L) {
        chars.push(CHARSET[Math.floor(Math.random() * CHARSET.length)]);
    }
    return chars.join("");
}

function generateNumber(L) {
    return Math.floor(Math.random() * Math.pow(10, (L - 1))) + Math.pow(10, L - 1);
}

function generateList(generator, L, N) {
    const elements = [];
    while(elements.length < N) {
        elements.push(generator.call(null, L));
    }
    return elements;
}

function now() {
    return Date.now();
}

function getTime(baseTime) {
    return now() - baseTime;
}

function main(count, size) {
    console.log(`Sorting ${count} numbers of magnitude 10^${size}`);
    const numbers = generateList(generateNumber, size, count);
    const numBaseTime = now();
    numbers.sort();
    const numTime = getTime(numBaseTime);

    console.log(`Sorting ${count} strings of length ${size}`);
    const strings = generateList(generateString, size, count);
    const strBaseTime = now();
    strings.sort();
    const strTime = getTime(strBaseTime);

    console.log(`Numbers: ${numTime}\nStrings: ${strTime}`);
}

main(process.argv[2], process.argv[3]);
11
  • 1
    Strings - No need to find the weight. Numbers - You need to find the weight and based on that you need to sort. Like considering "500", "1000" vs. 500, 1000. Former is swapped while latter is right. This is just my understanding of why it might be this way. It is may or may not be correct. Commented Jun 29, 2017 at 10:57
  • 8
    The obvious problem is that you are sorting your numbers as strings. Commented Jun 29, 2017 at 11:01
  • Notice that floating point numbers in a magnitude of 10^32 will be using scientific notation anyway for stringification. There's only 64 bits of entropy, you can't get an arbitrarily long string representation. Commented Jun 29, 2017 at 11:08
  • @Bergi is right, if you don't provide a callback to the sort function then it proceeds with the default sorting which coerces the values to be compared into strings (if they are not already a string type). Commented Jun 29, 2017 at 11:59
  • Ah ok, thanks @Bergi. I'll re-run my test and see if that changes things significantly. Commented Jun 29, 2017 at 15:40

1 Answer 1

6

I was investigating a preconception I had that sorting strings in javascript would be slower than sorting integers.

That's true, string comparison is way more costly than number comparison.

This is based on something I read which stated that javascript stores strings as Array<Array<int>> instead of just Array<int>. The MDN documentation seems to contradict this.

Yes, what you read seems to be erroneous indeed. Strings are just sequences of characters (each character being a 16 bit value), so they're usually stored as arrays of integers, or rather pointers to them. Your array of strings could indeed be treated as an array of an arrays though.

When I ran a simple test, it turned out that the strings were about twice as fast to sort.

The problem with your code is that you are sorting your numbers as strings, which casts every number to a string and then compares that. See How to sort an array of integers correctly. When you fix that, notice that the call to the comparison function still has quite a bit of overhead to the builtin string comparison, so if you really benchmarked relational operators (<, ==, >) on the different types I'd expect numbers to perform even better.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.