6

I've been reading conflicting answers about modern javascript engines' time complexity when it comes to sets vs arrays in javascript.

I completed the demo task of codility, which is a simple assignment to find a solution for the following: given an array A of N integers, return the smallest positive integer (greater than 0) that does not occur in A.

For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.

My first solution was:

const solution = arr => {
    for(let int = 1;;int++) {
        if (!arr.includes(int)) {
            return int;
        }
    }
}

Now, the weird thing is that codility says this solution has a time complexity of O(n**2) (they prefer a solution of complexity O(n). As far as I know, array.prototype.includes is a linear search (https://tc39.es/ecma262/#sec-array.prototype.includes) meaning it should have an O(n) time complexity.

If I enter a different solution, using a Set, I get the full score:

const solution = arr => {
  const set = new Set(arr);
  let i = 1;

  while (set.has(i)) {
    i++;
  }

  return i;
}

Codility says this apparently has a time complexity of O(N) or O(N * log(N)).

Is this correct? Is array.prototype.includes in fact O(n**2) instead of O(n)?

Lastly, I'm a bit confused as to why Set.has() is preferred as in my console performance tests, Array.includes() is consistently outperforming the solution to first create a Set and then looking it up on the set, as can be seen in the following snippet.

const rand = (size) => [...Array(size)].map(() => Math.floor(Math.random() * size));

const small = rand(100);
const medium = rand(5000);
const large = rand(100000);

const solution1 = arr => {
  console.time('Array.includes');
  for(let int = 1;;int++) {
    if (!arr.includes(int)) {
      console.timeEnd('Array.includes');
      return int;
    }
  }
}

const solution2 = arr => {
  console.time('Set.has');
  const set = new Set(arr);
  let i = 1;
  while (set.has(i)) {
    i++;
  }
  console.timeEnd('Set.has');
  return i;
}

console.log('Testing small array:');
solution1(small);
solution2(small);
console.log('Testing medium array:');
solution1(medium);
solution2(medium);
console.log('Testing large array:');
solution1(large);
solution2(large);

If a set lookup has better time complexity (if that's true) and is preferred by codility, why are my performance tests favoring the array.prototype.includes solution?

3
  • 2
    includes is linear, but your outer for loop can go up to n too. Commented Dec 10, 2020 at 17:59
  • With the Set approach, new Set(arr) is O(n), but set.has(i) is O(1), meaning O(2n) altogether (while loop taken into account, worst case n), which reduces to O(n) (if I'm not wrong). Commented Dec 10, 2020 at 18:03
  • This makes sense, thank you! Commented Dec 10, 2020 at 22:19

3 Answers 3

5

I know this is an old question, but I was double checking the data. I too assumed Set.has would be O(1) or O(log N), but in my first test, it appeared to be O(N). The specs for these functions hint as much, but are quite hard to decipher: https://tc39.es/ecma262/#sec-array.prototype.includes https://tc39.es/ecma262/#sec-set.prototype.has Elsewhere, though, they also say that Set.has must be sublinear-- and I believe modern implementations are.

Empirically, Set.has demonstrates linear performance when I ran it in some code playgrounds... but in real environments like node and chrome, they there were no surprises. I'm not sure what the playground was running on the back end, but perhaps a Set polyfill was used. So be careful!

Here's my test cases, trimmed down to remove the randomness:

const makeArray = (size) => [...Array(size)].map(() => size);

const small =  makeArray(1000000);
const medium = makeArray(10000000);
const large =  makeArray(100000000);

const solution1 = arr => {
  console.time('Array.includes');
  arr.includes(arr.length - 1)
  console.timeEnd('Array.includes');
}

const solution2 = arr => {
  const set = new Set(arr)
  console.time('Set.has');
  set.has(arr.length-1)
  console.timeEnd('Set.has');
}


console.log('** Testing small array:');
solution1(small);
solution2(small);
console.log('** Testing medium array:');
solution1(medium);
solution2(medium);
console.log('** Testing large array:');
solution1(large);
solution2(large);

In Chrome, though:

** Testing small array:
VM183:10 Array.includes: 1.371826171875 ms
VM183:17 Set.has: 0.005859375 ms
VM183:25 ** Testing medium array:
VM183:10 Array.includes: 14.32568359375 ms
VM183:17 Set.has: 0.009765625 ms
VM183:28 ** Testing large array:
VM183:10 Array.includes: 115.695068359375 ms
VM183:17 Set.has: 0.0048828125 ms

In Node 16.5:

Testing small array:
Array.includes: 1.223ms
Set.has: 0.01ms
Testing medium array:
Array.includes: 11.41ms
Set.has: 0.054ms
Testing large array:
Array.includes: 127.297ms
Set.has: 0.047ms

So, yeah, Arrays are definitionly linear, and Sets are much faster.

Sign up to request clarification or add additional context in comments.

Comments

2

I did the leetcode question Contains Duplicates where you have to check if there are duplicates in an given array of numbers. I submitted two solutions:

  1. with a HashSet and the .has() method

function containsDuplicate(nums) {
  let duplicates = new Set(); 

  for (let num of nums) {
    if (hashSet.has(num)) 
      return true;

    hashSet.add(num);
  }
  return false;
}

  1. with an Array and the .includes() method

function containsDuplicate(nums) {
  let duplicates = []; 

  for (let num of nums) {
    if (duplicates.includes(num)) 
      return true;

    hashSet.add(num);
  }
  return false;
}

I can just tell you that LeetCode won't accept the Array solution. I get a Time Limit Exceeded Error and just 70/75 test cases pass. So at a very high input .includes() won't perform.

Comments

1

The comparison like that is not entirely fair because in the function where you use the Set, the Array needs to be converted to a Set first, which takes some time.

Have a look at the results below if this is ignored. I have updated the solution2 function to receive a Set and changed the while loop to a for loop - for the sake of direct comparison.

You may notice that for a small array, Set might be slower. This is trivial because the time complexity only really comes into affect for a large (significant) n.

Also note, Array.includes is indeed O(n) but because it is in a for loop which in the worst case could go up to n the solution has a time complexity of O(n^2).

const rand = (size) => [...Array(size)].map(() => Math.floor(Math.random() * size));

const small = rand(100);
const medium = rand(5000);
const large = rand(100000);

const solution1 = arr => {
  console.time('Array.includes');
  for (let int = 1;;int++) {
    if (!arr.includes(int)) {
      console.timeEnd('Array.includes');
      return int;
    }
  }
}

const solution2 = set => {
  console.time('Set.has');
  for (let i = 1;;i++) {
    if (!set.has(i)) {
      console.timeEnd('Set.has');
      return i
    }
  }
}

console.log('Testing small array:');
solution1(small);
solution2(new Set(small));
console.log('Testing medium array:');
solution1(medium);
solution2(new Set(medium));
console.log('Testing large array:');
solution1(large);
solution2(new Set(large));

2 Comments

> "Array.includes is indeed O(n)" According to @yury-tarabanko he seems to think its Linear given his interpretation of the spec stackoverflow.com/a/48761894/4642530. Maybe Im missing something but I left a comment there, my interpretation of the spec is that it is also O(n). (maybe under the hood ecma is using a hashmap somehow?)
I see that your comment was already addressed in the question you linked above. The spec indeed indicates a time complexity of O(n) which is also called linear. There might have been confusion with O(1), however this is called a constant time complexity.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.