0

I have an array of size "n" with unsorted and reapeated integers between [0-99]. I already know that there is one number missing. So, what is the fastest way to find the missing number? This is my solution until now in C.

int find_missing_number(int array[], int n)
{
    char checker[100] = {0};
    int xor = 0, counter = 0, i, temp;
    //xor = (n%4==0) ? 0 : (n%4==1) ? n-1 : (n%4==2) ? 1 : n;

    for(i = 0; i < n; i++)
        if(checker[temp = array[i]] == 0)
        {
            checker[temp] = 1;
            xor ^= temp;
            if(++counter == 99)
                return xor;
        }

    return -1;
}
10
  • You should explain the logic. From the structure it looks like linear time running, which is best of anything I can think of. Commented Apr 13, 2015 at 19:59
  • @luis If values can be repeated then why only one numberr is missed? Commented Apr 13, 2015 at 20:01
  • Is this some interview question? More than one missing possible? Is value of n always 99 (it should be if only one number is missing and array contains all other elements from 0-99 once)? Commented Apr 13, 2015 at 20:02
  • You cannot do better than O(n) operations in the worst case, where n is the size of the array to test. You cannot do better than O(k) in the best case, where k is the number of distinct values in the array. Commented Apr 13, 2015 at 20:02
  • 2
    I'd like to see the reasoning for returning xor. Also, checker really doesn't have to be int-sized. char or bool are quite enough. Commented Apr 13, 2015 at 20:03

5 Answers 5

1

You can do this in O(n). Iterate through the array and compute the sum of all numbers. Now, sum of natural numbers from 1 to N, can be expressed as Nx(N+1)/2. In your case N=99. Subtract the sum of the array from Nx(N+1)/2, where N=99. That is the missing number. The empty slot can be detected during the iteration in which the sum is computed.

Im not entirely sure if this is a faster method or not, but an option for you to try.

Edit: This would work if you didnt have repeated values.

Sign up to request clarification or add additional context in comments.

4 Comments

Very clever, but it breaks when there are repeated numbers, as the OP said must be accommodated.
Ah, yes, i reread it and realized he said repeated numbers. Very interesting problem. Thank you for the clarification.
@EugeneSh. I don't think there's any ambiguity about repeated numbers. What's wanted is the value from the range 0-99 that appears zero times in the input array. Repeated numbers don't confuse the specification (unlike the possibility that there is more than one value in the range that appears zero times).
@JohnBollinger Yeah, I've reread the statement and removed the comment. But it should state that any number but one is present in the array.
1

Given that you may have duplicates, I don't see any way around keeping some kind of record of which values have been seen that you can read out on a per-value basis. Approaches based on one variety or another of one-way hashing function (two of which were proposed and then retracted) cannot do the job on their own. It may be, however, that you can save the effort of scanning your record of seen values after populating it by combining it with a hash-like function. For example,

int find_missing_number(int array[], int n)
{
    char checker[100] = {0};
    int xor = XOR_OF_1_TO_100;
    int i;

    for(i = 0; i < n; i++) {
        xor ^= (checker[array[i]] ? 0 : array[i] + 1);
        checker[array[i]] = 1;
    }

    return xor - 1;
}

This is admittedly pretty similar to your version, but I'm more confident that it will work, and I think that it probably runs slightly faster.

Note that I do not declare any variables register -- the compiler is much better than I am at choosing which variables should live in registers and which not, and it is not obligated to take my advice on the matter in any case.

Also, the elements of the checker array have type char, allowing four times as many to reside in a CPU cache line at once than if they were type int (assuming 1-byte chars and 4-byte ints).

Note too that I avoid counting distinct values or otherwise branching inside the loop (the ternary expression can be implemented without a branch). Avoiding the counting and conditional statement will speed the case where indeed one value is missing, and might or might not in practice slow down the case where none is missing. The gain may arise from more than just having less code -- sometimes such simplifications can allow the compiler to generate more efficient instruction sequences, too.

Of course, the whole thing is junk (and the problem is not well specified) if more then one value may be missing.

2 Comments

You're right with use "char" instead "int", is lower memory consumtion and you're also right with the register variables, the compiler it's in charge of that task. Regard to the suggestion of removing "counter" variable I have to test it becouse "n" it's really big, and it's not suitable to run always from 1..n. but you're right again with "Avoiding the counting and conditional statement".
In the worst case the counter variable does not help at all. Whether it helps in the average case depends on a lot of factors, but large among those is the value of n, the distribution of values in the input array, and the probability distribution of the missing value. Note, too, that if there is any possibility that no value is unrepresented in the input array after all, then you need to scan the whole array in all other cases to verify that there is, in fact, an unrepresented value.
1

The original program is already very fast. However, given the problem statement, it can be made faster by changing this:

        if(++counter == 100)
            return -1;

to this:

        if(++counter == 99)
            return xor;

This change causes the program to immediately return the answer when 99 distinct elements have been found. So if the array were very large, only a small portion of the array would be processed and the rest of the array can be ignored. This depends on the problem statement which states that it is known that one element is missing.

Comments

1

Javia1492 had the main idea in his answer: summing the numbers.

The checker array prevents you to sum up twice the same number.

The counter variable counts the numbers of different numbers to end the search before testing the whole n numbers in array once you already have 99 different values.

It also works with the xor operator as you did but I don't like it a lot because that 1^2^3^..^99 == 0 is some sort of particular case you can't generalize with 1^2^..^N.

int find_missing_number(int array[], int n)
{
    char checker[100];
    register int sum = 0, counter = 0, i, temp;

    memset(checker, 0, sizeof(checker));
    for (i = 0; i < n; i++)
    {
        if (checker[temp = array[i]] == 0)
        {
            checker[temp] = 1;
            sum += temp;
            if (++counter == 99)
               break;
        }
    }
    return 4950 - sum; // (100*99)/2 - sum
}

1 Comment

Before "Ask this Question", I searched for similar questions and I find this solution you propose, which seems to be very interesting but I also find the xor solution. I have implemented and tested both and the xor solution works bit faster, due to the simplicity of xor calculations in comparison with sums and subtraction.
1

As there are repeated elements, which I missed to see before writing my previous solution, I believe complexity of your solution is best.

I would suggest, instead of xor'ing the elements, you could just read the checker array to find missing element.

int find_missing_number(int array[], int n)    
{
    int checker[100] = {0};

    for(int i = 0; i < n; i++)
        checker[array[i]]++;

    for(int i=0; i <100; i++)
        if(0 == checker[i])
            return i;
}

This will reduce the operations which depends on n, making the code faster for bigger n.

3 Comments

No, the array size is not guaranteed to be 99. Per the OP's comments, there may be repeated values. This is extremely clever for the case where there are no repeats, but it breaks if more than one value appears an even number of times (including zero times).
Thanks, I missed it in first look.
This looks OK to me, except it should return i.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.