Efficient algorithm for finding the same number [duplicate]

Question

There are 1002 numbers in an array and two numbers are the same. How would you find the same number in this array efficiently or is there an efficient algorithm?

Here is my algorithm:

for i in range(0, 1002):
    for j in range(i+1, 1002):
        if(a[i]==a[j]):
           return a[i]

What is range of values of numbers? use hashing! It will work in O(n) time! — Nullpointer
– Nullpointer, Commented Jun 15, 2014 at 18:42
Still you will have some range! what is that range? What is your requirement a memory efficient algorithm? or time efficient algorithm? — Nullpointer
– Nullpointer, Commented Jun 15, 2014 at 18:48
Without "efficiency" defined, it's an ambiguous question with no real answer. If I asked this question as an interviewer, I'd be fishing for a candidate to ask what I meant before I cared about an answer. — Preston Guillot
– Preston Guillot, Commented Jun 15, 2014 at 18:57

Nullpointer · Accepted Answer · 2014-06-15 18:57:36Z

2

This should work!

#include<stdio.h>
#define RANGE 1000000001
int main()
{
  int arr[1002];//your all numbers;
  short int hash[RANGE];//Your range of numbers 
  long long int i;
  for(i = 0; i < RANGE; i++)
    hash[i] = 0;
  for(i = 0; i < 1002; i++)
    {
      if(hash[arr[i]] != 0)
    {
      printf("Duplicate number is:%d\n",arr[i]);
      break;
    }
      else
    hash[arr[i]]++;
    }
  return 0;
}

answered Jun 15, 2014 at 18:57

Nullpointer

1,0867 silver badges22 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Niklas B. Over a year ago

I wouldn't call the identity function a hash function, so the name is a bit misleading :) And you should really use RANGE = 1003

n. m. could be an AI Over a year ago

Note that this solution requires at least 1000000001 operations for zeroing out hash, which is about 1000 times more than the original 1000*10001, and cannot cope with numbers greater than 1000000001.

Nullpointer Over a year ago

@NiklasB. If I use RANGE = 1003 then I will have to limit range of my values to 1003 but OP says no such value range is defined this is why I have used RANGE = 1000000001. And yes hash function is a bit misleading! :p

Niklas B. Over a year ago

Well in that case you must use a hash table ;)

Aleksei Shestakov · Accepted Answer · 2014-06-15 18:53:29Z

1

I think the most efficient solution is to use hash set:

from sets import Set
s=Set()
for x in [1,2,3,4,5,2,3,1]:
  if x in s:
    print x
    break
  s.add(x)

answered Jun 15, 2014 at 18:53

Aleksei Shestakov

2,5482 gold badges15 silver badges14 bronze badges

Comments

Rliger · Accepted Answer · 2014-06-15 18:46:54Z

0

If your values are numbers, you can use radix sort to fill up a buffer and check for an element that appeared twice.

answered Jun 15, 2014 at 18:46

Rliger

794 bronze badges

Comments

Christophe · Accepted Answer · 2014-06-15 19:47:17Z

Your algortihm isn't bad at all ! In the worst case you loop n*(n-1)/2, meaning a complexity of O(n²).

The most favourable condition would be a sorted array. THen you could just loop through it comparing each element with its predecessor. The worst is n-1 comparisons, otherwhise said a complexity of O(n).

However, I assume that the array is not sorted. Sorting it would imply the cost of the sort. Quiksort algorithm, which is pretty good here, has a worstcase of O(n²). So sorting+traversing would have a cost comparable to your algorithm.

Using a hash... well, it's optimal if memory is not a problem (see exellent solution from @Nullpointer. The algorithm cost is the simple traversal, which is O(n).

However in real life, you risk to have memory constraints, meaning shorter hash table and a hash function with risks of colisions (for example modulo size of table). For this reason you'll need to store for each hash value, the list of matching values. In such a situation, the worstcase is when all numbers have the same hash H. In this case, you would calculate each hash (simple O(n) traversal), but when inserting the hash, you'd need to loop through the colision list. A quick calculation shows that again you'd have n*(n-1)/2 comparison, and again a compelxity O(n²), the same as your original proposal.

Collectives™ on Stack Overflow

Efficient algorithm for finding the same number [duplicate]

4 Answers 4

4 Comments

Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

4 Comments

Comments

Comments

Comments

Linked

Related