
To search a very large array, I was thinking of an algorithm with complexity less than log n: not of order less than log n, but absolutely fewer steps than log n. So instead of going to the middle, I move one step forward, estimate how much further I have to move assuming the numbers are evenly distributed, and move to that position. If that is the solution, I stop; otherwise I re-estimate the step and repeat until the solution is found. Here's a working Java example:

    public class Search {
        public static void main(String[] args) {
            int[] a = {12, 15, 16, 17, 19, 20, 26, 27};
            int required = 27;  // target value; if it is absent, the loop never terminates
            int pointer = 0;    // current probe index
            int n = 1;          // step size, re-estimated on every iteration
            int count = 0;      // number of probes made
            int length = a.length;
            while (a[pointer] != required) {
                count++;
                if (pointer + n > length - 1)  // clamp the step at the last index
                    n = length - 1 - pointer;
                if (n == 0)                    // already at the end: step backwards instead
                    n = -1;
                int diff = a[pointer + n] - a[pointer];
                pointer = pointer + n;
                // Assuming even spacing, convert the remaining value gap into an index step.
                n = (required - a[pointer]) * n / diff;
            }
            System.out.println(pointer);  // index where the target was found
            System.out.println(count);    // probes used
        }
    }

P.S. I have an array that is nearly evenly distributed.

I want to ask: is it really better than binary search? In which cases will it fail? And what are its best, average, and worst case complexities?

  • What you're doing is a bad idea that will in almost every case slow your search down. Commented Oct 27, 2014 at 11:04
  • @Rafael can you please explain how and why? Commented Oct 27, 2014 at 11:06
  • The only search quicker than binary search is hashing. O(1) complexity. Other than that, binary search is pretty much the best you could hope for in terms of complexity. Commented Oct 27, 2014 at 11:41
  • 1. You can estimate the first few MSBs of the binary search by linear interpolation if your array has a near-linear distribution; that can improve runtime if N is big enough, but not the complexity. 2. What you are trying to achieve is almost the same as O(log(N)), but with much more overhead and a log base other than 2, so in the end your runtime could be worse than raw binary search. Commented Oct 27, 2014 at 12:41
  • You're reinventing Interpolation Search. It might be faster because it can potentially do fewer probes but it may also be slower in practice because the code is more complicated. Commented Oct 27, 2014 at 17:10
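
For reference, the textbook interpolation search mentioned in the last comment probes at a position estimated from the value distribution instead of the midpoint. A minimal sketch for comparison (my own illustration, not the asker's code):

    // Classic interpolation search over a sorted int array.
    // Assumes increasing values; returns -1 when the target is absent.
    static int interpolationSearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi && target >= a[lo] && target <= a[hi]) {
            if (a[hi] == a[lo])  // flat range: avoid division by zero
                return a[lo] == target ? lo : -1;
            // Probe where the target "should" be if values were evenly spaced.
            int pos = lo + (int) ((long) (target - a[lo]) * (hi - lo) / (a[hi] - a[lo]));
            if (a[pos] == target) return pos;
            if (a[pos] < target)  lo = pos + 1;
            else                  hi = pos - 1;
        }
        return -1;
    }

On uniformly distributed data, interpolation search is known to make O(log log N) probes on average, which is the effect the question is reaching for.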

1 Answer


You are using a heuristic to try to accelerate your search. A heuristic is like a guess: it isn't guaranteed to be right, but if it is a good one it can accelerate an algorithm in the typical case.

Heuristics generally won't improve the worst-case running time of an algorithm, because for some inputs the heuristic's guess will simply be wrong.

I can see the intuitive appeal of what you are doing - you are "searching" closer to where you think your target might be.

But there are two problems with what you are doing:

  1. Moving the "split" in a binary search closer to the target does not speed up the search. In a binary search you split the search space in half each time. When you move the split point closer to the target, you have not yet found the target, and it is as likely as not that your target is in the larger of the two unequal spaces.

For example, suppose you have the following array, where Y is your target and x is every other value:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxYxx

In a binary search you would split the space in half and then half again in the first two decisions:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxYxx
                ^        ^

After two decisions your 32-value array is down to a search space of 8 values. But suppose that with your heuristic, the second choice puts the split just after the Y:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxYxx
                ^             ^

After your second decision you have only reduced the search space a little. By adding this heuristic you have worsened the worst-case running time to O(N), because it is possible to construct inputs that fool your heuristic into making the worst possible guess every time.

  2. The other problem is that heuristic methods for accelerating searches only help when you know something about what you are searching. Take dictionary search: you know that z is at the end of the alphabet, so when you get a word that starts with z, you know fairly well where in the dictionary the z words are, and you don't have to start in the middle.

That works because you know something about the distribution of words in a dictionary. But if someone makes no guarantees about the words in a list, then you can't guarantee that dictionary-style search is faster; you might, for example, receive a list consisting entirely of z words.

In your case the heuristic is not particularly good. You are guessing where the next split is based on the distance between the current value and the previous one. That is only a good guess if the elements in the list are evenly spaced. If they are unevenly spaced (which is almost always), some guesses will overshoot the split and others will undershoot it.

In any sorted array of unevenly spaced numbers there will necessarily be intervals that are more tightly packed than average and intervals that are sparser than average. Your heuristic uses the spacing around the current split to estimate the average spacing from there to the end of the array, and there is no necessary relationship between those two things.

Update:

Your best case time: O(1), e.g. you guess the index correctly on the first probe.

Worst case: O(N), e.g. every guess is the worst one possible.
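
To make the O(N) bound concrete, one classic adversarial input for interpolation-style probing packs the values at one end and puts a single outlier at the other. A sketch of such an input, reusing the interpolationSearch sketch shown under the comments above (my construction, not part of the original answer):

    // Values 1..n-1 packed at the front and one huge outlier at the end.
    int n = 1000;
    int[] a = new int[n];
    for (int i = 0; i < n - 1; i++) a[i] = i + 1;
    a[n - 1] = Integer.MAX_VALUE;
    // Searching with interpolationSearch(a, n - 1) probes near index 0 on
    // every step, because the outlier flattens the estimated slope, so lo
    // advances about one index per iteration: roughly n probes in total.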

You added that your array is nearly evenly spaced and very large. My guess as to what would be fastest in practice: look up the first number and last number in the array and the length of the array, then make an educated guess at the offset of your target:

offset = floor((( target - first ) / ( last - first )) * ( length - 1 ));

Choose a reasonable search window around that estimate:

window_start = floor( offset * ( 1 - alpha ));
window_end   = floor( offset * ( 1 + alpha ));

Do a binary search on the sub-array defined by this window.

What you set alpha to will depend on how regular you think your array is. E.g. setting it to 0.05 searches a window roughly 10% as wide as the estimated offset, centred on your estimated target.

If you can make some guarantees about the evenness of the input, you might be able to tune alpha optimally.
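
Putting those pieces together, here is one possible Java sketch of the windowed search described above (the helper name windowedSearch, the index clamping, and the full-array fallback are my additions, not part of the original answer):

    import java.util.Arrays;

    public class WindowedSearch {
        // Estimate the target's index assuming roughly even spacing, then
        // binary-search a window around that estimate; fall back to the
        // whole array if the window misses. Assumes a sorted array.
        static int windowedSearch(int[] a, int target, double alpha) {
            int n = a.length;
            if (n == 0 || target < a[0] || target > a[n - 1]) return -1;
            if (a[n - 1] == a[0]) return a[0] == target ? 0 : -1;

            long offset = (long) ((double) (target - a[0]) / (a[n - 1] - a[0]) * (n - 1));

            // Clamp the window to valid indices (toIndex is exclusive).
            int start = (int) Math.max(0, (long) Math.floor(offset * (1 - alpha)));
            int end   = (int) Math.min(n, (long) Math.ceil(offset * (1 + alpha)) + 1);

            int idx = Arrays.binarySearch(a, start, end, target);
            if (idx >= 0) return idx;
            idx = Arrays.binarySearch(a, target);  // window missed: search everything
            return idx >= 0 ? idx : -1;
        }

        public static void main(String[] args) {
            int[] a = {12, 15, 16, 17, 19, 20, 26, 27};
            System.out.println(windowedSearch(a, 27, 0.05)); // prints 7
        }
    }

If the data really is close to evenly spaced, the window almost always contains the target and the binary search runs over a much smaller range; the fallback keeps the search correct when it doesn't.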


1 Comment

Thanks for such a good explanation. Your guessing algorithm is perfect, but there is one issue: no doubt I have a very large array and it is nearly evenly distributed, but that holds in most cases, not always, so it is not possible to choose alpha. Hence I think I have to implement my own solution after all.
