You are using a heuristic to try to accelerate your sort. A heuristic is like a guess. It isn't guaranteed to be right - but if the heuristic is a good one can accelerate an algorithm in the general case.
Heuristics generally won't improve worst case running time of an algorithm. That is - it is possible for some inputs for the heuristic to be wrong.
I can see the intuitive appeal of what you are doing - you are "searching" closer to where you think your target might be.
But there are two problems with what you are doing:
- Moving the "split" in a binary search closer to the target does not speed up the search. In a binary search you split the search space in half each time. When you move the split point closer to the target, you have not found the target, and it is as likely as no that you target is in the larger of the two unequal spaces.
For example suppose you have the follow array. y is your target, x is all the other values:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxYxx
In a binary search you would split the space in half and then half again in the first two decisions:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxYxx
^ ^
After two decisions your 32 value array is down to a search space of 8 values. But suppose with your heuristic, that after the second choice you put the split after the y?
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxYxx
^ ^
After your second decision you have only reduced the search space a little bit. By adding this heuristic you have reduced the worst case running time to N - because it is possible to construct inputs that will fool your heuristic into making the worst guess every time.
- The other problem is that heuristic methods to accelerate searches only help when you know something about what you are searching. Take dictionary searching. You know that z is at the end of alphabet. So when you get a word that starts with z, you know fairly well where in the dictionary the z words are. You don't have to start in the middle of the dictionary.
This is because you know something about the distribution of the words in a dictionary. But if someone made no guarantees about the words in a list - then you can't guarantee that dictionary search is faster - you might for example receive a list of all z words.
In your case your heuristic is not particularly good. You're guessing where the next split is based on the distance between the current split and the previous value. The only time that would be a good guess is if the elements in the list were evenly spaced. If they are unevenly spaced ( almost always ) then some guesses will always overshoot the split and other undershoot.
In any sorted array of unevenly spaced numbers there will necessarily be intervals that are more tightly spaced than average, and intervals more sparse than average. Your heuristic guesses at the average sparseness of the numbers at the current split to the end of the array. There is no relationship between those two things.
Update:
Your best case time: O(1) - e.g. you guess the index right off.
Worst case: O(N) - e.g. every choice is worst possible.
You added that your array is nearly evenly spaced and very large. My guess as to what in practice would be fastest: look up the first number and last number in the array, and the length of the array. Make an educated guess at the offset of your target:
offset = floor((( target - first ) / ( last - first )) * length );
Chose a reasonable search space around the target:
window_start = floor( offset * ( 1 - alpha ));
window_end = floor( offset * ( 1 + alpha ));
Do a binary search on the sub-array defined by this window.
What you set alpha to will depend on how regular you think your array is. E.g. you can set to to 0.05 to search a window which is roughly 10% of the total search space around your estimated target.
If you can make some guarantees about evenness of the input you might be able to tune alpha optimally.