1

I'm working on a program that takes in a bunch (y) of integers and then needs to return the x highest integers in order. This code needs to be as fast as possible, but at the moment I dont think I have the best algorithm.

My approach/algorithm so far is to create a sorted list of integers (high to low) that have already been input and then handle each item as it comes in. For the first x items, I maintain a sorted array of integers, and when each new item comes in, I figure out where it should be placed using a binary sort. (Im also considering just taking in the first x items and then quick sorting them, but I dont know if this is faster) After the first x items have been sorted I then consider the rest of the items by first seeing if they qualify to enter the already sorted list of highest integers (by seeing if the new integer is greater than the integer at the end of the list) and if it does, add it to the sorted list via a binary search and remove the integer at the end of the list.

I was wondering if anyone had any advice as to how I can make this faster, or perhaps an entire new approach that is faster than this. Thanks.

6
  • You only have to keep x + 1 slots for the x highest integers and do binary insertion in that list. The time complexity will be y log(x) which checks out: 1 highest number means y complexity while y - 1 highest numbers means "alsmost" sorted input in which case you have y log(y) :) Commented Jan 22, 2013 at 0:53
  • @andr inserting into the top k is very expensive, though. It becomes prohibitive even for small k like 100 or 1000. Commented Jan 22, 2013 at 0:54
  • Why very? - rather normal for binary insertion, e.g. 10 for x = 1000 (log(1000)=10). I'm assuming that x may be of any value (for example x = sqrt(y)). Or do I miss something here? Commented Jan 22, 2013 at 1:03
  • Do you need the highest x values in order, or just the highest x values? (Yes, it does make a difference.) Commented Jan 22, 2013 at 1:16
  • who is he???? [What is the fastest way to find the k largest elements in an array in order (i.e. starting from the largest element to the kth largest element)?][1] [1]: stackoverflow.com/questions/14450024/… Commented Jan 22, 2013 at 1:29

4 Answers 4

2

This is a partial sort:

The fastest implementation is Quicksort where you only recurse on ranges containing the bottom/top k elements.

In C++ you can just use std::partial_sort

Sign up to request clarification or add additional context in comments.

2 Comments

+1 for nice source of information and providing a library function
Based on the clarification, I think you need std::partial_sort instead of std::nth_element
0

If you use a heap-ordered tree data structure to store the integers, inserting a new integer takes no more than lg N comparisons and removing the maximum takes no more than 2 lg N comparisions. Thus, to insert y items would require no more than y lg N comparisons and to remove the top x items would require no more than 2x lg N comparisons. The Wikipedia entry has references to a range of implementations.

1 Comment

i am good feeling if you add me in FB with searching amin khoramian k.
0

This is called a top-N sort. Here is a very simple and efficient scheme. No fancy data structures needed.

  1. Keep a list of the highest x elements (it starts out empty)
  2. Split your input into chunks of x * 10 items
  3. For each chunk, add the remembered list of the x highest items so far to it and sort it (e.g. quick sort)
  4. Keep the x highest items. They form the new remembered list
  5. goto 3 until all chunks processed
  6. The remembered list is now your final result

This is O(N) in the number of items and only requires a normal quick sort as a primitive.

9 Comments

Isn't that O(N) only if x is fixed?
Wow, thanks for the response. Is this the fastest method? (There seems to be quite a few different opinions) Also, in response to andr, y and x are set as constants before any of the integers are received
I'm just saying that the complexity of this method is not O(N). It's O(N logx). Given expected time for quicksort k log(k) check the math: O((N / 10x) * (10x log(10x))) = O(N logx).
Andr, do you think the method you proposed is faster? If they're both on N log x which is more advisable?
They're equal to me, but I would simply code mine faster ;) See rici's comment to your question. If you're into C++ you really might consider emilk's answer since the library function might be already optimized and have better expected time.
|
0

You don't seem to need the top N items in sorted order. Because of this, you can solve this in linear time.

Find the Nth largest array element using linear-time selection. Return it and all array elements larger than it.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.