2

In a scientific application I'm writing (vb.net) I use a huge collection of object stored in a tree structure. Every object exposes a LastAccessed property (datetime) that stores the last time the node was accessed by the algorithm.

I need to find a fast way to find the N least accessed objects in the structure.

I'm using an algorithm like this (pseudo-code)

dim Olist as new list (of myobject)
....
array.sort(Olist,address of compareByDataReverse)
for index=0 to N-1
   dosomething(Olist.item(index))
end

I'm sure there is a better way to do that because the list is normally huge ( consisting of more than 10^6 objects).

Thanks
Pierluigi

3 Answers 3

2

You mention that you use a tree structure yet you show us a list. If you use a list you can indeed sort it (O(l log(l)) and then take the last n elements in O(1). Or just iterate it and get your last n elements in (O(l)).

You however work with a tree view. You didn't mention how your tree is implemented. You can create a binary search tree with as key your LastAccessed date. Searching for your first N elements can then also be done very fast.

Alternatively you could keep an index. Just create a tree view with as key the LastAccessed date. Inserting or removing an item in your original list should also update your index. You can then quickly retrieve a link to your items based on their date. At the cost of slower inserts/removals.

Sign up to request clarification or add additional context in comments.

1 Comment

you are righ :) what i wrote is a bit confusing becouse I'm not that goog in writing in english! the list a mentioned is just a list of reference to the leaves of the tree.
1

in best case your code will perform with O(nlogn). You should begin with dim Olist as new list (of myobject) then take a LinkedList and go over your collection and add values to your list keeping it sorted. This will be O(n*m)

8 Comments

That's called insertion sort and it's O(n^2) - which is worse than O(nlogn) - so it's likely to be slower on datasets that large.
please read carefully. There are two values n - size of ENTIRE collection and m - size of list he want to have. and m is much smaller then n. then reread my answer and your comment.
Andrey has explained better than me the situation N, size of the collection is HUGE while M, size of the elements is a short list...in my application I might expect N=10^4m
Why not just use a sorted list/array and keep it sorted. Inserts will then take O(log(n)). Retrieval of the first n O(1). It all depends on the number of inserts compared to the number of selects. If there are a lot more selects, keep it sorted.
@pierusch - Are you sure you mean 10^4m?
|
1

If you use Selection sort then after N iterations first N elements will be in sorted order.

NOTE:But I am not sure if this solves your problem completely.

4 Comments

not completely...I mean i'm sure the array.sort(...) function is really well implemented...I just think there is a better way to find the N least accessed elements inside the list WITHOUT sorting the entire list of objects... anyway thanks for your reply :)
:) so you can stop Selection Sort after N steps!
+1 avoids sorting all the data (but may be slower if the number of items to find is very large).
I don't know why but what I think is this should be fine as long as N^2 < (TotalNumberOfItems)/2. What do you say?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.