Simultaneous Sorting/Removing data from a List

Question

Currently I have a list of integers. This list contains index values that point to "active" objects in another, much larger list. If the smaller list of "active" values becomes too large, it triggers a loop that iterates through the small list and removes values that have become inactive. Currently, it removes them by simply ignoring the inactive values and adding them to a second list (and when the second list gets full again the same process is repeated, placing them back into the first list and so on).

After this trigger occurs, the list is then sorted using a Quicksort implementation. This is all fine and dandy.

-------Question---------

However, I see a potential gain of speed. I am imagining combining the removal of inactive values while the sorting is taking place. Unfortunately, I cannot find a way to implement quicksort in this way. Simply because the quicksort works with pivots, which means if values are removed from the list, the pivot will eventually try to access a slot in the list that does not exist, etc etc.. (unless I'm just thinking about it wrong).

So, any ideas on how to combine the two operations? I can't seem to find any sorting algorithms as fast as quicksort that could handle this, or perhaps I'm just not seeing how to implement it into a quicksort... any hints are appreciated!

Code for better understanding of whats currently going on: (Current Conditions: values can range from 0 to 2 million, no 2 values are the same, and in general they are mostly sorted, since they are sorted every so often)

if (deactive > 50000)//if the number of inactive coordinates is greater than 50k
{
    for (int i = 0; i < activeCoords1.Count; i++)
    {
         if (largeArray[activeCoords[i]].active == true)//if coordinate is active, readd to list
         {
             activeCoords2.Add(activeCoords1[i]);
         }
    }
    //clears the old list for future use
    activeCoords1.Clear();
    deactive = 0;
    //sorts the new list
    Quicksort(activeCoords2, 0, activeCoords2.Count() - 1);
 }    

    static void Quicksort(List<int> elements, int left, int right)
    {
        int i = left, j = right;
        int pivot = elements[(left + right) / 2];

        while (i <= j)
        {
            // p < pivot
            while (elements[i].CompareTo(pivot) < 0)
            {
                i++;
            }

            while (elements[j].CompareTo(pivot) > 0)
            {
                j--;
            }

            if (i <= j)
            {
                // Swap
                int tmp = elements[i];
                elements[i] = elements[j];
                elements[j] = tmp;

                i++;
                j--;
            }
        }

        // Recursive calls
        if (left < j)
        {
            Quicksort(elements, elements, left, j);
        }

        if (i < right)
        {
            Quicksort(elements, elements, i, right);
        }
    }

if the list is mostly sorted it is not a good idea to use quick sort, because on sorted lists quick sort become O(n^2). — mehdi.loa
– mehdi.loa, Commented May 4, 2013 at 7:00
Because the number range is between 0 and 2000000 you can use counting sort, and it seems that there is no need too merge the operations. — mehdi.loa
– mehdi.loa, Commented May 4, 2013 at 7:11
Unfortunately, I believe counting sort would undo certain connections, since values seem to be regenerated instead of moved. I need to keep each value reference intact, as the objects also need to keep their pointers pointing towards the correct value... its a bit obscure, but it needs to stay that way. :/ — Colton
– Colton, Commented May 4, 2013 at 7:46
There's something I'm missing - if the list from which you are removing items (activeCoords1) is already sorted, the list to which you are copying items (activeCoords2) will also be sorted because you are adding items to it in the same order that they were in activeCoords. What am I missing? — Matthew Watson
– Matthew Watson, Commented May 4, 2013 at 8:13
Also: (1) Your Quicksort sample won't compile (what is oldList?) and (2) it's using a List<int> so the elements to compare are ints, but you have the huge overhead of calling .CompareTo() to compare them - why not just do a direct comparison? Oh and (3) if you really, really need to make it as fast as possible, you'll have to use plain arrays rather than List<T>... — Matthew Watson
– Matthew Watson, Commented May 4, 2013 at 8:16

Slugart · Accepted Answer · 2013-05-04 07:26:08Z

1

It sounds like you might benefit from using a red-black tree (or another balanced binary tree), your search, insert and delete time will be O(log n). The tree will always be sorted so there will be no one off big hits incurred to re-sort.

What is your split in terms of types of access (search, insert, delete) and what are your constraints for reach?

answered May 4, 2013 at 7:26

Slugart

4,69027 silver badges33 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

Colton Over a year ago

Currently, with just a list, there is no need for search, I only add values to the end, and as for delete, with the two list setup I have, there is no need for delete. Instead it just re-adds values to the opposite list that are still active... I'll have to pick up on this later unfortunately. Sorry about that, but I will get back to you as soon as I have a chance to read about your tree!

Slugart Over a year ago

I was referring to the needs of the consumer of your list not of the list itself.

Colton Over a year ago

Yup, that is what I'm explaining for the most part. The "consumer" is just a for loop that moves from 0 to max, grabbing values at the index. So there is no need for searching, etc. Is that what you mean?

Slugart Over a year ago

Ok, so you only do traversal of the list and the list must always be sorted when you traverse it.

Colton Over a year ago

Yes, and it is mostly sorted simply to keep memory calls closer together. So when it traverses through the list, the next value found should be the nearest in the array. (So I only need to sort every few seconds, but when I do, the extra speed would be a big help.)

|

Community · Accepted Answer · 2017-05-23 10:32:08Z

1

I would use a List<T> or a SortedDictionary<TKey, TValue> as your data structure.

As your reason for sorting ("micro optimization based on feelings") is not a good one, I would refrain from it. A good reason would be "it has a measurable impact on performance".

In that case (or of you just want to do it), I recommend a SortedDictionary. All the sorting stuff is already done for you, no reason to reinvent the wheel.

There is no need to juggle with two Lists if one appropriate data structure suffices. A red-black-tree seems appropriate and is apparently used in the SortedDictionary according to this

edited May 23, 2017 at 10:32

CommunityBot

11 silver badge

answered May 4, 2013 at 16:31

DasKrümelmonster

6,1081 gold badge27 silver badges52 bronze badges

3 Comments

Colton Over a year ago

I think you meant SortList<T>? Nevertheless, I am juggling two lists because otherwise, I would have no fast removal of data. Instead of removing one piece of data, waiting for the list to "move everything down", and repeating, I can simply take what I need, and in essence the "move everything down" process only needs to be completed once.

Colton Over a year ago

Also, jumping around up to 2 million memory spaces will always be faster if they are done in order. On previous tests I've saved more than half the time, so it is most likely worth it, and having them in order also provides a better level of manipulation when the time comes for it (say, if updating from lowest to highest becomes an issue).

DasKrümelmonster Over a year ago

Okay, in that case go with the sorted data structure. SortedList and SortedDictionary are similar in purpose but not in performance. With the former you get faster manipulations (add/remove in O(log n) ) and the latter uses less memory but has O(n) manipulations. (geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/…)

Collectives™ on Stack Overflow

Simultaneous Sorting/Removing data from a List

2 Answers 2

7 Comments

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

7 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related