Removing duplicates from array using single loop

Question

I want to remove duplicates from an array using a single loop, but it's not working. This is what I've done so far.

Please note that I already know it works on a sorted array (I use a single-loop bubble sort for that), but I want it to work without sorting.

code.c

#include <stdio.h>

#define size 20
#define true 1
#define false 0

int main() {
    
    int input[size] = {1, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 6, 7, 8, 1, 1, 1, 1, 2, 2};

    int current = input[0], flag = false, index = 0;
    
    for (int x = 0; x < size; x++) {
        if (current == input[x] && (flag == false)) {
            flag = true;
        } else if (current != input[x]) {
            input[index++] = current;
            current = input[x];
            flag = false;
        }
    }
    
    for (int foo = 0; foo < index; foo++) {
        printf("%d", input[foo]);
        printf((foo != index - 1) ? ", " : "");
    }
    
    return 0;
}

input

1, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 6, 7, 8, 1, 1, 1, 1, 2, 2

output

1, 3, 4, 3, 5, 6, 7, 8, 1

The only way to remove duplicates with a single loop is with a hashset or equivalent. Sorting the list first also works, but technically sorting involves many loops.
– RBarryYoung, Commented Aug 24, 2021 at 12:37
calloc a status bit array, check whether each value was previously found, and mark it off. If the next value exceeds the range, realloc and clear the new elements.
– Weather Vane, Commented Aug 24, 2021 at 12:39
Also, sorting doesn't keep the original order of element appearances (in case that is a requirement).
– NiVeR, Commented Aug 24, 2021 at 12:45
There are ways to hack around and have just one loop (i.e. one for keyword), or even no loops (recursion). It's much more important to know which time complexity you're aiming for: O(1), O(log n), O(n), O(n²), etc. You may use recursion, one loop, gotos and what-not, but in the end you will have a time bound, which is what matters for sorting, removing duplicates, searching, and so on.
– Enzo Ferber, Commented Aug 24, 2021 at 12:52
If your input domain is tractable, you could use a bitfield to indicate the previously encountered elements. A hashmap would be way overblown if you only need to track 0-9: you only need a short for that, and you get 6 extra bits to play with.
– Grady Player, Commented Aug 24, 2021 at 14:47
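
The hashset idea from the first comment could look roughly like the sketch below. It is only an illustration, not code from the thread: the names `set_insert` and `dedupe_hashed` are hypothetical, the table uses plain linear probing, and the hash constant is an arbitrary choice. Note that the probing is itself a small loop hidden inside the helper, which is what a library hash set would do internally as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: inserts `value` into an open-addressing table.
 * Returns true if the value was newly inserted, false if already present.
 * `capacity` must be a power of two and the table must never be full. */
static bool set_insert(int *slots, bool *used, size_t capacity, int value)
{
    /* Multiplicative hash; the constant is an arbitrary odd number. */
    size_t pos = ((size_t)(unsigned)value * 2654435761u) & (capacity - 1);
    while (used[pos]) {
        if (slots[pos] == value)
            return false;
        pos = (pos + 1) & (capacity - 1); /* linear probing */
    }
    used[pos] = true;
    slots[pos] = value;
    return true;
}

/* Removes duplicates in place, keeping first occurrences in original order.
 * Returns the new length. If allocation fails, the array is left untouched
 * and n is returned. */
size_t dedupe_hashed(int *a, size_t n)
{
    size_t capacity = 2;
    while (capacity < 2 * n)
        capacity *= 2; /* keeps the table at most half full */

    int *slots = malloc(capacity * sizeof *slots);
    bool *used = calloc(capacity, sizeof *used);
    if (slots == NULL || used == NULL) {
        free(slots);
        free(used);
        return n;
    }

    size_t out = 0;
    for (size_t i = 0; i < n; ++i) /* the single pass over the input */
        if (set_insert(slots, used, capacity, a[i]))
            a[out++] = a[i];

    free(slots);
    free(used);
    return out;
}
```

Calling `dedupe_hashed(input, size)` on the array from the question would return the new length and leave the distinct values at the front of `input` in their original order.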

Yun · Accepted Answer · 2021-08-24 15:27:22Z

1

There are several general solutions to this problem:

Sort the array first and then run your algorithm. This raises the complexity of the program to O(n log(n)) for a general sorting algorithm, or O(n*w) at best for radix sort (where w depends on the size of the type and is a known constant in practice), and it does not preserve the original order. In other words, this solution requires multiple loops.
Use a map to detect which elements have already occurred. This is a significantly more complex solution, with an additional O(log n) cost per lookup if the map is tree-based.
If the range of possible elements is small, e.g. constrained to only the numbers 0 to 9, you could use a boolean array to keep track of which values have occurred. This is essentially a simple version of the "map solution", and it is the only option here that requires just a single loop. Code example:

#include <stdbool.h>
#include <stdio.h>

#define ARRAY_SIZE 20

int main()
{
    int input[ARRAY_SIZE] = {1, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 6, 7, 8, 1, 1, 1, 1, 2, 2};
    bool hasOccurred[10] = {0}; // The indices are used as keys

    size_t newSize = 0U;
    for (size_t arrayIdx = 0U; arrayIdx < ARRAY_SIZE; ++arrayIdx)
    {
        if (!hasOccurred[input[arrayIdx]])
        {
            hasOccurred[input[arrayIdx]] = true;
            input[newSize++] = input[arrayIdx];
        }
    }

    for (size_t idx = 0; idx < newSize; ++idx)
        printf("%d%s", input[idx], idx != newSize - 1U ? ", " : "\n");
}

Output:

1, 3, 4, 5, 6, 7, 8, 2

Use a combination of the previous algorithm and counting sort. First, initialize an int hasOccurred[10] array with -1 values. Then loop over the input array and, for each "new" element, store its input array index in the hasOccurred array. This array can be used as a sorted array (iterate over it, ignoring the -1 values), or it can be used to construct an output array in which the original order is preserved. Depending on the use case, this requires more than one loop.
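
A minimal sketch of this last variant, assuming every value lies in 0 to 9 as in the example above. The names `VALUE_RANGE`, `record_first_indices` and `distinct_sorted` are made up for illustration, and the array is called `firstIndex` here because it stores indices rather than flags.

```c
#include <stddef.h>

#define VALUE_RANGE 10 /* assumption: every input value lies in 0..9 */

/* For each value v in 0..VALUE_RANGE-1, stores the index of its first
 * occurrence in `input`, or -1 if v never occurs. */
void record_first_indices(const int *input, size_t n, int firstIndex[VALUE_RANGE])
{
    for (int v = 0; v < VALUE_RANGE; ++v)
        firstIndex[v] = -1;

    for (size_t i = 0; i < n; ++i)
        if (firstIndex[input[i]] == -1)
            firstIndex[input[i]] = (int)i;
}

/* Writes the distinct values in ascending order to `out` (which needs room
 * for VALUE_RANGE elements) and returns how many were written. */
size_t distinct_sorted(const int firstIndex[VALUE_RANGE], int *out)
{
    size_t count = 0;
    for (int v = 0; v < VALUE_RANGE; ++v)
        if (firstIndex[v] != -1)
            out[count++] = v;
    return count;
}
```

The second function is the extra loop mentioned above: it walks the value range, not the input, so its cost depends only on `VALUE_RANGE`.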

AKX adds that variations on a boolean array are possible, such as using the individual bits of an unsigned int to store the "has occurred" flags. This is a speed/memory tradeoff.
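
As a rough sketch of that bitmap variation: the function below assumes every value lies in 0 to 15, because an unsigned int is only guaranteed to have 16 bits. The name `dedupe_bitmask` is hypothetical.

```c
#include <stddef.h>

/* Removes duplicates in place, keeping first occurrences in original order.
 * Assumption: every value lies in 0..15, so one bit per value fits in an
 * unsigned int on any conforming implementation. Returns the new length. */
size_t dedupe_bitmask(int *a, size_t n)
{
    unsigned int seen = 0u; /* bit v is set once value v has occurred */
    size_t out = 0;

    for (size_t i = 0; i < n; ++i) {
        unsigned int bit = 1u << a[i];
        if ((seen & bit) == 0u) {
            seen |= bit;
            a[out++] = a[i];
        }
    }
    return out;
}
```

Compared with the boolean array, this needs only one word of extra memory, at the price of a shift and a mask per element and a much tighter limit on the value range.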

Credit to pmg for suggesting radix sort.

edited Aug 24, 2021 at 15:27

answered Aug 24, 2021 at 12:55


9 Comments

AKX Over a year ago

For very small ranges of possible elements, you can even use a single integer as a bitmap.

Yun Over a year ago

@AKX Nice addition, thank you. I'll add it to the answer.

user15242190 Over a year ago

I don't understand this version of C, and I'd also like to avoid a second array.

Yun Over a year ago

@Arghadip I'm sorry, I didn't mean to write unintelligible C. What is it exactly that is unclear?

pmg Over a year ago

You can Radix sort in O(n)... just saying
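
For reference, a radix-sort-based variant might look like the sketch below. It is an illustration only: it assumes all values are non-negative, processes one byte per pass (so the number of passes is the constant w from the answer), and uses the hypothetical names `radix_sort_nonneg` and `dedupe_sorted`. Like any sort-first approach, it uses several loops and does not preserve the original order.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sorts non-negative ints in ascending order with an LSD radix sort,
 * one byte per pass. Returns 0 on success, -1 if the scratch buffer
 * cannot be allocated. */
int radix_sort_nonneg(int *a, size_t n)
{
    if (n < 2)
        return 0;

    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (size_t shift = 0; shift < sizeof(int) * CHAR_BIT; shift += 8) {
        size_t count[257] = {0};

        /* Histogram of the current byte, offset by one slot. */
        for (size_t i = 0; i < n; ++i)
            ++count[(((unsigned)a[i] >> shift) & 0xFFu) + 1];

        /* Prefix sums: count[d] becomes the start of bucket d. */
        for (size_t b = 0; b < 256; ++b)
            count[b + 1] += count[b];

        /* Stable scatter into the scratch buffer, then copy back. */
        for (size_t i = 0; i < n; ++i)
            tmp[count[((unsigned)a[i] >> shift) & 0xFFu]++] = a[i];
        memcpy(a, tmp, n * sizeof *a);
    }

    free(tmp);
    return 0;
}

/* Removes adjacent duplicates from a sorted array; returns the new length. */
size_t dedupe_sorted(int *a, size_t n)
{
    if (n == 0)
        return 0;

    size_t out = 1;
    for (size_t i = 1; i < n; ++i)
        if (a[i] != a[i - 1])
            a[out++] = a[i];
    return out;
}
```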


rel · Accepted Answer · 2021-08-24 15:50:03Z

0

A possible solution is to sort the input array with qsort() before removing the duplicate elements, as already suggested in the comments above:

#include <stdio.h>
#include <stdlib.h>

#define ARRAY_SIZE 20

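/* Three-way comparison for qsort(); avoids the signed overflow that
   returning a - b can cause for values far apart. */
static int compare_ints(const void *a, const void *b) {
    const int x = *(const int *)a;
    const int y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int data[ARRAY_SIZE] = {1, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 6, 7, 8, 1, 1, 1, 1, 2, 2};
    int index = 1;
    
    qsort(data, ARRAY_SIZE, sizeof(data[0]), compare_ints);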
    
    for (int i = 1; i < ARRAY_SIZE; i++) {
        if (data[i-1] != data[i]) {
            data[index++] = data[i];
        }
    }
    
    for (int i = 0; i < index; i++) {
        printf("%d ", data[i]);
    }
    return 0;
}

Console output:

1 2 3 4 5 6 7 8

edited Aug 24, 2021 at 15:50

answered Aug 24, 2021 at 14:49


1 Comment

rel Over a year ago

My previous answer was not correct and didn't answer the actual question. Thanks @Yun for pointing out that it contained a solution with two loops. The other example I posted also didn't work correctly, so I've deleted it to avoid cluttering Stack Overflow with wrong answers. Sorry for that.
