
I want to remove duplicates from an array using a single loop, but it's not working. This is what I've done so far:

Please note that I already know it works on a sorted array (I use a single-loop bubble sort for that), but I want it to work without sorting.

code.c

#include <stdio.h>

#define size 20
#define true 1
#define false 0

int main() {
    
    int input[size] = {1, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 6, 7, 8, 1, 1, 1, 1, 2, 2};

    int current = input[0], flag = false, index = 0;
    
    for (int x = 0; x < size; x++) {
        if (current == input[x] && (flag == false)) {
            flag = true;
        } else if (current != input[x]) {
            input[index++] = current;
            current = input[x];
            flag = false;
        }
    }
    
    for (int foo = 0; foo < index; foo++) {
        printf("%d", input[foo]);
        printf((foo != index - 1) ? ", " : "");
    }
    
    return 0;
}

input

1, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 6, 7, 8, 1, 1, 1, 1, 2, 2

output

1, 3, 4, 3, 5, 6, 7, 8, 1
  • The only way to remove duplicates with a single loop is with a hash set or equivalent. Sorting the list first also works, but technically sorting involves many loops. Commented Aug 24, 2021 at 12:37
  • calloc a status bit array, check whether each value was previously found, and mark it off. If the next value exceeds the range, realloc and clear the new elements. Commented Aug 24, 2021 at 12:39
  • Also, sorting doesn't keep the original order in which the elements appear (in case that is a requirement). Commented Aug 24, 2021 at 12:45
  • There are ways to hack around this and have just one loop (i.e. one for keyword), or even no loops (recursion). It's much more important to know which time complexity you're aiming for: O(1), O(log n), O(n), O(n²), etc. You may use recursion, one loop, gotos and whatnot, but in the end you will have a time bound, which is what matters for sorting, removing duplicates, searching, etc. Commented Aug 24, 2021 at 12:52
  • If your input domain is tractable, you could use a bitfield to mark the previously encountered elements. A hashmap would be way overblown if you only need to track 0-9: you only need a short for that, and you get 6 extra bits to play with. Commented Aug 24, 2021 at 14:47

2 Answers


There are several general solutions to this problem:

  1. First sorting the array and then running your algorithm. This increases the complexity of the program to O(n log(n)) (general sorting algorithm) or O(n*w) (radix sort, where w is a known constant depending on the size of the type in practice) at best and does not preserve the original order. In other words, this solution requires multiple loops.

  2. Using a map to detect which elements have occurred already. A significantly more complex solution, which adds O(log n) per lookup with a tree-based map (O(1) on average with a hash map).

  3. If the range of possible elements is small, e.g. constrained to only the numbers 0 to 9, you could use a boolean array to keep track of which values occurred. This is essentially a simple version of the "map solution". This is the only option requiring a single loop. Code example:

#include <stdbool.h>
#include <stdio.h>

#define ARRAY_SIZE 20

int main(void)
{
    int input[ARRAY_SIZE] = {1, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 6, 7, 8, 1, 1, 1, 1, 2, 2};
    bool hasOccurred[10] = {0}; // The values are used as indices (keys); assumes every value is in 0..9

    size_t newSize = 0U;
    for (size_t arrayIdx = 0U; arrayIdx < ARRAY_SIZE; ++arrayIdx)
    {
        if (!hasOccurred[input[arrayIdx]])
        {
            hasOccurred[input[arrayIdx]] = true; // first time we see this value
            input[newSize++] = input[arrayIdx];  // keep it, compacting the array in place
        }
    }

    for (size_t idx = 0; idx < newSize; ++idx)
        printf("%d%s", input[idx], idx != newSize - 1U ? ", " : "\n");

    return 0;
}

Output:

1, 3, 4, 5, 6, 7, 8, 2
  4. Use a combination of the previous algorithm and counting sort. First, initialize an int hasOccurred[10] array with -1 values. Then loop over the input array and, for each "new" element, store its input array index in the hasOccurred array. This array can then be used as a sorted array (iterate over it, ignoring the -1 values), or to construct an output array in which the original order is preserved. Depending on the use case, this requires more than one loop.

AKX adds that variations on a boolean array are possible, such as using the individual bits of an unsigned int to store the "has occurred" flags. This is a speed/memory tradeoff.

Credit to pmg for suggesting radix sort.


9 Comments

For very small ranges of possible elements, you can even use a single integer as a bitmap.
@AKX Nice addition, thank you. I'll add it to the answer.
I don't understand this version of C, and I'd also like to avoid a second array.
@Arghadip I'm sorry, I didn't mean to write unintelligible C. What is it exactly that is unclear?
You can radix sort in O(n)... just saying

A possible solution that uses qsort() to sort the input array, before removing the duplicate elements, as already suggested in the comments above:

#include <stdio.h>
#include <stdlib.h>

#define ARRAY_SIZE 20

/* Three-way comparison for qsort; avoids the overflow that a - b can cause. */
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int data[ARRAY_SIZE] = {1, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 6, 7, 8, 1, 1, 1, 1, 2, 2};
    int index = 1;
    
    qsort(data, ARRAY_SIZE, sizeof(data[0]), compare_ints);
    
    for (int i = 1; i < ARRAY_SIZE; i++) {
        if (data[i-1] != data[i]) {
            data[index++] = data[i];
        }
    }
    
    for (int i = 0; i < index; i++) {
        printf("%d ", data[i]);
    }
    return 0;
}

Console output:

1 2 3 4 5 6 7 8

1 Comment

My previous answer was not correct and didn’t answer the actual question. Thanks @Yun for pointing out that it contained a solution with two loops. The other example I posted also didn’t work correctly, so I’ve deleted it to avoid cluttering Stack Overflow with wrong answers; sorry about that.
