Finding duplicates in O(n) time and O(1) space

Input: Given an array of n elements which contains elements from 0 to n-1, with any of these numbers appearing any number of times.

Goal : To find these repeating numbers in O(n) and using only constant memory space.

For example, let n be 7 and array be {1, 2, 3, 1, 3, 0, 6}, the answer should be 1 & 3. I checked similar questions here but the answers used some data structures like HashSet etc.

Any efficient algorithm for the same?

31651 次浏览

Here is the pseudocode

for i <- 0 to n-1:
if (A[abs(A[i])]) >= 0 :
(A[abs(A[i])]) = -(A[abs(A[i])])
else
print i
end for

Sample code in C++

This is what I came up with, which doesn't require the additional sign bit:

for i := 0 to n - 1
while A[A[i]] != A[i]
swap(A[i], A[A[i]])
end while
end for


for i := 0 to n - 1
if A[i] != i then
print A[i]
end if
end for

The first loop permutes the array so that if element x is present at least once, then one of those entries will be at position A[x].

Note that it may not look O(n) at first blush, but it is - although it has a nested loop, it still runs in O(N) time. A swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of the while loop body) is at most N-1.

The second loop prints the values of x for which A[x] doesn't equal x - since the first loop guarantees that if x exists at least once in the array, one of those instances will be at A[x], this means that it prints those values of x which are not present in the array.

(Ideone link so you can play with it)

For relatively small N we can use div/mod operations

n.times do |i|
e = a[i]%n
a[e] += n
end


n.times do |i|
count = a[i]/n
puts i if count > 1
end

Not C/C++ but anyway

http://ideone.com/GRZPI

caf's brilliant answer prints each number that appears k times in the array k-1 times. That's useful behaviour, but the question arguably calls for each duplicate to be printed once only, and he alludes to the possibility of doing this without blowing the linear time/constant space bounds. This can be done by replacing his second loop with the following pseudocode:

for (i = 0; i < N; ++i) {
if (A[i] != i && A[A[i]] == A[i]) {
print A[i];
A[A[i]] = i;
}
}

This exploits the property that after the first loop runs, if any value m appears more than once, then one of those appearances is guaranteed to be in the correct position, namely A[m]. If we are careful we can use that "home" location to store information about whether any duplicates have been printed yet or not.

In caf's version, as we went through the array, A[i] != i implied that A[i] is a duplicate. In my version, I rely on a slightly different invariant: that A[i] != i && A[A[i]] == A[i] implies that A[i] is a duplicate that we haven't seen before. (If you drop the "that we haven't seen before" part, the rest can be seen to be implied by the truth of caf's invariant, and the guarantee that all duplicates have some copy in a home location.) This property holds at the outset (after caf's 1st loop finishes) and I show below that it's maintained after each step.

As we go through the array, success on the A[i] != i part of the test implies that A[i] could be a duplicate that hasn't been seen before. If we haven't seen it before, then we expect A[i]'s home location to point to itself -- that's what's tested for by the second half of the if condition. If that's the case, we print it and alter the home location to point back to this first found duplicate, creating a 2-step "cycle".

To see that this operation doesn't alter our invariant, suppose m = A[i] for a particular position i satisfying A[i] != i && A[A[i]] == A[i]. It's obvious that the change we make (A[A[i]] = i) will work to prevent other non-home occurrences of m from being output as duplicates by causing the 2nd half of their if conditions to fail, but will it work when i arrives at the home location, m? Yes it will, because now, even though at this new i we find that the 1st half of the if condition, i0, is true, the 2nd half tests whether the location it points to is a home location and finds that it isn't. In this situation we no longer know whether m or i2 was the duplicate value, but we know that either way, i6, because these 2-cycles are guaranteed not to appear in the result of caf's 1st loop. (Note that if i3 then exactly one of m and i2 occurs more than once, and the other does not occur at all.)

Not really pretty but at least it's easy to see the O(N) and O(1) properties. Basically we scan the array and, for each number we see if the corresponding position has been flagged already-seen-once (N) or already-seen-multiple-times (N+1). If it is flagged already-seen-once, we print it and flag it already-seen-multiple-times. If it is not flagged, we flag it already-seen-once and we move the original value of the corresponding index to the current position (flagging is a destructive operation).

for (i=0; i<a.length; i++) {
value = a[i];
if (value >= N)
continue;
if (a[value] == N)  {
a[value] = N+1;
print value;
} else if (a[value] < N) {
if (value > i)
a[i--] = a[value];
a[value] = N;
}
}

or, better yet (faster, despite the double loop):

for (i=0; i<a.length; i++) {
value = a[i];
while (value < N) {
if (a[value] == N)  {
a[value] = N+1;
print value;
value = N;
} else if (a[value] < N) {
newvalue = value > i ? a[value] : N;
a[value] = N;
value = newvalue;
}
}
}

Algorithm can be readily seen in the following C function. Retrieving original array, although not required, will be possible taking each entry modulo n.

void print_repeats(unsigned a[], unsigned n)
{
unsigned i, _2n = 2*n;
for(i = 0; i < n; ++i) if(a[a[i] % n] < _2n) a[a[i] % n] += n;
for(i = 0; i < n; ++i) if(a[i] >= _2n) printf("%u ", i);
putchar('\n');
}

Ideone Link for testing.

If the array is not too large this solution is simpler, It creates another array of same size for ticking.

1 Create a bitmap/array of same size as your input array

 int check_list[SIZE_OF_INPUT];
for(n elements in checklist)
check_list[i]=0;    //initialize to zero

2 scan your input array and increment its count in the above array

for(i=0;i<n;i++) // every element in input array
{
check_list[a[i]]++; //increment its count
}

3 Now scan the check_list array and print the duplicate either once or as many times they have been duplicated

for(i=0;i<n;i++)
{


if(check_list[i]>1) // appeared as duplicate
{
printf(" ",i);
}
}

Of course it takes twice the space consumed by solution given above ,but time efficiency is O(2n) which is basically O(n).

One solution in C is:

#include <stdio.h>


int finddup(int *arr,int len)
{
int i;
printf("Duplicate Elements ::");
for(i = 0; i < len; i++)
{
if(arr[abs(arr[i])] > 0)
arr[abs(arr[i])] = -arr[abs(arr[i])];
else if(arr[abs(arr[i])] == 0)
{
arr[abs(arr[i])] = - len ;
}
else
printf("%d ", abs(arr[i]));
}


}
int main()
{
int arr1[]={0,1,1,2,2,0,2,0,0,5};
finddup(arr1,sizeof(arr1)/sizeof(arr1[0]));
return 0;
}

It is O(n) time and O(1) space complexity.

static void findrepeat()
{
int[] arr = new int[7] {0,2,1,0,0,4,4};


for (int i = 0; i < arr.Length; i++)
{
if (i != arr[i])
{
if (arr[i] == arr[arr[i]])
{
Console.WriteLine(arr[i] + "!!!");
}


int t = arr[i];
arr[i] = arr[arr[i]];
arr[t] = t;
}
}


for (int j = 0; j < arr.Length; j++)
{
Console.Write(arr[j] + " ");
}
Console.WriteLine();


for (int j = 0; j < arr.Length; j++)
{
if (j == arr[j])
{
arr[j] = 1;
}
else
{
arr[arr[j]]++;
arr[j] = 0;
}
}


for (int j = 0; j < arr.Length; j++)
{
Console.Write(arr[j] + " ");
}
Console.WriteLine();
}

Let's assume that we present this array as a uni-directional graph data structure - each number is a vertex and its index in the array points to another vertex forming an edge of the graph.

For even more simplicity we have indices 0 to n-1 and range of number from 0..n-1. e.g.

   0  1  2  3  4
a[3, 2, 4, 3, 1]

0(3) --> 3(3) is a cycle.

Answer: Just traverse the array relying on indices. if a[x] = a[y] then it's a cycle and thus duplicate. Skip to the next index and continue again and so forth until the end of of an array. Complexity: O(n) time and O(1) space.

private static void printRepeating(int arr[], int size) {
int i = 0;
int j = 1;
while (i < (size - 1)) {
if (arr[i] == arr[j]) {
System.out.println(arr[i] + " repeated at index " + j);
j = size;
}
j++;
if (j >= (size - 1)) {
i++;
j = i + 1;
}
}


}

Check out the explanation here https://youtu.be/qJ_Y7pKP0e4

code here https://github.com/TechieExpress/DataStructures/blob/main/findDuplicates

Code snippet:

/**
*
* @author techieExpress
*
* You are given a list of n-1 integers and these integers are in the range * of 1 to n.
* Input: Given an array of n elements which contains elements
* from 0 to n-1, with any of these numbers appearing any number of times.
*
* Goal: To find these repeating numbers in O(n) and using only constant  * * memory space.
**/


public class findDuplicates {
    

    

public static void main(String args[])
{
int arr[] = { 2,1,1,2 };
  

for (int i = 0; i < arr.length; i++) {
arr[arr[i] % arr.length]
= arr[arr[i] % arr.length]
+ arr.length;
}
System.out.println("The repeating elements are : ");
for (int i = 0; i < arr.length; i++) {
            

//System.out.print(numRay[i]);
if (arr[i] >= arr.length * 2) {
System.out.println(i + " ");
arr[i]=arr[i]%arr.length;
}
}
}


}

We can do it O(n) time and O(1) space complexity by -

  1. take the ith array element.

  2. Make it +ve if it's negative

  3. Last, multiply with -1 to the number getting from array index (ith element).

  4. If the number positive, return the index.

     def findDuplicate(self, arr: List[int]) -> int:
    n=len(arr)
    for i in range(0,n):
    
    
    arr[(abs(arr[i]))-1]=arr[(abs(arr[i]))-1]*(-1)
    if arr[(abs(arr[i]))-1]>0:
    return abs(arr[i])