Monday, May 6, 2013

Finding duplicates from given set of numbers

Given a list of numbers. How do we write a program to find the duplicates among them?
A simple algorithm is to compare each possible pair in the list to see if there are any duplicates. But this algorithm runs in O(n2) time.
We can use the following simple algorithm using the set data structure. This data structure supports insert/add, find operations. We read one number at a time and try to find it in the set. If the set already contains that element, we declare that element as a duplicate, otherwise we insert that element into that set. The following is a Java implementation of the same.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
import java.io.*;
import java.util.HashSet;
/**
 *
 * @author Ravi
 */
public class DuplicateNums {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try
        {
            //Read from the standard input
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
            String input = br.readLine();
            String [] strNums = input.split(" ");
            
            //create the hash set to store the unique numbers
            HashSet<Integer> hSet = new HashSet<Integer>();
            
            for( int i = 0 ; i < strNums.length ; i++)
            {
                //try to add the number to the set, 
                //if add method returns false, the number is a duplicate
                if( !hSet.add( Integer.parseInt(strNums[i]) ) )
                {
                    System.out.print(strNums[i] + " ");
                }
            }
            System.out.println();
        }
        catch(IOException ioe)
        {
            System.out.println("Exception while reading the input" + ioe.getMessage());
        }
        catch(NumberFormatException nfe)
        {
            System.out.println("Invalid Input "+ nfe.getMessage());
        }
    }
}

We have used HashSet data structure from the Java standard library. This provides O(1) time for insert and find operations in an average case. So the average time complexity of the above program is O(n). Since we are storing the elements in the set, the space complexity is O(n) in the worst case (No duplicates in the list).