Tuesday, September 2, 2014

Longest substring with equal number of 0s and 1s

Given a string consisting of only 0s and 1s, How do we write a program to find the longest sub string with equal number of 0s and 1s in it?

For example consider the string "01001", the longest string with equal number of 0s and 1s is "1001".

Let us look at the solution approaches.
Considering 0 as -1 and 1 as +1, at each index we maintain a cumulative "balance" value.
 
Here balance is the difference between the count of 1 and 0 appeared so far.

For example for the example string "01001", the balance values are as follows

Index:     0   1   2   3   4
Bit:       0   1   0   0   1
Balance:  -1   0  -1  -2  -1


In this balance array, the longest distance between any two equal values gives us the required result.
 
In the above example, balance value "-1" appear at index 0 and index 4. So 4-0 = 4 is the maximum length of the sub string with equal number of 0,1s.
and the sub string is 1001 ( sub string between 1 and 4 indices).

Let us look at two possible approaches which utilizes the above property.

Method 1: Simple O(n2) approach
 
This method iterates through all possible sub strings and finds out which one is the longest with equal number of 0, 1s.
To check if a sub string has equal number of 0,1, we check the balance value at each index. If it is zero, we will update the maximum length if required.

Method 2: Efficient O(n) approach
 
This method needs extra space to run in O(n) time. For a string of length n, the range of balance values are in the range (-n, n); -n when all entries are 0, and n when all entries are 1.

We will create an array of size 2n+1 which acts as a hash table for all possible balance values. This table stores the leftmost index in the array for each balance value.


Since the array is 0 indexed, table[n+b] stores the leftmost index with balance b. At each index we keep on updating the balance value.


  • If there is no entry in the table with the given balance, we store the index for b.
  • If the table contains a previous entry with the same balance value, we check if the sub string between this previous index and the current index is the longest sub string.
Below is the C++ program to which implements the two approaches discussed above.