Saturday, June 29, 2013

How to find the frequecy of a word in a given sentence

Given some text and a word. How do we find the frequency of the word?

In this post we will see a Java program which finds it with just a few lines of code using regular expression matching.

Java provides utilities for matching the regular expressions using java.util.regex package. In our program we utilize two of them namely Pattern, and Matcher. These are the fundamental classes in this package.

In the following code reader is an object of class Scanner which is used for reading the input from the keyboard. The Pattern class does not have a public constructor, we have to create an object of that class using the compile() method by passing the pattern. Similarly the Matcher class also does not have a public constructor, and it's object should be returned by matcher() method of the Pattern object.

The matcher class provides a method called find() which tries to find the next instance of the pattern in the given text. If it finds the pattern it returns true otherwise it returns false.

Here is the Java code.


import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main {
    public static void main(String[] args)
    {
        Scanner reader = new Scanner(System.in);
        String sentence = reader.nextLine();
        String word = reader.nextLine();
        Pattern p = Pattern.compile(word);
        Matcher m = p.matcher(sentence);
        int count = 0;
        while( m.find())
            count++;
        System.out.println("The word '" + word + "' occurred " + count + " time(s)");
    }
}

Wednesday, June 26, 2013

How does a heapsort work

In selection sort, a list is sorted in the following way. Initially there will be an empty sorted list and the un-sorted list. We select the first element(According to the sort order) from the unsorted list and add it to the sorted list. We select the second element and append it to the sorted list and so on until the entire list is sorted.

Heapsort also works in the similar fashion. In heapsort, 'Heap' datastructure is used to maintain the unsorted list and hence the name heapsort. A heap is an array-based data structure visualized as a complete binary tree.
For example let us consider the following array and its corresponding binary tree form


Index  0  1   2  3  4  5  6
Array: 7  6   5  1  3  2  4  


                                                 7
                    /  \
                  6      5
                 / \    / \
                1   3  2   4


array[0] contains the root element, and array[2*0+1] contains the left child and array[2*0+2] contains the right child. In general ith node has its left and right children at 2*i+1 and 2*i+2 indices respectively and the parent of ith node will be at array[i/2].

The above heap is also a max-heap in which data at every node is greater than those of its children.

In this algorithm, all the elements are formed into a max-heap by repeatedly using heapify procedure. heapify procedure works on any node which violates the max-heap property. for example let us consider the following heap is not a max-heap because 2 is less than it's children.

                       2-----> violation
               / \
              6   5
             /\
            1  4

to correct this we will swap 2 and 6 ( bigger child) and recursively apply the same procedure for the node 2.
                               6
                    / \
  violation  <---- 2   5   
                  / \
                 1   4

So the following heap is a max-heap now.
         6
     / \
    4   5
   / \
  1   2

The build heap procedure scans through all the non-leaf nodes and calls the heapify procedure.

Finally the heap contains the maximum element at the root of the heap ( array[0] ). To sort the array. We swap the root element with the end element (array[length-1]). Now the heap may violate the max heap property at the root node. so call the heapify procedure. Next time we can exclude the last element because it is it's correct place. The heap size is reduced by one.

The following Java Program implements the above algorithm.



import java.util.Scanner;
public class Main {
    public static void main(String[] args)
    {
        Scanner reader = new Scanner(System.in);
        int N = Integer.parseInt(reader.nextLine());
        int []array = new int[N];
        int i;
        for( i = 0 ; i < N ; i++)
        {
            array[i] = reader.nextInt();
        }
        heapSort(array);
        //display the sorted array
        for( i = 0 ; i < N ; i++)
        {
            System.out.print(array[i] + " ");
        }
    }
    public static void heapSort(int[] array)
    {
        //build the max heap
        buildHeap( array );

        int i;
        for ( i = array.length-1 ; i > 0 ; i--)
        {
            //swap the max element to the end of the array
            int t = array[0];
            array[0] = array[i];
            array[i] = t;
            //heapify since max is removed
            heapify( array, 0, i);
        }

    }
    public static void buildHeap(int[] array)
    {
        if( array.length > 1)
        {
            int i;
            //heapify procedure is called only for half of the elems
            //because remaining are leaves and don't need to be adjusted
            for( i = array.length / 2 ; i >= 0 ; i--)
            {
                heapify(array, i, array.length);
            }
        }
    }
    public static void heapify(int[] array, int ind, int size)
    {
       if( ind >= size/2)
           return;

       int leftInd = 2*ind+1;
       int rightInd = 2*ind+2;
       int swapInd = leftInd;  //initialize swapInd to the left child
       //If right child is there, check if it needs to be swapped
       if( rightInd < size && array[leftInd] < array[rightInd])
           swapInd = rightInd;

       //check if swapping is required
       if( array[ind] < array[swapInd])
       {
           //swap parent and child
           int temp = array[ind];
           array[ind] = array[swapInd];
           array[swapInd] = temp;
           //recursive call on swapped index
           heapify(array,swapInd,size);
       }
    }
}

Friday, June 21, 2013

simple pattern matching

Pattern matching algorithms are very important in computer science. In this post we will discuss the simplest algorithm for pattern matching. Given some text and a pattern, we have to find where in the text, the pattern is found.

In this brute-force algorithm, we start matching the pattern from the beginning of the text. If a match is found we return the index. Otherwise we slide the matching window by one character and repeat the same process. For example if we are trying to find the pattern "ababb" in the text "abbabaababb"

The following depiction gives the steps involved in this algorithm

   a b b a b a a b a b b
   a b a
     a
       a
         a b a b
           a
             a b
               a b a b b -- pattern found


The following is the C++ implementation of the above algorithm. Note that this algorithm finds only the first occurrence of the pattern. This can be modified to find all the occurrences also.



#include <iostream>
#include <cstring>
#define MAXLEN 100
using namespace std;

int findPattern(char *t, char *p)
{
 int n = strlen(t); //length of the text
 int m = strlen(p); //length of the pattern
 int i,j; //loop counters
 
 //outer loop tries to match the pattern from index 0 to (n-m)
 for( i = 0 ; i <= n-m ; i++ )
 {
  //inner loop tries to match jth char in pattern
  //with (i+j)th char in text
  for( j = 0 ; j < m ; j++)
  {
   if( t[i+j] != p[j])
    break;
  }
  //pattern found; return the index
  if( j == m)
   return i;
 }
 //pattern not found
 return -1;
}

int main()
{
 char text[MAXLEN];
 char pattern[MAXLEN];

 cin>>text;
 cin>>pattern;
 cout<<"pattern found at: "<<findPattern(text,pattern);
 return 0;
}

Thursday, June 20, 2013

Finding the middle of the linked list

Given a single linked list, how do you find the middle of the list?

One simple method is to traverse through the list twice once to find it's length, and second to navigate to the middle of the list by traversing length/2 nodes. Even though this method runs in O(n) time, there is still a chance of improving this method. 

In this method we traverse the list only once. We take two pointers, one, a slow pointer which will advance one node at a time; and a fast pointer which will advance two nodes at a time. we iterate through the list until the fast pointer reaches the end of the list. At this stage, the slow pointer would be pointing to the middle of the linked list.

Following is the C++ function which implements this method. For working program please visit this post.




 SLLNode* getMiddle()
 {
  //if the list is empty or has a single node; return head
  if( head == NULL || head->getNext() == NULL )
   return head;
   
  SLLNode *slow = head;
  SLLNode *fast = head->getNext();
   
  while( fast != NULL )
  {    
   //move fast pointer two steps ahead
   if( fast->getNext() != NULL )
   {
    fast = fast->getNext()->getNext();
   }
   else
    break;
   //move slow pointer one step 
   slow = slow->getNext();
  }
  return slow;
 }

Tuesday, June 18, 2013

Print a string in reverse using recursion

In simple words, Recursion is nothing but a function calling itself. How can we solve this problem using recursion? 
Initially pass the entire string as input to the function. Basically a pointer to the first character is passed. This function examines the next character; if it is not null, it calls itself by advancing the pointer to the next character. This call sequence goes on until the next character is a null.
In the last call, it prints the last character and function call returns. When this call returns, it goes to the previous state. that means it prints the last but one character and so on...

Here the simple C++ code to do this.



#include <iostream>
#define MAXLEN 100
using namespace std;

void printRev(char *str)
{
 //return if empty string
 if( *str == NULL )
  return;
 
 //if the next char is not null, recurse
 if( *(str+1) != NULL )
  printRev(str+1);
  
 //print char while unwinding the stack
 cout<<*str;
}
int main()
{
 char str[MAXLEN];
 cin>>str;
 printRev(str);
 return 0;
}

Monday, June 17, 2013

Finding the element occurring odd number of times

Given an array of positive integers, all numbers occur even number of times, except one which occurs odd number of times. How do you find the number in O(n) time and O(1) space?

You can find a similar question in interviews which is infact a specific problem of the above category. An array contains 2n+1 positive integers where n integers appear twice, and one integer appear only once. Find that integer.



For example if the input array is [1,1,2,2,3] the output should be 3.

Whenever you encounter these kind of problems, think of XOR based solution. The following table shows the XOR logic of two bits


A  B   Result
--------------
0  0     0
0  1     1
1  0     1
1  1     0

This problem can easily be solved using the same logic. We all know that, performing XOR between the same bit pattern results in all Zeroes.
For example let us consider the bit pattern for 133


10000101
10000101
---------
00000000


Performing XOR between a bit pattern and all zeros gives the same bit pattern.
For example let us XOR 133 and 0


10000101
00000000
--------
10000101

And we also know that XOR is an associative and Commutative operation.
i.e A XOR (B XOR C) = (A XOR B) XOR C
    A XOR B = B XOR A


This means that if we XOR all the elements in the array, all the elements occurring even number of times nullify each other and only the number appearing odd number of times remains.

So the following simple program solves this problem.



#include <iostream>
#define MAXSIZE 100
int array[MAXSIZE];
using namespace std;
int findOddOccur(int n)
{
 int res = 0;
 for( int i = 0 ; i < n ;i++)
 {
  res = res ^ array[i];
 }
 return res;
}
int main()
{
 int n;
 cin>>n;
 for( int i = 0 ; i < n ; i++)
 {
  cin>>array[i];
 }
 cout<<"Odd occuring element is: "<<findOddOccur(n);
 return 0;
}

Tuesday, June 11, 2013

Finding the median of an array

Median of an array is defined as the middle element when it is sorted. The middle element is the one which divides the array into two equal halves if the array is of odd length. If the length is even, the average of two middle elements is median.

For example the median of
[4,3,2,5,1] is 3
Median of [6,1,3,5,4,2] is 3.5
The following is the Java code to calculate the median of an array.



import java.util.Arrays;
import java.util.Scanner;

/**
 *
 * @author Ravi
 */
public class Median {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        int nSize;
        int [] array;
        Scanner reader = new Scanner(System.in);
        nSize = reader.nextInt();
        array = new int[nSize];
        for( int i = 0 ; i < nSize; i++)
        {
            array[i] = reader.nextInt();
        }
        
        System.out.println("Median: "+getMedian(array));
    }
    public static double getMedian(int [] array)
    {
        Arrays.sort(array);
        if( array.length % 2 == 0)
        {
            int m1 = array[ (array.length/2)-1];
            int m2 = array[ array.length/2];
            return (m1+m2)/2.0;
        }
        else
        {
            return array[ array.length /2 ];
        }
    }
}

Monday, June 10, 2013

How does insertion sort work

Insertion sort is also a comparison based sorting. It is also an in-place sorting algorithm i.e it does not require extra space. We can re-arrange the elements within the array itself.The algorithm works as follows.

The array is divided into two parts one sorted and one un-sorted. Initially the sorted part will be empty and the unsorted part will be the entire array. We assume that the first element is sorted and start with the second element. we will insert that into the correct place in the sorted part. The same procedure is repeated for the remaining elements also until the entire array is sorted. In each iteration, when a new element is to be inserted into the sorted array, we need to shift the greater elements one step right to make space.

For example let us consider a simple array [3,1,5,2,4]. Following is the state of the array in each iteration. I have divided the array in two parts for convenience.


Step   |Sorted    | Unsorted
-----------------------------
1:     [3]        | [1,5,2,4]
2:     [1,3]      | [5,2,4]
3:     [1,3,5]    | [2,4]
4:     [1,2,3,5]  | [4]
5:     [1,2,3,4,5]| []


Following is the C++ code which implements insertion sort.



#include <iostream>
using namespace std;

void insertion_sort(int * array, int n)
{
 int i,j;
 //assume first element (index:0) is in correct position
 //start from the 2nd element (index:1)
 for( i = 1 ; i < n ; i++)
 {
  int toInsert = array[i];
  //find correct place for toInsert in arra[0:i-1] sub-array
  //j >= 0 ; make sure you have not gone beyond the beginning
  //array[j] > toInsert ; array elements greater than toInsert
  for( j = i-1 ; j >= 0 && array[j] > toInsert ; j--)
  {
   //shift array element one step right to make room for toInsert
   array[j + 1] = array[j];
  }
  //place toInsert in it's correct position
  array[j+1] = toInsert;  
 }
}
int main()
{
 int n;
 cin>>n;
 int *array = new int[n];
 int i;
 for( i = 0 ; i < n ; i++)
 {
  cin>>array[i];
 }
 insertion_sort(array,n);
 //print the sorted array
 for( i = 0 ; i < n ; i++)
 {
  cout<<array[i]<<" ";
 }
 delete[] array;
 return 0;
}

Thursday, June 6, 2013

How does binary search work

Given an array of numbers, how do you search for a number? 
The simplest method is to start at the first element and keep comparing until the end of the array. If it is found anywhere in the middle, return it. Otherwise we would reach the end of the array which indicates that the element in not found in the array. This is called linear search. Since this algorithm examines all the elements in the worst case, it's time complexity is O(n).

Binary search on the other hand runs in O(log n) which is better compared to linear search. Binary search requires the input array to be sorted

Binary search starts by dividing the array in two halves. Then we compare the search element with the middle element, if it matches, we return that index. If the middle element is less than the search element, then we can ignore the left half of the array because all the elements which precede middle are far away from the search element. Hence we can confine our search to the right half only. Similarly if the middle element is greater than the search element, no need to search the right half. We can confine our search to the left half only. The same procedure is repeated until we find the element or, the array size becomes 1.

Let us look at an example to understand this algorithm
Let us consider the input array as [1,2,3,4,5,6,7,8,9] and we want to search for 6. Here is how the algorithm works

step 1: [1,2,3,4,(5),6,7,8,9] - middle element is 5 which is less than 6. So skipping the left half
step 2: [6,(7),8,9] - middle element is 7 which is greater than 6. So skipping right half
step 3: [6] - middle element is 6. match found, so return it.


#include <iostream>
using namespace std;
int array[100];
//takes the following parameters 
//s: search element, lower index, higher index
int bin_search(int s, int low, int high)
{
 while( low <= high)
 {
  // the middle element
  int middle = (low + high)/2;
  //if match is found, return it
  if( array[middle] == s)
   return middle;
  
  //if middle element is less
  else if ( array[middle] < s)
  {
   //search the right half
   low = middle+1;
  }
  else
  {
   //search the left half
   high = middle-1;
  }
 }
    return -1;
}
int main()
{
 int n; // array size
 cin>>n;
 int i;
 for( i = 0 ; i < n ; i++)
 {
  cin>>array[i];
 }
 //search element
 int s;
 cin>>s;
 
 cout<<s<<" found at: "<<bin_search(s,0,n-1);
 return 0;
}

Wednesday, June 5, 2013

Program to find nth Fibonacci number

Fibonacci number in mathematics is defined as the sum of the two previous elements in the series. Formally this is represented as 
f(n) = f(n-1) + f(n-2) where f(1) = 1 and f(2) = 1.

In this post we will see how to generate nth fibonacci number. The algorithm is self-explanatory from the program itself. So jumping in to the code directly..



#include <iostream>
using namespace std;
int fib(int n)
{
  if( n < 3)
   return 1;
   
  int a = 1;
  int b = 1;
  int c = a+b;
  
 int i = 3;
  while ( i < n)
  {
   //store sum of two previous values in c
   c = a + b;
   a = b; //b is first and
   b = c; //c is second for next iteration 
   i++;
  }
  
  return c;
}
int main()
{
 int n;
  cin>>n;
  cout<<fib(n)<<" ";
  return 0;
}

Tuesday, June 4, 2013

Finding the first occurence of a number in a sorted array.


Let us consider an array of numbers sorted in ascending order which can possibly contain duplicates. How do we find the left most/first occurrence of a given number?
The moment we read searching in a sorted array binary search should come to our mind. But Binary search has a limitation that it just finds out if a particular element exists in the array. If array contains duplicate entries, binary search finds out any matching index. But we want to find out the left most occurrence.

For example let us consider an array [1,2,2,3,4] and we have to find the element 2. The binary search finds out 2 at the index 2(assuming the array index starts with 0) and returns. But we actually have to return index 1 because it is the first occurrence.

One brute-force method is to find the occurrence of the element and keep moving left until we reach the first occurrence. But this approach is not effective. Let us consider an example to understand this.

If the input array contains just one element repeated n times. [1,1,1,1,1]. We want to find out the first occurrence of 1. First we find 1 at the index 2, then we have to traverse n/2 places backwards in this worst case. So the time complexity is O(n) which is as good as linear search.

Then how can we modify the binary search to solve this problem with O(log n) time complexity?

Modify the binary search algorithm such that if a matching entry is found, instead of returning, compare the middle element with the previous element. If they do not match, we found the first occurrence. Otherwise search the left half for the first occurrence. This approach guarantees to take O(log n) time to find that first occurrence. Below is the implementation of this approach in C++.


#include<iostream>

using namespace std;
//modified binary search method to find the first occurence
//of an element in the sorted array
int findFirst(int *array, int d, int l, int h)
{ 
 while( l <= h)
 {
  int m = (l+h)/2;
  //if element is found
  if( array[m] == d )
  {
   //have we reached the beginning of the array
   //or middle and it's previous elements are equal?
   //search the left half
   if( m > 0 && array[m] == array[m-1])
   {
    h= m-1;
   }
   else
    return m; //we found the first occurence return it.      
  }
  //following two conditions are similar to the binary search
  else if ( array[m] < d)
  {
   l = m+1;
  }
  else
  {
   h = m-1;
  }
 }
 return -1;
}

int main()
{
 int n;
 cin>>n;
 int *array = new int[n];
 int i;
 for( i = 0 ; i < n ; i++)
  cin>>array[i];
 int x;
 cin>>x;
 cout<<x<<" found at "<<findFirst(array,x,0,n-1);
 delete[] array;
 return 0;
}

Monday, June 3, 2013

Program to shuffle an array

In some applications like card games, we need to shuffle an array of objects. In this post we will consider the problem of shuffling an array of numbers so that the same algorithm would be applicable for any type of arrays.

We use the random number generator in this algorithm. Let us consider an array of n numbers. It's element are in the range of 0 to n-1 indices. We first select a random number between 0 to n-1 indices,and swap the random element with the last element. In the next iteration, we exclude the last element (thus reducing the array size by 1) and repeat the same procedure.

Here is the Java code which implements this approach.




import java.util.Random;

/**
 *
 * @author Ravi
 */
public class ShuffleArray {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        int [] array = new int[10];
        int i;
        //fill the array with numbers 1-100
        for( i = 0 ; i < 10 ; i++)
        {
            array[i] = i+1;
        }
        //call shuflle array method
        shuffleArray(array);
        //print the shuffled array
        for( i = 0 ; i < 10 ; i++)
        {
            System.out.print(array[i]+" ");
        }
    }
    public static void shuffleArray(int[] array)
    {
        int i;
        //random number generator
        Random rd = new Random();
        //start from the end
        for( i = array.length-1 ; i > 0 ; i--)
        {
            //generates a random number between [0,i] inclusive
            int randomIndex = rd.nextInt(i+1);
            //swap the element at random index with ith element
            int temp = array[randomIndex];
            array[randomIndex] = array[i];
            array[i] = temp;
        }
    }
    
}

Sunday, June 2, 2013

How does a selection sort works

Like Bubble sort, Selection sort is also a comparison based sorting algorithm, but it uses lesser number of swappings. Selection sort works this way. In first iteration it selects the smallest element and keep it in the first place. In the second iteration it selects the next smallest element from the remaining elements and keep it in the second place, and so on.

For example let the input array be [3,1,2,5,4]. Here are the snapshots of the array
in each iteration
[3,1,2,5,4] - smallest element - 1
[1,3,2,5,4] - next smallest element - 2
[1,2,3,5,4] - next smallest element - 3 - no swapping required as it is already in correct position.
[1,2,3,5,4] - next smallest element - 4
[1,2,3,4,5] - Done!


Unlike the bubble sort, selection sort has no mechanism to detect if an array is already sorted. Even if the array is sorted, all iterations are to be completed to fully sort the array.

The following C++ function implements the above technique.
 
void selection_sort(int *array, int n)
{
 int i,j,smallInd;
 //outer loop runs for n-1 iterations
 for( i = 0 ; i < n-1 ; i++)
 {
  //assume array[i] is smallest
  smallInd = i;
  //search for smaller element from index i+1
  for( j = i+1; j < n ; j++ )
  {
   //update if smaller element is found
   if( array[j] < array[smallInd])
   {
    smallInd = j;
   }
  }
  //place the smallest element in its correct position
  if( i != smallInd) //dont swap the same elements
   swap(array[i],array[smallInd]);
 }
}

Saturday, June 1, 2013

Explaining recursion with a simple example

In this post we will see a simple example which explains the concept of recursion.
Recursion is nothing but defining a problem in terms of itself. For example let us consider the problem of finding the sum of all the elements in an array. We can define the problem as follows.

Let sum(array,n) denotes the sum of the elements of an array of size n.
We can define sum(array,n) = sum(array,n-1) + array[n]

sum(array,n-1) is sum of the elements of array of size n-1 which actually is the same problem but with a smaller input size - the smaller sub-problem. If we write a function to solve the original problem, we can use the same function to solve this sub-problem also. There should be one base-case where we no longer need recursion to solve the smallest problem possible. In this case, the smallest sub-problem would be an array with just one element. What is the sum of elements in such an array? That single element itself. Then we work backwards to solve increasingly bigger sub-problem.

The following program illustrates the above concept.


#include <iostream>
using namespace std;
//This is a recursive function to find the sum of
//the elements in an array
int findSumRecursive(int *array,int n)
{
 //Base cases
 //if the array is empty return 0
 if( n <= 0)
  return 0;
 //If the array has only one element, return that
 //It is first as well as last element
 else if( n == 1)
  return array[n-1];
 //recursive case
 else
  return array[n-1] + findSumRecursive(array,n-1);
}
int main()
{
 int n;
 cin>>n; //read array size
 int i;
 //dynamically allocate memory for the array
 int *array = new int[n];
 //read array elements
 for( i = 0 ; i < n ; i++)
 {
  cin>>array[i];
 }
 cout<<findSumRecursive(array,n)<<endl;
 //free the memory if it is no loner needed
 delete[] array;
 return 0;
}