Longest common subsequence length of given two strings

A subsequence is a sequence formed by another sequence after removal of 0 or more items from it. The order that items appear won’t change in the subsequence. In strings, a subsequence string generates by removing 0 or more characters from the parent string.

A common subsequence is a subsequence generated from two string. Let’s see this with two example strings.

string1 = {'a','l','i','b','a','b','a'}
string2 = {'d','e','v','a','b','a','b','a'}

Here the longest common subsequence is {'a','b','a','b','a'} which is of size 5.
Two string with a common subsequence length 5.

This problem can be solved using below two different methods.

  1. Recursion
  2. Dynamic Programming

Recursion Approach

Let’s see how we can divide the problem to get the recursion formula.

  1. Lengths of any of the string are 0 then the longest common subsequence length is 0.
  2. Both the characters at index ‘i’ are the same, then the longest common subsequence is 1 plus the longest common subsequence of both the remaining substrings.
  3. Both the characters are not the same then the longest common subsequence is the maximum between the longest common subsequence of string one without current character and string two, and the longest common subsequence of string one and string two without current character.

Let’s formulate the above recursion steps and write the code for the recursion approach.

// Formula
LCS(s1, s2, l1, l2) = 0    // If l1 = 0 or l2 = 0
     = LCS(s1, s2, l1-1, l2-1) + 1 // if s1[l1-1] = s2[l2-1] (last characters are same).
     = max { LCS(s1, s2, l1-1, l2), LCS(s1, s2, l1, l2-1) } // For all other cases.
int longestCommonSubseqRec(char arr1[], char arr2[], int len1, int len2)
{
    if (len1 == 0 || len2 == 0)
    {
        return 0;
    }
    if (arr1[len1 - 1] == arr2[len2 - 1])
    {
        return 1 + longestCommonSubseqRec(arr1, arr2, len1 - 1, len2 - 1);
    }
    else
    {
        return max(longestCommonSubseqRec(arr1, arr2, len1 - 1, len2),
                   longestCommonSubseqRec(arr1, arr2, len1, len2 - 1));
    }
}

Complexity of this recursive approach is O(2^N) which is exponential. For example for two strings with lengths 2 and 3, in the worst case, the recursion tree would be as follows.

Worst case recursion tree for strings with lengths 2 & 3.

In the above diagram when you observe, F(1,1) subtree called three times and F(1,2) subtree called twice. So, this recursion approach is bad as it hits the performance by calling the same function exponentially with an increase in height. To overcome this problem we have a Dynamic Programming approach.

Dynamic Programming

To overcome the recursion overlaps, dynamic programming memorizes the results at each level and uses them whenever it needs again. In this way we can reduce all overlaps and the runtime will go up to O(M*N) where M and N are lengths of two strings. This complexity is way better than exponential recursion complexity.

Let’s write the formula for Dynamic Programming memorisation.

dp[i][j] = 0    // if i=0 or j=0
     = dp[i-1][j-1]  // if character at i in string1 = character at j in string2.
     = max { dp[i-1][j], dp[i][j-1] }   // for all other cases.

Let’s see the code for above dynamic programming formula.

int longestCommonSubseqDP(char arr1[], char arr2[], int len1, int len2)
{
    int dp[len1 + 1][len2 + 1];
    for (int i = 0; i <= len1; i++)
    {
        for (int j = 0; j <= len2; j++)
        {
            if (i == 0 || j == 0)
            {
                dp[i][j] = 0;
            }
            else
            {
                if (arr1[i - 1] == arr2[j - 1])
                {
                    dp[i][j] = 1 + dp[i - 1][j - 1];
                }
                else
                {
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
    }
    return dp[len1][len2];
}

As discussed earlier, the complexity of this approach is O(M*N), where M and Ns are the lengths of the two given strings.

Let’s see complete code for this problem with main function.

// Output
Longest common subsequence using Recursion: 5
Longest common subsequence using Dynamic Programming: 5

These are the two different approaches to solve the longest common subsequence length problem. If you have any better approaches to solve this problem, please let our readers know by commenting here below. Happy Learning!.

See more:
Advertisements