Dynamic Programming in Algorithms
This article will help you understand and master one of the most fundamental problems in dynamic programming: the Longest Common Subsequence (LCS). Whether you're preparing for technical interviews, solving competitive programming challenges, or exploring algorithmic problem-solving techniques, the LCS algorithm is a must-know concept. In this article, we will explore LCS in depth, examine its recursive and dynamic programming approaches, and discuss its computational complexities.
What is the Longest Common Subsequence?
The Longest Common Subsequence (LCS) is a classic problem in computer science that involves finding the longest sequence of characters that appear in the same relative order in two given strings. Unlike substrings, the characters of a subsequence do not need to be contiguous in the original strings.
For example, consider the strings X = "AGGTAB" and Y = "GXTXAYB". The LCS of these strings is "GTAB", which has a length of 4. The problem has widespread applications in areas such as version control systems (e.g., Git), computational biology (e.g., DNA sequence alignment), and text comparison utilities.
The challenge lies in determining this subsequence efficiently, particularly when the input strings are large. The LCS problem can be solved using two main approaches: a recursive approach and a dynamic programming approach. Let’s explore them in detail.
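To make the subsequence-versus-substring distinction concrete, here is a minimal helper (the function name `is_subsequence` is ours, not part of a standard library) that checks whether one string appears in another in order, but not necessarily contiguously:

```python
def is_subsequence(s, t):
    """Return True if s appears in t in order (not necessarily contiguously)."""
    it = iter(t)
    # Each `ch in it` advances the iterator past the matched character,
    # so later characters of s must appear later in t.
    return all(ch in it for ch in s)

print(is_subsequence("GTAB", "GXTXAYB"))  # True: a subsequence
print("GTAB" in "GXTXAYB")                # False: not a contiguous substring
```

This illustrates why "GTAB" counts as an LCS of the example strings even though it never appears as a contiguous run in "GXTXAYB".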
Recursive Approach to LCS
The recursive approach to solving LCS is intuitive but computationally expensive. The idea is to compare the last characters of the two strings and make decisions based on the following cases:
- If the characters match: Add 1 to the result and recursively compute the LCS for the remaining prefixes of both strings.
- If the characters do not match: Compute the LCS for two cases and take the maximum of the two recursive calls:
  - Exclude the current character from the first string.
  - Exclude the current character from the second string.
Here’s the recursive function for LCS:
```python
def lcs_recursive(X, Y, m, n):
    # Base case: if either string is empty, LCS is 0
    if m == 0 or n == 0:
        return 0
    # Case 1: last characters match
    if X[m - 1] == Y[n - 1]:
        return 1 + lcs_recursive(X, Y, m - 1, n - 1)
    # Case 2: last characters do not match
    else:
        return max(lcs_recursive(X, Y, m - 1, n), lcs_recursive(X, Y, m, n - 1))
```
While this method is conceptually simple, its worst-case time complexity is exponential — on the order of O(2^(m+n)) recursive calls, where m and n are the string lengths — because the same subproblems are solved over and over. This inefficiency can be mitigated using dynamic programming techniques.
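As a preview of the fix, Python's standard `functools.lru_cache` can wrap the same recursion and cache every `(m, n)` result automatically; the wrapper function below is our own sketch, not code from the original article:

```python
from functools import lru_cache

def lcs_cached(X, Y):
    @lru_cache(maxsize=None)  # cache each (m, n) subproblem result
    def solve(m, n):
        if m == 0 or n == 0:
            return 0
        if X[m - 1] == Y[n - 1]:
            return 1 + solve(m - 1, n - 1)
        return max(solve(m - 1, n), solve(m, n - 1))

    return solve(len(X), len(Y))

print(lcs_cached("AGGTAB", "GXTXAYB"))  # 4
```

Each distinct (m, n) pair is now computed once, bringing the running time down to O(m * n) — the same idea the explicit memoization and tabulation sections below develop by hand.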
Dynamic Programming Approach to LCS
Dynamic programming transforms the recursive solution into an efficient algorithm by eliminating redundant computations. Instead of recomputing LCS values for the same substrings repeatedly, we store the results in a table and reuse them as needed.
The fundamental idea is to build a 2D array dp where dp[i][j] represents the length of the LCS of the first i characters of string X and the first j characters of string Y. The solution is constructed iteratively using the following recurrence relations:
- If X[i-1] == Y[j-1], then dp[i][j] = dp[i-1][j-1] + 1.
- Otherwise, dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
Let’s write the dynamic programming implementation:
```python
def lcs_dp(X, Y):
    m, n = len(X), len(Y)
    # dp[i][j] = length of LCS of X[:i] and Y[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
```
This approach ensures optimal performance by filling the table in a bottom-up manner.
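To build intuition for how the table fills, the sketch below (the helper name `lcs_table` is ours) computes the full table for the article's example strings and reads the answer from the bottom-right cell:

```python
def lcs_table(X, Y):
    """Build and return the full (m+1) x (n+1) LCS table."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp

table = lcs_table("AGGTAB", "GXTXAYB")
for row in table:
    print(row)            # row 0 and column 0 stay 0 (empty-prefix base case)
print(table[-1][-1])      # 4, the length of "GTAB"
```

Note how the first row and first column remain zero — they encode the base case where one prefix is empty.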
Memoization in LCS
Memoization combines the recursive approach with dynamic programming by storing intermediate results in a cache to avoid redundant calculations. It is a top-down approach where results are computed as needed and saved for future use.
Here’s how memoization can be implemented for LCS:
```python
def lcs_memo(X, Y, m, n, memo):
    if m == 0 or n == 0:
        return 0
    # Return the cached result if this subproblem was already solved
    if memo[m][n] is not None:
        return memo[m][n]
    if X[m - 1] == Y[n - 1]:
        memo[m][n] = 1 + lcs_memo(X, Y, m - 1, n - 1, memo)
    else:
        memo[m][n] = max(lcs_memo(X, Y, m - 1, n, memo), lcs_memo(X, Y, m, n - 1, memo))
    return memo[m][n]

# Initialize the memo table and call the function
X, Y = "AGGTAB", "GXTXAYB"
m, n = len(X), len(Y)
memo = [[None] * (n + 1) for _ in range(m + 1)]
result = lcs_memo(X, Y, m, n, memo)  # 4 for this example
```
Memoization reduces time complexity to O(m * n)
while retaining the recursive structure.
Tabulation in LCS
Tabulation is the bottom-up counterpart of memoization. Instead of solving subproblems on demand, it precomputes all solutions from smaller subproblems to larger ones, as shown in the dynamic programming example earlier. This method is often preferred because it avoids recursion-related overhead and simplifies debugging.
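Tabulation has a further benefit: once the full table exists, the subsequence itself (not just its length) can be recovered by walking backwards from dp[m][n]. The backtracking helper below is our own sketch, not code from the original article:

```python
def lcs_string(X, Y):
    """Return one longest common subsequence of X and Y as a string."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from dp[m][n]: matched characters belong to the LCS;
    # otherwise step toward whichever neighbor holds the larger value.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs_string("AGGTAB", "GXTXAYB"))  # GTAB
```

When several subsequences share the maximum length, the tie-breaking rule in the `elif` determines which one is returned.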
Time Complexity of LCS Algorithm
The time complexity of the LCS algorithm depends on the approach used:
- Recursive Approach: exponential — on the order of O(2^(m+n)) in the worst case — due to overlapping subproblems.
- Dynamic Programming (Memoization/Tabulation): O(m * n), where m and n are the lengths of the input strings.
The dynamic programming approach ensures polynomial time complexity, making it significantly faster for large inputs.
Space Complexity of LCS Algorithm
The space complexity varies based on the implementation:
- Recursive Approach: O(m + n) for the recursion stack.
- Memoization: O(m * n) for the memoization table, plus the recursion stack.
- Tabulation: O(m * n) for the DP table.
- Space-Optimized Tabulation: O(min(m, n)), achieved by keeping only the current and previous rows of the DP table and sizing the rows by the shorter of the two strings.
Here’s an example of space-optimized tabulation:
```python
def lcs_space_optimized(X, Y):
    m, n = len(X), len(Y)
    prev = [0] * (n + 1)  # row i-1 of the full DP table
    for i in range(1, m + 1):
        curr = [0] * (n + 1)  # row i, built from prev
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[n]
```
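As written, the rows above have length n + 1, so the space is O(n) rather than O(min(m, n)). To actually guarantee the minimum, one can swap the strings so the rows follow the shorter one; a possible variant (the function name is ours):

```python
def lcs_min_space(X, Y):
    """Space-optimized LCS length using O(min(m, n)) extra space."""
    # Iterate over the longer string so rows are sized by the shorter one
    if len(Y) > len(X):
        X, Y = Y, X
    n = len(Y)
    prev = [0] * (n + 1)
    for i in range(1, len(X) + 1):
        curr = [0] * (n + 1)
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[n]

print(lcs_min_space("AGGTAB", "GXTXAYB"))  # 4
```

The swap does not change the answer, since LCS is symmetric in its two arguments.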
Summary
The Longest Common Subsequence (LCS) problem is a cornerstone of dynamic programming, with applications in text comparison, sequence alignment, and beyond. We explored the recursive approach, dynamic programming (both memoization and tabulation), and space-optimized solutions. While the recursive solution is simple, dynamic programming ensures efficiency by reducing time complexity to O(m * n).
By mastering LCS, developers gain not only an essential tool for problem-solving but also a broader understanding of how to optimize algorithms using dynamic programming principles. For further exploration, refer to official resources such as GeeksforGeeks or algorithm textbooks like "Introduction to Algorithms" by Cormen et al.
Last Update: 25 Jan, 2025