Coding Patterns: Longest Common Substring/Subsequence (DP)

8 minute read

In Coding Patterns series, we will try to recognize common patterns underlying behind each algorithm question, using real examples from Leetcode.

Previous posts were about Sliding Window, Two Pointers, Fast & Slow Pointers, Merge Intervals, Cyclic Sort, In-place Reversal of a Linked List, Breadth First Search (BFS), Depth First Search (DFS), Two Heaps, Subsets, Modified Binary Search, Top K Numbers, K-way Merge, 0/1 Knapsack, Topological Sort, Bitwise XOR, Staircase and Palindromes patterns and today, we will introduce Longest Common Substring / Subsequence pattern which is very useful to solve Dynamic Programming problems involving longest / shortest common strings, substrings, subsequences etc.

If you need to refresh your knowledge of Dynamic Programming, you may want to check it before diving into more advanced problems.

Problem: Longest Common Subsequence

LeetCode 1143 - Longest Common Subsequence [medium]

Given two strings text1 and text2, return the length of their longest common subsequence.

A common subsequence of two strings is a subsequence that is common to both strings. If there is no common subsequence, return 0.

Example 1:

Input: text1 = "abcde", text2 = "ace" 
Output: 3  
Explanation: The longest common subsequence is "ace" and its length is 3.

Example 2:

Input: text1 = "abc", text2 = "abc"
Output: 3
Explanation: The longest common subsequence is "abc" and its length is 3.

Example 3:

Input: text1 = "abc", text2 = "def"
Output: 0
Explanation: There is no such common subsequence, so the result is 0.

Constraints:

  • 1 <= text1.length <= 1000
  • 1 <= text2.length <= 1000
  • The input strings consist of lowercase English characters only.

Brute Force Solution

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters. (eg, “ace” is a subsequence of “abcde” while “aec” is not).

As a brute force solution, we can try all subsequences of text1 and text2 to find the longest one.

  • if the characters text1[i] matches text2[j], we can recursively match the others.
  • if the characters text1[i] and text2[j] does not match, we will make two recursive calls by skipping one character from each string.
class Solution:
    def longestCommonSubsequence(self, text1: str, text2: str) -> int:
        return self.longestCommonSubsequence_recursive(text1, text2, 0, 0)

    def longestCommonSubsequence_recursive(self, text1, text2, i, j):
        if i == len(text1) or j == len(text2):
            return 0
        if text1[i] == text2[j]:
            return 1 + self.longestCommonSubsequence_recursive(text1, text2, i + 1, j + 1)

        return max(self.longestCommonSubsequence_recursive(text1, text2, i + 1, j),
                   self.longestCommonSubsequence_recursive(text1, text2, i, j + 1))

Time Complexity: O(2N+M) where N and M are the lengths of two input strings.

Space Complexity: O(N + M) which is used to store the recursion stack.

Top-down Dynamic Programming with Memoization

We can use a two-dimensional array to store the already solved subproblems.

class Solution:
    def longestCommonSubsequence(self, text1: str, text2: str) -> int:
        memo = [[-1 for _ in range(len(text2))] for _ in range(len(text1))]
        return self.longestCommonSubsequence_recursive(memo, text1, text2, 0, 0)

    def longestCommonSubsequence_recursive(self, memo, text1, text2, i, j):
        if i == len(text1) or j == len(text2):
            return 0
        if memo[i][j] == -1:
            if text1[i] == text2[j]:
                memo[i][j] = 1 + self.longestCommonSubsequence_recursive(memo, text1, text2, i + 1, j + 1)
            else:
                memo[i][j] = max(self.longestCommonSubsequence_recursive(memo, text1, text2, i + 1, j),
                                 self.longestCommonSubsequence_recursive(memo, text1, text2, i, j + 1))
        return memo[i][j]

Time Complexity: O(N * M) where N and M are the lengths of two input strings.

Space Complexity: O(N * M)

Bottom-up Dynamic Programming with Tabulation

Lets create our two dimensional array in a bottom-up fashion.

  • if the characters text1[i] matches text2[j], the length of the common subsequence would be one plus the length of the common subsequence until the i-1 and j-1 indexes.
  • if the characters text1[i] and text2[j] does not match, we take the longest sequence by skipping one character either from ith string or jth character from respective strings.

Our overall algorithm is;

if text1[i] == text2[j]:
    memo[i][j] = 1 + memo[i - 1][j - 1]
else:
    memo[i][j] = max(memo[i - 1][j], memo[i][j - 1])

and the solution is;

class Solution:
    def longestCommonSubsequence(self, text1: str, text2: str) -> int:
        memo = [[0 for _ in range(len(text2) + 1)] for _ in range(len(text1) + 1)]
        max_length = 0
        for i in range(1, len(text1) + 1):
            for j in range(1, len(text2) + 1):
                if text1[i - 1] == text2[j - 1]:
                    memo[i][j] = 1 + memo[i - 1][j - 1]
                else:
                    memo[i][j] = max(memo[i - 1][j], memo[i][j - 1])

                max_length = max(max_length, memo[i][j])
        return max_length

Time Complexity: O(N * M) where N and M are the lengths of two input strings.

Space Complexity: O(N * M)

How to identify?

Longest Common Substring / Subsequence pattern is very useful to solve Dynamic Programming problems involving longest / shortest common strings, substrings, subsequences etc.

Similar LeetCode Problems