The Longest Common Subsequence problem finds the longest sequence that appears in both strings in the same order, but not necessarily contiguously. For example, LCS of ABCDGH and AEDFHR is ADH with length 3. This is different from longest common substring which requires consecutive characters.
LCS exhibits optimal substructure. If the last characters of both strings match, the LCS length is 1 plus the LCS of the remaining strings. If they do not match, the LCS is the maximum of two possibilities: LCS excluding the last character of the first string, or LCS excluding the last character of the second string.
The DP solution uses a 2D table where dp of i of j represents the length of LCS of the first i characters of string 1 and the first j characters of string 2. The recurrence relation is: if s1 of i minus 1 equals s2 of j minus 1, then dp of i of j equals dp of i minus 1 of j minus 1 plus 1, meaning characters match so extend LCS. Otherwise, dp of i of j equals max of dp of i minus 1 of j and dp of i of j minus 1, meaning take best of excluding one character.
Base cases are dp of 0 of j equals 0 for all j, meaning empty string 1 has LCS length 0, and dp of i of 0 equals 0 for all i, meaning empty string 2 has LCS length 0. The final answer is dp of m of n where m and n are the string lengths.
Time complexity is O of m times n as we fill an m plus 1 by n plus 1 table with constant work per cell. Space complexity is O of m times n for the table, though it can be optimized to O of min of m comma n by keeping only the current and previous rows since we only look at adjacent cells.
To reconstruct the actual LCS sequence, not just its length, we backtrack through the DP table. Start at dp of m of n. If characters match, include this character and move diagonally to dp of i minus 1 of j minus 1. If characters do not match, move to the cell with larger value, either dp of i minus 1 of j or dp of i of j minus 1. Continue until reaching a cell with value 0.
Edit Distance, also called Levenshtein distance, is closely related to LCS. It finds the minimum number of operations insertions, deletions, or substitutions needed to transform one string into another. The relationship is: edit distance equals m plus n minus 2 times LCS length. This is because we need to delete characters from string 1 not in LCS and insert characters from string 2 not in LCS.
The Edit Distance DP is similar to LCS but tracks operation costs. dp of i of j represents minimum operations to transform first i characters of string 1 to first j characters of string 2. If characters match, dp of i of j equals dp of i minus 1 of j minus 1 with no operation needed. If they do not match, dp of i of j equals 1 plus min of dp of i minus 1 of j for deletion, dp of i of j minus 1 for insertion, and dp of i minus 1 of j minus 1 for substitution.
Applications of LCS include comparing DNA sequences in bioinformatics, finding similarities between text documents, version control systems to show file differences, and plagiarism detection. Edit Distance is used in spell checkers, DNA sequence analysis, speech recognition, and approximate string matching.
Example code
// Longest Common Subsequence
int longestCommonSubsequence(String s1, String s2) {
int m = s1.length(), n = s2.length();
int[][] dp = new int[m+1][n+1];
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (s1.charAt(i-1) == s2.charAt(j-1))
// Characters match, extend LCS
dp[i][j] = dp[i-1][j-1] + 1;
else
// Take max of two options
dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
}
}
return dp[m][n];
}
// Time: O(m*n), Space: O(m*n)
// Reconstruct LCS
String getLCS(String s1, String s2, int[][] dp) {
StringBuilder lcs = new StringBuilder();
int i = s1.length(), j = s2.length();
while (i > 0 && j > 0) {
if (s1.charAt(i-1) == s2.charAt(j-1)) {
lcs.append(s1.charAt(i-1));
i--; j--;
} else if (dp[i-1][j] > dp[i][j-1])
i--;
else
j--;
}
return lcs.reverse().toString();
}
// Edit Distance
int editDistance(String s1, String s2) {
int m = s1.length(), n = s2.length();
int[][] dp = new int[m+1][n+1];
// Base cases
for (int i = 0; i <= m; i++) dp[i][0] = i;
for (int j = 0; j <= n; j++) dp[0][j] = j;
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (s1.charAt(i-1) == s2.charAt(j-1))
dp[i][j] = dp[i-1][j-1];
else
dp[i][j] = 1 + Math.min(
dp[i-1][j], // Delete
Math.min(
dp[i][j-1], // Insert
dp[i-1][j-1] // Replace
)
);
}
}
return dp[m][n];
}