Longest common subsequence
Subsequence
Formally, given a sequence X = <x1, x2, ..., xm>, another sequence Z = <z1, z2, ..., zk> is a subsequence of X if there exists a strictly increasing sequence <i1, i2, ..., ik> of indices of X such that for all j = 1...k, we have x_i_j = z_i_j
For example, Z = <B, C, D, B> is a subsequence of X = <A, B, C, B, D, A, B> with corresponding index sequence <2, 3, 5, 7>.
Common subsequence
Given two sequences X and Y, we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both.
LCM problem
We are given two sequences and wish to find a maximum length common subsequence.
Dynamic Programming Steps
Step 1: Characterizing a LCS
In a brute-force approach to solving LCS problem, we would enumerate all subsequences of X and check each subsequence to see whether it is also a subsequence of Y, keeping track of the longest subsequence we find. Because X has 2^m subsequences, this approach requires exponential time.
LCS problem has an optimal-substructure property. Given a sequence X = <x1...xm>, we define the ith prefix of X as Xi = <x1...xi>.
Theorem
Let Z = <z1...zk> be any LCS of X and Y.
- If
xm = yn, thenzk = xm = ym, andZk-1is an LCS ofxm-1andyn-1. - If
xm != yn, thenzk != xmimplies thatZis an LCS ofxm-1andY. - If
xm != yn, thenzk != ynimpllies thatZis an LCS ofXandYn-1.
Step 2: Recursive solution
Because of step 1 theorem, to find an LCS of X and Y, we may need to find the LCSs of X and Yn-1 and of Xm-1 and Y. But each of these subproblems has the subproblems of finding LCS of Xm-1 and Yn-1. Many other subproblems share subsubproblems, so we can see the overlapping-subproblems.
We establish the following recurrence:
c[i,j] = c[i,j] = 0 // if i = 0 or j = 0
c[i,j] = c[i-1, j-1] + 1 // if i,j > 0 and xi = yj
c[i,j] = max(c[i, j-1], c[i-1, j]) // if i,j > 0 and xi != yj
Step 3: Computing the length of an LCS
Since the LCS has only Theta(m*n) distinct sub-problems, we can use dynamic programming to compute the solutions bottom up and achieve this complexity.
Procedure LCS-LENGTH takes two sequences as inputs (X = <x1...xm> and Y = <y1...yn>). It stores c[i,j] values in a row-major order (first row of c from left to right, then second row, and so on). This procedure also maintains table b[1..m, 1..n] to help us construct an optimal solution by pointing to the table entry corresponding to the optimal subproblem solution chosen when computing c[i,j]. Finally, c[m,n] contains the length of an LCS of X and Y.
LCS-LENGTH(X, Y) // Theta(m*n)
m = X.length
n = Y.length
let b[1..m, 1..n] and c[0..m, 0..n] be new tables
for i = 1 to m
c[i, 0] = 0
for j = 0 to n
c[0, j] = 0 // end of initialization
for i = 1 to m
for j = 1 to n
if xi == yj
c[i, j] = c[1-1, j-1] + 1
b[i, j] = (i-1, j-1)
else if c[i-1, j] >= c[i, j-1]
c[i,j] = c[i-1, j]
b[i,j] = (i-1, j)
else
c[i, j] = c[i, j-1]
b[i,j] = (i, j-1)
return c and b
Step 4: Constructing an LCS
The b table return by LCS-LENGTH enables us to quickly construct an LCS. We simply begin at b[m,n] and trace through the table.
PRINT-LCS(b, X, i, j) // O(m + n)
if i == 0 or j == 0
return
if b[i,j] == (i-1, j-1)
PRINT-LCS(b, X, i-1, j-1)
print xi // xi == yi
else if b[i,j] == (i-1, j)
PRINT-LCS(b, X, i-1, j)
else
PRINT-LCS(b, X, i, j-1)