Longest Common Subsequence
In the longest-common-subsequence problem, given
two sequences X = < x1, x2, . . , xm >
and Y = < y1, y2, . . . , yn > and wish
to find a maximum length common subsequence of X and Y.
Given two sequences X and Y , we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y .
For example, if X = < A, B,C, B,D,A,B > and Y = < B,D,C,A, B,A >, the sequence < B,C,A> is a common subsequence of both X and Y . The sequence < B,C,A > is not a longest common subsequence (LCS) of X and Y , however, since it has length 3 and the sequence < B,C, B,A >, which is also common to both X and Y , has length 4. The sequence < B, C, B, A> is an LCS of X and Y, as is the sequence < B, D, A, B>,since X and Y have no common subsequence of length 5 or greater.
Given two sequences X and Y , we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y .
For example, if X = < A, B,C, B,D,A,B > and Y = < B,D,C,A, B,A >, the sequence < B,C,A> is a common subsequence of both X and Y . The sequence < B,C,A > is not a longest common subsequence (LCS) of X and Y , however, since it has length 3 and the sequence < B,C, B,A >, which is also common to both X and Y , has length 4. The sequence < B, C, B, A> is an LCS of X and Y, as is the sequence < B, D, A, B>,since X and Y have no common subsequence of length 5 or greater.
Solving the
LCS problem using dynamic programming
Step 1:
Characterizing a longest common subsequence
The LCS problem has an optimal-substructure
property.
Theorem
(Optimal substructure of an LCS)
Let X = < x1, x2, . . . , xm >
and Y = <y1, y2, . . . , yn > be sequences, and let Z = < z1,z2,
. . . ,zk > be any LCS of X and Y .
1. If xm = yn, then zk = xm
= yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then zk ≠ xm
implies that Z is an LCS of Xm-1 and Y.
3. If xm ≠ yn, then zk ≠ yn
implies that Z is an LCS of X and Yn-1.
The way that Theorem characterizes longest common subsequences tells us
that an LCS of two sequences contains within it an LCS of prefixes of the two
sequences. Thus, the LCS problem has an optimal-substructure property.
Step 2: A
recursive solution
Finding an LCS of X =
< x1, x2, . . . , xm > and Y = < y1,
y2, . . . , yn >.
If xm = yn, we
must find an LCS of Xm-1 and Yn-1. Appending xm
= yn to this LCS yields an LCS of X and Y .
If xm ≠ yn,
then we must solve two subproblems: finding an LCS of Xm-1 and Y and
finding an LCS of X and Yn-1.
Whichever of these two LCSs is longer is an LCS of X and Y.
To find an LCS of X and
Y, we may need to find the LCSs of X and Yn-1 and of Xm-1
and Y. But each of these subproblems has the subsubproblem of finding an LCS of
Xm-1 and Yn-1.
The optimal substructure
of the LCS problem gives the recursive formula
C [i, j ] to be the length of an LCS of the sequences Xi and
Yj
Step 3:
Computing the length of an LCS
Based on equation, an exponential-time
recursive algorithm can be written to compute the length of an LCS of two sequences. Since the
LCS problem has only θ(mn) distinct subproblems, however,dynamic
programming can be used to compute the solutions bottom up.
Procedure LCS-LENGTH takes two sequences X = < x1, x2,
. . . , xm > and Y = < y1,y2, . . . ,yn
> as inputs. It stores the c[ i, j ] values in a table c [ 0 . . m, 0 . .
n].
The
procedure also maintains the table b [1 . . m, 1 . . n ] to help us construct
an optimal solution. The procedure returns the b and c tables; c [m, n]
contains the length of an LCS of X and Y.
Step 4:
Constructing an LCS
The
b table returned by LCS-LENGTH enables us
to quickly construct an LCS of X = < x1, x2, . . . ,
xm > and Y = < y1, y2, . . . , yn >. We simply
begin at b [m, n] and trace through the table by following
the arrows. Whenever
we encounter a
in entry b[i, j ], it implies that xi = yj is an element
of the LCS that LCS-LENGTH found.
No comments:
Post a Comment