Design and Analysis of Algorithms: Longest Common Subsequence

Longest Common Subsequence

In the longest-common-subsequence problem, we are given two sequences X = < x₁, x₂, . . . , x_m > and Y = < y₁, y₂, . . . , y_n > and wish to find a maximum-length common subsequence of X and Y.

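A subsequence of a given sequence is that sequence with zero or more elements left out, the remaining elements keeping their relative order. Given two sequences X and Y, we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.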

For example, if X = < A, B, C, B, D, A, B > and Y = < B, D, C, A, B, A >, the sequence < B, C, A > is a common subsequence of both X and Y. The sequence < B, C, A > is not a longest common subsequence (LCS) of X and Y, however, since it has length 3 and the sequence < B, C, B, A >, which is also common to both X and Y, has length 4. The sequence < B, C, B, A > is an LCS of X and Y, as is the sequence < B, D, A, B >, since X and Y have no common subsequence of length 5 or greater.
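The claims in this example can be checked mechanically. The following is a minimal sketch in Python; the function names `is_subsequence` and `is_common_subsequence` are illustrative and not part of the original text, and the sequences are written as strings for brevity.

```python
def is_subsequence(z, x):
    """Return True if z can be obtained from x by deleting zero or more elements."""
    it = iter(x)
    # `ch in it` consumes the iterator up to the first match,
    # so later elements of z must be found further to the right in x.
    return all(ch in it for ch in z)


def is_common_subsequence(z, x, y):
    """Return True if z is a subsequence of both x and y."""
    return is_subsequence(z, x) and is_subsequence(z, y)


X = "ABCBDAB"
Y = "BDCABA"

for candidate in ("BCA", "BCBA", "BDAB"):
    print(candidate, is_common_subsequence(candidate, X, Y))
```

All three candidates are reported as common subsequences; the check says nothing about whether they are longest, which is what the rest of the post addresses.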

Solving the LCS problem using dynamic programming

Step 1: Characterizing a longest common subsequence

The LCS problem has an optimal-substructure property.

Theorem (Optimal substructure of an LCS)

Let X = < x₁, x₂, . . . , x_m > and Y = < y₁, y₂, . . . , y_n > be sequences, and let Z = < z₁, z₂, . . . , z_k > be any LCS of X and Y. Here X_i denotes the prefix < x₁, x₂, . . . , x_i > of X, and likewise for Y_j and Z_i.

1. If x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.

2. If x_m ≠ y_n, then z_k ≠ x_m implies that Z is an LCS of X_{m-1} and Y.

3. If x_m ≠ y_n, then z_k ≠ y_n implies that Z is an LCS of X and Y_{n-1}.

The way the theorem characterizes longest common subsequences tells us that an LCS of two sequences contains within it an LCS of prefixes of the two sequences. Thus, the LCS problem has an optimal-substructure property.

Step 2: A recursive solution

By the theorem, finding an LCS of X = < x₁, x₂, . . . , x_m > and Y = < y₁, y₂, . . . , y_n > requires examining either one or two subproblems.

If x_m = y_n, we must find an LCS of X_{m-1} and Y_{n-1}. Appending x_m = y_n to this LCS yields an LCS of X and Y.

If x_m ≠ y_n, then we must solve two subproblems: finding an LCS of X_{m-1} and Y and finding an LCS of X and Y_{n-1}. Whichever of these two LCSs is longer is an LCS of X and Y.

The subproblems overlap. To find an LCS of X and Y, we may need to find the LCSs of X and Y_{n-1} and of X_{m-1} and Y. But each of these subproblems has the subsubproblem of finding an LCS of X_{m-1} and Y_{n-1}.

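Define c[i, j] to be the length of an LCS of the sequences X_i and Y_j. If either i = 0 or j = 0, one of the sequences has length 0, so the LCS has length 0. The optimal substructure of the LCS problem gives the recursive formula

c[i, j] = 0                              if i = 0 or j = 0
c[i, j] = c[i-1, j-1] + 1                if i, j > 0 and x_i = y_j
c[i, j] = max(c[i, j-1], c[i-1, j])      if i, j > 0 and x_i ≠ y_j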
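The recurrence for c[i, j] (0 when either prefix is empty, c[i-1, j-1] + 1 when x_i = y_j, and otherwise the larger of c[i, j-1] and c[i-1, j]) can be transcribed directly into a recursive function. The sketch below is only meant to illustrate the recurrence; the name `lcs_length_recursive` is an assumption of this example, the sequences are 0-indexed Python strings (so x_i is `x[i - 1]`), and the running time is exponential in the worst case because the same subproblems are solved repeatedly.

```python
def lcs_length_recursive(x, y, i=None, j=None):
    """Length of an LCS of the prefixes x[:i] and y[:j], computed naively."""
    if i is None or j is None:
        i, j = len(x), len(y)
    if i == 0 or j == 0:
        # An empty prefix has no common subsequence except the empty one.
        return 0
    if x[i - 1] == y[j - 1]:
        # Last elements match: extend an LCS of the two shorter prefixes.
        return lcs_length_recursive(x, y, i - 1, j - 1) + 1
    # Last elements differ: drop one of them and keep the better result.
    return max(lcs_length_recursive(x, y, i, j - 1),
               lcs_length_recursive(x, y, i - 1, j))


print(lcs_length_recursive("ABCBDAB", "BDCABA"))  # prints 4
```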

Step 3: Computing the length of an LCS

Based on the recurrence for c[i, j], an exponential-time recursive algorithm can be written to compute the length of an LCS of two sequences. Since the LCS problem has only Θ(mn) distinct subproblems, however, dynamic programming can be used to compute the solutions bottom up.

Procedure LCS-LENGTH takes two sequences X = < x₁, x₂, . . . , x_m > and Y = < y₁,y₂, . . . ,y_n > as inputs. It stores the c[ i, j ] values in a table c [ 0 . . m, 0 . . n]. The procedure also maintains the table b [1 . . m, 1 . . n ] to help us construct an optimal solution. The procedure returns the b and c tables; c [m, n] contains the length of an LCS of X and Y.
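A possible Python rendering of this procedure is sketched below. It is not the post's own pseudocode: the function name `lcs_length`, the use of the strings "diag", "up" and "left" in place of arrows, and the tie-breaking rule (prefer "up" when both neighbours are equal) are assumptions of this example. Row 0 and column 0 of `b` are allocated but unused so that indices match the 1-based tables in the text.

```python
def lcs_length(x, y):
    """Fill the c and b tables bottom up; return them as (c, b)."""
    m, n = len(x), len(y)
    # c[i][j] = length of an LCS of x[:i] and y[:j]; row 0 and column 0 stay 0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    # b[i][j] records which subproblem produced c[i][j].
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "diag"   # x_i = y_j belongs to the LCS
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "up"     # LCS of X_{i-1} and Y_j
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "left"   # LCS of X_i and Y_{j-1}
    return c, b


c, b = lcs_length("ABCBDAB", "BDCABA")
print(c[7][6])  # prints 4
```

Each of the mn table entries is filled in constant time, so the running time of this sketch is Θ(mn).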

Step 4: Constructing an LCS

The b table returned by LCS-LENGTH enables us to quickly construct an LCS of X = < x₁, x₂, . . . , x_m > and Y = < y₁, y₂, . . . , y_n >. We simply begin at b[m, n] and trace through the table by following the arrows. Whenever we encounter a "↖" in entry b[i, j], it implies that x_i = y_j is an element of the LCS that LCS-LENGTH found.

With this method, we encounter the elements of this LCS in reverse order. The following recursive procedure prints out an LCS of X and Y in the proper, forward order. The initial call is PRINT-LCS(b, X, X.length, Y.length).
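The recursive trace-back can be sketched in Python as follows. This is an illustration under stated assumptions rather than the post's own procedure: `build_tables` is a condensed helper included only so the example runs on its own, the strings "diag", "up" and "left" stand in for the arrows, and `print_lcs` appends to a list `out` instead of printing so that the result is easy to inspect.

```python
def build_tables(x, y):
    """Condensed helper (assumed for this example): return the c and b tables."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, "diag"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], "up"
            else:
                c[i][j], b[i][j] = c[i][j - 1], "left"
    return c, b


def print_lcs(b, x, i, j, out):
    """Follow the arrows from b[i][j]; collect LCS elements in forward order."""
    if i == 0 or j == 0:
        return
    if b[i][j] == "diag":
        # Recurse first, then emit, so elements come out left to right.
        print_lcs(b, x, i - 1, j - 1, out)
        out.append(x[i - 1])
    elif b[i][j] == "up":
        print_lcs(b, x, i - 1, j, out)
    else:
        print_lcs(b, x, i, j - 1, out)


X, Y = "ABCBDAB", "BDCABA"
_, b = build_tables(X, Y)
out = []
print_lcs(b, X, len(X), len(Y), out)
print("".join(out))  # prints BCBA
```

Every recursive call decreases i, j, or both, so the trace-back takes O(m + n) steps. With a different tie-breaking rule in the table-filling step, the same trace-back could return another LCS such as BDAB.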

Example:
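As a worked illustration, the c table for the sequences X = < A, B, C, B, D, A, B > and Y = < B, D, C, A, B, A > from the beginning of this post can be computed and printed with the short sketch below. The function name `lcs_table` and the output layout are assumptions of this example; only the lengths are computed here, not the b table.

```python
def lcs_table(x, y):
    """Return the table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


X, Y = "ABCBDAB", "BDCABA"
c = lcs_table(X, Y)

# One row per prefix of X; "-" labels the row for the empty prefix.
for i, row in enumerate(c):
    label = X[i - 1] if i > 0 else "-"
    print(label, *row)

# Output (columns correspond to j = 0 and then B, D, C, A, B, A):
# - 0 0 0 0 0 0 0
# A 0 0 0 0 1 1 1
# B 0 1 1 1 1 2 2
# C 0 1 1 2 2 2 2
# B 0 1 1 2 2 3 3
# D 0 1 2 2 2 3 3
# A 0 1 2 2 3 3 4
# B 0 1 2 2 3 4 4
```

The bottom-right entry, c[7, 6] = 4, is the length of an LCS of X and Y, which agrees with the LCS < B, C, B, A > given earlier.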


Sunday, 24 November 2019
