# Longest Common Subsequence (LCS)

May 21, 2023

The Longest Common Subsequence (LCS) algorithm is a dynamic programming technique for finding the longest subsequence shared by two sequences, and it can be generalized to more than two. A subsequence is a sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements. Unlike a substring, a subsequence does not have to be contiguous. The LCS algorithm is commonly used in computational biology, natural language processing, data compression, and file comparison.
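
To make the definition of a subsequence concrete, here is a minimal sketch that checks whether one string is a subsequence of another. The function name `is_subsequence` is a hypothetical helper chosen for illustration.

```python
def is_subsequence(candidate, sequence):
    """Return True if `candidate` can be obtained from `sequence` by deleting elements."""
    remaining = iter(sequence)
    # `ch in remaining` advances the iterator until it finds ch (or runs out),
    # so each character must appear after the previous match: order is enforced.
    return all(ch in remaining for ch in candidate)


print(is_subsequence("ADH", "ABCDGH"))  # True: A, D, H appear in that order
print(is_subsequence("HDA", "ABCDGH"))  # False: the order is wrong
```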

## History and Development

The dynamic programming approach to the LCS problem grew out of work on string comparison in the early 1970s. Wagner and Fischer's 1974 paper on the string-to-string correction problem described the closely related edit distance recurrence, Hirschberg published a linear-space algorithm for computing an LCS in 1975, and Hunt and McIlroy's 1976 work on the Unix diff utility brought LCS-based comparison into everyday use. Since then, the algorithm has been extensively studied and extended by many researchers and has become a standard example of dynamic programming in computer science.

## Key Concepts and Principles

The key concept behind the LCS algorithm is dynamic programming. The algorithm relies on the fact that the LCS of two sequences can be derived from the LCS of their prefixes. To find the LCS of two sequences X and Y, the algorithm fills a matrix C in which entry C[i][j] holds the LCS length of the first i elements of X and the first j elements of Y. Each entry is computed by a recurrence that compares the current pair of elements and reuses the entries already computed for shorter prefixes.
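
Written out, the recurrence looks roughly as follows. This is a sketch of the standard formulation, assuming 1-based indexing so that $X_i$ is the i-th element of X and $C[i][j]$ is the LCS length of the first i elements of X and the first j elements of Y:

$$
C[i][j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 \\
C[i-1][j-1] + 1 & \text{if } i, j > 0 \text{ and } X_i = Y_j \\
\max\bigl(C[i-1][j],\; C[i][j-1]\bigr) & \text{otherwise}
\end{cases}
$$

The length of the LCS of the full sequences is then $C[m][n]$, where m and n are the lengths of X and Y.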

The LCS algorithm uses the following principles:

• Dynamic programming: The algorithm breaks down a complex problem into smaller subproblems and builds up the solution from the subproblems.
• Optimal substructure: The optimal solution to the problem can be derived from the optimal solutions to its subproblems.
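• Memoization (or tabulation): The algorithm stores the result of each subproblem, either in a cache filled on demand (top-down) or in a table filled in a fixed order (bottom-up), so that overlapping subproblems are never recomputed.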
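
The principles above can be sketched as a top-down, memoized function. This is an illustrative variant rather than the only way to implement the algorithm; the names `lcs_length_memo` and `solve` are hypothetical, and because it recurses up to `len(x) + len(y)` levels deep it is only suitable for short inputs under Python's default recursion limit.

```python
from functools import lru_cache


def lcs_length_memo(x, y):
    """Top-down LCS length with memoization (a sketch for short inputs)."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # LCS length of the prefixes x[:i] and y[:j].
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            # Matching elements extend the LCS of the shorter prefixes.
            return solve(i - 1, j - 1) + 1
        # Otherwise drop one element from either prefix and keep the better result.
        return max(solve(i - 1, j), solve(i, j - 1))

    return solve(len(x), len(y))


print(lcs_length_memo("ABCDGH", "AEDFHR"))  # 3
```

The cache ensures each pair `(i, j)` is solved once, so the running time stays proportional to the number of distinct subproblems.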

## Pseudocode and Implementation Details

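The following Python-style pseudocode computes the length of the LCS:

    def LCS(X, Y):
        m = len(X)
        n = len(Y)
        # C[i][j] = LCS length of the prefixes X[:i] and Y[:j]
        C = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            for j in range(n + 1):
                if i == 0 or j == 0:
                    # An empty prefix has an empty LCS.
                    C[i][j] = 0
                elif X[i-1] == Y[j-1]:
                    # Matching elements extend the LCS of the shorter prefixes.
                    C[i][j] = C[i-1][j-1] + 1
                else:
                    # Drop one element from either prefix; keep the longer LCS.
                    C[i][j] = max(C[i-1][j], C[i][j-1])
        return C[m][n]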

The function LCS takes two sequences X and Y as input and returns the length of their LCS. It uses a two-dimensional array C, where C[i][j] stores the LCS length of the first i elements of X and the first j elements of Y. Row 0 and column 0 correspond to an empty prefix and are set to 0. For every other cell, the algorithm compares the current elements X[i-1] and Y[j-1]. If they are equal, the entry is the LCS length of the two shorter prefixes, C[i-1][j-1], plus 1. If they differ, the entry is the larger of C[i-1][j] and C[i][j-1], that is, the better result after dropping the last element of one prefix.

The time complexity of the LCS algorithm is O(mn), where m and n are the lengths of the input sequences. The space complexity of the algorithm is also O(mn).
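
When only the length is needed, the full matrix is not required, because each row depends only on the row before it. The sketch below keeps two rows at a time, which reduces the extra space to O(min(m, n)) while the time stays O(mn). The function name `lcs_length_two_rows` is hypothetical, and this variant cannot reconstruct the LCS itself without further work.

```python
def lcs_length_two_rows(x, y):
    """LCS length using two rows instead of the full (m+1) x (n+1) matrix."""
    # Put the shorter sequence along the row to minimise memory use.
    if len(y) > len(x):
        x, y = y, x

    prev = [0] * (len(y) + 1)  # row for the previous prefix of x
    for xi in x:
        curr = [0]  # column 0: empty prefix of y
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(y)]


print(lcs_length_two_rows("ABCDGH", "AEDFHR"))  # 3
```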

## Examples and Use Cases

### Example 1: Finding the LCS of two strings

Suppose we have two strings: “ABCDGH” and “AEDFHR”. The LCS of these strings is “ADH”, which has length 3. To find it with the LCS algorithm, we apply the following steps:

1. Construct a matrix with the LCS lengths of their prefixes. Rows follow “ABCDGH”, columns follow “AEDFHR”, and ∅ denotes the empty prefix:

   |   | ∅ | A | E | D | F | H | R |
   |---|---|---|---|---|---|---|---|
   | ∅ | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
   | A | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
   | B | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
   | C | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
   | D | 0 | 1 | 1 | 2 | 2 | 2 | 2 |
   | G | 0 | 1 | 1 | 2 | 2 | 2 | 2 |
   | H | 0 | 1 | 1 | 2 | 2 | 3 | 3 |

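2. Trace back the matrix to find the LCS. Start at the bottom-right cell. If the row and column characters match, record the character and move diagonally up and to the left. Otherwise, move up or left to the neighbouring cell with the larger value. The cells where a character is recorded are marked below, and reading them from top to bottom gives “ADH”:

   |   | ∅ | A | E | D | F | H | R |
   |---|---|---|---|---|---|---|---|
   | ∅ |   |   |   |   |   |   |   |
   | A |   | A |   |   |   |   |   |
   | B |   |   |   |   |   |   |   |
   | C |   |   |   |   |   |   |   |
   | D |   |   |   | D |   |   |   |
   | G |   |   |   |   |   |   |   |
   | H |   |   |   |   |   | H |   |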
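
The two steps of this example can be sketched in code: fill the matrix, then walk back from the bottom-right cell. The function name `lcs_string` is hypothetical, the sketch assumes string inputs (it joins characters at the end), and when several LCSs of equal length exist it returns just one of them, determined by the tie-breaking rule in the comments.

```python
def lcs_string(x, y):
    """Return one longest common subsequence of the strings x and y."""
    m, n = len(x), len(y)

    # Step 1: fill the matrix of LCS lengths for all pairs of prefixes.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Step 2: trace back from the bottom-right cell.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])  # matching character is part of the LCS
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # tie-break: prefer moving up
        else:
            j -= 1
    # Characters were collected from the end, so reverse them.
    return "".join(reversed(out))


print(lcs_string("ABCDGH", "AEDFHR"))  # ADH
```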

### Example 2: DNA sequence alignment

The LCS algorithm is used in computational biology as a simple model for comparing DNA sequences. Given two DNA sequences, the algorithm finds the longest common subsequence, which indicates the bases the two sequences share in the same order. This information can help identify mutations, genetic variations, and evolutionary relationships between organisms. Practical alignment tools typically rely on related dynamic programming algorithms, such as Needleman-Wunsch, that also assign scores to mismatches and gaps.
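
As a rough illustration, the sketch below turns the LCS length into a similarity score between two short sequences. The sequences are toy strings, not real biological data, and both the function names and the choice of normalising by the longer length are assumptions made for this example.

```python
def lcs_length(x, y):
    """LCS length, keeping only the previous row of the table."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Fraction of the longer sequence covered by the LCS (assumed metric)."""
    if not a and not b:
        return 1.0  # two empty sequences are treated as identical
    return lcs_length(a, b) / max(len(a), len(b))


# Toy sequences: the second one lacks the "C" of the first.
print(lcs_similarity("ACGT", "AGT"))  # 0.75
```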

In summary, the LCS algorithm has the following properties:

• The algorithm runs in O(mn) time, where m and n are the lengths of the input sequences.
• The algorithm is easy to implement and can be adapted to related problems.
• The algorithm can be extended to recover the actual LCS, not just its length, by tracing back through the matrix.