# Needleman-Wunsch Algorithm

May 20, 2023

The Needleman-Wunsch algorithm is a dynamic programming algorithm used in bioinformatics to align two sequences of nucleotides or amino acids. It is used to determine the optimal alignment of two sequences based on a given scoring system. It finds the alignment that yields the highest total score using a matrix-based approach.

The purpose of the Needleman-Wunsch algorithm is to determine the similarity between two sequences of nucleotides or amino acids. By aligning these sequences, researchers can compare their similarities and differences to determine evolutionary relationships, functional similarities, and other important information.

The algorithm is used extensively in bioinformatics for tasks such as sequence alignment, phylogenetic tree construction, and protein structure prediction. It is also used in other fields such as speech recognition and data compression.

## Brief History and Development

The Needleman-Wunsch algorithm was first published in 1970 by Saul Needleman and Christian Wunsch. It was the first algorithm to align sequences using dynamic programming, and it quickly became the standard method for sequence alignment.

Since its initial publication, the algorithm has been widely used and adapted for different types of sequences and scoring systems. Modifications have been made to the algorithm to accommodate gaps, insertions, and deletions in sequences.

## Key Concepts and Principles

The Needleman-Wunsch algorithm is based on a matrix-based approach, where each position in the matrix represents a possible alignment of two characters from each sequence. The algorithm works by filling out a matrix from left to right and top to bottom.

The matrix is filled out using a scoring system that assigns a score to each possible alignment of two characters. The score is determined by a set of rules that are specific to the type of sequences being aligned. For example, in DNA sequence alignment, a score of +1 may be given for a match and -1 for a mismatch.

The algorithm then determines the optimal alignment by tracing back through the matrix from the bottom right corner to the top left corner. This path represents the alignment that yields the highest total score.

## Pseudocode and Implementation Details

``````function NeedlemanWunsch(s1, s2, match, mismatch, gapPenalty):
//initialize matrix
initialize matrix M with dimensions (length(s1)+1) x (length(s2)+1)
for i = 0 to length(s1):
M[i] = i * gapPenalty
for j = 0 to length(s2):
M[j] = j * gapPenalty

//fill in matrix
for i = 1 to length(s1):
for j = 1 to length(s2):
scoreDiagonal = M[i-1][j-1] + (s1[i] == s2[j] ? match : mismatch)
scoreLeft = M[i][j-1] + gapPenalty
scoreUp = M[i-1][j] + gapPenalty
M[i][j] = max(scoreDiagonal, scoreLeft, scoreUp)

//determine optimal alignment
align1 = ""
align2 = ""
i = length(s1)
j = length(s2)
while i > 0 or j > 0:
if i > 0 and j > 0 and M[i][j] == M[i-1][j-1] + (s1[i] == s2[j] ? match : mismatch):
align1 = s1[i] + align1
align2 = s2[j] + align2
i -= 1
j -= 1
elif j > 0 and M[i][j] == M[i][j-1] + gapPenalty:
align1 = "-" + align1
align2 = s2[j] + align2
j -= 1
else:
align1 = s1[i] + align1
align2 = "-" + align2
i -= 1

return (align1, align2)
``````

## Examples and Use Cases

The Needleman-Wunsch algorithm can be used to align any two sequences of nucleotides or amino acids. For example, it can be used to align two DNA sequences to determine their homology or to align two protein sequences to determine their similarity.

In addition to sequence alignment, the algorithm can also be used for other tasks such as phylogenetic tree construction and protein structure prediction.