# Z Algorithm

April 30, 2023

The Z Algorithm is a string matching algorithm that finds all occurrences of a pattern in a given text. It is named after its main data structure, the Z-array.

## Purpose and Usage

The Z Algorithm is primarily used for pattern matching in strings. It runs in linear time: all occurrences of a pattern are found in O(n+m) time, where n is the length of the text and m is the length of the pattern. In the worst case this makes it more efficient than the naive algorithm, which takes O(n*m) time.
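For comparison, the naive approach can be sketched as follows. This is only an illustrative baseline; the function name `naive_find` and the comparison counter are assumptions made for this example. It restarts the comparison at every position of the text, which is where the O(n*m) worst case comes from.

```python
def naive_find(pattern, text):
    """Return (start indices of matches, number of character comparisons)."""
    n, m = len(text), len(pattern)
    hits, comparisons = [], 0
    for start in range(n - m + 1):
        j = 0
        # Compare the pattern against the text window beginning at `start`.
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == m:
            hits.append(start)
    return hits, comparisons


# Near-misses at every position force almost m comparisons per window.
print(naive_find("aaab", "a" * 20))  # ([], 68)
```

On this input each of the 17 windows needs 4 comparisons before the mismatch is found, so the work grows with the product of the two lengths rather than their sum.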

## Brief History and Development

The Z Algorithm is most widely known through Dan Gusfield's 1997 textbook Algorithms on Strings, Trees, and Sequences, which presents the Z values as a fundamental preprocessing of a string and derives linear-time exact matching from them. The same array is often called the Z-function in competitive programming material.

## Key Concepts and Principles

The Z Algorithm works by constructing a Z-array for a string `s`: an array of integers in which `z[i]` is the length of the longest substring starting at position `i` that matches a prefix of `s`. To search for a pattern, `s` is built from the pattern followed by the text. The first element of the Z-array is set to zero by convention, and the rest of the elements are calculated using the following algorithm:

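```
z[0] = 0
l = r = 0
for i from 1 to n-1:
    if i > r:
        # i is outside the current Z-box: compare from scratch
        l = r = i
        while r < n and s[r-l] == s[r]:
            r++
        z[i] = r-l
        r--
    else:
        # i is inside the Z-box [l, r]: reuse the mirrored value z[k]
        k = i-l
        if z[k] < r-i+1:
            z[i] = z[k]
        else:
            # the match reaches the edge of the box: extend it past r
            l = i
            while r < n and s[r-l] == s[r]:
                r++
            z[i] = r-l
            r--
```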

Here, `s` is the pattern and the text joined by a separator character that occurs in neither (for example `$`), `n` is the length of `s`, `l` and `r` track the boundaries of the current Z-box (the segment `s[l..r]` with the largest `r` found so far that matches a prefix of `s`), and `z` is the Z-array. Because `r` never moves left, the total number of character comparisons is linear in `n`: information from earlier positions is reused to avoid unnecessary comparisons. The pattern occurs in the text wherever `z[i] == m`, where `m` is the length of the pattern, and index `i` in `s` corresponds to position `i - m - 1` in the text.
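To make the meaning of the Z-array concrete, here is a minimal sketch that computes it directly from its definition, without any Z-box bookkeeping. The name `z_naive` is an assumption made for this example. It is quadratic in the worst case, so it is mainly useful as a reference for checking a linear-time implementation.

```python
def z_naive(s):
    """Z-array computed straight from the definition (quadratic worst case)."""
    n = len(s)
    z = [0] * n  # z[0] stays 0, following the convention used in this article
    for i in range(1, n):
        # Extend the match between the prefix of s and the suffix starting at i.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
    return z


print(z_naive("aabxaab"))  # [0, 1, 0, 0, 3, 1, 0]
```

In the output, `z[4] = 3` because `aab` at index 4 matches the prefix `aab`, and `z[1] = 1` because only the single `a` at index 1 matches before the comparison fails.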

## Pseudocode and Implementation Details

The pseudocode for the Z Algorithm is given above. The implementation details depend on the programming language being used. Here is an implementation of the Z Algorithm in Python:

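```python
def z_algorithm(s):
    n = len(s)
    z = [0] * n  # z[0] is left at 0 by convention
    l, r = 0, 0  # inclusive boundaries of the current Z-box
    for i in range(1, n):
        if i > r:
            # i lies outside the Z-box: match against the prefix from scratch
            l = r = i
            while r < n and s[r-l] == s[r]:
                r += 1
            z[i] = r-l
            r -= 1
        else:
            # i lies inside the Z-box: s[i..r] mirrors s[k..r-l]
            k = i-l
            if z[k] < r-i+1:
                z[i] = z[k]
            else:
                # the match reaches the edge of the box: extend it past r
                l = i
                while r < n and s[r-l] == s[r]:
                    r += 1
                z[i] = r-l
                r -= 1
    return z
```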

This implementation takes a string `s` as input and returns its Z-array.

## Examples and Use Cases

Here are some examples and use cases of the Z Algorithm:

### Example 1: Finding all occurrences of a pattern in a text

Suppose we want to find all occurrences of the pattern `ab` in the text `abababbaba`. We concatenate the pattern and the text with a special character `$` that appears in neither, giving us the string `ab$abababbaba`. Applying the Z Algorithm to this string gives the Z-array `[0,0,0,2,0,2,0,2,0,0,2,0,1]`. The Z-array equals the length of the pattern, which is 2, at indices 3, 5, 7, and 10. Subtracting the 3 characters of `ab$` gives the positions in the text: the pattern occurs at indices 0, 2, 4, and 7.
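The search in this example can be sketched end to end as below. The helper names `z_array` and `find_occurrences`, and the `sep` parameter, are assumptions made for illustration. The Z-array helper is a compact variant that uses a half-open Z-box `[l, r)` rather than the inclusive one used elsewhere in this article; it should produce the same array.

```python
def z_array(s):
    """Linear-time Z-array using a half-open Z-box [l, r)."""
    n = len(s)
    z = [0] * n
    l = r = 0
    for i in range(1, n):
        if i < r:
            # Reuse the value mirrored inside the current Z-box.
            z[i] = min(r - i, z[i - l])
        # Extend the match by direct comparison where needed.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def find_occurrences(pattern, text, sep="$"):
    """Return the 0-based start indices of pattern in text."""
    if not pattern:
        return []
    if sep in pattern or sep in text:
        raise ValueError("separator must not occur in pattern or text")
    m = len(pattern)
    z = z_array(pattern + sep + text)
    # Index i in the combined string maps to i - (m + 1) in the text.
    return [i - (m + 1) for i in range(m + 1, len(z)) if z[i] == m]


print(find_occurrences("ab", "abababbaba"))  # [0, 2, 4, 7]
```

The separator check matters: if `sep` appeared in the text, a Z value could run across the boundary between pattern and text and report a match that is not really there.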

### Example 2: Finding the longest repeated substring

Suppose we want to find the longest repeated substring in the string `ababababa`. The Z-array of the string itself is `[0,0,7,0,5,0,3,0,1]`. The highest value is 7, at index 2, which means the prefix `abababa` occurs again starting at index 2, overlapping its first occurrence. For this string that is also the longest repeated substring. In general, the Z-array only reports repeats of a prefix, so finding the longest repeated substring of an arbitrary string requires computing the Z-array of every suffix and taking the overall maximum, which costs O(n^2) time.
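A minimal sketch of that idea follows, assuming overlapping repeats are allowed. The names `z_array` and `longest_repeated_substring` are illustrative, and the Z-array helper is a compact half-open variant of the algorithm. Running it on every suffix takes quadratic time overall, so this is a demonstration rather than the most efficient known method.

```python
def z_array(s):
    """Linear-time Z-array using a half-open Z-box [l, r)."""
    n = len(s)
    z = [0] * n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def longest_repeated_substring(s):
    """Longest substring occurring at least twice (overlaps allowed)."""
    best = ""
    for start in range(len(s)):
        suffix = s[start:]
        # The largest Z value is the longest prefix of this suffix
        # that appears again later in the string.
        longest = max(z_array(suffix), default=0)
        if longest > len(best):
            best = suffix[:longest]
    return best


print(longest_repeated_substring("ababababa"))  # abababa
print(longest_repeated_substring("banana"))     # ana
```

For `banana` the answer comes from the suffix `anana`, not from the full string, which shows why looking only at the Z-array of the original string is not enough in general.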

### Use Case: DNA sequencing

The Z Algorithm can be used in DNA sequencing to find exact matches between a DNA sequence and a reference genome. It suits short queries well, because it needs no precomputed index and finds every exact occurrence in a single linear pass over the query and the reference.