# CYK Parsing Algorithm

April 27, 2023

The CYK (Cocke-Younger-Kasami) Parsing Algorithm is a parsing algorithm used for context-free grammars. It is used to determine whether a given string can be generated by a given CFG (Context-Free Grammar). The algorithm is widely used in natural language processing and computational linguistics.

The CYK Parsing Algorithm is used to determine whether a given string of symbols can be generated by a given context-free grammar. In other words, it verifies if a string is a valid sentence in the language defined by the grammar. It can also be used to generate all possible parse trees for a given string.

In natural language processing, the CYK Algorithm is used for tasks such as part-of-speech tagging, syntactic parsing, and semantic analysis. It is also used in computational linguistics to generate parse trees for sentences in a natural language.

## Brief History and Development

The CYK Parsing Algorithm was developed independently by John Cocke, Daniel Younger, and Tadao Kasami in 1965. The algorithm is named after its inventors, Cocke-Younger-Kasami. The algorithm is based on the dynamic programming technique, which was first proposed by Richard Bellman in the 1950s.

## Key Concepts and Principles

The CYK Parsing Algorithm works by constructing a table of possible non-terminals that can generate a substring of the input string. The non-terminals are taken from the context-free grammar being used. The table is filled in a bottom-up manner, where each cell in the table represents a substring of the input string. The table is filled using the production rules of the context-free grammar.

The CYK Parsing Algorithm has two key principles:

1. Dynamic Programming: The algorithm uses dynamic programming to fill the table in a bottom-up manner. This ensures that each cell in the table is filled only once, and the algorithm runs efficiently.

2. Parsing by Recognition: The CYK Algorithm parses the input string by recognizing the substrings that can be generated by the context-free grammar. This means that it does not generate all possible parse trees for the input string, but rather only the valid ones.

## Pseudocode and Implementation Details

The following is the pseudocode for the CYK Parsing Algorithm:

``````Input: A context-free grammar G and a string w
Output: True if w can be generated by G, False otherwise

1. Construct a table T of size |w| x |w|
2. Fill the diagonal of the table with the non-terminals that generate the corresponding symbol in w
3. For each cell (i,j) in the table, fill it with the non-terminals that can generate the substring w[i:j]
a. For each k from i to j-1:
i. For each non-terminal A that generates w[i:k]:
1. For each non-terminal B that generates w[k:j]:
a. Add all non-terminals C such that C -> AB is a production in G to the cell (i,j) in the table
4. If the start symbol S of G is in the cell (1,|w|), return True, else return False
``````

The above pseudocode demonstrates the key elements of the CYK Parsing Algorithm. The algorithm constructs a table of size |w| x |w| and fills it in a bottom-up manner using the production rules of the context-free grammar. The algorithm then checks if the start symbol of the grammar is in the cell (1,|w|) of the table.

## Examples and Use Cases

Consider the following context-free grammar:

``````S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a
``````

Let us use the CYK Parsing Algorithm to verify whether the string “baaba” can be generated by the above grammar.

1. Construct a table of size 5 x 5.
``````    | 1 | 2 | 3 | 4 | 5 |
--- | - | - | - | - | - |
1  |   |   |   |   |   |
2  |   |   |   |   |   |
3  |   |   |   |   |   |
4  |   |   |   |   |   |
5  |   |   |   |   |   |
``````
1. Fill the diagonal of the table with the non-terminals that generate the corresponding symbol in w.
``````    | 1 | 2 | 3 | 4 | 5 |
--- | - | - | - | - | - |
1  |   |   |   |   | C |
2  |   |   |   | A |   |
3  |   |   | B |   |   |
4  |   | C |   | A |   |
5  | A |   |   |   | B |
``````
1. For each cell (i,j) in the table, fill it with the non-terminals that can generate the substring w[i:j].
``````    | 1     | 2         | 3     | 4         | 5     |
--- | ----- | --------- | ----- | --------- | ----- |
1  |   C   | C -> AB    |   B   | A -> a    | B     |
2  |   A   | A -> a     | C     |           |       |
3  |   B   | C         |       |           |       |
4  |   C   | B         | A     | A -> a    |       |
5  | A     |           | B     | B -> CC   | C     |
``````
1. If the start symbol S of G is in the cell (1,|w|), return True, else return False.

The start symbol S of the grammar is not in the cell (1,5). Therefore, the string “baaba” cannot be generated by the given grammar.