CYK Parsing Algorithm
April 27, 2023
The CYK (CockeYoungerKasami) Parsing Algorithm is a parsing algorithm used for contextfree grammars. It is used to determine whether a given string can be generated by a given CFG (ContextFree Grammar). The algorithm is widely used in natural language processing and computational linguistics.
The CYK Parsing Algorithm is used to determine whether a given string of symbols can be generated by a given contextfree grammar. In other words, it verifies if a string is a valid sentence in the language defined by the grammar. It can also be used to generate all possible parse trees for a given string.
In natural language processing, the CYK Algorithm is used for tasks such as partofspeech tagging, syntactic parsing, and semantic analysis. It is also used in computational linguistics to generate parse trees for sentences in a natural language.
Brief History and Development
The CYK Parsing Algorithm was developed independently by John Cocke, Daniel Younger, and Tadao Kasami in 1965. The algorithm is named after its inventors, CockeYoungerKasami. The algorithm is based on the dynamic programming technique, which was first proposed by Richard Bellman in the 1950s.
Key Concepts and Principles
The CYK Parsing Algorithm works by constructing a table of possible nonterminals that can generate a substring of the input string. The nonterminals are taken from the contextfree grammar being used. The table is filled in a bottomup manner, where each cell in the table represents a substring of the input string. The table is filled using the production rules of the contextfree grammar.
The CYK Parsing Algorithm has two key principles:

Dynamic Programming: The algorithm uses dynamic programming to fill the table in a bottomup manner. This ensures that each cell in the table is filled only once, and the algorithm runs efficiently.

Parsing by Recognition: The CYK Algorithm parses the input string by recognizing the substrings that can be generated by the contextfree grammar. This means that it does not generate all possible parse trees for the input string, but rather only the valid ones.
Pseudocode and Implementation Details
The following is the pseudocode for the CYK Parsing Algorithm:
Input: A contextfree grammar G and a string w
Output: True if w can be generated by G, False otherwise
1. Construct a table T of size w x w
2. Fill the diagonal of the table with the nonterminals that generate the corresponding symbol in w
3. For each cell (i,j) in the table, fill it with the nonterminals that can generate the substring w[i:j]
a. For each k from i to j1:
i. For each nonterminal A that generates w[i:k]:
1. For each nonterminal B that generates w[k:j]:
a. Add all nonterminals C such that C > AB is a production in G to the cell (i,j) in the table
4. If the start symbol S of G is in the cell (1,w), return True, else return False
The above pseudocode demonstrates the key elements of the CYK Parsing Algorithm. The algorithm constructs a table of size w x w and fills it in a bottomup manner using the production rules of the contextfree grammar. The algorithm then checks if the start symbol of the grammar is in the cell (1,w) of the table.
Examples and Use Cases
Consider the following contextfree grammar:
S > AB  BC
A > BA  a
B > CC  b
C > AB  a
Let us use the CYK Parsing Algorithm to verify whether the string “baaba” can be generated by the above grammar.
 Construct a table of size 5 x 5.
 1  2  3  4  5 
           
1      
2      
3      
4      
5      
 Fill the diagonal of the table with the nonterminals that generate the corresponding symbol in w.
 1  2  3  4  5 
           
1      C 
2     A  
3    B   
4   C   A  
5  A     B 
 For each cell (i,j) in the table, fill it with the nonterminals that can generate the substring w[i:j].
 1  2  3  4  5 
           
1  C  C > AB  B  A > a  B 
2  A  A > a  C   
3  B  C    
4  C  B  A  A > a  
5  A   B  B > CC  C 
 If the start symbol S of G is in the cell (1,w), return True, else return False.
The start symbol S of the grammar is not in the cell (1,5). Therefore, the string “baaba” cannot be generated by the given grammar.
Advantages and Disadvantages
Advantages
 The CYK Parsing Algorithm can handle any contextfree grammar.
 The algorithm is relatively fast and efficient.
 The algorithm can be used to generate all possible parse trees for a given string.
Disadvantages
 The algorithm requires the grammar to be in Chomsky Normal Form (CNF) for efficient operation.
 The algorithm may generate a large number of parse trees for ambiguous grammars, which can lead to performance issues.
Related Algorithms or Variations
 Earley Parsing Algorithm: A parsing algorithm that can handle arbitrary contextfree grammars in linear time.
 LL Parsing Algorithm: A topdown parsing algorithm that uses lefttoright scanning of the input string.
 LR Parsing Algorithm: A bottomup parsing algorithm that uses righttoleft reduction of the input string.