CYK Parser

Learned for CS241E. Wow it is actually used. This parser can parse Ambiguous Language, but the parse tree generated might not be the desired result.

For the notation, see Context-Free Language

parse(α, x) = // does α ⇒∗ x ? 
	if(α.isEmpty) { // Case 1 
		if (x.isEmpty) true 
		else false 
	} else if= aβ) { // Case 2 
		if(x = az && parse(β, z)) true 
		else false 
	} else if= A){ // Case 3 
		for each production A γ ∈ P { 
			if(parse(γ, x)) return true 
		} 
		false
	} else {// α = Aβ Case 4 
		for each way to split x into two substrings x = x1x2 { 
			if(parse(A, x1) && parse(β, x2)) return true 
		} 
		false 
	}

Complexity Analysis Space: Time:

  • calls to parse
  • per call to parse (in case 4)
  • total

Implementation, which returns a Parse Tree if a solution is found. It’s important to note that the CYK Parser uses Memoization so that it can also handle Ambiguous Grammar, and avoid entering infinite cycles.

  def parseCYK(grammar: Grammar, input: IndexedSeq[Token]): Option[Tree] = {  
    /** The memoization table: if the string of symbols ABC derives the substring of length `length`  
      * starting at position `from` of the `input`, then the entry for (Seq("A", "B", "C"), from, length)  
      * contains the three parse trees of A, B, and C. If a particular string of symbols      * does not derive a given substring of the `input`, the corresponding table entry is `None`.  
      */    val memo = mutable.Map[(Seq[String], Int, Int), Option[Seq[Tree]]]()  
  
    /** If the string of symbols `lhs` derives the substring of length `length`  
      * starting at position `from` of the `input`, returns a sequence of the parse trees for the  
      * symbols in `lhs`.  
      *      * If `lhs` does not derive this substring of the input, returns `None`.  
      */    def recur(lhs: List[String], from: Int, length: Int): Option[Seq[Tree]] = {  
      if (memo.contains((lhs, from, length))) {  
        memo((lhs, from, length))  
      } else {
        memo((lhs, from,length)) = None  
        if (lhs.isEmpty) { // Case 1  
//          println("Case #1",(lhs, from, length))  
          if (length == 0) memo((lhs, from, length)) = Some(Seq())  
          else memo((lhs, from, length)) = None  
          return memo((lhs, from, length))  
        } else if (grammar.terminals.contains(lhs(0))) { // Case 2  
//          println("Case #2",(lhs, from, length))  
          if (input(from).kind == lhs(0)) {  
            val temp = recur(lhs.tail, from+1, length-1)  
            if (!temp.isEmpty) {  
              memo((lhs, from, length)) = Some(Seq(new Tree(input(from))) ++ temp.get)  
            } else {  
              memo((lhs, from, length)) = None  
            }  
          } else {  
            memo((lhs, from, length)) = None  
          }  
          return memo((lhs, from, length))  
        } else if (grammar.nonTerminals.contains(lhs(0)) && lhs.length == 1) {  
//          println("Case #3",(lhs, from, length))  
          grammar.productionsExpanding(lhs(0)).foreach(gamma => {  
            val temp = recur(gamma.rhs.toList, from, length)  
            if (!temp.isEmpty) {  
              memo((lhs, from, length)) = Some(Seq(new Tree(Token(lhs(0)), temp.get)))  
              return memo((lhs, from, length))  
            }  
          })  
          memo((lhs, from, length)) = None  
          return memo((lhs, from, length))  
        } else {  
//          println("Case #4",(lhs, from, length))  
          for (i <- 0 until length) {  
            val temp1 = recur(List(lhs(0)), from, i)  
            val temp2 = recur(lhs.tail, from+i, length-i)  
            if (!temp1.isEmpty && !temp2.isEmpty) {  
              memo((lhs,from, length)) = Some(temp1.get ++ temp2.get)  
              return memo((lhs, from, length))  
            }  
          }  
          return memo((lhs, from, length))  
        }  
      }  
    }  
  
    recur(List(grammar.start), 0, input.size).map(_.head)  
  }