Correct Prefix Property

When a parser successfully parses a given input, it can return a parse tree or a derivation as proof that the input is a word in the language. When the input is not a word in the language, we would like a detailed error message indicating why the input could not be parsed.

One form of such feedback is an indication of the position in the input at which the parser got stuck. More precisely, the parser returns the longest prefix of the input that is also a prefix of some word in the language. Such a prefix is correct in the sense that, if it is completed with the right ending, it can become a word in the language.

Formally, suppose that:

  • is the input to a parser,
  • can be decomposed as , where and are sequences of terminal symbols and is a terminal symbol,
  • there exists some sequence of terminal symbols such that is a word in the language,
  • there is no sequence y of terminal symbols such that is a word in the language.

In this case, is the longest correct prefix and a is informally the position of the bug. Then a parser has the correct prefix property if when parsing input , it reports an error at the position of a. The error report enables the user to locate the bug in the input.

Alternative Explanation

This is just reformulated, it might be a lot clearer.

Informally, a parser has the correct prefix property if it rejects the input as soon as it possibly can: the moment the parser finds a prefix of the input which cannot be extended to a valid input, it rejects the input.

Formally, suppose is the language to be parsed. Suppose the following statements are true:

  • The word is not in , where and .
  • The word is a prefix of a word in , that is, there exists such that .
  • The word is not a prefix of a word in , that is, there does not exist such that .

A parser has the correct prefix property if for all , the word is rejected as soon as the parser processes .

This is a useful property for a programming language parser because it enables better error reporting: the parser can tell the user the exact location at which the program “stops being valid”, which is often close to where the syntax error actually appears.