Correct Prefix Property

When a parser successfully parses a given input, it can return a parse tree or a derivation as proof that the input is a word in the language. When the input is not a word in the language, we would like a detailed error message indicating why the input could not be parsed.

One form of such feedback is an indication of the position in the input at which the parser got stuck. More precisely, the parser returns the longest prefix of the input that is also a prefix of some word in the language. Such a prefix is correct in the sense that, if it is completed with the right ending, it can become a word in the language.

Formally, suppose that:

$w$ is the input to a parser,
$w$ can be decomposed as $w = x a z$ , where $x$ and $z$ are sequences of terminal symbols and $a$ is a terminal symbol,
there exists some sequence $y$ of terminal symbols such that $x y$ is a word in the language,
there is no sequence y of terminal symbols such that $x a y^{'}$ is a word in the language.

In this case, $x$ is the longest correct prefix and a is informally the position of the bug. Then a parser has the correct prefix property if when parsing input $w = x a z$ , it reports an error at the position of a. The error report enables the user to locate the bug in the input.

Alternative Explanation

This is just reformulated, it might be a lot clearer.

Informally, a parser has the correct prefix property if it rejects the input as soon as it possibly can: the moment the parser finds a prefix of the input which cannot be extended to a valid input, it rejects the input.

Formally, suppose $L$ is the language to be parsed. Suppose the following statements are true:

The word $w = x a z$ is not in $L$ , where $w, x, z \in Σ^{*}$ and $a \in Σ$ .
The word $x$ is a prefix of a word in $L$ , that is, there exists $y \in Σ^{*}$ such that $x y \in L$ .
The word $x a$ is not a prefix of a word in $L$ , that is, there does not exist $v \in Σ^{*}$ such that $x a v \in L$ .

A parser has the correct prefix property if for all $w = x a z$ , the word $x a z$ is rejected as soon as the parser processes $x a$ .

This is a useful property for a programming language parser because it enables better error reporting: the parser can tell the user the exact location at which the program “stops being valid”, which is often close to where the syntax error actually appears.

🛠️ Steven Gong

Table of Contents

Correct Prefix Property

Alternative Explanation

Graph View

Backlinks