## Parsing Expression Grammars: A Recognition-Based Syntactic Foundation (2004)


Venue: Symposium on Principles of Programming Languages

Citations: 75 (1 self)

### BibTeX

@INPROCEEDINGS{Ford04parsingexpression,

author = {Bryan Ford},

title = {Parsing Expression Grammars: A Recognition-Based Syntactic Foundation},

booktitle = {Symposium on Principles of Programming Languages},

year = {2004},

pages = {111--122},

publisher = {ACM Press}

}

### Abstract

For decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ambiguity is crucial to their original purpose of modelling natural languages, but this very power makes it unnecessarily difficult both to express and to parse machine-oriented languages using CFGs. Parsing Expression Grammars (PEGs) provide an alternative, recognition-based formal foundation for describing machine-oriented syntax, which solves the ambiguity problem by not introducing ambiguity in the first place. Where CFGs express nondeterministic choice between alternatives, PEGs instead use prioritized choice. PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components. A linear-time parser can be built for any PEG, avoiding both the complexity and fickleness of LR parsers and the inefficiency of generalized CFG parsing. While PEGs provide a rich set of operators for constructing grammars, they are reducible to two minimal recognition schemas developed around 1970, TS/TDPL and gTS/GTDPL, which are here proven equivalent in effective recognition power.
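The prioritized choice and syntactic predicates described in the abstract can be illustrated with a small recognizer. What follows is a minimal sketch, not the paper's construction: the tuple encoding of expressions (`"lit"`, `"seq"`, `"alt"`, `"and"`, `"not"`, `"ref"`) and the function name `recognize` are assumptions made for this example. Results are memoized per (expression, position) pair, which is the idea behind the packrat approach listed under Citations; the sketch does not handle left-recursive rules and makes no attempt to be a tuned linear-time parser.

```python
from functools import lru_cache

def recognize(rules, start, text):
    """Return the end offset of the prefix of `text` matched by rule `start`, or None."""

    @lru_cache(maxsize=None)          # memoise each (expression, position) pair
    def match(expr, pos):
        kind = expr[0]
        if kind == "lit":             # literal string
            s = expr[1]
            return pos + len(s) if text.startswith(s, pos) else None
        if kind == "ref":             # nonterminal reference
            return match(rules[expr[1]], pos)
        if kind == "seq":             # e1 e2 ...: all must match in order
            for sub in expr[1:]:
                pos = match(sub, pos)
                if pos is None:
                    return None
            return pos
        if kind == "alt":             # e1 / e2: prioritized choice
            for sub in expr[1:]:
                end = match(sub, pos)
                if end is not None:
                    return end        # first success wins; the rest are never tried
            return None
        if kind == "and":             # &e: succeed if e matches, consume nothing
            return pos if match(expr[1], pos) is not None else None
        if kind == "not":             # !e: succeed if e fails, consume nothing
            return pos if match(expr[1], pos) is None else None
        raise ValueError(f"unknown expression kind: {kind!r}")

    return match(rules[start], 0)


# Hypothetical example grammars (not taken from the paper).
a_first = {"S": ("alt", ("lit", "a"), ("lit", "ab"))}    # S <- "a" / "ab"
ab_first = {"S": ("alt", ("lit", "ab"), ("lit", "a"))}   # S <- "ab" / "a"
no_x = {"S": ("seq", ("not", ("lit", "x")), ("lit", "a"))}  # S <- !"x" "a"

print(recognize(a_first, "S", "ab"))   # 1: "a" succeeds first, so "ab" is never tried
print(recognize(ab_first, "S", "ab"))  # 2: order of alternatives changes the result
print(recognize(no_x, "S", "x"))       # None: the not-predicate rejects the input
```

Because the order of alternatives determines the result, each expression has at most one outcome at each position. That is the sense in which ambiguity is never introduced, in contrast to the unordered `|` of a CFG, where both alternatives would be valid derivations.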

### Citations

1496 | The C++ Programming Language - Stroustrup - 1997

Citation Context: ...++ contains ambiguities that cannot be resolved with any amount of CFG rewriting, in which certain token sequences can be interpreted as either a statement or a definition. The language specification [25] resolves this problem with the informal meta-rule that such a sequence is always interpreted as a definition if possible. Similarly, the syntax of lambda abstractions, let expressions, and conditiona...

632 | Synchronous tree adjoining grammars - Shieber, Schabes - 1990

Citation Context: ...sing semantic predicates [17]. Many extensions and variations of context-free grammars have been developed, such as indexed grammars [2], W-grammars [28], affix grammars [13], tree-adjoining grammars [12], minimalist grammars [24], and conjunctive grammars [18]. Most of these extensions are motivated by the requirements of expressing natural languages, and all are at least as difficult to parse as CFG...

137 | The syntax definition formalism SDF reference manual - Heering, Hendriks, et al. - 1989

Citation Context: ...creases the expressiveness of CFGs with explicit disambiguation rules, and supports unified language descriptions by combining lexical and context-free syntax definitions into a "two-level" formalism [10]. The nondeterministic linear-time NSLR(1) parsing algorithm [26] is powerful enough to generate "scannerless" parsers from unified syntax definitions without treating lexical analysis separately [22]...

74 | Disambiguation Filters for Scannerless Generalized LR Parsers - Brand, Scheerder, et al. - 2002

Citation Context: ...s use CFGs extended with explicit disambiguation rules to express both lexical and hierarchical syntax, supporting unified syntax definitions more cleanly while giving up strictly linear-time parsing [21, 29, 27]. These systems graft recognition-based functionality onto generative CFGs, resulting in a "hybrid" generative/recognition-based syntactic model. PEGs provide similar features in a simpler syntactic f...

72 | What can we do about the unnecessary diversity of notation for syntactic definitions? - Wirth - 1977

Citation Context: ...sed formal foundation for language syntax, Parsing Expression Grammars or PEGs. PEGs are stylistically similar to CFGs with RE-like features added, much like Extended Backus-Naur Form (EBNF) notation [30, 19]. A key difference is that in place of the unordered choice operator `|' used to indicate alternative expansions for a nonterminal in EBNF, PEGs use a prioritized choice operator `/'. This operator li...

60 | Conjunctive grammars - Okhotin - 2001

Citation Context: ...ions of context-free grammars have been developed, such as indexed grammars [2], W-grammars [28], affix grammars [13], tree-adjoining grammars [12], minimalist grammars [24], and conjunctive grammars [18]. Most of these extensions are motivated by the requirements of expressing natural languages, and all are at least as difficult to parse as CFGs. Since machine-oriented language translators often need...

59 | Packrat parsing: Simple, powerful, lazy, linear time - Ford - 2002

43 | Revised report on the algorithmic language ALGOL 68 - Wijngaarden, Mailloux, et al. - 1975

Citation Context: ...ractical parsing systems such as ANTLR and JavaCC using semantic predicates [17]. Many extensions and variations of context-free grammars have been developed, such as indexed grammars [2], W-grammars [28], affix grammars [13], tree-adjoining grammars [12], minimalist grammars [24], and conjunctive grammars [18]. Most of these extensions are motivated by the requirements of expressing natural languages...

37 | Scannerless NSLR(1) parsing of programming languages - Salomon, Cormack - 1989

Citation Context: ...[10]. The nondeterministic linear-time NSLR(1) parsing algorithm [26] is powerful enough to generate "scannerless" parsers from unified syntax definitions without treating lexical analysis separately [22], but the algorithm severely restricts the form in which such CFGs can be written. Other machine-oriented syntax formalisms and tools use CFGs extended with explicit disambiguation rules to express bo...

33 | Indexed grammars: an extension of context-free grammars - Aho - 1968

Citation Context: ...be achieved in practical parsing systems such as ANTLR and JavaCC using semantic predicates [17]. Many extensions and variations of context-free grammars have been developed, such as indexed grammars [2], W-grammars [28], affix grammars [13], tree-adjoining grammars [12], minimalist grammars [24], and conjunctive grammars [18]. Most of these extensions are motivated by the requirements of expressing ...

26 | Affix grammars - Koster - 1971

Citation Context: ...ems such as ANTLR and JavaCC using semantic predicates [17]. Many extensions and variations of context-free grammars have been developed, such as indexed grammars [2], W-grammars [28], affix grammars [13], tree-adjoining grammars [12], minimalist grammars [24], and conjunctive grammars [18]. Most of these extensions are motivated by the requirements of expressing natural languages, and all are at leas...

23 | Adding Semantic and Syntactic Predicates to LL(k): pred-LL(k) - Parr, Quong - 1994

Citation Context: ...tical machine-oriented languages. Many forms of non-greedy behavior are still available in PEGs when desired, however, through the use of predicates. The operators & and ! denote syntactic predicates [20], which provide much of the practical expressive power of PEGs. The expression `&e' attempts to match pattern e, then unconditionally backtracks to the starting point, preserving only the knowledge of...

22 | Fast context-free grammar parsing requires fast Boolean matrix multiplication - Lee

Citation Context: ...ented languages that are intended to be precise and unambiguous. Ambiguity in CFGs is difficult to avoid even when we want to, and it makes general CFG parsing an inherently super-linear-time problem [14, 23]. This paper develops an alternative, recognition-based formal foundation for language syntax, Parsing Expression Grammars or PEGs. PEGs are stylistically similar to CFGs with RE-like features added, ...

21 | Modular Grammars for Programming Language Prototyping - Adams - 1991

Citation Context: ...haps in large measure because they were originally developed and presented as formal models for certain types of top-down parsers, rather than as a useful syntactic foundation in its own right. Adams [1] used TDPL in a modular language prototyping framework, however. In addition, many practical top-down parsing libraries and toolkits, including the popular ANTLR [21] and the PARSEC combinator library...

20 | Lower bounds for matrix product - Shpilka

Citation Context: ...ented languages that are intended to be precise and unambiguous. Ambiguity in CFGs is difficult to avoid even when we want to, and it makes general CFG parsing an inherently super-linear-time problem [14, 23]. This paper develops an alternative, recognition-based formal foundation for language syntax, Parsing Expression Grammars or PEGs. PEGs are stylistically similar to CFGs with RE-like features added, ...

18 | Packrat parsing: a practical linear-time algorithm with backtracking - Ford - 2002

Citation Context: ...EG is a purely syntactic formalism, not by itself capable of expressing languages whose syntax depends on semantic predicates [20]. Although the Java language can be described as a single unified PEG [7], C and C++ parsers require an incrementally constructed symbol table to distinguish between ordinary identifiers and typedef-defined type identifiers. Haskell uses a special stage in the "syntactic p...

17 | Parsing algorithms with backtrack - Birman, Ullman - 1973

Citation Context: ...A PEG may be viewed as a formal description of a top-down parser. Two closely related prior systems upon which this work is based were developed primarily for the purpose of studying top-down parsers [4, 5]. PEGs have far more syntactic expressiveness than the LL(k) language class typically associated with top-down parsers, however, and can express all deterministic LR(k) languages and many others, incl...

17 | Noncanonical SLR(1) grammars - Tai - 1979

Citation Context: ...ules, and supports unified language descriptions by combining lexical and context-free syntax definitions into a "two-level" formalism [10]. The nondeterministic linear-time NSLR(1) parsing algorithm [26] is powerful enough to generate "scannerless" parsers from unified syntax definitions without treating lexical analysis separately [22], but the algorithm severely restricts the form in which such CFG...

16 | The Theory of Parsing, Translation and Compiling. Vol. I: Parsing - Aho, Ullman - 1972

Citation Context: ...man [4, 5], in reference to an early syntax-directed compiler-compiler. These systems were later called TDPL ("Top-Down Parsing Language") and GTDPL ("Generalized TDPL") respectively by Aho and Ullman [3]. By extension we prove that with minor caveats TS/TDPL and gTS/GTDPL are equivalent in recognition power, an unexpected result contrary to prior conjectures [5]. The rest of this paper is organized a...

15 | The metafront system: Extensible parsing and transformation - Brabrand, Schwartzbach, et al. - 2003

Citation Context: ...ater incorporated into JavaCC under the name "syntactic lookahead" [16]. The metafront system includes a limited, fixed-lookahead form of syntactic predicates under the terms "attractors" and "traps" [6]. The negative form of syntactic predicate (the "not-predicate") appears to be new, but its effect can be achieved in practical parsing systems such as ANTLR and JavaCC using semantic predicates [17]....

14 | A family of syntax definition formalisms - Visser - 1995

Citation Context: ...s use CFGs extended with explicit disambiguation rules to express both lexical and hierarchical syntax, supporting unified syntax definitions more cleanly while giving up strictly linear-time parsing [21, 29, 27]. These systems graft recognition-based functionality onto generative CFGs, resulting in a "hybrid" generative/recognition-based syntactic model. PEGs provide similar features in a simpler syntactic f...

10 | Parsec, a fast combinator parser. http://www.cs.uu.nl/~daan/parsec.html - Leijen - 2000

Citation Context: ... a modular language prototyping framework, however. In addition, many practical top-down parsing libraries and toolkits, including the popular ANTLR [21] and the PARSEC combinator library for Haskell [15], provide backtracking capabilities that conform to this model in practice, if perhaps unintentionally. These existing systems generally use "naive" backtracking methods that risk exponential runtime ...

7 | Derivational minimalism. Logical Aspects of Computational Linguistics - Stabler - 1997

Citation Context: ...17]. Many extensions and variations of context-free grammars have been developed, such as indexed grammars [2], W-grammars [28], affix grammars [13], tree-adjoining grammars [12], minimalist grammars [24], and conjunctive grammars [18]. Most of these extensions are motivated by the requirements of expressing natural languages, and all are at least as difficult to parse as CFGs. Since machine-oriented ...

6 | The TMG Recognition Schema - Birman - 1970

Citation Context: ...ng [14] and matrix product [23] shows at least that general CFG parsing is inherently super-linear. 6 Related Work This work is inspired by and heavily based on Birman's TS/TDPL and gTS/GTDPL systems [4, 5, 3]. The ⇒_G relation and the basic properties in Sections 3.3 and 3.4 are direct adaptations of Birman's work. The major new features of the present work are the extension to support general parsing exp...

3 | Parsing Techniques: A Practical Guide - Grune, Jacobs - 2008

Citation Context: ... parsing, of strings. Bridging the gap from generative definitions to practical recognizers is the purpose of our ever-expanding library of parsing algorithms with diverse capabilities and trade-offs [9]. Chomsky's generative system of grammars, from which the ubiquitous context-free grammars (CFGs) and regular expressions (REs) arise, was originally designed as a formal tool for modelling and analyz...