Abstract:
For decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ambiguity is crucial to their original purpose of modelling natural languages, but this very power makes it unnecessarily difficult both to express and to parse machine-oriented languages using CFGs. Parsing Expression Grammars (PEGs) provide an alternative, recognition-based formal foundation for describing machine-oriented syntax, which solves the ambiguity problem by not introducing ambiguity in the first place. Where CFGs express nondeterministic choice between alternatives, PEGs instead use prioritized choice. PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components. A linear-time parser can be built for any PEG, avoiding both the complexity and fickleness of LR parsers and the inefficiency of generalized CFG parsing. While PEGs provide a rich set of operators for constructing grammars, they are reducible to two minimal recognition schemas developed around 1970, TS/TDPL and gTS/GTDPL, which are here proven equivalent in effective recognition power.
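To make the contrast between nondeterministic and prioritized choice concrete, here is a minimal sketch of PEG-style recognizers in Python. It is an illustration only, not the construction used in the paper: the helper names `lit`, `seq`, `choice`, and `matches` are hypothetical, and the sketch omits memoization, so it does not demonstrate the linear-time bound.

```python
from typing import Callable, Optional

# A recognizer takes (text, position) and returns the position after the
# match, or None on failure. All names here are hypothetical helpers.
Parser = Callable[[str, int], Optional[int]]

def lit(s: str) -> Parser:
    """Match the literal string s at the current position."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers: Parser) -> Parser:
    """Match each parser in turn; fail as soon as one fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers: Parser) -> Parser:
    """Prioritized choice: commit to the first alternative that succeeds."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end  # later alternatives are never tried
        return None
    return parse

def matches(p: Parser, text: str) -> bool:
    """True if p consumes the whole input."""
    return p(text, 0) == len(text)

# S <- ("a" / "ab") "c"   versus   S <- ("ab" / "a") "c"
first_a = seq(choice(lit("a"), lit("ab")), lit("c"))
first_ab = seq(choice(lit("ab"), lit("a")), lit("c"))

print(matches(first_a, "abc"))   # "a" wins the choice, then "c" fails on "b"
print(matches(first_ab, "abc"))  # "ab" wins the choice, then "c" matches
```

Read as CFG rules, both orderings would generate the same language {ac, abc}. Under prioritized choice the order of alternatives matters: once `"a"` succeeds, the recognizer never reconsiders `"ab"`, so the first grammar rejects `abc` while the second accepts it.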