## TRX: A formally verified parser interpreter (2011)

Venue: LOGICAL METHODS IN COMPUTER SCIENCE 7(2

Citations: 5 - 0 self

### BibTeX

@INPROCEEDINGS{Koprowski11trx:a,
    author = {Adam Koprowski and Henri Binsztok},
    title = {TRX: A formally verified parser interpreter},
    booktitle = {LOGICAL METHODS IN COMPUTER SCIENCE 7(2},
    year = {2011},
    publisher = {}
}

### Abstract

Parsing is an important problem in computer science and yet surprisingly little attention has been devoted to its formal verification. In this paper, we present TRX: a parser interpreter formally developed in the proof assistant Coq, capable of producing formally correct parsers. We are using parsing expression grammars (PEGs), a formalism essentially representing recursive descent parsing, which we consider an attractive alternative to context-free grammars (CFGs). From this formalization we can extract a parser for an arbitrary PEG grammar with the warranty of total correctness, i.e., the resulting parser is terminating and correct with respect to its grammar and the semantics of PEGs; both properties formally proven in Coq.
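
The abstract's remark that PEGs essentially represent recursive descent parsing can be illustrated with a small interpreter. The following is a minimal sketch in Python, not the Coq development described in the paper: the class names (`Lit`, `Seq`, `Choice`, `Star`, `Not`, `Ref`), the `parse` and `matches` functions, and the example grammar are all hypothetical, and the sketch carries none of the termination or correctness guarantees that TRX proves (for instance, a left-recursive grammar would make it loop).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


# A subset of PEG expressions. All names here are illustrative assumptions.
@dataclass(frozen=True)
class Lit:       # literal string
    text: str


@dataclass(frozen=True)
class Seq:       # e1 e2
    first: "Expr"
    second: "Expr"


@dataclass(frozen=True)
class Choice:    # e1 / e2 (prioritized: e2 is tried only if e1 fails)
    first: "Expr"
    second: "Expr"


@dataclass(frozen=True)
class Star:      # e* (greedy repetition)
    body: "Expr"


@dataclass(frozen=True)
class Not:       # !e (negative lookahead, consumes nothing)
    body: "Expr"


@dataclass(frozen=True)
class Ref:       # reference to a nonterminal
    name: str


Expr = Union[Lit, Seq, Choice, Star, Not, Ref]
Grammar = Dict[str, Expr]


def parse(g: Grammar, e: Expr, s: str, pos: int) -> Optional[int]:
    """Return the position after a successful match of e at pos, or None."""
    if isinstance(e, Lit):
        return pos + len(e.text) if s.startswith(e.text, pos) else None
    if isinstance(e, Seq):
        mid = parse(g, e.first, s, pos)
        return None if mid is None else parse(g, e.second, s, mid)
    if isinstance(e, Choice):
        res = parse(g, e.first, s, pos)
        return res if res is not None else parse(g, e.second, s, pos)
    if isinstance(e, Star):
        while True:
            nxt = parse(g, e.body, s, pos)
            if nxt is None or nxt == pos:  # stop on failure or no progress
                return pos
            pos = nxt
    if isinstance(e, Not):
        return pos if parse(g, e.body, s, pos) is None else None
    if isinstance(e, Ref):
        return parse(g, g[e.name], s, pos)
    raise TypeError(f"unknown expression: {e!r}")


# Hypothetical example grammar:  S <- 'a' S 'b' / ''
GRAMMAR: Grammar = {
    "S": Choice(Seq(Lit("a"), Seq(Ref("S"), Lit("b"))), Lit("")),
}


def matches(s: str) -> bool:
    """True if S consumes the whole input."""
    return parse(GRAMMAR, Ref("S"), s, 0) == len(s)
```

Each expression form maps to one branch of a recursive function, which is the sense in which a PEG reads as a declarative recursive descent parser. Prioritized choice commits to the first alternative that succeeds, so `Choice(Lit("a"), Lit("ab"))` applied to `"ab"` consumes only one character.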

### Citations

106 | Recursive Programming Techniques - Burge - 1975
Citation Context: ... are unambiguous and allow easy integration of lexical analysis into the parsing phase. Their implementation is easy, as PEGs are essentially a declarative way of specifying recursive descent parsers [5]. With their backtracking and unlimited look-ahead capabilities they are expressive enough to cover all LL(k) and LR(k) languages as well as some non-context-free ones. However, recursive descent pars...

98 | Higher-order functions for parsing - Hutton - 1992
Citation Context: ...s or libraries of parser combinators, is abundant. And yet there does seem to be hardly any work on formally verified parsing. In Danielsson and Norell [9] a library of parser combinators (see Hutton [16]) with termination guarantees has been developed in the dependently typed functional programming language Agda [27]. The main difference in comparison with our work is that they provide a library of c...

76 | Parsing expression grammars: a recognition-based syntactic foundation - Ford - 2004

71 | Formal verification of a realistic compiler - Leroy - 2009
Citation Context: ... bugs in its distributions. Furthermore, the code generated by such tools often contains huge parsing tables making it near impossible for manual inspection and/or verification. In the recent article [17] about CompCert, an impressive project formally verifying a compiler for a large subset of C, the introduction starts with a question “Can you trust your compiler?”. Nevertheless, the formal verificat...

60 | Packrat parsing: Simple, powerful, lazy, linear time - Ford - 2002
Citation Context: ...e not LL(k) may require exponential time. A solution to that problem is to use memoization giving rise to packrat parsing and ensuring linear time complexity at the price of higher memory consumption [2, 13, 12]. It is not easy to support (indirect) left-recursive rules in PEGs, as they lead to non-terminating parsers [29]. In this paper we present TRX: a PEG-based parser interpreter formally developed in th...

44 | The Java Language Specification (3rd Edition) - Gosling, Joy, et al. - 2005
Citation Context: ... its simple format, for which the expressive power offered by PEGs is an overkill. Parsing Java seems to be an established benchmark for PEGs [24, 13, 12, 29]. One difficulty with the grammar of Java [15] is that it naturally contains left-recursive rules, most of which can be easily replaced with iteration, with the exception of a single definition [24], and for the moment TRX lacks the ability to ha...

19 | Packrat Parsing: A Practical Linear-Time Algorithm with Backtracking - Ford - 2002
Citation Context: ...e not LL(k) may require exponential time. A solution to that problem is to use memoization giving rise to packrat parsing and ensuring linear time complexity at the price of higher memory consumption [2, 13, 12]. It is not easy to support (indirect) left-recursive rules in PEGs, as they lead to non-terminating parsers [29]. In this paper we present TRX: a PEG-based parser interpreter formally developed in th...

17 | Functors for Proofs and Programs - Filliâtre, Letouzey - 2004
Citation Context: ...extraction from Coq, to ease practical use of TRX and to improve its performance. At the moment target languages for extraction from Coq are OCaml [18], Haskell [23] and Scheme [26]. We use the FSets [11] library, developed using Coq’s modules and functors [7], which are not yet supported by extraction to Haskell or Scheme. However, there is an ongoing work on porting FSets to type classes [25], which...

16 | The Theory of Parsing, Translation and Compiling. Vol. I: Parsing - Aho, Ullman - 1972
Citation Context: ...e not LL(k) may require exponential time. A solution to that problem is to use memoization giving rise to packrat parsing and ensuring linear time complexity at the price of higher memory consumption [2, 13, 12]. It is not easy to support (indirect) left-recursive rules in PEGs, as they lead to non-terminating parsers [29]. In this paper we present TRX: a PEG-based parser interpreter formally developed in th...

11 | Implementing Modules in the Coq System - Chrzaszcz - 2003
Citation Context: ...improve its performance. At the moment target languages for extraction from Coq are OCaml [18], Haskell [23] and Scheme [26]. We use the FSets [11] library, developed using Coq’s modules and functors [7], which are not yet supported by extraction to Haskell or Scheme. However, there is an ongoing work on porting FSets to type classes [25], which are supported by extraction. In this section we will de...

7 | A Large-Scale Experiment in Executing Extracted Programs - Cruz-Filipe, Letouzey - 2005
Citation Context: ...ported by extraction. In this section we will describe our experience with OCaml extraction on the example of an XML parser. A well-known issue with extraction is the performance of obtained programs [8, 19]. Often the root of this problem is the fact that many formalizations are not developed with extraction in mind and trying to extract a computational part of the proof can easily lead to disastrous pe...

7 | Extraction in Coq: An overview - Letouzey - 2008
Citation Context: ...n this paper we present TRX: a PEG-based parser interpreter formally developed in the proof assistant Coq [28, 4]. As a result, expressing a grammar in Coq allows one, via its extraction capabilities [19], to obtain a parser for this grammar with total correctness guarantees. That means that the resulting parser is terminating and correct with respect to its grammar and the semantics of PEGs; both of ...

3 | Objective Caml. http://caml.inria.fr - Leroy et al. - 1996
Citation Context: ... a certified parser for G. We are interested in code extraction from Coq, to ease practical use of TRX and to improve its performance. At the moment target languages for extraction from Coq are OCaml [18], Haskell [23] and Scheme [26]. We use the FSets [11] library, developed using Coq’s modules and functors [7], which are not yet supported by extraction to Haskell or Scheme. However, there is an ongo...

2 | Structurally recursive descent parsing - Danielsson, Norell - 2008
Citation Context: ...c and the software for parsing, parser generators or libraries of parser combinators, is abundant. And yet there does seem to be hardly any work on formally verified parsing. In Danielsson and Norell [9] a library of parser combinators (see Hutton [16]) with termination guarantees has been developed in the dependently typed functional programming language Agda [27]. The main difference in comparison ...

1 | Verified, executable parsing - Barthwal, Norrish - 2004
Citation Context: ...eas similar to Danielsson and Norell [9] were previously put forward, though just as a proof of concept, by McBride and McKinna [21]. Probably the closest work to ours is that of Barthwal and Norrish [3], where the authors developed an SLR parser in HOL. The main differences with our work are: – PEGs are more expressive than SLR grammars, which are usually not adequate for real-world computer languag...