Towards efficient, typed LR parsers
In ACM SIGPLAN Workshop on ML, Electronic Notes in Theoretical Computer Science, 2005
Abstract

Cited by 17 (8 self)
The LR parser generators that are bundled with many functional programming language implementations produce code that is untyped, needlessly inefficient, or both. We show that, using generalized algebraic data types, it is possible to produce parsers that are well-typed (so they cannot unexpectedly crash or fail) and nevertheless efficient. This is a pleasing result as well as an illustration of the new expressiveness offered by generalized algebraic data types.
Faster Generalized LR Parsing
CC’99, volume 1575 of LNCS, 1999
Abstract

Cited by 17 (2 self)
Tomita devised a method of generalized LR (GLR) parsing to parse ambiguous grammars efficiently. A GLR parser uses linear-time LR parsing techniques as long as possible, falling back on more expensive general techniques when necessary. Much research has addressed speeding up LR parsers. However, we argue that this previous work is not transferable to GLR parsers. Instead, we speed up LR parsers by building larger pushdown automata, trading space for time. A variant of the GLR algorithm then incorporates our faster LR parsers. Our timings show that our new method for GLR parsing can parse highly ambiguous grammars significantly faster than a standard GLR parser.
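As a rough illustration of the deterministic LR machinery that a GLR parser relies on "as long as possible", here is a minimal table-driven LR driver in Python. The grammar E → E + n | n and its hand-built SLR table are hypothetical examples chosen for this sketch, not taken from the paper. Every table cell holds at most one action; a GLR parser differs in allowing several actions per cell and splitting the stack when that happens.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical grammar for this sketch:
#   rule 1: E -> E '+' 'n'
#   rule 2: E -> 'n'
RULES: Dict[int, Tuple[str, int]] = {1: ("E", 3), 2: ("E", 1)}  # rule -> (lhs, rhs length)

# Hand-built SLR(1) ACTION table: (state, lookahead) -> (kind, argument).
ACTION: Dict[Tuple[int, str], Tuple[str, int]] = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", 0),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
# GOTO table: (state, nonterminal) -> state.
GOTO: Dict[Tuple[int, str], int] = {(0, "E"): 1}


def lr_parse(tokens: Sequence[str]) -> Tuple[bool, List[int]]:
    """Run the deterministic LR loop; return (accepted, rules reduced in order)."""
    stack: List[int] = [0]          # stack of automaton states
    toks = list(tokens) + ["$"]     # '$' marks end of input
    pos = 0
    reductions: List[int] = []
    while True:
        act = ACTION.get((stack[-1], toks[pos]))
        if act is None:
            return False, reductions  # no entry: syntax error
        kind, arg = act
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[-length:]                  # pop one state per rhs symbol
            stack.append(GOTO[(stack[-1], lhs)])  # then take the goto on lhs
            reductions.append(arg)
        else:
            return True, reductions
```

For example, `lr_parse(["n", "+", "n"])` accepts the input and reports the reductions `[2, 1]`: the first `n` is reduced to `E` by rule 2, then `E + n` is reduced by rule 1. Each input token is shifted once, which is where the linear running time comes from.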
Directly-Executable Earley Parsing
2001
Abstract

Cited by 12 (2 self)
Deterministic parsing techniques are typically used in favor of general parsing algorithms for efficiency reasons. However, general algorithms such as Earley's method are more powerful and also easier for developers to use, because no seemingly arbitrary restrictions are placed on the grammar. We describe how to narrow the performance gap between general and deterministic parsers, constructing a directly executable Earley parser that can reach speeds comparable to deterministic methods even on grammars for commonly used programming languages.
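For reference, the conventional, interpreted form of Earley's method can be sketched as a chart-based recognizer. This Python version is a minimal illustration, not the paper's directly executable construction. It assumes a grammar without ε-productions (nullable rules need extra handling in the completer), and the dict-of-productions grammar encoding is an assumption chosen for convenience.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Grammar encoding (assumption of this sketch): nonterminal -> list of
# right-hand sides; any symbol that is not a key is treated as a terminal.
Grammar = Dict[str, List[Tuple[str, ...]]]
# An Earley item: (lhs, rhs, dot position, index of the origin set).
Item = Tuple[str, Tuple[str, ...], int, int]


def earley_recognize(grammar: Grammar, start: str, tokens: Sequence[str]) -> bool:
    """Return True if `tokens` can be derived from `start`.

    Assumes no epsilon-productions, so every completed item has an origin
    strictly before the current set and chart[origin] is already final.
    """
    n = len(tokens)
    chart: List[Set[Item]] = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}

    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predict: start every production of the expected nonterminal at i.
                    for prod in grammar[sym]:
                        new = (sym, prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < n and tokens[i] == sym:
                    # Scan: the next token matches the expected terminal.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item in the origin set waiting for `lhs`.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[n]
    )
```

With the ambiguous, left-recursive grammar `{"E": [("E", "+", "E"), ("n",)]}`, which produces conflicts in a deterministic LR table, `earley_recognize(g, "E", ["n", "+", "n", "+", "n"])` returns `True` without any rewriting of the grammar. This is the ease-of-use advantage the abstract attributes to general algorithms; the per-item set operations here are the kind of interpretive overhead that the directly executable approach aims to reduce.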
Faster scannerless GLR parsing
In Proceedings of the 18th International Conference on Compiler Construction (CC), 2009
Abstract

Cited by 5 (0 self)
Analysis and renovation of large software portfolios requires syntax analysis of multiple, usually embedded, languages, and this is beyond the capabilities of many standard parsing techniques. The traditional separation between lexer and parser falls short due to the limitations of tokenization based on regular expressions when handling multiple lexical grammars. In such cases scannerless parsing provides a viable solution. It uses the power of context-free grammars to deal with a wide variety of issues in parsing lexical syntax. However, it comes at the price of reduced efficiency: the structure of tokens is obtained using a more powerful but more time- and memory-intensive parsing algorithm. Scannerless grammars are also more nondeterministic than their tokenized counterparts, increasing the burden on the parsing algorithm even further. In this paper we investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing. We adapt the Scannerless Generalized LR parsing and filtering algorithm (SGLR) to implement the optimizations of RNGLR. We present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for the programming languages C, Java, Python, SASL, and C++. Measurements show that SRNGLR is on average 33% faster than SGLR, and 95% faster on the highly ambiguous SASL grammar. For the mainstream languages C, C++, Java, and Python the average speedup is 16%.
The Art of Computer Programming
1973
Abstract

Cited by 2 (1 self)
To hardcode an algorithm means to build into it the data that it requires. In this paper, we present various experiments in hardcoding the transition table of a finite state machine directly into string-recognizing code. Experiments are carried out in two phases. The first phase is limited to analyzing the hardcoded behavior when a single symbol is accepted or rejected in some arbitrary state of some finite automaton. The second phase then simulates a hardcoded solution for recognizing an entire string. Measurements show the time-efficiency gains of various hardcoded versions over the traditional table-driven approach.
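The contrast between the two styles can be sketched in Python for a small, hypothetical DFA over {a, b} that accepts strings ending in "ab". The first recognizer looks every transition up in a table; the second hardcodes the same transitions as branches, one block per state. Python has no goto, so the hardcoded version still keeps the current state in a local variable; it only approximates the kind of hardcoded code the paper measures.

```python
from typing import Dict

# Hypothetical DFA over {a, b} accepting strings that end in "ab".
# State 0: no partial match; state 1: last symbol was 'a'; state 2: just read "ab".
TABLE: Dict[int, Dict[str, int]] = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
ACCEPTING = {2}


def table_driven_accepts(s: str) -> bool:
    """Table-driven recognizer: each step is a lookup in the transition table."""
    state = 0
    for ch in s:
        row = TABLE[state]
        if ch not in row:
            return False  # symbol outside the alphabet: reject
        state = row[ch]
    return state in ACCEPTING


def hardcoded_accepts(s: str) -> bool:
    """Hardcoded recognizer: the same table, compiled into per-state branches."""
    state = 0
    for ch in s:
        if state == 0:
            if ch == "a":
                state = 1
            elif ch == "b":
                state = 0
            else:
                return False
        elif state == 1:
            if ch == "a":
                state = 1
            elif ch == "b":
                state = 2
            else:
                return False
        else:  # state == 2
            if ch == "a":
                state = 1
            elif ch == "b":
                state = 0
            else:
                return False
    return state == 2  # the accepting state is also built into the code
```

Both functions accept exactly the same strings. Whether the hardcoded form is faster should be measured (for example with `timeit`) rather than assumed, since interpreter overhead in Python may differ considerably from the setting studied in the paper.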
Generalised LR parsing algorithms
2006
Abstract

Cited by 1 (0 self)
This thesis concerns the parsing of context-free grammars. A parser is a tool, defined for a specific grammar, that constructs a syntactic representation of an input string and determines whether the string is grammatically correct. An algorithm that is capable of parsing any context-free grammar is called a generalised (context-free) parser. This thesis is devoted to the theoretical analysis of generalised parsing algorithms. We describe, analyse and compare several algorithms that are based on Knuth’s LR parser. This work underpins the design and implementation of the Parser Animation Tool (PAT). We use PAT to evaluate the asymptotic complexity of generalised parsing algorithms and to develop the Binary Right Nulled Generalised LR algorithm, a new parser with cubic worst-case complexity. We also compare the Right Nullable Generalised LR, Reduction Incorporated Generalised LR, Farshi, Tomita and Earley algorithms using the statistical data collected by PAT. Our study indicates that the overheads associated with some of the parsing algorithms may have significant consequences for their behaviour.
jacc: just another compiler compiler for Java A Reference Manual and User Guide
2004
Abstract

Cited by 1 (0 self)
jacc is a parser generator for Java [3] that is closely modeled on Johnson’s classic yacc parser generator for C [7]. It is easy to find other parser generators
Locating Free Positions in LR(k) Grammars
Abstract
LR(k) is the most general category of linear-time parsing. In LR parsing, it is difficult to invoke the semantic action associated with a symbol before that symbol has been recognized. Adding semantic actions to an LR(k) grammar may result in a non-LR(k) grammar. Practitioners of parser generators have adopted two straightforward approaches. The first approach is to delay all semantic actions until the whole parse tree is constructed. The second is to insert semantic actions into the grammar haphazardly. This paper presents an efficient algorithm for finding the positions (called free positions) at which semantic actions can be inserted freely into an LR(k) grammar. On the eight tested grammars, the speedups of our method range from 2.23 to 15.50 times. Keywords: parser generators, LR(k) grammars, semantic actions, parse tree, free positions