## Are Very Large Context-Free Grammars Tractable?

### BibTeX

@MISC{Boullier_arevery,
    author = {Pierre Boullier and Benoît Sagot},
    title = {Are Very Large Context-Free Grammars Tractable?},
    year = {}
}

### Abstract

In this paper, we present a method which, in practice, makes it possible to use parsers for languages defined by very large context-free grammars (over a million symbol occurrences). The idea is to split the parsing process into two passes. The first pass computes a sub-grammar, a specialized part of the large grammar selected by the input text and various filtering strategies. The second pass is a traditional parser which works with the sub-grammar and the input text. This approach is validated by practical experiments performed on an Earley-like parser running on a test set with two large context-free grammars.
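The two-pass idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy grammar, the names `Grammar`, `reduce_grammar`, `filter_by_input` and `earley_recognize`, and above all the filter itself, which simply discards productions mentioning a terminal absent from the input (the paper's actual filtering strategies are not described on this page). The reduction step uses a plain fixed-point iteration rather than a linear-time algorithm, and the second pass is a bare Earley-style recognizer.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    nonterminals: frozenset
    terminals: frozenset
    productions: tuple  # pairs (lhs, rhs) where rhs is a tuple of symbols
    start: str


def reduce_grammar(g: Grammar) -> Grammar:
    """Remove useless productions (simple fixed point, not linear time)."""
    productive = set()
    changed = True
    while changed:  # a nonterminal is productive if some rhs is fully productive
        changed = False
        for lhs, rhs in g.productions:
            if lhs not in productive and all(
                s in g.terminals or s in productive for s in rhs
            ):
                productive.add(lhs)
                changed = True
    prods = [
        (lhs, rhs)
        for lhs, rhs in g.productions
        if lhs in productive
        and all(s in g.terminals or s in productive for s in rhs)
    ]
    reachable, stack = {g.start}, [g.start]
    while stack:  # keep only nonterminals reachable from the start symbol
        a = stack.pop()
        for lhs, rhs in prods:
            if lhs == a:
                for s in rhs:
                    if s in g.nonterminals and s not in reachable:
                        reachable.add(s)
                        stack.append(s)
    kept = tuple(p for p in prods if p[0] in reachable)
    used = {s for _, rhs in kept for s in rhs}
    return Grammar(frozenset(reachable & productive),
                   frozenset(used & g.terminals), kept, g.start)


def filter_by_input(g: Grammar, tokens: list) -> Grammar:
    """Pass 1 (hypothetical filter): drop productions using unseen terminals."""
    allowed = g.terminals & set(tokens)
    kept = tuple(
        (lhs, rhs)
        for lhs, rhs in g.productions
        if all(s in allowed for s in rhs if s in g.terminals)
    )
    return reduce_grammar(Grammar(g.nonterminals, allowed, kept, g.start))


def earley_recognize(g: Grammar, tokens: list) -> bool:
    """Pass 2: Earley-style recognizer; items are (lhs, rhs, dot, origin)."""
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(l, r, 0, 0) for l, r in g.productions if l == g.start}
    for i in range(n + 1):
        changed = True
        while changed:  # iterate prediction and completion to a fixed point
            changed = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot == len(rhs):  # completion
                    new = {(l, r, d + 1, o) for l, r, d, o in chart[origin]
                           if d < len(r) and r[d] == lhs}
                elif rhs[dot] in g.nonterminals:  # prediction
                    new = {(l, r, 0, i) for l, r in g.productions
                           if l == rhs[dot]}
                else:
                    new = set()
                if not new <= chart[i]:
                    chart[i] |= new
                    changed = True
        if i < n:  # scanning
            for lhs, rhs, dot, origin in chart[i]:
                if dot < len(rhs) and rhs[dot] == tokens[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(l == g.start and d == len(r) and o == 0
               for l, r, d, o in chart[n])


# Toy grammar (hypothetical), with part-of-speech tags as terminals.
GRAMMAR = Grammar(
    frozenset({"S", "NP", "VP", "PP", "AdvP"}),
    frozenset({"det", "noun", "pronoun", "verb", "prep", "adv"}),
    (("S", ("NP", "VP")), ("NP", ("det", "noun")), ("NP", ("pronoun",)),
     ("NP", ("NP", "PP")), ("VP", ("verb", "NP")), ("VP", ("verb",)),
     ("VP", ("VP", "PP")), ("VP", ("VP", "AdvP")), ("PP", ("prep", "NP")),
     ("AdvP", ("adv",))),
    "S",
)
```

For the input `["det", "noun", "verb", "det", "noun"]`, `filter_by_input(GRAMMAR, tokens)` keeps 4 of the 10 productions, and `earley_recognize` gives the same answer on the sub-grammar as on the full grammar. The filter is safe in this sketch because a production mentioning a terminal that does not occur in the input can never take part in a parse of that input.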

### Citations

3836 | Introduction to automata theory, languages, and computation
- Hopcroft, Motwani, et al.
- 2001
Citation Context ...-grammar Algorithm An algorithm which takes as input any CFG G = (N,T,P,S) and generates as output a strongly equivalent reduced CFG G′ and which runs in O(|G|) can be found in many textbooks (see (Hopcroft and Ullman, 1979) for example). So as to eliminate all useless productions from all our intermediate subgrammars, each filtering strategy will end with a call to such an algorithm, named make-a-reduced-grammar. The m...

652 | An efficient context-free parsing algorithm
- Earley
- 1970
Citation Context ...mpute the FSA at parse time on a given source text. Based on experimental results, he shows that his method, called dynamic set automaton (DSA), is tractable. He uses it to guide an Earley parser (see (Earley, 1970)) and shows improvements over the non-guided version. The DSA method can directly be used as a filtering strategy since the states of the underlying FSA are in fact sets of items. For a CFG G = (N,T,...

119 | The Structure of Shared Forests in Ambiguous Parsing
- Billot, Lang
- 1989
Citation Context ...k[ik−1..ik] · · · Xp[ip−1..ip] with i0 = |w1| + 1, i1 = i0 + |x1|, . . . , ik = ik−1 + |xk| . . . and ip = i0 + |w2| is an element of P w G . 6 The popular notion of shared forests mainly comes from (Billot and Lang, 1989). configurations, noted ⊢ by (q,tx) ⊢ (q A A ′,x), iff (q,t,q ′) ∈ δ. If w ′ w ′′ ∈ T ∗ , we call derivation any sequence of the form (q′,w ′ w ′′) ⊢ · · · ⊢ (q A A ′′,w ′′). If w ∈ T ∗ , the initial...

89 | An Efficient Recognition and Syntax Algorithm for Context-Free Languages, Scientific Report AFCRL-65-758, Air Force Cambridge Research Laboratory
- Kasami
- 1965
Citation Context ...ily apply to our specific case. The experiment campaign has been conducted using an Earley-like parser. 18 We have also successfully tried coupling our filtering strategies with a CYK parser (Kasami, 1967; Younger, 1967) as post-processor. However, the coupling with a GLR parser (see (Satta, 1992) for example) is perhaps more problematic since the time taken to build up the underlying nondeterministic ...

77 | Tree insertion grammar: A cubic-time parsable formalism that lexicalizes context-free grammar without changing the trees produced. Technical report, Mitsubishi Electric Research Laboratories
- Schabes, Waters
- 1994
Citation Context ...ning Grammars (TAG, see e.g., (Schabes et al., 1988)) used in NLP applications can (almost) be seen as lexicalized Tree Insertion Grammars (TIG), which can be converted into strongly equivalent CFGs (Schabes and Waters, 1995). Hence, the parsing techniques and tools described here can be applied to most TAGs used for NLP, with, in the worst case, a light over-generation which can be easily and efficiently eliminated in a...

36 | An efficient implementation of the head-corner parser
- Noord
- 1997

23 | Generalized left-corner parsing
- Nederhof
- 1993
Citation Context ...n order to formalize these notions we define several binary relations together with their (reflexive) transitive closure. Within a CFG G = (N,T,P,S), we first define left-corner noted �. Left-corner (Nederhof, 1993; Moore, 2000), hereafter LC, is a well-known relation since many parsing strategies are based upon it. We say that X is in the LC of A and we write A � X iff (A,X) ∈ {(B,Y) | B → αYβ ∈ P ∧ α ⇒*_G ...

19 | Improved left-corner chart parsing for large context-free grammars
- Moore
- 2000
Citation Context ...lize these notions we define several binary relations together with their (reflexive) transitive closure. Within a CFG G = (N,T,P,S), we first define left-corner noted �. Left-corner (Nederhof, 1993; Moore, 2000), hereafter LC, is a well-known relation since many parsing strategies are based upon it. We say that X is in the LC of A and we write A � X iff (A,X) ∈ {(B,Y) | B → αYβ ∈ P ∧ α ⇒*_G ε}. We can wr...

19 | Bidirectional context-free grammar parsing for natural language processing
- Satta, Stock
- 1994
Citation Context ...parsing and in particular in CF parsing. Many parsers process their inputs from left to right, but other parsing strategies can be found in the literature. In particular, in NLP, (van Noord, 1997) and (Satta and Stock, 1994) propose bidirectional algorithms. These parsers are reputed to be more efficient than their left-to-right counterparts. This reputation is not only based upon experimental results (van...

18 | The Lefff 2 syntactic lexicon for French: architecture, acquisition, use
- Sagot, Clément, Villemonte de La Clergerie, Boullier
- 2006

14 | From raw corpus to word lattices: robust pre-parsing processing
- Sagot, Boullier
- 2005
Citation Context ...luation campaign corpus. Raw sentences have been turned into DAGs of inflected forms known by both grammar/lexicon couples. 15 This step has been achieved by the presyntactic processing chain SXPipe (Sagot and Boullier, 2005). They are all recognized by both grammars. 16 The resulting DAGs have a median size of 28 and an average size of 31.7. Before entering into details, let us give here the first important result of th...

3 | Parsing techniques. In Survey of the state of the art in human language technology
- Joshi
- 1997
Citation Context ...yond a value of say 100, while its average length is around 20-30 words. 2 In these conditions, the size of the grammar, despite its linear impact on the complexity, may be the prevailing factor: in (Joshi, 1997), the author remarks that “the real limiting factor in practice is the size of the grammar”. The idea developed in this paper is to split the parsing process in two passes. A first pass called filter...

1 | Review of "Generalized LR Parsing" by Masaru Tomita. Kluwer Academic Publishers
- Satta
- 1992
Citation Context ...like parser. 18 We have also successfully tried coupling our filtering strategies with a CYK parser (Kasami, 1967; Younger, 1967) as post-processor. However, the coupling with a GLR parser (see (Satta, 1992) for example) is perhaps more problematic since the time taken to build up the underlying nondeterministic LR automaton from the sub-grammar can be prohibitive. Though no definitive answer can be mad...