Tree-bank Grammars (1996) [171 citations — 2 self]
Abstract:
By a "tree-bank grammar" we mean a context-free grammar created by reading the production rules directly from hand-parsed sentences in a tree bank. Common wisdom has it that such grammars do not perform well, though we know of no published data on the issue. The primary purpose of this paper is to show that the common wisdom is wrong. In particular we present results on a tree-bank grammar based on the Penn Wall Street Journal tree bank. To the best of our knowledge, this grammar out-performs all other non-word-based statistical parsers/grammars on this corpus. That is, it out-performs parsers that consider the input as a string of tags and ignore the actual words of the corpus.
1 Introduction
The simplest way to "learn" a context-free grammar from a parsed corpus (a "tree bank") is to read the grammar off the parsed sentences. That is, if we have the sentence diagrammed in Figure 1 we can read the following rules off this diagram:
S → NP VP
NP → pron
VP → vb NP
NP → dt nn
This r...
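The rule-reading step described in the introduction can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the bracketed tree format, the example sentence, and the helper names `parse_tree`, `read_rules`, and `rules_from_treebank` are all assumptions made here. Figure 1 is not reproduced in this excerpt, so the sentence below is a hypothetical one chosen to yield the four rules listed above. Words are ignored and reading stops at the tag level, in line with the abstract's description of parsers that treat the input as a string of tags.

```python
from collections import Counter


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    The bracketed format is an assumption for this sketch,
    e.g. "(S (NP (pron He)) (VP (vb ate)))".
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def read_rules(tree, counts):
    """Add one count for every production found in the tree."""
    label, children = tree
    # A node whose children are all words is a tag (preterminal);
    # no rule is read from it, so the words never enter the grammar.
    if all(isinstance(child, str) for child in children):
        return
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            read_rules(child, counts)


def rules_from_treebank(bracketed_trees):
    """Count the productions seen across a list of bracketed trees."""
    counts = Counter()
    for text in bracketed_trees:
        read_rules(parse_tree(text), counts)
    return counts


# Hypothetical sentence standing in for the one in Figure 1.
counts = rules_from_treebank(
    ["(S (NP (pron He)) (VP (vb ate) (NP (dt the) (nn cake))))"]
)
for (lhs, rhs), n in sorted(counts.items()):
    print(lhs, "→", " ".join(rhs), n)
```

Keeping counts rather than a plain set of rules is a design choice of this sketch: a statistical grammar needs some frequency information per rule, and a `Counter` keyed by `(lhs, rhs)` is the simplest place to hold it.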
Citations
| 1196 | Building a large annotated corpus of English: the Penn Treebank – Marcus, Marcinkiewicz, et al. - 1993 |
| 449 | Statistical Language Learning – Charniak - 1997 |
| 260 | Statistical Parsing with a Context-Free Grammar and Word Statistics – Charniak - 1997 |
| 239 | Statistical decision-tree models for parsing – Magerman - 1995 |
| 200 | Inside-outside reestimation from partially bracketed corpora – Pereira, Schabes - 1992 |
| 113 | Automatic grammar induction and parsing free text: A transformation-based approach – Brill - 1993 |
| 53 | New figures of merit for best-first probabilistic chart parsing – Caraballo, Charniak - 1998 |

