## Head-Driven Statistical Models for Natural Language Parsing (1999)

A maximum-entropy-inspired parser.
- Charniak
- 2000
Citation Context ...on marks) and have the same label as a constituent in the treebank parse. Table 2 shows the results for models 1, 2 and 3 and a variety of other models in the literature. Two models (Collins 2000; **Charniak 2000**) outperform models 2 and 3 on section 23 of the treebank. Collins (2000) uses a technique based on boosting algorithms for machine learning that reranks n-best output from model 2 in this article. Ch...

The language instinct.
- Pinker
- 1994
Citation Context ...ysis is semantically quite plausible, consider Bill believed John to have been shot.) As evidence that structural preferences can even override semantic plausibility, take the following example (from **Pinker 1994**): (5) Flip said that Squeaky will do the work yesterday. This sentence is a garden path: The structural preference for yesterday to modify the most recent verb is so strong that it is easy to miss th...

An efficient boosting algorithm for combining preferences.
- Freund, Iyer, et al.
Citation Context ...red model. The use of additional features gives clear improvements in performance. Collins (2000) shows similar improvements through a quite different model based on boosting approaches to reranking (**Freund et al. 1998**). An initial model (in fact, Model 2 described in the current article) is used to generate N-best output. The reranking approach attempts to rerank the N-best lists using additional features that are no...
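
The reranking step described in this passage can be illustrated with a minimal Python sketch. It assumes a hypothetical representation in which each N-best candidate carries its base-model log-probability and a dictionary of additional feature values, and it scores candidates with a fixed linear combination. Collins (2000) learns the feature weights with a boosting-based method; no learning is shown here, and the `Candidate` class, the `rerank` function and the weights in the usage example are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One parse from a base model's N-best list (hypothetical representation)."""
    tree: str                                                  # bracketed parse, treated as opaque
    log_prob: float                                            # log-probability from the base model
    features: dict[str, float] = field(default_factory=dict)  # additional features


def rerank(nbest: list[Candidate], weights: dict[str, float],
           base_weight: float = 1.0) -> list[Candidate]:
    """Return the candidates sorted best-first by a linear score.

    score = base_weight * log_prob + sum of weights[f] * value for each feature f;
    features without a weight contribute nothing.
    """
    def score(c: Candidate) -> float:
        extra = sum(weights.get(name, 0.0) * value
                    for name, value in c.features.items())
        return base_weight * c.log_prob + extra

    return sorted(nbest, key=score, reverse=True)


# Usage with made-up numbers: the extra feature overturns the base model's ranking.
nbest = [
    Candidate("(S (NP a) (VP b))", -10.0, {"hypothetical_feature": 0.0}),
    Candidate("(S (NP a) (VP b c))", -10.5, {"hypothetical_feature": 1.0}),
]
print(rerank(nbest, {"hypothetical_feature": 1.0})[0].tree)  # (S (NP a) (VP b c))
```

Keeping the base-model log-probability as one term of the score means the reranker can only reorder the N-best list; it never proposes a parse the base model did not generate.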

A maximum entropy model for part-of-speech tagging.
- Ratnaparkhi
- 1996
Citation Context ... be incorrect. With normalization, only the verb-object relation is incorrect. Footnote 17: The justification for this is that there is an estimated 3% error rate in the hand-assigned POS tags in the treebank (**Ratnaparkhi 1996**), and we didn't want this noise to contribute to dependency errors. Table 4: Dependency accuracy on section 0 of the treebank with Model 2. No lab...

A New Statistical Parser Based on Bigram Lexical Dependencies.
- Collins
- 1996
Citation Context ...atment of punctuation (section 4.3) together with the addition of the $P_p$ and $P_{cc}$ parameters.)

| Model (≤ 40 words, 2,245 sentences) | LR | LP | CBs | 0 CBs | ≤ 2 CBs |
|---|---|---|---|---|---|
| Magerman 1995 | 84.6% | 84.9% | 1.26 | 56.6% | 81.4% |
| **Collins 1996** | 85.8% | 86.3% | 1.14 | 59.9% | 83.6% |
| Goodman 1997 | 84.8% | 85.3% | 1.21 | 57.6% | 81.4% |
| Charniak 1997 | 87.5% | 87.4% | 1.00 | 62.1% | 86.1% |
| Model 1 | 87.9% | 88.2% | 0.95 | 65.8% | 86.3% |
| Model 2 | 88.5% | 88.7% | 0.92 | 66.7% | 87.1% |
| Model 3 | 88... | | | | |

Statistical parsing with a context-free grammar and word statistics.
- Charniak
- 1997
Citation Context ...ameters.)

| Model (≤ 40 words, 2,245 sentences) | LR | LP | CBs | 0 CBs | ≤ 2 CBs |
|---|---|---|---|---|---|
| Magerman 1995 | 84.6% | 84.9% | 1.26 | 56.6% | 81.4% |
| Collins 1996 | 85.8% | 86.3% | 1.14 | 59.9% | 83.6% |
| Goodman 1997 | 84.8% | 85.3% | 1.21 | 57.6% | 81.4% |
| **Charniak 1997** | 87.5% | 87.4% | 1.00 | 62.1% | 86.1% |
| Model 1 | 87.9% | 88.2% | 0.95 | 65.8% | 86.3% |
| Model 2 | 88.5% | 88.7% | 0.92 | 66.7% | 87.1% |
| Model 3 | 88.6% | 88.7% | 0.90 | 67.1% | 87.4% |
| Charniak 2000 | 90.1% | 90.1% | 0.74 | 70.1% | 89.6% |
| Collins 2000 | 90... | | | | |

Definite clause grammars for language analysis—a survey of the formalism and a comparison with augmented transition networks.
- Pereira, Shieber, et al.
- 1986
Citation Context ...ve several equally semantically plausible analyses, but that structural preferences distinguish strongly among them. [Computational Linguistics Volume 29, Number 4] Take the following example (from **Pereira and Warren 1980**): (4) John was believed to have been shot by Bill. Surprisingly, this sentence has two analyses: Bill can be the deep subject of either believed or shot. Yet people have a very strong preference for ...

Statistical decision-tree models for parsing.
- Magerman
- 1995
Citation Context ...he main model changes were the improved treatment of punctuation (section 4.3) together with the addition of the $P_p$ and $P_{cc}$ parameters.)

| Model (≤ 40 words, 2,245 sentences) | LR | LP | CBs | 0 CBs | ≤ 2 CBs |
|---|---|---|---|---|---|
| **Magerman 1995** | 84.6% | 84.9% | 1.26 | 56.6% | 81.4% |
| Collins 1996 | 85.8% | 86.3% | 1.14 | 59.9% | 83.6% |
| Goodman 1997 | 84.8% | 85.3% | 1.21 | 57.6% | 81.4% |
| Charniak 1997 | 87.5% | 87.4% | 1.00 | 62.1% | 86.1% |
| Model 1 | 87.9% | 88.2% | 0.95 | 65.8% | 86.3% |
| Model ... | | | | | |

Discriminative reranking for natural language parsing.
- Collins, Koo
- 2004
Citation Context ...ns, or quotation marks) and have the same label as a constituent in the treebank parse. Table 2 shows the results for models 1, 2 and 3 and a variety of other models in the literature. Two models (**Collins 2000**; Charniak 2000) outperform models 2 and 3 on section 23 of the treebank. Collins (2000) uses a technique based on boosting algorithms for machine learning that reranks n-best output from model 2 in t...

A procedure for quantitatively comparing the syntactic coverage of English grammars.
- Black, Abney, et al.
Citation Context ...Wall Street Journal portion of the Penn Treebank (Marcus, Santorini, and Marcinkiewicz 1993) (approximately 40,000 sentences) and tested on section 23 (2,416 sentences). We use the PARSEVAL measures (**Black et al. 1991**) to compare performance:

$$\text{Labeled precision} = \frac{\text{number of correct constituents in proposed parse}}{\text{number of constituents in proposed parse}}$$

$$\text{Labeled recall} = \frac{\text{number of correct constituents in proposed parse}}{\ldots}$$
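
The labeled precision and recall measures in this passage can be sketched in a few lines of Python. This is a minimal illustration, not the standard scoring tool: the `(label, start, end)` span encoding and the `parseval` function are hypothetical, recall uses the usual PARSEVAL denominator (the number of constituents in the treebank parse), and the punctuation-ignoring rule mentioned elsewhere in the article is not modeled.

```python
from collections import Counter

# Hypothetical encoding: a labeled constituent is (label, start, end), where
# start/end are word positions delimiting the span.
Constituent = tuple[str, int, int]


def parseval(proposed: list[Constituent],
             gold: list[Constituent]) -> tuple[float, float]:
    """Return (labeled precision, labeled recall) for one sentence.

    A proposed constituent is correct if the treebank (gold) parse has a
    constituent with the same label and the same span. Counters act as
    multisets, so a repeated bracket is matched at most as often as it
    occurs in the gold parse.
    """
    correct = sum((Counter(proposed) & Counter(gold)).values())
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Usage: 3 of the 5 proposed constituents match the 4 gold constituents.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
proposed = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5), ("NP", 4, 5)]
print(parseval(proposed, gold))  # (0.6, 0.75)
```

For corpus-level figures, the matched, proposed and gold counts would be summed over all test sentences before dividing, rather than averaging per-sentence scores.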

A linear observed time statistical parser based on maximum entropy models.
- Ratnaparkhi
- 1997
Citation Context ...70.7% 89.6%

| Model (≤ 100 words, 2,416 sentences) | LR | LP | CBs | 0 CBs | ≤ 2 CBs |
|---|---|---|---|---|---|
| Magerman 1995 | 84.0% | 84.3% | 1.46 | 54.0% | 78.8% |
| Collins 1996 | 85.3% | 85.7% | 1.32 | 57.2% | 80.8% |
| Charniak 1997 | 86.7% | 86.6% | 1.20 | 59.5% | 83.2% |
| **Ratnaparkhi 1997** | 86.3% | 87.5% | 1.21 | 60.2% | n/a |
| Model 1 | 87.5% | 87.7% | 1.09 | 63.4% | 84.1% |
| Model 2 | 88.1% | 88.3% | 1.06 | 64.0% | 85.1% |
| Model 3 | 88.0% | 88.3% | 1.05 | 64.3% | 85.4% |
| Charniak 2000 | 89.6% | 89.5% | 0.88 | 67.6% | 87.7% |
| Collins 2000 | 89.6% | 8... | | | |

Prepositional phrase attachment through a backed-off model.
- Collins, Brooks
- 1995
Citation Context ... some structural preference is not ideal, but is at least better than chance. This hypothesis is suggested by previous work on specific cases of attachment ambiguity such as PP attachment (see, e.g., **Collins and Brooks 1995**), which has shown that models will perform better given lexical statistics, and that a straight structural preference is merely a fallback. But some examples suggest this is not the case: that, in f...

Exploiting syntactic structure for language modeling.
- Chelba, Jelinek
- 1998
Citation Context ...und 1998. Of particular relevance is other work on parsing the Penn WSJ Treebank (Jelinek et al. 1994; Magerman 1995; Eisner 1996a, 1996b; Collins 1996; Charniak 1997; Goodman 1997; Ratnaparkhi 1997; **Chelba and Jelinek 1998**; Roark 2001). Eisner (1996a, 1996b) describes several dependency-based models that are also closely related to the models in this article. Collins (1996) also describes a dependency-based model appli...

Building A Large Annotated
- Marcus, Santorini, et al.
- 1993
Citation Context ...babilities conditioned on lexical heads. For this reason we refer to the models as head-driven statistical models. We describe evaluation of the three models on the Penn Wall Street Journal treebank (**Marcus et al. 1993**). Model 1 achieves 87.7/87.5% constituent precision and recall on sentences of up to 100 words in length in section 23 of the treebank, and Models 2 and 3 give further improvements to 88.3/88.0% cons...

Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics.
- Marcinkiewicz
- 1993
Citation Context ...on lexical heads. For this reason we refer to the models as head-driven statistical models. We describe evaluation of the three models on the Penn Wall Street Journal Treebank (**Marcus, Santorini, and Marcinkiewicz 1993**). Model 1 achieves 87.7% constituent precision and 87.5% constituent recall on sentences of up to 100 words in length in section 23 of the treebank, and Models 2 and 3 give further improvements to 88....

Decision tree parsing using a hidden derivation model.
- Jelinek, Lafferty, et al.
- 1994
Citation Context ...ted work, chapter 4 of Collins (1999) attempts to give a comprehensive review of work on statistical parsing up to around 1998. Of particular relevance is other work on parsing the Penn WSJ Treebank (**Jelinek et al. 1994**; Magerman 1995; Eisner 1996a, 1996b; Collins 1996; Charniak 1997; Goodman 1997; Ratnaparkhi 1997; Chelba and Jelinek 1998; Roark 2001). Eisner (1996a, 1996b) describes several dependency-based models...

An Empirical Comparison of Probability Models for Dependency Grammar.
- Eisner
- 1996
Citation Context ...) attempts to give a comprehensive review of work on statistical parsing up to around 1998. Of particular relevance is other work on parsing the Penn WSJ Treebank (Jelinek et al. 1994; Magerman 1995; **Eisner 1996a**, 1996b; Collins 1996; Charniak 1997; Goodman 1997; Ratnaparkhi 1997; Chelba and Jelinek 1998; Roark 2001). Eisner (1996a, 1996b) describes several dependency-based models that are also closely relat...

Probabilistic feature grammars.
- Goodman
- 1997
Citation Context ...er with the addition of the $P_p$ and $P_{cc}$ parameters.)

| Model (≤ 40 words, 2,245 sentences) | LR | LP | CBs | 0 CBs | ≤ 2 CBs |
|---|---|---|---|---|---|
| Magerman 1995 | 84.6% | 84.9% | 1.26 | 56.6% | 81.4% |
| Collins 1996 | 85.8% | 86.3% | 1.14 | 59.9% | 83.6% |
| **Goodman 1997** | 84.8% | 85.3% | 1.21 | 57.6% | 81.4% |
| Charniak 1997 | 87.5% | 87.4% | 1.00 | 62.1% | 86.1% |
| Model 1 | 87.9% | 88.2% | 0.95 | 65.8% | 86.3% |
| Model 2 | 88.5% | 88.7% | 0.92 | 66.7% | 87.1% |
| Model 3 | 88.6% | 88.7% | 0.90 | 67.1% | 87.4% |
| Charniak 2000 | 90... | | | | |

What is the Minimal Set of Fragments that Achieves Maximum Parse Accuracy?
- Bod
- 2001
Citation Context ...., 1998). This approach intends to allow great flexibility in the features which can be incorporated in a model, and additional features are shown to give improvements in parsing performance. Finally, (**Bod 2001**) describes a very different approach (a DOP approach to parsing) which gives excellent results on treebank parsing, comparable to the results of (Charniak 2000; Collins 2000). 8.1 A Comparison to th...

nymble: a high-performance learning name
- Bikel, Miller, et al.
- 1997
Citation Context ...context at levels 1, 2 and 3 in the table, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are smoothing parameters where $0 \le \lambda_i \le 1$. We use the smoothing method described in (**Bikel et al. 1997**), which is derived from a method described in (Witten and Bell 1991). First, say the most specific estimate is $e_1 = \frac{n_1}{f_1}$; that is, $f_1$ is the value of the denominator count in the relative frequency e...
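
The three-level smoothing in this passage can be sketched in Python under several assumptions that the excerpt does not state: the estimates are combined in the nested form $e = \lambda_1 e_1 + (1-\lambda_1)(\lambda_2 e_2 + (1-\lambda_2) e_3)$, each weight has the Witten-Bell-style form $\lambda_i = f_i / (f_i + c\,u_i)$ where $u_i$ is the number of distinct outcomes seen with the level-$i$ context, and the constant $c = 5$ is only a placeholder. The class `BackoffEstimator` and its methods are hypothetical.

```python
from collections import Counter, defaultdict


class BackoffEstimator:
    """Hypothetical three-level interpolated estimate of P(outcome | context).

    Every training event supplies its context at three levels, ordered from
    most specific (level 1) to least specific (level 3).
    """

    def __init__(self, c: float = 5.0):
        self.c = c  # placeholder constant in lambda_i = f_i / (f_i + c * u_i)
        self.joint = [Counter() for _ in range(3)]         # n_i: (context, outcome) counts
        self.denom = [Counter() for _ in range(3)]         # f_i: context counts
        self.seen = [defaultdict(set) for _ in range(3)]   # distinct outcomes, for u_i

    def observe(self, contexts: tuple, outcome: str) -> None:
        for i, ctx in enumerate(contexts):
            self.joint[i][(ctx, outcome)] += 1
            self.denom[i][ctx] += 1
            self.seen[i][ctx].add(outcome)

    def _level(self, i: int, ctx, outcome: str) -> tuple[float, float]:
        """Return (e_i, lambda_i) for one level; an unseen context gets weight 0."""
        f = self.denom[i][ctx]
        if f == 0:
            return 0.0, 0.0
        e = self.joint[i][(ctx, outcome)] / f   # e_i = n_i / f_i
        u = len(self.seen[i][ctx])
        return e, f / (f + self.c * u)

    def prob(self, contexts: tuple, outcome: str) -> float:
        e1, l1 = self._level(0, contexts[0], outcome)
        e2, l2 = self._level(1, contexts[1], outcome)
        e3, _ = self._level(2, contexts[2], outcome)  # least specific level, used as-is
        return l1 * e1 + (1 - l1) * (l2 * e2 + (1 - l2) * e3)


# Usage with toy events: context ("a1", "a2", "a3") is seen twice with "x".
est = BackoffEstimator()
est.observe(("a1", "a2", "a3"), "x")
est.observe(("a1", "a2", "a3"), "x")
est.observe(("b1", "a2", "a3"), "y")
p_x = est.prob(("a1", "a2", "a3"), "x")  # 16/21, about 0.762
p_y = est.prob(("a1", "a2", "a3"), "y")  # 5/21
```

Because each $\lambda_i$ depends only on the context and not on the outcome, the interpolated estimates for a fully seen context still sum to 1 (up to rounding) over the outcomes, while a context unseen at level 1 falls back entirely on the coarser levels.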

Head-Driven Statistical Models for NL Parsing
- Collins
- 1996
Citation Context ...erivation order is depth-first; that is, each modifier recursively generates the sub-tree below it before the next modifier is generated. Figure 3 gives an example that illustrates this. The models in (**Collins 1996**) showed that the distance between words standing in head-modifier relationships was important, in particular that it is important to capture a preference for right-branching structures (which almost t...
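
The depth-first derivation order described in this passage can be illustrated with a short Python sketch. The `Node` class, the `generation_order` function and the order used at each node (head child first, then left modifiers moving outward from the head, then right modifiers moving outward) are assumptions made for illustration; the only property taken from the text is that each modifier's whole sub-tree is generated before the next modifier is started.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical head-annotated tree node; a node with no children is a word."""
    label: str
    children: list["Node"] = field(default_factory=list)
    head: int = 0  # index of the head child within children


def generation_order(root: Node) -> list[str]:
    """List parent -> child generation events in a depth-first order.

    Assumed order at each node: the head child, then left modifiers moving
    outward from the head, then right modifiers moving outward. Each child's
    entire sub-tree is generated before its next sibling is started.
    """
    events: list[str] = []

    def expand(node: Node) -> None:
        if not node.children:
            return  # a word: nothing below it to generate
        h = node.head
        order = [h] + list(range(h - 1, -1, -1)) + list(range(h + 1, len(node.children)))
        for i in order:
            child = node.children[i]
            events.append(f"{node.label} -> {child.label}")
            expand(child)  # finish this sub-tree before moving to the next sibling

    expand(root)
    return events


# Usage: "John saw Mary", with VP as the head of S and V as the head of VP.
tree = Node("S", [Node("NP", [Node("John")]),
                  Node("VP", [Node("V", [Node("saw")]), Node("NP", [Node("Mary")])])],
            head=1)
print(generation_order(tree))
# ['S -> VP', 'VP -> V', 'V -> saw', 'VP -> NP', 'NP -> Mary', 'S -> NP', 'NP -> John']
```

In this order, by the time the subject NP is generated, the words under the VP ("saw Mary") are already known, so a model could condition on properties of previously generated material, such as the surface string between a head and a modifier.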

- de Marcken
- 1995
Citation Context ...cribe an alternative, "supertagging" model for tree adjoining grammars. See (Alshawi 1996) for work on stochastic head automata, and (Lafferty et al. 1992) for a stochastic version of link grammar. (**de Marcken 1995**) considers stochastic lexicalized PCFGs, with specific reference to EM methods for unsupervised training. (Seneff 1992) describes the use of Markov models for rule generation, which is closely related ...

