Results 1 - 10
of
16
Discriminative Reranking for Natural Language Parsing
, 2005
"... This article considers approaches which rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this i ..."
Abstract
-
Cited by 220 (8 self)
- Add to MetaCart
This article considers approaches which rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this initial ranking, using additional features of the tree as evidence. The strength of our approach is that it allows a tree to be represented as an arbitrary set of features, without concerns about how these features interact or overlap and without the need to define a derivation or a generative model which takes these features into account. We introduce a new method for the reranking task, based on the boosting approach to ranking problems described in Freund et al. (1998). We apply the boosting method to parsing the Wall Street Journal treebank. The method combined the log-likelihood under a baseline model (that of Collins [1999]) with evidence from an additional 500,000 features over parse trees that were not included in the original model. The new model achieved 89.75 % F-measure, a 13 % relative decrease in F-measure error over the baseline model’s score of 88.2%. The article also introduces a new algorithm for the boosting approach which takes advantage of the sparsity of the feature space in the parsing data. Experiments show significant efficiency gains for the new algorithm over the obvious implementation of the boosting approach. We argue that the method is an appealing alternative—in terms of both simplicity and efficiency—to work on feature selection methods within log-linear (maximum-entropy) models. Although the experiments in this article are on natural language parsing (NLP), the approach should be applicable to many other NLP problems which are naturally framed as ranking tasks, for example, speech recognition, machine translation, or natural language generation.
A Comparison of Algorithms for Maximum Entropy Parameter Estimation
"... A comparison of algorithms for maximum entropy parameter estimation Conditional maximum entropy (ME) models provide a general purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of class ..."
Abstract
-
Cited by 171 (1 self)
- Add to MetaCart
A comparison of algorithms for maximum entropy parameter estimation Conditional maximum entropy (ME) models provide a general purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of classification problems in natural language processing. However, the flexibility of ME models is not without cost. While parameter estimation for ME models is conceptually straightforward, in practice ME models for typical natural language tasks are very large, and may well contain many thousands of free parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the standardly used iterative scaling algorithms perform quite poorly in comparison to the others, and for all of the test problems, a limited-memory variable metric algorithm outperformed the other choices.
Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques
- IN PROCEEDINGS OF THE 40TH MEETING OF THE ACL
, 2002
"... We present a stochastic parsing system consisting of a Lexical-Functional Grammar (LFG), a constraint-based parser and a stochastic disambiguation model. We report on the results of applying this system to parsing the UPenn Wall Street Journal (WSJ) treebank. The model combines full and parti ..."
Abstract
-
Cited by 95 (8 self)
- Add to MetaCart
We present a stochastic parsing system consisting of a Lexical-Functional Grammar (LFG), a constraint-based parser and a stochastic disambiguation model. We report on the results of applying this system to parsing the UPenn Wall Street Journal (WSJ) treebank. The model combines full and partial parsing techniques to reach full grammar coverage on unseen data. The treebank annotations are used to provide partially labeled data for discriminative statistical estimation using exponential models. Disambiguation performance is evaluated by measuring matches of predicate-argument relations on two distinct test sets. On a gold standard of manually annotated f-structures for a subset of the WSJ treebank, this evaluation reaches 79% F-score. An evaluation on a gold standard of dependency relations for Brown corpus data achieves 76% F-score.
Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron
, 2002
"... We describe algorithms that rerank the top N hypotheses from a maximum-entropy tagger, the application being named-entity recognition in a corpus of web data. The first approach uses a boosting algorithm for ranking problems. The second approach uses the voted perceptron algorithm. Both algorithms g ..."
Abstract
-
Cited by 60 (2 self)
- Add to MetaCart
We describe algorithms that rerank the top N hypotheses from a maximum-entropy tagger, the application being named-entity recognition in a corpus of web data. The first approach uses a boosting algorithm for ranking problems. The second approach uses the voted perceptron algorithm. Both algorithms give comparable, significant improvements over the maximum-entropy baseline. The voted perceptron algorithm can be considerably more efficient to train, at some cost in computation on test examples.
Alpino: Wide-coverage Computational Analysis of Dutch
- In
, 2000
"... Alpino is a wide-coverage computational analyzer of Dutch which aims at accurate, full, parsing of unrestricted text. We describe the head-driven lexicalized grammar and the lexical component, which has been derived from existing resources. The grammar produces dependency structures, thus providing ..."
Abstract
-
Cited by 55 (10 self)
- Add to MetaCart
Alpino is a wide-coverage computational analyzer of Dutch which aims at accurate, full, parsing of unrestricted text. We describe the head-driven lexicalized grammar and the lexical component, which has been derived from existing resources. The grammar produces dependency structures, thus providing a reasonably abstract and theory-neutral level of linguistic representation. An important aspect of wide-coverage parsing is robustness and disambiguation. The dependency relations encoded in the dependency structures have been used to develop and evaluate both hand-coded and statistical disambiguation methods.
The Alpino Dependency Treebank
- In Computational Linguistics in the Netherlands (CLIN
, 2002
"... In this paper we present the Alpino Dependency Treebank and the tools that we have developed to facilitate the annotation process. Annotation typically starts with parsing a sentence with the Alpino parser, a wide coverage parser of Dutch text. The number of parses that is generated is reduced throu ..."
Abstract
-
Cited by 50 (5 self)
- Add to MetaCart
In this paper we present the Alpino Dependency Treebank and the tools that we have developed to facilitate the annotation process. Annotation typically starts with parsing a sentence with the Alpino parser, a wide coverage parser of Dutch text. The number of parses that is generated is reduced through interactive lexical analysis and constituent marking. A tool for on line addition of lexical information facilitates the parsing of sentences with unknown words. The selection of the best parse is done efficiently with the parse selection tool. At this moment, the Alpino Dependency Treebank consists of about 6,000 sentences of newspaper text that are annotated with dependency trees. The corpus can be used for linguistic exploration as well as for training and evaluation purposes.
Parameter Estimation for Statistical Parsing Models: Theory and Practice of Distribution-Free Methods
, 2001
"... A fundamental problem in statistical parsing is the choice of criteria and algorithms used to estimate the parameters in a model. The predominant approach in computational linguistics has been to use a parametric model with some variant of maximum-likelihood estimation. The assumptions under which m ..."
Abstract
-
Cited by 45 (9 self)
- Add to MetaCart
A fundamental problem in statistical parsing is the choice of criteria and algorithms used to estimate the parameters in a model. The predominant approach in computational linguistics has been to use a parametric model with some variant of maximum-likelihood estimation. The assumptions under which maximum-likelihood estimation is justified are arguably quite strong. This paper discusses the statistical theory underlying various parameter-estimation methods, and gives algorithms which depend on alternatives to (smoothed) maximumlikelihood estimation. We first give an overview of results from statistical learning theory. We then show how important concepts from the classification literature -- specifically, generalization results based on margins on training data -- can be derived for parsing models. Finally, we describe parameter estimation algorithms which are motivated by these generalization bounds.
Probabilistic Syntax
, 2002
"... istic methods for syntax, just as for a long time McCarthy and Hayes (1969) discouraged exploration of probabilistic methods in Artificial Intelligence. Among his arguments were that: (i) Probabilistic models wrongly mix in world knowledge (New York occurs more in text than Dayton, Ohio, but for no ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
istic methods for syntax, just as for a long time McCarthy and Hayes (1969) discouraged exploration of probabilistic methods in Artificial Intelligence. Among his arguments were that: (i) Probabilistic models wrongly mix in world knowledge (New York occurs more in text than Dayton, Ohio, but for no linguistic reason), (ii) Probabilistic models don't model grammaticality (neither Colorless green ideas sleep furiously nor Furiously sleep ideas green colorless have previously been uttered -- and hence must be estimated to have probability zero, Chomsky wrongly assumes -- but the former is grammatical while the latter is not, and (iii) Use of probabilities does not meet the goal of describing the mind-internal I-language as opposed to the observed-in-the-world E-language. This chapter is not meant to be a detailed critique of Chomsky's arguments -- Abney (1996) provides a survey and a rebuttal, and Pereira (2000) has further useful discussion -- but some of these concerns are still importa
Automatic F-Structure Annotation Of Treebank Trees
- THE FIFTH INTERNATIONAL CONFERENCE ON LEXICAL-FUNCTIONAL GRAMMAR, THE UNIVERSITY OF CALIFORNIA AT BERKELEY, 19 JULY - 20 JULY 2000, CSLI
, 2000
"... We describe a method that automatically induces LFG f-structures from treebank tree representations, given a set of f-structure annotation principles that define partial, modular c- to f-structure correspondences in a linguistically informed, principle-based way. ..."
Abstract
-
Cited by 22 (6 self)
- Add to MetaCart
We describe a method that automatically induces LFG f-structures from treebank tree representations, given a set of f-structure annotation principles that define partial, modular c- to f-structure correspondences in a linguistically informed, principle-based way.
Optimality-theoretic Lexical Functional Grammar
- Rutgers University
"... This paper begins with a review of Bresnan's proposals concerning inflectional systems, and evaluates the extent to which they make empiricallytestable predictions. While Bresnan's current proposal is not yet precise enough to enable one to determine just what the range of possible agreement systems ..."
Abstract
-
Cited by 22 (0 self)
- Add to MetaCart
This paper begins with a review of Bresnan's proposals concerning inflectional systems, and evaluates the extent to which they make empiricallytestable predictions. While Bresnan's current proposal is not yet precise enough to enable one to determine just what the range of possible agreement systems is predicted to be, it does seem that several putative linguistic universals concerning agreement can be stated in her framework.

