Results 1 - 10
of
39
A Maximum-Entropy-Inspired Parser
, 1999
"... We present a new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less when trained and tested on the previously established [5,9,10,15,17] "stan- dard" sections of ..."
Abstract
-
Cited by 671 (16 self)
- Add to MetaCart
We present a new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less when trained and tested on the previously established [5,9,10,15,17] "stan- dard" sections of the Wall Street Journal tree- bank. This represents a 13% decrease in error rate over the best single-parser results on this corpus [9]. The major technical innova- tion is the use of a "maximum-entropy-inspired" model for conditioning and smoothing that let us successfully to test and combine many different conditioning events. We also present some partial results showing the effects of different conditioning information, including a surprising 2% improvement due to guessing the lexical head's pre-terminal before guessing the lexical head.
Intricacies of Collins' Parsing Model
- COMPUTATIONAL LINGUISTICS
"... This paper documents a large set of heretofore unpublished details Collins used in his parser, such that, along with Collins' thesis (Collins, 1999), this paper contains all information necessary to duplicate Collins' benchmark results. Indeed, these as-yet-unpublished details account for an 11% rel ..."
Abstract
-
Cited by 87 (1 self)
- Add to MetaCart
This paper documents a large set of heretofore unpublished details Collins used in his parser, such that, along with Collins' thesis (Collins, 1999), this paper contains all information necessary to duplicate Collins' benchmark results. Indeed, these as-yet-unpublished details account for an 11% relative reduction in error between a clean-room implementation of Collins' model and an implementation including all details. We also show a cleaner and equally--well-performing method for the handling of punctuation and conjunction, and reveal certain other probabilistic oddities about Collins' parser. We analyze not only the effect of the unpublished details, but also re-analyze the effect of certain well-known details, revealing that bilexical dependencies are barely used by the model and that head choice is not nearly as important to overall parsing performance as once thought. Finally, we perform experiments that show that the true discriminative power of lexicalization appears to lie in the fact that unlexicalized syntactic structures are generated conditioning on the head word and its part of speech
Scaling to Very Very Large Corpora for Natural Language Disambiguation
, 2001
"... The amount of readily available online text has reached hundreds of billions of words and continues to grow. Yet for most core natural language tasks, algorithms continue to be optimized, tested and compared after training on corpora consisting of only one million words or less. In this pape ..."
Abstract
-
Cited by 82 (3 self)
- Add to MetaCart
The amount of readily available online text has reached hundreds of billions of words and continues to grow. Yet for most core natural language tasks, algorithms continue to be optimized, tested and compared after training on corpora consisting of only one million words or less. In this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, when trained on orders of magnitude more labeled data than has previously been used. We are fortunate that for this particular application, correctly labeled training data is free. Since this will often not be the case, we examine methods for effectively exploiting very large corpora when labeled data comes at a cost.
Probabilistic Top-Down Parsing and Language Modeling
- Computational Linguistics
, 2004
"... This paper describes the functioning of a broad-coverage probabilistic top-down parser, and its application to the problem of language modeling for speech recognition. The paper first introduces key notions in language modeling and probabilistic parsing, and briefly reviews some previous approaches ..."
Abstract
-
Cited by 54 (1 self)
- Add to MetaCart
This paper describes the functioning of a broad-coverage probabilistic top-down parser, and its application to the problem of language modeling for speech recognition. The paper first introduces key notions in language modeling and probabilistic parsing, and briefly reviews some previous approaches to using syntactic structure for language modeling. A lexicalized probabilistic topdown parser is then presented, which performs very well, in terms of both the accuracy of returned parses and the efficiency with which they are found, relative to the best broad-coverage statistical parsers. A new language model that utilizes probabilistic top-down parsing is then outlined, and empirical results show that it improves upon previous work in test corpus perplexity. Interpolation with a trigram model yields an exceptional improvement relative to the improvement observed by other models, demonstrating the degree to which the information captured by our parsing model is orthogonal to that captured by a trigram model. A small recognition experiment also demonstrates the utility of the model
Is it harder to parse Chinese, or the Chinese Treebank?
- IN PROCEEDINGS OF THE 41ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
, 2003
"... We present a detailed investigation of the challenges posed when applying parsing models developed against English corpora to Chinese. We develop a ..."
Abstract
-
Cited by 39 (2 self)
- Add to MetaCart
We present a detailed investigation of the challenges posed when applying parsing models developed against English corpora to Chinese. We develop a
Improving Accuracy in Wordclass Tagging through Combination of Machine Learning Systems
- Computational Linguistics
, 2000
"... this paper, we combine different systems employing known representations. The observation that suggests this approach is that systems that are designed differently, either because they use a different formalism or because they contain different knowledge, will typically produce different errors. We ..."
Abstract
-
Cited by 38 (3 self)
- Add to MetaCart
this paper, we combine different systems employing known representations. The observation that suggests this approach is that systems that are designed differently, either because they use a different formalism or because they contain different knowledge, will typically produce different errors. We hope to make use of this fact and reduce the number of errors with very little additional effort by exploiting the disagreement between different language models. Al- though the approach is applicable to any type of language model, we focus on the case of statistical disambiguators that are trained on annotated corpora. The examples of the task that are present in the corpus and its annotation are fed into a learning algorithm, which induces a model of the desired input-output mapping in the form of a classifier. * EO. Box 9103, 6500 HD Nijmegen, The Netherlands, hvh@let.ktm.nl t Universiteitsplein 1, 2610 Wilrijk, Belgium, {zavrel, daelem}@uia.ua.ac.be () 2000 Association for Computational Linguistics We use a number of different learning algorithms simultaneously on the same training corpus. Each type of learning method brings its own 'inductive bias' to the task and will produce a classifier with slightly different characteristics, so that different methods will tend to produce different errors
Parser combination by reparsing
- In Proc. HLT/NAACL
, 2006
"... We present a novel parser combination scheme that works by reparsing input sentences once they have already been parsed by several different parsers. We apply this idea to dependency and constituent parsing, generating results that surpass state-of-theart ..."
Abstract
-
Cited by 33 (2 self)
- Add to MetaCart
We present a novel parser combination scheme that works by reparsing input sentences once they have already been parsed by several different parsers. We apply this idea to dependency and constituent parsing, generating results that surpass state-of-theart
A Simple Approach to Building Ensembles of Naive Bayesian Classifiers for Word Sense Disambiguation
, 2000
"... This paper presents a corpus-based approach to word sense disambiguation that builds an ensemble of Naive Bayesian classifiers, each of which is based on lexical features that represent co-occurring words in varying sized windows of context. Despite the simplicity of this approach, empirical results ..."
Abstract
-
Cited by 31 (1 self)
- Add to MetaCart
This paper presents a corpus-based approach to word sense disambiguation that builds an ensemble of Naive Bayesian classifiers, each of which is based on lexical features that represent co-occurring words in varying sized windows of context. Despite the simplicity of this approach, empirical results disambiguating the widely studied nouns line and interest show that such an ensemble achieves accuracy rivaling the best previously published results.
Modeling Consensus: Classifier Combination for Word Sense Disambiguation
, 2002
"... This paper demonstrates the substantial empirical success of classifier combination for the word sense disambiguation task. It investigates more than 10 classifier combination methods, including second order classifier stacking, over 6 major structurally different base classifiers (enhanced Nave Bay ..."
Abstract
-
Cited by 21 (2 self)
- Add to MetaCart
This paper demonstrates the substantial empirical success of classifier combination for the word sense disambiguation task. It investigates more than 10 classifier combination methods, including second order classifier stacking, over 6 major structurally different base classifiers (enhanced Nave Bayes, cosine, Bayes Ratio, decision lists, transformationbased learning and maximum variance boosted mixture models). The paper also includes in-depth performance analysis sensitive to properties of the feature space and component classifiers. When evaluated on the standard SENSEVAL1 and 2 data sets on 4 languages (English, Spanish, Basque, and Swedish), classifier combination performance exceeds the best published results on these data sets. 1
Bagging and Boosting a Treebank Parser
, 2000
"... Bagging and boosting, two effective machine learning techniques, are applied to natural language parsing. Experiments using these techniques with a trainable statistical parser are described. The best resulting system provides roughly as large of a gain in F-measure as doubling the corpus size. Erro ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
Bagging and boosting, two effective machine learning techniques, are applied to natural language parsing. Experiments using these techniques with a trainable statistical parser are described. The best resulting system provides roughly as large of a gain in F-measure as doubling the corpus size. Error analysis of the result of the boosting technique reveals some inconsistent annotations in the Penn Treebank, suggesting a semi-automatic method for finding inconsistent treebank annotations.

