Results 11–20 of 111
Learning to Parse Natural Language with Maximum Entropy Models
, 1999
Abstract

Cited by 165 (0 self)
This paper presents a machine learning system for parsing natural language that learns from manually parsed example sentences, and parses unseen data at state-of-the-art accuracies. Its machine learning technology, based on the maximum entropy framework, is highly reusable and not specific to the parsing problem, while the linguistic hints that it uses to learn can be specified concisely. It therefore requires a minimal amount of human effort and linguistic knowledge for its construction. In practice, the running time of the parser on a test sentence is linear with respect to the sentence length. We also demonstrate that the parser can train from other domains without modification to the modeling framework or the linguistic hints it uses to learn. Furthermore, this paper shows that research into rescoring the top 20 parses returned by the parser might yield accuracies dramatically higher than the state-of-the-art.
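The maximum entropy framework this abstract refers to models each decision as a log-linear distribution: p(a|b) is proportional to the exponential of a weighted sum of binary features of the action a and its context b. A minimal sketch follows; the action set, features, and weights are hypothetical (the paper estimates weights from treebank data, e.g. with iterative scaling), not the paper's actual feature set.

```python
import math

def maxent_prob(action, context, features, weights, actions=("left", "right", "shift")):
    """p(a | b) = exp(sum_i w_i * f_i(a, b)) / Z(b), where Z(b) normalizes over actions."""
    def score(a):
        return math.exp(sum(w * f(a, context) for f, w in zip(features, weights)))
    z = sum(score(a) for a in actions)
    return score(action) / z

# Hypothetical binary features for a parsing decision (illustration only):
features = [
    lambda a, b: 1.0 if a == "shift" and b["tag"] == "DT" else 0.0,
    lambda a, b: 1.0 if a == "left" and b["tag"] == "NN" else 0.0,
]
weights = [1.2, 0.8]  # fixed here; in the maxent framework these are learned
p = maxent_prob("shift", {"tag": "DT"}, features, weights)
```

For any context the probabilities over the action set sum to one, which is what allows such local decision scores to be chained into the probability of an overall parse.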
Supertagging: An Approach to Almost Parsing
 Computational Linguistics
, 1999
Abstract

Cited by 134 (22 self)
this paper, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated with rich descriptions (supertags) that impose complex constraints in a local context. The supertags are designed such that only those elements on which the lexical item imposes constraints appear within a given supertag. Further, each lexical item is associated with as many supertags as the number of different syntactic contexts in which the lexical item can appear. This makes the number of different descriptions for each lexical item much larger than when the descriptions are less complex, thus increasing the local ambiguity for a parser. But this local ambiguity can be resolved by using statistical distributions of supertag co-occurrences collected from a corpus of parses. We have explored these ideas in the context of the Lexicalized Tree-Adjoining Grammar (LTAG) framework. The supertags in LTAG combine both phrase structure information and dependency information in a single representation. Supertag disambiguation results in a representation that is effectively a parse (an almost parse), and the parser need 'only' combine the individual supertags. This method of parsing can also be used to parse sentence fragments, such as in spoken utterances where the disambiguated supertag sequence may not combine into a single structure. 1 Introduction In this paper, we present a robust parsing approach called supertagging that integrates the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. The idea underlying the approach is that the ...
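Resolving local supertag ambiguity with corpus statistics, as the abstract describes, can be sketched as a Viterbi search over candidate supertag sequences under a bigram co-occurrence model. The lexicon, supertag names, and probabilities below are toy assumptions, not actual LTAG supertags.

```python
from math import log

def viterbi_supertags(words, lexicon, bigram_p, floor=1e-6):
    """Choose the most probable supertag sequence under a bigram co-occurrence model.
    lexicon: word -> candidate supertags; bigram_p: (prev_tag, tag) -> probability."""
    # paths: tag -> (log probability, best sequence ending in that tag)
    paths = {t: (log(bigram_p.get(("<s>", t), floor)), [t]) for t in lexicon[words[0]]}
    for w in words[1:]:
        new_paths = {}
        for t in lexicon[w]:
            lp, seq = max(
                (prev_lp + log(bigram_p.get((prev_t, t), floor)), prev_seq)
                for prev_t, (prev_lp, prev_seq) in paths.items()
            )
            new_paths[t] = (lp, seq + [t])
        paths = new_paths
    return max(paths.values())[1]

# Toy lexicon and corpus-derived bigram probabilities (hypothetical supertag names):
lexicon = {"the": ["A_det"], "price": ["A_noun", "A_verb"], "fell": ["A_verb"]}
bigram_p = {("<s>", "A_det"): 0.9, ("A_det", "A_noun"): 0.8,
            ("A_det", "A_verb"): 0.05, ("A_noun", "A_verb"): 0.7}
best = viterbi_supertags(["the", "price", "fell"], lexicon, bigram_p)
```

Here the ambiguous word "price" is resolved to its noun supertag because that sequence has the higher bigram probability; after disambiguation a parser would only need to combine the chosen supertags.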
A Maximum Entropy Model for Prepositional Phrase Attachment
 In Proceedings of the ARPA Workshop on Human Language Technology
, 1994
Abstract

Cited by 128 (3 self)
this paper methods for constructing statistical models for computing the probability of attachment decisions. These models could then be integrated into scoring the probability of an overall parse. We present our methods in the context of prepositional phrase (PP) attachment.
Committee-Based Sampling for Training Probabilistic Classifiers
 In Proceedings of the Twelfth International Conference on Machine Learning
, 1995
Abstract

Cited by 119 (3 self)
In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper proposes a general method for efficiently training probabilistic classifiers, by selecting for training only the more informative examples in a stream of unlabeled examples. The method, committee-based sampling, evaluates the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned on the training set selected so far (Monte Carlo sampling). The method is particularly attractive because it evaluates the expected information gain from a training example implicitly, making the model both easy to implement and generally applicable. We further show how to apply committee-based sampling for training Hidden Markov Model classifiers, which are commonly used for complex classification tasks. The method was implemented and tested for ...
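The disagreement measurement at the heart of committee-based sampling can be sketched with vote entropy. Here the committee is a set of randomly perturbed threshold classifiers standing in for Monte Carlo draws from the posterior over model parameters; the threshold of 0.5 bits and the perturbation range are arbitrary assumptions for illustration.

```python
import math
import random

def vote_entropy(committee, example):
    """Disagreement = entropy (in bits) of the committee's vote distribution."""
    votes = {}
    for clf in committee:
        label = clf(example)
        votes[label] = votes.get(label, 0) + 1
    n = len(committee)
    return -sum((c / n) * math.log2(c / n) for c in votes.values())

def select_for_labeling(committee, stream, threshold=0.5):
    """Stream-based selection: keep only the examples the committee disagrees on."""
    return [x for x in stream if vote_entropy(committee, x) >= threshold]

# Hypothetical committee: threshold classifiers with randomly perturbed boundaries.
random.seed(0)
committee = [(lambda b: (lambda x: x > b))(0.5 + random.uniform(-0.2, 0.2))
             for _ in range(5)]
selected = select_for_labeling(committee, [0.1, 0.55, 0.9])
```

Examples far from the decision boundary get unanimous votes (zero entropy) and are discarded; only the informative example near the boundary is sent for labeling.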
Intricacies of Collins' Parsing Model
, 2003
Abstract

Cited by 111 (1 self)
This article documents a large set of heretofore unpublished details Collins used in his parser, such that, along with Collins' (1999) thesis, this article contains all information necessary to duplicate Collins' benchmark results. Indeed, these as-yet-unpublished details account for an 11% relative increase in error from an implementation including all details to a clean-room implementation of Collins' model. We also show a cleaner and equally well-performing method for the handling of punctuation and conjunction, and reveal certain other probabilistic oddities about Collins' parser. We not only analyze the effect of the unpublished details, but also reanalyze the effect of certain well-known details, revealing that bilexical dependencies are barely used by the model and that head choice is not nearly as important to overall parsing performance as once thought. Finally, we perform experiments that show that the true discriminative power of lexicalization appears to lie in the fact that unlexicalized syntactic structures are generated conditioning on the headword and its part of speech.
Part-of-Speech Tagging and Partial Parsing
 Corpus-Based Methods in Language and Speech
, 1996
Abstract

Cited by 96 (0 self)
m we can carve off next. 'Partial parsing' is a cover term for a range of different techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of the vagaries of natural text, by sacrificing completeness of analysis and accepting a low but non-zero error rate. 1 Tagging The earliest taggers [35, 51] had large sets of hand-constructed rules for assigning tags on the basis of words' character patterns and on the basis of the tags assigned to preceding or following words, but they had only small lexica, primarily for exceptions to the rules. TAGGIT [35] was used to generate an initial tagging of the Brown corpus, which was then hand-edited. (Thus it provided the data that has since been used to train other taggers [20].) The tagger described by Garside [56, 34], CLAWS, was a probabilistic version of TAGGIT, and the DeRose tagger improved on
MaltParser: A language-independent system for data-driven dependency parsing
 In Proc. of the Fourth Workshop on Treebanks and Linguistic Theories
, 2005
MaltParser: A data-driven parser-generator for dependency parsing
 In Proceedings of LREC
, 2006
"... We introduce MaltParser, a datadriven parser generator for dependency parsing. Given a treebank in dependency format, MaltParser can be used to induce a parser for the language of the treebank. MaltParser supports several parsing algorithms and learning algorithms, and allows userdefined feature m ..."
Abstract

Cited by 84 (17 self)
We introduce MaltParser, a data-driven parser generator for dependency parsing. Given a treebank in dependency format, MaltParser can be used to induce a parser for the language of the treebank. MaltParser supports several parsing algorithms and learning algorithms, and allows user-defined feature models, consisting of arbitrary combinations of lexical features, part-of-speech features and dependency features. MaltParser is freely available for research and educational purposes and has been evaluated empirically on Swedish, English, Czech, Danish and Bulgarian.
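The family of parsing algorithms MaltParser builds on is transition-based shift-reduce dependency parsing over a stack and an input buffer. A minimal arc-standard sketch is below; the scripted oracle is a hypothetical stand-in for the classifier that MaltParser trains over its feature models.

```python
def parse(words, oracle):
    """Arc-standard transition-based dependency parsing. The oracle (in MaltParser,
    a trained classifier) picks the next action: shift / left-arc / right-arc."""
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        action = oracle(stack, buffer)
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "left":       # head = stack top, dependent = second item
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        else:                        # "right": head = second item, dependent = top
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
    return arcs  # list of (head index, dependent index)

# Hypothetical scripted oracle for the sentence "economic news attracted attention":
script = iter(["shift", "shift", "left", "shift", "left", "shift", "right"])
arcs = parse(["economic", "news", "attracted", "attention"], lambda s, b: next(script))
```

The resulting arcs attach "economic" to "news", "news" to "attracted", and "attention" to "attracted", i.e. a complete dependency tree built in linear time from local decisions.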
Parsing Inside-Out
, 1998
"... Probabilistic ContextFree Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probabili ..."
Abstract

Cited by 82 (2 self)
Probabilistic Context-Free Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probability that any given nonterminal covers any piece of the input sentence. The traditional use of these probabilities is to improve the probabilities of grammar rules. In this thesis we show that these values are useful for solving many other problems in Statistical Natural Language Processing. We give a framework for describing parsers. The framework generalizes the inside and outside values to semirings. It makes it easy to describe parsers that compute a wide variety of interesting quantities, including the inside and outside probabilities, as well as related quantities such as Viterbi probabilities and n-best lists. We also present three novel uses for the inside and outside probabilities. T...
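The inside probabilities the abstract describes can be computed bottom-up with a CKY-style chart for a PCFG in Chomsky normal form: the inside value of a span sums over all split points and binary rules. The toy grammar below is an assumption for illustration only.

```python
from collections import defaultdict

def inside_probs(words, lexical, binary):
    """Inside chart for a PCFG in Chomsky normal form.
    beta[(i, j, A)] = P(A derives words[i:j]).
    lexical: (A, word) -> prob for rules A -> word;
    binary:  (A, B, C) -> prob for rules A -> B C."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1, A)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # split point
                for (A, B, C), p in binary.items():
                    beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k, j, C)]
    return beta

# Toy grammar, assumed for illustration only:
binary = {("S", "NP", "VP"): 0.9}
lexical = {("NP", "they"): 0.5, ("VP", "run"): 0.4}
beta = inside_probs(["they", "run"], lexical, binary)
```

Combined with the corresponding outside values and normalized by the sentence probability, these chart entries give the probability that each nonterminal covers each span, which is the quantity the thesis generalizes to semirings.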
Similarity-Based Estimation of Word Co-occurrence Probabilities
 In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics
, 1994
"... In many applications of natural language processing it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the ..."
Abstract

Cited by 75 (8 self)
In many applications of natural language processing it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination according to its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in a given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words. We describe a probabilistic word association model based on distributional word similarity, and apply it to improving probability estimates for unseen word bigrams in a variant of Katz's backoff model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error.
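The core estimate the abstract describes can be sketched as a similarity-weighted average of the conditional probabilities of the second word after each of the unseen first word's nearest distributional neighbors. The neighbor lists, weights, and conditionals below are assumptions for illustration; the paper's actual weighting scheme and its integration into Katz's backoff differ in detail.

```python
def similarity_estimate(w1, w2, similar, cond_prob):
    """Estimate P(w2 | w1) for an unseen bigram as a similarity-weighted average of
    P(w2 | w1') over the words w1' distributionally most similar to w1."""
    neighbors = similar[w1]                      # word -> similarity weight
    z = sum(neighbors.values())                  # normalize the weights
    return sum(s * cond_prob.get((n, w2), 0.0) for n, s in neighbors.items()) / z

# Hypothetical neighbor list and corpus conditionals, for illustration only:
similar = {"peach": {"apple": 0.6, "pear": 0.4}}
cond_prob = {("apple", "pie"): 0.2, ("pear", "pie"): 0.1}
p_unseen = similarity_estimate("peach", "pie", similar, cond_prob)
```

Even though "peach pie" never occurred in the training corpus, it receives a non-zero estimate because "apple" and "pear" did co-occur with "pie", which is exactly the gap frequency-based estimates leave open.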