Results 1 - 10
of
27
Head-Driven Statistical Models for Natural Language Parsing
, 2003
"... This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down ..."
Abstract
-
Cited by 780 (13 self)
- Add to MetaCart
This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models
Maximum Entropy Models for Natural Language Ambiguity Resolution
, 1998
"... The best aspect of a research environment, in my opinion, is the abundance of bright people with whom you argue, discuss, and nurture your ideas. I thank all of the people at Penn and elsewhere who have given me the feedback that has helped me to separate the good ideas from the bad ideas. I hope th ..."
Abstract
-
Cited by 167 (1 self)
- Add to MetaCart
The best aspect of a research environment, in my opinion, is the abundance of bright people with whom you argue, discuss, and nurture your ideas. I thank all of the people at Penn and elsewhere who have given me the feedback that has helped me to separate the good ideas from the bad ideas. I hope that Ihave kept the good ideas in this thesis, and left the bad ideas out! Iwould like toacknowledge the following people for their contribution to my education: I thank my advisor Mitch Marcus, who gave me the intellectual freedom to pursue what I believed to be the best way to approach natural language processing, and also gave me direction when necessary. I also thank Mitch for many fascinating conversations, both personal and professional, over the last four years at Penn. I thank all of my thesis committee members: John La erty from Carnegie Mellon University, Aravind Joshi, Lyle Ungar, and Mark Liberman, for their extremely valuable suggestions and comments about my thesis research. I thank Mike Collins, Jason Eisner, and Dan Melamed, with whom I've had many stimulating and impromptu discussions in the LINC lab. Iowe them much gratitude for their valuable feedback onnumerous rough drafts of papers and thesis chapters.
Learning to Parse Natural Language with Maximum Entropy Models
, 1999
"... This paper presents a machine learning system for parsing natural language that learns from manually parsed example sentences, and parses unseen data at state-of-the-art accuracies. Its machine learning technology, based on the maximum entropy framework, is highly reusable and not specific to the pa ..."
Abstract
-
Cited by 136 (0 self)
- Add to MetaCart
This paper presents a machine learning system for parsing natural language that learns from manually parsed example sentences, and parses unseen data at state-of-the-art accuracies. Its machine learning technology, based on the maximum entropy framework, is highly reusable and not specific to the parsing problem, while the linguistic hints that it uses to learn can be specified concisely. It therefore requires a minimal amount of human effort and linguistic knowledge for its construction. In practice, the running time of the parser on a test sentence is linear with respect to the sentence length. We also demonstrate that the parser can train from other domains without modification to the modeling framework or the linguistic hints it uses to learn. Furthermore, this paper shows that research into rescoring the top 20 parses returned by the parser might yield accuracies dramatically higher than the state-of-the-art.
Wide-coverage efficient statistical parsing with CCG and log-linear models
- COMPUTATIONAL LINGUISTICS
, 2007
"... This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminativ ..."
Abstract
-
Cited by 87 (20 self)
- Add to MetaCart
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (over 20 GB), which is satisfied using a parallel implementation of the BFGS optimisation algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours. A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly,
Generative Models for Statistical Parsing with Combinatory Categorial Grammar
- In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics
, 2002
"... This paper compares a number of generative probability models for a widecoverage Combinatory Categorial Grammar (CCG) parser. These models are trained and tested on a corpus obtained by translating the Penn Treebank trees into CCG normal-form derivations. According to an evaluation of unlabel ..."
Abstract
-
Cited by 60 (7 self)
- Add to MetaCart
This paper compares a number of generative probability models for a widecoverage Combinatory Categorial Grammar (CCG) parser. These models are trained and tested on a corpus obtained by translating the Penn Treebank trees into CCG normal-form derivations. According to an evaluation of unlabeled word-word dependencies, our best model achieves a performance of 89.9%, comparable to the figures given by Collins (1999) for a linguistically less expressive grammar. In contrast to Gildea (2001), we find a significant improvement from modeling wordword dependencies.
Probabilistic CFG with Latent Annotations
, 2005
"... This paper defines a generative probabilistic model of parse trees, which we call PCFG-LA. This model is an extension of PCFG in which non-terminal symbols are augmented with latent variables. Finegrained CFG rules are automatically induced from a parsed corpus by training a PCFG-LA model using an E ..."
Abstract
-
Cited by 45 (1 self)
- Add to MetaCart
This paper defines a generative probabilistic model of parse trees, which we call PCFG-LA. This model is an extension of PCFG in which non-terminal symbols are augmented with latent variables. Finegrained CFG rules are automatically induced from a parsed corpus by training a PCFG-LA model using an EM-algorithm. Because exact parsing with a PCFG-LA is NP-hard, several approximations are described and empirically compared. In experiments using the Penn WSJ corpus, our automatically trained model gave a performance of 86.6 % (F ¥ , sentences ¦ 40 words), which is comparable to that of an unlexicalized PCFG parser created using extensive manual feature selection.
Bilexical Grammars And Their Cubic-Time Parsing Algorithms
- IN: NEW DEVELOPMENTS IN NATURAL LANGUAGE PARSING
, 2000
"... This chapter introduces weighted bilexical grammars, a formalism in which individual lexical items, such as verbs and their arguments, can have idiosyncratic selectional influences on each other. Such ‘bilexicalism ’ has been a theme of much current work in parsing. The new formalism can be used t ..."
Abstract
-
Cited by 40 (1 self)
- Add to MetaCart
This chapter introduces weighted bilexical grammars, a formalism in which individual lexical items, such as verbs and their arguments, can have idiosyncratic selectional influences on each other. Such ‘bilexicalism ’ has been a theme of much current work in parsing. The new formalism can be used to describe bilexical approaches to both dependency and phrase-structure grammars, and a slight modification yields link grammars. Its scoring approach is compatible with a wide variety of probability models. The obvious parsing algorithm for bilexical grammars (used by most previous authors) takes time O(n^5). A more efficient O(n³) method is exhibited. The new algorithm has been implemented and used in a large parsing experiment (Eisner, 1996b). We also give a useful extension to the case where the parser must undo a stochastic transduction that has altered the input.
Logical hidden markov models
- Journal of Artificial Intelligence Research
, 2006
"... Logical hidden Markov models (LOHMMs) upgrade traditional hidden Markov models to deal with sequences of structured symbols in the form of logical atoms, rather than flat characters. This note formally introduces LOHMMs and presents solutions to the three central inference problems for LOHMMs: evalu ..."
Abstract
-
Cited by 33 (10 self)
- Add to MetaCart
Logical hidden Markov models (LOHMMs) upgrade traditional hidden Markov models to deal with sequences of structured symbols in the form of logical atoms, rather than flat characters. This note formally introduces LOHMMs and presents solutions to the three central inference problems for LOHMMs: evaluation, most likely hidden state sequence and parameter estimation. The resulting representation and algorithms are experimentally evaluated on problems from the domain of bioinformatics. 1.
The SuperARV language model: Investigating the effectiveness of tightly integrating multiple knowledge sources
- in Proceedings of Conference of Empirical Methods in Natural Language Processing
, 2002
"... A new almost-parsing language model incorporating multiple knowledge sources that is based upon the concept of Constraint Dependency Grammars is presented in this paper. Lexical features and syntactic constraints are tightly integrated into a uniform linguistic structure called a SuperARV that is as ..."
Abstract
-
Cited by 21 (5 self)
- Add to MetaCart
A new almost-parsing language model incorporating multiple knowledge sources that is based upon the concept of Constraint Dependency Grammars is presented in this paper. Lexical features and syntactic constraints are tightly integrated into a uniform linguistic structure called a SuperARV that is associated with a word in the lexicon. The Super-ARV language model reduces perplexity and word error rate compared to trigram, part-of-speech-based, and parser-based language models. The relative contributions of the various knowledge sources to the strength of our model are also investigated by using constraint relaxation at the level of the knowledge sources. We have found that although each knowledge source contributes to language model quality, lexical features are an outstanding contributor when they are tightly integrated with word identity and syntactic constraints. Our investigation also suggests possible reasons for the reported poor performance of several probabilistic dependency grammar models in the literature. 1
The Acquisition of Grammar in an Evolving Population of Language Agents
- of Art. Intelligence (Special Issue: Machine Intelligence
, 1999
"... Human language acquisition, and in particular the acquisition of grammar, is a partially-canalized, strongly-biased but robust and e cient procedure. For example, children prefer to induce lexically compositional rules (e.g. Wanner and Gleitman, 1982) despite the use, in every attested human languag ..."
Abstract
-
Cited by 15 (1 self)
- Add to MetaCart
Human language acquisition, and in particular the acquisition of grammar, is a partially-canalized, strongly-biased but robust and e cient procedure. For example, children prefer to induce lexically compositional rules (e.g. Wanner and Gleitman, 1982) despite the use, in every attested human language, of constructions, such as morphological negation or non-compositional idioms. And, most parameters of grammatical variation set during language acquisition appear to have default or so-called unmarked values retained in the absence of robust counter-evidence (e.g. Bickerton, 1984 � Hyams, 1986 � Lightfoot, 1992). A variety of explanations have been o ered for the emergence of a partially-innate language acquisition device (LAD) with such properties based on saltation (Berwick, 1998 � Bickerton, 1990, 1998) or genetic assimilation (Pinker and Bloom, 1990). But none provide a coherent detailed account of both the emergence and maintenance of a LAD in an evolving population. The account proposed here is that a minimal LAD emerged via recruitment of general-purpose (Bayesian) learning mechanisms (e.g. Staddon, 1988 � Cosmides and Tooby, 1996) to a speci cally-linguistic mental representation capable of expressing mappings from the `language of thought ' to realizable, essentially linearized, encodings of propositions of the language of thought. However, the selective pressure favouring such adevelopment, and its subsequent maintenance and re nement, is only coherent given a coevolutionary scenario in which a (proto)language supporting successful communication within a population had already itself evolved on a historical timescale (e.g. Hurford, 1987 � Kirby, 1998 � Steels, 1998) and continued to coevolve with the LAD (e.g. Briscoe, 1997, 1998a,b). The model of the LAD presented here builds on and extends previous work in the parameter setting

