Results 1 - 10 of 34
Head-Driven Statistical Models for Natural Language Parsing
, 1999
Cited by 955 (16 self)
Mitch Marcus was a wonderful advisor. He gave consistently good advice, and allowed an ideal level of intellectual freedom in pursuing ideas and research topics. I would like to thank the members of my thesis committee, Aravind Joshi, Mark Liberman, Fernando Pereira and Mark Steedman, for the remarkable breadth and depth of their feedback. I had countless impromptu but influential discussions with Jason Eisner, Dan Melamed and Adwait Ratnaparkhi in the LINC lab. They also provided feedback on many drafts of papers and thesis chapters. Paola Merlo pushed me to think about many new angles of the research. Dimitrios Samaras gave invaluable feedback on many portions of the work. Thanks to James Brooks for his contribution to the work that comprises chapter 5 of this thesis. The community of faculty, students and visitors involved with the Institute for Research in Cognitive Science at Penn provided an intensely varied and stimulating environment. I would like to thank them collectively. Some deserve special mention for discussions that contributed quite directly to this research: Breck Baldwin, Srinivas Bangalore, Dan
A Maximum-Entropy-Inspired Parser
, 1999
Cited by 821 (18 self)
We present a new parser for parsing down to Penn treebank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less, when trained and tested on the previously established [5,9,10,15,17] "standard" sections of the Wall Street Journal treebank. This represents a 13% decrease in error rate over the best single-parser results on this corpus [9]. The major technical innovation is the use of a "maximum-entropy-inspired" model for conditioning and smoothing that lets us successfully test and combine many different conditioning events. We also present some partial results showing the effects of different conditioning information, including a surprising 2% improvement due to guessing the lexical head's pre-terminal before guessing the lexical head.
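The precision/recall figures quoted here are the standard PARSEVAL labeled-constituent measures. As a rough illustration (not the paper's own evaluation code), the computation can be sketched over trees represented as sets of (label, start, end) spans, a representation chosen here for brevity:

```python
def parseval(gold_spans, test_spans):
    """Labeled constituent precision/recall over (label, start, end) spans.

    Full PARSEVAL scores multisets of brackets; plain sets suffice for
    this sketch.
    """
    matched = len(gold_spans & test_spans)
    precision = matched / len(test_spans)
    recall = matched / len(gold_spans)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: the proposed parse gets 3 of 4 constituents right.
gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
test = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)}
p, r, f = parseval(gold, test)
# p == r == 0.75
```

"Average precision/recall" in the abstract refers to the mean of these two figures over the test sentences.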
Statistical Parsing with a Context-free Grammar and Word Statistics
, 1997
Cited by 366 (17 self)
We describe a parsing system based upon a language model for English that is, in turn, based upon assigning probabilities to possible parses for a sentence. This model is used in a parsing system by finding the parse for the sentence with the highest probability. This system outperforms previous schemes. As this is the third in a series of parsers by different authors that are similar enough to invite detailed comparisons but different enough to give rise to different levels of performance, we also report on some experiments designed to identify what aspects of these systems best explain their relative performance.

Introduction. We present a statistical parser that induces its grammar and probabilities from a hand-parsed corpus (a treebank). Parsers induced from corpora are of interest both simply as exercises in machine learning and because they are often the best parsers obtainable by any method. That is, if one desires a parser that produces trees in the treebank ...
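In its simplest form, inducing a grammar and its probabilities from a treebank is maximum-likelihood estimation over the rules read off the trees: P(A → β) = count(A → β) / count(A). A minimal sketch, with a toy rule list standing in for a real treebank:

```python
from collections import Counter

def induce_pcfg(treebank_rules):
    """Maximum-likelihood PCFG estimation from (LHS, RHS) rule occurrences:
    P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter(treebank_rules)
    lhs_counts = Counter(lhs for lhs, _ in treebank_rules)
    return {(lhs, rhs): c / lhs_counts[lhs]
            for (lhs, rhs), c in rule_counts.items()}

# Rules read off a toy hand-parsed corpus (RHS as a tuple of symbols).
rules = [("S", ("NP", "VP")), ("S", ("NP", "VP")),
         ("NP", ("DT", "NN")), ("NP", ("PRP",)),
         ("VP", ("VB", "NP")), ("VP", ("VB",))]
probs = induce_pcfg(rules)
# probs[("NP", ("DT", "NN"))] == 0.5
```

By construction the probabilities of all rules sharing a left-hand side sum to one, which is what makes the result a proper PCFG.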
PCFG Models of Linguistic Tree Representations
 Computational Linguistics
, 1998
Cited by 211 (9 self)
This paper points out that the Penn II treebank representations are of the kind predicted to have such an effect, and describes a simple node relabeling transformation that improves a treebank PCFG-based parser's average precision and recall by around 8%, or approximately half of the performance difference between a simple PCFG model and the best broad-coverage parsers available today. This performance variation comes about because any PCFG, and hence the corpus of trees from which the PCFG is induced, embodies independence assumptions about the distribution of words and phrases. The particular independence assumptions implicit in a tree representation can be studied theoretically and investigated empirically by means of a tree transformation/detransformation process.
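The abstract does not spell the transformation out, but one well-known relabeling of this kind is parent annotation: each nonterminal is augmented with its parent's category, which weakens the independence assumptions a PCFG read off the relabeled trees will embody. A minimal sketch, using nested tuples as a hypothetical tree representation:

```python
def parent_annotate(tree, parent=None):
    """Relabel each nonterminal with its parent's category, e.g. NP -> NP^S.

    Trees are nested tuples (label, child, child, ...); strings are leaves
    and are left untouched.
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    new_label = f"{label}^{parent}" if parent else label
    return (new_label, *(parent_annotate(c, label) for c in children))

t = ("S", ("NP", "she"), ("VP", ("VB", "runs")))
parent_annotate(t)
# ('S', ('NP^S', 'she'), ('VP^S', ('VB^VP', 'runs')))
```

Detransformation after parsing is the inverse step: strip everything from the `^` onward in each label, recovering trees in the original representation for evaluation.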
Learning to Parse Database Queries Using Inductive Logic Programming
 In Proceedings of the Thirteenth National Conference on Artificial Intelligence
, 1996
Cited by 105 (18 self)
This paper presents recent work using the Chill parser acquisition system to automate the construction of a natural-language interface for database queries. Chill treats parser acquisition as the learning of search-control rules within a logic program representing a shift-reduce parser and uses techniques from Inductive Logic Programming to learn relational control knowledge. Starting with a general framework for constructing a suitable logical form, Chill is able to train on a corpus comprising sentences paired with database queries and induce parsers that map subsequent sentences directly into executable queries. Experimental results with a complete database-query application for U.S. geography show that Chill is able to learn parsers that outperform a pre-existing, hand-crafted counterpart. These results demonstrate the ability of a corpus-based system to produce more than purely syntactic representations. They also provide direct evidence of the utility of an empirical approach at...
Parameter learning of logic programs for symbolic-statistical modeling
 Journal of Artificial Intelligence Research
, 2001
Cited by 92 (19 self)
We propose a logical/mathematical framework for statistical parameter learning of parameterized logic programs, i.e. definite clause programs containing probabilistic facts with a parameterized distribution. It extends the traditional least Herbrand model semantics in logic programming to distribution semantics, a possible world semantics with a probability distribution which is unconditionally applicable to arbitrary logic programs, including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM algorithm, the graphical EM algorithm, that runs for a class of parameterized logic programs representing sequential decision processes where each decision is exclusive and independent. It runs on a new data structure called support graphs describing the logical relationship between observations and their explanations, and learns parameters by computing inside and outside probability generalized for logic programs. The complexity analysis shows that when combined with OLDT search for all explanations for observations, the graphical EM algorithm, despite its generality, has the same time complexity as existing EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside algorithm for PCFGs, and the one for singly connected Bayesian networks that have been developed independently in each research field. Learning experiments with PCFGs using two corpora of moderate size indicate that the graphical EM algorithm can significantly outperform the Inside-Outside algorithm.
Figures of Merit for Best-First Probabilistic Chart Parsing
 Computational Linguistics
, 1996
Cited by 71 (3 self)
Best-first parsing methods for natural language try to parse efficiently by considering the most likely constituents first. Some figure of merit is needed by which to compare the likelihood of constituents, and the choice of this figure has a substantial impact on the efficiency of the parser. While several parsers described in the literature have used such techniques, there is no published data on their efficacy, much less attempts to judge their relative merits. We propose and evaluate several figures of merit for best-first parsing.
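A figure of merit drives the order in which the parser's agenda of candidate constituents is explored. The sketch below shows only the agenda mechanics; a crude merit (a constituent's inside probability) stands in for the several candidate figures the paper actually compares:

```python
import heapq

class Agenda:
    """Agenda for best-first chart parsing: pop the edge with the highest
    figure of merit first. heapq is a min-heap, so merits are negated."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker, so edges themselves are never compared

    def push(self, edge, merit):
        heapq.heappush(self._heap, (-merit, self._count, edge))
        self._count += 1

    def pop(self):
        _, _, edge = heapq.heappop(self._heap)
        return edge

# Edges as (label, start, end); merits here are invented for illustration.
agenda = Agenda()
agenda.push(("NP", 0, 2), merit=0.03)
agenda.push(("VP", 2, 5), merit=0.10)
agenda.push(("PP", 3, 5), merit=0.01)
agenda.pop()  # ('VP', 2, 5) -- most promising constituent considered first
```

A full parser would pop an edge, combine it with compatible chart entries, and push the resulting new constituents back with their own merit scores.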
The LinGO Redwoods Treebank: Motivation and Preliminary Applications
Cited by 68 (19 self)
The LinGO Redwoods initiative is a seed activity in the design and development of a new type of treebank. While several medium- to large-scale treebanks exist for English (and for other major languages), pre-existing publicly available resources exhibit the following limitations: (i) annotation is monostratal, encoding either topological (phrase structure) or tectogrammatical (dependency) information, (ii) the depth of linguistic information recorded is comparatively shallow, (iii) the design and format of linguistic representation in the treebank hardwires a small, predefined range of ways in which information can be extracted from the treebank, and (iv) representations in existing treebanks are static and, over the (often year- or decade-long) evolution of a large-scale treebank, tend to fall behind the development of the field. LinGO Redwoods aims at the development of a novel treebanking methodology, rich in nature and dynamic both in the ways linguistic data can be retrieved from the treebank in varying granularity and in the constant evolution and regular updating of the treebank itself. Since October 2001, the project has been working to build the foundations for this new type of treebank, to develop a basic set of tools for treebank construction and maintenance, and to construct an initial set of 10,000 annotated trees to be distributed together with the tools under an open-source license.
Generalized queries on probabilistic context-free grammars
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
Cited by 33 (2 self)
Probabilistic context-free grammars (PCFGs) provide a simple way to represent a particular class of distributions over sentences in a context-free language. Efficient parsing algorithms for answering particular queries about a PCFG (i.e., calculating the probability of a given sentence, or finding the most likely parse) have been developed and applied to a variety of pattern-recognition problems. We extend the class of queries that can be answered in several ways: (1) allowing missing tokens in a sentence or sentence fragment, (2) supporting queries about intermediate structure, such as the presence of particular nonterminals, and (3) flexible conditioning on a variety of types of evidence. Our method works by constructing a Bayesian network to represent the distribution of parse trees induced by a given PCFG. The network structure mirrors that of the chart in a standard parser, and is generated using a similar dynamic-programming approach. We present an algorithm for constructing Bayesian networks from PCFGs, and show how queries or patterns of queries on the network correspond to interesting queries on PCFGs. The network formalism also supports extensions to encode various context sensitivities within the probabilistic dependency structure.

Index Terms: probabilistic context-free grammars, Bayesian networks.
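The paper answers these queries through a Bayesian network; for intuition only, the first extension (missing tokens) can also be illustrated directly in the inside (CKY) algorithm, by letting a wildcard position sum over every lexical rewrite of each preterminal. The toy CNF grammar below is invented for the example:

```python
def inside(words, lexical, binary, start="S", wildcard=None):
    """Inside (CKY) sentence probability for a CNF PCFG. A wildcard token
    (None here) matches every word, i.e. it sums over all lexical rules.
    lexical: {(A, word): prob};  binary: {(A, B, C): prob} for A -> B C."""
    n = len(words)
    beta = {}  # (A, i, k) -> inside probability of A over words[i:k]
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if w == wildcard or word == w:
                beta[(A, i, i + 1)] = beta.get((A, i, i + 1), 0.0) + p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            k = i + span
            for (A, B, C), p in binary.items():
                for j in range(i + 1, k):
                    b = beta.get((B, i, j), 0.0)
                    c = beta.get((C, j, k), 0.0)
                    if b and c:
                        beta[(A, i, k)] = beta.get((A, i, k), 0.0) + p * b * c
    return beta.get((start, 0, n), 0.0)

lexical = {("D", "the"): 1.0, ("N", "dog"): 0.6, ("N", "cat"): 0.4,
           ("V", "barks"): 1.0}
binary = {("S", "NP", "V"): 1.0, ("NP", "D", "N"): 1.0}
inside(["the", "dog", "barks"], lexical, binary)  # 0.6
inside(["the", None, "barks"], lexical, binary)   # 0.6 + 0.4 = 1.0
```

The Bayesian-network formulation generalizes the same chart structure, which is what makes the other query types (intermediate structure, flexible conditioning) available as ordinary network inference.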
Encoding Frequency Information In Lexicalized Grammars
, 1997
Cited by 26 (2 self)
We address the issue of how to associate frequency information with lexicalized grammar formalisms, using Lexicalized Tree Adjoining Grammar as a representative framework. We consider systematically a number of alternative probabilistic frameworks, evaluating their adequacy from both a theoretical and an empirical perspective using data from existing large treebanks. We also propose three orthogonal approaches for backing off probability estimates to cope with the large number of parameters involved.

Keywords: probabilistic parsing, lexicalized grammars

1. INTRODUCTION. When performing a derivation with a grammar it is usually the case that, at certain points in the derivation process, the grammar licenses several alternative ways of continuing with the derivation. In the case of context-free grammar (CFG) such nondeterminism arises when there are several productions for the nonterminal that is being rewritten. Frequency information associated with the grammar may be used to assign a pr...