Results 1–10 of 70
Head-Driven Statistical Models for Natural Language Parsing, 1999
"... Mitch Marcus was a wonderful advisor. He gave consistently good advice, and allowed an ideal level of intellectual freedom in pursuing ideas and research topics. I would like to thank the members of my thesis committee Aravind Joshi, Mark Liberman, Fernando Pereira and Mark Steedman  for the remar ..."
Cited by 955 (16 self)
Abstract:
Mitch Marcus was a wonderful advisor. He gave consistently good advice, and allowed an ideal level of intellectual freedom in pursuing ideas and research topics. I would like to thank the members of my thesis committee, Aravind Joshi, Mark Liberman, Fernando Pereira and Mark Steedman, for the remarkable breadth and depth of their feedback. I had countless impromptu but influential discussions with Jason Eisner, Dan Melamed and Adwait Ratnaparkhi in the LINC lab. They also provided feedback on many drafts of papers and thesis chapters. Paola Merlo pushed me to think about many new angles of the research. Dimitrios Samaras gave invaluable feedback on many portions of the work. Thanks to James Brooks for his contribution to the work that comprises chapter 5 of this thesis. The community of faculty, students and visitors involved with the Institute for Research in Cognitive Science at Penn provided an intensely varied and stimulating environment. I would like to thank them collectively. Some deserve special mention for discussions that contributed quite directly to this research: Breck Baldwin, Srinivas Bangalore, Dan
SELECTION AND INFORMATION: A CLASS-BASED APPROACH TO LEXICAL RELATIONSHIPS, 1993
"... Selectional constraints are limitations on the applicability of predicates to arguments. For example, the statement “The number two is blue” may be syntactically well formed, but at some level it is anomalous — BLUE is not a predicate that can be applied to numbers. According to the influential theo ..."
Cited by 235 (8 self)
Abstract:
Selectional constraints are limitations on the applicability of predicates to arguments. For example, the statement “The number two is blue” may be syntactically well formed, but at some level it is anomalous — BLUE is not a predicate that can be applied to numbers. According to the influential theory of (Katz and Fodor, 1964), a predicate associates a set of defining features with each argument, expressed within a restricted semantic vocabulary. Despite the persistence of this theory, however, there is widespread agreement about its empirical shortcomings (McCawley, 1968; Fodor, 1977). As an alternative, some critics of the Katz-Fodor theory (e.g. Johnson-Laird, 1983) have abandoned the treatment of selectional constraints as semantic, instead treating them as indistinguishable from inferences made on the basis of factual knowledge. This provides a better match for the empirical phenomena, but it opens up a different problem: if selectional constraints are the same as inferences in general, then accounting for them will require a much more complete understanding of knowledge representation and inference than we have at present. The problem, then, is this: how can a theory of selectional constraints be elaborated without first having either an empirically adequate theory of defining features or a comprehensive theory of inference? In this dissertation, I suggest that an answer to this question lies in the representation of conceptual
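The class-based machinery itself is developed in the dissertation's full text; as a rough illustration of the kind of quantity involved, Resnik's later formulation measures a predicate's selectional preference strength as the KL divergence between the class distribution conditioned on the predicate and the prior class distribution. A minimal sketch, with invented class names and probabilities (nothing here is taken from the dissertation's data):

```python
import math

# Toy prior over semantic classes P(c), and the distribution
# conditioned on a predicate P(c | p). All numbers are invented
# for illustration only.
prior = {"food": 0.4, "animal": 0.3, "artifact": 0.3}
given_eat = {"food": 0.8, "animal": 0.15, "artifact": 0.05}

def preference_strength(cond, prior):
    """KL divergence D(P(c|p) || P(c)): how strongly the predicate
    constrains the semantic class of its argument."""
    return sum(p * math.log2(p / prior[c]) for c, p in cond.items() if p > 0)

def selectional_association(cond, prior, c):
    """One class's share of the total preference strength."""
    s = preference_strength(cond, prior)
    return cond[c] * math.log2(cond[c] / prior[c]) / s

print(round(preference_strength(given_eat, prior), 3))  # ≈ 0.521
```

A strongly selecting predicate like the hypothetical "eat" above concentrates probability mass on a few classes, driving the divergence up; a weakly selecting one leaves the prior nearly unchanged.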
A Probabilistic Model of Lexical and Syntactic Access and Disambiguation
COGNITIVE SCIENCE, 1995
"... The problems of access  retrieving linguistic structure from some mental grammar  and disambiguation  choosing among these structures to correctly parse ambiguous linguistic input  are fundamental to language understanding. The literature abounds with psychological results on lexical access, ..."
Cited by 138 (11 self)
Abstract:
The problems of access (retrieving linguistic structure from some mental grammar) and disambiguation (choosing among these structures to correctly parse ambiguous linguistic input) are fundamental to language understanding. The literature abounds with psychological results on lexical access, the access of idioms, syntactic rule access, parsing preferences, syntactic disambiguation, and the processing of garden-path sentences. Unfortunately, it has been difficult to combine models which account for these results to build a general, uniform model of access and disambiguation at the lexical, idiomatic, and syntactic levels. For example, psycholinguistic theories of lexical access and idiom access and parsing theories of syntactic rule access have almost no commonality in methodology or coverage of psycholinguistic data. This paper presents a single probabilistic algorithm which models both the access and disambiguation of linguistic knowledge. The algorithm is based on a parallel parser which ranks constructions for access, and interpretations for disambiguation, by their conditional probability. Low-ranked constructions and interpretations are pruned through beam search; this pruning accounts, among other things, for the garden-path effect. I show that this motivated probabilistic treatment accounts for a wide variety of psycholinguistic results, arguing for a more uniform representation of linguistic knowledge and for the use of probabilistically enriched grammars and interpreters as models of human knowledge of and processing of language.
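The paper's model is a full parallel parser; purely as a toy illustration of the beam-pruning idea in the abstract, the sketch below ranks candidate analyses by probability and discards those that fall outside a multiplicative beam of the best one. The readings and their probabilities are invented, not taken from the paper:

```python
def beam_prune(candidates, beam_width):
    """Keep only analyses whose probability is within a multiplicative
    beam of the best one; the rest are pruned, as garden-path readings
    are in the model. `candidates` maps analysis name -> probability."""
    best = max(candidates.values())
    return {a: p for a, p in candidates.items() if p >= best * beam_width}

# Invented probabilities for two readings of the locally ambiguous
# prefix "the horse raced past the barn":
readings = {
    "main-verb(raced)": 0.92,          # dominant reading of the prefix
    "reduced-relative(raced)": 0.03,   # the reading "... fell" requires
}
survivors = beam_prune(readings, beam_width=0.2)
print(sorted(survivors))  # the low-probability reading is pruned
```

Pruning the reduced-relative reading here mimics the garden-path effect: when the continuation "fell" later requires the pruned interpretation, the parser must fail and reanalyze.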
Computational Complexity of Probabilistic Disambiguation by means of Tree-Grammars, 1996
"... This paper studies the compntational complexity of dlsambiguation under probabilistic treegrammars as in (Bod, 1992; Schabes and Waters, 1993). It presents a proof that the following problems are NPhard: computing the Most Probable Parse frmn a sentence or from a wordgraph, and computing t ..."
Cited by 100 (7 self)
Abstract:
This paper studies the computational complexity of disambiguation under probabilistic tree-grammars as in (Bod, 1992; Schabes and Waters, 1993). It presents a proof that the following problems are NP-hard: computing the Most Probable Parse from a sentence or from a word-graph, and computing the Most Probable Sentence (MPS) from a word-graph. The NP-hardness of computing the MPS from a word-graph also holds for Stochastic Context-Free Grammars (SCFGs).
Parsing Inside-Out, 1998
"... Probabilistic ContextFree Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probabili ..."
Cited by 82 (2 self)
Abstract:
Probabilistic Context-Free Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probability that any given nonterminal covers any piece of the input sentence. The traditional use of these probabilities is to improve the probabilities of grammar rules. In this thesis we show that these values are useful for solving many other problems in Statistical Natural Language Processing. We give a framework for describing parsers. The framework generalizes the inside and outside values to semirings. It makes it easy to describe parsers that compute a wide variety of interesting quantities, including the inside and outside probabilities, as well as related quantities such as Viterbi probabilities and n-best lists. We also present three novel uses for the inside and outside probabilities. T...
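The semiring framework and the novel uses are in the thesis itself; as a minimal sketch of the inside probabilities the abstract refers to, the code below runs the CKY-style inside algorithm on an invented two-rule PCFG in Chomsky normal form (the grammar and probabilities are illustrative only, not from the thesis):

```python
from collections import defaultdict

# Invented toy PCFG in Chomsky normal form:
#   S -> S S   with probability 0.4
#   S -> 'a'   with probability 0.6
binary = {("S", "S", "S"): 0.4}   # (parent, left, right) -> prob
lexical = {("S", "a"): 0.6}       # (parent, terminal) -> prob

def inside(words):
    """Inside algorithm: chart[i, j, X] is the total probability that
    X derives words[i:j], summed over all parses of that span."""
    n = len(words)
    chart = defaultdict(float)
    for i, w in enumerate(words):
        for (x, term), p in lexical.items():
            if term == w:
                chart[i, i + 1, x] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for (x, left, right), p in binary.items():
                    chart[i, j, x] += p * chart[i, k, left] * chart[k, j, right]
    return chart

chart = inside(["a", "a", "a"])
print(chart[0, 3, "S"])  # ≈ 0.06912: two parse trees, probabilities summed
```

Because the chart sums over split points, the ambiguous string "a a a" gets the total probability of both of its binary-branching parses; replacing the sum with a max would yield the Viterbi probability instead, which is exactly the kind of substitution the thesis's semiring generalization makes systematic.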
Statistical methods and linguistics
THE BALANCING ACT: COMBINING SYMBOLIC AND STATISTICAL APPROACHES TO LANGUAGE, 1996
"... In the space of the last ten years, statistical methods have gone from being virtually unknown in computational linguistics to being a fundamental given. In 1996, no one can profess to be a computational linguist without a passing knowledge of statistical methods. HMM's are as de rigeur as LR tables ..."
Cited by 79 (0 self)
Abstract:
In the space of the last ten years, statistical methods have gone from being virtually unknown in computational linguistics to being a fundamental given. In 1996, no one can profess to be a computational linguist without a passing knowledge of statistical methods. HMMs are as de rigueur as LR tables, and anyone who cannot at least use the terminology persuasively risks being mistaken for kitchen help at the ACL banquet. More seriously, statistical techniques have brought significant advances in broad-coverage language processing. Statistical methods have made real progress possible on a number of issues that had previously stymied attempts to liberate systems from toy domains: issues that include disambiguation, error correction, and the induction of the sheer volume of information requisite for handling unrestricted text. And the sense of progress has generated a great deal of enthusiasm for statistical methods in computational linguistics. However, this enthusiasm has not been catching in linguistics proper. It is always dangerous to generalize about linguists, but I think it is fair to say
Automated Extraction of TAGs from the Penn Treebank, 2000
"... The accuracy of statistical parsing models can be improved with the use of lexical information. Statistical parsing using Lexicalized tree adjoining grammar (LTAG), a kind of lexicalized grammar, has remained relatively unexplored. We believe that is largely in part due to the absence of large cor ..."
Cited by 65 (4 self)
Abstract:
The accuracy of statistical parsing models can be improved with the use of lexical information. Statistical parsing using Lexicalized Tree Adjoining Grammar (LTAG), a kind of lexicalized grammar, has remained relatively unexplored. We believe that this is in large part due to the absence of large corpora accurately bracketed in terms of a perspicuous yet broad-coverage LTAG. Our work attempts to alleviate this difficulty. We extract different LTAGs from the Penn Treebank. We show that certain strategies yield an improved extracted LTAG in terms of compactness, broad coverage, and supertagging accuracy. Furthermore, we perform a preliminary investigation into smoothing these grammars by means of an external linguistic resource, namely the tree families of an XTAG grammar, a hand-built grammar of English.
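The extraction strategies themselves are detailed in the paper; as a very rough sketch of one common ingredient of such procedures (tracing a head spine from a bracketed tree down to its lexical anchor), the code below uses an invented, drastically simplified head-percolation table. Neither the table nor the tree encoding is taken from the paper:

```python
# Invented head-percolation table: which child labels can head each
# parent category, in priority order. Real tables are far larger.
HEAD_TABLE = {"S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NN", "NNP"]}

def head_child(label, children):
    """Pick the head child by the (hypothetical) percolation table,
    falling back to the last child."""
    for h in HEAD_TABLE.get(label, []):
        for c in children:
            if c[0] == h:
                return c
    return children[-1]

def spine(tree):
    """Follow head children down to the lexical anchor, returning the
    labels on the path: the skeleton of an extracted elementary tree."""
    label, children = tree
    if isinstance(children, str):      # (POS, word) leaf
        return [label]
    return [label] + spine(head_child(label, children))

# (S (NP (NNP John)) (VP (VBD slept)))
tree = ("S", [("NP", [("NNP", "John")]), ("VP", [("VBD", "slept")])])
print(spine(tree))  # ['S', 'VP', 'VBD']
```

Siblings hanging off the spine would become the substitution and adjunction sites of the extracted elementary tree; the paper's strategies differ in exactly such factoring decisions, which is what drives the compactness and coverage trade-offs it measures.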