Results 1 - 8 of 8
Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars
 Computational Linguistics
, 1993
Computation of the probability of initial substring generation by stochastic context-free grammars
 Computational Linguistics
, 1991
Abstract

Cited by 76 (0 self)
Speech recognition language models are based on probabilities P(w_{k+1} = v | w_1 w_2 ... w_k) that the next word w_{k+1} will be any particular word v of the vocabulary, given that the word sequence w_1, w_2, ..., w_k is hypothesized to have been uttered in the past. If probabilistic context-free grammars are to be used as the basis of the language model, it will be necessary to compute the probability that successive application of the grammar rewrite rules (beginning with the sentence start symbol S) produces a word string whose initial substring is an arbitrary sequence w_1, w_2, ..., w_{k+1}. In this paper we describe a new algorithm that achieves the required computation in at most a constant times k^3 steps.
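The prefix probability this abstract describes can be illustrated by brute force on a tiny, non-recursive PCFG. The toy grammar, rule probabilities, and enumeration strategy below are invented for illustration; the paper's algorithm computes the same quantity in O(k^3) steps without enumerating derivations.

```python
from itertools import product

# Toy PCFG: each nonterminal maps to (right-hand side, probability) pairs.
# Probabilities for each left-hand side sum to 1.
GRAMMAR = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("the", "dog"), 0.6), (("the", "cat"), 0.4)],
    "VP": [(("barks",), 0.7), (("sleeps",), 0.3)],
}

def expansions(symbol):
    """Yield (word_tuple, probability) for every full expansion of symbol."""
    if symbol not in GRAMMAR:          # terminal symbol
        yield (symbol,), 1.0
        return
    for rhs, p in GRAMMAR[symbol]:
        # Expand every right-hand-side symbol, then combine the pieces.
        for parts in product(*(list(expansions(s)) for s in rhs)):
            words = tuple(w for ws, _ in parts for w in ws)
            prob = p
            for _, q in parts:
                prob *= q
            yield words, prob

def prefix_probability(prefix):
    """P(the grammar generates a string whose initial substring is prefix)."""
    return sum(p for words, p in expansions("S")
               if words[:len(prefix)] == tuple(prefix))
```

For example, `prefix_probability(["the", "dog"])` sums over both continuations ("barks" and "sleeps") and returns 0.6, while `prefix_probability(["the", "dog", "barks"])` returns 0.42. The enumeration only terminates because this toy grammar is non-recursive, which is exactly why the paper's dynamic-programming algorithm is needed in general.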
Practical Unification-based Parsing of Natural Language
, 1993
Abstract

Cited by 49 (7 self)
The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the major problems in this area: firstly, the fact that parsing realistic inputs with such grammars can be computationally very expensive, and secondly, the observation that many analyses are often assigned to an input, only one of which usually forms the basis of the correct interpretation. The thesis starts by presenting a new unification algorithm, justifies why it is well-suited to practical NL parsing, and describes a bottom-up active chart parser which employs this unification algorithm together with several other novel processing and optimisation techniques. Empirical results demonstrate that an implementation of this parser has significantly better practical...
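Feature-structure unification, the core operation such a parser repeats constantly, can be sketched over nested dicts. This is a deliberate simplification: it ignores reentrancy (shared substructures) and the efficiency concerns the thesis addresses, and the feature structures shown are invented examples.

```python
def unify(a, b):
    """Return the unification of feature structures a and b, or None on clash.

    Atomic values unify only if equal; dicts unify feature by feature.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None
    out = dict(a)
    for key, val in b.items():
        if key in out:
            merged = unify(out[key], val)
            if merged is None:
                return None          # feature clash, unification fails
            out[key] = merged
        else:
            out[key] = val           # feature only in b, copy it over
    return out

np = {"cat": "NP", "agr": {"num": "sg"}}
vp = {"agr": {"num": "sg", "per": "3"}}
unify(np, vp)   # merges the two "agr" structures
```

A clash such as `unify({"cat": "NP"}, {"cat": "VP"})` returns `None`, which is how a unification-based parser rejects an ill-formed combination.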
GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language
, 1996
Abstract

Cited by 43 (11 self)
The analysis of spoken language is widely considered to be a more challenging task than the analysis of written text. All of the difficulties of written language can generally be found in spoken language as well. Parsing spontaneous speech must, however, also deal with problems such as speech disfluencies, the looser notion of grammaticality, and the lack of clearly marked sentence boundaries. The contamination of the input with errors from a speech recognizer can further exacerbate these problems. Most natural language parsing algorithms are designed to analyze "clean" grammatical input. Because they reject any input which is found to be ungrammatical in even the slightest way, such parsers are unsuitable for parsing spontaneous speech, where completely grammatical input is the exception rather than the rule. This thesis describes GLR*, a parsing system based on Tomita's Generalized LR parsing algorithm, that was designed to be robust to two particular types of extra-grammaticality: noise...
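The word-skipping idea behind GLR* can be sketched with a toy recognizer standing in for a real grammar and LR table (the word lists and the DET NOUN VERB pattern below are invented). The search tries skipping zero words, then one, then two, and returns the first grammatical subsequence, i.e. the parse that ignores the fewest input words; GLR* integrates this into the parser itself rather than enumerating subsequences.

```python
from itertools import combinations

# Toy stand-in for a grammar: a "sentence" is grammatical iff it matches
# the pattern DET NOUN VERB over these small word lists (an assumption).
DET, NOUN, VERB = {"the", "a"}, {"dog", "cat"}, {"barks", "sleeps"}

def grammatical(words):
    return (len(words) == 3 and words[0] in DET
            and words[1] in NOUN and words[2] in VERB)

def glr_star_like(words):
    """Find a maximal grammatical subsequence of a noisy utterance.

    Try skipping 0 words, then 1, then 2, ... preserving word order,
    and return (parse, number_of_words_skipped).
    """
    for skipped in range(len(words) + 1):
        for keep in combinations(range(len(words)), len(words) - skipped):
            candidate = [words[i] for i in keep]
            if grammatical(candidate):
                return candidate, skipped
    return None, len(words)

# A disfluent input with a false start and filler words:
glr_star_like(["the", "uh", "the", "dog", "um", "barks"])
```

On the disfluent input above the sketch recovers ["the", "dog", "barks"] after skipping the false start "the" and the fillers "uh" and "um".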
Probabilistic Language Modeling for Generalized LR Parsing
, 1998
Abstract

Cited by 4 (3 self)
In this thesis, we introduce probabilistic models to rank the likelihood of resultant parses within the GLR parsing framework. Probabilistic models can also bring about the benefit of reduction of search space, if the models allow prefix probabilities for partial parses. In devising the models, we carefully observe the nature of GLR parsing, one of the most efficient parsing algorithms in existence, and formalize two probabilistic models with the appropriate use of the parsing context. The context in GLR parsing is provided by the constraints afforded by context-free grammars in generating an LR table (global context), and the constraints of adjoining pre-terminal symbols (local n-gram context).
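The "global context" idea can be sketched as scoring one candidate parse by the probability of each shift/reduce action conditioned on the LR state in which it was taken. The states, actions, and probabilities below are invented placeholders, not the thesis's actual models.

```python
from math import prod

# Invented conditional action probabilities P(action | LR state); in a
# real system these would be estimated from a treebank per table state.
P_ACTION = {
    (0, "shift"): 0.8, (0, "reduce_NP"): 0.2,
    (1, "shift"): 0.4, (1, "reduce_VP"): 0.6,
}

def parse_probability(actions):
    """Score a parse as the product of its per-state action probabilities.

    actions: the (lr_state, action) sequence taken along one parse.
    """
    return prod(P_ACTION[step] for step in actions)

# One candidate parse: shift in state 0, then reduce in state 1.
parse_probability([(0, "shift"), (1, "reduce_VP")])   # 0.8 * 0.6
```

Because each factor is available as soon as the action is taken, partial parses get running scores, which is what makes pruning the GLR search space (and computing prefix probabilities) possible.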
A Probabilistic Chunker
 In: Proceedings of ROCLING VI
, 1993
Abstract

Cited by 3 (3 self)
This paper proposes a probabilistic partial parser, which we call a chunker. The chunker partitions the input sentence into segments. This idea is motivated by the fact that when we read a sentence, we read it chunk by chunk. We train the chunker on the Susanne Corpus, a modified but reduced version of the Brown Corpus, under a bigram language model. The experiment is evaluated by an outside test and an inside test. The preliminary results show the chunker has more than a 98% chunk correct rate and a 94% sentence correct rate in the outside test, and a 99% chunk correct rate and a 97% sentence correct rate in the inside test. The simple but effective chunker design has been shown to be promising and can be extended to complete parsing and many applications. 1. Introduction A probabilistic approach to natural language processing is not new [1]. Recently, many parsers along this line have been proposed [2-9]. Garside and Leech [2] apply the constituent-likelihood grammar of Atwell [10] to probabilist...
Combining Labeled and Unlabeled Data in Statistical Natural Language Parsing
, 2002
Abstract

Cited by 2 (1 self)
Prof. Aravind Joshi, my dissertation advisor, has been my guide and mentor for the entire time that I spent at Penn. I thank him for all his academic help and personal kindness. The external member on my dissertation committee was Steven Abney, whose suggestions and advice have made the ideas presented here stronger. My dissertation committee members from Penn, Mitch Marcus, Mark Liberman, and Martha Palmer, provided questions whose answers shaped my dissertation proposal into the finished form in front of you. Many thanks to my academic collaborators: the work on prefix probabilities was done with Mark-Jan Nederhof and Giorgio Satta when they visited IRCS in 1998, and the work on subcategorization frame learning was done in collaboration with Daniel Zeman when he visited IRCS in 2000. Thanks to B. Srinivas, whose previous work provided the path to the experimental work in this dissertation. Thanks also to Paola Merlo and Suzanne Stevenson for discussions on their work on verb alternation classes. I also acknowledge the help of Woottiporn Tripasai in the extension of their work presented in this dissertation. Thanks to...
Statistical Parsing Algorithms for Lexicalized Tree Adjoining Grammars
Abstract

Cited by 1 (0 self)
The goal of this dissertation is twofold: to develop the theory of probabilistic Tree Adjoining Grammars (TAGs) and to present some practical results in the form of efficient parsing and estimation algorithms for probabilistic TAGs. The overall goal of developing the theory of probabilistic TAGs is to provide a simple, mathematically and linguistically well-formed probabilistic framework for statistical parsing. The practical results in parsing and estimation of probabilistic TAGs are developed with a view towards an increasingly unsupervised approach to the training of statistical parsers and language models. In particular, this proposal contains the following results: an algorithm for determining deficiency in a generative model for probabilistic TAGs; a novel chart-based head-corner parsing algorithm for probabilistic TAGs; a probability model for statistical parsing and a co-training method for training this parser which combines labeled and unlabeled data; and an algorithm for computing prefix probabilities which can be used to predict the word most likely to occur after an initial substring of the input. The proposed work can be summarized in the following points: a separate evaluation of the co-training algorithm on a larger set of labeled and unlabeled data, in addition to the evaluation presented in this proposal; an evaluation of the prefix probability algorithm by comparing it with a trigram language model; and an extension of techniques in learning subcategorization information and verb classes to produce TAG lexicons which can be directly used to improve the performance of the co-training algorithm.
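The co-training method mentioned above can be sketched in its generic form: two independent views of the same data each train a model, and each view labels the unlabeled examples it is most confident about as new training data for the other view. The frequency models and toy data below are invented; the dissertation applies the idea to statistical parsers, not word counters.

```python
from collections import Counter

def train(examples):
    """examples: list of (words, label). Count label votes per word."""
    counts = {}
    for words, label in examples:
        for w in words:
            counts.setdefault(w, Counter())[label] += 1
    return counts

def predict(counts, words):
    """Return (best_label, confidence) by voting over word counts."""
    votes = Counter()
    for w in words:
        votes.update(counts.get(w, Counter()))
    if not votes:
        return None, 0.0
    label, n = votes.most_common(1)[0]
    return label, n / sum(votes.values())

def co_train(labeled, unlabeled, rounds=3, min_conf=0.8):
    """labeled: ((view0_words, view1_words), label) pairs;
    unlabeled: (view0_words, view1_words) pairs.

    Each round, each view labels the items it is confident about and
    hands them to the other view as new training data."""
    views = [[(v[i], y) for v, y in labeled] for i in range(2)]
    pool = list(unlabeled)
    for _ in range(rounds):
        models = [train(v) for v in views]
        remaining = []
        for v0, v1 in pool:
            (l0, c0), (l1, c1) = predict(models[0], v0), predict(models[1], v1)
            if l0 is not None and c0 >= min_conf:
                views[1].append((v1, l0))     # view 0 teaches view 1
            elif l1 is not None and c1 >= min_conf:
                views[0].append((v0, l1))     # view 1 teaches view 0
            else:
                remaining.append((v0, v1))    # nobody is confident yet
        pool = remaining
    return [train(v) for v in views]
```

With toy data such as `labeled = [((["the"], ["dog"]), "NP"), ((["will"], ["bark"]), "VP")]` and unlabeled pairs like `(["the"], ["cat"])`, view 0's confidence on "the" lets it label "cat" as NP for view 1, which is the mechanism by which unlabeled data improves both models.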