Results 1–10 of 22
Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars
COMPUTATIONAL LINGUISTICS, 1993
Practical Unification-based Parsing of Natural Language
1993
Abstract
Cited by 49 (7 self)
The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two major problems in this area: first, that parsing realistic inputs with such grammars can be computationally very expensive; and second, that many analyses are often assigned to an input, only one of which usually forms the basis of the correct interpretation. The thesis starts by presenting a new unification algorithm, justifies why it is well-suited to practical NL parsing, and describes a bottom-up active chart parser that employs this unification algorithm together with several other novel processing and optimisation techniques. Empirical results demonstrate that an implementation of this parser has significantly better practical ...
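The bottom-up chart-parsing strategy the abstract refers to can be illustrated, minus unification and the thesis's optimisations, with a plain CKY-style recogniser over a context-free grammar in Chomsky normal form. The grammar and sentence below are invented for illustration; this is a minimal sketch, not the thesis's active chart parser.

```python
from collections import defaultdict

# Toy CNF grammar, invented for illustration.
UNARY = {  # word -> categories
    "she": {"NP"}, "saw": {"V"}, "stars": {"NP"},
}
BINARY = {  # (B, C) -> {A : A -> B C}
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}

def cky_recognise(words):
    """Bottom-up chart recognition: chart[(i, j)] holds the categories
    that span words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):                 # lexical edges
        chart[(i, i + 1)] |= UNARY.get(w, set())
    for span in range(2, n + 1):                  # combine adjacent edges
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b in chart[(i, k)]:
                    for c in chart[(k, j)]:
                        chart[(i, j)] |= BINARY.get((b, c), set())
    return "S" in chart[(0, n)]

print(cky_recognise("she saw stars".split()))  # True
```

Each chart cell is filled exactly once per span, which is the dynamic-programming idea a chart parser shares with the thesis's parser; the unification-based version replaces atomic categories with feature structures.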
Robust stochastic parsing using the inside-outside algorithm
In: AAAI-92 Workshop on Statistically Based NLP Techniques, 1992
An Efficient Implementation of a New DOP Model
In EACL, 2003
Abstract
Cited by 31 (7 self)
Two apparently opposing DOP models exist in the literature: one which computes the parse tree involving the most frequent subtrees from a treebank, and one which computes the parse tree involving the fewest subtrees from a treebank. This paper proposes an integration of the two models which outperforms each of them separately. Together with a PCFG-reduction of DOP, we obtain improved accuracy and efficiency on the Wall Street Journal treebank. Our results show an 11% relative reduction in error rate over previous models, and an average processing time of 3.6 seconds per WSJ sentence.
Efficient Parsing of DOP with PCFG-reductions
2003
Abstract
Cited by 23 (0 self)
Contents: R. Bod, R. Scha and K. Sima'an
PART I: The Basic Data-Oriented Parsing Model
1. A DOP model for phrase-structure trees, R. Bod and R. Scha
2. Probability models for DOP
3. Encoding frequency information in stochastic parsing models, J. Carroll and D. Weir
PART II: Computational Issues
1. Computational complexity of disambiguation under DOP
2. Parsing DOP with Monte Carlo techniques, J. Chappelier and M. Rajman
3. Towards efficient Monte Carlo parsing
4. Efficient parsing of DOP with PCFG-reductions, J. Goodman
5. An approximation of DOP through memory-based learning, G. de Pauw
6. Compositional partial parsing by memory-based sequence learning, I. Dagan and Y. Krymolowsky
PART III: Richer Models
1. A head-driven data-oriented approach to lexical dependency
2. A DOP model for Lexical-Functional Grammar representations, R. Bod and R. Kaplan
3. A data-driven approach to Head-driven Phrase-Structure, G. Neumann
4. Tree-Adjoining Grammars and its applic...
Augmenting a hidden Markov model for phrase-dependent word tagging
In DARPA Speech and Natural Language Workshop, 1989
Abstract
Cited by 22 (3 self)
The paper describes refinements that are currently being investigated in a model for part-of-speech assignment to words in unrestricted text. The model has the advantage that a pre-tagged training corpus is not required. Words are represented by equivalence classes to reduce the number of parameters required and to provide an essentially vocabulary-independent model. State chains are used to model selective higher-order conditioning in the model, which obviates the proliferation of parameters attendant in uniformly higher-order models. The structure of the state chains is based on both an analysis of errors and linguistic knowledge. Examples show how word dependency across phrases can be modeled.
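A first-order HMM tagger of the kind this abstract refines can be sketched with standard Viterbi decoding. The tag set, all probabilities, and the sentence below are invented toy values, and the paper's equivalence classes and state chains are not modelled; this shows only the baseline decoding step.

```python
# Minimal first-order HMM part-of-speech tagger with Viterbi decoding.
# The tag/word inventory and probabilities are invented for illustration.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {  # P(next tag | tag)
    "DET":  {"DET": 0.05, "NOUN": 0.9,  "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3,  "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.4,  "VERB": 0.1},
}
EMIT = {   # P(word | tag)
    "DET":  {"the": 0.9},
    "NOUN": {"dog": 0.7, "barks": 0.3},
    "VERB": {"dog": 0.1, "barks": 0.9},
}

def viterbi(words):
    """Return the most probable tag sequence for words."""
    # delta[t]: best probability of a path ending in tag t; psis: backpointers
    delta = {t: START[t] * EMIT[t].get(words[0], 0.0) for t in TAGS}
    psis = []
    for w in words[1:]:
        new, psi = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda s: delta[s] * TRANS[s][t])
            psi[t] = prev
            new[t] = delta[prev] * TRANS[prev][t] * EMIT[t].get(w, 0.0)
        delta, psis = new, psis + [psi]
    best = max(TAGS, key=lambda t: delta[t])
    tags = [best]
    for psi in reversed(psis):   # follow backpointers right to left
        tags.append(psi[tags[-1]])
    return list(reversed(tags))

print(viterbi(["the", "dog", "barks"]))  # ['DET', 'NOUN', 'VERB']
```

In the paper's setting the emission parameters would be shared across words in the same equivalence class, and extra states would provide the selective higher-order conditioning; neither refinement changes the decoding loop above.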
Efficient Disambiguation by means of Stochastic Tree Substitution Grammars
1994
Abstract
Cited by 21 (9 self)
In Stochastic Tree Substitution Grammars (STSGs), a parse (tree) of an input sentence can be generated by (exponentially) many derivations. Each of these derivations is the result of a different combination of STSG elementary trees and therefore receives a distinct probability; the probability of the parse is defined as the sum of the probabilities of all derivations which generate that parse. Therefore, some methods for Stochastic Context-Free Grammars (SCFGs), e.g. the Viterbi algorithm for finding the most probable parse (MPP) of an input sentence, are not applicable to STSGs. In this paper we study the problem of efficient disambiguation by means of STSGs under the Data Oriented Parsing (DOP) model [Bod, 1993c]. We present polynomial algorithms for computing the probability of a parse and the probability of an input sentence and its most probable derivation (MPD). In addition, we present a Viterbi-like optimization technique for search algorithms for the MPP. A major concern in desi...
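The definition above, that a parse's probability is the sum over all its derivations, can be checked by brute-force recursion on a toy STSG (the paper's polynomial algorithms are far more efficient). The fragment inventory, probabilities, and tuple encoding below are invented for illustration.

```python
# A tree node is (label, child, ...); a bare string is a word; a 1-tuple
# like ("VP",) marks a substitution site in an elementary tree.
# This toy grammar and its probabilities are invented for illustration.
GRAMMAR = [
    (("S", ("NP",), ("VP",)), 0.5),
    (("S", ("NP", "she"), ("VP",)), 0.5),
    (("NP", "she"), 0.5),
    (("NP", "stars"), 0.5),
    (("VP", ("V", "saw"), ("NP",)), 0.5),
    (("VP", ("V", "saw"), ("NP", "stars")), 0.5),
]

def match(frag, tree):
    """If frag matches the top of tree, return the subtrees left open at
    its substitution sites; otherwise return None."""
    if isinstance(frag, str):
        return [] if frag == tree else None
    if frag[0] != tree[0]:
        return None
    if len(frag) == 1:          # substitution site: whole subtree stays open
        return [tree]
    if len(frag) != len(tree):
        return None
    opens = []
    for f, t in zip(frag[1:], tree[1:]):
        sub = match(f, t)
        if sub is None:
            return None
        opens += sub
    return opens

def parse_prob(tree, grammar):
    """P(parse) = sum over derivations: each matching elementary tree
    contributes its probability times the parse probabilities of the
    subtrees left open at its substitution sites."""
    total = 0.0
    for frag, p in grammar:
        opens = match(frag, tree)
        if opens is not None:
            for sub in opens:
                p *= parse_prob(sub, grammar)
            total += p
    return total

TREE = ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "stars")))
print(parse_prob(TREE, GRAMMAR))  # 0.5625
```

Here TREE has multiple derivations (e.g. starting from either S fragment), so its probability is not the probability of any single derivation, which is exactly why SCFG-style Viterbi search for the most probable parse does not transfer to STSGs.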
Probabilistic Normalisation and Unpacking of Packed Parse Forests for Unification-based Grammars
IN PROCEEDINGS OF THE AAAI FALL SYMPOSIUM ON PROBABILISTIC APPROACHES TO NATURAL LANGUAGE, 1992
Abstract
Cited by 18 (3 self)
The research described below forms part of a wider programme to develop a practical parser for naturally-occurring natural language input which is capable of returning the n best syntactically-determinate analyses, containing the one which is semantically and pragmatically most appropriate (preferably as the highest ranked), from among the syntactically legitimate possibilities, which are exponential in sentence length (Church & Patil 1983) and can frequently run into the thousands with realistic sentences and grammars. We have opted to develop a domain-independent solution to this problem based on integrating statistical Markov modelling techniques, which offer the potential for rapid tuning to different sublanguages/corpora on the basis of supervised training, with linguistically adequate grammatical (language) models capable of returning analyses detailed enough to support semantic interpretation.
The Problem of Computing the Most Probable Tree in Data-Oriented Parsing and Stochastic Tree Grammars
In Proceedings of the Seventh Conference of the European Chapter of the ACL
Abstract
Cited by 16 (2 self)
We deal with the question of whether there exists a polynomial-time algorithm for computing the most probable parse tree of a sentence generated by a data-oriented parsing (DOP) model (Scha, 1990; Bod, 1992, 1993a). To this end we describe DOP as a stochastic tree-substitution grammar (STSG). In an STSG, a tree can be generated by exponentially many derivations involving different elementary trees. The probability of a tree is equal to the sum of the probabilities of all its derivations.
Data-Oriented Language Processing: An Overview
CORPUS-BASED METHODS IN LANGUAGE AND SPEECH PROCESSING, 1997
Abstract
Cited by 15 (2 self)
Data-oriented models of language processing embody the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract grammar rules. Such models therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence frequencies of the fragments are used to estimate which analysis is the most probable one. This paper motivates the idea of data-oriented language processing by considering the problem of syntactic disambiguation. One relatively simple parsing/disambiguation model that implements this idea is described in some detail. This model assumes a corpus of utterances annotated with labelled phrase-structure trees, and parses new input by combining subtrees from the corpus; it selects the most probable parse of an input utterance by considering the sum of the probabilities of all its derivations. The paper discusses some experiments carried out with this model. Finally, it reviews some other models that instantiate the data-oriented processing approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting subtrees from the corpus or employ different disambiguation strategies; other models use richer formalisms for their corpus annotations.
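The subtree-extraction step this overview describes can be sketched on a one-tree toy corpus: every fragment is obtained by picking a node and deciding, for each non-terminal descendant, whether to cut it off as a substitution site. The tuple encoding and the DOP1-style relative-frequency estimate below are illustrative assumptions, not the overview's exact formulation.

```python
from collections import Counter
from itertools import product

# A tree node is (label, child, ...); strings are words; a 1-tuple like
# ("NP",) is a substitution site. The one-tree corpus is invented.
def all_fragments(tree):
    """Every DOP fragment (elementary tree) of tree."""
    def rooted(node):
        # Fragments rooted exactly at node: each non-terminal child is
        # either cut off as a substitution site or expanded further.
        opts = []
        for c in node[1:]:
            opts.append([c] if isinstance(c, str) else [(c[0],)] + rooted(c))
        return [(node[0],) + combo for combo in product(*opts)]
    frags = []
    def walk(node):
        if not isinstance(node, str):
            frags.extend(rooted(node))
            for c in node[1:]:
                walk(c)
    walk(tree)
    return frags

corpus = [("S", ("NP", "she"), ("VP", ("V", "left")))]
counts = Counter(f for t in corpus for f in all_fragments(t))
root_totals = Counter()
for frag, n in counts.items():
    root_totals[frag[0]] += n
# DOP1-style estimate: relative frequency among same-root fragments
prob = {frag: n / root_totals[frag[0]] for frag, n in counts.items()}
print(sum(counts.values()), prob[("NP", "she")])  # 10 fragments; 1.0
```

Even this one-node-deep corpus yields ten fragments, which hints at why fragment counts explode on real treebanks and why the PCFG-reductions cited elsewhere in this listing matter for efficiency.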