Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars
Computational Linguistics, 1993
Practical Unification-based Parsing of Natural Language
1993
Cited by 46 (7 self)
The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the major problems in this area: firstly, the fact that parsing realistic inputs with such grammars can be computationally very expensive, and secondly, the observation that many analyses are often assigned to an input, only one of which usually forms the basis of the correct interpretation. The thesis starts by presenting a new unification algorithm, justifies why it is well-suited to practical NL parsing, and describes a bottom-up active chart parser which employs this unification algorithm together with several other novel processing and optimisation techniques. Empirical results demonstrate that an implementation of this parser has significantly better practical ...
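The thesis's own unification algorithm is not described in this abstract. As a point of reference only, the sketch below shows plain, non-destructive unification of feature structures, assuming structures are nested Python dicts with string atoms and no reentrancy (shared substructure). The `unify` function and the example features are hypothetical and are not taken from the thesis.

```python
from typing import Optional, Union

# A feature structure is either an atom (str) or a dict mapping feature -> FS.
FS = Union[str, dict]


def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the unification of a and b, or None if they clash.

    Non-destructive: neither input is modified. Reentrancy is not supported.
    """
    if isinstance(a, str) and isinstance(b, str):
        return a if a == b else None  # atoms unify only if identical
    if isinstance(a, str) or isinstance(b, str):
        return None  # atom vs. complex structure: clash
    result = dict(a)  # fresh copy at this level, so `a` is left untouched
    for feat, b_val in b.items():
        if feat in result:
            sub = unify(result[feat], b_val)
            if sub is None:
                return None  # clash somewhere below this feature
            result[feat] = sub
        else:
            result[feat] = b_val  # feature only present in b
    return result


# Hypothetical example: a third-person singular NP against two patterns.
np_fs = {"cat": "NP", "agr": {"num": "sg", "per": "3"}}
print(unify(np_fs, {"cat": "NP", "agr": {"num": "sg"}}))  # compatible
print(unify(np_fs, {"agr": {"num": "pl"}}))               # number clash -> None
```

Practical unification-based parsers generally also need structure sharing and cheaper copying; this sketch copies at every level purely for clarity.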
Robust Stochastic Parsing Using the Inside-Outside Algorithm
1992
Cited by 38 (0 self)
In this paper, we discuss the application of the Viterbi algorithm and the Baum-Welch algorithm (in wide use for speech recognition) to the parsing problem and describe a recent experiment designed to produce a simple, robust, probabilistic parser which selects an appropriate analysis frequently enough to be useful and deals effectively with the problem of undergeneration. We focus on the application of these stochastic algorithms here because, although other statistically based approaches have been proposed (e.g. Sampson et al., 1989; Garside & Leech, 1985; Magerman & Marcus, 1991a,b), these appear most promising as they are computationally tractable (in principle) and well integrated with formal language / automata theory. The Viterbi algorithm and Baum-Welch algorithm are optimised algorithms (with polynomial computational complexity) which can be used in conjunction with stochastic regular grammars (finite-state automata, i.e. (hidden) Markov models, Baum, 1972) and with probabilistic context-free grammars (Baker, 1982; Fujisaki ...
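To make the Viterbi half of this concrete, the following is a minimal probabilistic CKY sketch for a PCFG in Chomsky normal form. It runs in time cubic in the sentence length (times the number of rules), the kind of polynomial bound the abstract mentions. The grammar, the `viterbi_cky` function and the example sentence are hypothetical illustrations, not the grammar or experiment from the paper, and the Baum-Welch (inside-outside) training step is not shown.

```python
from collections import defaultdict


def viterbi_cky(words, lexical, binary, start="S"):
    """Return (probability, tree) of the most probable parse under a CNF PCFG.

    lexical: {(A, word): p} for rules A -> word
    binary:  {(A, B, C): p} for rules A -> B C
    """
    n = len(words)
    # best[i][j][A]: highest probability of A deriving words[i:j] (0.0 if none)
    best = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    back = {}  # backpointers used to rebuild the best tree
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w and p > best[i][i + 1][a]:
                best[i][i + 1][a] = p
                back[(i, i + 1, a)] = w
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for (a, b, c), p in binary.items():
                    score = p * best[i][k][b] * best[k][j][c]
                    if score > best[i][j][a]:  # keep only the best (Viterbi max)
                        best[i][j][a] = score
                        back[(i, j, a)] = (k, b, c)

    def build(i, j, a):
        bp = back[(i, j, a)]
        if isinstance(bp, str):  # lexical rule A -> word
            return (a, bp)
        k, b, c = bp
        return (a, build(i, k, b), build(k, j, c))

    if (0, n, start) not in back:
        return 0.0, None  # no parse for this sentence
    return best[0][n][start], build(0, n, start)


# Hypothetical toy PCFG; probabilities for each left-hand side sum to 1.
LEXICAL = {
    ("NP", "she"): 0.2, ("V", "saw"): 1.0, ("Det", "the"): 1.0,
    ("N", "man"): 0.5, ("N", "telescope"): 0.5, ("P", "with"): 1.0,
}
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.6, ("VP", "VP", "PP"): 0.4,
    ("NP", "Det", "N"): 0.5, ("NP", "NP", "PP"): 0.3,
    ("PP", "P", "NP"): 1.0,
}

prob, tree = viterbi_cky("she saw the man with the telescope".split(), LEXICAL, BINARY)
print(prob)  # about 0.003: VP attachment (0.015) beats NP attachment (0.01125)
print(tree)
```

Replacing each `max` with a sum would give the inside probabilities that the Baum-Welch (inside-outside) procedure builds on.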
An Efficient Implementation of a New DOP Model
In EACL, 2003
Cited by 27 (6 self)
Two apparently opposing DOP models exist in the literature: one which computes the parse tree involving the most frequent subtrees from a treebank and one which computes the parse tree involving the fewest subtrees from a treebank. This paper proposes an integration of the two models which outperforms each of them separately. Together with a PCFG-reduction of DOP we obtain improved accuracy and efficiency on the Wall Street Journal treebank. Our results show an 11% relative reduction in error rate over previous models, and an average processing time of 3.6 seconds per WSJ sentence.
Efficient Disambiguation by means of Stochastic Tree Substitution Grammars
1994
Cited by 21 (10 self)
In Stochastic Tree Substitution Grammars (STSGs), a parse (tree) of an input sentence can be generated by (exponentially) many derivations. Each of these derivations is the result of a different combination of STSG elementary trees and therefore receives a distinct probability; the probability of the parse is defined as the sum of the probabilities of all derivations which generate that parse. Therefore, some methods of Stochastic Context-Free Grammars (SCFGs), e.g. the Viterbi algorithm for finding the most probable parse (MPP) of an input sentence, are not applicable to STSGs. In this paper we study the problem of efficient disambiguation by means of STSGs under the Data Oriented Parsing model (DOP) [Bod, 1993c]. We present polynomial algorithms for computing the probability of a parse and the probability of an input sentence and its most probable derivation (MPD). In addition, we present a Viterbi-like optimization technique for search algorithms for the MPP. A major concern in desi...
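The gap between the most probable derivation (MPD) and the most probable parse (MPP) can be seen with a few numbers. The sketch below assumes a hypothetical list of already-enumerated derivations with made-up probabilities; it only illustrates the two definitions, since enumerating derivations explicitly is exponential, which is why the paper looks for polynomial algorithms instead.

```python
from collections import defaultdict

# Hypothetical, already-enumerated derivations for one sentence:
# (tree the derivation yields, probability of that derivation).
DERIVATIONS = [
    ("T1", 0.20),  # T1 has a single derivation
    ("T2", 0.12),  # T2 has three different derivations
    ("T2", 0.10),
    ("T2", 0.09),
]


def most_probable_derivation(derivations):
    """Tree yielded by the single most probable derivation (MPD)."""
    tree, _ = max(derivations, key=lambda d: d[1])
    return tree


def most_probable_parse(derivations):
    """Tree with the highest summed derivation probability (MPP), plus all sums."""
    totals = defaultdict(float)
    for tree, p in derivations:
        totals[tree] += p  # a parse's probability sums over its derivations
    return max(totals, key=totals.get), dict(totals)


print(most_probable_derivation(DERIVATIONS))  # T1 (0.20 is the largest single derivation)
print(most_probable_parse(DERIVATIONS)[0])    # T2 (0.12 + 0.10 + 0.09 = 0.31 > 0.20)
```

A Viterbi-style search that keeps only the best derivation per chart entry would return T1 here, which matches the abstract's point that such SCFG methods do not carry over directly to the MPP.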
Probabilistic Normalisation and Unpacking of Packed Parse Forests for Unification-based Grammars
In Proceedings of the AAAI Fall Symposium on Probabilistic Approaches to Natural Language, 1992
Cited by 18 (3 self)
The research described below forms part of a wider programme to develop a practical parser for naturally-occurring natural language input which is capable of returning the n-best syntactically determinate analyses, containing that which is semantically and pragmatically most appropriate (preferably as the highest ranked) from the exponential (in sentence length) syntactically legitimate possibilities (Church & Patil 1983), which can frequently run into the thousands with realistic sentences and grammars. We have opted to develop a domain-independent solution to this problem based on integrating statistical Markov modelling techniques, which offer the potential for rapid tuning to different sublanguages / corpora on the basis of supervised training, with linguistically-adequate grammatical (language) models, capable of returning analyses detailed enough to support semantic interpretation.
Efficient Parsing of DOP with PCFG-reductions
2003
Cited by 17 (0 self)
Contents
R. Bod, R. Scha and K. Sima'an
PART I: The Basic Data-Oriented Parsing Model
1. A DOP model for phrase-structure trees (R. Bod and R. Scha)
2. Probability models for DOP
3. Encoding frequency information in stochastic parsing models (J. Carroll and D. Weir)
PART II: Computational Issues
1. Computational complexity of disambiguation under DOP
2. Parsing DOP with Monte Carlo techniques (J. Chappelier and M. Rajman)
3. Towards efficient Monte Carlo parsing
4. Efficient parsing of DOP with PCFG-reductions (J. Goodman)
5. An approximation of DOP through memory-based learning (G. de Pauw)
6. Compositional partial parsing by memory-based sequence learning (I. Dagan and Y. Krymolowsky)
PART III: Richer Models
1. A head-driven data-oriented approach to lexical dependency
2. A DOP model for Lexical-Functional Grammar representations (R. Bod and R. Kaplan)
3. A data-driven approach to Head-driven Phrase-Structure (G. Neumann)
4. Tree-Adjoining Grammars and its applic...
Snippet Search: a Single Phrase Approach to Text Access
In Proceedings of the 1991 Joint Statistical Meetings, American Statistical Association, 1991
Cited by 13 (2 self)
... this paper. In the worst case, the inner loop of this algorithm is executed ...
Data-Oriented Language Processing -- An Overview
Corpus-Based Methods in Language and Speech Processing, 1997
Cited by 13 (2 self)
Data-oriented models of language processing embody the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract grammar rules. Such models therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. This paper motivates the idea of data-oriented language processing by considering the problem of syntactic disambiguation. One relatively simple parsing/disambiguation model that implements this idea is described in some detail. This model assumes a corpus of utterances annotated with labelled phrase-structure trees, and parses new input by combining subtrees from the corpus; it selects the most probable parse of an input utterance by considering the sum of the probabilities of all its derivations. The paper discusses some experiments carried out with this model. Finally, it reviews some other models that instantiate the data-oriented processing approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting subtrees from the corpus or employ different disambiguation strategies; other models use richer formalisms for their corpus annotations.
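How fragment frequencies turn into probabilities can be sketched as follows, assuming the relative-frequency estimator of Bod's DOP1 model: a fragment's probability is its count divided by the total count of corpus fragments with the same root label, and a derivation's probability is the product of its fragments' probabilities (a parse then sums over its derivations). The fragment inventory and counts below are hypothetical, and fragments are plain bracketed strings rather than real trees.

```python
from collections import Counter, defaultdict
from math import prod

# Hypothetical fragment counts from a tiny treebank.
# Key: (root label, bracketed fragment); value: occurrence count.
FRAGMENT_COUNTS = Counter({
    ("S", "(S NP VP)"): 3,
    ("S", "(S (NP she) VP)"): 1,
    ("NP", "(NP she)"): 2,
    ("NP", "(NP the man)"): 2,
    ("VP", "(VP (V saw) NP)"): 1,
    ("VP", "(VP V NP)"): 3,
})


def fragment_probabilities(counts):
    """DOP1-style estimate: count(t) / total count of fragments with t's root."""
    root_totals = defaultdict(int)
    for (root, _), c in counts.items():
        root_totals[root] += c
    return {frag: c / root_totals[frag[0]] for frag, c in counts.items()}


def derivation_probability(fragments, probs):
    """Probability of one derivation: product of its fragments' probabilities."""
    return prod(probs[f] for f in fragments)


probs = fragment_probabilities(FRAGMENT_COUNTS)
derivation = [
    ("S", "(S (NP she) VP)"),
    ("VP", "(VP (V saw) NP)"),
    ("NP", "(NP the man)"),
]
print(derivation_probability(derivation, probs))  # 0.25 * 0.25 * 0.5 = 0.03125
```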
The Problem of Computing the Most Probable Tree in Data-Oriented Parsing and Stochastic Tree Grammars
In Proceedings of the Seventh Conference of the European Chapter of the ACL
Cited by 13 (2 self)
We deal with the question of whether there exists a polynomial-time algorithm for computing the most probable parse tree of a sentence generated by a data-oriented parsing (DOP) model (Scha, 1990; Bod, 1992, 1993a). To this end, we describe DOP as a stochastic tree-substitution grammar (STSG). In STSG, a tree can be generated by exponentially many derivations involving different elementary trees. The probability of a tree is equal to the sum of the probabilities of all its derivations.

