Results 1-10 of 35
Computational Complexity of Probabilistic Disambiguation by means of Tree-Grammars
, 1996
"... This paper studies the compntational complexity of dlsambiguation under probabilistic treegrammars as in (Bod, 1992; Schabes and Waters, 1993). It presents a proof that the following problems are NPhard: computing the Most Probable Parse frmn a sentence or from a wordgraph, and computing t ..."
Abstract

Cited by 100 (7 self)
 Add to MetaCart
This paper studies the computational complexity of disambiguation under probabilistic tree-grammars as in (Bod, 1992; Schabes and Waters, 1993). It presents a proof that the following problems are NP-hard: computing the Most Probable Parse (MPP) from a sentence or from a word-graph, and computing the Most Probable Sentence (MPS) from a word-graph. The NP-hardness of computing the MPS from a word-graph also holds for Stochastic Context-Free Grammars (SCFGs).
Parsing Inside-Out
, 1998
"... Probabilistic ContextFree Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probabili ..."
Abstract

Cited by 82 (2 self)
 Add to MetaCart
Probabilistic Context-Free Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probability that any given nonterminal covers any piece of the input sentence. The traditional use of these probabilities is to improve the probabilities of grammar rules. In this thesis we show that these values are useful for solving many other problems in Statistical Natural Language Processing. We give a framework for describing parsers. The framework generalizes the inside and outside values to semirings. It makes it easy to describe parsers that compute a wide variety of interesting quantities, including the inside and outside probabilities, as well as related quantities such as Viterbi probabilities and n-best lists. We also present three novel uses for the inside and outside probabilities. T...
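The inside probabilities the abstract refers to can be sketched with a CKY-style chart over a toy grammar in Chomsky normal form; the grammar, rule probabilities, and sentence below are invented for illustration, not taken from the thesis.

```python
# Minimal sketch of inside-probability computation for a toy CNF PCFG.
# All rules and probabilities here are assumptions for illustration.
from collections import defaultdict

binary = {("S", "NP", "VP"): 1.0,    # (parent, left child, right child) -> prob
          ("VP", "V", "NP"): 1.0}
lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5,
           ("V", "eats"): 1.0}       # (nonterminal, word) -> prob

def inside(words):
    n = len(words)
    chart = defaultdict(float)       # (i, j, A) -> inside prob of A over words[i:j]
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1, A)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # split point
                for (A, B, C), p in binary.items():
                    chart[(i, j, A)] += p * chart[(i, k, B)] * chart[(k, j, C)]
    return chart

chart = inside(["she", "eats", "fish"])
print(chart[(0, 3, "S")])   # → 0.25, the toy sentence probability
```

Multiplying an inside value by the matching outside value and normalizing by the sentence probability gives the span-coverage probability the abstract describes.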
Efficient Algorithms for Parsing the DOP Model
, 1996
"... Excellent results have been reported for DataOriented Parsing (DOP) of natural language texts (Bod, 1993c). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that mus ..."
Abstract

Cited by 58 (4 self)
 Add to MetaCart
Excellent results have been reported for Data-Oriented Parsing (DOP) of natural language texts (Bod, 1993c). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that must be generated and the use of a Monte Carlo parsing algorithm. In this paper we solve the first problem by a novel reduction of the DOP model to a small, equivalent probabilistic context-free grammar. We solve the second problem by a novel deterministic parsing strategy that maximizes the expected number of correct constituents, rather than the probability of a correct parse tree. Using these optimizations, experiments yield a 97% crossing brackets rate and 88% zero crossing brackets rate. This differs significantly from the results reported by Bod, and is comparable to results from a duplication of Pereira and Schabes's (1992) experiment on the same data. We show that Bod's results are at least partially due to an extremely fortuitous choice of test data, and partially due to using cleaner data than other researchers.
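One ingredient behind the "exponential number of rules" problem, and behind any compact reduction, is counting the DOP subtrees rooted at each treebank node. The sketch below (an assumed simplification, not the paper's full construction) computes that count bottom-up: each child contributes a factor of one (cut at the child's label) plus its own subtree count.

```python
# Count the DOP subtrees rooted at each node of a treebank tree.
# Tree encoding (an assumption for this sketch): (label, [children]);
# a word leaf has an empty child list and roots no subtrees itself.
def num_subtrees(tree):
    label, children = tree
    if not children:
        return 0
    count = 1
    for child in children:
        # Either cut at the child's label, or continue with any of
        # the child's own subtrees: 1 + num_subtrees(child) options.
        count *= 1 + num_subtrees(child)
    return count

t = ("S", [("NP", [("she", [])]),
           ("VP", [("V", [("eats", [])]), ("NP", [("fish", [])])])])
print(num_subtrees(t))   # → 10
```

The exponential growth of this count with tree size is what makes enumerating subtrees as explicit rules infeasible, motivating the compact PCFG reduction the abstract announces.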
Building a TreeBank of Modern Hebrew Text
, 2001
"... This paper describes the process of building the first treebank for Modern Hebrew texts. A major concern in this process is the need for reducing the cost of manual annotation by the use of automatic means. To this end, the joint utility of an automatic morphological analyzer, a probabilistic parse ..."
Abstract

Cited by 28 (1 self)
 Add to MetaCart
This paper describes the process of building the first treebank for Modern Hebrew texts. A major concern in this process is the need to reduce the cost of manual annotation by automatic means. To this end, the joint utility of an automatic morphological analyzer, a probabilistic parser and a small manually annotated treebank was explored.
Efficient Disambiguation by means of Stochastic Tree Substitution Grammars
, 1994
"... In Stochastic Tree Substitution Grammars (STSGs), a parse(tree) of an input sentence can be generated by (exponentially) many derivations. Each of these derivations is the result of a different combination of STSG elementarytrees and therefore receives a distinct probability; the probability of the ..."
Abstract

Cited by 21 (9 self)
 Add to MetaCart
In Stochastic Tree Substitution Grammars (STSGs), a parse (tree) of an input sentence can be generated by (exponentially) many derivations. Each of these derivations is the result of a different combination of STSG elementary trees and therefore receives a distinct probability; the probability of the parse is defined as the sum of the probabilities of all derivations which generate that parse. Therefore, some methods of Stochastic Context-Free Grammars (SCFGs), e.g. the Viterbi algorithm for finding the most probable parse (MPP) of an input sentence, are not applicable to STSGs. In this paper we study the problem of efficient disambiguation by means of STSGs under the Data Oriented Parsing (DOP) model [Bod, 1993c]. We present polynomial algorithms for computing the probability of a parse and the probability of an input sentence and its most probable derivation (MPD). In addition, we present a Viterbi-like optimization technique for search algorithms for the MPP. A major concern in desi...
Tree-gram Parsing: Lexical Dependencies and Structural Relations
, 2000
"... This paper explores the kinds of probabilistic relations that are important in syntactic disambiguation. It proposes that two widely used kinds of relations, lexical dependencies and structural relations, have complementary disambiguation capabilities. It presents a new model based on struc ..."
Abstract

Cited by 16 (2 self)
 Add to MetaCart
This paper explores the kinds of probabilistic relations that are important in syntactic disambiguation. It proposes that two widely used kinds of relations, lexical dependencies and structural relations, have complementary disambiguation capabilities. It presents a new model based on structural relations, the Tree-gram model, and reports experiments showing that structural relations should benefit from enrichment by lexical dependencies. 1 Introduction Head-lexicalization currently pervades the parsing literature, e.g. (Eisner, 1996; Collins, 1997; Charniak, 1999). This method extends every treebank nonterminal with its head-word; the model is trained on this head-lexicalized treebank. Head-lexicalized models extract probabilistic relations between pairs of lexicalized nonterminals ("bilexical dependencies"): every relation is between a parent node and one of its children in a parse-tree. Bilexical dependencies generate parse-trees for input sentences via Markov proces...
Backoff as Parameter Estimation for DOP models
, 2002
"... DataOriented Parsing (DOP) is a probabilistic performance approach to parsing natural language. Several DOP models have been proposed since it was introduced by Scha (1990), achieving promising results. One important feature of these models is the probability estimation procedure. Two major estimat ..."
Abstract

Cited by 15 (1 self)
 Add to MetaCart
Data-Oriented Parsing (DOP) is a probabilistic performance approach to parsing natural language. Several DOP models have been proposed since it was introduced by Scha (1990), achieving promising results. One important feature of these models is the probability estimation procedure. Two major estimators have been put forward: Bod (1993) uses a relative frequency estimator; Bonnema (1999) adds a rescaling factor to correct for tree-size effects. Both estimators, however, present biases. Moreover, Bod's estimator has been shown to be inconsistent (Johnson, 2002), meaning that the probability estimates hypothesized by the model do not approach the true probabilities that generated the data as the sample size grows. In this thesis, we implement a new estimation procedure that tackles the shortcomings of the two previous methods. The main idea is to treat derivation events not as disjoint, but as interrelated in a hierarchical cascade of parse tree derivations. We show that this new estimator, called the Back-Off DOP (BODOP) estimator, outperforms both previous models. We tested it on the OVIS treebank (a Dutch-language, speech-based system) and report error reductions of up to 11.4% and 15% when compared to, respectively, Bod's and Bonnema's estimators.
The Problem of Computing the Most Probable Tree in Data-Oriented Parsing and Stochastic Tree Grammars
 In Proceedings of the Seventh Conference of the European Chapter of the ACL
"... We deal with the question as to whether there exists a polynomial time algorithm for computing the most probable parse tree of a sentence generated by a dataoriented parsing (DOP) model. (Scha, 1990; Bod, 1992, 1993a). Therefore we describe DOP as a stochastic treesubstitution grammar (STSG) ..."
Abstract

Cited by 14 (2 self)
 Add to MetaCart
We deal with the question as to whether there exists a polynomial-time algorithm for computing the most probable parse tree of a sentence generated by a Data-Oriented Parsing (DOP) model (Scha, 1990; Bod, 1992, 1993a). To this end, we describe DOP as a stochastic tree-substitution grammar (STSG). In an STSG, a tree can be generated by exponentially many derivations involving different elementary trees. The probability of a tree is equal to the sum of the probabilities of all its derivations.
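Why the Viterbi-style maximum does not directly yield the most probable parse can be seen with invented numbers: the parse of the single most probable derivation need not be the parse with the greatest total probability.

```python
# Toy illustration (assumed numbers, not from the paper) of MPD vs. MPP
# divergence in an STSG: T1 has one strong derivation, T2 has two weaker
# ones whose sum is larger.
from collections import defaultdict

deriv_probs = [("T1", 0.4), ("T2", 0.3), ("T2", 0.3)]

parse_prob = defaultdict(float)
for tree, p in deriv_probs:
    parse_prob[tree] += p               # parse prob = sum over derivations

mpd_tree = max(deriv_probs, key=lambda d: d[1])[0]   # parse of best derivation
mpp_tree = max(parse_prob, key=parse_prob.get)       # parse with largest sum
print(mpd_tree, mpp_tree)   # → T1 T2
```

With derivations spread over exponentially many combinations, no polynomial enumeration like this is available, which is the crux of the question the paper studies.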
DataOriented Models of Parsing and Translation
, 2005
"... A dissertation submitted in fulfilment of the requirements for the award of ..."
Abstract

Cited by 12 (2 self)
 Add to MetaCart
A dissertation submitted in fulfilment of the requirements for the award of
Context-Sensitive Spoken Dialogue Processing with the DOP Model
 Natural Language Engineering
, 1999
"... We show how the DOP model can be used for fast and robust contextsensitive processing of spoken input in a practical spoken dialogue system called OVIS. OVIS, Openbaar Vervoer Informatie Systeem ("Public Transport Information System"), is a Dutch spoken language information system which operates ov ..."
Abstract

Cited by 12 (9 self)
 Add to MetaCart
We show how the DOP model can be used for fast and robust context-sensitive processing of spoken input in a practical spoken dialogue system called OVIS. OVIS, Openbaar Vervoer Informatie Systeem ("Public Transport Information System"), is a Dutch spoken-language information system which operates over ordinary telephone lines. The prototype system is the immediate goal of the NWO Priority Programme "Language and Speech Technology". In this paper, we extend the original DOP model to context-sensitive interpretation of spoken input. The system we describe uses the OVIS corpus (which consists of 10,000 trees enriched with compositional semantics) to compute from an input word-graph the best utterance together with its meaning. Dialogue context is taken into account by dividing up the OVIS corpus into context-dependent subcorpora. Each system question triggers a subcorpus by which the user answer is analyzed and interpreted. Our experiments indicate that the context-sensitive DOP model obtains better accuracy than the original model, allowing for fast and robust processing of spoken input.