A Corpus-Based Approach to Semantic Interpretation
Proceedings Ninth Amsterdam Colloquium, 1994
Back-off as Parameter Estimation for DOP models
2002
Cited by 15 (1 self)
Data-Oriented Parsing (DOP) is a probabilistic performance approach to parsing natural language. Several DOP models have been proposed since it was introduced by Scha (1990), achieving promising results. One important feature of these models is the probability estimation procedure. Two major estimators have been put forward: Bod (1993) uses a relative frequency estimator; Bonnema (1999) adds a rescaling factor to correct for tree size effects. Both estimators, however, are biased. Moreover, Bod's estimator has been shown to be inconsistent (Johnson, 2002), meaning that the probability estimates hypothesized by the model do not approach the true probabilities that generated the data as the sample size grows. In this thesis, we implement a new estimation procedure that tackles the shortcomings of the two previous methods. The main idea is to treat derivation events not as disjoint, but as interrelated in a hierarchical cascade of parse tree derivations. We show that this new estimator, called the Back-Off DOP (BO-DOP) estimator, outperforms both previous models. We tested it on the OVIS treebank, which comes from a Dutch-language, speech-based system, and report error reductions of up to 11.4% and 15% when compared to, respectively, Bod's and Bonnema's estimators.
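As a rough illustration of the relative frequency estimator mentioned in this abstract, the sketch below assigns each fragment its count divided by the total count of fragments with the same root label. The fragment encoding (a root label paired with a bracketed string), the toy counts, and the function name are assumptions made for this example; they are not taken from the thesis, and the BO-DOP estimator itself is not shown.

```python
from collections import Counter, defaultdict


def relative_frequency(fragment_counts):
    """Return P(f) = count(f) / sum of counts of fragments sharing f's root."""
    root_totals = defaultdict(int)
    for (root, _), count in fragment_counts.items():
        root_totals[root] += count
    return {
        fragment: count / root_totals[fragment[0]]
        for fragment, count in fragment_counts.items()
    }


# Hypothetical toy counts: each fragment is a (root_label, bracketed_string) pair.
counts = Counter({
    ("S", "(S NP VP)"): 2,
    ("S", "(S (NP she) VP)"): 1,
    ("S", "(S NP (VP sleeps))"): 1,
    ("NP", "(NP she)"): 3,
    ("NP", "(NP he)"): 1,
})

probs = relative_frequency(counts)
for fragment, p in sorted(probs.items()):
    print(fragment, p)
```

Because the probabilities are normalized per root label, the values for all fragments rooted in the same category sum to one.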
Extending DOP1 with the Insertion Operation
2000
Cited by 5 (0 self)
In Data-Oriented Parsing (DOP), an annotated corpus is used as a stochastic grammar. The most probable analysis of a new input sentence is constructed by combining sub-analyses from the corpus in the most probable way. This thesis presents a model in which the DOP1 model as developed by Bod is enriched with the insertion operation, thus yielding a stochastic Tree Insertion Grammar (TIG) instead of a Stochastic Tree Substitution Grammar. TIG is related to Tree-Adjoining Grammar. Since the adjunction permitted in TIG is restricted, TIG can embed the elegance of the analyses found in Tree-Adjoining Grammar without allowing for context-sensitive languages. In addition to presenting the model, the thesis reports on some experiments for measuring the disambiguation accuracy of the model on the ATIS domain. Furthermore, the thesis shows that the Monte Carlo sampling algorithm used in DOP1 to select the most probable parse from the parse forest does not always sample a unique random derivation. A more efficient, correct algorithm has been developed.
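The general idea behind Monte Carlo disambiguation can be sketched as follows: a parse may have several derivations, so its probability is the sum over them, and sampling random derivations and counting the resulting parses estimates which parse is most probable. The derivation table, probabilities, and function name below are hypothetical, and the sketch samples from an explicit list rather than a parse forest; it is not the algorithm developed in the thesis.

```python
import random
from collections import Counter

# Hypothetical derivation table: (resulting parse, derivation probability).
# parse_B owns the single most probable derivation (0.4), but parse_A has
# the larger total probability (0.2 + 0.2 + 0.2 = 0.6).
DERIVATIONS = [
    ("parse_A", 0.2),
    ("parse_A", 0.2),
    ("parse_A", 0.2),
    ("parse_B", 0.4),
]


def most_probable_parse(derivations, n_samples, rng):
    """Sample derivations by probability and return the most frequent parse."""
    parses = [parse for parse, _ in derivations]
    weights = [prob for _, prob in derivations]
    sampled = rng.choices(parses, weights=weights, k=n_samples)
    return Counter(sampled).most_common(1)[0][0]


best = most_probable_parse(DERIVATIONS, 5000, random.Random(0))
print(best)
```

The toy table is chosen so that the most probable derivation and the most probable parse differ, which is the situation that motivates sampling over derivations in the first place.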
A simple DOP model for constituency parsing of Italian sentences
Abstract. We present a simplified Data-Oriented Parsing (DOP) formalism for learning the constituency structure of Italian sentences. In our approach we try to simplify the original DOP methodology by constraining the number and type of fragments we extract from the training corpus. We provide some examples of the types of constructions that occur more often in the treebank, and quantify the performance of our grammar on the constituency parsing task. Keywords: Data-Oriented Parsing, Tree substitution grammar, statistical model, fragments, kernel methods.

