Results 1 - 10
of
18
An Efficient Implementation of a New DOP Model
- In EACL
, 2003
"... Two apparently opposing DOP models exist in the literature: one which computes the parse tree involving the most frequent subtrees from a treebank and one which computes the parse tree involving the fewest subtrees from a treebank. This paper proposes an integration of the two models which ou ..."
Abstract
-
Cited by 27 (6 self)
- Add to MetaCart
Two apparently opposing DOP models exist in the literature: one which computes the parse tree involving the most frequent subtrees from a treebank and one which computes the parse tree involving the fewest subtrees from a treebank. This paper proposes an integration of the two models which outperforms each of them separately. Together with a PCFGreduction of DOP we obtain improved accuracy and efficiency on the Wall Street Journal treebank. Our results show an 11% relative reduction in error rate over previous models, and an average processing time of 3.6 seconds per WSJ sentence.
What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy?
- IN PROCEEDINGS OF ACL 2001
, 2001
"... We aim at finding the minimal set of fragments which achieves maximal parse accuracy in Data Oriented Parsing. Experiments with the Penn Wall Street Journal treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over previo ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
We aim at finding the minimal set of fragments which achieves maximal parse accuracy in Data Oriented Parsing. Experiments with the Penn Wall Street Journal treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over previous models tested on this treebank (a precision of 90.8% and a recall of 90.6%). We isolate some dependency relations which previous models neglect but which contribute to higher parse accuracy.
Seeing the Wood for the Trees: Data-Oriented Translation
- In Machine Translation Summit IX
, 2003
"... Data-Oriented Translation (DOT), which is based on Data-Oriented Parsing (DOP), comprises an experience-based approach to translation, where new translations are derived with reference to grammatical analyses of previous translations. Previous DOT experiments [Poutsma, 1998, Poutsma, 2000a, Poutsma, ..."
Abstract
-
Cited by 19 (6 self)
- Add to MetaCart
Data-Oriented Translation (DOT), which is based on Data-Oriented Parsing (DOP), comprises an experience-based approach to translation, where new translations are derived with reference to grammatical analyses of previous translations. Previous DOT experiments [Poutsma, 1998, Poutsma, 2000a, Poutsma, 2000b] were small in scale because important advances in DOP technology were not incorporated into the translation model. Despite this, related work [Way, 1999, Way, 2003a, Way, 2003b] reports that DOT models are viable in that solutions to ‘hard ’ translation cases are readily available. However, it has not been shown to date that DOT models scale to larger datasets. In this work, we describe a novel DOT system, inspired by recent advances in DOP parsing technology. We test our system on larger, more complex corpora than have been used heretofore, and present both automatic and human evaluations which show that high quality translations can be achieved at reasonable speeds.
Data-Oriented Models of Parsing and Translation
, 2005
"... A dissertation submitted in fulfilment of the requirements for the award of ..."
Abstract
-
Cited by 12 (2 self)
- Add to MetaCart
A dissertation submitted in fulfilment of the requirements for the award of
Context-Sensitive Spoken Dialogue Processing with the DOP Model
- Natural Language Engineering
, 1999
"... We show how the DOP model can be used for fast and robust context-sensitive processing of spoken input in a practical spoken dialogue system called OVIS. OVIS, Openbaar Vervoer Informatie Systeem ("Public Transport Information System"), is a Dutch spoken language information system which operates ov ..."
Abstract
-
Cited by 9 (8 self)
- Add to MetaCart
We show how the DOP model can be used for fast and robust context-sensitive processing of spoken input in a practical spoken dialogue system called OVIS. OVIS, Openbaar Vervoer Informatie Systeem ("Public Transport Information System"), is a Dutch spoken language information system which operates over ordinary telephone lines. The prototype system is the immediate goal of the NWO Priority Programme "Language and Speech Technology". In this paper, we extend the original DOP model to context-sensitive interpretation of spoken input. The system we describe uses the OVIS corpus (which consists of 10,000 trees enriched with compositional semantics) to compute from an input word-graph the best utterance together with its meaning. Dialogue context is taken into account by dividing up the OVIS corpus into contextdependent subcorpora. Each system question triggers a subcorpus by which the user answer is analyzed and interpreted. Our experiments indicate that the context-sensitive DOP model obtains better accuracy than the original model, allowing for fast and robust processing of spoken input.
An Empirical Evaluation of LFG-DOP
- In Proceedings of the 19th International Conference on Computational Linguistics
, 2000
"... This paper presents an empirical assessment of the LFG-DOP model introduced by Bod & Kaplan (1998). The parser we describe uses fragments from LFG-annotated sentences to parse new sentences and Monte Carlo techniques to compute the most probable parse. While our main goal is to test Bod & Kaplan's m ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
This paper presents an empirical assessment of the LFG-DOP model introduced by Bod & Kaplan (1998). The parser we describe uses fragments from LFG-annotated sentences to parse new sentences and Monte Carlo techniques to compute the most probable parse. While our main goal is to test Bod & Kaplan's model, we will also test a version of LFG-DOP which treats generalized fragments as previously unseen events. Experiments with the Verbmobil and Homecentre corpora show that our version of LFG-DOP outperforms Bod & Kaplan's model, and that LFG's functional information improves the parse accuracy of tree structures. 1
An Improved Parser for Data-Oriented Lexical-Functional Analysis
- In Proceedings of the 38th Conference of the Association for Computational Linguistics
"... We present an LFG-DOP parser which uses fragments from LFG-annotated sentences to parse new sentences. Experiments with the Verbmobil and Homecentre corpora show that (1) Viterbi n best search performs about 100 times faster than Monte Carlo search while both achieve the same accuracy; (2) the DOP h ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
We present an LFG-DOP parser which uses fragments from LFG-annotated sentences to parse new sentences. Experiments with the Verbmobil and Homecentre corpora show that (1) Viterbi n best search performs about 100 times faster than Monte Carlo search while both achieve the same accuracy; (2) the DOP hypothesis which states that parse accuracy increases with increasing fragment size is confirmed for LFG-DOP; (3) LFG-DOP's relative frequency estimator performs worse than a discounted frequency estimator; and (4) LFG-DOP significantly outperforms Tree-DOP if evaluated on tree structures only. 1
Bootstrapping Structure using Similarity
- Computational Linguistics in the Netherlands
, 1999
"... In this paper a new similarity-based learning algorithm, inspired by string edit-distance (Wagner and Fischer, 1974), is applied to the problem of bootstrapping structure from scratch. The algorithm takes a corpus of unannotated sentences as input and returns a corpus of bracketed sentences. The met ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
In this paper a new similarity-based learning algorithm, inspired by string edit-distance (Wagner and Fischer, 1974), is applied to the problem of bootstrapping structure from scratch. The algorithm takes a corpus of unannotated sentences as input and returns a corpus of bracketed sentences. The method works on pairs of unstructured sentences or sentences partially bracketed by the algorithm that have one or more words in common. It finds parts of sentences that are interchangeable (i.e. the parts of the sentences that are different in both sentences) . These parts are taken as possible constituents of the same type. While this corresponds to the basic bootstrapping step of the algorithm, further structure may be learned from comparison with other (similar) sentences. We used this method for bootstrapping structure from the flat sentences of the Penn Treebank ATIS corpus, and compared the resulting structured sentences to the structured sentences in the ATIS corpus. Similarly, the alg...
Do All Fragments Count?
- Natural Language Engineering
, 2003
"... We aim at finding the minimal set of fragments which achieves maximal parse accuracy in Data Oriented Parsing (DOP). Experiments with the Penn Wall Street Journal (WSJ) treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over p ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
We aim at finding the minimal set of fragments which achieves maximal parse accuracy in Data Oriented Parsing (DOP). Experiments with the Penn Wall Street Journal (WSJ) treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over previous models tested on this treebank. We isolate a number of dependency relations which previous models neglect but which contribute to higher accuracy. We show that the history of statistical parsing models displays a tendency towards using more and larger fragments from training data.
A Data-Oriented Parsing Model for Lexical-Functional Grammar
- In Data-Oriented Parsing, ed. by Rens Bod, Remko Scha, & Khalil Sima’an
, 2003
"... Data-Oriented Parsing (DOP) models of natural language propose that human language processing works with representations of concrete past language experiences rather than with abstract linguistic rules. These models operate by decomposing the given representations into fragments and recomposing t ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Data-Oriented Parsing (DOP) models of natural language propose that human language processing works with representations of concrete past language experiences rather than with abstract linguistic rules. These models operate by decomposing the given representations into fragments and recomposing those pieces to analyze new utterances. A probability model is used to select from all possible analyses of an utterance the most likely one. Previous DOP models were based on simple tree representations that neglect grammatical functions and syntactic features (Tree-DOP). In this paper, we present a new DOP model based on the more articulated representations of Lexical-Functional Grammar theory (LFG-DOP). LFG-DOP triggers a new, corpus-based notion of grammaticality, and an interestingly different class of probability models. An empirical evaluation of the model shows that larger as well as richer fragments improve performance. Finally, we go into some of the conceptual implications of our approach. 1

