Results 1 - 8 of 8
A probabilistic corpus-driven model for lexical-functional analysis
- Proceedings COLING-ACL'98
, 1998
Cited by 56 (15 self)
rens.bod@let.uva.nl We develop a Data-Oriented Parsing (DOP) model based on the syntactic representations of Lexical-Functional Grammar (LFG). We start by summarizing the original DOP model for tree representations and then show how it can be extended with corresponding functional structures. The resulting LFG-DOP model triggers a new, corpus-based notion of grammaticality, and its probability models exhibit interesting behavior with respect to specificity and the interpretation of ill-formed strings.
A DOP Model for Semantic Interpretation
- Proceedings ACL/EACL-97
, 1997
Cited by 31 (13 self)
In data-oriented language processing, an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new sentence is constructed by combining fragments from the corpus in the most probable way. This approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the Penn Treebank. If a corpus with semantically annotated sentences is used, the same approach can also generate the most probable semantic interpretation of an input sentence. The present paper explains this semantic interpretation method. A data-oriented semantic interpretation algorithm was tested on two semantically annotated corpora: the English ATIS corpus and the Dutch OVIS corpus.
Data-Oriented Language Processing -- An Overview
- Corpus-Based Methods in Language and Speech Processing
, 1997
Cited by 13 (2 self)
Data-oriented models of language processing embody the assumption that human language perception and production works with representations of concrete past language experiences, rather than with abstract grammar rules. Such models therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. This paper motivates the idea of data-oriented language processing by considering the problem of syntactic disambiguation. One relatively simple parsing/disambiguation model that implements this idea is described in some detail. This model assumes a corpus of utterances annotated with labelled phrase-structure trees, and parses new input by combining subtrees from the corpus; it selects the most probable parse of an input utterance by considering the sum of the probabilities of all its derivations. The paper discusses some experiments carried out with this model. Finally, it reviews some other models that instantiate the data-oriented processing approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting subtrees from the corpus or employ different disambiguation strategies; other models use richer formalisms for their corpus annotations.
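As a rough illustration of the disambiguation rule described in the abstract above (a parse is scored by the sum of the probabilities of all its derivations), the following is a minimal Python sketch. It assumes, as in the usual DOP formulation, that a fragment's probability is its relative frequency among fragments with the same root label and that a derivation's probability is the product of its fragments' probabilities; the abstract itself only states that occurrence frequencies are used. The fragment names, counts, and function names are invented for illustration and are not taken from any of the listed papers.

```python
from collections import defaultdict
from math import prod


def fragment_probabilities(counts):
    """Relative frequency of each fragment among fragments sharing its root label.

    counts maps (root_label, fragment_name) -> number of occurrences.
    Normalizing per root label is an assumption of this sketch.
    """
    totals = defaultdict(int)
    for (root, _name), n in counts.items():
        totals[root] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}


def derivation_probability(derivation, probs):
    """Product of the probabilities of the fragments used in one derivation."""
    return prod(probs[fragment] for fragment in derivation)


def parse_probability(derivations, probs):
    """Sum over all derivations that yield the same parse tree."""
    return sum(derivation_probability(d, probs) for d in derivations)


# Toy, hand-made fragment counts (hypothetical; not a real subtree extraction).
counts = {
    ("S", "S(NP VP)"): 2,
    ("S", "S(NP(john) VP)"): 1,
    ("S", "S(NP(john) VP(sleeps))"): 1,
    ("NP", "NP(john)"): 1,
    ("NP", "NP(mary)"): 1,
    ("VP", "VP(sleeps)"): 2,
}
probs = fragment_probabilities(counts)

# Three different derivations of the same parse tree for "john sleeps".
derivations = [
    [("S", "S(NP VP)"), ("NP", "NP(john)"), ("VP", "VP(sleeps)")],
    [("S", "S(NP(john) VP)"), ("VP", "VP(sleeps)")],
    [("S", "S(NP(john) VP(sleeps))")],
]

print(parse_probability(derivations, probs))  # 0.75
```

Each of the three derivations contributes 0.25 here, so choosing the single most probable derivation would understate the parse's score; summing over derivations is what the abstract describes.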
Context-Sensitive Spoken Dialogue Processing with the DOP Model
- Natural Language Engineering
, 1999
Cited by 9 (8 self)
We show how the DOP model can be used for fast and robust context-sensitive processing of spoken input in a practical spoken dialogue system called OVIS. OVIS, Openbaar Vervoer Informatie Systeem ("Public Transport Information System"), is a Dutch spoken language information system which operates over ordinary telephone lines. The prototype system is the immediate goal of the NWO Priority Programme "Language and Speech Technology". In this paper, we extend the original DOP model to context-sensitive interpretation of spoken input. The system we describe uses the OVIS corpus (which consists of 10,000 trees enriched with compositional semantics) to compute from an input word-graph the best utterance together with its meaning. Dialogue context is taken into account by dividing up the OVIS corpus into context-dependent subcorpora. Each system question triggers a subcorpus by which the user answer is analyzed and interpreted. Our experiments indicate that the context-sensitive DOP model obtains better accuracy than the original model, allowing for fast and robust processing of spoken input.
Spoken Dialogue Interpretation with the DOP Model
, 1998
Cited by 9 (2 self)
We show how the DOP model can be used for fast and robust processing of spoken input in a practical spoken dialogue system called OVIS. OVIS, Openbaar Vervoer Informatie Systeem ("Public Transport Information System"), is a Dutch spoken language information system which operates over ordinary telephone lines. The prototype system is the immediate goal of the NWO Priority Programme "Language and Speech Technology". In this paper, we extend the original DOP model to context-sensitive interpretation of spoken input. The system we describe uses the OVIS corpus (10,000 trees enriched with compositional semantics) to compute from an input word-graph the best utterance together with its meaning. Dialogue context is taken into account by dividing up the OVIS corpus into context-dependent subcorpora. Each system question triggers a subcorpus by which the user answer is analyzed and interpreted. Our experiments indicate that the context-sensitive DOP model obtains better accuracy than the original model, allowing for fast and robust processing of spoken input.
Efficient Algorithms for Parsing the DOP Model? -- A Reply to Joshua Goodman
- In Proceedings of the Conference on Empirical Methods in Natural Language Processing
, 1996
Two Questions about Data-Oriented Parsing
- In Proceedings Fourth Workshop on Very Large Corpora
, 1996
Cited by 5 (1 self)
In this paper I present ongoing work on the data-oriented parsing (DOP) model. In previous work, DOP was tested on a cleaned-up set of analyzed part-of-speech strings from the Penn Treebank, achieving excellent test results. This left, however, two important questions unanswered: (1) how does DOP perform if tested on unedited data, and (2) how can DOP be used for parsing word strings that contain unknown words? This paper addresses these questions. We show that parse results on unedited data are worse than on cleaned-up data, although still very competitive if compared to other models. As to the parsing of word strings, we show that the hardness of the problem does not so much depend on unknown words, but on previously unseen lexical categories of known words. We give a novel method for parsing these words by estimating the probabilities of unknown subtrees. The method is of general interest since it shows that good performance can be obtained without the use of a part-of-speech tagger. To the best of our knowledge, our method outperforms other statistical parsers tested on Penn Treebank word strings.
Efficient Algorithms for Parsing the DOP Model? A Reply to
, 1996
This note is a reply to Joshua Goodman's paper "Efficient Algorithms for Parsing the DOP Model" (Goodman, 1996). In his paper, Goodman makes a number of claims about (my work on) the Data-Oriented Parsing model (Bod, 1992-1996). This note shows that some of these claims must be mistaken.
1. Goodman did not find an efficient algorithm for "DOP"
Goodman claims to have found an efficient polynomial algorithm for parsing the DOP model. However, Goodman's model generates best parse trees that can never be produced by the DOP model. In the introduction of his paper, Goodman observes that the DOP model ".. can be summarized as a special kind of Stochastic Tree Substitution Grammar (STSG): given a bracketed, labeled training corpus, let every subtree of that corpus be an elementary tree, with a probability proportional to the number of occurrences of that subtree in the training corpus." Goodman then neglects to add that according to the DOP model, the "preferred" or "best" parse tree of a sentence is the most probable parse tree of that sentence. This definition is found in all my publications about DOP (e.g. Bod, 1992-96; van den Berg, Bod and Scha, 1994; Bod and Scha,

