A DOP Model for Semantic Interpretation
Proceedings ACL/EACL-97, 1997
Cited by 31 (13 self)
In data-oriented language processing, an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new sentence is constructed by combining fragments from the corpus in the most probable way. This approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the Penn Treebank. If a corpus with semantically annotated sentences is used, the same approach can also generate the most probable semantic interpretation of an input sentence. The present paper explains this semantic interpretation method. A data-oriented semantic interpretation algorithm was tested on two semantically annotated corpora: the English ATIS corpus and the Dutch OVIS corpus.
Data-Oriented Language Processing -- An Overview
Corpus-Based Methods in Language and Speech Processing, 1997
Cited by 13 (2 self)
Data-oriented models of language processing embody the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract grammar rules. Such models therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence frequencies of the fragments are used to estimate which analysis is the most probable one. This paper motivates the idea of data-oriented language processing by considering the problem of syntactic disambiguation. One relatively simple parsing/disambiguation model that implements this idea is described in some detail. This model assumes a corpus of utterances annotated with labelled phrase-structure trees, and parses new input by combining subtrees from the corpus; it selects the most probable parse of an input utterance by considering the sum of the probabilities of all its derivations. The paper discusses some experiments carried out with this model. Finally, it reviews some other models that instantiate the data-oriented processing approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting subtrees from the corpus or employ different disambiguation strategies; other models use richer formalisms for their corpus annotations.
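The subtree-combination model summarized above can be illustrated with a small sketch. It assumes trees encoded as nested Python tuples and uses the relative-frequency estimate commonly associated with DOP1: a fragment's probability is its corpus count divided by the count of all fragments with the same root label. A derivation's probability is the product of its fragment probabilities, and a tree's probability is the sum over all of its derivations. The function names (`split_at`, `fragment_probabilities`, `parse_probability`) are hypothetical. The code only scores a given candidate tree; it does not search for candidate parses, and its explicit enumeration is exponential in tree size, so it is suitable only for toy examples.

```python
from collections import Counter
from itertools import product
from math import prod

# Trees are tuples (label, child, ...); a child is a subtree or a word (str).
# Inside a fragment, a 1-tuple such as ("NP",) marks an open substitution site.

def split_at(node):
    """Yield (fragment, cut_subtrees) for every fragment rooted at `node`.
    Each nonterminal child is either cut off (left as an open site whose
    subtree must be derived separately) or kept and expanded further."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):                  # words are always kept
            options.append([(child, [])])
        else:
            cut = ((child[0],), [child])            # open site + pending subtree
            options.append([cut] + list(split_at(child)))
    for combo in product(*options):
        fragment = (label, *(part for part, _ in combo))
        cuts = [t for _, pending in combo for t in pending]
        yield fragment, cuts

def nodes(tree):
    """All nonterminal nodes of a tree, root first."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)

def fragment_probabilities(corpus):
    """Relative frequency of each fragment among fragments with the same root."""
    counts = Counter(f for tree in corpus for n in nodes(tree)
                     for f, _ in split_at(n))
    per_root = Counter()
    for f, c in counts.items():
        per_root[f[0]] += c
    return {f: c / per_root[f[0]] for f, c in counts.items()}

def parse_probability(tree, probs):
    """Sum over all derivations of `tree`: each derivation picks a fragment at
    the root, then recursively derives every subtree that fragment leaves open."""
    return sum(probs.get(f, 0.0) * prod(parse_probability(c, probs) for c in cuts)
               for f, cuts in split_at(tree))
```

With a one-tree corpus containing `("S", ("NP", "John"), ("VP", ("V", "likes"), ("NP", "Mary")))`, that tree scores 9/16 rather than 1, because the same fragments can be recombined into other trees; the tree with subject and object swapped, for instance, scores 1/16.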
Context-Sensitive Spoken Dialogue Processing with the DOP Model
Natural Language Engineering, 1999
Cited by 9 (8 self)
We show how the DOP model can be used for fast and robust context-sensitive processing of spoken input in a practical spoken dialogue system called OVIS. OVIS, Openbaar Vervoer Informatie Systeem ("Public Transport Information System"), is a Dutch spoken language information system which operates over ordinary telephone lines. The prototype system is the immediate goal of the NWO Priority Programme "Language and Speech Technology". In this paper, we extend the original DOP model to context-sensitive interpretation of spoken input. The system we describe uses the OVIS corpus (which consists of 10,000 trees enriched with compositional semantics) to compute from an input word-graph the best utterance together with its meaning. Dialogue context is taken into account by dividing up the OVIS corpus into context-dependent subcorpora. Each system question triggers a subcorpus by which the user answer is analyzed and interpreted. Our experiments indicate that the context-sensitive DOP model obtains better accuracy than the original model, allowing for fast and robust processing of spoken input.
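The division of the corpus into context-dependent subcorpora can be pictured as a lookup from the preceding system question to a subcorpus. This is a minimal sketch, assuming each corpus entry is a pair of a question type and an annotated answer tree; the function names, the question-type labels, and the fallback to the full corpus for an unseen question type are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict

def build_subcorpora(dialogue_corpus):
    """Group annotated user answers by the type of system question they answer."""
    subcorpora = defaultdict(list)
    for question_type, tree in dialogue_corpus:
        subcorpora[question_type].append(tree)
    return dict(subcorpora)

def corpus_for(question_type, subcorpora):
    """Return the subcorpus triggered by the current system question.
    Falling back to the whole corpus for an unseen question type is an
    assumption of this sketch."""
    if question_type in subcorpora:
        return subcorpora[question_type]
    return [tree for trees in subcorpora.values() for tree in trees]
```

The selected subcorpus would then play the role of the stochastic grammar when the user's answer is analyzed, so that answers to, say, a hypothetical `"ask_time"` question are interpreted only against earlier answers to that kind of question.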
Do All Fragments Count?
Natural Language Engineering, 2003
Cited by 4 (1 self)
We aim at finding the minimal set of fragments which achieves maximal parse accuracy in Data Oriented Parsing (DOP). Experiments with the Penn Wall Street Journal (WSJ) treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over previous models tested on this treebank. We isolate a number of dependency relations which previous models neglect but which contribute to higher accuracy. We show that the history of statistical parsing models displays a tendency towards using more and larger fragments from training data.
A Data-Oriented Parsing Model for Lexical-Functional Grammar
In Data-Oriented Parsing, ed. by Rens Bod, Remko Scha, & Khalil Sima’an, 2003
Cited by 4 (0 self)
Data-Oriented Parsing (DOP) models of natural language propose that human language processing works with representations of concrete past language experiences rather than with abstract linguistic rules. These models operate by decomposing the given representations into fragments and recomposing those pieces to analyze new utterances. A probability model is used to select from all possible analyses of an utterance the most likely one. Previous DOP models were based on simple tree representations that neglect grammatical functions and syntactic features (Tree-DOP). In this paper, we present a new DOP model based on the more articulated representations of Lexical-Functional Grammar theory (LFG-DOP). LFG-DOP triggers a new, corpus-based notion of grammaticality, and an interestingly different class of probability models. An empirical evaluation of the model shows that larger as well as richer fragments improve performance. Finally, we go into some of the conceptual implications of our approach.
Towards a general model of applying science
Intl. Studies in the Philosophy of Science, 2006
Cited by 2 (1 self)
How is scientific knowledge used, adapted, and extended in deriving phenomena and real-world systems? This paper aims at developing a general account of ‘applying science’ within the exemplar-based framework of Data-Oriented Processing (DOP), which is also known as Exemplar-Based Explanation (EBE). According to the exemplar-based paradigm, phenomena are explained not by deriving them all the way down from theoretical laws and boundary conditions but by modelling them on previously derived phenomena that function as exemplars. To accomplish this, DOP proposes to maintain a corpus of derivation trees of previous phenomena together with a matching algorithm that combines subtrees from the corpus to derive new phenomena. By using a notion of derivational similarity, a new phenomenon can be modelled as closely as possible on previously explained phenomena. I will propose an instantiation of DOP which integrates theoretical and phenomenological modelling and which generalises over various disciplines, from fluid mechanics to language technology. I argue that DOP provides a solution for what I call Kuhn’s problem and that it redresses Kitcher’s account of explanation.
Towards Unifying Perception and Cognition: The Ubiquity of Trees
Prepublication, 2005
Cited by 1 (0 self)
Is there a single mechanism that underlies all perceptual and cognitive processing? This paper aims to solve a small part of Newell's challenge (A. Newell 1990, Unified Theories of Cognition, Harvard University Press) and proposes a model that unifies three different modalities: language, music and problem-solving. In doing so, we will focus on tree structures. Trees are ubiquitous in modeling high-level perception and cognition and have been used to represent grouping structures in linguistic, musical and visual perception and deductive structures in reasoning, learning and problem solving. We will show that an instantiation of the Data-Oriented Parsing (DOP) framework can accurately predict the correct tree structure for linguistic utterances, musical pieces and physics problems. The key idea of the DOP framework is that new input is analyzed by combining subtrees from a representative corpus of previous trees. While the labeling of the trees and the details of the combination operation may differ across the modalities, we argue that there is one model for predicting the tree that humans come up with. We report on experiments with manually annotated corpora for the three modalities, showing that the best performing model is the one which takes into account subtrees of arbitrary size and which selects the most probable tree from among the shortest derivations of an input.
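The selection criterion in the last sentence can be sketched as follows, assuming the derivations of an input have already been enumerated, each as a pair of a candidate tree and the probabilities of the fragments it combines. The sketch keeps only the derivations that use the fewest fragments and then, as one reading of "the most probable tree from among the shortest derivations", sums the probabilities of those derivations per tree. The function name and input format are hypothetical.

```python
from collections import defaultdict
from math import prod

def best_tree_among_shortest(derivations):
    """derivations: (tree, [fragment probabilities]) pairs, one per derivation.
    Keep only derivations that use the fewest fragments, add up their
    probabilities per tree, and return the tree with the highest total."""
    derivations = list(derivations)
    shortest = min(len(probs) for _, probs in derivations)
    totals = defaultdict(float)
    for tree, probs in derivations:
        if len(probs) == shortest:
            totals[tree] += prod(probs)
    return max(totals, key=totals.get)
```

Because length is compared before probability, a one-fragment derivation with probability 0.1 beats a two-fragment derivation whose product is 0.81: shorter derivations, built from larger corpus subtrees, take priority.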
From exemplar to grammar: Integrating analogy and probability in language learning
2008
Cited by 1 (1 self)
We present a new model of language learning which is based on the following idea: if a language learner does not know which phrase-structure trees should be assigned to initial sentences, s/he allows (implicitly) for all possible trees and lets linguistic experience decide which is the ‘best’ tree for each sentence. The best tree is obtained by maximizing ‘structural analogy’ between a sentence and previous sentences, which is formalized by the most probable shortest combination of subtrees from all trees of previous sentences. Corpus-based experiments with this model on the Penn Treebank and the CHILDES database indicate that it can learn both exemplar-based and rule-based aspects of language, ranging from phrasal verbs to auxiliary fronting. By having learned the syntactic structures of sentences, we have also learned the grammar implicit in these structures, which can in turn be used to produce new sentences. We show that our model mimics children’s language development from item-based constructions to abstract constructions, and that the model can simulate some of the errors made by children in producing complex questions.

