Results 1 - 4 of 4
A DOP Model for Semantic Interpretation
- Proceedings ACL/EACL-97, 1997
Abstract - Cited by 31 (13 self)
In data-oriented language processing, an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new sentence is constructed by combining fragments from the corpus in the most probable way. This approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the Penn Treebank. If a corpus with semantically annotated sentences is used, the same approach can also generate the most probable semantic interpretation of an input sentence. The present paper explains this semantic interpretation method. A data-oriented semantic interpretation algorithm was tested on two semantically annotated corpora: the English ATIS corpus and the Dutch OVIS corpus.
Data-Oriented Language Processing -- An Overview
- Corpus-Based Methods in Language and Speech Processing, 1997
Abstract - Cited by 13 (2 self)
Data-oriented models of language processing embody the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract grammar rules. Such models therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence frequencies of the fragments are used to estimate which analysis is the most probable one. This paper motivates the idea of data-oriented language processing by considering the problem of syntactic disambiguation. One relatively simple parsing/disambiguation model that implements this idea is described in some detail. This model assumes a corpus of utterances annotated with labelled phrase-structure trees, and parses new input by combining subtrees from the corpus; it selects the most probable parse of an input utterance by considering the sum of the probabilities of all its derivations. The paper discusses some experiments carried out with this model. Finally, it reviews some other models that instantiate the data-oriented processing approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting subtrees from the corpus or employ different disambiguation strategies; other models use richer formalisms for their corpus annotations.
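The scoring idea in this abstract (a parse is ranked by the sum of the probabilities of all its derivations) can be illustrated with a small sketch. The abstract does not say how fragment probabilities are estimated, so the sketch assumes a relative-frequency estimate: a fragment's count divided by the total count of fragments sharing its root label. The fragment names and counts are invented toy data, and the derivations are given by hand rather than produced by a parser.

```python
from collections import defaultdict
from math import prod


def fragment_probabilities(counts):
    """Turn fragment counts into probabilities.

    counts maps (root_label, fragment_name) -> corpus frequency.
    Assumption: each count is normalised by the total count of all
    fragments with the same root label.
    """
    totals = defaultdict(int)
    for (root, _name), n in counts.items():
        totals[root] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}


def derivation_probability(derivation, probs):
    """A derivation is a sequence of fragments; multiply their probabilities."""
    return prod(probs[fragment] for fragment in derivation)


def parse_probability(derivations, probs):
    """Sum over every derivation that yields the same parse tree."""
    return sum(derivation_probability(d, probs) for d in derivations)


# Hypothetical toy corpus statistics.
counts = {
    ("S", "s_full"): 1,      # whole tree stored as one fragment
    ("S", "s_open_np"): 1,   # tree with an open NP substitution site
    ("NP", "np_she"): 2,
    ("NP", "np_it"): 2,
}
probs = fragment_probabilities(counts)

# Two derivations of the same parse: one fragment, or two combined.
derivations = [
    [("S", "s_full")],
    [("S", "s_open_np"), ("NP", "np_she")],
]
p = parse_probability(derivations, probs)
```

In this toy case the single-fragment derivation contributes 0.5 and the two-fragment derivation contributes 0.5 × 0.5, so the parse scores higher than either derivation alone. That is the point of summing over derivations rather than taking the single best one.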
Two Questions about Data-Oriented Parsing
- In Proceedings Fourth Workshop on Very Large Corpora, 1996
Abstract - Cited by 5 (1 self)
In this paper I present ongoing work on the data-oriented parsing (DOP) model. In previous work, DOP was tested on a cleaned-up set of analyzed part-of-speech strings from the Penn Treebank, achieving excellent test results. This left, however, two important questions unanswered: (1) how does DOP perform if tested on unedited data, and (2) how can DOP be used for parsing word strings that contain unknown words? This paper addresses these questions. We show that parse results on unedited data are worse than on cleaned-up data, although still very competitive if compared to other models. As to the parsing of word strings, we show that the hardness of the problem does not so much depend on unknown words, but on previously unseen lexical categories of known words. We give a novel method for parsing these words by estimating the probabilities of unknown subtrees. The method is of general interest since it shows that good performance can be obtained without the use of a part-of-speech tagger. To the best of our knowledge, our method outperforms other statistical parsers tested on Penn Treebank word strings.
Textual Similarities based on a Distributional Approach
- In Proceedings of the Tenth International Workshop on Database and Expert Systems Applications (DEXA99), 1999
Abstract - Cited by 3 (0 self)
The design of efficient textual similarities is an important issue in the domain of textual data exploration. Textual similarities are for example central in document collection structuring (e.g. clustering), or in Information Retrieval (IR) which relies on the computation of textual similarities for measuring the adequacy between a query and documents. The objective of this paper is to present and compare several textual similarity measures in the framework of the Distributional Semantics (DS) model for IR. This model is an extension of the standard Vector Space model, which further takes the co-frequencies between the terms in a given reference corpus into account. These co-frequencies are considered to provide a distributional representation of the "semantics" of the terms. The co-occurrence profiles are used to represent the documents as vectors. Practical retrieval experiments using DS-based similarity models have been conducted in the framework of the AMARYLLIS evaluation campaig...
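As a rough illustration of the representation this abstract describes, the sketch below builds co-occurrence profiles from a reference corpus and represents each document through the profiles of its terms. Several details are assumptions not stated in the abstract: co-occurrence is counted at sentence level, a document vector is the term-frequency-weighted sum of its terms' profiles, and cosine is used as the similarity (the paper compares several measures). The corpus and documents are invented toy data.

```python
from collections import Counter
from math import sqrt


def cooccurrence_profiles(corpus):
    """Map each term to a Counter of the terms it shares a sentence with."""
    profiles = {}
    for sentence in corpus:
        terms = set(sentence)
        for term in terms:
            profiles.setdefault(term, Counter()).update(terms - {term})
    return profiles


def document_vector(doc, profiles):
    """Assumed weighting: sum of term profiles, scaled by term frequency."""
    vec = Counter()
    for term, tf in Counter(doc).items():
        for context, c in profiles.get(term, {}).items():
            vec[context] += tf * c
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical reference corpus: one list of terms per sentence.
corpus = [
    ["car", "engine"],
    ["automobile", "engine"],
    ["banana", "fruit"],
]
profiles = cooccurrence_profiles(corpus)

car = document_vector(["car"], profiles)
automobile = document_vector(["automobile"], profiles)
banana = document_vector(["banana"], profiles)

sim_related = cosine(car, automobile)
sim_unrelated = cosine(car, banana)
```

Here the documents "car" and "automobile" share no terms, so a plain term-overlap vector space comparison would score them as unrelated. Both co-occur with "engine" in the reference corpus, so their profile-based vectors coincide, while "banana" stays orthogonal to both.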

