Selectional Preference and Sense Disambiguation
, 1997
Cited by 96 (4 self)
The absence of training data is a real problem for corpus-based approaches to sense disambiguation, one that is unlikely to be solved soon. Selectional preference is traditionally connected with sense ambiguity; this paper explores how a statistical model of selectional preference, requiring neither manual annotation of selection restrictions nor supervised training, can be used in sense disambiguation.
Generalizing Case Frames Using a Thesaurus and the MDL Principle
- Computational Linguistics
, 1998
Cited by 95 (4 self)
In this paper, we confine ourselves to the former issue, and refer the interested reader to Li and Abe (1996), which deals with the latter issue.
Word sense disambiguation: The state of the art
- Computational Linguistics
, 1998
Cited by 92 (3 self)
The automatic disambiguation of word senses has been an interest and concern since the earliest days of computer treatment of language in the 1950s. Sense disambiguation is an “intermediate task” (Wilks and Stevenson, 1996) which is not an end in itself, but rather is necessary at one level or another to accomplish most natural language processing tasks. It is
Distinguishing Systems and Distinguishing Senses: New Evaluation Methods for Word Sense Disambiguation
, 1998
Cited by 88 (8 self)
Resnik and Yarowsky (1997) made a set of observations about the state of the art in automatic word sense disambiguation and, motivated by those observations, offered several specific proposals regarding improved evaluation criteria, common training and testing resources, and the definition of sense inventories. Subsequent discussion of those proposals resulted in SENSEVAL, the first evaluation exercise for word sense disambiguation (Kilgarriff and Palmer forthcoming). This article is a revised and extended version of our 1997 workshop paper, reviewing its observations and proposals and discussing them in light of the SENSEVAL exercise. It also includes a new in-depth empirical study of translingually-based sense inventories and distance measures, using statistics collected from native-speaker annotations of 222 polysemous contexts across 12 languages. These data show that monolingual sense distinctions at most levels of granularity can be effectively captured by translations into some ...
Selectional constraints: an information-theoretic model and its computational realization
, 1996
Inducing a Semantically Annotated Lexicon via EM-Based Clustering
, 1999
Cited by 68 (6 self)
We present a technique for automatic induction of slot annotations for subcategorization frames, based on induction of hidden classes in the EM framework of statistical estimation. The models are empirically evaluated by a general decision test. Induction of slot labeling for subcategorization frames is accomplished by a further application of EM, and applied experimentally on frame observations derived from parsing large corpora. We outline an interpretation of the learned representations as theoretical-linguistic decompositional lexical entries.
Class-Based Probability Estimation Using a Semantic Hierarchy
- Computational Linguistics
, 2003
Cited by 65 (1 self)
This article concerns the estimation of a particular kind of probability, namely, the probability of a noun sense appearing as a particular argument of a predicate. In order to overcome the accompanying sparse-data problem, the proposal here is to define the probabilities in terms of senses from a semantic hierarchy and exploit the fact that the senses can be grouped into classes consisting of semantically similar senses. There is a particular focus on the problem of how to determine a suitable class for a given sense, or, alternatively, how to determine a suitable level of generalization in the hierarchy. A procedure is developed that uses a chi-square test to determine a suitable level of generalization. In order to test the performance of the estimation method, a pseudo-disambiguation task is used, together with two alternative estimation methods. Each method uses a different generalization procedure; the first alternative uses the minimum description length principle, and the second uses Resnik's measure of selectional preference. In addition, the performance of our method is investigated using both the standard Pearson chi-square statistic and the log-likelihood chi-square statistic.
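The two statistics named in this abstract can be illustrated with a minimal Python sketch. It shows only the textbook definitions of the Pearson and log-likelihood chi-square statistics on a contingency table. The abstract does not say how the article builds its tables, so the table layout and the counts below are hypothetical and chosen purely for illustration.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def log_likelihood_chi_square(table):
    """Log-likelihood statistic: 2 * sum over cells of observed * ln(observed / expected)."""
    expected = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0  # a zero cell contributes nothing to the sum
    )


# Hypothetical counts (an assumption, not data from the article):
# rows = two candidate classes, columns = seen / not seen as the argument.
counts = [[10, 20], [30, 40]]
print(pearson_chi_square(counts))
print(log_likelihood_chi_square(counts))
```

Both statistics are zero when the observed counts match the independence expectation exactly, and they tend to diverge from each other when some cells have small counts.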
Designing Statistical Language Learners: Experiments on Noun Compounds
, 1995
Cited by 65 (0 self)
Statistical language learning research takes the view that many traditional natural language processing tasks can be solved by training probabilistic models of language on a sufficient volume of training data. The design of statistical language learners therefore involves answering two questions: (i) Which of the multitude of possible language models will most accurately reflect the properties necessary to a given task? (ii) What will constitute a sufficient volume of training data? Regarding the first question, though a variety of successful models have been discovered, the space of possible designs remains largely unexplored. Regarding the second, exploration of the design space has so far proceeded without an adequate answer. The goal of this thesis is to advance the exploration of the statistical language learning design space. In pursuit of that goal, the thesis makes two main theoretical contributions: it identifies a new class of designs by providing a novel theory of statistical natural language processing, and it presents the foundations for a predictive theory of data requirements to assist in future design explorations. The first of these contributions is called the meaning distributions theory. This theory
Subcategorization Acquisition
, 2002
Cited by 64 (13 self)
Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations, and probabilistic parsers would greatly benefit from accurate information concerning the relative likelihood of different subcategorisation frames (SCFs) of a given predicate. Acquisition of subcategorization lexicons from textual corpora has recently become increasingly popular. Although this work has met with some success, resulting lexicons indicate a need for greater accuracy. One significant source of error lies in the statistical filtering used for hypothesis selection, i.e. for removing noise from automatically acquired SCFs. This thesis builds on earlier work in verbal subcategorization acquisition, taking as a starting point the problem with statistical filtering. Our investigation shows that statistical filters tend to work poorly because not only is the underlying distribution Zipfian, but there is also very little correlation between conditional distribution of
Semi-Automatic Engineering of Ontologies from Text
- In Proceedings of the 12th International Conference on Software and Knowledge Engineering
, 2000
Cited by 59 (5 self)
Ontologies have become an important means for structuring information and information systems and, hence, important in knowledge as well as in software engineering. However, there remains the problem of engineering large and adequate ontologies within short time frames in order to keep costs low. For this purpose, efforts have been made to facilitate the ontology engineering process, in particular the acquisition of ontologies from domain texts. We present a general architecture for discovering conceptual structures and engineering ontologies. Based on the architecture we propose a new approach to extend current approaches, which mostly focus on the semi-automatic acquisition of taxonomies, by the discovery of non-taxonomic conceptual relations. We use a generalized association rule algorithm that not only detects relations between concepts, but also determines the appropriate level of abstraction at which to define relations.

