Designing Statistical Language Learners: Experiments on Noun Compounds
, 1995
Cited by 65 (0 self)
Statistical language learning research takes the view that many traditional natural language processing tasks can be solved by training probabilistic models of language on a sufficient volume of training data. The design of statistical language learners therefore involves answering two questions: (i) Which of the multitude of possible language models will most accurately reflect the properties necessary to a given task? (ii) What will constitute a sufficient volume of training data? Regarding the first question, though a variety of successful models have been discovered, the space of possible designs remains largely unexplored. Regarding the second, exploration of the design space has so far proceeded without an adequate answer. The goal of this thesis is to advance the exploration of the statistical language learning design space. In pursuit of that goal, the thesis makes two main theoretical contributions: it identifies a new class of designs by providing a novel theory of statistical natural language processing, and it presents the foundations for a predictive theory of data requirements to assist in future design explorations. The first of these contributions is called the meaning distributions theory. This theory
Similarity of semantic relations
- Computational Linguistics
, 2006
Cited by 41 (2 self)
There are at least two kinds of similarity. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, and information retrieval. Recently the Vector Space Model (VSM) of information retrieval has been adapted to measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data, and (3) automatically generated synonyms are used to explore variations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying semantic relations, LRA achieves similar gains over the VSM.
Corpus-based learning of analogies and semantic relations
- Machine Learning
, 2005
Cited by 28 (8 self)
We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the SAT college entrance exam. A verbal analogy has the form A:B::C:D, meaning “A is to B as C is to D”; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct; the average college-bound senior high school student answers about 57% correctly). We motivate this research by applying it to a difficult problem in natural language processing, determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as “laser printer”, according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random: 20%). The performance is state-of-the-art for both verbal analogies and noun-modifier relations.
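The VSM procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the pattern list and all frequency counts are invented for the example, the function names `cosine` and `answer_analogy` are hypothetical, cosine similarity is assumed as the comparison measure (the abstract does not name one), and only three choices are shown instead of five.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector carries no evidence about the relation
    return dot / (norm_u * norm_v)

def answer_analogy(stem_vector, choice_vectors):
    """Return the choice pair whose pattern vector is most similar to the stem's."""
    return max(choice_vectors,
               key=lambda pair: cosine(stem_vector, choice_vectors[pair]))

# Hypothetical joining patterns; each vector holds made-up corpus counts
# of how often the pattern links the two words of a pair.
PATTERNS = ["X cuts Y", "X works with Y", "X eats Y"]

stem = [4, 9, 0]  # toy counts for mason:stone
choices = {
    "carpenter:wood": [5, 8, 0],
    "teacher:chalk": [0, 3, 0],
    "bird:seed": [0, 0, 7],
}

best = answer_analogy(stem, choices)
print(best)
```

The same similarity function could drive the nearest-neighbour classifier mentioned in the abstract: a test noun-modifier pair would receive the class of the training pair with the highest similarity.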
Automatic interpretation of noun compounds using WordNet similarity
- In Proceedings of the 2nd International Joint Conference on Natural Language Processing, Jeju Island, South Korea, 11–13
, 2005
Cited by 27 (7 self)
The paper introduces a method for interpreting novel noun compounds with semantic relations. The method is built around word similarity with pretagged noun compounds, based on WordNet::Similarity. Over 1,088 training instances and 1,081 test instances from the Wall Street Journal in the Penn Treebank, the proposed method was able to correctly classify 53.3% of the test noun compounds. We also investigated the relative contribution of the modifier and the head noun in noun compounds of different semantic types.
The disambiguation of nominalizations
- Computational Linguistics
, 2002
Cited by 23 (1 self)
This article addresses the interpretation of nominalizations, a particular class of compound nouns whose head noun is derived from a verb and whose modifier is interpreted as an argument of this verb. Any attempt to automatically interpret nominalizations needs to take into account: (a) the selectional constraints imposed by the nominalized compound head, (b) the fact that the relation of the modifier and the head noun can be ambiguous, and (c) the fact that these constraints can be easily overridden by contextual or pragmatic factors. The interpretation of nominalizations poses a further challenge for probabilistic approaches since the argument relations between a head and its modifier are not readily available in the corpus. Even an approximation that maps the compound head to its underlying verb provides insufficient evidence. We present an approach that treats the interpretation task as a disambiguation problem and show how we can “re-create” the missing distributional evidence by exploiting partial parsing, smoothing techniques, and contextual information. We combine these distinct information sources using Ripper, a system that learns sets of rules from data, and achieve an accuracy of 86.1% (over a baseline of 61.5%) on the British National Corpus.
Learning noun-modifier semantic relations with corpus-based and wordnet-based features
- In Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference
, 2006
Cited by 17 (2 self)
Département d’informatique et de recherche opérationnelle
Interpreting semantic relations in noun compounds via verb semantics
- COLING-ACL
, 2006
Cited by 14 (4 self)
We propose a novel method for automatically interpreting compound nouns based on a predefined set of semantic relations. First we map verb tokens in sentential contexts to a fixed set of seed verbs using WordNet::Similarity and Moby’s Thesaurus. We then match the sentences with semantic relations based on the semantics of the seed verbs and grammatical roles of the head noun and modifier. Based on the semantics of the matched sentences, we then build a classifier using TiMBL. The performance of our final system at interpreting NCs is 52.6%.
Using verbs to characterize noun-noun relations
- In Proc. of the 12th International Conference on Artificial Intelligence: Methodology, Systems, Applications (AIMSA), Bulgaria
, 2006
Cited by 14 (8 self)
We present a novel, simple, unsupervised method for characterizing the semantic relations that hold between nouns in noun-noun compounds. The main idea is to discover predicates that make explicit the hidden relations between the nouns. This is accomplished by writing Web search engine queries that restate the noun compound as a relative clause containing a wildcard character to be filled in with a verb. A comparison to results from the literature suggests this is a promising approach.
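The query construction described in this abstract might look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `paraphrase_queries`, the choice of relativizers, the exact query template, and the example compound are all hypothetical, and no search engine is actually called.

```python
def paraphrase_queries(modifier, head, relativizers=("that", "which")):
    """Restate a noun-noun compound as relative clauses with a verb wildcard.

    For the compound "<modifier> <head>", each query has the assumed form
    "<head> <relativizer> * <modifier>", where * stands for the verb a
    search engine would be asked to fill in.
    """
    return [f'"{head} {rel} * {modifier}"' for rel in relativizers]

# Hypothetical example compound: "malaria mosquito"
queries = paraphrase_queries("malaria", "mosquito")
for query in queries:
    print(query)
```

Verbs that fill the wildcard slot in the returned snippets would then serve as candidate predicates for the hidden relation between the two nouns.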
The Disambiguation of Nominalisations
- Computational Linguistics
, 2002
Cited by 13 (2 self)
This paper addresses the interpretation of nominalisations, a particular class of compound nouns whose head noun is derived from a verb and whose modifier is interpreted as an argument of this verb. Any attempt to automatically interpret nominalisations needs to take into account: (a) the selectional constraints imposed by the nominalised compound head, (b) the fact that the relation of the modifier and the head noun can be ambiguous, and (c) the fact that these constraints can be easily overridden by contextual or pragmatic factors. The interpretation of nominalisations poses a further challenge for probabilistic approaches since the argument relations between a head and its modifier are not readily available in the corpus. Even an approximation which maps the compound head to its underlying verb provides insufficient evidence. We present an approach which treats the interpretation task as a disambiguation problem and show how we can "recreate" the missing distributional evidence by exploiting partial parsing, smoothing techniques, and contextual information. We combine these distinct information sources using Ripper, a system that learns sets of rules from data, and achieve an accuracy of 86.1% (over a baseline of 61.5%) on the British National Corpus.
The Knowledge Required to Interpret Noun Compounds
Cited by 12 (2 self)
Noun compound interpretation is the task of determining the semantic relations among the constituents of a noun compound. For example, “concrete floor” means a floor made of concrete, while “gymnasium floor” is the floor region of a gymnasium. We would like to enable knowledge acquisition systems to interpret noun compounds, as part of their overall task of translating imprecise and incomplete information into formal representations that support automated reasoning. However, if interpreting noun compounds requires detailed knowledge of the constituent nouns, then it may not be worth doing: the cost of acquiring this knowledge may outweigh the potential benefit. This paper describes an empirical investigation of the knowledge required to interpret noun compounds. It concludes that the axioms and ontological distinctions important for this task are derived from the top levels of a hierarchical knowledge base (KB); detailed knowledge of specific nouns is less important. This is good news, not only for our work on knowledge acquisition systems, but also for research on text understanding, where noun compound interpretation has a long history.

