Corpus-based learning of analogies and semantic relations
- Machine Learning, 2005
Cited by 28 (8 self)
Abstract. We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the SAT college entrance exam. A verbal analogy has the form A:B::C:D, meaning “A is to B as C is to D”; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct; the average college-bound senior high school student answers about 57% correctly). We motivate this research by applying it to a difficult problem in natural language processing, determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as “laser printer”, according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random: 20%). The performance is state-of-the-art for both verbal analogies and noun-modifier relations.
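The selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each word pair has already been turned into a numeric feature vector (the abstract does not say how the vectors are built from text), and the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

def answer_analogy(stem_vec, choice_vecs):
    """Return the index of the choice pair most similar to the stem pair."""
    scores = [cosine(stem_vec, c) for c in choice_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

def nearest_neighbour_label(query_vec, training):
    """Label a pair with the class of its most similar training pair.

    `training` is a list of (vector, label) tuples.
    """
    _, best_label = max(training, key=lambda item: cosine(query_vec, item[0]))
    return best_label

# Hypothetical toy vectors (assumed values, not data from the paper).
stem = [3, 0, 1]                      # stands in for mason:stone
choices = [
    [0, 2, 0],
    [6, 0, 2],                        # same direction as the stem
    [1, 1, 1],
    [0, 0, 5],
    [2, 3, 0],
]
best = answer_analogy(stem, choices)  # index of the chosen pair
```

The same similarity function serves both tasks: for an analogy question it ranks the five choices against the stem, and for noun-modifier classification it finds the closest labeled pair in the training data. With five choices a random guess is right 1/5 = 20% of the time, and with 30 classes 1/30 ≈ 3.3%, which matches the baselines quoted in the abstract.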
A Mutually Supervised Ensemble Approach for Clustering Heterogeneous Datasets
Abstract. We present an algorithm for clustering two contextually related heterogeneous datasets that use different feature sets but consist of non-disjoint sets of objects. The method clusters the datasets individually and then combines the resulting clusters. The algorithm iteratively refines the two sets of clusters using a mutually supervised approach to maximize their mutual entropy, and finally computes a single set of clusters. We applied our algorithm to a document collection using multiple feature sets extracted by natural language preprocessing methods. Empirical results demonstrate that our method outperforms clustering based on individual feature sets, clustering based on unified feature sets, and clustering based on a well-studied ensemble method.
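The abstract does not define "mutual entropy" or the refinement procedure. As a rough sketch, assuming the quantity behaves like the mutual information between the two cluster assignments of the objects shared by both datasets, an agreement score could be computed as below. The function name and the label lists are hypothetical, and this is only the scoring step, not the iterative algorithm itself.

```python
import math
from collections import Counter

def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two clusterings of the same objects.

    labels_a[i] and labels_b[i] are the cluster ids assigned to shared
    object i by the two clusterings (one per feature set).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same objects")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # co-occurrence counts
    count_a = Counter(labels_a)               # cluster sizes in clustering A
    count_b = Counter(labels_b)               # cluster sizes in clustering B
    mi = 0.0
    for (a, b), n_ab in joint.items():
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi

# Hypothetical assignments for four shared objects.
agree = mutual_information([0, 0, 1, 1], [1, 1, 0, 0])      # same partition, renamed ids
unrelated = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # independent partitions
```

Under this assumption, a higher score means the two clusterings partition the shared objects more consistently, which is the kind of signal each clustering could use to supervise the other.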

