An empirical evaluation of models of text document similarity
- In CogSci2005
, 2005
Cited by 14 (0 self)
Modeling the semantic similarity between text documents presents a significant theoretical challenge for cognitive science, with ready-made applications in information handling and decision support systems dealing with text. While a number of candidate models exist, they have generally not been assessed in terms of their ability to emulate human judgments of similarity. To address this problem, we conducted an experiment that collected repeated similarity measures for each pair of documents in a small corpus of short news documents. An analysis of human performance showed inter-rater correlations of about 0.6. We then considered the ability of existing models, using word-based, n-gram and Latent Semantic Analysis (LSA) approaches, to model these human judgments. The best-performing LSA model produced correlations of about 0.6, consistent with human performance, while the best-performing word-based and n-gram models achieved correlations closer to 0.5. Many of the remaining models showed almost no correlation with human performance. Based on our results, we provide some discussion of the key strengths and weaknesses of the models we examined.
An improved method for deriving word meaning from lexical co-occurrence
- Cognitive Psychology
, 2004
Cited by 8 (0 self)
The lexical semantic system is an important component of human language and cognitive processing. One approach to modeling semantic knowledge makes use of hand-constructed networks or trees of interconnected word senses (Miller, Beckwith, Fellbaum, Gross, & Miller, 1990; Jarmasz & Szpakowicz, 2003). An alternative approach seeks to model word meanings as high-dimensional vectors, which are derived from the co-occurrence of words in unlabeled text corpora (Landauer & Dumais, 1997; Burgess & Lund, 1997a). This paper introduces a new vector-space method for deriving word meanings from large corpora that was inspired by the HAL and LSA models, but which achieves better and more consistent results in predicting human similarity judgments. We explain the new model, known as COALS, and how it relates to prior methods, and then evaluate the various models on a range of tasks, including a novel set of semantic similarity ratings involving both semantically and morphologically related terms.
An improved model of semantic similarity based on lexical co-occurrence
- COMMUNICATIONS OF THE ACM
, 2006
Cited by 3 (0 self)
The lexical semantic system is an important component of human language and cognitive processing. One approach to modeling semantic knowledge makes use of hand-constructed networks or trees of interconnected word senses (Miller, Beckwith, Fellbaum, Gross, & Miller, 1990; Jarmasz & Szpakowicz, 2003). An alternative approach seeks to model word meanings as high-dimensional vectors, which are derived from the co-occurrence of words in unlabeled text corpora (Landauer & Dumais, 1997; Burgess & Lund, 1997a). This paper introduces a new vector-space method for deriving word meanings from large corpora that was inspired by the HAL and LSA models, but which achieves better and more consistent results in predicting human similarity judgments. We explain the new model, known as COALS, and how it relates to prior methods, and then evaluate the various models on a range of tasks, including a novel set of semantic similarity ratings involving both semantically and morphologically related terms.
Common and Distinctive Features in Stimulus Similarity: A Modified Version of the Contrast Model
, 2002
Cited by 2 (1 self)
Featural representations of similarity data assume that people represent stimuli in terms of a set of discrete properties. We consider the differences in featural representations that arise from making four different assumptions about how similarity is measured. Three of these similarity models (the common features model, the distinctive features model, and Tversky's seminal contrast model) have been considered previously. The other model is new, and modifies the contrast model by assuming that each individual feature only ever acts as a common or distinctive feature. Each of the four models is tested on previously examined similarity data, relating to kinship terms, and on a new data set, relating to faces. In fitting the models, we use the Geometric Complexity Criterion to balance the competing demands of data-fit and model complexity. The results show that both common and distinctive features are important for stimulus representation, and we argue that the modified contrast model combines these two components in a more effective and interpretable way than Tversky's original formulation.
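As a rough illustration of the contrast model named in this abstract, the sketch below scores two feature sets by rewarding shared features and penalising features unique to each stimulus, in the general form of Tversky's contrast model. The function name, the weights `theta`, `alpha` and `beta`, the use of simple feature counts as the salience function, and the kinship feature sets are all illustrative assumptions; none of them is taken from the paper, and the modified model the paper proposes is not implemented here.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast-model similarity between two feature sets.

    Assumes the salience of a feature set is simply its size, which is
    a simplification chosen for this sketch.
    """
    a, b = set(a), set(b)
    common = len(a & b)   # features shared by both stimuli
    a_only = len(a - b)   # features distinctive to the first stimulus
    b_only = len(b - a)   # features distinctive to the second stimulus
    return theta * common - alpha * a_only - beta * b_only


# Hypothetical kinship features, invented for illustration only.
mother = {"female", "parent", "older_generation"}
father = {"male", "parent", "older_generation"}
daughter = {"female", "child", "younger_generation"}

sim_mother_father = contrast_similarity(mother, father)
sim_mother_daughter = contrast_similarity(mother, daughter)
```

With these toy features, "mother" and "father" share two features and differ on one each, so they score higher than "mother" and "daughter", which share only one feature.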
A Comparison of Machine Measures of Text Document Similarity with Human Judgments
Cited by 2 (0 self)
A central problem in the information sciences involves measuring the semantic similarity between text documents. Although this is fundamentally a cognitive modeling problem, existing methods have not been assessed in terms of their ability to emulate human judgments of similarity. To address this problem, we conducted a controlled psychological experiment that collected repeated similarity measures for each pair of documents in a small corpus of short news documents. We then considered the ability of a variety of established methods using word-based, n-gram and Latent Semantic Analysis (LSA) approaches to model these human judgments. Our most important finding is that none of the methods we examined produced good correlations with the human judgments. The best performed LSA model produced correlations of about 0.6. The best performed word-based and n-gram models achieved correlations closer to 0.5. Many of the variations we considered showed almost no correlation with human performance. These findings suggest that developing better cognitive models of human text similarity judgments offers a promising avenue of research for the improvement of information handling systems that deal with text.
Correspondence should be addressed to: Michael D. Lee, Department of Psychology, University of Adelaide, SA 5005, AUSTRALIA. Telephone: +61 8 8303 6096, Facsimile: +61 8 8303 3770, E-mail: michael.lee@adelaide.edu.au, URL: http://www.psychology.adelaide.edu.au/members/staff/michaellee/homepage
A Distance Model for Rhythms
Cited by 1 (1 self)
Modeling long-term dependencies in time series has proved very difficult to achieve with traditional machine learning methods. This problem occurs when considering music data. In this paper, we introduce a model for rhythms based on the distributions of distances between subsequences. A specific implementation of the model when considering Hamming distances over a simple rhythm representation is described. The proposed model consistently outperforms a standard Hidden Markov Model in terms of conditional prediction accuracy on two different music databases.
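To make the idea of "distances between subsequences" more concrete, here is a minimal sketch that computes Hamming distances between equal-length chunks of a rhythm. It assumes a rhythm is encoded as a list of 0/1 onset flags and that subsequences are non-overlapping chunks of fixed length. Both choices, along with the function names and the example rhythm, are assumptions made for illustration and are not taken from the paper, whose probabilistic model over these distances is not reproduced here.

```python
from itertools import combinations


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(x, y))


def pairwise_distances(rhythm, length):
    """Hamming distances between all pairs of non-overlapping chunks."""
    parts = [rhythm[i:i + length]
             for i in range(0, len(rhythm) - length + 1, length)]
    return [hamming(p, q) for p, q in combinations(parts, 2)]


# Hypothetical rhythm: 1 marks a note onset, 0 marks no onset.
rhythm = [1, 0, 0, 1,  1, 0, 1, 0,  1, 0, 0, 1]
distances = pairwise_distances(rhythm, 4)
```

In this toy rhythm the first and third chunks are identical, so their distance is zero, which is the kind of long-range repetition a distance-based model could in principle exploit.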
Predictive Models for Music
, 2008
Modeling long-term dependencies in time series has proved very difficult to achieve with traditional machine learning methods. This problem occurs when considering music data. In this paper, we introduce generative models for melodies. We decompose melodic modeling into two subtasks. We first propose a rhythm model based on the distributions of distances between subsequences. Then, we define a generative model for melodies given chords and rhythms based on modeling sequences of Narmour features. The rhythm model consistently outperforms a standard Hidden Markov Model in terms of conditional prediction accuracy on two different music databases. Using a similar evaluation procedure, the proposed melodic model consistently outperforms an Input/Output Hidden Markov Model. Furthermore, sampling these models given appropriate musical contexts generates realistic melodies. (IDIAP–RR 08-51, submitted for publication.)
A Distance Model for Rhythms
, 2008
Modeling long-term dependencies in time series has proved very difficult to achieve with traditional machine learning methods. This problem occurs when considering music data. In this paper, we introduce a model for rhythms based on the distributions of distances between subsequences. A specific implementation of the model when considering Hamming distances over a simple rhythm representation is described. The proposed model consistently outperforms a standard Hidden Markov Model in terms of conditional prediction accuracy on two different music databases. (IDIAP–RR 08-33)

