Clustering by compression
 IEEE Transactions on Information Theory
, 2005
Cited by 179 (23 self)
We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as follows: First, we determine a parameter-free, universal, similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is not restricted to a specific application area, and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal. However, the optimality comes at the price of using the noncomputable notion of Kolmogorov complexity. We propose axioms to capture the real-world setting, and show that the NCD approximates optimality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (ternary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block sorting compressors. In genomics, we presented new evidence for major questions in Mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis against the Theria hypothesis.
Index Terms: Heterogeneous data analysis, hierarchical unsupervised clustering, Kolmogorov complexity, normalized compression distance, parameter-free data mining, quartet tree method, universal dissimilarity distance.
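The distance computation described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' public software: it uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length, and it assumes Python's zlib as the compressor. The names compressed_len, ncd, and distance_matrix are hypothetical, and the quartet-tree clustering step is not shown.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (zlib is an assumed compressor choice)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)  # compressed length of the pairwise concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD matrix, the input a hierarchical clustering step would consume."""
    return [[ncd(a, b) for b in items] for a in items]
```

Values near 0 suggest that one object adds little information beyond the other; values near 1 suggest that the compressor finds little shared structure. Real compressors only approximate Kolmogorov complexity, so results can fall slightly outside the range 0 to 1.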
Clustering of the Self-Organizing Map
, 2000
Cited by 159 (1 self)
The self-organizing map (SOM) is an excellent tool in the exploratory phase of data mining. It projects input space on prototypes of a low-dimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, to facilitate quantitative analysis of the map and the data, similar units need to be grouped, i.e., clustered. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering using k-means are investigated. The two-stage procedure (first using SOM to produce the prototypes, which are then clustered in the second stage) is found to perform well when compared with direct clustering of the data and to reduce the computation time.
Curvilinear Component Analysis: A Self-Organizing Neural Network for Nonlinear Mapping of Data Sets
, 1997
Cited by 152 (1 self)
We present a new strategy called “curvilinear component analysis” (CCA) for dimensionality reduction and representation of multidimensional data sets. The principle of CCA is a self-organized neural network performing two tasks: vector quantization (VQ) of the submanifold in the data set (input space) and nonlinear projection (P) of these quantizing vectors toward an output space, providing a revealing unfolding of the submanifold. After learning, the network has the ability to continuously map any new point from one space into another: forward mapping of new points in the input space, or backward mapping of an arbitrary position in the output space.
Rules and Exemplars in Category Learning
 Journal of Experimental Psychology: General
, 1998
Cited by 144 (10 self)
... characterized by descriptions of each module and how each serves in those tasks for which it is best suited. However, these theories often do not emphasize how modules interact in producing responses and in learning. In this article we will develop a modular theory of categorization that follows from two distinct accounts of this behavior. The first account is that of rule-based theories of categorization. These theories emerge from a philosophical tradition in which concepts and categorization are described in terms of definitional rules. For example, if a living thing has a wide, flat tail and constructs dams by cutting down trees with its ...
This work was supported by Indiana University Cognitive Science Program Fellowships and by NIMH Research Training Grant PHST32MH1987903 to Erickson, and in part by NIMH FIRST Award 1R29MH5157201 to Kruschke. This research was reported as a poster at the 1996 Cognitive Science Society Conference in San Diego, CA. We than ...
Toward a unified theory of similarity and recognition
 Psychological Review
, 1988
Cited by 80 (6 self)
A new theory of similarity, rooted in the detection and recognition literatures, is developed. The general recognition theory assumes that the perceptual effect of a stimulus is random but that on any single trial it can be represented as a point in a multidimensional space. Similarity is a function of the overlap of perceptual distributions. It is shown that the general recognition theory contains Euclidean distance models of similarity as a special case but that unlike them, it is not constrained by any distance axioms. Three experiments are reported that test the empirical validity of the theory. In these experiments the general recognition theory accounts for similarity data as well as the currently popular similarity theories do, and it accounts for identification data as well as the longstanding "champion" identification model does.
The concept of similarity is of fundamental importance in psychology. Not only is there a vast literature concerned directly with the interpretation of subjective similarity judgments (e.g., as in multidimensional scaling) but the concept also plays a crucial but less direct role in the modeling of many psychophysical tasks. This is particularly true in the case of pattern and form recognition. It is frequently assumed that the greater the similarity between a pair of stimuli, the more likely one will be confused with the other in a recognition task (e.g., Luce, 1963; Shepard, 1964; Tversky & Gati, 1982). Yet despite the potentially close relationship between the two, there have been only a few attempts at developing theories that unify the similarity and recognition literatures. Most attempts to link the two have used a distance-based similarity measure to predict the confusions in recognition ex
Regression Modeling in Back-Propagation and Projection Pursuit Learning
, 1994
Cited by 66 (1 self)
We studied and compared two types of connectionist learning methods for model-free regression problems in this paper. One is the popular back-propagation learning (BPL), well known in the artificial neural networks literature; the other is the projection pursuit learning (PPL) that has emerged in recent years in the statistical estimation literature. Both the BPL and the PPL are based on projections of the data in directions determined from interconnection weights. However, unlike the use of fixed nonlinear activations (usually sigmoidal) for the hidden neurons in BPL, the PPL systematically approximates the unknown nonlinear activations. Moreover, the BPL estimates all the weights simultaneously at each iteration, while the PPL estimates the weights cyclically (neuron-by-neuron and layer-by-layer) at each iteration. Although the BPL and the PPL have comparable training speed when based on a Gauss-Newton optimization algorithm, the PPL proves more parsimonious in that the PPL requires fewer hi...
ON THE DANGERS OF AVERAGING ACROSS SUBJECTS WHEN USING MULTIDIMENSIONAL SCALING OR THE SIMILARITY-CHOICE MODEL
 PSYCHOLOGICAL SCIENCE
, 1994
Cited by 63 (31 self)
When ratings of judged similarity or frequencies of stimulus identification are averaged across subjects, the psychological structure of the data is fundamentally changed. Regardless of the structure of the individual-subject data, the averaged similarity data will likely be well fit by a standard multidimensional scaling model, and the averaged identification data will likely be well fit by the similarity-choice model. In fact, both models often provide excellent fits to averaged data, even if they fail to fit the data of each individual subject. Thus, a good fit of either model to averaged data cannot be taken as evidence that the model describes the psychological structure that characterizes individual subjects. We hypothesize that these effects are due to the increased symmetry that is a mathematical consequence of the averaging operation.
It is common practice to average across subjects when analyzing
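The symmetry point in the abstract above can be illustrated with a toy calculation. This is a contrived sketch, not the authors' analysis: it assumes two hypothetical subjects whose confusion matrices are exact mirror images of each other, and it measures asymmetry with a simple made-up index (the sum of absolute differences between each cell and its transposed counterpart). Each subject is individually asymmetric, yet the average is perfectly symmetric.

```python
def asymmetry(m):
    """Sum of |m[i][j] - m[j][i]| over all pairs i < j (a hypothetical index)."""
    n = len(m)
    return sum(abs(m[i][j] - m[j][i]) for i in range(n) for j in range(i + 1, n))


def average(matrices):
    """Cell-by-cell mean of equally sized square matrices."""
    n = len(matrices[0])
    k = len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)] for i in range(n)]


def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


# Rows are presented stimuli, columns are responses (made-up proportions).
subject_a = [
    [0.8, 0.2, 0.0],
    [0.0, 0.8, 0.2],
    [0.2, 0.0, 0.8],
]
subject_b = transpose(subject_a)  # the opposite pattern of confusions

averaged = average([subject_a, subject_b])
```

In this constructed case a symmetric model could describe the averaged matrix even though it describes neither subject, which is the kind of inference problem the abstract warns about.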
On the danger of averaging across observers when comparing decision bound and generalized context models of categorization
 Perception & Psychophysics
, 1999
Cited by 59 (40 self)
Averaging across observers is common in psychological research. Often averaging reduces the measurement error, and thus does not affect the inference drawn about the behavior of individuals. However, in other situations, averaging alters the structure of the data qualitatively, leading to an incorrect inference about the behavior of individuals. This research investigated the influence of averaging across observers on the fits of decision bound models (F.G. Ashby, 1992a) and generalized context models (GCM; R.M. Nosofsky, 1986) through Monte Carlo simulation of a variety of categorization conditions, perceptual representations, and individual difference assumptions, and in an experiment. The results suggest that (a) averaging has little effect when the GCM is the correct model, (b) averaging often improves the fit of the GCM and worsens the fit of the decision bound model when the decision bound model is the correct model, (c) the GCM is quite flexible, and under many conditions can mimic the predictions of the decision bound model; the decision bound model, on the other hand, is generally unable to mimic the predictions of the GCM, (d) the validity of the decision bound model’s perceptual representation assumption can have a large effect on the inference drawn about the form of the decision bound, and (e) the experiment supported the claim that averaging improves the fit of the GCM. These results underscore the importance of performing single-observer analysis if one is interested in understanding the categorization performance of individuals.
The ability to categorize quickly and accurately is fundamental to survival. Every day, we make hundreds of categorization judgments. Several detailed theories and quantitative models have been proposed to account for the perceptual and cognitive processes involved in categorization; the goal being to understand the categorization performance of individual behaving organisms.
XGvis: Interactive Data Visualization with Multidimensional Scaling
, 2001
Cited by 47 (1 self)
... this article. Section 2 gives an overview of how a user operates the XGvis system. Section 3 deals with algorithm animation, direct manipulation and perturbation of the configuration. Section 4 gives details about the cost functions and their interactively controlled parameters for transformation, subsetting and weighting of dissimilarities. Section 5 describes diagnostics for MDS. Section 6 is about computational and systems aspects, including coordination of windows, algorithms, and large data problems. Finally, Section 7 gives a tour of applications with examples of proximity analysis, dimension reduction, and graph layout in two and more dimensions.