Results 1–10 of 557
On Bayesian analysis of mixtures with an unknown number of components
 Journal of the Royal Statistical Society, Series B
, 1997
"... ..."
Model-Based Clustering, Discriminant Analysis, and Density Estimation
 Journal of the American Statistical Association
, 2000
"... Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little ..."
Abstract

Cited by 431 (29 self)
Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures, and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as "How many clusters are there?", "Which clustering method should be used?" and "How should outliers be handled?". We outline a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology, a...
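
As a concrete illustration of how model-based clustering answers the "how many clusters?" question, here is a minimal sketch of BIC-based selection over Gaussian mixtures. It uses scikit-learn's GaussianMixture rather than the authors' own software, and the toy data, the candidate range K = 1..6, and the seeds are assumptions for the demo.

```python
# Sketch: fit Gaussian mixtures with varying K and keep the one with the
# best BIC, so "how many clusters?" becomes a model-selection question.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: three well-separated 2-D Gaussian blobs (assumed for the demo).
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in [(-4.0, 0.0), (0.0, 4.0), (4.0, 0.0)]])

# Fit candidates K = 1..6 and score each by BIC (lower is better).
models = [GaussianMixture(n_components=k, random_state=0).fit(X)
          for k in range(1, 7)]
bics = [m.bic(X) for m in models]
best = models[int(np.argmin(bics))]
print("BIC-selected number of components:", best.n_components)
labels = best.predict(X)  # hard cluster assignments for each point
```

The design point is that each candidate K is a full probability model, so the candidates can be compared on a common statistical footing instead of by heuristics.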
Unsupervised learning of finite mixture models
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2002
"... AbstractÐThis paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectationmaximization ..."
Abstract

Cited by 385 (21 self)
This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of pre-estimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify to the good performance of our approach. Index Terms: finite mixtures, unsupervised learning, model selection, minimum message length criterion, Bayesian methods, expectation-maximization algorithm, clustering.
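
To make "estimation and model selection in a single algorithm" concrete, here is a deliberately simplified sketch: EM on a one-dimensional Gaussian mixture that starts with many components and annihilates any whose mixing weight collapses. A plain weight threshold stands in for the paper's MML-penalized M-step, and every constant below is an assumption for the toy example.

```python
# Simplified component-annihilating EM for a 1-D Gaussian mixture: start
# with more components than needed and drop those whose weight collapses,
# so the number of components is decided during fitting.
import numpy as np
from scipy.stats import norm

def em_with_annihilation(x, k_init=10, min_weight=0.02, iters=200):
    rng = np.random.default_rng(0)
    mu = rng.choice(x, k_init)             # component means
    sigma = np.full(k_init, x.std())       # component standard deviations
    w = np.full(k_init, 1.0 / k_init)      # mixing weights
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        dens = w * norm.pdf(x[:, None], mu, sigma)        # shape (n, k)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of weights, means, and variances
        nk = resp.sum(axis=0)
        w, mu = nk / len(x), (resp * x[:, None]).sum(0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(0) / nk) + 1e-6
        # Annihilation: remove components with negligible weight
        keep = w > min_weight
        if not keep.all():
            w, mu, sigma = w[keep] / w[keep].sum(), mu[keep], sigma[keep]
    return w, mu, sigma

x = np.concatenate([np.random.default_rng(1).normal(m, 0.5, 200)
                    for m in (-3.0, 0.0, 3.0)])
w, mu, sigma = em_with_annihilation(x)
print("surviving components:", len(w), "means:", np.round(np.sort(mu), 2))
```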
How many clusters? Which clustering method? Answers via model-based cluster analysis
 The Computer Journal
, 1998
"... ..."
Survey of clustering data mining techniques
, 2002
"... Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in math ..."
Abstract

Cited by 357 (0 self)
Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification: it models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective, clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective, clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique ...
Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds
 Journal of Machine Learning Research
, 2003
"... The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. ..."
Abstract

Cited by 356 (11 self)
The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation.
MLESAC: A New Robust Estimator with Application to Estimating Image Geometry
 Computer Vision and Image Understanding
, 2000
"... A new method is presented for robustly estimating multiple view relations from point correspondences. The method comprises two parts. The first is a new robust estimator MLESAC which is a generalization of the RANSAC estimator. It adopts the same sampling strategy as RANSAC to generate putative solu ..."
Abstract

Cited by 335 (10 self)
A new method is presented for robustly estimating multiple view relations from point correspondences. The method comprises two parts. The first is a new robust estimator, MLESAC, which is a generalization of the RANSAC estimator. It adopts the same sampling strategy as RANSAC to generate putative solutions, but chooses the solution that maximizes the likelihood rather than just the number of inliers. The second part of the algorithm is a general-purpose method for automatically parameterizing these relations, using the output of MLESAC. A difficulty with multi-view image relations is that there are often nonlinear constraints between the parameters, making optimization a difficult task. The parameterization method overcomes the difficulty of nonlinear constraints and conducts a constrained optimization. The method is general and its use is illustrated for the estimation of fundamental matrices, image–image homographies, and quadratic transformations. Results are given for both synthetic and real images. It is demonstrated that the method gives results equal or superior to those of previous approaches.
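
A hedged sketch of the MLESAC scoring idea on a toy problem (2-D line fitting rather than the paper's multi-view relations): as in RANSAC, hypotheses come from random minimal samples, but each is scored by the negative log-likelihood of a Gaussian-inlier/uniform-outlier error mixture instead of by its inlier count. The noise scale, outlier range, and fixed mixing proportion below are assumptions; the paper re-estimates the mixing proportion with EM.

```python
# MLESAC-style robust line fitting: sample minimal subsets as in RANSAC,
# but rank hypotheses by mixture-model likelihood, not inlier count.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 * x + 1.0 + rng.normal(0, 0.2, n)          # inliers on y = 2x + 1
outliers = rng.random(n) < 0.3
y[outliers] = rng.uniform(0, 25, outliers.sum())   # gross outliers

sigma, v = 0.2, 25.0   # inlier noise std and outlier error range (assumed known)
gamma = 0.5            # fixed inlier proportion (the paper estimates this by EM)

def neg_log_likelihood(a, b):
    """Score the line hypothesis y = a*x + b under the error mixture."""
    e = y - (a * x + b)
    inlier = gamma * np.exp(-e**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    outlier = (1 - gamma) / v
    return -np.log(inlier + outlier).sum()

best, best_score = None, np.inf
for _ in range(100):                     # random minimal samples (2 points)
    i, j = rng.choice(n, 2, replace=False)
    if x[i] == x[j]:
        continue
    a = (y[j] - y[i]) / (x[j] - x[i])
    b = y[i] - a * x[i]
    score = neg_log_likelihood(a, b)
    if score < best_score:
        best, best_score = (a, b), score

print("estimated line: y = %.2f x + %.2f" % best)
```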
Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems
 Proceedings of the IEEE
, 1998
"... this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, ph ..."
Abstract

Cited by 294 (17 self)
this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, physics, biology, control and signal processing, information theory, complexity theory, and psychology (see [45]). Neural networks have provided a fertile soil for the infusion (and occasionally confusion) of ideas, as well as a meeting ground for comparing viewpoints, sharing tools, and renovating approaches. It is within the ill-defined boundaries of the field of neural networks that researchers in traditionally distant fields have come to the realization that they have been attacking fundamentally similar optimization problems.
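
The excerpt above is positioning rather than mechanics; the paper's central tool is deterministic annealing. As a minimal sketch of that idea applied to clustering (assumed details: squared-Euclidean distortion, a geometric cooling schedule, and the toy data), assignments are kept soft at temperature T and hardened as T is lowered:

```python
# Deterministic annealing for clustering: Gibbs-style soft assignments at
# temperature T, cooled on a schedule so the solution gradually hardens.
import numpy as np

def da_cluster(X, k=3, T0=10.0, Tmin=0.01, cooling=0.9, iters=30):
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    T = T0
    while T > Tmin:
        for _ in range(iters):
            # Soft assignments: p(j | x) proportional to exp(-||x - c_j||^2 / T)
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            p = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / T)
            p /= p.sum(axis=1, keepdims=True)
            # Re-estimate centers as probability-weighted means
            centers = (p[:, :, None] * X[:, None, :]).sum(0) / (p.sum(0)[:, None] + 1e-12)
        T *= cooling  # anneal the temperature
    return centers, p.argmax(1)

X = np.vstack([np.random.default_rng(s).normal(m, 0.4, (80, 2))
               for s, m in [(1, -2.0), (2, 0.0), (3, 2.0)]])
centers, labels = da_cluster(X)
print(np.round(centers, 2))
```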
Clustering of the Self-Organizing Map
, 2000
"... The selforganizing map (SOM) is an excellent tool in exploratory phase of data mining. It projects input space on prototypes of a lowdimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, to facilitate quant ..."
Abstract

Cited by 232 (1 self)
The self-organizing map (SOM) is an excellent tool in the exploratory phase of data mining. It projects the input space onto prototypes of a low-dimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, similar units need to be grouped, i.e., clustered, to facilitate quantitative analysis of the map and the data. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering using k-means are investigated. The two-stage procedure (first using the SOM to produce the prototypes, which are then clustered in the second stage) is found to perform well when compared with direct clustering of the data, and to reduce the computation time.
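
A minimal sketch of the two-stage procedure under stated assumptions: the SOM below is a deliberately bare-bones NumPy implementation (not the authors' SOM Toolbox), the second stage uses scikit-learn's KMeans, and the grid size, schedules, and toy data are all choices made for the demo.

```python
# Two-stage clustering: train a small SOM, then cluster its prototype
# vectors with k-means instead of clustering the raw data directly.
import numpy as np
from sklearn.cluster import KMeans

def train_som(X, grid=(8, 8), iters=2000, lr0=0.5, sigma0=2.0):
    rng = np.random.default_rng(0)
    gx, gy = grid
    protos = rng.normal(size=(gx * gy, X.shape[1]))           # prototype vectors
    coords = np.array([(i, j) for i in range(gx) for j in range(gy)], float)
    for t in range(iters):
        lr = lr0 * (1 - t / iters)                 # decaying learning rate
        sigma = sigma0 * (1 - t / iters) + 0.5     # shrinking neighborhood
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((protos - x) ** 2).sum(1))  # best-matching unit
        # Gaussian neighborhood on the grid pulls nearby prototypes toward x
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(1) / (2 * sigma ** 2))
        protos += lr * h[:, None] * (x - protos)
    return protos

X = np.vstack([np.random.default_rng(s).normal(m, 0.5, (150, 3))
               for s, m in [(1, -3.0), (2, 0.0), (3, 3.0)]])
protos = train_som(X)
# Second stage: cluster the prototypes, then give each data point the
# cluster label of its nearest prototype.
proto_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(protos)
nearest = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(-1).argmin(1)
labels = proto_labels[nearest]
print("cluster sizes:", np.bincount(labels))
```

The speedup the abstract reports comes from the second stage operating on a few dozen prototypes rather than on every data point.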