Results 1–10 of 18
Simultaneous feature selection and clustering using mixture models
 IEEE Trans. Pattern Anal. Mach. Intell.
, 2004
Abstract

Cited by 106 (1 self)
Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
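The feature-saliency idea can be illustrated with a toy sketch. This is not the paper's full algorithm: it is a simplified single-feature version with fixed cluster responsibilities, in which each feature value is modeled as a mixture of a cluster-dependent Gaussian and a common "irrelevant" density, and the saliency `rho` is iteratively re-estimated as the average posterior probability that the feature is relevant. All names and parameter values here are illustrative assumptions.

```python
import math

def normal_pdf(x, mu, var):
    # Density of a univariate Gaussian with mean mu and variance var.
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def estimate_saliency(xs, resp, cluster_params, common_params, rho=0.5, n_iter=50):
    """EM-style fixed-point update for one feature's saliency.

    xs             : observed values of the feature
    resp           : fixed cluster responsibilities, one row per observation
    cluster_params : per-cluster (mean, variance) of the 'relevant' model
    common_params  : (mean, variance) of the cluster-independent density
    """
    for _ in range(n_iter):
        posts = []
        for xi, ri in zip(xs, resp):
            # Likelihood under the cluster-dependent model, averaged over clusters.
            relevant = sum(r * normal_pdf(xi, mu, var)
                           for r, (mu, var) in zip(ri, cluster_params))
            irrelevant = normal_pdf(xi, *common_params)
            posts.append(rho * relevant / (rho * relevant + (1.0 - rho) * irrelevant))
        # New saliency = mean posterior probability of relevance.
        rho = sum(posts) / len(posts)
    return rho
```

On data that genuinely separates by cluster, `rho` is driven toward 1; on data explained equally well by the common density, it is driven toward 0, mirroring the selection behaviour the abstract describes.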
Variable Selection for Model-Based Clustering
 Journal of the American Statistical Association
, 2006
Abstract

Cited by 79 (7 self)
We consider the problem of variable or feature selection for model-based clustering. We recast the problem of comparing two nested subsets of variables as a model comparison problem, and address it using approximate Bayes factors. We develop a greedy search algorithm for finding a local optimum in model space. The resulting method selects variables (or features), the number of clusters, and the clustering model simultaneously. We applied the method to several simulated and real examples, and found that removing irrelevant variables often improved performance. Compared to methods based on all the variables, our variable selection method consistently yielded more accurate estimates of the number of clusters, and lower classification error rates, as well as more parsimonious clustering models and easier visualization of results.
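Approximate Bayes factors of the kind used above are commonly computed via BIC differences. A minimal generic sketch (not the paper's exact criterion; the function names are illustrative):

```python
import math

def bic(log_lik, n_params, n_obs):
    # Bayesian Information Criterion: twice the maximized log-likelihood
    # minus a complexity penalty that grows with the number of parameters.
    return 2.0 * log_lik - n_params * math.log(n_obs)

def log_bayes_factor_approx(model_a, model_b, n_obs):
    """Approximate log Bayes factor for model A vs. model B.

    Each model is given as (maximized log-likelihood, parameter count).
    Uses 2*log B(A,B) ~= BIC(A) - BIC(B); positive values favour A
    (e.g. 'this variable is clustering-relevant').
    """
    (ll_a, k_a), (ll_b, k_b) = model_a, model_b
    return 0.5 * (bic(ll_a, k_a, n_obs) - bic(ll_b, k_b, n_obs))
```

A greedy search then adds or removes the variable with the largest positive approximate log Bayes factor at each step, stopping when no move is favoured.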
An Information-Theoretic External Cluster-Validity Measure
 Research Report RJ 10219, IBM
, 2001
Abstract

Cited by 68 (1 self)
In this paper we propose a measure of similarity/association between two partitions of a set of objects. Our motivation is the desire to use the measure to characterize the quality or accuracy of clustering algorithms by somehow comparing the clusters they produce with "ground truth" consisting of classes assigned to the patterns by manual means or some other means in whose veracity there is confidence. Such measures are referred to as "external". Our measure also allows clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels of the patterns are as predictors of their class labels. When all clusterings to be compared have the same number of clusters, the measure is equivalent to the mutual information between the cluster labels and the class labels. In cases where the numbers of clusters are different, however, it computes the reduction in the number of bits that w...
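In the equal-cluster-count case the measure reduces to the mutual information between cluster labels and class labels, which is straightforward to compute from the contingency counts. A generic sketch (not the report's exact code):

```python
import math
from collections import Counter

def mutual_information(cluster_labels, class_labels):
    """Mutual information (in bits) between a clustering and ground-truth classes."""
    n = len(cluster_labels)
    joint = Counter(zip(cluster_labels, class_labels))  # joint counts n(c, k)
    pc = Counter(cluster_labels)                        # cluster marginal counts
    pk = Counter(class_labels)                          # class marginal counts
    mi = 0.0
    for (c, k), n_ck in joint.items():
        p_ck = n_ck / n
        # p(c,k) / (p(c) p(k)) simplifies to n(c,k) * n / (n(c) * n(k)).
        mi += p_ck * math.log2(n_ck * n / (pc[c] * pk[k]))
    return mi
```

A perfect clustering recovers the full class entropy; an independent one yields zero, matching the "cluster labels as predictors of class labels" interpretation in the abstract.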
A Probabilistic Framework for the Hierarchic Organisation and Classification of Document Collections
, 2002
Abstract

Cited by 25 (5 self)
This paper presents a probabilistic mixture modeling framework for the hierarchic organisation of document collections. It is demonstrated that the probabilistic corpus model which emerges from the automatic or unsupervised hierarchical organisation of a document collection can be further exploited to create a kernel which boosts the performance of state-of-the-art Support Vector Machine document classifiers. It is shown that the performance of such a classifier is further enhanced when employing the kernel derived from an appropriate hierarchic mixture model used for partitioning a document corpus rather than the kernel associated with a flat non-hierarchic mixture model. This has important implications for document classification when a hierarchic ordering of topics exists. This can be considered as the effective combination of documents with no topic or class labels (unlabeled data), labeled documents, and prior domain knowledge (in the form of the known hierarchic structure), in providing enhanced document classification performance.
Model-Based Hierarchical Clustering
 In Proc. 16th Conf. Uncertainty in Artificial Intelligence
, 2000
Abstract

Cited by 23 (0 self)
We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that is a key component of our model. Features can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which these features have such a common distribution correspond to the nodes (clusters) of the tree representing the hierarchy. We apply this general model to the problem of document clustering for which we use a multinomial likelihood function and Dirichlet priors. Our algorithm consists of a two-stage process wherein we first perform a flat clustering followed by a modified hierarchical agglomerative merging process that includes determining the features that will have common distributions over the merged clusters. The regularization induced...
Feature Selection in Mixture-Based Clustering
, 2002
Abstract

Cited by 19 (0 self)
While there exist many approaches to clustering, the important issue of feature selection, that is, what attributes of the data are relevant, is rarely addressed. Feature selection for clustering is made difficult by the absence of class labels to guide the search. In this paper, we propose two approaches to deal with this problem. In the first one, instead of making hard selections, we estimate how salient each feature is. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami's mutual-information-based feature relevance criterion to the unsupervised case. Implementation is carried out by a backward search scheme. The resulting algorithm can be classified as a "wrapper", since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.
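A backward-search wrapper of the kind described can be sketched generically: starting from all features, greedily drop the feature whose removal most improves an unsupervised relevance score. Here `score` is a hypothetical stand-in for the paper's mixture-based criterion, and the toy scoring in the usage note is purely illustrative:

```python
def backward_search(features, score):
    """Greedy backward elimination of features.

    At each step evaluate every one-feature-smaller subset, keep the best
    one, and stop when no removal improves the score (a local optimum).
    """
    current = list(features)
    best = score(current)
    while len(current) > 1:
        trials = [[g for g in current if g != f] for f in current]
        candidate = max(trials, key=score)
        if score(candidate) <= best:
            break  # no single removal helps; stop the search
        current, best = candidate, score(candidate)
    return current, best
```

With a score that rewards informative features and penalizes noise, the search peels away the irrelevant features one at a time, exactly the "wrapper" behaviour the abstract describes.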
Unsupervised image-set clustering using an information theoretic framework
 IEEE Transactions on Image Processing
, 2006
Abstract

Cited by 11 (0 self)
In this paper, we combine discrete and continuous image models with information-theoretic-based criteria for unsupervised hierarchical image-set clustering. The continuous image modeling is based on mixture of Gaussian densities. The unsupervised image-set clustering is based on a generalized version of a recently introduced information-theoretic principle, the information bottleneck principle. Images are clustered such that the mutual information between the clusters and the image content is maximally preserved. Experimental results demonstrate the performance of the proposed framework for image clustering on a large image set. Information theoretic tools are used to evaluate cluster quality. Particular emphasis is placed on the application of the clustering for efficient image search and retrieval. Index Terms—Hierarchical database analysis, image clustering, image database management, image modeling, information bottleneck (IB), Kullback–Leibler divergence, mixture of Gaussians, mutual information, retrieval.
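In the agglomerative variant of the information bottleneck, the pair of clusters chosen for merging is the one whose merger loses the least mutual information with the content; that loss equals a weighted Jensen–Shannon divergence between the clusters' content distributions. A generic sketch of the principle (not the paper's implementation; discrete distributions are assumed):

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence D(p || q) in bits, for discrete
    # distributions given as aligned probability lists.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def merge_cost(p1, w1, p2, w2):
    """Mutual-information loss from merging two clusters.

    p1, p2 : content distributions of the clusters
    w1, w2 : cluster weights (probability mass of each cluster)
    The cost is the weighted Jensen-Shannon divergence: each cluster's
    KL divergence to the merged (weighted-average) distribution.
    """
    w = w1 + w2
    p_merged = [(w1 * a + w2 * b) / w for a, b in zip(p1, p2)]
    return w1 * kl(p1, p_merged) + w2 * kl(p2, p_merged)
```

Merging identical distributions costs nothing; merging disjoint ones costs the full weighted bit of distinguishability, so greedy agglomeration preserves as much mutual information as possible at each step.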
A Probabilistic Hierarchical Clustering Method for Organising Collections of Text Documents
 Proceedings of the 15th International Conference on Pattern Recognition (ICPR'2000)
, 2000
Abstract

Cited by 9 (4 self)
In this paper a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale, sparse, high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis, which have been termed symmetric and asymmetric models. For text data specifically, both asymmetric and symmetric models based on the multinomial and binomial distributions are most appropriate. An Expectation Maximisation parameter estimation method is provided for all of these models. An experimental comparison of the models is obtained for two extensive online document collections.
Clustering, Dimensionality Reduction and Side Information
, 2006
Abstract

Cited by 4 (0 self)
Recent advances in sensing and storage technology have created many high-volume, high-dimensional data sets in pattern recognition, machine learning, and data mining. Unsupervised learning can provide generic tools for analyzing and summarizing these data sets when there is no well-defined notion of classes. The purpose of this thesis is to study some of the open problems in two main areas of unsupervised learning, namely clustering and (unsupervised) dimensionality reduction. Instance-level constraint on objects, an example of side-information, is also considered to improve the clustering results. Our first contribution is a modification to the isometric feature mapping (ISOMAP) algorithm when the input data, instead of being all available simultaneously, arrive sequentially from a data stream. ISOMAP is representative of a class of nonlinear dimensionality reduction algorithms that are based on the notion of a manifold. Both the standard ISOMAP and the landmark version of ISOMAP are considered. Experimental results on synthetic data as well as real world images demonstrate that the modified algorithm can maintain an accurate low-dimensional representation of the data in an efficient manner. We study the problem of feature selection in model-based clustering when the number of clusters
Feature Saliency in Unsupervised Learning
, 2002
Abstract

Cited by 4 (2 self)
Clustering is a common unsupervised learning technique to discover the structure of a set of multidimensional data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon.