Results 1–10 of 26
On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution
Abstract

Cited by 13 (7 self)
Information-maximization clustering learns a probabilistic classifier in an unsupervised manner so that the mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it involves only continuous optimization of model parameters, which is substantially easier to solve than discrete optimization of cluster assignments. However, existing methods still involve non-convex optimization problems, and therefore finding a good locally optimal solution is not straightforward in practice. In this paper, we propose an alternative information-maximization clustering method based on a squared-loss variant of mutual information. This novel approach gives a clustering solution analytically in a computationally efficient way via kernel eigenvalue decomposition. Furthermore, we provide a practical model selection procedure that allows us to objectively optimize tuning parameters included in the kernel function. Through experiments, we demonstrate the usefulness of the proposed approach.
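The analytic-solution claim above can be illustrated with a small sketch. The snippet below is not the authors' SMIC algorithm; it is a minimal spectral-style stand-in, assuming a Gaussian kernel and two toy blobs, showing how the top eigenvectors of a degree-normalized kernel matrix yield cluster indicators without any iterative discrete optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs of 30 points each (toy data).
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
               rng.normal(3.0, 0.3, (30, 2))])

def kernel_eig_clustering(X, n_clusters=2, sigma=1.0, iters=20):
    """Cluster via the top eigenvectors of a normalized Gaussian kernel matrix."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))          # Gaussian kernel matrix
    d = K.sum(axis=1)
    Kn = K / np.sqrt(np.outer(d, d))              # symmetric degree normalization
    _, V = np.linalg.eigh(Kn)                     # eigenvalues in ascending order
    emb = V[:, -n_clusters:]                      # relaxed cluster indicators
    # Round the continuous solution with a tiny k-means,
    # seeded deterministically with evenly spaced rows.
    seeds = np.linspace(0, len(emb) - 1, n_clusters).astype(int)
    C = emb[seeds]
    for _ in range(iters):
        labels = ((emb[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        C = np.array([emb[labels == j].mean(0) if (labels == j).any() else C[j]
                      for j in range(n_clusters)])
    return labels

labels = kernel_eig_clustering(X)
```

The eigendecomposition does all the work here; the k-means step only rounds the continuous embedding to hard assignments.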
Information-Theoretical Learning of Discriminative Clusters for Unsupervised Domain Adaptation
Abstract

Cited by 12 (0 self)
We study the problem of unsupervised domain adaptation, which aims to adapt classifiers trained on a labeled source domain to an unlabeled target domain. Many existing approaches first learn domain-invariant features and then construct classifiers with them. We propose a novel approach that jointly learns both. Specifically, while the method identifies a feature space where data in the source and target domains are similarly distributed, it also learns the feature space discriminatively, optimizing an information-theoretic metric as a proxy for the expected misclassification error on the target domain. We show how this optimization can be carried out effectively with simple gradient-based methods and how hyperparameters can be cross-validated without demanding any labeled data from the target domain. Empirical studies on benchmark tasks of object recognition and sentiment analysis validate our modeling assumptions and demonstrate significant improvements of our method over competing ones in classification accuracy.
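The claim that such domain-alignment objectives are amenable to simple gradient methods can be illustrated with a toy stand-in. The snippet below does not use the paper's information-theoretic criterion; it minimizes a crude squared mean-discrepancy between projected source and target samples by plain gradient descent, with all data and step sizes made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy source/target samples whose means differ along the first axis.
Xs = rng.normal([0.0, 0.0], 1.0, (200, 2))   # labeled source domain
Xt = rng.normal([2.0, 0.0], 1.0, (200, 2))   # unlabeled target domain

def discrepancy(w, Xs, Xt):
    """Squared difference of projected domain means: a crude alignment proxy."""
    delta = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float((w @ delta) ** 2)

w = np.array([1.0, 1.0])
delta = Xs.mean(axis=0) - Xt.mean(axis=0)
before = discrepancy(w, Xs, Xt)
for _ in range(100):
    # Gradient of (w . delta)^2 with respect to w is 2 (w . delta) delta.
    w = w - 0.1 * 2.0 * (w @ delta) * delta
after = discrepancy(w, Xs, Xt)
```

Gradient descent drives the projection toward directions along which the two domains look alike; the paper's method additionally keeps the learned space discriminative.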
Scalable Training of Mixture Models via Coresets
Abstract

Cited by 8 (3 self)
How can we train a statistical mixture model on a massive data set? In this paper, we show how to construct coresets for mixtures of Gaussians and natural generalizations. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset will also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size independent of the size of the data set. More precisely, we prove that a weighted set of O(dk^3/ε^2) data points suffices for computing a (1 + ε)-approximation to the optimal model on the original n data points. Moreover, such coresets can be constructed efficiently in a MapReduce-style computation, as well as in a streaming setting. Our results rely on a novel reduction of statistical estimation to problems in computational geometry, as well as on new complexity results about mixtures of Gaussians. We empirically evaluate our algorithms on several real data sets, including a density estimation problem in the context of earthquake detection using accelerometers in mobile phones.
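The paper's construction relies on sensitivity bounds specific to Gaussian mixtures; the sketch below shows only the generic importance-sampling skeleton such coresets are built on, with a made-up sensitivity proxy (distance to the data mean) standing in for the real bounds. Inverse-probability weights keep weighted statistics unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (5000, 2))           # the large "full" data set

def coreset(X, m, rng):
    """Sample m weighted points; weights make weighted sums unbiased."""
    # Crude sensitivity proxy: points far from the mean matter more.
    s = np.linalg.norm(X - X.mean(0), axis=1) + 1e-3
    p = s / s.sum()                           # sampling probabilities
    idx = rng.choice(len(X), size=m, p=p)     # sample with replacement
    w = 1.0 / (m * p[idx])                    # inverse-probability weights
    return X[idx], w

C, w = coreset(X, 1000, rng)
# Weighted statistics on the coreset approximate full-data statistics:
full_mean = X.mean(0)
core_mean = (w[:, None] * C).sum(0) / w.sum()
```

Any estimator that is a weighted sum over the data (such as a mixture log-likelihood) can then be evaluated on the small weighted set instead of all n points.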
Information-Maximization Clustering based on Squared-Loss Mutual Information
Abstract

Cited by 7 (5 self)
Information-maximization clustering learns a probabilistic classifier in an unsupervised manner so that the mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it involves only continuous optimization of model parameters, which is substantially simpler than discrete optimization of cluster assignments. However, existing methods still involve non-convex optimization problems, and therefore finding a good locally optimal solution is not straightforward in practice. In this paper, we propose an alternative information-maximization clustering method based on a squared-loss variant of mutual information. This novel approach gives a clustering solution analytically in a computationally efficient way via kernel eigenvalue decomposition. Furthermore, we provide a practical model selection procedure that allows us to objectively optimize tuning parameters included in the kernel function. Through experiments, we demonstrate the usefulness of the proposed approach.
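The squared-loss variant of mutual information (SMI) replaces the logarithm in ordinary MI with a squared deviation of the density ratio from one: SMI = 1/2 Σ p(x)p(y) (p(x,y)/(p(x)p(y)) − 1)^2. For a discrete joint distribution this is easy to compute exactly; the helper below is an illustration of the quantity itself, not of the paper's estimator.

```python
import numpy as np

def smi(joint):
    """Squared-loss mutual information of a discrete joint distribution.

    SMI = 1/2 * sum_{x,y} p(x)p(y) * (p(x,y) / (p(x)p(y)) - 1)^2
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                   # normalize to a distribution
    px = joint.sum(axis=1, keepdims=True)         # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)         # marginal p(y)
    indep = px * py                               # product of marginals
    ratio = joint / indep                         # density ratio
    return 0.5 * float((indep * (ratio - 1.0) ** 2).sum())

# Independent variables give SMI = 0; a deterministic relation gives SMI > 0.
smi_indep = smi([[0.25, 0.25], [0.25, 0.25]])
smi_dep   = smi([[0.5, 0.0], [0.0, 0.5]])
```

Because SMI is a quadratic function of the density ratio, estimating and maximizing it leads to eigenvalue problems rather than the non-convex objectives ordinary MI produces, which is the source of the analytic solution described above.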
Ensemble partitioning for unsupervised image categorization
In ECCV, 2012
Abstract

Cited by 6 (2 self)
While the quality of object recognition systems can strongly benefit from more data, human annotation and labeling can hardly keep pace. This motivates the use of autonomous and unsupervised learning methods. In this paper, we present a simple yet effective method for unsupervised image categorization which relies on discriminative learners. Since automatically obtaining error-free labeled training data for the learners is infeasible, we propose the concept of a weak training (WT) set. WT sets have various deficiencies but still carry useful information. Training on a single WT set cannot result in good performance, so we design a random-walk sampling scheme to create a series of diverse WT sets. This naturally allows our categorization learning to leverage ensemble learning techniques. In particular, for each WT set, we train a max-margin classifier to further partition the whole dataset to be categorized. By doing so, each WT set leads to a base partitioning of the dataset, and all the base partitionings are combined into an ensemble proximity matrix. The final categorization is completed by feeding this proximity matrix into a spectral clustering algorithm. Experiments on a variety of challenging datasets show that our method outperforms competing methods by a considerable margin.
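The ensemble-of-base-partitionings idea can be sketched in miniature. Below, random hyperplane splits stand in for the paper's max-margin classifiers trained on WT sets, their agreements are accumulated into a co-association (proximity) matrix, and for brevity the final grouping is read off by thresholding that matrix rather than by spectral clustering; everything here is a toy assumption, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs (a toy stand-in for an image collection).
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
               rng.normal(3.0, 0.3, (30, 2))])
n = len(X)

# Ensemble of weak base partitionings: random hyperplane splits at the median
# projection, a crude substitute for the paper's max-margin classifiers.
rounds = 200
P = np.zeros((n, n))
for _ in range(rounds):
    w = rng.normal(size=2)
    side = (X @ w) > np.median(X @ w)
    P += side[:, None] == side[None, :]          # co-association increments
P /= rounds                                      # ensemble proximity matrix

# Threshold the proximity graph at 0.5 and read off connected components
# (the paper feeds the matrix into spectral clustering instead).
labels = np.full(n, -1)
clusters = 0
for i in range(n):
    if labels[i] == -1:
        stack = [i]
        labels[i] = clusters
        while stack:
            j = stack.pop()
            for k in np.where((P[j] > 0.5) & (labels == -1))[0]:
                labels[k] = clusters
                stack.append(k)
        clusters += 1
```

No single random split is reliable, but pairs of points from the same group end up on the same side far more often than cross-group pairs, so the averaged proximity matrix exposes the structure.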
Information Theoretic Clustering using Minimum Spanning Trees
Abstract

Cited by 5 (0 self)
In this work we propose a new information-theoretic clustering algorithm that infers cluster memberships by direct optimization of a nonparametric mutual information estimate between the data distribution and the cluster assignment. Although the optimization objective has a solid theoretical foundation, it is hard to optimize. We propose an approximate optimization formulation that leads to an efficient algorithm with low runtime complexity. The algorithm has a single free parameter, the number of clusters to find. We demonstrate superior performance on several synthetic and real datasets.
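The MST machinery underlying such algorithms is compact enough to show directly. The sketch below builds a minimum spanning tree with Prim's algorithm and cuts its single heaviest edge to produce two clusters; this is only the simplest MST-based cut, not the paper's approximate MI optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
               rng.normal(3.0, 0.3, (30, 2))])
n = len(X)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

# Prim's algorithm: grow the minimum spanning tree outward from vertex 0.
in_tree = np.zeros(n, dtype=bool)
in_tree[0] = True
best = D[0].copy()        # cheapest known edge from the tree to each vertex
parent = np.zeros(n, dtype=int)
edges = []
for _ in range(n - 1):
    v = int(np.where(~in_tree, best, np.inf).argmin())
    edges.append((int(parent[v]), v, float(best[v])))
    in_tree[v] = True
    closer = D[v] < best
    best[closer] = D[v][closer]
    parent[closer] = v

# Cut the single heaviest MST edge: for two well-separated groups this is
# the bridge between them, so the tree falls apart into the two clusters.
edges.sort(key=lambda e: e[2])
adj = [[] for _ in range(n)]
for a, b, _ in edges[:-1]:
    adj[a].append(b)
    adj[b].append(a)
labels = np.full(n, -1)
comp = 0
for i in range(n):
    if labels[i] == -1:
        stack = [i]
        labels[i] = comp
        while stack:
            j = stack.pop()
            for k in adj[j]:
                if labels[k] == -1:
                    labels[k] = comp
                    stack.append(k)
        comp += 1
```

The paper replaces this crude "cut the longest edge" rule with a criterion derived from a nonparametric mutual information estimate, but the MST remains the data structure that makes the search efficient.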
Information-theoretic Semi-supervised Metric Learning via Entropy Regularization
Abstract

Cited by 5 (0 self)
We propose a general information-theoretic approach called SERAPH (SEmi-supervised metRic leArning Paradigm with Hyper-sparsity) for metric learning that does not rely upon the manifold assumption. Given the probability parameterized by a Mahalanobis distance, we maximize the entropy of that probability on labeled data and minimize it on unlabeled data following entropy regularization, which allows the supervised and unsupervised parts to be integrated in a natural and meaningful way. Furthermore, SERAPH is regularized by encouraging a low-rank projection induced from the metric. The optimization of SERAPH is solved efficiently and stably by an EM-like scheme with an analytical E-step and a convex M-step. Experiments demonstrate that SERAPH compares favorably with many well-known global and local metric learning methods.
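The two ingredients the abstract combines, a pair probability parameterized by a Mahalanobis distance and an entropy penalty on unlabeled pairs, can be written down in a few lines. The logistic link below is a hypothetical stand-in chosen for illustration; the full SERAPH optimization over the metric is not shown.

```python
import numpy as np

def pair_prob(x1, x2, M, eta=1.0):
    """P(same label | x1, x2) from a Mahalanobis distance.

    Hypothetical logistic link: p = sigmoid(eta - d_M(x1, x2)^2).
    """
    d = x1 - x2
    dist2 = float(d @ M @ d)
    return 1.0 / (1.0 + np.exp(dist2 - eta))

def entropy(p):
    """Binary entropy H(p), the quantity minimized on unlabeled pairs."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    return float(-p * np.log(p) - (1 - p) * np.log(1 - p))

x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
# Stretching the metric pushes the pair probability toward 0, which lowers
# its entropy -- exactly the effect the unlabeled entropy term rewards.
p_small = pair_prob(x1, x2, np.eye(2))          # d^2 = 2
p_big   = pair_prob(x1, x2, 5.0 * np.eye(2))    # d^2 = 10
```

Minimizing entropy on unlabeled pairs thus drives the learned metric to make confident same/different decisions, while the labeled term keeps those decisions consistent with the supervision.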
Squared-loss Mutual Information Regularization: A Novel Information-theoretic Approach to Semi-supervised Learning
Abstract

Cited by 3 (1 self)
We propose squared-loss mutual information regularization (SMIR) for multi-class probabilistic classification, following the information maximization principle. SMIR is convex under mild conditions and thus improves upon the non-convexity of mutual information regularization. It offers all of the following abilities to semi-supervised algorithms: an analytical solution, out-of-sample/multi-class classification, and probabilistic output. Furthermore, novel generalization error bounds are derived. Experiments show that SMIR compares favorably with state-of-the-art methods.
Information Theoretical Clustering via Semidefinite Programming
Abstract

Cited by 1 (1 self)
We propose convex optimization techniques for information-theoretic clustering. The clustering objective is to maximize the mutual information between data points and cluster assignments. We formulate this problem first as an instance of max k-cut on weighted graphs. We then apply the technique of semidefinite programming (SDP) relaxation to obtain a convex SDP problem. We show how the solution of the SDP problem can be further improved with a low-rank refinement heuristic. The low-rank solution reveals the cluster structure of the data more clearly. Empirical studies on several datasets demonstrate the effectiveness of our approach. In particular, the approach outperforms several other clustering algorithms when compared on standard evaluation metrics.
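Solving the SDP itself requires a solver, but the low-rank refinement step can be sketched in isolation. Below, a toy block matrix stands in for an SDP relaxation's solution (a real run would obtain it from a solver); the sketch projects it onto its top-k eigen-directions and reads cluster labels off the resulting rank-k factor. This is an illustrative rounding, not the paper's exact heuristic.

```python
import numpy as np

# A toy stand-in for a max k-cut SDP solution (k = 2): a PSD matrix whose
# (i, j) entry is close to 1 for points in the same cluster and close to 0
# across clusters.
n, k = 8, 2
M = np.full((n, n), 0.05)
M[:4, :4] = 0.95
M[4:, 4:] = 0.95
np.fill_diagonal(M, 1.0)

# Low-rank refinement: keep only the top-k eigen-directions, giving a
# rank-k factor F with M ~= F @ F.T.
w, V = np.linalg.eigh(M)                        # eigenvalues in ascending order
F = V[:, -k:] * np.sqrt(np.maximum(w[-k:], 0.0))

# The rows of F expose the cluster structure directly; for k = 2 the sign
# of the second-largest eigen-direction splits the two groups.
labels = (F[:, 0] > 0).astype(int)
```

Truncating to rank k discards the small eigen-directions that encode noise in the relaxed solution, which is why the low-rank matrix shows the clusters more clearly than the full SDP solution.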
Pairwise Exemplar Clustering
Abstract

Cited by 1 (0 self)
Exemplar-based clustering methods have been extensively shown to be effective in many clustering problems. They adaptively determine the number of clusters and hold the appealing advantage of not requiring the estimation of latent parameters, which is otherwise difficult in the case of complicated parametric models and high-dimensional data. However, modeling an arbitrary underlying distribution of the data is still difficult for existing exemplar-based clustering methods. We present Pairwise Exemplar Clustering (PEC) to alleviate this problem by modeling the underlying cluster distributions more accurately with nonparametric kernel density estimation. Interpreting the clusters as classes from a supervised learning perspective, we search for an optimal partition of the data that balances two quantities: (1) the misclassification rate of the data partition for separating the clusters; and (2) the sum of within-cluster dissimilarities for controlling the cluster size. The broadly used kernel form of cut turns out to be a special case of our formulation. Moreover, we optimize the corresponding objective function by a new efficient algorithm for message computation in a pairwise MRF. Experimental results on synthetic and real data demonstrate the effectiveness of our method.
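The two quantities PEC balances can each be computed directly for a candidate partition, which the sketch below does on toy data. The Gaussian KDE, bandwidth, and the simple "own density must win" misclassification rule are illustrative assumptions, not the paper's exact definitions, and the MRF optimization is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
               rng.normal(3.0, 0.3, (30, 2))])

def kde(x, pts, h=0.3):
    """Gaussian kernel density estimate of x from one cluster's points."""
    sq = ((pts - x) ** 2).sum(-1)
    return float(np.exp(-sq / (2 * h * h)).mean())

def misclass_rate(X, labels):
    """Fraction of points whose own-cluster KDE loses to another cluster's."""
    errors = 0
    for i, x in enumerate(X):
        dens = {c: kde(x, X[labels == c]) for c in set(labels.tolist())}
        if max(dens, key=dens.get) != labels[i]:
            errors += 1
    return errors / len(X)

def within_dissim(X, labels):
    """Sum of squared within-cluster distances (the cluster-size term)."""
    total = 0.0
    for c in set(labels.tolist()):
        P = X[labels == c]
        total += float(((P[:, None, :] - P[None, :, :]) ** 2).sum())
    return total

good = np.array([0] * 30 + [1] * 30)   # partition matching the blobs
bad = np.arange(60) % 2                # interleaved split mixing the blobs
```

A partition aligned with the true groups scores well on both terms at once, which is exactly what the search over partitions exploits.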