Results 1  10
of
30
Discriminative Clustering by Regularized Information Maximization
"... Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive informationtheoretic objective function which balances class separation, class balance and classifier complexity. The approach can flexibly incorporate different likelihood functions, express prior assumptions about the relative size of different classes and incorporate partial labels for semisupervised learning. In particular, we instantiate the framework to unsupervised, multiclass kernelized logistic regression. Our empirical evaluation indicates that RIM outperforms existing methods on several real data sets, and demonstrates that RIM is an effective model selection method. 1
A LeastSquares Framework for Component Analysis
, 2009
"... ... (SC) have been extensively used as a feature extraction step for modeling, clustering, classification, and visualization. CA techniques are appealing because many can be formulated as eigenproblems, offering great potential for learning linear and nonlinear representations of data in closedfo ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
... (SC) have been extensively used as a feature extraction step for modeling, clustering, classification, and visualization. CA techniques are appealing because many can be formulated as eigenproblems, offering great potential for learning linear and nonlinear representations of data in closedform. However, the eigenformulation often conceals important analytic and computational drawbacks of CA techniques, such as solving generalized eigenproblems with rank deficient matrices (e.g., small sample size problem), lacking intuitive interpretation of normalization factors, and understanding commonalities and differences between CA methods. This paper proposes a unified leastsquares framework to formulate many CA methods. We show how PCA, LDA, CCA, LE, SC, and their kernel and regularized extensions, correspond to a particular instance of leastsquares weighted kernel reduced rank regression (LSWKRRR). The LSWKRRR formulation of CA methods has several benefits: (1) provides a clean connection between many CA techniques and an intuitive framework to understand normalization factors; (2) yields efficient numerical schemes to solve CA techniques; (3) overcomes the small sample size problem; (4) provides a framework to easily extend CA methods. We derive new weighted generalizations of PCA, LDA, CCA and SC, and several novel CA techniques.
On InformationMaximization Clustering: Tuning Parameter Selection and Analytic Solution
"... Informationmaximization clustering learns a probabilistic classifier in an unsupervised manner so that mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it only involves continuous optimization of model parameters, which is ..."
Abstract

Cited by 6 (5 self)
 Add to MetaCart
Informationmaximization clustering learns a probabilistic classifier in an unsupervised manner so that mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it only involves continuous optimization of model parameters, which is substantially easier to solve than discrete optimization of cluster assignments. However, existing methods still involve nonconvex optimization problems, and therefore finding a good local optimal solution is not straightforward in practice. In this paper, we propose an alternative informationmaximization clustering method based on a squaredloss variant of mutual information. This novel approach gives a clustering solution analytically in a computationally efficient way via kernel eigenvalue decomposition. Furthermore, we provide a practical model selection procedure that allows us to objectively optimize tuning parameters included in the kernel function. Through experiments, we demonstrate the usefulness of the proposed approach. 1.
Clusterpath: an algorithm for clustering using convex fusion penalties
 In Proc. ICML
, 2011
"... We present a new clustering algorithm by proposing a convex relaxation of hierarchical clustering, which results in a family of objective functions with a natural geometric interpretation. We give efficient algorithms for calculating the continuous regularization path of solutions, and discuss relat ..."
Abstract

Cited by 5 (5 self)
 Add to MetaCart
We present a new clustering algorithm by proposing a convex relaxation of hierarchical clustering, which results in a family of objective functions with a natural geometric interpretation. We give efficient algorithms for calculating the continuous regularization path of solutions, and discuss relative advantages of the parameters. Our method experimentally gives stateoftheart results similar to spectral clustering for nonconvex clusters, and has the added benefit of learning a tree structure from the data. 1.
SimpleNPKL: Simple NonParametric Kernel Learning
"... Previous studies of NonParametric Kernel (NPK) learning usually reduce to solving some SemiDefinite Programming (SDP) problem by a standard SDP solver. However, time complexity of standard interiorpoint SDP solvers could be as high as O(n 6.5). Such intensive computation cost prohibits NPK learni ..."
Abstract

Cited by 5 (4 self)
 Add to MetaCart
Previous studies of NonParametric Kernel (NPK) learning usually reduce to solving some SemiDefinite Programming (SDP) problem by a standard SDP solver. However, time complexity of standard interiorpoint SDP solvers could be as high as O(n 6.5). Such intensive computation cost prohibits NPK learning applicable to real applications, even for data sets of moderate size. In this paper, we propose an efficient approach to NPK learning from side information, referred to as SimpleNPKL, which can efficiently learn nonparametric kernels from large sets of pairwise constraints. In particular, we show that the proposed SimpleNPKL with linear loss has a closedform solution that can be simply computed by the Lanczos algorithm. Moreover, we show that the SimpleNPKL with square hinge loss can be reformulated as a saddlepoint optimization task, which can be further solved by a fast iterative algorithm. In contrast to the previous approaches, our empirical results show that our new technique achieves the same accuracy, but is significantly more efficient and scalable. 1.
TRACE OPTIMIZATION AND EIGENPROBLEMS IN DIMENSION REDUCTION METHODS
"... Abstract. This paper gives an overview of the eigenvalue problems encountered in areas of data mining that are related to dimension reduction. Given some input highdimensional data, the goal of dimension reduction is to map them to a lowdimensional space such that certain properties of the initial ..."
Abstract

Cited by 5 (1 self)
 Add to MetaCart
Abstract. This paper gives an overview of the eigenvalue problems encountered in areas of data mining that are related to dimension reduction. Given some input highdimensional data, the goal of dimension reduction is to map them to a lowdimensional space such that certain properties of the initial data are preserved. Optimizing the above properties among the reduced data can be typically posed as a trace optimization problem that leads to an eigenvalue problem. There is a rich variety of such problems and the goal of this paper is to unravel relations between them as well as to discuss effective solution techniques. First, we make a distinction between projective methods that determine an explicit linear projection from the highdimensional space to the lowdimensional space, and nonlinear methods where the mapping between the two is nonlinear and implicit. Then, we show that all of the eigenvalue problems solved in the context of explicit projections can be viewed as the projected analogues of the socalled nonlinear or implicit projections. We also discuss kernels as a means of unifying both types of methods and revisit some of the equivalences between methods established in this way. Finally, we provide some illustrative examples to showcase the behavior and the particular characteristics of the various dimension reduction methods on real world data sets.
A Family of Simple NonParametric Kernel Learning Algorithms
"... Previous studies of NonParametric Kernel Learning (NPKL) usually formulate the learning task as a SemiDefinite Programming (SDP) problem that is often solved by some general purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interiorpoint SDP solver cou ..."
Abstract

Cited by 5 (4 self)
 Add to MetaCart
Previous studies of NonParametric Kernel Learning (NPKL) usually formulate the learning task as a SemiDefinite Programming (SDP) problem that is often solved by some general purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interiorpoint SDP solver could be as high as O(N 6.5), which prohibits NPKL methods applicable to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed “SimpleNPKL”, which can learn nonparametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is SimpleNPKL algorithm with linear loss, which enjoys a closedform solution that can be efficiently computed by the Lanczos sparse eigen decomposition technique. Another one is SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, square loss) that can be reformulated as a saddlepoint optimization problem, which can be further resolved by a fast iterative algorithm. In contrast to the previous NPKL approaches, our empirical results show that the proposed new technique, maintaining the same accuracy, is significantly more efficient and scalable. Finally, we also demonstrate that the proposed new technique is also applicable to speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding.
Maximum Margin Temporal Clustering
"... Temporal Clustering (TC) refers to the factorization of multiple time series into a set of nonoverlapping segments that belong to k temporal clusters. Existing methods based on extensions of generative models such as kmeans or Switching Linear Dynamical Systems (SLDS) often lead to intractable inf ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
Temporal Clustering (TC) refers to the factorization of multiple time series into a set of nonoverlapping segments that belong to k temporal clusters. Existing methods based on extensions of generative models such as kmeans or Switching Linear Dynamical Systems (SLDS) often lead to intractable inference and lack a mechanism for feature selection, critical when dealing with high dimensional data. To overcome these limitations, this paper proposes Maximum Margin Temporal Clustering (MMTC). MMTC simultaneously determines the start and the end of each segment, while learning a multiclass Support Vector Machine (SVM) to discriminate among temporal clusters. MMTC extends Maximum Margin Clustering in two ways: first, it incorporates the notion of TC, and second, it introduces additional constraints to achieve better balance between clusters. Experiments on clustering human actions and bee dancing motions illustrate the benefits of our approach compared to stateoftheart methods. 1