Results 1-10 of 52
Discriminative clustering for image co-segmentation
In IEEE CVPR, 2010
Discriminative Clustering by Regularized Information Maximization
Cited by 25 (1 self)
Abstract:
Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information-theoretic objective function which balances class separation, class balance, and classifier complexity. The approach can flexibly incorporate different likelihood functions, express prior assumptions about the relative size of different classes, and incorporate partial labels for semi-supervised learning. In particular, we instantiate the framework as unsupervised, multi-class kernelized logistic regression. Our empirical evaluation indicates that RIM outperforms existing methods on several real data sets, and demonstrates that RIM is an effective model selection method.
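The balance of class separation, class balance, and classifier complexity described in this abstract can be sketched in a few lines. The function below is an illustrative reading of such an objective for a plain linear multi-class classifier; the names and the linear parameterization are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def rim_objective(X, W, lam=1.0):
    """Sketch of a RIM-style objective: mutual information between inputs
    and predicted cluster labels, minus an L2 complexity penalty on the
    classifier weights W. Linear softmax classifier assumed."""
    logits = X @ W                          # (n, k) class scores
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)       # per-point posteriors p(c|x)

    marginal = p.mean(axis=0)               # empirical class marginal p(c)
    h_marginal = -np.sum(marginal * np.log(marginal + 1e-12))  # class balance
    h_cond = -np.mean(np.sum(p * np.log(p + 1e-12), axis=1))   # class separation

    return (h_marginal - h_cond) - lam * np.sum(W ** 2)
```

With zero weights the posteriors are uniform, so the mutual-information term vanishes; confident, balanced assignments push it toward its maximum of log k.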
A Least-Squares Framework for Component Analysis
2009
Cited by 22 (1 self)
Abstract:
... (SC) have been extensively used as a feature extraction step for modeling, clustering, classification, and visualization. CA techniques are appealing because many can be formulated as eigenproblems, offering great potential for learning linear and nonlinear representations of data in closed form. However, the eigen-formulation often conceals important analytic and computational drawbacks of CA techniques, such as solving generalized eigenproblems with rank-deficient matrices (e.g., the small sample size problem), the lack of an intuitive interpretation of normalization factors, and difficulty understanding commonalities and differences between CA methods. This paper proposes a unified least-squares framework to formulate many CA methods. We show how PCA, LDA, CCA, LE, SC, and their kernel and regularized extensions correspond to a particular instance of least-squares weighted kernel reduced-rank regression (LS-WKRRR). The LS-WKRRR formulation of CA methods has several benefits: (1) it provides a clean connection between many CA techniques and an intuitive framework for understanding normalization factors; (2) it yields efficient numerical schemes for solving CA techniques; (3) it overcomes the small sample size problem; (4) it provides a framework to easily extend CA methods. We derive new weighted generalizations of PCA, LDA, CCA, and SC, as well as several novel CA techniques.
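The unweighted, linear special case of the reduced-rank regression mentioned above can be written down directly. The sketch below uses the standard result that the rank-r solution is the full least-squares solution projected onto the top right singular vectors of the fitted values; with Y = X, the selected subspace is the PCA subspace. Names are illustrative, and the weighted/kernel machinery of LS-WKRRR is omitted:

```python
import numpy as np

def reduced_rank_regression(X, Y, r):
    """Plain least-squares reduced-rank regression: fit Y ~ X @ B with
    rank(B) <= r. Unweighted linear special case of the LS-WKRRR idea."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)  # full-rank solution
    fitted = X @ B_ols
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V = Vt[:r].T                                   # top-r right singular vectors
    return B_ols @ V @ V.T                         # project onto rank-r space
```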
TRACE OPTIMIZATION AND EIGENPROBLEMS IN DIMENSION REDUCTION METHODS
Cited by 18 (1 self)
Abstract:
This paper gives an overview of the eigenvalue problems encountered in areas of data mining that are related to dimension reduction. Given some input high-dimensional data, the goal of dimension reduction is to map them to a low-dimensional space such that certain properties of the initial data are preserved. Optimizing these properties among the reduced data can typically be posed as a trace optimization problem that leads to an eigenvalue problem. There is a rich variety of such problems, and the goal of this paper is to unravel relations between them as well as to discuss effective solution techniques. First, we make a distinction between projective methods that determine an explicit linear projection from the high-dimensional space to the low-dimensional space, and nonlinear methods where the mapping between the two is nonlinear and implicit. Then, we show that all of the eigenvalue problems solved in the context of explicit projections can be viewed as the projected analogues of the so-called nonlinear or implicit projections. We also discuss kernels as a means of unifying both types of methods and revisit some of the equivalences between methods established in this way. Finally, we provide some illustrative examples to showcase the behavior and the particular characteristics of the various dimension reduction methods on real-world data sets.
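The trace-optimization/eigenproblem link this abstract surveys is the classical fact that maximizing a trace over orthonormal matrices is solved by leading eigenvectors; a minimal sketch:

```python
import numpy as np

def trace_maximizer(A, d):
    """Maximize tr(V.T @ A @ V) over orthonormal V (V.T @ V = I_d) for
    symmetric A; the optimum is attained by the d eigenvectors of A with
    the largest eigenvalues."""
    _, vecs = np.linalg.eigh(A)     # eigh returns ascending eigenvalues
    return vecs[:, -d:]             # columns: top-d eigenvectors
```

PCA is the instance where A is a covariance matrix; many of the projective methods discussed in the paper differ only in how A (and a possible constraint matrix) is built.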
Clusterpath: an algorithm for clustering using convex fusion penalties
In Proc. ICML, 2011
Cited by 17 (6 self)
Abstract:
We present a new clustering algorithm by proposing a convex relaxation of hierarchical clustering, which results in a family of objective functions with a natural geometric interpretation. We give efficient algorithms for calculating the continuous regularization path of solutions, and discuss the relative advantages of the parameters. Our method experimentally gives state-of-the-art results similar to spectral clustering for non-convex clusters, and has the added benefit of learning a tree structure from the data.
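One member of the family of objectives the abstract refers to can be sketched directly: each point gets its own center, and an L2 fusion penalty shrinks pairs of centers together as the regularization parameter grows, tracing a tree-like path. The uniform pair weights below are an assumption for illustration:

```python
import numpy as np

def clusterpath_objective(U, X, lam):
    """Convex clustering ('clusterpath'-style) objective with an L2
    fusion penalty: fidelity to the data plus a pairwise penalty that
    fuses the per-point centers U[i] as lam increases."""
    fidelity = 0.5 * np.sum((U - X) ** 2)
    n = X.shape[0]
    fusion = sum(np.linalg.norm(U[i] - U[j])
                 for i in range(n) for j in range(i + 1, n))
    return fidelity + lam * fusion
```

At lam = 0 the minimizer is U = X (every point its own cluster); for large lam the fully fused solution becomes cheaper, which is the hierarchical-clustering relaxation at work.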
Finding Actors and Actions in Movies
Cited by 13 (4 self)
Abstract:
This is a preliminary version accepted for publication at ICCV 2013. We address the problem of learning a joint model of actors and actions in movies using weak supervision provided by scripts. Specifically, we extract actor/action pairs from the script and use them as constraints in a discriminative clustering framework. The corresponding optimization problem is formulated as a quadratic program under linear constraints. People in video are represented by automatically extracted and tracked faces together with corresponding motion features. First, we apply the proposed framework to the task of learning names of characters in the movie and demonstrate significant improvements over previous methods used for this task. Second, we explore the joint actor/action constraint and show its advantage for weakly supervised action learning. We validate our method in the challenging setting of localizing and recognizing characters and their actions in the feature-length movie Casablanca.
On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution
Cited by 12 (7 self)
Abstract:
Information-maximization clustering learns a probabilistic classifier in an unsupervised manner so that the mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it involves only continuous optimization of model parameters, which is substantially easier to solve than discrete optimization of cluster assignments. However, existing methods still involve non-convex optimization problems, and therefore finding a good local optimum is not straightforward in practice. In this paper, we propose an alternative information-maximization clustering method based on a squared-loss variant of mutual information. This novel approach gives a clustering solution analytically, in a computationally efficient way, via kernel eigenvalue decomposition. Furthermore, we provide a practical model selection procedure that allows us to objectively optimize tuning parameters included in the kernel function. Through experiments, we demonstrate the usefulness of the proposed approach.
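The flavor of an analytic solution via kernel eigenvalue decomposition can be illustrated with a spectral-style sketch. This is an illustration of the general idea, not the paper's exact estimator: take the leading eigenvectors of the kernel matrix and assign each point to the eigenvector on which it has the largest magnitude.

```python
import numpy as np

def kernel_eig_clustering(K, k):
    """Illustrative clustering via kernel eigendecomposition: the relaxed
    solution lives in the span of the top-k eigenvectors of K, and a hard
    assignment is read off per point. Assumed simplification, not the
    paper's procedure."""
    _, vecs = np.linalg.eigh(K)            # ascending eigenvalues
    top = vecs[:, -k:]                     # (n, k) leading eigenvectors
    return np.argmax(np.abs(top), axis=1)  # hard cluster assignment
```

On a block-diagonal kernel (two disconnected groups of points), the leading eigenvectors are indicators of the blocks, so the assignment recovers the groups exactly.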
Efficient Image and Video Co-localization with Frank-Wolfe Algorithm
In ECCV
Cited by 12 (0 self)
Abstract:
In this paper, we tackle the problem of performing efficient co-localization in images and videos. Co-localization is the problem of simultaneously localizing (with bounding boxes) objects of the same class across a set of distinct images or videos. Building upon recent state-of-the-art methods, we show how to naturally incorporate temporal terms and constraints for video co-localization into a quadratic programming framework. Furthermore, by leveraging the Frank-Wolfe algorithm (or conditional gradient), we show how our optimization formulations for both images and videos can be reduced to solving a succession of simple integer programs, leading to increased efficiency in both memory and speed. To validate our method, we present experimental results on the PASCAL VOC 2007 dataset for images and the YouTube-Objects dataset for videos, as well as a joint combination of the two.
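The generic Frank-Wolfe loop the abstract invokes is short enough to sketch. Each iteration calls a linear minimization oracle over the feasible set; in the co-localization setting that oracle reduces to a simple integer program, which is where the efficiency comes from. The toy oracle below (for the probability simplex) is an assumption for illustration:

```python
import numpy as np

def frank_wolfe(grad, linear_oracle, x0, n_iters=1000):
    """Generic Frank-Wolfe (conditional gradient) iteration: move toward
    the minimizer of the linearized objective with a diminishing step."""
    x = np.asarray(x0, dtype=float)
    for t in range(n_iters):
        s = linear_oracle(grad(x))        # argmin over feasible set of <g, s>
        gamma = 2.0 / (t + 2.0)           # standard step-size schedule
        x = (1.0 - gamma) * x + gamma * s
    return x

def simplex_oracle(g):
    """Linear oracle for the probability simplex: the best vertex is the
    one-hot vector at the smallest gradient coordinate."""
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s
```

Minimizing ||x - c||^2 over the simplex with this loop converges to c whenever c lies in the simplex, without ever forming a projection step.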
A.: Discriminative subcategorization
2013
Cited by 10 (2 self)
Abstract:
The objective of this work is to learn subcategories. Rather than casting this as a problem of unsupervised clustering, we investigate a weakly supervised approach using both positive and negative samples of the category. We make the following contributions: (i) we introduce a new model for discriminative subcategorization which determines cluster membership for positive samples whilst simultaneously learning a max-margin classifier to separate each cluster from the negative samples; (ii) we show that this model does not suffer from the degenerate cluster problem that afflicts several competing methods (e.g., Latent SVM and Max-Margin Clustering); (iii) we show that the method is able to discover interpretable subcategories in various datasets. The model is evaluated experimentally over various datasets, and its performance advantages over k-means and Latent SVM are demonstrated. We also stress-test the model and show its resilience in discovering subcategories as the parameters are varied.
A Family of Simple Non-Parametric Kernel Learning Algorithms
Cited by 9 (5 self)
Abstract:
Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by general-purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interior-point SDP solver can be as high as O(N^6.5), which prevents NPKL methods from being applied to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed "SimpleNPKL", which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is the SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigendecomposition technique. The other is the SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, and square loss), which can be reformulated as a saddle-point optimization problem and further solved by a fast iterative algorithm. In contrast to previous NPKL approaches, our empirical results show that the proposed technique, while maintaining the same accuracy, is significantly more efficient and scalable. Finally, we also demonstrate that the proposed technique can speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure-preserving embedding.
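The Lanczos computation the linear-loss variant relies on is exactly what `scipy.sparse.linalg.eigsh` (an ARPACK/Lanczos wrapper) provides: a few leading eigenpairs of a large sparse symmetric matrix without a dense decomposition. The sketch below only illustrates that building block; the diagonal matrix stands in for a matrix assembled from pairwise constraints:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

def top_eigenpairs(A, r):
    """Leading eigenpairs of a sparse symmetric matrix via Lanczos
    iteration; the scalable kernel of the closed-form linear-loss
    SimpleNPKL solution."""
    vals, vecs = eigsh(A, k=r, which='LA')  # r largest algebraic eigenvalues
    return vals, vecs

# Synthetic stand-in for a constraint-derived matrix.
A = diags([1.0, 4.0, 2.0, 3.0])
```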