Results 1–10 of 20
The Burbea-Rao and Bhattacharyya centroids
IEEE Transactions on Information Theory, 2010
Abstract

Cited by 10 (7 self)
Abstract—We study the centroid with respect to the class of information-theoretic Burbea-Rao divergences that generalize the celebrated Jensen-Shannon divergence by measuring the non-negative Jensen difference induced by a strictly convex and differentiable function. Although those Burbea-Rao divergences are symmetric by construction, they are not metrics since they fail to satisfy the triangle inequality. We first explain how a particular symmetrization of Bregman divergences called Jensen-Bregman distances yields exactly those Burbea-Rao divergences. We then proceed by defining skew Burbea-Rao divergences, and show that skew Burbea-Rao divergences amount in limit cases to computing Bregman divergences. We then prove that Burbea-Rao centroids can be arbitrarily finely approximated by a generic iterative concave-convex optimization algorithm with a guaranteed convergence property. In the second part of the paper, we consider the Bhattacharyya distance that is commonly used to measure the degree of overlap of probability distributions. We show that Bhattacharyya distances on members of the same statistical exponential family amount to calculating a Burbea-Rao divergence in disguise. Thus we get an efficient algorithm for computing the Bhattacharyya centroid of a set of parametric distributions belonging to the same exponential family, improving over former specialized methods found in the literature that were limited to univariate or "diagonal" multivariate Gaussians. To illustrate the performance of our Bhattacharyya/Burbea-Rao centroid algorithm, we present experimental performance results for k-means and hierarchical clustering of Gaussian mixture models.
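As an illustrative sketch (our own helpers, not code from the paper), the Jensen difference defining a Burbea-Rao divergence for a convex generator F, and the closed-form Bhattacharyya distance between two univariate Gaussians, can be computed as follows; with F chosen as the negative Shannon entropy, the Burbea-Rao divergence is exactly the Jensen-Shannon divergence mentioned in the abstract:

```python
import math

def neg_entropy(p):
    """Negative Shannon entropy F(p) = sum_i p_i log p_i, a strictly convex generator."""
    return sum(x * math.log(x) for x in p if x > 0)

def burbea_rao(p, q, F=neg_entropy):
    """Burbea-Rao (Jensen) divergence (F(p) + F(q))/2 - F((p+q)/2).

    With F = neg_entropy this equals the Jensen-Shannon divergence."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    return (F(p) + F(q)) / 2.0 - F(m)

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Closed-form Bhattacharyya distance between two univariate Gaussians."""
    v = var1 + var2
    return ((mu1 - mu2) ** 2 / (4.0 * v)
            + 0.5 * math.log(v / (2.0 * math.sqrt(var1 * var2))))
```

For example, the divergence between two disjoint distributions [1, 0] and [0, 1] comes out to log 2, the maximal Jensen-Shannon value.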
Learning multi-view neighborhood preserving projections
In Proc. of the International Conference on Machine Learning (ICML), 2011
Abstract

Cited by 10 (3 self)
We address the problem of metric learning for multi-view data, namely the construction of embedding projections from data in different representations into a shared feature space, such that the Euclidean distance in this space provides a meaningful within-view as well as between-view similarity. Our motivation stems from cross-media retrieval tasks, where the availability of a joint Euclidean distance function is a prerequisite for fast, in particular hashing-based, nearest neighbor queries. We formulate an objective function that expresses the intuitive concept that matching samples are mapped closely together in the output space, whereas non-matching samples are pushed apart, no matter in which view they are available. The resulting optimization problem is not convex, but it can be decomposed explicitly into a convex and a concave part, thereby allowing efficient optimization using the convex-concave procedure. Experiments on an image retrieval task show that nearest-neighbor based cross-view retrieval is indeed possible, and the proposed technique improves the retrieval accuracy over baseline techniques.
Multitask Learning without Label Correspondences
Abstract

Cited by 7 (1 self)
We propose an algorithm to perform multitask learning where each task has potentially distinct label sets and label correspondences are not readily available. This is in contrast with existing methods which either assume that the label sets shared by different tasks are the same or that there exists a label mapping oracle. Our method directly maximizes the mutual information among the labels, and we show that the resulting objective function can be efficiently optimized using existing algorithms. Our proposed approach has a direct application for data integration with different label spaces, such as integrating Yahoo! and DMOZ web directories.
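As a minimal illustration of the quantity being maximized (our own sketch, not the paper's optimization algorithm), the empirical mutual information between two label sets can be computed from a joint co-occurrence table:

```python
import math

def mutual_information(counts):
    """Empirical mutual information I(X;Y) in nats from a joint count table.

    counts[i][j] = number of samples carrying label i in one task and
    label j in the other."""
    total = sum(sum(row) for row in counts)
    px = [sum(row) / total for row in counts]
    py = [sum(counts[i][j] for i in range(len(counts))) / total
          for j in range(len(counts[0]))]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:
                pxy = c / total
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi
```

Perfectly correlated labels (a diagonal count table) give log 2 for two classes, while independent labels give zero.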
Max-margin min-entropy models
In AISTATS, 2012
Abstract

Cited by 5 (1 self)
We propose a new family of latent variable models called max-margin min-entropy (M3E) models, which define a distribution over the output and the hidden variables conditioned on the input. Given an input, an M3E model predicts the output with the smallest Rényi entropy of the corresponding generalized distribution. This is equivalent to minimizing a score that consists of two terms: (i) the negative log-likelihood of the output, ensuring that the output has a high probability; and (ii) a measure of uncertainty over the distribution of the hidden variables conditioned on the input and the output, ensuring that there is little confusion in the values of the hidden variables. Given a training dataset, the parameters of an M3E model are learned by maximizing the margin between the Rényi entropies of the ground-truth output and all other incorrect outputs. Training an M3E model can be viewed as minimizing an upper bound on a user-defined loss, and includes, as a special case, the latent support vector machine framework. We demonstrate the efficacy of M3E models on two standard machine learning applications, discriminative motif finding and image classification, using publicly available datasets.
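The Rényi entropy at the heart of the M3E score has a simple closed form; a hedged sketch (our own helper, not the authors' code) for a discrete distribution:

```python
import math

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (alpha > 0) of a discrete distribution, in nats.

    H_alpha(p) = log(sum_i p_i**alpha) / (1 - alpha); the limit alpha -> 1
    recovers the Shannon entropy, handled as a special case here."""
    if alpha == 1.0:
        return -sum(x * math.log(x) for x in p if x > 0)
    return math.log(sum(x ** alpha for x in p)) / (1.0 - alpha)
```

For a uniform distribution over n outcomes the entropy is log n for every order alpha, and a deterministic distribution scores zero, matching the "little confusion in the hidden variables" intuition above.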
A Unified Optimization Framework for Robust Pseudo-Relevance Feedback Algorithms
Abstract

Cited by 4 (0 self)
We present a flexible new optimization framework for finding effective, reliable pseudo-relevance feedback models that unifies existing complementary approaches in a principled way. The result is an algorithmic approach that not only brings together different benefits of previous methods, such as parameter self-tuning and risk reduction from term dependency modeling, but also allows a rich new space of model search strategies to be investigated. We compare the effectiveness of a unified algorithm to existing methods by examining iterative performance and risk-reward tradeoffs. We also discuss extensions for generating new algorithms within our framework.
Message-Passing Algorithms for Quadratic Programming Formulations of MAP Estimation
Abstract

Cited by 2 (0 self)
Computing the maximum a posteriori (MAP) estimate in graphical models is an important inference problem with many applications. We present message-passing algorithms for quadratic programming (QP) formulations of MAP estimation for pairwise Markov random fields. In particular, we use the concave-convex procedure (CCCP) to obtain a locally optimal algorithm for the non-convex QP formulation. A similar technique is used to derive a globally convergent algorithm for the convex QP relaxation of MAP. We also show that a recently developed expectation-maximization (EM) algorithm for the QP formulation of MAP can be derived from the CCCP perspective. Experiments on synthetic and real-world problems confirm that our new approach is competitive with max-product and its variations. Compared with CPLEX, we achieve more than an order-of-magnitude speedup in solving the convex QP relaxation to optimality.
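To make the CCCP idea concrete, here is a toy scalar example (our own illustration, unrelated to the paper's QP solvers): minimizing the DC function f(x) = x**4 - 2*x**2 by repeatedly linearizing its concave part and solving the resulting convex surrogate exactly.

```python
import math

def cccp_minimize(x0, iters=60):
    """Toy CCCP: minimize f(x) = x**4 - 2*x**2, split as a convex part
    u(x) = x**4 plus a concave part v(x) = -2*x**2.

    Each CCCP step linearizes v at the current iterate x_t (v'(x) = -4*x)
    and minimizes the convex surrogate u(x) + v'(x_t)*x; setting its
    derivative 4*x**3 - 4*x_t to zero gives the real cube root of x_t."""
    x = x0
    for _ in range(iters):
        x = math.copysign(abs(x) ** (1.0 / 3.0), x)
    return x
```

Starting from x0 = 2.0 the surrogate objective decreases monotonically, as CCCP guarantees, and the iterates converge to the local minimizer x = 1.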
Message-Passing Algorithms for MAP Estimation Using DC Programming
Abstract

Cited by 1 (0 self)
We address the problem of finding the most likely assignment, or MAP estimation, in a Markov random field. We analyze the linear programming formulation of MAP through the lens of difference of convex functions (DC) programming, and use the concave-convex procedure (CCCP) to develop efficient message-passing solvers. The resulting algorithms are guaranteed to converge to a global optimum of the well-studied local polytope, an outer bound on the MAP marginal polytope. To tighten the outer bound, we show how to combine it with the mean-field based inner bound and, again, solve it using CCCP. We also identify a useful relationship between the DC formulations and some recently proposed algorithms based on Bregman divergences. Experimentally, this hybrid approach produces optimal solutions for a range of hard OR problems and near-optimal solutions for standard benchmarks.
Belief Propagation for Structured Decision Making
Abstract

Cited by 1 (1 self)
Variational inference algorithms such as belief propagation have had a tremendous impact on our ability to learn and use graphical models, and give many insights for developing or understanding exact and approximate inference. However, variational approaches have not been widely adopted for decision making in graphical models, often formulated through influence diagrams and including both centralized and decentralized (or multi-agent) decisions. In this work, we present a general variational framework for solving structured cooperative decision-making problems, use it to propose several belief propagation-like algorithms, and analyze them both theoretically and empirically.
Jensen-Bregman LogDet Divergence with Application to Efficient Similarity Search for Covariance Matrices
Abstract

Cited by 1 (1 self)
Covariance matrices have found success in several computer vision applications, including activity recognition, visual surveillance, and diffusion tensor imaging, because they provide an easy platform for fusing multiple features compactly. An important task in all of these applications is to compare two covariance matrices using a (dis)similarity function, for which the common choice is the Riemannian metric on the manifold inhabited by these matrices. As this Riemannian manifold is not flat, the dissimilarities should take into account the curvature of the manifold. As a result, such distance computations tend to be slow, especially when the matrix dimensions are large or gradients are required. Further, suitability of the metric for efficient nearest neighbor retrieval is an important requirement in the era of big data analytics. To alleviate these difficulties, this paper proposes a novel dissimilarity measure for covariances, the Jensen-Bregman LogDet Divergence (JBLD). This divergence enjoys several desirable theoretical properties while at the same time being computationally less demanding than standard measures. Utilizing the fact that the square root of JBLD is a metric, we address the problem of efficient nearest neighbor retrieval on large covariance datasets via a metric tree data structure. To this end, we propose a K-Means clustering algorithm on JBLD. We demonstrate the superior performance of JBLD on covariance datasets from several computer vision applications.
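The divergence itself is short to state and compute; a hedged NumPy sketch (the function name is our own), using logdet((X + Y)/2) - 0.5 * logdet(X Y) for symmetric positive definite X and Y:

```python
import numpy as np

def jbld(X, Y):
    """Jensen-Bregman LogDet Divergence between SPD matrices:
    J(X, Y) = logdet((X + Y)/2) - 0.5 * logdet(X @ Y).

    slogdet is used for numerical stability; only determinants and one
    matrix product are needed, with no eigendecomposition as required by
    the Riemannian metric."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2.0)
    _, ld_prod = np.linalg.slogdet(X @ Y)
    return ld_mid - 0.5 * ld_prod
```

The divergence is zero exactly when X equals Y, symmetric in its arguments, and positive otherwise on SPD inputs.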
Self-Paced Learning for Latent Variable Models
Abstract
Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than considering all samples simultaneously, the algorithm should be presented with the training data in a meaningful order that facilitates learning. The order of the samples is determined by how easy they are. The main challenge is that often we are not provided with a readily computable measure of the easiness of samples. We address this issue by proposing a novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector. The number of samples selected is governed by a weight that is annealed until the entire training data has been considered. We empirically demonstrate that the self-paced learning algorithm outperforms the state-of-the-art method for learning a latent structural SVM on four applications: object localization, noun phrase coreference, motif finding, and handwritten digit recognition.
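A minimal self-paced loop (our own toy for robust mean estimation, not the paper's latent structural SVM learner): samples whose loss under the current model falls below a threshold 1/k count as easy; k is annealed so the threshold grows and harder samples join later rounds.

```python
def self_paced_mean(xs, k0=0.5, anneal=0.5, rounds=6):
    """Toy self-paced learning: robustly estimate a mean.

    Each round selects 'easy' samples (squared loss below 1/k under the
    current estimate) and refits on them only; k is then annealed so the
    threshold 1/k grows. In full self-paced learning the annealing would
    continue until every sample is included; this sketch stops early to
    show the intermediate, outlier-resistant fit."""
    theta = sum(xs) / len(xs)  # initial fit on all samples
    k = k0
    for _ in range(rounds):
        easy = [x for x in xs if (x - theta) ** 2 < 1.0 / k]
        if easy:
            theta = sum(easy) / len(easy)  # refit on easy samples only
        k *= anneal
    return theta
```

On the data [0.0, 0.1, -0.1, 10.0] the plain mean is 2.5, but the self-paced fit settles near 0 because the outlier never becomes easy within the annealing schedule shown.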