Results 1-10 of 49
Just Relax: Convex Programming Methods for Identifying Sparse Signals in Noise
, 2006
Cited by 312 (1 self)
This paper studies a difficult and fundamental problem that arises throughout electrical engineering, applied mathematics, and statistics. Suppose that one forms a short linear combination of elementary signals drawn from a large, fixed collection. Given an observation of the linear combination that has been contaminated with additive noise, the goal is to identify which elementary signals participated and to approximate their coefficients. Although many algorithms have been proposed, there is little theory which guarantees that these algorithms can accurately and efficiently solve the problem. This paper studies a method called convex relaxation, which attempts to recover the ideal sparse signal by solving a convex program. This approach is powerful because the optimization can be completed in polynomial time with standard scientific software. The paper provides general conditions which ensure that convex relaxation succeeds. As evidence of the broad impact of these results, the paper describes how convex relaxation can be used for several concrete signal recovery problems. It also describes applications to channel coding, linear regression, and numerical analysis.
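As a rough illustration of the convex relaxation idea in this abstract, the sketch below solves an l1-penalized least-squares problem with iterative soft thresholding (ISTA). The objective form, the solver choice, the synthetic data, and all names (`ista`, `soft_threshold`, `Phi`, `gamma`) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(Phi, y, gamma, n_iter=500):
    """Minimize 0.5 * ||y - Phi x||^2 + gamma * ||x||_1 by proximal gradient steps."""
    # Step size 1/L, where L = (largest singular value of Phi)^2 bounds the
    # Lipschitz constant of the gradient of the quadratic term.
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)
        x = soft_threshold(x - step * grad, step * gamma)
    return x

# Hypothetical test problem: a 3-sparse signal over 100 unit-norm atoms, 40 noisy samples.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((40, 100))
Phi /= np.linalg.norm(Phi, axis=0)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [2.0, -1.5, 1.0]
y = Phi @ x_true + 0.01 * rng.standard_normal(40)
x_hat = ista(Phi, y, gamma=0.05)
```

The penalty weight `gamma` trades data fit against sparsity; whether the recovered support matches the true one depends on the dictionary and the noise level, which is the kind of condition the paper analyzes.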
An introduction to boosting and leveraging
 Advanced Lectures on Machine Learning, LNCS
, 2003
A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation
 In KDD
, 2004
Cited by 101 (24 self)
Co-clustering is a powerful data mining technique with varied applications such as text clustering, microarray analysis and recommender systems. Recently, an information-theoretic co-clustering approach applicable to empirical joint probability distributions was proposed. In many situations, co-clustering of more general matrices is desired. In this paper, we present a substantially generalized co-clustering framework wherein any Bregman divergence can be used in the objective function, and various conditional expectation based constraints can be considered based on the statistics that need to be preserved. Analysis of the co-clustering problem leads to the minimum Bregman information principle, which generalizes the maximum entropy principle, and yields an elegant meta algorithm that is guaranteed to achieve local optimality. Our methodology yields new algorithms and also encompasses several previously known clustering and co-clustering algorithms based on alternate minimization.
Tracking the Best Linear Predictor
 Journal of Machine Learning Research
, 2001
Cited by 54 (11 self)
In most online learning research the total online loss of the algorithm is compared to the total loss of the best offline predictor u from a comparison class of predictors. We call such bounds static bounds. The interesting feature of these bounds is that they hold for an arbitrary sequence of examples. Recently some work has been done where the predictor u_t at each trial t is allowed to change with time, and the total online loss of the algorithm is compared to the sum of the losses of u_t at each trial plus the total "cost" for shifting to successive predictors. This is to model situations in which the examples change over time, and different predictors from the comparison class are best for different segments of the sequence of examples. We call such bounds shifting bounds. They hold for arbitrary sequences of examples and arbitrary sequences of predictors. Naturally, shifting bounds are much harder to prove. The only known bounds are for the case when the comparison class consists of sequences of experts or boolean disjunctions. In this paper we develop the methodology for lifting known static bounds to the shifting case. In particular we obtain bounds when the comparison class consists of linear neurons (linear combinations of experts). Our essential technique is to project the hypothesis of the static algorithm at the end of each trial into a suitably chosen convex region. This keeps the hypothesis of the algorithm well-behaved and the static bounds can be converted to shifting bounds.
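The projection step described in this abstract can be pictured with a minimal sketch: an online gradient-descent learner for a linear predictor whose weight vector is projected back into a convex region after every trial. The Euclidean ball, the squared loss, the learning rate, and the function names are all assumptions chosen for illustration; the paper works with its own update rules and regions, which this listing does not specify.

```python
import numpy as np

def project_l2_ball(w, radius):
    """Euclidean projection of w onto the ball {v : ||v||_2 <= radius}."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def online_projected_gd(examples, dim, lr=0.1, radius=1.0):
    """Run one pass over (x, y) pairs; return final weights and total squared loss."""
    w = np.zeros(dim)
    total_loss = 0.0
    for x, y in examples:
        pred = w @ x                       # predict before seeing the label
        total_loss += 0.5 * (pred - y) ** 2
        w = w - lr * (pred - y) * x        # gradient step on the squared loss
        w = project_l2_ball(w, radius)     # keep the hypothesis inside the region
    return w, total_loss
```

Because the weights never leave the region, they cannot drift arbitrarily far during one segment of the sequence, which is the intuition behind keeping the hypothesis well-behaved when the best predictor shifts.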
Message-passing for graph-structured linear programs: Proximal methods and rounding schemes
, 2008
Cited by 34 (1 self)
The problem of computing a maximum a posteriori (MAP) configuration is a central computational challenge associated with Markov random fields. A line of work has focused on “tree-based” linear programming (LP) relaxations for the MAP problem. This paper develops a family of superlinearly convergent algorithms for solving these LPs, based on proximal minimization schemes using Bregman divergences. As with standard message-passing on graphs, the algorithms are distributed and exploit the underlying graphical structure, and so scale well to large problems. Our algorithms have a double-loop character, with the outer loop corresponding to the proximal sequence, and an inner loop of cyclic Bregman divergences used to compute each proximal update. Different choices of the Bregman divergence lead to conceptually related but distinct LP-solving algorithms. We establish convergence guarantees for our algorithms, and illustrate their performance via some simulations. We also develop two classes of graph-structured rounding schemes, randomized and deterministic, for obtaining integral configurations from the LP solutions. Our deterministic rounding schemes use a “reparameterization” property of our algorithms so that when the LP solution is integral, the MAP solution can be obtained even before the LP-solver converges to the optimum. We also propose a graph-structured randomized rounding scheme that applies to iterative LP solving algorithms in general. We analyze the performance of our rounding schemes, giving bounds on the number of iterations required, when the LP is integral, for the rounding schemes to obtain the MAP solution. These bounds are expressed in terms of the strength of the potential functions, and the energy gap, which measures how well the integral MAP solution is separated from other integral configurations. We also report simulations comparing these rounding schemes.
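To give a feel for proximal minimization with a Bregman divergence, here is a toy sketch, not the paper's algorithm: it minimizes a linear cost over the probability simplex using the KL divergence as the proximal term. On this simple constraint set each proximal update has a closed form, so no inner loop is needed; the graph-structured LPs in the paper have harder constraints. The function name, the weight `omega`, and the cost vector are assumptions for the example.

```python
import numpy as np

def entropic_proximal_lp(c, omega=1.0, n_iter=200):
    """Approximately minimize <c, x> over the probability simplex."""
    c = np.asarray(c, dtype=float)
    x = np.full(c.shape, 1.0 / c.size)  # start at the uniform distribution
    for _ in range(n_iter):
        # Closed-form solution of the proximal subproblem
        #   argmin_x <c, x> + (1/omega) * KL(x || x_prev)  over the simplex:
        # multiply by exp(-omega * c) and renormalize.
        x = x * np.exp(-omega * c)
        x /= x.sum()
    return x

x = entropic_proximal_lp([0.3, 0.1, 0.7])
```

The iterates concentrate on the coordinate with the smallest cost, which is the integral (vertex) solution of this small LP; rounding to the largest entry recovers it well before the iterates have fully converged.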
An Inexact Hybrid Generalized Proximal Point Algorithm And Some New Results On The Theory Of Bregman Functions
 Mathematics of Operations Research
, 2000
Cited by 29 (11 self)
We present a new Bregman-function-based algorithm which is a modification of the generalized proximal point method for solving the variational inequality problem with a maximal monotone operator. The principal advantage of the presented algorithm is that it allows a more constructive error tolerance criterion in solving the proximal point subproblems. Furthermore, we eliminate the assumption of pseudomonotonicity which was, until now, standard in proving convergence for paramonotone operators. Thus we obtain a convergence result which is new even for exact generalized proximal point methods. Finally, we present some new results on the theory of Bregman functions. For example, we show that the standard assumption of convergence consistency is a consequence of the other properties of Bregman functions, and is therefore superfluous.
Duality and Auxiliary Functions for Bregman Distances
 School of Computer Science, Carnegie Mellon University
, 2002
Cited by 29 (2 self)
We formulate and prove a convex duality theorem for Bregman distances and present a technique based on auxiliary functions for deriving and proving convergence of iterative algorithms to minimize Bregman distance subject to linear constraints.
Approximate iterations in Bregman-function-based proximal algorithms
 Math. Program
, 1998
Cited by 24 (2 self)
This paper establishes convergence of generalized Bregman-function-based proximal point algorithms when the iterates are computed only approximately. The problem being solved is modeled as a general maximal monotone operator, and need not reduce to minimization of a function. The accuracy conditions on the iterates resemble those required for the classical "linear" proximal point algorithm, but are slightly stronger; they should be easier to verify or enforce in practice than conditions given in earlier analyses of approximate generalized proximal methods. Subject to these practically enforceable accuracy restrictions, convergence is obtained under the same conditions currently established for exact Bregman-function-based
Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces
 COMM. CONTEMP. MATH
, 2001
Cited by 18 (13 self)
The classical notions of essential smoothness, essential strict convexity, and Legendreness for convex functions are extended from Euclidean to Banach spaces. A pertinent duality theory is developed and several useful characterizations are given. The proofs rely on new results on the more subtle behavior of subdifferentials and directional derivatives at boundary points of the domain. In weak Asplund spaces, a new formula allows the recovery of the subdifferential from nearby gradients. Finally, it is shown that every Legendre function on a reflexive Banach space is zone consistent, a fundamental property in the analysis of optimization algorithms based on Bregman distances. Numerous illustrating examples are provided.
Sided and symmetrized Bregman centroids
 IEEE Transactions on Information Theory
, 2009
Cited by 16 (8 self)
In this paper, we generalize the notions of centroids (and barycenters) to the broad class of information-theoretic distortion measures called Bregman divergences. Bregman divergences form a rich and versatile family of distances that unifies quadratic Euclidean distances with various well-known statistical entropic measures. Since, apart from the squared Euclidean distance, Bregman divergences are asymmetric, we consider the left-sided and right-sided centroids and the symmetrized centroids as minimizers of average Bregman distortions. We prove that all three centroids are unique and give closed-form solutions for the sided centroids that are generalized means. Furthermore, we design a provably fast and efficient arbitrarily close approximation algorithm for the symmetrized centroid based on its exact geometric characterization. The geometric approximation algorithm only requires walking along a geodesic linking the two left/right-sided centroids. We report on our implementation for computing entropic centers of image histogram clusters and entropic centers of multivariate normal distributions that are useful operations for processing multimedia information and retrieval. These experiments illustrate that our generic methods compare favorably with former limited ad hoc methods. Index Terms: Bregman divergence, Bregman information, Bregman power divergence, Burbea–Rao divergence, centroid,
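The closed-form sided centroids mentioned in this abstract can be sketched as follows, assuming the standard definition D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> for a strictly convex generator F. Minimizing the average divergence over the second argument gives the arithmetic mean for any F, while minimizing over the first argument gives a generalized mean through the gradient of F. The generator used here (unnormalized negative entropy on positive vectors) and every function name are assumptions for the example; which side is called "left" or "right" follows the paper and is not fixed by this code.

```python
import numpy as np

def bregman_divergence(F, grad_F, p, q):
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# Assumed generator: F(x) = sum(x log x - x) on positive vectors,
# whose Bregman divergence is the generalized KL divergence.
def F_kl(x):
    return float(np.sum(x * np.log(x) - x))

def grad_kl(x):
    return np.log(x)

def grad_kl_inv(y):
    return np.exp(y)

def centroid_second_argument(points):
    """argmin_c of mean_i D_F(p_i, c): the arithmetic mean, whatever F is."""
    return np.mean(points, axis=0)

def centroid_first_argument(points, grad_F, grad_F_inv):
    """argmin_c of mean_i D_F(c, p_i): grad_F^{-1} of the mean of grad_F(p_i)."""
    return grad_F_inv(np.mean([grad_F(p) for p in points], axis=0))
```

With this generator the first-argument centroid is the coordinate-wise geometric mean, for example `centroid_first_argument(np.array([[1.0, 4.0], [4.0, 1.0]]), grad_kl, grad_kl_inv)` gives a vector with both entries equal to 2 (up to rounding), whereas the arithmetic mean of the same points has both entries equal to 2.5.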