Online Convex Programming and Generalized Infinitesimal Gradient Ascent
2003
Cited by 125 (3 self)
Convex programming involves a convex set $F \subseteq \mathbb{R}^n$ and a convex function $c : F \to \mathbb{R}$. The goal of convex programming is to find a point in $F$ which minimizes $c$. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in $F$ before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
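As a rough illustration of the online setting described in this abstract, the sketch below commits to a point before each cost is revealed and then takes a projected gradient step. The feasible set (a Euclidean ball), the step size 1/sqrt(t), and the names `project_onto_ball` and `online_gradient_descent` are assumptions made for this example, not details taken from the listing.

```python
import math

def project_onto_ball(x, radius=1.0):
    """Euclidean projection onto {x : ||x||_2 <= radius} (assumed feasible set F)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]

def online_gradient_descent(gradients, dim, radius=1.0):
    """Play x_t, then see the gradient of the cost c_t at x_t and take a projected step.

    `gradients` is a list of callables; gradients[t-1](x) returns the gradient of c_t at x.
    The step size eta_t = 1/sqrt(t) is an assumption of this sketch.
    """
    x = [0.0] * dim
    played = []
    for t, grad in enumerate(gradients, start=1):
        played.append(list(x))      # commit to x_t before the cost is revealed
        g = grad(x)                 # cost information arrives only afterwards
        eta = 1.0 / math.sqrt(t)
        x = project_onto_ball([xi - eta * gi for xi, gi in zip(x, g)], radius)
    return played

# Hypothetical example: linear costs c_t(x) = <a, x> with a = (1, 0) at every step.
costs = [lambda x: [1.0, 0.0] for _ in range(50)]
points = online_gradient_descent(costs, dim=2)
```

With these fixed linear costs the iterates move to the boundary point of the ball that minimizes the cost and stay there; adversarially changing costs are where the regret analysis matters.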
Fast Linear Iterations for Distributed Averaging
Systems and Control Letters, 2003
Cited by 120 (10 self)
We consider the problem of finding a linear iteration that yields distributed averaging consensus over a network, i.e., that asymptotically computes the average of some initial values given at the nodes. When the iteration is assumed symmetric, the problem of finding the fastest converging linear iteration can be cast as a semidefinite program, and therefore efficiently and globally solved. These optimal linear iterations are often substantially faster than several common heuristics that are based on the Laplacian of the associated graph.
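A minimal sketch of a linear averaging iteration x(t+1) = W x(t) on a small graph follows. It uses Metropolis-style weights, which are one Laplacian-type heuristic and not the semidefinite-program-optimal weights the abstract refers to; the graph, the function names, and the number of steps are assumptions for illustration.

```python
def metropolis_weights(n, edges):
    """Symmetric weights W[i][j] = 1 / (1 + max(deg_i, deg_j)) on each edge (a heuristic)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = 1.0 / (1 + max(deg[i], deg[j]))
        W[i][j] = W[j][i] = w
    for i in range(n):
        W[i][i] = 1.0 - sum(W[i])   # self-weight chosen so that each row sums to 1
    return W

def iterate(W, x, steps):
    """Apply x <- W x repeatedly; each node only combines values of its neighbors."""
    for _ in range(steps):
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in W]
    return x

edges = [(0, 1), (1, 2), (2, 3)]    # hypothetical path graph on 4 nodes
W = metropolis_weights(4, edges)
x0 = [4.0, 0.0, 0.0, 0.0]
x = iterate(W, x0, 200)             # every entry approaches the average of x0
```

Because W is symmetric with rows summing to one, the sum of the node values is preserved at every step, so the only possible consensus value is the initial average.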
A direct formulation for sparse PCA using semidefinite programming
In NIPS 17, 2004
Cited by 115 (28 self)
Given a covariance matrix, we consider the problem of maximizing the variance explained by a particular linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This problem arises in the decomposition of a covariance matrix into sparse factors or sparse principal component analysis (PCA), and has wide applications ranging from biology to finance. We use a modification of the classical variational representation of the largest eigenvalue of a symmetric matrix, where cardinality is constrained, and derive a semidefinite programming-based relaxation for our problem. We also discuss Nesterov’s smooth minimization technique applied to the semidefinite program arising in the semidefinite relaxation of the sparse PCA problem. The method has complexity $O(n^4 \sqrt{\log(n)}/\epsilon)$, where $n$ is the size of the underlying covariance matrix and $\epsilon$ is the desired absolute accuracy on the optimal value of the problem.
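For orientation, the cardinality-constrained variational problem and a semidefinite relaxation of the kind this abstract describes are commonly written as below. The symbols $A$ (covariance matrix), $k$ (cardinality budget), and $X$ (lifted variable standing in for $xx^T$) are notation assumed here, not taken from the listing.

```latex
% Cardinality-constrained largest-eigenvalue problem
\begin{aligned}
\text{maximize}   \quad & x^{T} A x \\
\text{subject to} \quad & \|x\|_2 = 1, \quad \mathbf{Card}(x) \le k .
\end{aligned}

% Semidefinite relaxation obtained by lifting X = x x^T and dropping the rank-one constraint
\begin{aligned}
\text{maximize}   \quad & \mathbf{Tr}(A X) \\
\text{subject to} \quad & \mathbf{Tr}(X) = 1, \quad \mathbf{1}^{T} |X| \mathbf{1} \le k, \quad X \succeq 0 .
\end{aligned}
```

The constraint $\mathbf{1}^{T}|X|\mathbf{1} \le k$ is a convex surrogate for the cardinality bound: for a unit vector $x$ with at most $k$ nonzeros, $\|x\|_1^2 \le k$, and $\mathbf{1}^{T}|xx^{T}|\mathbf{1} = \|x\|_1^2$.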
Randomized Gossip Algorithms
IEEE TRANSACTIONS ON INFORMATION THEORY, 2006
Cited by 107 (5 self)
Motivated by applications to sensor, peer-to-peer, and ad hoc networks, we study distributed algorithms, also known as gossip algorithms, for exchanging information and for computing in an arbitrarily connected network of nodes. The topology of such networks changes continuously as new nodes join and old nodes leave the network. Algorithms for such networks need to be robust against changes in topology. Additionally, nodes in sensor networks operate under limited computational, communication, and energy resources. These constraints have motivated the design of “gossip” algorithms: schemes which distribute the computational burden and in which a node communicates with a randomly chosen neighbor. We analyze the averaging problem under the gossip constraint for an arbitrary network graph, and find that the averaging time of a gossip algorithm depends on the second largest eigenvalue of a doubly stochastic matrix characterizing the algorithm. Designing the fastest gossip algorithm corresponds to minimizing this eigenvalue, which is a semidefinite program (SDP). In general, SDPs cannot be solved in a distributed fashion; however, exploiting problem structure, we propose a distributed subgradient method that solves the optimization problem over the network. The relation of averaging time to the second largest eigenvalue naturally relates it to the mixing time of a random walk with transition probabilities derived from the gossip algorithm. We use this connection to study the performance and scaling of gossip algorithms on two popular networks: Wireless Sensor Networks, which are modeled as Geometric Random Graphs, and the Internet graph under the so-called Preferential Connectivity (PC) model.
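The pairwise-averaging idea behind gossip can be sketched in a few lines. In this illustrative version a uniformly random node wakes up, picks a neighbor uniformly at random, and both replace their values with the pair's mean; the uniform neighbor choice, the ring topology, and the function name are assumptions of the sketch, whereas the abstract concerns optimizing those contact probabilities.

```python
import random

def randomized_gossip(values, neighbors, rounds, seed=0):
    """Simulate pairwise-averaging gossip for a fixed number of rounds."""
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        i = rng.randrange(n)                # node that wakes up this round
        j = rng.choice(neighbors[i])        # neighbor chosen uniformly (assumed)
        x[i] = x[j] = 0.5 * (x[i] + x[j])   # both nodes adopt the pair's average
    return x

# Hypothetical 4-node ring; the true average of the initial values is 2.0.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
x = randomized_gossip([8.0, 0.0, 0.0, 0.0], neighbors, rounds=500)
```

Each exchange preserves the sum of the two values involved, which is why the nodes can only agree on the global average.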
Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization
2007
Cited by 100 (5 self)
The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NP-hard, because it contains vector cardinality minimization as a special case. In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum rank solution can be recovered by solving a convex optimization problem, namely the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability, provided the codimension of the subspace is sufficiently large. The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this pre-existing concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization. We also discuss several algorithmic approaches to solving the norm minimization relaxations, and illustrate our results with numerical examples.
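In symbols, the problem and its convex surrogate described above can be written as follows. The notation ($\mathcal{A}$ for the linear map, $b$ for the right-hand side, $\sigma_i$ for singular values) is assumed here for illustration.

```latex
% Affine rank minimization (NP-hard in general)
\begin{aligned}
\text{minimize}   \quad & \operatorname{rank}(X) \\
\text{subject to} \quad & \mathcal{A}(X) = b .
\end{aligned}

% Nuclear norm relaxation (convex)
\begin{aligned}
\text{minimize}   \quad & \|X\|_{*} = \sum_{i} \sigma_i(X) \\
\text{subject to} \quad & \mathcal{A}(X) = b .
\end{aligned}
```

The parallel with compressed sensing is easiest to see for diagonal $X$: the rank becomes the number of nonzero diagonal entries and the nuclear norm becomes the $\ell_1$ norm of the diagonal, so cardinality minimization and its $\ell_1$ relaxation appear as the special case.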
Probing the Pareto frontier for basis pursuit solutions
2008
Cited by 95 (0 self)
The basis pursuit problem seeks a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise (BPDN) fits the least-squares problem only approximately, and a single parameter determines a curve that traces the optimal trade-off between the least-squares fit and the one-norm of the solution. We prove that this curve is convex and continuously differentiable over all points of interest, and show that it gives an explicit relationship to two other optimization problems closely related to BPDN. We describe a root-finding algorithm for finding arbitrary points on this curve; the algorithm is suitable for problems that are large scale and for those that are in the complex domain. At each iteration, a spectral gradient-projection method approximately minimizes a least-squares problem with an explicit one-norm constraint. Only matrix-vector operations are required. The primal-dual solution of this problem gives function and derivative information needed for the root-finding method. Numerical experiments on a comprehensive set of test problems demonstrate that the method scales well to large problems.
Quantitative Robust Uncertainty Principles and Optimally Sparse Decompositions
2004
Cited by 90 (8 self)
In this paper, we develop a robust uncertainty principle for finite signals in $\mathbb{C}^N$ which states that for nearly all choices $T, \Omega \subset \{0, \ldots, N-1\}$ such that $|T| + |\Omega| \asymp (\log N)^{-1/2} \cdot N$, there is no signal $f$ supported on $T$ whose discrete Fourier transform $\hat{f}$ is supported on $\Omega$. In fact, we can make the above uncertainty principle quantitative in the sense that if $f$ is supported on $T$, then only a small percentage of the energy (less than half, say) of $\hat{f}$ is concentrated on $\Omega$. As an application of this robust uncertainty principle (QRUP), we consider the problem of decomposing a signal into a sparse superposition of spikes and complex sinusoids, $f(s) = \sum_{t \in T} \alpha_1(t)\,\delta(s - t) + \sum_{\omega \in \Omega} \alpha_2(\omega)\, e^{i 2\pi \omega s/N}/\sqrt{N}$. We show that if a generic signal $f$ has a decomposition $(\alpha_1, \alpha_2)$ using spike and frequency locations in $T$ and $\Omega$ respectively, and obeying $|T| + |\Omega| \le \mathrm{Const} \cdot (\log N)^{-1/2} \cdot N$, then $(\alpha_1, \alpha_2)$ is the unique sparsest possible decomposition (all other decompositions have more non-zero terms). In addition, if $|T| + |\Omega| \le \mathrm{Const} \cdot (\log N)^{-1} \cdot N$, then the sparsest $(\alpha_1, \alpha_2)$ can be found by solving a convex optimization problem. Underlying our results is a new probabilistic approach which insists on finding the correct uncertainty relation or the optimally sparse solution for nearly all subsets but not necessarily all of them, and allows us to considerably sharpen previously known results [9, 10]. In fact, we show that the fraction of sets $(T, \Omega)$ for which the above properties do not hold can be upper bounded by quantities like $N^{-\alpha}$ for large values of $\alpha$. The QRUP (and the application to finding sparse representations) can be extended to general pairs of orthogonal bases $\Phi_1, \Phi_2$ of $\mathbb{C}^N$. For nearly all choices $\Gamma_1, \Gamma_2 \subset \{0, \ldots, N-1\}$ obeying $|\Gamma_1| + |\Gamma_2| \asymp \mu(\Phi_1, \Phi_2)^{-2} \cdot (\log N)^{-m}$, where $m \le 6$, there is no signal $f$ such that $\Phi_1 f$ is supported on $\Gamma_1$ and $\Phi_2 f$ is supported on $\Gamma_2$, where $\mu(\Phi_1, \Phi_2)$ is the mutual coherence between $\Phi_1$ and $\Phi_2$.
Metric Learning by Collapsing Classes
Cited by 84 (2 self)
We present an algorithm for learning a quadratic Gaussian metric (Mahalanobis distance) for use in classification tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in the other classes. We construct a convex optimization problem whose solution generates such a metric by trying to collapse all examples in the same class to a single point and push examples in other classes infinitely far away. We show that when the metric we learn is used in simple classifiers, it yields substantial improvements over standard alternatives on a variety of problems. We also discuss how the learned metric may be used to obtain a compact low dimensional feature representation of the original input space, allowing more efficient classification with very little reduction in performance.
Consistency of the group lasso and multiple kernel learning
JOURNAL OF MACHINE LEARNING RESEARCH, 2007
Cited by 81 (14 self)
We consider the least-squares regression problem with regularization by a block 1-norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1-norm where all spaces have dimension one, where it is commonly referred to as the Lasso. In this paper, we study the asymptotic model consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for nonlinear variable selection. Using tools from functional analysis, and in particular covariance operators, we extend the consistency results to this infinite dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the nonadaptive scheme is not satisfied.
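To make the block 1-norm concrete, the sketch below evaluates the penalty and applies group soft-thresholding, the standard proximal step for a sum of Euclidean norms. The proximal step is a common tool for this penalty rather than something the abstract states, and the function names, groups, and threshold `lam` are assumptions chosen for illustration.

```python
import math

def block_one_norm(w, groups):
    """Sum over groups of the Euclidean norm of that group's coefficients."""
    return sum(math.sqrt(sum(w[i] ** 2 for i in idx)) for idx in groups)

def group_soft_threshold(w, groups, lam):
    """Shrink each group toward zero by lam in Euclidean norm; small groups vanish entirely."""
    out = list(w)
    for idx in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * w[i]
    return out

w = [3.0, 4.0, 0.1, -0.1]           # hypothetical coefficient vector
groups = [[0, 1], [2, 3]]           # two groups of dimension two
penalty = block_one_norm(w, groups)
shrunk = group_soft_threshold(w, groups, lam=1.0)
```

Here the first group (norm 5) is scaled by 0.8, while the second group (norm about 0.14, below the threshold) is set to zero as a whole. This all-or-nothing behavior per group is what makes the penalty select variables in blocks.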
Learning the discriminative power-invariance trade-off
In ICCV, 2007
Cited by 80 (3 self)
We investigate the problem of learning optimal descriptors for a given classification task. Many hand-crafted descriptors have been proposed in the literature for measuring visual similarity. Looking past initial differences, what really distinguishes one descriptor from another is the trade-off that it achieves between discriminative power and invariance. Since this trade-off must vary from task to task, no single descriptor can be optimal in all situations. Our focus, in this paper, is on learning the optimal trade-off for classification given a particular training set and prior constraints. The problem is posed in the kernel learning framework. We learn the optimal, domain-specific kernel as a combination of base kernels corresponding to base features which achieve different levels of trade-off (such as no invariance, rotation invariance, scale invariance, affine invariance, etc.). This leads to a convex optimisation problem with a unique global optimum which can be solved efficiently. The method is shown to achieve state-of-the-art performance on the UIUC textures, Oxford flowers and Caltech 101 datasets.

