Results 1–10 of 18
Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization
 NIPS'11, 25th Annual Conference on Neural Information Processing Systems
, 2011
Abstract

Cited by 48 (6 self)
We consider the problem of optimizing the sum of a smooth convex function and a nonsmooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the nonsmooth term. We show that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates. Using these rates, we perform as well as or better than a carefully chosen fixed error level on a set of structured sparsity problems.
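The setting in this abstract can be sketched numerically. Below is a minimal numpy illustration (not the authors' code) of the basic proximal-gradient iteration with a perturbed gradient, applied to a Lasso objective; the 1/k² error schedule, the objective, and all names are illustrative assumptions:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximity operator of t*||.||_1 (computed exactly here)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def inexact_proximal_gradient(A, b, lam, n_iter=500, noise_scale=1.0):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 with a perturbed gradient.
    The injected gradient error decays like 1/k^2, illustrating the idea
    that sufficiently fast-decreasing errors preserve convergence."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    rng = np.random.default_rng(0)
    for k in range(1, n_iter + 1):
        grad = A.T @ (A @ x - b)
        error = rng.normal(size=x.shape) * noise_scale / k**2  # decaying error
        x = soft_threshold(x - (grad + error) / L, lam / L)
    return x
```

With A = I the exact minimizer is `soft_threshold(b, lam)`, so the effect of the decaying error is easy to check by hand.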
Structured Sparsity Models for Brain Decoding from fMRI data
Abstract

Cited by 8 (0 self)
Abstract—Structured sparsity methods have recently been proposed that allow one to incorporate additional spatial and temporal information when estimating models for decoding mental states from fMRI data. These methods carry the promise of being more interpretable than simpler Lasso or Elastic Net methods. However, although sparsity has often been advocated as leading to more interpretable models, we show that sparsity by itself, and structured sparsity as well, can lead to unstable models. We present an extension of the Total Variation method and assess several other structured sparsity models on accuracy, sparsity and stability. Our results indicate that structured sparsity via the Sparse Total Variation can mitigate some of the instability inherent in simpler sparse methods, but more research is required to build methods that can reliably infer relevant activation patterns from fMRI data. Keywords: brain decoding; structured sparsity; stability; fMRI
A general framework for structured sparsity via proximal optimization
 In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS)
Abstract

Cited by 5 (3 self)
We study a generalized framework for structured sparsity. It extends the well-known methods of Lasso and Group Lasso by incorporating additional constraints on the variables as part of a convex optimization problem. This framework provides a straightforward way of favouring prescribed sparsity patterns, such as orderings, contiguous regions and overlapping groups, among others. Available optimization methods are limited to specific constraint sets and tend not to scale well with sample size and dimensionality. We propose a first-order proximal method, which builds upon results on fixed points and successive approximations. The algorithm can be applied to a general class of conic and norm constraint sets and relies on a proximity-operator subproblem which can be computed numerically. Experiments on different regression problems demonstrate state-of-the-art statistical performance, which improves over Lasso, Group Lasso and StructOMP. They also demonstrate the efficiency of the optimization algorithm and its scalability with the size of the problem.
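The proximity operators underlying Lasso and Group Lasso have simple closed forms, which is what proximal methods of this kind exploit. A minimal sketch (illustrative names, non-overlapping groups only, not the paper's general conic machinery):

```python
import numpy as np

def prox_l1(v, t):
    """Proximity operator of t*||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_group_l2(v, t, groups):
    """Proximity operator of t * sum_g ||v_g||_2 over non-overlapping
    groups: block soft-thresholding, which zeroes whole groups at once."""
    out = np.zeros_like(v)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > t:                      # otherwise the group is set to zero
            out[g] = (1.0 - t / norm) * v[g]
    return out
```

For example, `prox_group_l2(np.array([3.0, 4.0, 0.1]), 1.0, [[0, 1], [2]])` shrinks the first group (norm 5) and zeroes the second, the group-level analogue of soft-thresholding.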
Convergence of stochastic proximal gradient algorithm. arXiv:1403.5074
, 2014
approximation and faster algorithm using the proximal average
 In Advances in Neural Information Processing Systems
Abstract

Cited by 3 (0 self)
It is a common practice to approximate “complicated” functions with more friendly ones. In large-scale machine learning applications, nonsmooth losses/regularizers that entail great computational challenges are usually approximated by smooth functions. We re-examine this powerful methodology and point out a nonsmooth approximation which simply pretends that the proximal map is linear. The new approximation is justified using a recent convex analysis tool, the proximal average, and yields a novel proximal gradient algorithm that is strictly better than the one based on smoothing, without incurring any extra overhead. Numerical experiments conducted on two important applications, overlapping group lasso and graph-guided fused lasso, corroborate the theoretical claims.
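"Pretending the proximal map is linear" amounts to replacing the (usually intractable) prox of an average of regularizers with the average of their individual proxes. A hedged sketch of one such step, with two illustrative closed-form proxes (names and the specific regularizers are assumptions, not the paper's code):

```python
import numpy as np

def prox_average_step(x, grad, step, prox_list):
    """One proximal-gradient step in which the prox of an average of
    regularizers is replaced by the average of their individual proxes
    (the proximal-average approximation described above)."""
    z = x - step * grad                                  # forward (gradient) step
    return sum(p(z, step) for p in prox_list) / len(prox_list)

# Two illustrative proximity operators with closed forms:
prox_l1 = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)  # t*||.||_1
prox_sq = lambda v, t: v / (1.0 + t)                                # (t/2)*||.||^2
```

Each individual prox is cheap, so the approximate step costs no more than a standard proximal-gradient step on a single simple regularizer.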
An inertial forward-backward algorithm for monotone inclusions
 J. Math. Imaging Vis
, 2014
Abstract

Cited by 3 (1 self)
In this paper, we propose a new accelerated forward-backward splitting algorithm to compute a zero of the sum of two monotone operators, with one of the two operators being cocoercive. The algorithm is inspired by the accelerated gradient method of Nesterov, but can be applied to a much larger class of problems, including convex-concave saddle point problems and general monotone inclusions. We prove convergence of the algorithm in a Hilbert space setting and show that several recently proposed first-order methods can be obtained as special cases of the general algorithm. Numerical results show that the proposed algorithm converges faster than existing methods, while keeping the computational cost of each iteration basically unchanged.
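The inertial forward-backward pattern can be sketched in a few lines: extrapolate with momentum, take a forward step on the cocoercive operator, then a backward (prox/resolvent) step. The fixed momentum parameter below is an illustrative simplification; the paper derives the admissible inertial parameters:

```python
import numpy as np

def inertial_forward_backward(B, prox, x0, step, alpha=0.3, n_iter=200):
    """Inertial forward-backward iteration for finding x with 0 in A(x)+B(x):
    momentum extrapolation, a forward step on the cocoercive operator B,
    then a backward (resolvent/prox) step associated with A."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iter):
        y = x + alpha * (x - x_prev)              # inertial extrapolation
        x_prev, x = x, prox(y - step * B(y), step)
    return x
```

With B the gradient of a smooth loss and `prox` the soft-thresholding operator, this recovers a FISTA-like accelerated proximal-gradient method as a special case.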
Spectral Norm Regularization of Orthonormal Representations for
Abstract
Recent literature [1] suggests that embedding a graph on a unit sphere leads to better generalization for graph transduction. However, the choice of optimal embedding and an efficient algorithm to compute it remain open. In this paper, we show that orthonormal representations, a class of unit-sphere graph embeddings, are PAC learnable. Existing PAC-based analyses do not apply, as the VC dimension of the function class is infinite. We propose an alternative PAC-based bound, which does not depend on the VC dimension of the underlying function class but is related to the famous Lovász ϑ function. The main contribution of the paper is SPORE, a SPectral regularized ORthonormal Embedding for graph transduction, derived from the PAC bound. SPORE is posed as a nonsmooth convex optimization over an elliptope. These problems are usually solved as semidefinite programs (SDPs) with time complexity O(n^6). We present Infeasible Inexact Proximal (IIP), an inexact proximal method which performs a subgradient procedure on an approximate projection, not necessarily feasible. IIP is more scalable than SDP, has O(1/√T) convergence, and is generally applicable whenever a suitable approximate projection is available. We use IIP to compute SPORE, where the approximate projection step is computed by FISTA, an accelerated gradient descent procedure. We show that the method has a convergence rate of O(1/√T). The proposed algorithm easily scales to thousands of vertices, while standard SDP computation does not scale beyond a few hundred vertices. Furthermore, the analysis presented here easily extends to the multiple-graph setting.
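The subgradient-on-a-projection pattern with O(1/√T)-style step sizes can be sketched generically. This is a hypothetical, simplified loop, not the paper's IIP implementation; `subgrad`, `project`, and the step constant `c` are illustrative, and in the paper's setting `project` would be an approximate elliptope projection computed by FISTA:

```python
import numpy as np

def approx_projected_subgradient(subgrad, project, x0, n_iter=1000, c=1.0):
    """Generic projected-subgradient loop with c/sqrt(k) step sizes.
    `project` may itself be computed only approximately (e.g. by a few
    inner iterations of an accelerated gradient method), matching the
    pattern described in the abstract above."""
    x = x0.copy()
    for k in range(1, n_iter + 1):
        x = project(x - (c / np.sqrt(k)) * subgrad(x))
    return x
```

As a toy check, minimizing |x − 3| over the interval [0, 1] (projection by clipping) drives the iterate to the boundary point 1.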
MIT-CSAIL-TR-2011-041, CBCL-303
, 2011
"... In this work we are interested in the problems of supervised learning and variable selection when the inputoutput dependence is described by a nonlinear function depending on a few variables. Our goal is to consider a sparse nonparametric model, hence avoiding linear or additive models. The key ide ..."
Abstract
In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function depending on a few variables. Our goal is to consider a sparse nonparametric model, hence avoiding linear or additive models. The key idea is to measure the importance of each variable in the model by making use of partial derivatives. Based on this intuition we propose and study a new regularizer and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm corresponds to a minimization problem which can be provably solved by an iterative procedure. The consistency properties of the obtained estimator are studied both in terms of prediction and selection performance. An extensive empirical analysis shows that the proposed method performs favorably with respect to the state-of-the-art.