Results 1–10 of 18
Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization
 NIPS'11: 25th Annual Conference on Neural Information Processing Systems
, 2011
Abstract

Cited by 49 (6 self)
We consider the problem of optimizing the sum of a smooth convex function and a nonsmooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the nonsmooth term. We show that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates. Using these rates, we perform as well as or better than a carefully chosen fixed error level on a set of structured sparsity problems.
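As a rough illustration of the setting this abstract describes (not taken from the paper), a proximal-gradient iteration for a Lasso objective with a simulated gradient error that decays at rate O(1/k²) might be sketched as follows; the function names, the error model and all parameters are illustrative assumptions:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1 (the nonsmooth term)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inexact_prox_grad(A, b, lam, n_iter=200, step=None, rng=None):
    """Proximal-gradient sketch for 0.5*||Ax - b||^2 + lam*||x||_1,
    with a simulated gradient error that decreases like O(1/k^2)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = Lipschitz constant
    x = np.zeros(A.shape[1])
    for k in range(1, n_iter + 1):
        grad = A.T @ (A @ x - b)               # exact gradient of smooth term
        noise = rng.standard_normal(x.size)
        err = noise / np.linalg.norm(noise) / k**2  # decaying error
        x = soft_threshold(x - step * (grad + err), step * lam)
    return x
```

The point of the decay schedule is the abstract's condition: if the errors shrink fast enough, the error-free convergence rate is preserved.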
Structured Sparsity Models for Brain Decoding from fMRI data
Abstract

Cited by 8 (0 self)
Abstract—Structured sparsity methods have recently been proposed that allow one to incorporate additional spatial and temporal information when estimating models for decoding mental states from fMRI data. These methods carry the promise of being more interpretable than simpler Lasso or Elastic Net methods. However, although sparsity has often been advocated as leading to more interpretable models, we show that sparsity by itself, and even structured sparsity, can lead to unstable models. We present an extension of the Total Variation method and assess several other structured sparsity models on accuracy, sparsity and stability. Our results indicate that structured sparsity via the Sparse Total Variation can mitigate some of the instability inherent in simpler sparse methods, but more research is required to build methods that can reliably infer relevant activation patterns from fMRI data. Keywords: brain decoding; structured sparsity; stability; fMRI
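To make the Total Variation idea concrete (a generic sketch, not the paper's implementation), an anisotropic TV penalty on a 2-D weight map sums absolute differences between neighbouring voxels, which encourages spatially contiguous activation patterns; the function name and shapes are illustrative:

```python
import numpy as np

def tv_penalty(w, shape):
    """Anisotropic total-variation penalty: sum of |w_i - w_j| over
    horizontally and vertically adjacent voxels of a 2-D weight map."""
    W = w.reshape(shape)
    return (np.abs(np.diff(W, axis=0)).sum()   # vertical differences
            + np.abs(np.diff(W, axis=1)).sum())  # horizontal differences
```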
A general framework for structured sparsity via proximal optimization
 In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS)
Abstract

Cited by 5 (3 self)
We study a generalized framework for structured sparsity. It extends the well-known methods of Lasso and Group Lasso by incorporating additional constraints on the variables as part of a convex optimization problem. This framework provides a straightforward way of favouring prescribed sparsity patterns, such as orderings, contiguous regions and overlapping groups, among others. Available optimization methods are limited to specific constraint sets and tend not to scale well with sample size and dimensionality. We propose a first-order proximal method, which builds upon results on fixed points and successive approximations. The algorithm can be applied to a general class of conic and norm constraint sets and relies on a proximity operator subproblem which can be computed numerically. Experiments on different regression problems demonstrate state-of-the-art statistical performance, which improves over Lasso, Group Lasso and StructOMP. They also demonstrate the efficiency of the optimization algorithm and its scalability with the size of the problem.
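For the simplest case the abstract builds on, the proximity operator of the Group Lasso penalty with non-overlapping groups has a closed form: block-wise soft-thresholding. A minimal sketch (illustrative names; the paper's method handles far more general constraint sets via a numerical subproblem):

```python
import numpy as np

def group_soft_threshold(v, groups, t):
    """Proximity operator of t * sum_g ||v_g||_2 for non-overlapping
    groups: each block is shrunk toward zero, or zeroed out entirely."""
    out = np.zeros_like(v)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * v[g]  # shrink the whole block
        # else: block norm <= t, block is set to zero
    return out
```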
approximation and faster algorithm using the proximal average
 In Advances in Neural Information Processing Systems
Abstract

Cited by 5 (0 self)
It is a common practice to approximate “complicated” functions with more friendly ones. In large-scale machine learning applications, nonsmooth losses/regularizers that entail great computational challenges are usually approximated by smooth functions. We re-examine this powerful methodology and point out a nonsmooth approximation which simply pretends the linearity of the proximal map. The new approximation is justified using a recent convex analysis tool, the proximal average, and yields a novel proximal gradient algorithm that is strictly better than the one based on smoothing, without incurring any extra overhead. Numerical experiments conducted on two important applications, overlapping group lasso and graph-guided fused lasso, corroborate the theoretical claims.
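The core trick the abstract alludes to can be sketched in a few lines (an illustrative simplification, not the paper's full construction): instead of computing the generally hard prox of an average of regularizers, average the individual proximity operators, which are often cheap in closed form:

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_squared_l2(v, t):
    """Prox of t * 0.5 * ||.||_2^2, which is v / (1 + t)."""
    return v / (1.0 + t)

def prox_average(v, proxes, t):
    """Proximal-average approximation: the average of the individual
    proximity operators stands in for the prox of the averaged penalty."""
    return sum(p(v, t) for p in proxes) / len(proxes)
```

Each component prox here is exact; only the averaging step is the approximation, which is what avoids the overhead of smoothing.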
An inertial forward-backward algorithm for monotone inclusions
 J. Math. Imaging Vis
, 2014
Abstract

Cited by 4 (1 self)
In this paper, we propose a new accelerated forward-backward splitting algorithm to compute a zero of the sum of two monotone operators, with one of the two operators being cocoercive. The algorithm is inspired by the accelerated gradient method of Nesterov, but can be applied to a much larger class of problems including convex-concave saddle point problems and general monotone inclusions. We prove convergence of the algorithm in a Hilbert space setting and show that several recently proposed first-order methods can be obtained as special cases of the general algorithm. Numerical results show that the proposed algorithm converges faster than existing methods, while keeping the computational cost of each iteration basically unchanged.
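In the special case where the cocoercive operator is a gradient and the other operator is a subdifferential, inertial forward-backward splitting reduces to a proximal-gradient step preceded by an inertial extrapolation. A minimal sketch under those assumptions (parameter choices and names are illustrative, not the paper's):

```python
import numpy as np

def inertial_forward_backward(grad_f, prox_g, x0, step, alpha=0.2, n_iter=100):
    """Inertial forward-backward splitting sketch: extrapolate with a
    momentum (inertial) coefficient alpha, then take the usual
    forward (gradient) step followed by the backward (prox) step."""
    x_prev = x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        y = x + alpha * (x - x_prev)                    # inertial extrapolation
        x_prev, x = x, prox_g(y - step * grad_f(y), step)
    return x
```

Setting alpha = 0 recovers the plain forward-backward (proximal-gradient) method, which is one way the paper's claim that several first-order methods arise as special cases can be seen.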
Accelerated Stochastic Gradient Method for Composite Regularization
, 2014
Abstract

Cited by 1 (0 self)
Regularized risk minimization often involves nonsmooth optimization. This can be particularly challenging when the regularizer is a sum of simpler regularizers, as in the overlapping group lasso. Very recently, this has been alleviated by using the proximal average, in which an implicitly nonsmooth function is employed to approximate the composite regularizer. In this paper, we propose a novel extension using an accelerated gradient method for stochastic optimization. On both general convex and strongly convex problems, the resultant approximation errors reduce at a faster rate than methods based on stochastic smoothing and ADMM. This is also verified experimentally on a number of synthetic and real-world data sets.
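A generic sketch of the ingredients this abstract combines, for an L1-regularized least-squares toy problem (this is not the paper's algorithm; the sampling scheme, the periodic momentum restart, and all parameters are illustrative assumptions): minibatch stochastic gradients plus a Nesterov-style extrapolation around a proximal step.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def acc_stoch_prox_grad(A, b, lam, n_iter=300, step=None, batch=1, seed=0):
    """Accelerated stochastic proximal-gradient sketch for
    (1/(2n)) * ||Ax - b||^2 + lam * ||x||_1: minibatch gradient
    estimate, prox step, then Nesterov extrapolation."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    if step is None:
        step = n / np.linalg.norm(A, 2) ** 2
    x = y = np.zeros(d)
    t = 1.0
    for k in range(n_iter):
        if k % 50 == 0:
            t = 1.0                                  # periodic restart (practical safeguard)
        idx = rng.integers(0, n, size=batch)
        grad = A[idx].T @ (A[idx] @ y - b[idx]) / batch  # unbiased stochastic gradient
        x_new = soft_threshold(y - step * grad, step * lam)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + (t - 1) / t_new * (x_new - x)        # Nesterov extrapolation
        x, t = x_new, t_new
    return x
```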