Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization
- NIPS'11, 25th Annual Conference on Neural Information Processing Systems
, 2011
"... We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the non-smooth term. We show that b ..."
Abstract - Cited by 49 (6 self)
We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the non-smooth term. We show that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates. Using these rates, we perform as well as or better than a carefully chosen fixed error level on a set of structured sparsity problems.
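
Read literally, the iteration being analyzed is the usual proximal-gradient step with a perturbed gradient (or a perturbed proximity operator). The snippet below is a minimal sketch of that step under simplifying assumptions: an l1 regularizer, so the proximity operator is plain soft-thresholding, and a caller-supplied perturbation grad_error standing in for the gradient error; the constants and the error schedule are illustrative, not the paper's conditions.

import numpy as np

def soft_threshold(x, t):
    # Proximity operator of t * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def inexact_prox_gradient(grad_f, L, lam, x0, n_iters=100, grad_error=None):
    # x_{k+1} = prox_{(lam/L) ||.||_1}( x_k - (grad_f(x_k) + e_k) / L ),
    # where e_k = grad_error(k) is the gradient error; the paper's result is that
    # the error-free rate is preserved when ||e_k|| decays fast enough.
    x = np.asarray(x0, dtype=float).copy()
    for k in range(n_iters):
        e = grad_error(k) if grad_error is not None else 0.0
        x = soft_threshold(x - (grad_f(x) + e) / L, lam / L)
    return x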
Structured Sparsity Models for Brain Decoding from fMRI data
"... Abstract—Structured sparsity methods have been recently proposed that allow to incorporate additional spatial and temporal information for estimating models for decoding mental states from fMRI data. These methods carry the promise of being more interpretable than simpler Lasso or Elastic Net method ..."
Abstract - Cited by 8 (0 self)
Structured sparsity methods have recently been proposed that make it possible to incorporate additional spatial and temporal information when estimating models for decoding mental states from fMRI data. These methods carry the promise of being more interpretable than simpler Lasso or Elastic Net methods. However, although sparsity has often been advocated as leading to more interpretable models, we show that sparsity by itself, and even structured sparsity, can lead to unstable models. We present an extension of the Total Variation method and assess several other structured sparsity models on accuracy, sparsity and stability. Our results indicate that structured sparsity via the Sparse Total Variation can mitigate some of the instability inherent in simpler sparse methods, but more research is required to build methods that can reliably infer relevant activation patterns from fMRI data. Keywords: brain decoding; structured sparsity; stability; fMRI
A general framework for structured sparsity via proximal optimization
- In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS)
"... We study a generalized framework for structured sparsity. It extends the well known methods of Lasso and Group Lasso by incorporating additional constraints on the variables as part of a convex optimization problem. This framework provides a straightforward way of favouring prescribed sparsity patte ..."
Abstract - Cited by 5 (3 self)
We study a generalized framework for structured sparsity. It extends the well-known methods of Lasso and Group Lasso by incorporating additional constraints on the variables as part of a convex optimization problem. This framework provides a straightforward way of favouring prescribed sparsity patterns, such as orderings, contiguous regions and overlapping groups, among others. Available optimization methods are limited to specific constraint sets and tend not to scale well with sample size and dimensionality. We propose a first-order proximal method, which builds upon results on fixed points and successive approximations. The algorithm can be applied to a general class of conic and norm constraint sets and relies on a proximity operator subproblem which can be computed numerically. Experiments on different regression problems demonstrate state-of-the-art statistical performance, which improves over Lasso, Group Lasso and StructOMP. They also demonstrate the efficiency of the optimization algorithm and its scalability with the size of the problem.
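
As a concrete instance of the "proximity operator subproblem", the sketch below shows one structured case where it is available in closed form, the non-overlapping Group Lasso penalty (block soft-thresholding). The function name and group encoding are assumptions for illustration; the conic and norm constraint sets covered by the paper generally require the prox to be computed numerically, as the abstract notes.

import numpy as np

def prox_group_lasso(x, groups, t):
    # Proximity operator of t * sum_g ||x_g||_2 for non-overlapping index groups
    # (block soft-thresholding). Each g in `groups` is a list/array of indices.
    out = np.asarray(x, dtype=float).copy()
    for g in groups:
        norm = np.linalg.norm(out[g])
        out[g] = 0.0 if norm <= t else (1.0 - t / norm) * out[g]
    return out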
Better approximation and faster algorithm using the proximal average
- In Advances in Neural Information Processing Systems
"... It is a common practice to approximate “complicated ” functions with more friendly ones. In large-scale machine learning applications, nonsmooth losses/regularizers that entail great computational challenges are usually approxi-mated by smooth functions. We re-examine this powerful methodology and p ..."
Abstract - Cited by 5 (0 self)
It is a common practice to approximate “complicated” functions with more friendly ones. In large-scale machine learning applications, nonsmooth losses/regularizers that entail great computational challenges are usually approximated by smooth functions. We re-examine this powerful methodology and point out a nonsmooth approximation which simply pretends the linearity of the proximal map. The new approximation is justified using a recent convex analysis tool, the proximal average, and yields a novel proximal gradient algorithm that is strictly better than the one based on smoothing, without incurring any extra overhead. Numerical experiments conducted on two important applications, overlapping group lasso and graph-guided fused lasso, corroborate the theoretical claims.
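
The approximation can be stated in one line: rather than smoothing f = (1/K) * sum_k f_k, replace its (intractable) proximal map by the average of the individual proximal maps, which equals the proximal map of the proximal average of the f_k. The helper below is a sketch for equal weights; the callable signature prox_k(x, eta) is an assumption for illustration.

import numpy as np

def prox_of_proximal_average(x, proxes, eta):
    # Average of the individual proximity operators: for f = (1/K) * sum_k f_k,
    # this equals prox_{eta * A}(x), where A is the proximal average of the f_k.
    # `proxes` is a list of callables prox_k(x, eta).
    return np.mean([p(x, eta) for p in proxes], axis=0)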
An inertial forward-backward algorithm for monotone inclusions
- J. Math. Imaging Vis
, 2014
"... In this paper, we propose a new accelerated forward backward splitting algorithm to compute a zero of the sum of two monotone operators, with one of the two operators being co-coercive. The algorithm is inspired by the accelerated gradient method of Nesterov, but can be applied to a much larger clas ..."
Abstract - Cited by 4 (1 self)
In this paper, we propose a new accelerated forward-backward splitting algorithm to compute a zero of the sum of two monotone operators, with one of the two operators being co-coercive. The algorithm is inspired by the accelerated gradient method of Nesterov, but can be applied to a much larger class of problems including convex-concave saddle point problems and general monotone inclusions. We prove convergence of the algorithm in a Hilbert space setting and show that several recently proposed first-order methods can be obtained as special cases of the general algorithm. Numerical results show that the proposed algorithm converges faster than existing methods, while keeping the computational cost of each iteration basically unchanged.
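
In the convex special case (A = grad f with f smooth, B the subdifferential of a prox-friendly g), the iteration reduces to a forward-backward step applied at an inertially extrapolated point, roughly as sketched below. The constants alpha and step and the callable signatures are illustrative assumptions; the paper derives the admissible parameter ranges from the co-coercivity constant and proves convergence in Hilbert spaces.

import numpy as np

def inertial_forward_backward(grad_f, prox_g, x0, step, alpha=0.3, n_iters=200):
    # y_k     = x_k + alpha * (x_k - x_{k-1})            (inertial extrapolation)
    # x_{k+1} = prox_g(y_k - step * grad_f(y_k), step)   (forward-backward step)
    # prox_g(v, t) is assumed to return prox_{t*g}(v).
    x_prev = np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()
    for _ in range(n_iters):
        y = x + alpha * (x - x_prev)
        x_prev, x = x, prox_g(y - step * grad_f(y), step)
    return x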
Accelerated Stochastic Gradient Method for Composite Regularization
, 2014
"... Regularized risk minimization often involves nonsmooth optimization. This can be par-ticularly challenging when the regularizer is a sum of simpler regularizers, as in the over-lapping group lasso. Very recently, this is alleviated by using the proximal average, in which an implicitly nonsmooth func ..."
Abstract - Cited by 1 (0 self)
Regularized risk minimization often involves nonsmooth optimization. This can be particularly challenging when the regularizer is a sum of simpler regularizers, as in the overlapping group lasso. Very recently, this has been alleviated by using the proximal average, in which an implicitly nonsmooth function is employed to approximate the composite regularizer. In this paper, we propose a novel extension with an accelerated gradient method for stochastic optimization. On both general convex and strongly convex problems, the resultant approximation errors decrease at a faster rate than with methods based on stochastic smoothing and ADMM. This is also verified experimentally on a number of synthetic and real-world data sets.
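
Putting the pieces together, a rough sketch of such a step is given below: a Nesterov-style extrapolation, a stochastic gradient of the smooth loss, and a proximal-average step (average of the individual proximity operators) for the composite regularizer. The momentum schedule, fixed step size, and function names are assumptions for illustration, not the schedule analyzed in the paper.

import numpy as np

def accelerated_stochastic_prox_average(stoch_grad, proxes, x0, step, n_iters=1000):
    # Nesterov-style extrapolation + noisy gradient step + proximal-average step.
    # `stoch_grad(y)` returns an unbiased gradient estimate of the smooth loss;
    # `proxes` is a list of callables prox_k(z, t) for the simple regularizers.
    x_prev = np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()
    for k in range(1, n_iters + 1):
        beta = (k - 1.0) / (k + 2.0)          # illustrative momentum weight
        y = x + beta * (x - x_prev)
        z = y - step * stoch_grad(y)
        x_prev, x = x, np.mean([p(z, step) for p in proxes], axis=0)
    return x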