Convergence rates of inexact proximal-gradient methods for convex optimization. arXiv:1109.2415v2, 2011
We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the non-smooth term. We show that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates. Using these rates, we perform as well as or better than a carefully chosen fixed error level on a set of structured sparsity problems.
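As a rough illustration of the setting this abstract describes, the sketch below runs a basic proximal-gradient iteration on a small ℓ1-regularized least-squares problem and perturbs the gradient with an error that shrinks over the iterations. The test problem, the function names, the `err_scale` parameter, and the 1/k² error schedule are all assumptions chosen for illustration; they are not taken from the paper, which states its own conditions on how fast the errors must decrease.

```python
import random

def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1, applied elementwise."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi >= 0 else -1.0) for vi in v]

def grad_smooth(A, b, x):
    """Gradient of the smooth term 0.5 * ||A x - b||^2, i.e. A^T (A x - b)."""
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]

def inexact_prox_grad(A, b, lam, step, iters, err_scale=0.0, seed=0):
    """Basic proximal-gradient method with a perturbed gradient.

    Hypothetical error model (an assumption, not the paper's): at
    iteration k each gradient coordinate is shifted by a random amount
    of magnitude at most err_scale / k**2, so the errors decrease.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for k in range(1, iters + 1):
        g = grad_smooth(A, b, x)
        g = [gi + (err_scale / k**2) * rng.uniform(-1.0, 1.0) for gi in g]
        # Gradient step on the smooth term, then prox step on lam * ||x||_1.
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)], step * lam)
    return x

# Tiny separable problem; step = 1/L with L = 4 (largest eigenvalue of A^T A).
A = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, 0.05]
x_exact = inexact_prox_grad(A, b, lam=0.1, step=0.25, iters=200)
x_noisy = inexact_prox_grad(A, b, lam=0.1, step=0.25, iters=200, err_scale=1.0)
```

On this toy problem both runs end up close to the same minimizer, which is the qualitative behavior the abstract claims for errors that decrease fast enough; the sketch does not reproduce the paper's rates or its structured sparsity experiments.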
Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization. Author manuscript, published in "NIPS'11 - 25th Annual Conference on Neural Information Processing Systems (2011)", 2011
Learning Hierarchical and Topographic Dictionaries with Structured Sparsity
Recent work in signal processing and statistics has focused on defining new regularization functions that not only induce sparsity of the solution but also take into account the structure of the problem [1–7]. In this paper, we present a class of convex penalties introduced in the machine learning community, which take the form of a sum of ℓ2- and ℓ∞-norms over groups of variables. They extend the classical group-sparsity regularization [8–10] in the sense that the groups may overlap, allowing more flexibility in the group design. We review efficient optimization methods for the corresponding inverse problems [11–13] and their application to the problem of learning dictionaries of natural image patches [14–18]. On the one hand, dictionary learning has proven effective for various signal processing tasks [17, 19]. On the other hand, structured sparsity provides a natural framework for modeling dependencies between dictionary elements. We thus consider a structured sparse regularization to learn dictionaries embedded in a particular structure, for instance a tree [11] or a two-dimensional grid [20]. In the latter case, the results we obtain are similar to the dictionaries produced by topographic independent component analysis [21].
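The penalty family described here, a sum of ℓ2- or ℓ∞-norms over possibly overlapping groups of variables, can be sketched in a few lines. The function name `structured_penalty`, the unweighted sum, and the example groups are assumptions made for illustration; the paper's own formulation may, for instance, attach weights to the groups.

```python
import math

def structured_penalty(w, groups, norm="l2"):
    """Sum over groups of the l2- or l-infinity norm of w restricted to each group.

    `groups` is a list of index lists; the same index may appear in
    several groups, which is what allows the groups to overlap.
    """
    total = 0.0
    for g in groups:
        block = [w[i] for i in g]
        if norm == "l2":
            total += math.sqrt(sum(v * v for v in block))
        elif norm == "linf":
            total += max(abs(v) for v in block)
        else:
            raise ValueError("norm must be 'l2' or 'linf'")
    return total

# Two overlapping groups that share variable 1.
w = [3.0, 4.0, 0.0]
groups = [[0, 1], [1, 2]]
p_l2 = structured_penalty(w, groups, norm="l2")      # 5.0 + 4.0
p_linf = structured_penalty(w, groups, norm="linf")  # 4.0 + 4.0
```

With disjoint groups and the ℓ2-norm, this reduces to the classical group-sparsity penalty that the abstract says the overlapping version extends.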
Topographic Analysis of Correlated Components. Asian Conference on Machine Learning
Independent component analysis (ICA) is a method to estimate components which are as statistically independent as possible. However, in many practical applications, the estimated components are not independent. Recent variants of ICA have made use of such residual dependencies to estimate an ordering (topography) of the components. As in ICA, the components in those variants are assumed to be uncorrelated, which might be a rather strict condition. In this paper, we address this shortcoming. We propose a generative model for the source where the components can have linear and higher-order correlations, which generalizes the models in use so far. Based on the model, we derive a method to estimate topographic representations. In numerical experiments on artificial data, the new method is shown to be more widely applicable than previously proposed extensions of ICA. We learn topographic representations for two kinds of real data sets: outputs of simulated complex cells in the primary visual cortex, and text data.

