Results 1–10 of 18
Proximal Methods for Hierarchical Sparse Coding
, 2010
"... Sparse coding consists in representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework where the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced treestructured sparse regularizatio ..."
Abstract

Cited by 87 (21 self)
Sparse coding consists of representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework where the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced tree-structured sparse regularization norm, which has proven useful in several applications. This norm leads to regularized problems that are difficult to optimize, and we propose in this paper efficient algorithms for solving them. More precisely, we show that the proximal operator associated with this norm is computable exactly via a dual approach that can be viewed as the composition of elementary proximal operators. Our procedure has a complexity linear, or close to linear, in the number of atoms, and allows the use of accelerated gradient techniques to solve the tree-structured sparse approximation problem at the same computational cost as traditional ones using the ℓ1-norm. Our method is efficient and scales gracefully to millions of variables, which we illustrate in two types of applications: first, we consider fixed hierarchical dictionaries of wavelets to denoise natural images. Then, we apply our optimization tools in the context of dictionary learning, where learned dictionary elements naturally organize themselves in a prespecified arborescent structure, leading to better performance in the reconstruction of natural image patches. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models.
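The composition-of-proximal-operators result described in this abstract lends itself to a very short implementation. Below is a minimal sketch for an ℓ2 variant of the tree-structured norm, assuming groups are supplied in leaf-to-root order (every child group before its parent); the function names and the `groups` representation are illustrative, not the authors' code:

```python
import numpy as np

def prox_l2_group(v, lam):
    # Block soft-thresholding: prox of lam * ||.||_2 applied to one group.
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

def prox_tree_norm(u, groups, lam):
    # Prox of the tree-structured norm sum_g w_g * ||u[g]||_2.
    # `groups` is a list of (index_array, weight) pairs ordered so that
    # every child group appears before its parent; under that ordering,
    # composing the elementary group proxes yields the exact prox.
    v = np.asarray(u, dtype=float).copy()
    for idx, w in groups:
        v[idx] = prox_l2_group(v[idx], lam * w)
    return v
```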
Network Flow Algorithms for Structured Sparsity
"... We consider a class of learning problems that involve a structured sparsityinducing norm defined as the su mof ℓ∞norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structur ..."
Abstract

Cited by 57 (16 self)
We consider a class of learning problems that involve a structured sparsity-inducing norm defined as the sum of ℓ∞-norms over groups of variables. Whereas a lot of effort has been put into developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems.
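For intuition about the elementary building block behind this result: for a single group, the prox of λ‖·‖∞ follows from the Moreau decomposition as the residual of a projection onto the ℓ1-ball of radius λ. The sketch below shows just that single-group case; the paper's actual contribution, handling many overlapping groups at once via min-cost flow, is not reproduced here:

```python
import numpy as np

def project_l1_ball(v, radius):
    # Euclidean projection of v onto {x : ||x||_1 <= radius}
    # (standard sort-based algorithm, O(n log n)).
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, lam):
    # Moreau decomposition: prox of lam*||.||_inf is the residual of
    # projecting onto the l1-ball of radius lam.
    return v - project_l1_ball(v, lam)
```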
Structured Sparsity through Convex Optimization
"... Abstract. Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the ℓ1norm. In this paper, we cons ..."
Abstract

Cited by 48 (7 self)
Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through regularization by the ℓ1-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the ℓ1-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of nonlinear variable selection. Key words and phrases: Sparsity, convex optimization.
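The structured norms this abstract describes are, at their simplest, weighted sums of ℓ2-norms over index groups. A minimal sketch of evaluating such a norm (the function name and group encoding are illustrative):

```python
import numpy as np

def group_norm(w, groups, weights=None):
    # Structured norm Omega(w) = sum_g eta_g * ||w[g]||_2, where the
    # index sets in `groups` may be disjoint or overlapping.
    if weights is None:
        weights = [1.0] * len(groups)
    return sum(eta * np.linalg.norm(w[np.asarray(g)])
               for g, eta in zip(groups, weights))

# The choice of groups shapes the achievable zero patterns: when this norm
# is used as a regularizer, the set of zeroed variables tends to be a union
# of groups.
w = np.array([0.0, 0.0, 1.5, -2.0])
print(group_norm(w, groups=[[0, 1], [1, 2], [2, 3]]))
```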
CHiLasso: A collaborative hierarchical sparse modeling framework
, 2010
"... Sparse modeling is a powerful framework for data analysis and processing. Traditionally, encoding in this framework is performed by solving an ℓ1regularized linear regression problem, commonly referred to as Lasso or basis pursuit. In this work we combine the sparsityinducing property of the Lasso ..."
Abstract

Cited by 36 (6 self)
Sparse modeling is a powerful framework for data analysis and processing. Traditionally, encoding in this framework is performed by solving an ℓ1-regularized linear regression problem, commonly referred to as Lasso or basis pursuit. In this work we combine the sparsity-inducing property of the Lasso model at the individual feature level with the block-sparsity property of the Group Lasso model, where sparse groups of features are jointly encoded, obtaining a hierarchically structured sparsity pattern. This results in the Hierarchical Lasso (HiLasso), which shows important practical modeling advantages. We then extend this approach to the collaborative case, where a set of simultaneously coded signals share the same sparsity pattern at the higher (group) level, but not necessarily at the lower (inside the group) level, obtaining the collaborative HiLasso model (CHiLasso). Such signals then share the same active groups, or classes, but not necessarily the same active set. This model is very well suited for applications such as source identification and separation. An efficient optimization procedure, which guarantees convergence to the global optimum, is developed for these new models. The presentation of the new framework and optimization approach is complemented with experimental examples and theoretical results regarding recovery guarantees for the proposed models.
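A minimal sketch of the kind of two-level regularizer HiLasso combines, assuming disjoint groups and a single weight per level (both assumptions mine; the paper's models and optimization procedure are more general). For disjoint groups, the prox of λ1‖x‖1 + λ2 Σ_g ‖x_g‖2 is known to factor into elementwise soft-thresholding followed by group soft-thresholding:

```python
import numpy as np

def prox_hilasso(v, groups, lam1, lam2):
    # Prox of lam1*||x||_1 + lam2*sum_g ||x[g]||_2 for DISJOINT groups:
    # elementwise soft-thresholding, then group soft-thresholding.
    x = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)
    for g in groups:
        g = np.asarray(g)
        n = np.linalg.norm(x[g])
        x[g] = 0.0 if n <= lam2 else x[g] * (1.0 - lam2 / n)
    return x
```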
Convex and network flow optimization for structured sparsity
 JMLR
, 2011
"... We consider a class of learning problems regularized by a structured sparsityinducing norm defined as the sum of ℓ2 or ℓ∞norms over groups of variables. Whereas much effort has been put in developing fast optimization techniques when the groups are disjoint or embedded in a hierarchy, we address ..."
Abstract

Cited by 35 (9 self)
We consider a class of learning problems regularized by a structured sparsity-inducing norm defined as the sum of ℓ2- or ℓ∞-norms over groups of variables. Whereas much effort has been put into developing fast optimization techniques when the groups are disjoint or embedded in a hierarchy, we address here the case of general overlapping groups. To this end, we present two different strategies: On the one hand, we show that the proximal operator associated with a sum of ℓ∞-norms can be computed exactly in polynomial time by solving a quadratic min-cost flow problem, allowing the use of accelerated proximal gradient methods. On the other hand, we use proximal splitting techniques, and address an equivalent formulation with non-overlapping groups, but in higher dimension and with additional constraints. We propose efficient and scalable algorithms exploiting these two strategies, which are significantly faster than alternative approaches. We illustrate these methods with several problems such as CUR matrix factorization, multi-task learning of tree-structured dictionaries, background subtraction in video sequences, image denoising with wavelets, and topographic dictionary learning of natural image patches.
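The second strategy, duplicating variables so that the lifted groups no longer overlap, can be illustrated with a simple ADMM loop for the ℓ2 case. This is a hedged sketch of the general splitting idea, not the paper's algorithm; `rho` and `n_iter` are arbitrary illustrative choices:

```python
import numpy as np

def prox_overlapping_groups(u, groups, lam, rho=1.0, n_iter=200):
    # ADMM sketch for min_w 0.5*||w - u||^2 + lam*sum_g ||w[g]||_2
    # with overlapping groups, via duplicated per-group copies z_g = w[g].
    u = np.asarray(u, dtype=float)
    groups = [np.asarray(g) for g in groups]
    counts = np.zeros_like(u)
    for g in groups:
        counts[g] += 1.0          # how many groups cover each coordinate
    z = [u[g].copy() for g in groups]        # per-group copies
    a = [np.zeros(len(g)) for g in groups]   # scaled dual variables
    w = u.copy()
    for _ in range(n_iter):
        # w-update: separable quadratic, closed form per coordinate.
        acc = np.zeros_like(u)
        for g, zg, ag in zip(groups, z, a):
            acc[g] += zg - ag
        w = (u + rho * acc) / (1.0 + rho * counts)
        # z-update: group soft-thresholding; then dual ascent.
        for k, g in enumerate(groups):
            vk = w[g] + a[k]
            nv = np.linalg.norm(vk)
            z[k] = np.zeros_like(vk) if nv <= lam / rho \
                else (1.0 - lam / (rho * nv)) * vk
            a[k] += w[g] - z[k]
    return w
```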
A MULTIDIMENSIONAL SHRINKAGE-THRESHOLDING OPERATOR
"... The scalar shrinkagethresholding operator (SSTO) is a key ingredient of many modern statistical signal processing algorithms including: sparse inverse problem solutions, wavelet denoising, and JPEG2000 image compression. In these applications, it is customary to select the threshold of the operator ..."
Abstract

Cited by 32 (2 self)
The scalar shrinkage-thresholding operator (SSTO) is a key ingredient of many modern statistical signal processing algorithms, including sparse inverse problem solutions, wavelet denoising, and JPEG2000 image compression. In these applications, it is customary to select the threshold of the operator by solving a scalar sparsity-penalized quadratic optimization. In this work, we present a natural multidimensional extension of the scalar shrinkage-thresholding operator. Similarly to the scalar case, the threshold is determined by the minimization of a convex quadratic form plus a Euclidean penalty; however, here the optimization is performed over a domain of dimension N ≥ 1. The solution to this convex optimization problem is called the multidimensional shrinkage-thresholding operator (MSTO). The MSTO reduces to the standard SSTO in the special case of N = 1. In the general case of N > 1, the optimal MSTO threshold can be found by a simple convex line search. We present three illustrative applications of the MSTO in the context of nonlinear regression: ℓ2-penalized linear regression, Group LASSO linear regression, and Group LASSO logistic regression.
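A sketch of how such an operator can be evaluated, assuming the problem reads ½xᵀHx − vᵀx + λ‖x‖2 with H symmetric positive definite (my reading of the abstract; the function name and the bisection-based line search are illustrative). Stationarity gives (H + (λ/s)I)x = v with s = ‖x‖2, so s solves a one-dimensional equation in H's eigenbasis:

```python
import numpy as np

def msto(H, v, lam, n_bisect=100):
    # argmin_x 0.5*x'Hx - v'x + lam*||x||_2, H symmetric positive definite.
    # x = 0 is optimal iff ||v||_2 <= lam; otherwise s = ||x||_2 solves
    # sum_i vt_i^2 / (eig_i*s + lam)^2 = 1 in the eigenbasis of H.
    if np.linalg.norm(v) <= lam:
        return np.zeros_like(v, dtype=float)
    eig, Q = np.linalg.eigh(H)
    vt = Q.T @ v
    f = lambda s: np.sum(vt**2 / (eig * s + lam)**2) - 1.0
    lo, hi = 0.0, 1.0
    while f(hi) > 0.0:        # bracket the unique root (f is decreasing)
        hi *= 2.0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    s = 0.5 * (lo + hi)
    return Q @ (vt / (eig + lam / s))
```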
USPACOR: UNIVERSAL SPARSITY-CONTROLLING OUTLIER REJECTION
"... The recent upsurge of research toward compressive sampling and parsimonious signal representations hinges on signals being sparse, either naturally, or, after projecting them on a proper basis. The present paper introduces a neat link between sparsity and a fundamental aspect of statistical inferenc ..."
Abstract

Cited by 11 (6 self)
The recent upsurge of research toward compressive sampling and parsimonious signal representations hinges on signals being sparse, either naturally or after projecting them onto a proper basis. The present paper introduces a neat link between sparsity and a fundamental aspect of statistical inference, namely robustness against outliers, even when the signals involved are not sparse. It is argued that controlling the sparsity of model residuals leads to statistical learning algorithms that are computationally affordable and universally robust to outlier models. Analysis, comparisons, and corroborating simulations focus on robustifying linear regression, but a succinct overview of other areas is provided to highlight the universality of the novel framework. Index Terms—Robustness, outlier rejection, sparsity, Lasso
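The core idea, augmenting the regression with a sparse outlier vector whose sparsity is controlled by an ℓ1 penalty, fits in a few lines. A minimal sketch under that reading of the abstract (alternating minimization is my choice of solver, and the function name is illustrative):

```python
import numpy as np

def fit_with_outlier_rejection(X, y, lam, n_iter=50):
    # min_{beta, o} 0.5*||y - X beta - o||^2 + lam*||o||_1
    # Alternate least squares on beta with soft-thresholding on o;
    # a nonzero o_i flags sample i as an outlier.
    beta = np.zeros(X.shape[1])
    o = np.zeros_like(y, dtype=float)
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - o, rcond=None)
        r = y - X @ beta
        o = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)
    return beta, o
```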
Generalized sparse regularization with application to fMRI brain decoding
 In: Székely, G., Hahn, H.K. (Eds.)
, 2011
"... Abstract. Many current medical image analysis problems involve learning thousands or even millions of model parameters from extremely few samples. Employing sparse models provides an effective means for handling the curse of dimensionality, but other propitious properties beyond sparsity are typical ..."
Abstract

Cited by 4 (1 self)
Many current medical image analysis problems involve learning thousands or even millions of model parameters from extremely few samples. Employing sparse models provides an effective means for handling the curse of dimensionality, but other propitious properties beyond sparsity are typically not modeled. In this paper, we propose a simple approach, generalized sparse regularization (GSR), for incorporating domain-specific knowledge into a wide range of sparse linear models, such as the LASSO and group LASSO regression models. We demonstrate the power of GSR by building anatomically informed sparse classifiers that additionally model the intrinsic spatiotemporal characteristics of brain activity for fMRI classification. We validate on real data and show how prior-informed sparse classifiers outperform standard classifiers, such as SVM and a number of sparse linear classifiers, both in terms of prediction accuracy and result interpretability. Our results illustrate the added value of facilitating flexible integration of prior knowledge beyond sparsity in large-scale model learning problems.
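One common way to graft a quadratic structural penalty onto an existing sparse solver is data augmentation. Assuming a GSR-style objective of the form ‖y − Xw‖² + λ2‖Γw‖² + λ1‖w‖1, with Γ encoding, e.g., spatial smoothness (this specific reduction is my illustration, not necessarily the paper's implementation), the quadratic part can be absorbed into the design matrix so any off-the-shelf lasso solver applies:

```python
import numpy as np

def gsr_augment(X, y, Gamma, lam2):
    # ||y - Xw||^2 + lam2*||Gamma w||^2  ==  ||y_aug - X_aug w||^2, so a
    # plain lasso on (X_aug, y_aug) solves the structurally penalized model.
    X_aug = np.vstack([X, np.sqrt(lam2) * Gamma])
    y_aug = np.concatenate([y, np.zeros(Gamma.shape[0])])
    return X_aug, y_aug
```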
Sparse overlapping sets lasso for multitask learning and fMRI data analysis
 Neural Information Processing Systems
, 2013
"... Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available feat ..."
Abstract

Cited by 4 (2 self)
Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso.
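To make the contrast with the plain lasso and group lasso concrete, here is an illustrative penalty mixing group-level selection with within-group sparsity, in the spirit of SOSlasso; the exact SOSlasso penalty (which handles overlapping sets via a latent decomposition of w) differs, so treat this strictly as a sketch:

```python
import numpy as np

def sos_style_penalty(w, groups, alpha):
    # Illustrative mix of group-level selection and within-group sparsity:
    # sum_g (alpha*||w[g]||_2 + ||w[g]||_1). The actual SOSlasso penalty
    # handles overlapping sets through a latent decomposition of w.
    return sum(alpha * np.linalg.norm(w[np.asarray(g)])
               + np.abs(w[np.asarray(g)]).sum()
               for g in groups)
```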