Results 1  10
of
163
Group Lasso with Overlap and Graph Lasso
"... We propose a new penalty function which, when used as regularization for empirical risk minimization procedures, leads to sparse estimators. The support of the sparse vector is typically a union of potentially overlapping groups of covariates defined a priori, or a set of covariates which tend to be ..."
Abstract

Cited by 113 (13 self)
 Add to MetaCart
We propose a new penalty function which, when used as regularization for empirical risk minimization procedures, leads to sparse estimators. The support of the sparse vector is typically a union of potentially overlapping groups of covariates defined a priori, or a set of covariates which tend to be connected to each other when a graph of covariates is given. We study theoretical properties of the estimator, and illustrate its behavior on simulated and breast cancer gene expression data. 1.
Online learning for matrix factorization and sparse coding
"... Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the largescale matrix factorization problem that consists of learning the basis set, adapting it t ..."
Abstract

Cited by 110 (20 self)
 Add to MetaCart
Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the largescale matrix factorization problem that consists of learning the basis set, adapting it to specific data. Variations of this problem include dictionary learning in signal processing, nonnegative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to stateoftheart performance in terms of speed and optimization for both small and large datasets.
Structured variable selection with sparsityinducing norms
, 2011
"... We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to ov ..."
Abstract

Cited by 96 (17 self)
 Add to MetaCart
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for leastsquares linear regression in low and highdimensional settings.
Exploring large feature spaces with hierarchical MKL
, 2008
"... For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or H ..."
Abstract

Cited by 78 (18 self)
 Add to MetaCart
For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsityinducing norms such as the ℓ 1norm or the block ℓ 1norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsityinducing norms leads to stateoftheart predictive performance. 1
A unified framework for highdimensional analysis of Mestimators with decomposable regularizers
"... ..."
The benefit of group sparsity
, 2009
"... This paper develops a theory for group Lasso using a concept called strong group sparsity. Our result shows that group Lasso is superior to standard Lasso for strongly groupsparse signals. This provides a convincing theoretical justification for using group sparse regularization when the underlying ..."
Abstract

Cited by 64 (6 self)
 Add to MetaCart
This paper develops a theory for group Lasso using a concept called strong group sparsity. Our result shows that group Lasso is superior to standard Lasso for strongly groupsparse signals. This provides a convincing theoretical justification for using group sparse regularization when the underlying group structure is consistent with the data. Moreover, the theory predicts some limitations of the group Lasso formulation that are confirmed by simulation studies. 1
TreeGuided Group Lasso for MultiTask Regression with Structured Sparsity
"... We consider the problem of learning a sparse multitask regression, where the structure in the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granularity. Our goal is to recover the common set of relevant inputs for each outp ..."
Abstract

Cited by 61 (9 self)
 Add to MetaCart
We consider the problem of learning a sparse multitask regression, where the structure in the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granularity. Our goal is to recover the common set of relevant inputs for each output cluster. Assuming that the tree structure is available as prior knowledge, we formulate this problem as a new multitask regularized regression called treeguided group lasso. Our structured regularization is based on a grouplasso penalty, where groups are defined with respect to the tree structure. We describe a systematic weighting scheme for the groups in the penalty such that each output variable is penalized in a balanced manner even if the groups overlap. We present an efficient optimization method that can handle a largescale problem. Using simulated and yeast datasets, we demonstrate that our method shows a superior performance in terms of both prediction errors and recovery of true sparsity patterns compared to other methods for multitask learning. 1.
Learning with Structured Sparsity
"... This paper investigates a new learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. By allowing arbitrary structures on the feature set, this concept generalizes the group sparsity idea. A gener ..."
Abstract

Cited by 59 (6 self)
 Add to MetaCart
This paper investigates a new learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. By allowing arbitrary structures on the feature set, this concept generalizes the group sparsity idea. A general theory is developed for learning with structured sparsity, based on the notion of coding complexity associated with the structure. Moreover, a structured greedy algorithm is proposed to efficiently solve the structured sparsity problem. Experiments demonstrate the advantage of structured sparsity over standard sparsity. 1.
Blocksparse signals: Uncertainty relations and efficient recovery
 IEEE TRANS. SIGNAL PROCESS
, 2010
"... We consider efficient methods for the recovery of blocksparse signals — i.e., sparse signals that have nonzero entries occurring in clusters—from an underdetermined system of linear equations. An uncertainty relation for blocksparse signals is derived, based on a blockcoherence measure, which we ..."
Abstract

Cited by 50 (11 self)
 Add to MetaCart
We consider efficient methods for the recovery of blocksparse signals — i.e., sparse signals that have nonzero entries occurring in clusters—from an underdetermined system of linear equations. An uncertainty relation for blocksparse signals is derived, based on a blockcoherence measure, which we introduce. We then show that a blockversion of the orthogonal matching pursuit algorithm recovers block ksparse signals in no more than k steps if the blockcoherence is sufficiently small. The same condition on blockcoherence is shown to guarantee successful recovery through a mixed `2=`1optimization approach. This complements previous recovery results for the blocksparse case which relied on small blockrestricted isometry constants. The significance of the results presented in this paper lies in the fact that making explicit use of blocksparsity can provably yield better reconstruction properties than treating the signal as being sparse in the conventional sense, thereby ignoring the additional structure in the problem.