Results 1  10
of
20
Online learning for matrix factorization and sparse coding
"... Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the largescale matrix factorization problem that consists of learning the basis set, adapting it t ..."
Abstract

Cited by 97 (18 self)
 Add to MetaCart
Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the largescale matrix factorization problem that consists of learning the basis set, adapting it to specific data. Variations of this problem include dictionary learning in signal processing, nonnegative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to stateoftheart performance in terms of speed and optimization for both small and large datasets.
A unified framework for highdimensional analysis of Mestimators with decomposable regularizers
"... ..."
The benefit of group sparsity
, 2009
"... This paper develops a theory for group Lasso using a concept called strong group sparsity. Our result shows that group Lasso is superior to standard Lasso for strongly groupsparse signals. This provides a convincing theoretical justification for using group sparse regularization when the underlying ..."
Abstract

Cited by 67 (6 self)
 Add to MetaCart
This paper develops a theory for group Lasso using a concept called strong group sparsity. Our result shows that group Lasso is superior to standard Lasso for strongly groupsparse signals. This provides a convincing theoretical justification for using group sparse regularization when the underlying group structure is consistent with the data. Moreover, the theory predicts some limitations of the group Lasso formulation that are confirmed by simulation studies. 1
Learning with Structured Sparsity
"... This paper investigates a new learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. By allowing arbitrary structures on the feature set, this concept generalizes the group sparsity idea. A gener ..."
Abstract

Cited by 58 (5 self)
 Add to MetaCart
This paper investigates a new learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. By allowing arbitrary structures on the feature set, this concept generalizes the group sparsity idea. A general theory is developed for learning with structured sparsity, based on the notion of coding complexity associated with the structure. Moreover, a structured greedy algorithm is proposed to efficiently solve the structured sparsity problem. Experiments demonstrate the advantage of structured sparsity over standard sparsity. 1.
HIGHDIMENSIONAL ISING MODEL SELECTION USING ℓ1REGULARIZED LOGISTIC REGRESSION
 SUBMITTED TO THE ANNALS OF STATISTICS
"... We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1constraint. The method is ..."
Abstract

Cited by 40 (13 self)
 Add to MetaCart
We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1constraint. The method is analyzed under highdimensional scaling, in which both the number of nodes p and maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d 3 log p), with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n = Ω(d 2 log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on the binary graphical models, we indicate how a generalization of the method of the paper would apply to general discrete Markov random fields.
Estimation of (near) lowrank matrices with noise and highdimensional scaling
"... We study an instance of highdimensional statistical inference in which the goal is to use N noisy observations to estimate a matrix Θ ∗ ∈ R k×p that is assumed to be either exactly low rank, or “near ” lowrank, meaning that it can be wellapproximated by a matrix with low rank. We consider an Me ..."
Abstract

Cited by 38 (11 self)
 Add to MetaCart
We study an instance of highdimensional statistical inference in which the goal is to use N noisy observations to estimate a matrix Θ ∗ ∈ R k×p that is assumed to be either exactly low rank, or “near ” lowrank, meaning that it can be wellapproximated by a matrix with low rank. We consider an Mestimator based on regularization by the traceornuclearnormovermatrices, andanalyze its performance under highdimensional scaling. We provide nonasymptotic bounds on the Frobenius norm error that hold for a generalclassofnoisyobservationmodels,and apply to both exactly lowrank and approximately lowrank matrices. We then illustrate their consequences for a number of specific learning models, including lowrank multivariate or multitask regression, system identification in vector autoregressive processes, and recovery of lowrank matrices from random projections. Simulations show excellent agreement with the highdimensional scaling of the error predicted by our theory. 1.
Simultaneous support recovery in high dimensions: Benefits and perils of block ℓ1/ℓ∞regularization
, 2010
"... ..."
Taking Advantage of Sparsity in MultiTask Learning
"... We study the problem of estimating multiple linear regression equations for the purpose of both prediction and variable selection. Following recent work on multitask learning [1], we assume that the sparsity patterns of the regression vectors are included in the same set of small cardinality. This ..."
Abstract

Cited by 14 (0 self)
 Add to MetaCart
We study the problem of estimating multiple linear regression equations for the purpose of both prediction and variable selection. Following recent work on multitask learning [1], we assume that the sparsity patterns of the regression vectors are included in the same set of small cardinality. This assumption leads us to consider the Group Lasso as a candidate estimation method. We show that this estimator enjoys nice sparsity oracle inequalities and variable selection properties. The results hold under a certain restricted eigenvalue condition and a coherence condition on the design matrix, which naturally extend recent work in [3, 19]. In particular, in the multitask learning scenario, in which the number of tasks can grow, we are able to remove completely the effect of the number of predictor variables in the bounds. Finally, we show how our results can be extended to more general noise distributions, of which we only require the variance to be finite. 1 1
The Bene t of Group Sparsity
, 901
"... This paper develops a theory for group Lasso using a concept called strong group sparsity. Our result shows that group Lasso is superior to standard Lasso for strongly groupsparse signals. This provides a convincing theoretical justi cation for using group sparse regularization when the underlying ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
This paper develops a theory for group Lasso using a concept called strong group sparsity. Our result shows that group Lasso is superior to standard Lasso for strongly groupsparse signals. This provides a convincing theoretical justi cation for using group sparse regularization when the underlying group structure is consistent with the data. Moreover, the theory predicts some limitations of the group Lasso formulation that are con rmed by simulation studies. 1