Results 1  10
of
143
Structured variable selection with sparsityinducing norms
, 2011
"... We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to ov ..."
Abstract

Cited by 95 (17 self)
 Add to MetaCart
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for leastsquares linear regression in low and highdimensional settings.
TreeGuided Group Lasso for MultiTask Regression with Structured Sparsity
"... We consider the problem of learning a sparse multitask regression, where the structure in the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granularity. Our goal is to recover the common set of relevant inputs for each outp ..."
Abstract

Cited by 61 (9 self)
 Add to MetaCart
We consider the problem of learning a sparse multitask regression, where the structure in the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granularity. Our goal is to recover the common set of relevant inputs for each output cluster. Assuming that the tree structure is available as prior knowledge, we formulate this problem as a new multitask regularized regression called treeguided group lasso. Our structured regularization is based on a grouplasso penalty, where groups are defined with respect to the tree structure. We describe a systematic weighting scheme for the groups in the penalty such that each output variable is penalized in a balanced manner even if the groups overlap. We present an efficient optimization method that can handle a largescale problem. Using simulated and yeast datasets, we demonstrate that our method shows a superior performance in terms of both prediction errors and recovery of true sparsity patterns compared to other methods for multitask learning. 1.
Learning with Structured Sparsity
"... This paper investigates a new learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. By allowing arbitrary structures on the feature set, this concept generalizes the group sparsity idea. A gener ..."
Abstract

Cited by 58 (6 self)
 Add to MetaCart
This paper investigates a new learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. By allowing arbitrary structures on the feature set, this concept generalizes the group sparsity idea. A general theory is developed for learning with structured sparsity, based on the notion of coding complexity associated with the structure. Moreover, a structured greedy algorithm is proposed to efficiently solve the structured sparsity problem. Experiments demonstrate the advantage of structured sparsity over standard sparsity. 1.
A simpler approach to matrix completion
 the Journal of Machine Learning Research
"... This paper provides the best bounds to date on the number of randomly sampled entries required to reconstruct an unknown low rank matrix. These results improve on prior work by Candès and Recht [4], Candès and Tao [7], and Keshavan, Montanari, and Oh [18]. The reconstruction is accomplished by minim ..."
Abstract

Cited by 55 (3 self)
 Add to MetaCart
This paper provides the best bounds to date on the number of randomly sampled entries required to reconstruct an unknown low rank matrix. These results improve on prior work by Candès and Recht [4], Candès and Tao [7], and Keshavan, Montanari, and Oh [18]. The reconstruction is accomplished by minimizing the nuclear norm, or sum of the singular values, of the hidden matrix subject to agreement with the provided entries. If the underlying matrix satisfies a certain incoherence condition, then the number of entries required is equal to a quadratic logarithmic factor times the number of parameters in the singular value decomposition. The proof of this assertion is short, self contained, and uses very elementary analysis. The novel techniques herein are based on recent work in quantum information theory.
An Accelerated Gradient Method for Trace Norm Minimization
"... We consider the minimization of a smooth loss function regularized by the trace norm of the matrix variable. Such formulation finds applications in many machine learning tasks including multitask learning, matrix classification, and matrix completion. The standard semidefinite programming formulati ..."
Abstract

Cited by 54 (5 self)
 Add to MetaCart
We consider the minimization of a smooth loss function regularized by the trace norm of the matrix variable. Such formulation finds applications in many machine learning tasks including multitask learning, matrix classification, and matrix completion. The standard semidefinite programming formulation for this problem is computationally expensive. In addition, due to the nonsmooth nature of the trace norm, the optimal firstorder blackbox method for solving such class of problems converges as O ( 1 √), where k is the k iteration counter. In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O ( 1 k). We further propose an accelerated gradient algorithm, which achieves the optimal convergence rate of O ( 1 k 2) for smooth problems. Experiments on multitask learning problems demonstrate the efficiency of the proposed algorithms. 1.
A new approach to collaborative filtering: Operator estimation with spectral regularization
 Journal of Machine Learning Research
"... We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators mapping a set of “users ” to a set of possibly desired “objects”. In particular, several recent lowrank type matrixcompletion methods for CF are shown to be special cases of our p ..."
Abstract

Cited by 53 (2 self)
 Add to MetaCart
We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators mapping a set of “users ” to a set of possibly desired “objects”. In particular, several recent lowrank type matrixcompletion methods for CF are shown to be special cases of our proposed framework. Unlike existing regularizationbased CF, our approach can be used to incorporate additional information such as attributes of the users/objects—a feature currently lacking in existing regularizationbased CF approaches—using popular and wellknown kernel methods. We provide novel representer theorems that we use to develop new estimation methods. We then provide learning algorithms based on lowrank decompositions and test them on a standard CF data set. The experiments indicate the advantages of generalizing the existing regularizationbased CF methods to incorporate related information about users and objects. Finally, we show that certain multitask learning methods can be also seen as special cases of our proposed approach.
A spectral regularization framework for multitask structure learning
 In NIPS
, 2008
"... Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, whi ..."
Abstract

Cited by 49 (8 self)
 Add to MetaCart
Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer. We analyze concrete examples of the framework, which are equivalent to regularization with Lp matrix norms. Experiments on two real data sets indicate that the algorithm scales well with the number of tasks and improves on state of the art statistical performance. 1
Blockwise Coordinate Descent Procedures for the Multitask Lasso, with Applications to Neural Semantic Basis Discovery
"... We develop a cyclical blockwise coordinate descent algorithm for the multitask Lasso that efficiently solves problems with thousands of features and tasks. The main result shows that a closedform Winsorization operator can be obtained for the supnorm penalized least squares regression. This allow ..."
Abstract

Cited by 38 (1 self)
 Add to MetaCart
We develop a cyclical blockwise coordinate descent algorithm for the multitask Lasso that efficiently solves problems with thousands of features and tasks. The main result shows that a closedform Winsorization operator can be obtained for the supnorm penalized least squares regression. This allows the algorithm to find solutions to very largescale problems far more efficiently than existing methods. This result complements the pioneering work of Friedman, et al. (2007) for the singletask Lasso. As a case study, we use the multitask Lasso as a variable selector to discover a semantic basis for predicting human neural activation. The learned solution outperforms the standard basis for this task on the majority of test participants, while requiring far fewer assumptions about cognitive neuroscience. We demonstrate how this learned basis can yield insights into how the brain represents the meanings of words. 1.
Spectral Regularization Algorithms for Learning Large Incomplete Matrices
, 2009
"... We use convex relaxation techniques to provide a sequence of regularized lowrank solutions for largescale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the n ..."
Abstract

Cited by 31 (0 self)
 Add to MetaCart
We use convex relaxation techniques to provide a sequence of regularized lowrank solutions for largescale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm SoftImpute iteratively replaces the missing elements with those obtained from a softthresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is in computing a lowrank SVD of a dense matrix. Exploiting the problem structure, we show that the task can be performed with a complexity linear in the matrix dimensions. Our semidefiniteprogramming algorithm is readily scalable to large matrices: for example it can obtain a rank80 approximation of a 10 6 × 10 6 incomplete matrix with 10 5 observed entries in 2.5 hours, and can fit a rank 40 approximation to the full Netflix training set in 6.6 hours. Our methods show very good performance both in training and test error when compared to other competitive stateofthe art techniques. 1.