Convex multi-task feature learning
Machine Learning, 2007
Cited by 203 (19 self)
We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the well-known single-task 1-norm regularization. It is based on a novel non-convex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns task-specific functions and in the latter step it learns common-across-tasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select (rather than learn) a few common variables across the tasks.
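The unsupervised step mentioned in this abstract can be illustrated for the variable-selection special case. What follows is a minimal sketch, not the authors' implementation. It assumes a coefficient matrix `W` with one row per variable and one column per task, and it assumes the step minimizes the weighted penalty sum_i ||w^i||^2 / d_i over nonnegative weights `d` that sum to one. Under those assumptions the minimizer is proportional to the row norms (by the Cauchy-Schwarz inequality), and the minimum equals the squared sum of the row norms. All function names are hypothetical.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (rows = variables, columns = tasks)."""
    return [math.sqrt(sum(x * x for x in row)) for row in W]


def unsupervised_step(W):
    """Weights d on the simplex that minimize sum_i ||w^i||^2 / d_i.

    Assumed closed form: d_i is proportional to the norm of row i.
    Falls back to uniform weights when W is entirely zero.
    """
    norms = row_norms(W)
    total = sum(norms)
    if total == 0.0:
        return [1.0 / len(W)] * len(W)
    return [n / total for n in norms]


def weighted_penalty(W, d):
    """Value of sum_i ||w^i||^2 / d_i, skipping rows that are exactly zero."""
    value = 0.0
    for n, d_i in zip(row_norms(W), d):
        if n == 0.0:
            continue  # a zero row contributes nothing, whatever its weight
        if d_i == 0.0:
            return math.inf  # nonzero row with zero weight is infeasible
        value += n * n / d_i
    return value


# Three variables, two tasks; the second variable is unused by both tasks.
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
d_opt = unsupervised_step(W)
d_uniform = [1.0 / 3.0] * 3
best = weighted_penalty(W, d_opt)
worse = weighted_penalty(W, d_uniform)
```

In this toy example the unused variable receives weight zero, which is how a weighting of this kind can switch variables off jointly across tasks.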
A spectral regularization framework for multi-task structure learning
In NIPS, 2008
Cited by 60 (9 self)
Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer. We analyze concrete examples of the framework, which are equivalent to regularization with Lp matrix norms. Experiments on two real data sets indicate that the algorithm scales well with the number of tasks and improves on state-of-the-art statistical performance.
An Algorithm for Transfer Learning in a Heterogeneous Environment
ECML/PKDD, 2008
Cited by 26 (5 self)
We consider the problem of learning in an environment of classification tasks. Tasks sampled from the environment are used to improve classification performance on future tasks. We consider situations in which the tasks can be divided into groups. Tasks within each group are related by sharing a low-dimensional representation, which differs across the groups. We present an algorithm which divides the sampled tasks into groups and computes a common representation for each group. We report experiments on one synthetic and two image data sets, which show the advantage of the approach over single-task learning and a previous transfer learning method.
Linear Algorithms for Online Multitask Classification
Cited by 26 (3 self)
We design and analyze interacting online algorithms for multitask classification that perform better than independent learners whenever the tasks are related in a certain sense. We formalize task relatedness in different ways, and derive formal guarantees on the performance advantage provided by interaction. Our online analysis gives new stimulating insights into previously known co-regularization techniques, such as the multitask kernels and the margin correlation analysis for multi-view learning. In the last part we apply our approach to spectral co-regularization: we introduce a natural matrix extension of the quasi-additive algorithm for classification and prove bounds depending on certain unitarily invariant norms of the matrix of task coefficients.
Taking Advantage of Sparsity in Multi-Task Learning
Cited by 18 (0 self)
We study the problem of estimating multiple linear regression equations for the purpose of both prediction and variable selection. Following recent work on multi-task learning [1], we assume that the sparsity patterns of the regression vectors are included in the same set of small cardinality. This assumption leads us to consider the Group Lasso as a candidate estimation method. We show that this estimator enjoys nice sparsity oracle inequalities and variable selection properties. The results hold under a certain restricted eigenvalue condition and a coherence condition on the design matrix, which naturally extend recent work in [3, 19]. In particular, in the multi-task learning scenario, in which the number of tasks can grow, we are able to remove completely the effect of the number of predictor variables in the bounds. Finally, we show how our results can be extended to more general noise distributions, of which we only require the variance to be finite.
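The shared-sparsity assumption behind the Group Lasso can be made concrete with a small sketch. It is illustrative only and is not taken from the paper. It assumes the regression vectors are stacked as the columns of a matrix `W` (one row per predictor variable, one column per task) and that each row forms one group. The penalty is then the sum of row norms, and its proximal operator is row-wise (block) soft-thresholding, which zeroes a variable for all tasks at once. The function names and the threshold `tau` are hypothetical.

```python
import math


def mixed_norm_21(W):
    """Group Lasso penalty with rows as groups: the sum of Euclidean row norms."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in W)


def block_soft_threshold(W, tau):
    """Proximal operator of tau * mixed_norm_21.

    Each row is shrunk toward zero by tau in Euclidean norm; rows whose
    norm is at most tau are set exactly to zero for every task.
    """
    result = []
    for row in W:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        result.append([scale * x for x in row])
    return result


# Three predictor variables, two tasks.
W = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
shrunk = block_soft_threshold(W, tau=1.0)
```

Here the first row has norm 5 and is rescaled to norm 4, while the second row has norm 0.5, below the threshold, so it is removed from both tasks together.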
When is there a representer theorem? Vector vs matrix regularizers
J. of Machine Learning Res
Cited by 18 (2 self)
We consider a general class of regularization methods which learn a vector of parameters on the basis of linear measurements. It is well known that if the regularizer is a nondecreasing function of the inner product then the learned vector is a linear combination of the input data. This result, known as the representer theorem, is at the basis of kernel-based methods in machine learning. In this paper, we prove the necessity of the above condition, thereby completing the characterization of kernel methods based on regularization. We further extend our analysis to regularization methods which learn a matrix, a problem which is motivated by the application to multi-task learning. In this context, we study a more general representer theorem, which holds for a larger class of regularizers. We provide a necessary and sufficient condition for this class of matrix regularizers and highlight them with some concrete examples of practical importance. Our analysis uses basic principles from matrix theory, especially the useful notion ...
Regularization in Hilbert spaces is an important methodology for learning from examples and has a long history in a variety of fields. It has been studied, from different perspectives, in statistics ...
On spectral learning
Journal of Machine Learning Research, 2010
Cited by 11 (1 self)
In this paper, we study the problem of learning a matrix W from a set of linear measurements. Our formulation consists in solving an optimization problem which involves regularization with a spectral penalty term. That is, the penalty term is a function of the spectrum of the covariance of W. Instances of this problem in machine learning include multi-task learning, collaborative filtering and multi-view learning, among others. Our goal is to elucidate the form of the optimal solution of spectral learning. The theory of spectral learning relies on the von Neumann characterization of orthogonally invariant norms and their association with symmetric gauge functions. Using this tool we formulate a representer theorem for spectral regularization and specialize it to several useful examples, such as Schatten p-norms, the trace norm and the spectral norm, which should prove useful in applications.
The Rademacher complexity of linear transformation classes
Proc. 19th Annual Conference on Learning Theory (COLT), 2006
Cited by 7 (1 self)
Bounds are given for the empirical and expected Rademacher complexity of classes of linear transformations from a Hilbert space H to a finite-dimensional space. The results imply generalization guarantees for graph regularization and multi-task subspace learning.
Transfer bounds for linear feature learning
Machine Learning
Cited by 7 (2 self)
If regression tasks are sampled from a distribution, then the expected error for a future task can be estimated by the average empirical errors on the data of a finite sample of tasks, uniformly over a class of regularizing or preprocessing transformations. The bound is dimension-free, justifies optimization of the preprocessing feature map and explains the circumstances under which learning-to-learn is preferable to single-task learning.
Learning similarity with operator-valued large-margin classifiers
, 1049
Cited by 2 (0 self)
A method is introduced to learn and represent similarity with linear operators in kernel-induced Hilbert spaces. Transferring error bounds for vector-valued large-margin classifiers to the setting of Hilbert-Schmidt operators leads to dimension-free bounds on a risk functional for linear representations and motivates a regularized objective functional. Minimization of this objective is effected by a simple technique of stochastic gradient descent. The resulting representations are tested on transfer problems in image processing, involving plane and spatial geometric invariants, handwritten characters and face recognition.