Results 1  10
of
158
Convex multitask feature learning
 Machine Learning
, 2007
"... Summary. We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the wellknown singletask 1norm regularization. It is based on a novel nonconvex regularizer which controls the number of learned features common across the tasks. We p ..."
Abstract

Cited by 144 (16 self)
 Add to MetaCart
Summary. We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the wellknown singletask 1norm regularization. It is based on a novel nonconvex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns taskspecific functions and in the latter step it learns commonacrosstasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select – not learn – a few common variables across the tasks 3.
Bayesian Compressive Sensing
, 2007
"... The data of interest are assumed to be represented as Ndimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basisfunction coefficients associated with B. Compressive sensing ..."
Abstract

Cited by 141 (15 self)
 Add to MetaCart
The data of interest are assumed to be represented as Ndimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basisfunction coefficients associated with B. Compressive sensing is a framework whereby one does not measure one of the aforementioned Ndimensional signals directly, but rather a set of related measurements, with the new measurements a linear combination of the original underlying Ndimensional signal. The number of required compressivesensing measurements is typically much smaller than N, offering the potential to simplify the sensing system. Let f denote the unknown underlying Ndimensional signal, and g a vector of compressivesensing measurements, then one may approximate f accurately by utilizing knowledge of the (underdetermined) linear relationship between f and g, in addition to knowledge of the fact that f is compressible in B. In this paper we employ a Bayesian formalism for estimating the underlying signal f based on compressivesensing measurements g. The proposed framework has the following properties: (i) in addition to estimating the underlying signal f, “error bars ” are also estimated, these giving a measure of confidence in the inverted signal; (ii) using knowledge of the error bars, a principled means is provided for determining when a sufficient
Multitask feature learning
 Advances in Neural Information Processing Systems 19
, 2007
"... We present a method for learning a lowdimensional representation which is shared across a set of multiple related tasks. The method builds upon the wellknown 1norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that th ..."
Abstract

Cited by 135 (7 self)
 Add to MetaCart
We present a method for learning a lowdimensional representation which is shared across a set of multiple related tasks. The method builds upon the wellknown 1norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that this problem is equivalent to a convex optimization problem and develop an iterative algorithm for solving it. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the latter step we learn commonacrosstasks representations and in the former step we learn taskspecific functions using these representations. We report experiments on a simulated and a real data set which demonstrate that the proposed method dramatically improves the performance relative to learning each task independently. Our algorithm can also be used, as a special case, to simply select – not learn – a few common features across the tasks.
Multitask learning for classification with dirichlet process priors
 Journal of Machine Learning Research
, 2007
"... Multitask learning (MTL) is considered for logisticregression classifiers, based on a Dirichlet process (DP) formulation. A symmetric MTL (SMTL) formulation is considered in which classifiers for multiple tasks are learned jointly, with a variational Bayesian (VB) solution. We also consider an asy ..."
Abstract

Cited by 99 (9 self)
 Add to MetaCart
Multitask learning (MTL) is considered for logisticregression classifiers, based on a Dirichlet process (DP) formulation. A symmetric MTL (SMTL) formulation is considered in which classifiers for multiple tasks are learned jointly, with a variational Bayesian (VB) solution. We also consider an asymmetric MTL (AMTL) formulation in which the posterior density function from the SMTL model parameters, from previous tasks, is used as a prior for a new task; this approach has the significant advantage of not requiring storage and use of all previous data from prior tasks. The AMTL formulation is solved with a simple Markov Chain Monte Carlo (MCMC) construction. Comparisons are also made to simpler approaches, such as singletask learning, pooling of data across tasks, and simplified approximations to DP. A comprehensive analysis of algorithm performance is addressed through consideration of two data sets that are matched to the MTL problem.
Multitask Gaussian Process Prediction
"... In this paper we investigate multitask learning in the context of Gaussian Processes (GP). We propose a model that learns a shared covariance function on inputdependent features and a “freeform ” covariance matrix over tasks. This allows for good flexibility when modelling intertask dependencies ..."
Abstract

Cited by 81 (3 self)
 Add to MetaCart
In this paper we investigate multitask learning in the context of Gaussian Processes (GP). We propose a model that learns a shared covariance function on inputdependent features and a “freeform ” covariance matrix over tasks. This allows for good flexibility when modelling intertask dependencies while avoiding the need for large amounts of data for training. We show that under the assumption of noisefree observations and a block design, predictions for a given task only depend on its target values and therefore a cancellation of intertask transfer occurs. We evaluate the benefits of our model on two practical applications: a compiler performance prediction problem and an exam score prediction task. Additionally, we make use of GP approximations and properties of our model in order to provide scalability to large data sets. 1
A new approach to collaborative filtering: Operator estimation with spectral regularization
 Journal of Machine Learning Research
"... We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators mapping a set of “users ” to a set of possibly desired “objects”. In particular, several recent lowrank type matrixcompletion methods for CF are shown to be special cases of our p ..."
Abstract

Cited by 54 (2 self)
 Add to MetaCart
We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators mapping a set of “users ” to a set of possibly desired “objects”. In particular, several recent lowrank type matrixcompletion methods for CF are shown to be special cases of our proposed framework. Unlike existing regularizationbased CF, our approach can be used to incorporate additional information such as attributes of the users/objects—a feature currently lacking in existing regularizationbased CF approaches—using popular and wellknown kernel methods. We provide novel representer theorems that we use to develop new estimation methods. We then provide learning algorithms based on lowrank decompositions and test them on a standard CF data set. The experiments indicate the advantages of generalizing the existing regularizationbased CF methods to incorporate related information about users and objects. Finally, we show that certain multitask learning methods can be also seen as special cases of our proposed approach.
Clustered MultiTask Learning: a Convex Formulation
"... In multitask learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved b ..."
Abstract

Cited by 45 (4 self)
 Add to MetaCart
In multitask learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they are expected to be related to each other. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We design a new spectral norm that encodes this a priori assumption, without the prior knowledge of the partition of tasks into groups, resulting in a new convex optimization formulation for multitask learning. We show in simulations on synthetic examples and on the IEDB MHCI binding dataset, that our approach outperforms wellknown convex methods for multitask learning, as well as related nonconvex methods dedicated to the same problem. 1
Hierarchical Bayesian Domain Adaptation
"... Multitask learning is the problem of maximizing the performance of a system across a number of related tasks. When applied to multiple domains for the same task, it is similar to domain adaptation, but symmetric, rather than limited to improving performance on a target domain. We present a more pri ..."
Abstract

Cited by 36 (0 self)
 Add to MetaCart
Multitask learning is the problem of maximizing the performance of a system across a number of related tasks. When applied to multiple domains for the same task, it is similar to domain adaptation, but symmetric, rather than limited to improving performance on a target domain. We present a more principled, better performing model for this problem, based on the use of a hierarchical Bayesian prior. Each domain has its own domainspecific parameter for each feature but, rather than a constant prior over these parameters, the model instead links them via a hierarchical Bayesian global prior. This prior encourages the features to have similar weights across domains, unless there is good evidence to the contrary. We show that the method of (Daumé III, 2007), which was presented as a simple “preprocessing step, ” is actually equivalent, except our representation explicitly separates hyperparameters which were tied in his work. We demonstrate that allowing different values for these hyperparameters significantly improves performance over both a strong baseline and (Daumé III, 2007) within both a conditional random field sequence model for named entity recognition and a discriminatively trained dependency parser. 1
Learning a metalevel prior for feature relevance from multiple related tasks
 In Proceedings of International Conference on Machine Learning (ICML). Einat
, 2007
"... In many prediction tasks, selecting relevant features is essential for achieving good generalization performance. Most feature selection algorithms consider all features to be a priori equally likely to be relevant. In this paper, we use transfer learning — learning on an ensemble of related tasks — ..."
Abstract

Cited by 34 (1 self)
 Add to MetaCart
In many prediction tasks, selecting relevant features is essential for achieving good generalization performance. Most feature selection algorithms consider all features to be a priori equally likely to be relevant. In this paper, we use transfer learning — learning on an ensemble of related tasks — to construct an informative prior on feature relevance. We assume that features themselves have metafeatures that are predictive of their relevance to the prediction task, and model their relevance as a function of the metafeatures using hyperparameters (called metapriors). We present a convex optimization algorithm for simultaneously learning the metapriors and feature weights from an ensemble of related prediction tasks that share a similar relevance structure. Our approach transfers the metapriors among different tasks, allowing it to deal with settings where tasks have nonoverlapping features or where feature relevance varies over the tasks. We show that transfer learning of feature relevance improves performance on two real data sets which illustrate such settings: (1) predicting ratings in a collaborative filtering task, and (2) distinguishing arguments of a verb in a sentence. 1.
Domain Adaptation from Multiple Sources via Auxiliary Classifiers
"... We propose a multiple source domain adaptation method, referred to as Domain Adaptation Machine (DAM), to learn a robust decision function (referred to as target classifier) for label prediction of patterns from the target domain by leveraging a set of precomputed classifiers (referred to as auxili ..."
Abstract

Cited by 33 (10 self)
 Add to MetaCart
We propose a multiple source domain adaptation method, referred to as Domain Adaptation Machine (DAM), to learn a robust decision function (referred to as target classifier) for label prediction of patterns from the target domain by leveraging a set of precomputed classifiers (referred to as auxiliary/source classifiers) independently learned with the labeled patterns from multiple source domains. We introduce a new datadependent regularizer based on smoothness assumption into LeastSquares SVM (LSSVM), which enforces that the target classifier shares similar decision values with the auxiliary classifiers from relevant source domains on the unlabeled patterns of the target domain. In addition, we employ a sparsity regularizer to learn a sparse target classifier. Comprehensive experiments on the challenging TRECVID 2005 corpus demonstrate that DAM outperforms the existing multiple source domain adaptation methods for video concept detection in terms of effectiveness and efficiency. 1.