Results 1–10 of 34
Regularized multi-task learning
2004
Cited by 267 (2 self)
Abstract: This paper provides a foundation for multi-task learning using reproducing kernel Hilbert spaces of vector-valued functions. In this setting, the kernel is a matrix-valued function. Some explicit examples will be described which go beyond our earlier results in [7]. In particular, we characterize classes of matrix-valued kernels which are linear and are of the dot product or the translation invariant type. We discuss how these kernels can be used to model relations between the tasks and present linear multi-task learning algorithms. Finally, we present a novel proof of the representer theorem for a minimizer of a regularization functional which is based on the notion of minimal norm interpolation.
Learning Multiple Tasks with Kernel Methods
Journal of Machine Learning Research, 2005
Cited by 248 (10 self)
Abstract: We study the problem of learning many related tasks simultaneously using kernel methods and regularization. The standard single-task kernel methods, such as support vector machines and regularization networks, are extended to the case of multi-task learning. Our analysis shows that the problem of estimating many task functions with regularization can be cast as a single-task learning problem if a family of multi-task kernel functions we define is used. These kernels model relations among the tasks and are derived from a novel form of regularizers. Specific kernels that can be used for multi-task learning are provided and experimentally tested on two real data sets. In agreement with past empirical work on multi-task learning, the experiments show that learning multiple related tasks simultaneously using the proposed approach can significantly outperform standard single-task learning, particularly when there are many related tasks but few data per task.
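The multi-task kernel idea described in the abstract above, casting several tasks as one single-task problem via a kernel over (input, task) pairs, can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the task-relatedness matrix B, the RBF base kernel, and all constants are assumed values chosen for the example.

```python
import numpy as np

# Hedged sketch: cast several tasks as one kernel ridge regression over
# (input, task) pairs, with K((x, s), (z, t)) = B[s, t] * k(x, z).
# B, the RBF base kernel, and all constants are illustrative assumptions.

def rbf(X, Z, gamma=1.0):
    # Base kernel k(x, z) = exp(-gamma * ||x - z||^2).
    d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def multitask_gram(X, tasks, B, gamma=1.0):
    # Gram matrix over augmented examples: K_ij = B[t_i, t_j] * k(x_i, x_j).
    return B[np.ix_(tasks, tasks)] * rbf(X, X, gamma)

rng = np.random.default_rng(0)
T, n = 2, 20                                 # two related tasks, 20 points each
X = rng.normal(size=(T * n, 2))
tasks = np.repeat(np.arange(T), n)
w = np.array([[1.0, -1.0], [0.9, -1.1]])     # similar per-task weight vectors
y = np.einsum('ij,ij->i', w[tasks], X)

B = 0.5 * np.eye(T) + 0.5                    # shared part + task-specific part
K = multitask_gram(X, tasks, B)
alpha = np.linalg.solve(K + 0.1 * np.eye(T * n), y)   # one single-task solve

fitted = K @ alpha                           # joint fit across both tasks
```

Because B is positive semidefinite, the elementwise product with the base Gram matrix is again a valid kernel, which is what allows the joint problem to be solved as ordinary kernel ridge regression.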
Task clustering and gating for Bayesian multitask learning
Journal of Machine Learning Research
Cited by 156 (2 self)
Learning Multiple Related Tasks Using Latent Independent Component Analysis
2005
Cited by 53 (5 self)
Abstract: We propose a probabilistic model based on Independent Component Analysis for learning multiple related tasks. In our model the task parameters are assumed to be generated from independent sources which account for the relatedness of the tasks. We use Laplace distributions to model hidden sources, which makes it possible to identify the hidden, independent components instead of just modeling correlations. Furthermore, our model enjoys a sparsity property which makes it both parsimonious and robust. We also propose efficient algorithms for both the empirical Bayes method and point estimation. Our experimental results on two multi-label text classification data sets show that the proposed approach is promising.
Learning a meta-level prior for feature relevance from multiple related tasks
In Proceedings of the International Conference on Machine Learning (ICML), 2007
Cited by 45 (2 self)
Abstract: In many prediction tasks, selecting relevant features is essential for achieving good generalization performance. Most feature selection algorithms consider all features to be a priori equally likely to be relevant. In this paper, we use transfer learning (learning on an ensemble of related tasks) to construct an informative prior on feature relevance. We assume that features themselves have meta-features that are predictive of their relevance to the prediction task, and model their relevance as a function of the meta-features using hyperparameters (called meta-priors). We present a convex optimization algorithm for simultaneously learning the meta-priors and feature weights from an ensemble of related prediction tasks that share a similar relevance structure. Our approach transfers the meta-priors among different tasks, allowing it to deal with settings where tasks have non-overlapping features or where feature relevance varies over the tasks. We show that transfer learning of feature relevance improves performance on two real data sets which illustrate such settings: (1) predicting ratings in a collaborative filtering task, and (2) distinguishing arguments of a verb in a sentence.
Flexible Latent Variable Models for Multi-Task Learning
Cited by 25 (1 self)
Abstract: Given multiple prediction problems such as regression and classification, we are interested in a joint inference framework which can effectively borrow information among tasks to improve the prediction accuracy, especially when the number of training examples per problem is small. In this paper we propose a probabilistic framework which can support a set of latent variable models for different multi-task learning scenarios. We show that the framework is a generalization of standard learning methods for single prediction problems and that it can effectively model the shared structure among different prediction tasks. Furthermore, we present efficient algorithms for the empirical Bayes method as well as point estimation. Our experiments on both simulated datasets and real-world classification datasets show the effectiveness of the proposed models in two evaluation settings: the standard multi-task learning setting and the transfer learning setting.
Key words: multi-task learning, latent variable models, hierarchical Bayesian models, model selection, transfer learning
An improved multitask learning approach with applications in medical diagnosis
In European Conference on Machine Learning, 2008
Cited by 18 (1 self)
Abstract: We propose a family of multi-task learning algorithms for collaborative computer-aided diagnosis which aims to diagnose multiple clinically related abnormal structures from medical images. Our formulations eliminate features irrelevant to all tasks and identify discriminative features for each of the tasks. A probabilistic model is derived to justify the proposed learning formulations. By an equivalence proof, some existing regularization-based methods can also be interpreted by our probabilistic model as imposing a Wishart hyperprior. Convergence analysis highlights the conditions under which the formulations achieve convexity and global convergence. Two real-world medical problems, lung cancer prognosis and heart wall motion analysis, are used to validate the proposed algorithms.
Multitask sparsity via maximum entropy discrimination
Journal of Machine Learning Research, 2009
Cited by 16 (2 self)
Abstract: A multi-task learning framework is developed for discriminative classification and regression. Large-margin linear classifiers are estimated for different prediction problems. These classifiers operate in a common input space but are coupled as they recover an unknown shared representation. A maximum entropy discrimination framework is used to derive the multi-task algorithm, which involves only convex optimization problems that are straightforward to implement. Three multi-task scenarios are described. The first multi-task method produces multiple support vector machines that learn a shared sparse feature selection over the input space. The second multi-task method produces multiple support vector machines that learn a shared conic kernel combination. The third multi-task method produces a pooled classifier as well as adaptively specialized individual classifiers. Furthermore, extensions to regression, graphical model structure estimation and other sparse methods are discussed. The maximum entropy optimization problems are implemented via a sequential quadratic programming method which leverages recent progress in fast SVM solvers. Fast monotonic convergence bounds are provided by bounding the MED sparsifying cost function with a quadratic and ensuring only a constant-factor runtime increase over standard independent SVM solvers. Results are shown on multi-task datasets and favor multi-task learning over single-task or tabula rasa methods.
Infinite Predictor Subspace Models for Multitask Learning
Cited by 14 (1 self)
Abstract: Given several related learning tasks, we propose a nonparametric Bayesian model that captures task relatedness by assuming that the task parameters (i.e., predictors) share a latent subspace. More specifically, the intrinsic dimensionality of the task subspace is not assumed to be known a priori. We use an infinite latent feature model to automatically infer this number (depending on, and limited by, only the number of tasks). Furthermore, our approach is applicable when the underlying task parameter subspace is inherently sparse, drawing parallels with ℓ1 regularization and LASSO-style models. We also propose an augmented model which can make use of labeled (and additionally unlabeled, if available) inputs to assist in learning this subspace, leading to further improvements in performance. Experimental results demonstrate the efficacy of both proposed approaches, especially when the number of examples per task is small. Finally, we discuss an extension of the proposed framework in which a nonparametric mixture of linear subspaces can be used to learn a nonlinear manifold over the task parameters and also to deal with the issue of negative transfer from unrelated tasks.
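The shared-subspace assumption in the abstract above, task weight vectors lying in a common low-dimensional subspace, can be illustrated with a minimal alternating-least-squares sketch. This is not the paper's nonparametric Bayesian model: the subspace rank k is fixed here (the paper infers it), and the data sizes and update scheme are assumptions for illustration.

```python
import numpy as np

# Minimal sketch of the shared-subspace idea: each task's weight vector is
# w_t = L^T s_t for a shared basis L (k x d) and per-task coefficients s_t.
# Plain alternating least squares, NOT the paper's nonparametric model;
# the rank k is fixed here, whereas the paper infers it from the tasks.

rng = np.random.default_rng(2)
T, d, k, n = 5, 12, 2, 40
L_true = rng.normal(size=(k, d))
S_true = rng.normal(size=(T, k))
Xs = [rng.normal(size=(n, d)) for _ in range(T)]
ys = [X @ (S_true[t] @ L_true) + 0.01 * rng.normal(size=n)
      for t, X in enumerate(Xs)]

L = rng.normal(size=(k, d))
for _ in range(30):
    # Step 1: per-task coefficients s_t given the shared basis L
    # (normal equations of min_s ||X L^T s - y||^2, with a tiny ridge).
    S = np.stack([np.linalg.solve(L @ X.T @ X @ L.T + 1e-6 * np.eye(k),
                                  L @ X.T @ y)
                  for X, y in zip(Xs, ys)])
    # Step 2: shared basis L given all coefficients, as one least squares
    # over vec(L): the r-th column block of the design is s_{t,r} * X_t.
    A = np.vstack([np.hstack([S[t, r] * Xs[t] for r in range(k)])
                   for t in range(T)])
    L = np.linalg.lstsq(A, np.concatenate(ys), rcond=None)[0].reshape(k, d)

mse = np.mean([np.mean((X @ (S[t] @ L) - y) ** 2)
               for t, (X, y) in enumerate(zip(Xs, ys))])
```

Because every task's fit reuses the same basis L, tasks with few examples borrow statistical strength from the others, which is the behavior the subspace models above are designed to exploit.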