Results 1  10
of
247
Bayesian Compressive Sensing
, 2007
"... The data of interest are assumed to be represented as Ndimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basisfunction coefficients associated with B. Compressive sensing ..."
Abstract

Cited by 327 (24 self)
 Add to MetaCart
(Show Context)
The data of interest are assumed to be represented as Ndimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basisfunction coefficients associated with B. Compressive sensing is a framework whereby one does not measure one of the aforementioned Ndimensional signals directly, but rather a set of related measurements, with the new measurements a linear combination of the original underlying Ndimensional signal. The number of required compressivesensing measurements is typically much smaller than N, offering the potential to simplify the sensing system. Let f denote the unknown underlying Ndimensional signal, and g a vector of compressivesensing measurements, then one may approximate f accurately by utilizing knowledge of the (underdetermined) linear relationship between f and g, in addition to knowledge of the fact that f is compressible in B. In this paper we employ a Bayesian formalism for estimating the underlying signal f based on compressivesensing measurements g. The proposed framework has the following properties: (i) in addition to estimating the underlying signal f, “error bars ” are also estimated, these giving a measure of confidence in the inverted signal; (ii) using knowledge of the error bars, a principled means is provided for determining when a sufficient
Convex multitask feature learning
 MACHINE LEARNING
, 2007
"... We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the wellknown singletask 1norm regularization. It is based on a novel nonconvex regularizer which controls the number of learned features common across the tasks. We prove th ..."
Abstract

Cited by 250 (25 self)
 Add to MetaCart
(Show Context)
We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the wellknown singletask 1norm regularization. It is based on a novel nonconvex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns taskspecific functions and in the latter step it learns commonacrosstasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select – not learn – a few common variables across the tasks.
Multitask feature learning
 Advances in Neural Information Processing Systems 19
, 2007
"... We present a method for learning a lowdimensional representation which is shared across a set of multiple related tasks. The method builds upon the wellknown 1norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that th ..."
Abstract

Cited by 240 (8 self)
 Add to MetaCart
(Show Context)
We present a method for learning a lowdimensional representation which is shared across a set of multiple related tasks. The method builds upon the wellknown 1norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that this problem is equivalent to a convex optimization problem and develop an iterative algorithm for solving it. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the latter step we learn commonacrosstasks representations and in the former step we learn taskspecific functions using these representations. We report experiments on a simulated and a real data set which demonstrate that the proposed method dramatically improves the performance relative to learning each task independently. Our algorithm can also be used, as a special case, to simply select – not learn – a few common features across the tasks.
Multitask Gaussian Process Prediction
"... In this paper we investigate multitask learning in the context of Gaussian Processes (GP). We propose a model that learns a shared covariance function on inputdependent features and a “freeform ” covariance matrix over tasks. This allows for good flexibility when modelling intertask dependencies ..."
Abstract

Cited by 144 (6 self)
 Add to MetaCart
(Show Context)
In this paper we investigate multitask learning in the context of Gaussian Processes (GP). We propose a model that learns a shared covariance function on inputdependent features and a “freeform ” covariance matrix over tasks. This allows for good flexibility when modelling intertask dependencies while avoiding the need for large amounts of data for training. We show that under the assumption of noisefree observations and a block design, predictions for a given task only depend on its target values and therefore a cancellation of intertask transfer occurs. We evaluate the benefits of our model on two practical applications: a compiler performance prediction problem and an exam score prediction task. Additionally, we make use of GP approximations and properties of our model in order to provide scalability to large data sets. 1
Multitask learning for classification with dirichlet process priors
 Journal of Machine Learning Research
, 2007
"... Multitask learning (MTL) is considered for logisticregression classifiers, based on a Dirichlet process (DP) formulation. A symmetric MTL (SMTL) formulation is considered in which classifiers for multiple tasks are learned jointly, with a variational Bayesian (VB) solution. We also consider an asy ..."
Abstract

Cited by 139 (11 self)
 Add to MetaCart
Multitask learning (MTL) is considered for logisticregression classifiers, based on a Dirichlet process (DP) formulation. A symmetric MTL (SMTL) formulation is considered in which classifiers for multiple tasks are learned jointly, with a variational Bayesian (VB) solution. We also consider an asymmetric MTL (AMTL) formulation in which the posterior density function from the SMTL model parameters, from previous tasks, is used as a prior for a new task; this approach has the significant advantage of not requiring storage and use of all previous data from prior tasks. The AMTL formulation is solved with a simple Markov Chain Monte Carlo (MCMC) construction. Comparisons are also made to simpler approaches, such as singletask learning, pooling of data across tasks, and simplified approximations to DP. A comprehensive analysis of algorithm performance is addressed through consideration of two data sets that are matched to the MTL problem.
A new approach to collaborative filtering: Operator estimation with spectral regularization
 Journal of Machine Learning Research
"... We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators mapping a set of “users ” to a set of possibly desired “objects”. In particular, several recent lowrank type matrixcompletion methods for CF are shown to be special cases of our p ..."
Abstract

Cited by 93 (3 self)
 Add to MetaCart
We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators mapping a set of “users ” to a set of possibly desired “objects”. In particular, several recent lowrank type matrixcompletion methods for CF are shown to be special cases of our proposed framework. Unlike existing regularizationbased CF, our approach can be used to incorporate additional information such as attributes of the users/objects—a feature currently lacking in existing regularizationbased CF approaches—using popular and wellknown kernel methods. We provide novel representer theorems that we use to develop new estimation methods. We then provide learning algorithms based on lowrank decompositions and test them on a standard CF data set. The experiments indicate the advantages of generalizing the existing regularizationbased CF methods to incorporate related information about users and objects. Finally, we show that certain multitask learning methods can be also seen as special cases of our proposed approach.
Clustered MultiTask Learning: a Convex Formulation
"... In multitask learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved b ..."
Abstract

Cited by 81 (6 self)
 Add to MetaCart
(Show Context)
In multitask learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they are expected to be related to each other. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We design a new spectral norm that encodes this a priori assumption, without the prior knowledge of the partition of tasks into groups, resulting in a new convex optimization formulation for multitask learning. We show in simulations on synthetic examples and on the IEDB MHCI binding dataset, that our approach outperforms wellknown convex methods for multitask learning, as well as related nonconvex methods dedicated to the same problem. 1
Domain Adaptation from Multiple Sources via Auxiliary Classifiers
"... We propose a multiple source domain adaptation method, referred to as Domain Adaptation Machine (DAM), to learn a robust decision function (referred to as target classifier) for label prediction of patterns from the target domain by leveraging a set of precomputed classifiers (referred to as auxili ..."
Abstract

Cited by 59 (11 self)
 Add to MetaCart
(Show Context)
We propose a multiple source domain adaptation method, referred to as Domain Adaptation Machine (DAM), to learn a robust decision function (referred to as target classifier) for label prediction of patterns from the target domain by leveraging a set of precomputed classifiers (referred to as auxiliary/source classifiers) independently learned with the labeled patterns from multiple source domains. We introduce a new datadependent regularizer based on smoothness assumption into LeastSquares SVM (LSSVM), which enforces that the target classifier shares similar decision values with the auxiliary classifiers from relevant source domains on the unlabeled patterns of the target domain. In addition, we employ a sparsity regularizer to learn a sparse target classifier. Comprehensive experiments on the challenging TRECVID 2005 corpus demonstrate that DAM outperforms the existing multiple source domain adaptation methods for video concept detection in terms of effectiveness and efficiency. 1.
Learning Multiple Related Tasks Using Latent Independent Component Analysis
, 2005
"... We propose a probabilistic model based on Independent Component Analysis for learning multiple related tasks. In our model the task parameters are assumed to be generated from independent sources which account for the relatedness of the tasks. We use Laplace distributions to model hidden sources ..."
Abstract

Cited by 53 (5 self)
 Add to MetaCart
(Show Context)
We propose a probabilistic model based on Independent Component Analysis for learning multiple related tasks. In our model the task parameters are assumed to be generated from independent sources which account for the relatedness of the tasks. We use Laplace distributions to model hidden sources which makes it possible to identify the hidden, independent components instead of just modeling correlations. Furthermore, our model enjoys a sparsity property which makes it both parsimonious and robust. We also propose efficient algorithms for both empirical Bayes method and point estimation. Our experimental results on two multilabel text classification data sets show that the proposed approach is promising.
Hierarchical Bayesian Domain Adaptation
"... Multitask learning is the problem of maximizing the performance of a system across a number of related tasks. When applied to multiple domains for the same task, it is similar to domain adaptation, but symmetric, rather than limited to improving performance on a target domain. We present a more pri ..."
Abstract

Cited by 50 (2 self)
 Add to MetaCart
Multitask learning is the problem of maximizing the performance of a system across a number of related tasks. When applied to multiple domains for the same task, it is similar to domain adaptation, but symmetric, rather than limited to improving performance on a target domain. We present a more principled, better performing model for this problem, based on the use of a hierarchical Bayesian prior. Each domain has its own domainspecific parameter for each feature but, rather than a constant prior over these parameters, the model instead links them via a hierarchical Bayesian global prior. This prior encourages the features to have similar weights across domains, unless there is good evidence to the contrary. We show that the method of (Daumé III, 2007), which was presented as a simple “preprocessing step, ” is actually equivalent, except our representation explicitly separates hyperparameters which were tied in his work. We demonstrate that allowing different values for these hyperparameters significantly improves performance over both a strong baseline and (Daumé III, 2007) within both a conditional random field sequence model for named entity recognition and a discriminatively trained dependency parser. 1