Results 1–10 of 67
Regularized multi-task learning
, 2004
"... This paper provides a foundation for multi–task learning using reproducing kernel Hilbert spaces of vector–valued functions. In this setting, the kernel is a matrix–valued function. Some explicit examples will be described which go beyond our earlier results in [7]. In particular, we characterize cl ..."
Abstract

Cited by 151 (1 self)
This paper provides a foundation for multi-task learning using reproducing kernel Hilbert spaces of vector-valued functions. In this setting, the kernel is a matrix-valued function. Some explicit examples will be described which go beyond our earlier results in [7]. In particular, we characterize classes of matrix-valued kernels which are linear and are of the dot product or the translation invariant type. We discuss how these kernels can be used to model relations between the tasks and present linear multi-task learning algorithms. Finally, we present a novel proof of the representer theorem for a minimizer of a regularization functional which is based on the notion of minimal norm interpolation.
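The separable case of such a matrix-valued kernel, a scalar kernel multiplied by a task-coupling matrix, can be sketched in a few lines of multi-task kernel ridge regression. This is a minimal illustration, not the paper's algorithms: the Gaussian scalar kernel, the coupling matrix B, and all data below are invented for the example.

```python
import numpy as np

def separable_mt_kernel(X1, t1, X2, t2, B, gamma=1.0):
    """Matrix-valued kernel K((x,i),(z,j)) = k(x,z) * B[i,j],
    with a Gaussian scalar kernel k (an illustrative choice)."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq) * B[np.ix_(t1, t2)]

rng = np.random.default_rng(0)
n, d, T = 30, 2, 3
X = rng.normal(size=(n, d))
tasks = rng.integers(0, T, size=n)
# Related tasks: all share w0 plus a small task-specific offset
w0 = np.array([1.0, -1.0])
W = w0 + 0.1 * rng.normal(size=(T, d))
y = np.einsum('ij,ij->i', X, W[tasks]) + 0.01 * rng.normal(size=n)

B = 0.5 * np.eye(T) + 0.5 * np.ones((T, T))   # task-coupling matrix
K = separable_mt_kernel(X, tasks, X, tasks, B)
lam = 1e-3
c = np.linalg.solve(K + lam * n * np.eye(n), y)  # representer coefficients

# Predict at a new point for task 0
x_new, t_new = np.array([[0.5, 0.5]]), np.array([0])
pred = separable_mt_kernel(x_new, t_new, X, tasks, B) @ c
```

Off-diagonal entries of B couple the tasks: with B = I the problem decomposes into independent single-task regressions.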
A Model of Inductive Bias Learning
 Journal of Artificial Intelligence Research
, 2000
"... A major problem in machine learning is that of inductive bias: how to choose a learner's hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small enough to ensure reliable generalization from reasonablysized training sets. Typically such bias is suppl ..."
Abstract

Cited by 143 (0 self)
A major problem in machine learning is that of inductive bias: how to choose a learner's hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small enough to ensure reliable generalization from reasonably-sized training sets. Typically such bias is supplied by hand through the skill and insights of experts. In this paper a model for automatically learning bias is investigated. The central assumption of the model is that the learner is embedded within an environment of related learning tasks. Within such an environment the learner can sample from multiple tasks, and hence it can search for a hypothesis space that contains good solutions to many of the problems in the environment. Under certain restrictions on the set of all hypothesis spaces available to the learner, we show that a hypothesis space that performs well on a sufficiently large number of training tasks will also perform well when learning novel tasks in the same environment. Exp...
Constructing informative priors using transfer learning
 In Proceedings of the 23rd International Conference on Machine Learning
, 2006
"... Many applications of supervised learning require good generalization from limited labeled data. In the Bayesian setting, we can try to achieve this goal by using an informative prior over the parameters, one that encodes useful domain knowledge. Focusing on logistic regression, we present an algorit ..."
Abstract

Cited by 86 (0 self)
Many applications of supervised learning require good generalization from limited labeled data. In the Bayesian setting, we can try to achieve this goal by using an informative prior over the parameters, one that encodes useful domain knowledge. Focusing on logistic regression, we present an algorithm for automatically constructing a multivariate Gaussian prior with a full covariance matrix for a given supervised learning task. This prior relaxes a commonly used but overly simplistic independence assumption, and allows parameters to be dependent. The algorithm uses other “similar” learning problems to estimate the covariance of pairs of individual parameters. We then use a semidefinite program to combine these estimates and learn a good prior for the current learning task. We apply our methods to binary text classification, and demonstrate a 20 to 40% test error reduction over a commonly used prior.
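The final step of this pipeline, MAP logistic regression under a given full-covariance Gaussian prior N(mu, Sigma), can be sketched as follows. The paper's actual contribution, estimating Sigma from related tasks with a semidefinite program, is not reproduced here; the data, step sizes, and prior mean below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
w_true = np.array([2.0, -2.0, 1.0])
X = rng.normal(size=(n, d))
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def fit_map_logistic(X, y, mu, Sigma, lr=0.5, steps=1000):
    """MAP logistic regression with a Gaussian prior N(mu, Sigma):
    gradient descent on the (averaged) negative log posterior."""
    P = np.linalg.inv(Sigma)                 # prior precision
    w = mu.copy()
    m = len(y)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))     # predicted probabilities
        grad = X.T @ (p - y) / m + P @ (w - mu) / m
        w -= lr * grad
    return w

# An informative prior mean near the truth, as transfer would supply
mu = np.array([1.5, -1.5, 0.5])
w = fit_map_logistic(X, y, mu, Sigma=np.eye(d))
acc = ((X @ w > 0) == (y > 0.5)).mean()
```

Replacing `np.eye(d)` with a full covariance estimated from related problems is exactly where the informative prior enters.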
Transfer Learning for Image Classification with Sparse Prototype Representations
"... To learn a new visual category from few examples, prior knowledge from unlabeled data as well as previous related categories may be useful. We develop a new method for transfer learning which exploits available unlabeled data and an arbitrary kernel function; we form a representation based on kernel ..."
Abstract

Cited by 46 (8 self)
To learn a new visual category from few examples, prior knowledge from unlabeled data as well as previous related categories may be useful. We develop a new method for transfer learning which exploits available unlabeled data and an arbitrary kernel function; we form a representation based on kernel distances to a large set of unlabeled data points. To transfer knowledge from previous related problems we observe that a category might be learnable using only a small subset of reference prototypes. Related problems may share a significant number of relevant prototypes; we find such a concise representation by performing a joint loss minimization over the training sets of related problems with a shared regularization penalty that minimizes the total number of prototypes involved in the approximation. This optimization problem can be formulated as a linear program that can be solved efficiently. We conduct experiments on a news-topic prediction task where the goal is to predict whether an image belongs to a particular news topic. Our results show that when only few examples are available for training a target topic, leveraging knowledge learnt from other topics can significantly improve performance.
Active Learning with Multiple Views
, 2002
"... Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multiview domains, in which there are several disjoint subsets of features (views), each of which i ..."
Abstract

Cited by 41 (1 self)
Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multi-view domains, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept. In this paper we make several contributions. First, we introduce Co-Testing, which is the first approach to multi-view active learning. Second, we extend the multi-view learning framework by also exploiting weak views, which are adequate only for learning a concept that is more general/specific than the target concept. Finally, we empirically show that Co-Testing outperforms existing active learners on a variety of real-world domains such as wrapper induction, Web page classification, advertisement removal, and discourse tree parsing.
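The core query rule of Co-Testing, asking the user about contention points where the views' learners disagree, can be illustrated in miniature. The view classifiers and data below are toy stand-ins, not the paper's learners.

```python
import numpy as np

def contention_points(pred_a, pred_b):
    """Indices where the two views' classifiers disagree --
    Co-Testing queries the user on (a subset of) these, since
    at least one view must be wrong there."""
    return np.nonzero(pred_a != pred_b)[0]

# Toy demo: two single-feature views, each with a sign classifier
view_a = np.array([1.0, -1.0, 1.0, -1.0])
view_b = np.array([2.0, -2.0, -0.5, 0.5])
pred_a = (view_a > 0).astype(int)
pred_b = (view_b > 0).astype(int)
queries = contention_points(pred_a, pred_b)  # examples 2 and 3 disagree
```

After each query, both views' learners are retrained on the enlarged labeled set and the contention set is recomputed.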
Hierarchical Bayesian Domain Adaptation
"... Multitask learning is the problem of maximizing the performance of a system across a number of related tasks. When applied to multiple domains for the same task, it is similar to domain adaptation, but symmetric, rather than limited to improving performance on a target domain. We present a more pri ..."
Abstract

Cited by 35 (0 self)
Multitask learning is the problem of maximizing the performance of a system across a number of related tasks. When applied to multiple domains for the same task, it is similar to domain adaptation, but symmetric, rather than limited to improving performance on a target domain. We present a more principled, better performing model for this problem, based on the use of a hierarchical Bayesian prior. Each domain has its own domain-specific parameter for each feature but, rather than a constant prior over these parameters, the model instead links them via a hierarchical Bayesian global prior. This prior encourages the features to have similar weights across domains, unless there is good evidence to the contrary. We show that the method of (Daumé III, 2007), which was presented as a simple “preprocessing step,” is actually equivalent, except our representation explicitly separates hyperparameters which were tied in his work. We demonstrate that allowing different values for these hyperparameters significantly improves performance over both a strong baseline and (Daumé III, 2007) within both a conditional random field sequence model for named entity recognition and a discriminatively trained dependency parser.
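For concreteness, the (Daumé III, 2007) preprocessing step that the hierarchical prior generalizes can be sketched as feature augmentation: one shared copy of the features plus one copy per domain, so a standard learner can trade off shared and domain-specific weights. This is a sketch from the cited description; the function name is ours.

```python
import numpy as np

def augment(x, domain, n_domains):
    """Feature augmentation for domain adaptation: the output has a
    shared block (filled for every example) and one block per domain
    (filled only for that example's domain, zero elsewhere)."""
    d = len(x)
    out = np.zeros((1 + n_domains) * d)
    out[:d] = x                      # shared copy
    s = (1 + domain) * d
    out[s:s + d] = x                 # domain-specific copy
    return out

x = np.array([1.0, 2.0])
z = augment(x, domain=1, n_domains=2)  # -> [1, 2, 0, 0, 1, 2]
```

Under a standard Gaussian regularizer this construction implicitly ties the domain blocks to the shared block, which is the equivalence the abstract refers to; the hierarchical model makes those tying hyperparameters explicit and tunable.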
Learning a meta-level prior for feature relevance from multiple related tasks
 In Proceedings of the International Conference on Machine Learning (ICML)
, 2007
"... In many prediction tasks, selecting relevant features is essential for achieving good generalization performance. Most feature selection algorithms consider all features to be a priori equally likely to be relevant. In this paper, we use transfer learning — learning on an ensemble of related tasks — ..."
Abstract

Cited by 35 (1 self)
In many prediction tasks, selecting relevant features is essential for achieving good generalization performance. Most feature selection algorithms consider all features to be a priori equally likely to be relevant. In this paper, we use transfer learning — learning on an ensemble of related tasks — to construct an informative prior on feature relevance. We assume that features themselves have meta-features that are predictive of their relevance to the prediction task, and model their relevance as a function of the meta-features using hyperparameters (called meta-priors). We present a convex optimization algorithm for simultaneously learning the meta-priors and feature weights from an ensemble of related prediction tasks that share a similar relevance structure. Our approach transfers the meta-priors among different tasks, allowing it to deal with settings where tasks have non-overlapping features or where feature relevance varies over the tasks. We show that transfer learning of feature relevance improves performance on two real data sets which illustrate such settings: (1) predicting ratings in a collaborative filtering task, and (2) distinguishing arguments of a verb in a sentence.
Machine Learning Techniques for the Computer Security Domain of Anomaly Detection
, 2000
"... : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : xv 1 ..."
Abstract

Cited by 34 (1 self)
 Add to MetaCart
: : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : xv 1
Learning visual representations using images with captions
 In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007)
"... Current methods for learning visual categories work well when a large amount of labeled data is available, but can run into severe difficulties when the number of labeled examples is small. When labeled data is scarce it may be beneficial to use unlabeled data to learn an image representation that i ..."
Abstract

Cited by 33 (2 self)
Current methods for learning visual categories work well when a large amount of labeled data is available, but can run into severe difficulties when the number of labeled examples is small. When labeled data is scarce it may be beneficial to use unlabeled data to learn an image representation that is low-dimensional, but nevertheless captures the information required to discriminate between image categories. This paper describes a method for learning representations from large quantities of unlabeled images which have associated captions; the goal is to improve learning in future image classification problems. Experiments show that our method significantly outperforms (1) a fully-supervised baseline model, (2) a model that ignores the captions and learns a visual representation by performing PCA on the unlabeled images alone and (3) a model that uses the output of word classifiers trained using captions and unlabeled data. Our current work concentrates on captions as the source of metadata, but more generally other types of metadata could be used.
Empirical Bayes for Learning to Learn
 Proceedings of ICML
, 2000
"... We present a new model for studying multitask learning, linking theoretical results to practical simulations. In our model all tasks are combined in a single feedforward neural network. Learning is implemented in a Bayesian fashion. In this Bayesian framework the hiddentooutput weights, bein ..."
Abstract

Cited by 23 (1 self)
We present a new model for studying multitask learning, linking theoretical results to practical simulations. In our model all tasks are combined in a single feed-forward neural network. Learning is implemented in a Bayesian fashion. In this Bayesian framework the hidden-to-output weights, being specific to each task, play the role of model parameters. The input-to-hidden weights, which are shared between all tasks, are treated as hyperparameters. Other hyperparameters describe error variance and correlations and priors for the model parameters. An important feature of our model is that the probability of these hyperparameters given the data can be computed explicitly and only depends on a set of sufficient statistics. None of these statistics scales with the number of tasks or patterns, which makes empirical Bayes for multitask learning a relatively straightforward optimization problem. Simulations on real-world data sets on single-copy newspaper and magazine sal...