A Survey on Transfer Learning
Cited by 59 (8 self)
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as co-variate shift. We also explore some potential future issues in transfer learning research.
Exploiting feature hierarchy for transfer learning in named entity recognition
In ACL:HLT ’08, 2008
Cited by 6 (2 self)
We present a novel hierarchical prior structure for supervised transfer learning in named entity recognition, motivated by the common structure of feature spaces for this task across natural language data sets. The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task, is an important new area of research. In the subproblem of domain adaptation, a model trained over a source domain is generalized to perform well on a related target domain, where the two domains’ data are distributed similarly, but not identically. We introduce the concept of groups of closely-related domains, called genres, and show how inter-genre adaptation is related to domain adaptation. We also examine multitask learning, where two domains may be related, but where the concept to be learned in each case is distinct. We show that our prior conveys useful information across domains, genres and tasks, while remaining robust to spurious signals not related to the target domain and concept. We further show that our model generalizes a class of similar hierarchical priors, smoothed to varying degrees, and lay the groundwork for future exploration in this area.
Learning to select features using their properties
2006
Cited by 4 (2 self)
Feature selection is the task of choosing a small subset of features that is sufficient to predict the target labels well. Here, instead of trying to directly determine which features are better, we attempt to learn the properties of good features. For this purpose we assume that each feature is represented by a set of properties, referred to as meta-features. This approach enables prediction of the quality of features without measuring their value on the training instances. We use this ability to devise new selection algorithms that can efficiently search for new good features in the presence of a huge number of features, and to dramatically reduce the number of feature measurements needed. We demonstrate our algorithms on a handwritten digit recognition problem and a visual object category recognition problem. In addition, we show how this novel viewpoint enables derivation of better generalization bounds for the joint learning problem of selection and classification, and how it contributes to a better understanding of the problem. Specifically, in the context of object recognition, previous works showed that it is possible to find one set of features which fits most object categories (aka a universal dictionary). Here we use our framework to analyze one such universal dictionary and find that the quality of features in this dictionary can be predicted accurately by its meta-features.
Adaptive Multi-Task Lasso: with application to eQTL detection
Cited by 4 (1 self)
To understand the relationship between genomic variations among population and complex diseases, it is essential to detect eQTLs which are associated with phenotypic effects. However, detecting eQTLs remains a challenge due to complex underlying mechanisms and the very large number of genetic loci involved compared to the number of samples. Thus, to address the problem, it is desirable to take advantage of the structure of the data and prior information about genomic locations such as conservation scores and transcription factor binding sites. In this paper, we propose a novel regularized regression approach for detecting eQTLs which takes into account related traits simultaneously while incorporating many regulatory features. We first present a Bayesian network for a multi-task learning problem that includes priors on SNPs, making it possible to estimate the significance of each covariate adaptively. Then we find the maximum a posteriori (MAP) estimation of regression coefficients and estimate weights of covariates jointly. This optimization procedure is efficient since it can be achieved by using a projected gradient descent and a coordinate descent procedure iteratively. Experimental results on simulated and real yeast datasets confirm that our model outperforms previous methods for finding eQTLs.
Learning with Whom to Share in Multi-task Feature Learning
Cited by 3 (0 self)
In multi-task learning (MTL), multiple tasks are learnt jointly. A major assumption for this paradigm is that all those tasks are indeed related so that the joint training is appropriate and beneficial. In this paper, we study the problem of multi-task learning of shared feature representations among tasks, while simultaneously determining “with whom” each task should share. We formulate the problem as a mixed integer programming and provide an alternating minimization technique to solve the optimization problem of jointly identifying grouping structures and parameters. The algorithm monotonically decreases the objective function and converges to a local optimum. Compared to the standard MTL paradigm where all tasks are in a single group, our algorithm improves its performance with statistical significance for three out of the four datasets we have studied. We also demonstrate its advantage over other task grouping techniques investigated in literature.
Stacked Gaussian process learning
Proceedings of the 9th IEEE International Conference on Data Mining (ICDM–09)
Cited by 2 (2 self)
Triggered by a market relevant application that involves making joint predictions of pedestrian and public transit flows in urban areas, we address the question of how to utilize hidden common cause relations among variables of interest in order to improve performance in the two related regression tasks. Specifically, we propose stacked Gaussian process learning, a meta-learning scheme in which a base Gaussian process is enhanced by adding the posterior covariance functions of other related tasks to its covariance function in a stage-wise optimization. The idea is that the stacked posterior covariances encode the hidden common causes among variables of interest that are shared across the related regression tasks. Stacked Gaussian process learning is efficient, capable of capturing shared common causes, and can be implemented with any kind of standard Gaussian process regression model such as sparse approximations and relational variants. Our experimental results on real-world data from the market relevant application show that stacked Gaussian process learning can significantly improve prediction performance of a standard Gaussian process.
Graph-based Transfer Learning
Cited by 1 (0 self)
Transfer learning is the task of leveraging the information from labeled examples in some domains to predict the labels for examples in another domain. It finds abundant practical applications, such as sentiment prediction, image classification and network intrusion detection. In this paper, we propose a graph-based transfer learning framework. It propagates the label information from the source domain to the target domain via the example-feature-example tripartite graph, and puts more emphasis on the labeled examples from the target domain via the example-example bipartite graph. Our framework is semi-supervised and nonparametric in nature and thus more flexible. We also develop an iterative algorithm so that our framework is scalable to large-scale applications. It enjoys the theoretical property of convergence. Compared with existing transfer learning methods, the proposed framework propagates the label information to both the features irrelevant to the source domain and the unlabeled examples in the target domain via the common features in a principled way. Experimental results on 3 real data sets demonstrate the effectiveness of our algorithm.
Exclusive Lasso for Multi-task Feature Selection
Cited by 1 (0 self)
We propose a novel group regularization which we call exclusive lasso. Unlike the group lasso regularizer that assumes covarying variables in groups, the proposed exclusive lasso regularizer models the scenario when variables in the same group compete with each other. Analysis is presented to illustrate the properties of the proposed regularizer. We present a framework of kernel based multi-task feature selection algorithm based on the proposed exclusive lasso regularizer. An efficient algorithm is derived to solve the related optimization problem. Experiments with document categorization show that our approach outperforms state-of-the-art algorithms for multi-task feature selection.
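The abstract does not spell out the regularizer, so the following is only a minimal sketch under a stated assumption: it uses the commonly cited form of the exclusive lasso penalty (the sum over groups of the squared L1 norm of each group's weights) next to the standard group lasso penalty (the sum over groups of each group's L2 norm). The function names, the `groups` index lists, and the toy weight vectors are hypothetical and chosen for illustration; this is not the paper's kernel-based algorithm.

```python
import math

def group_lasso_penalty(weights, groups):
    """Sum over groups of the L2 norm of that group's weights."""
    return sum(math.sqrt(sum(weights[i] ** 2 for i in g)) for g in groups)

def exclusive_lasso_penalty(weights, groups):
    """Sum over groups of the squared L1 norm of that group's weights.

    Assumed form of the penalty; the abstract itself gives no formula.
    """
    return sum(sum(abs(weights[i]) for i in g) ** 2 for g in groups)

# Two groups of two variables each (hypothetical toy setup).
groups = [[0, 1], [2, 3]]
same_group = [1.0, 1.0, 0.0, 0.0]     # both nonzero weights share a group
one_per_group = [1.0, 0.0, 1.0, 0.0]  # one nonzero weight in each group

excl_same = exclusive_lasso_penalty(same_group, groups)
excl_split = exclusive_lasso_penalty(one_per_group, groups)
grp_same = group_lasso_penalty(same_group, groups)
grp_split = group_lasso_penalty(one_per_group, groups)
```

With these toy vectors the exclusive penalty is lower when the nonzero weights sit in different groups, while the group lasso penalty is lower when they sit in the same group, which is one way to read "compete with each other" versus "covarying".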
Adaptive Transfer Learning
Cited by 1 (0 self)
Transfer learning aims at reusing the knowledge in some source tasks to improve the learning of a target task. Many transfer learning methods assume that the source tasks and the target task are related, even though many tasks are not related in reality. However, when two tasks are unrelated, the knowledge extracted from a source task may not help, and may even hurt, the performance of a target task. Thus, how to avoid negative transfer and then ensure a “safe transfer” of knowledge is crucial in transfer learning. In this paper, we propose an Adaptive Transfer learning algorithm based on Gaussian Processes (AT-GP), which can be used to adapt the transfer learning schemes by automatically estimating the similarity between a source and a target task. The main contribution of our work is that we propose a new semi-parametric transfer kernel for transfer learning from a Bayesian perspective, and propose to learn the model with respect to the target task, rather than all tasks as in multi-task learning. We can formulate the transfer learning problem as a unified Gaussian Process (GP) model. The adaptive transfer ability of our approach is verified on both synthetic and real-world datasets.
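The abstract does not give the form of the transfer kernel. As a rough illustration of the general idea only, the sketch below builds a Gram matrix from a base kernel and scales the entries between points of different tasks by a similarity factor. Everything here is an assumption made for illustration: the RBF base kernel, the function names, and the fixed factor `lam` (in AT-GP the similarity is estimated from data, not set by hand).

```python
import math

def rbf(x, y, length_scale=1.0):
    """Squared-exponential base kernel between two points (assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def transfer_gram(points, tasks, lam):
    """Gram matrix whose cross-task entries are scaled by `lam`.

    `lam` is a hypothetical, hand-set task-similarity factor:
    lam = 1 treats both tasks as one, lam = 0 removes all cross-task
    covariance so the source data cannot influence the target.
    """
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            k = rbf(points[i], points[j])
            gram[i][j] = k if tasks[i] == tasks[j] else lam * k
    return gram

# One source point and two target points (toy data).
points = [(0.0,), (0.0,), (1.0,)]
tasks = ["source", "target", "target"]

gram_half = transfer_gram(points, tasks, lam=0.5)
gram_none = transfer_gram(points, tasks, lam=0.0)
```

Setting `lam` to zero makes the matrix block-diagonal across tasks, which corresponds to the "safe" case where an unrelated source task contributes nothing to predictions on the target.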

