Results 1 - 10
of
12
A Survey on Transfer Learning
"... A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task i ..."
Abstract
-
Cited by 58 (8 self)
- Add to MetaCart
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as co-variate shift. We also explore some potential future issues in transfer learning research.
Predictive Distribution Matching SVM for Multi-Domain Learning
"... Abstract. Domain adaptation (DA) using labeled data from related source domains comes in handy when the labeled patterns of a target domain are scarce. Nevertheless, it is worth noting that when the predictive distribution P (y|x) of the domains differs, which establishes Negative Transfer [19], DA ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Abstract. Domain adaptation (DA) using labeled data from related source domains comes in handy when the labeled patterns of a target domain are scarce. Nevertheless, it is worth noting that when the predictive distribution P (y|x) of the domains differs, which establishes Negative Transfer [19], DA approaches generally fail to perform well. Taking this cue, the Predictive Distribution Matching SVM (PDM-SVM) is proposed to learn a robust classifier in the target domain (referred to as the target classifier) by leveraging the labeled data from only the relevant regions of multiple sources. In particular, a k-nearest neighbor graph is iteratively constructed to identify the regions of relevant source labeled data where the predictive distribution maximally aligns with that of the target data. Predictive distribution matching regularization is then introduced to leverage these relevant source labeled data for training the target classifier. In addition, progressive transduction is adopted to infer the label of target unlabeled data for estimating the predictive distribution of the target domain. Finally, extensive experiments are conducted to illustrate the impact of Negative Transfer on several existing state-of-the-art DA methods, and demonstrate the improved performance efficacy of our proposed PDM-SVM on the commonly used multi-domain Sentiment and Reuters datasets.
Transfer Learning in Collaborative Filtering with Uncertain Ratings
"... To solve the sparsity problem in collaborative filtering, researchers have introduced transfer learning as a viable approach to make use of auxiliary data. Most previous transfer learning works in collaborative filtering have focused on exploiting point-wise ratings such as numerical ratings, stars, ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
To solve the sparsity problem in collaborative filtering, researchers have introduced transfer learning as a viable approach to make use of auxiliary data. Most previous transfer learning works in collaborative filtering have focused on exploiting point-wise ratings such as numerical ratings, stars, or binary ratings of likes/dislikes. However, in many real-world recommender systems, many users may be unwilling or unlikely to rate items with precision. In contrast, practitioners can turn to various non-preference data to estimate a range or rating distribution of a user’s preference on an item. Such a range or rating distribution is called an uncertain rating since it represents a rating spectrum of uncertainty instead of an accurate point-wise score. In this paper, we propose an efficient transfer learning solution for collaborative filtering, known as transfer by integrative factorization (TIF), to leverage such auxiliary uncertain ratings to improve the performance of recommendation. In particular, we integrate auxiliary data of uncertain ratings as additional constraints in the target matrix factorization problem, and learn an expected rating value for each uncertain rating automatically. The advantages of our proposed approach include the efficiency and the improved effectiveness of collaborative filtering, showing that incorporating the auxiliary data of uncertain ratings can really bring a benefit. Experimental results on two movie recommendation tasks show that our TIF algorithm performs significantly better over a state-of-the-art non-transfer learning method.
Domain Adaptation from Multiple Sources: A Domain-Dependent Regularization Approach
"... Abstract — In this paper, we propose a new framework called domain adaptation machine (DAM) for the multiple source domain adaption problem. Under this framework, we learn a robust decision function (referred to as target classifier) forlabel prediction of instances from the target domain by leverag ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract — In this paper, we propose a new framework called domain adaptation machine (DAM) for the multiple source domain adaption problem. Under this framework, we learn a robust decision function (referred to as target classifier) forlabel prediction of instances from the target domain by leveraging a set of base classifiers which are prelearned by using labeled instances either from the source domains or from the source domains and the target domain. With the base classifiers, we propose a new domain-dependent regularizer based on smoothness assumption, which enforces that the target classifier shares similar decision values with the relevant base classifiers on the unlabeled instances from the target domain. This newly proposed regularizer can be readily incorporated into many kernel methods (e.g., support vector machines (SVM), support vector regression, and least-squares SVM (LS-SVM)). For domain adaptation, we also develop two new domain adaptation methods referred to as FastDAM and UniverDAM. In FastDAM, we introduce our proposed domaindependent regularizer into LS-SVM as well as employ a sparsity regularizer to learn a sparse target classifier with the support vectors only from the target domain, which thus makes the label prediction on any test instance very fast. In UniverDAM, we additionally make use of the instances from the source domains as Universum to further enhance the generalization ability of the target classifier. We evaluate our two methods on the challenging TRECIVD 2005 dataset for the large-scale video concept detection task as well as on the 20 newsgroups and email spam datasets for document retrieval. Comprehensive experiments demonstrate that FastDAM and UniverDAM outperform the existing multiple source domain adaptation methods for the two applications. Index Terms — Domain adaptation machine, domain-dependent regularizer, multiple source domain adaptation.
TABLE OF CONTENTS
"... Maclin, to my committee members, and to the Machine Learning Group at the University of Wisconsin-Madison. ..."
Abstract
- Add to MetaCart
Maclin, to my committee members, and to the Machine Learning Group at the University of Wisconsin-Madison.
Location and Scatter Matching for Dataset Shift in Text Mining
"... Abstract—Dataset shift from the training data in a source domain to the data in a target domain poses a great challenge for many statistical learning methods. Most algorithms can be viewed as exploiting only the first-order statistics, namely, the empirical mean discrepancy to evaluate the distribut ..."
Abstract
- Add to MetaCart
Abstract—Dataset shift from the training data in a source domain to the data in a target domain poses a great challenge for many statistical learning methods. Most algorithms can be viewed as exploiting only the first-order statistics, namely, the empirical mean discrepancy to evaluate the distribution gap. Intuitively, considering only the empirical mean may not be statistically efficient. In this paper, we propose a nonparametric distance metric with a good property which jointly considers the empirical mean (Location) and sample covariance (Scatter) difference. More specifically, we propose an improved symmetric Stein’s loss function which combines the mean and covariance discrepancy into a unified Bregman matrix divergence of which Jensen-Shannon divergence between normal distributions is a particular case. Our target is to find a good feature representation which can reduce the distribution gap between different domains, at the same time, ensure that the new derived representation can encode most discriminative components with respect to the label information. We have conducted extensive experiments on several document classification datasets to demonstrate the effectiveness of our proposed method. Keywords-Domain Adaptation, Feature Extraction I.
Query Weighting for Ranking Model Adaptation
"... We propose to directly measure the importance of queries in the source domain to the target domain where no rank labels of documents are available, which is referred to as query weighting. Query weighting is a key step in ranking model adaptation. As the learning object of ranking algorithms is divi ..."
Abstract
- Add to MetaCart
We propose to directly measure the importance of queries in the source domain to the target domain where no rank labels of documents are available, which is referred to as query weighting. Query weighting is a key step in ranking model adaptation. As the learning object of ranking algorithms is divided by query instances, we argue that it’s more reasonable to conduct importance weighting at query level than document level. We present two query weighting schemes. The first compresses the query into a query feature vector, which aggregates all document instances in the same query, and then conducts query weighting based on the query feature vector. This method can efficiently estimate query importance by compressing query data, but the potential risk is information loss resulted from the compression. The second measures the similarity between the source query and each target query, and then combines these fine-grained similarity values for its importance estimation. Adaptation experiments on LETOR3.0 data set demonstrate that query weighting significantly outperforms document instance weighting methods. 1
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 1 Domain Adaptation from Multiple Sources: A Domain-
"... Abstract — In this paper, we propose a new framework called domain adaptation machine (DAM) for the multiple source domain adaption problem. Under this framework, we learn a robust decision function (referred to as target classifier) forlabel prediction of instances from the target domain by leverag ..."
Abstract
- Add to MetaCart
Abstract — In this paper, we propose a new framework called domain adaptation machine (DAM) for the multiple source domain adaption problem. Under this framework, we learn a robust decision function (referred to as target classifier) forlabel prediction of instances from the target domain by leveraging a set of base classifiers which are prelearned by using labeled instances either from the source domains or from the source domains and the target domain. With the base classifiers, we propose a new domain-dependent regularizer based on smoothness assumption, which enforces that the target classifier shares similar decision values with the relevant base classifiers on the unlabeled instances from the target domain. This newly proposed regularizer can be readily incorporated into many kernel methods (e.g., support vector machines (SVM), support vector regression, and least-squares SVM (LS-SVM)). For domain adaptation, we also develop two new domain adaptation methods referred to as FastDAM and UniverDAM. In FastDAM, we introduce our proposed domaindependent regularizer into LS-SVM as well as employ a sparsity regularizer to learn a sparse target classifier with the support vectors only from the target domain, which thus makes the label prediction on any test instance very fast. In UniverDAM, we additionally make use of the instances from the source domains as Universum to further enhance the generalization ability of the target classifier. We evaluate our two methods on the challenging TRECIVD 2005 dataset for the large-scale video concept detection task as well as on the 20 newsgroups and email spam datasets for document retrieval. Comprehensive experiments demonstrate that FastDAM and UniverDAM outperform the existing multiple source domain adaptation methods for the two applications. Index Terms — Domain adaptation machine, domain-dependent regularizer, multiple source domain adaptation.
A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
"... Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Often times we have very few or no labeled data from the test or target distribution but may have plenty of labeled data from multiple related sources with different distributions ..."
Abstract
- Add to MetaCart
Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Often times we have very few or no labeled data from the test or target distribution but may have plenty of labeled data from multiple related sources with different distributions. The difference in distributions may be both in marginal and conditional probabilities. Most of the existing domain adaptation work focuses on the marginal probability distribution difference between the domains, assuming that the conditional probabilities are similar. However in many real world applications, conditional probability distribution differences are as commonplace as marginal probability differences. In this paper we propose a two-stage domain adaptation methodology which combines weighted data from multiple sources based on marginal probability differences (first stage) as well as conditional probability differences (second stage), with the target domain data. The weights for minimizing the marginal probability differences are estimated independently, while the weights for minimizing conditional probability differences are computed simultaneously by exploiting the potential interaction among multiple sources. We also provide a theoretical analysis on the generalization performance of the proposed multi-source domain adaptation formulation using the weighted Rademacher complexity measure. Empirical comparisons with existing state-of-the-art domain adaptation methods using three real-world datasets demonstrate the effectiveness of the proposed approach. 1
Gaussian Process for Dimensionality Reduction in Transfer Learning ∗
"... Dimensionality reduction has been considered as one of the most significant tools for data analysis. In general, supervised information is helpful for dimensionality reduction. However, in typical real applications, supervised information in multiple source tasks may be available, while the data of ..."
Abstract
- Add to MetaCart
Dimensionality reduction has been considered as one of the most significant tools for data analysis. In general, supervised information is helpful for dimensionality reduction. However, in typical real applications, supervised information in multiple source tasks may be available, while the data of the target task are unlabeled. An interesting problem of how to guide the dimensionality reduction for the unlabeled target data by exploiting useful knowledge, such as label information, from multiple source tasks arises in such a scenario. In this paper, we propose a new method for dimensionality reduction in the transfer learning setting. Unlike traditional paradigms where the useful knowledge from multiple source tasks is transferred through distance metric, our proposal firstly converts the dimensionality reduction problem into integral regression problems in parallel. Gaussian process is then employed to learn the underlying relationship between the original data and the reduced data. Such a relationship can be appropriately transferred to the target task by exploiting the prediction ability of the Gaussian process model and inventing different kinds of regularizers. Extensive experiments on both synthetic and real data sets show the effectiveness of our method.

