## Domain Adaptation via Transfer Component Analysis

### Download Links

- [ijcai.org]
- [www3.ntu.edu.sg]
- [www.cs.ust.hk]
- [www.ntu.edu.sg]
- [c2inet.sce.ntu.edu.sg]
- [www.cse.ust.hk]
- [www1.i2r.a-star.edu.sg]

Citations: 39 (16 self)

### BibTeX

@MISC{Pan_domainadaptation,
  author = {Sinno Jialin Pan and Ivor W. Tsang and James T. Kwok and Qiang Yang},
  title = {Domain Adaptation via Transfer Component Analysis},
  year = {}
}

### Abstract

Domain adaptation solves a learning problem in a target domain by utilizing the training data in a different but related source domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we propose to find such a representation through a new learning method, transfer component analysis (TCA), for domain adaptation. TCA tries to learn some transfer components across domains in a Reproducing Kernel Hilbert Space (RKHS) using Maximum Mean Discrepancy (MMD). In the subspace spanned by these transfer components, data distributions in different domains are close to each other. As a result, with the new representations in this subspace, we can apply standard machine learning methods to train classifiers or regression models in the source domain for use in the target domain. The main contribution of our work is that we propose a novel feature representation in which to perform domain adaptation via a new parametric kernel using feature extraction methods, which can dramatically minimize the distance between domain distributions by projecting data onto the learned transfer components. Furthermore, our approach can handle large datasets and naturally leads to out-of-sample generalization. The effectiveness and efficiency of our approach are verified by experiments on two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.

### Citations

1048 | Nonlinear component analysis as a kernel eigenvalue problem
- Schölkopf, Smola, et al.
- 1998
Citation Context: ...ng method which utilizes an explicit low-rank representation. First, note that the kernel matrix K in (2) can be decomposed as K = (KK^{−1/2})(K^{−1/2}K), which is often known as the empirical kernel map [27]. Consider the use of a matrix W̃ ∈ R^{(n1+n2)×m} that transforms the empirical kernel map features to an m-dimensional space (where m ≪ n1 + n2). The resultant kernel matrix is then K̃ = (KK^{−1/2}W̃)(...
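The factorization described in this context can be checked numerically. Below is a small NumPy sketch; the point set, the RBF bandwidth, and the projection matrix W are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

# Toy RBF kernel matrix on random points (bandwidth is an assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)

# K^{-1/2} via eigendecomposition (K is symmetric positive definite here)
w, V = np.linalg.eigh(K)
K_inv_half = V @ np.diag(w ** -0.5) @ V.T

# Empirical kernel map: K factors as (K K^{-1/2})(K^{-1/2} K)
Phi = K @ K_inv_half
assert np.allclose(Phi @ Phi.T, K, atol=1e-6)

# A matrix W in R^{n x m} then yields a rank-m kernel (Phi W)(Phi W)^T
m = 2
W = rng.normal(size=(K.shape[0], m))
K_tilde = (Phi @ W) @ (Phi @ W).T
assert np.linalg.matrix_rank(K_tilde) == m
```

This illustrates why transforming the empirical kernel map features with a thin matrix W̃ produces a low-rank kernel, which is the basis of the efficient formulation.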

803 | Text classification from labeled and unlabeled documents using - Nigam, McCallum, et al. - 2000

438 | Newsweeder: Learning to filter netnews
- Lang
- 1995
Citation Context: ...gs of the parameters. C. Cross-Domain Text Classification 1) Experimental Setup: In this section, we perform cross-domain text classification experiments on a preprocessed dataset of the 20-Newsgroups [36]. In this experiment, we follow the preprocessing strategy in [37] to create six datasets from this collection. For each dataset, two top categories are chosen, one as positive and the other as negati...

375 | An Introduction to Kernel-Based Learning Algorithms
- Muller, Mika, et al.
- 2001
Citation Context: ...nto (8), we obtain min_W tr((W^⊤KHKW)^† W^⊤(KLK + μI)W). Since the matrix KLK + μI is nonsingular, we obtain an equivalent trace maximization problem (7). Similar to kernel Fisher discriminant analysis [28], the W solutions in (7) are the m leading eigenvectors of (KLK + μI)^{−1}KHK, where m ≤ n1 + n2 − 1. In the sequel, this will be referred to as TCA...
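The eigenproblem in this context can be sketched as follows. The snippet builds the MMD coefficient matrix L and the centering matrix H for a toy source/target split and takes the m leading eigenvectors of (KLK + μI)^{−1}KHK; the sample sizes, the linear kernel, and μ are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, m, mu = 10, 12, 3, 1.0
X = np.vstack([rng.normal(0.0, 1.0, (n1, 4)),   # source samples
               rng.normal(0.5, 1.0, (n2, 4))])  # target samples
n = n1 + n2

# Linear kernel for illustration
K = X @ X.T

# MMD coefficient matrix L and centering matrix H = I - (1/n) 11^T
L = np.full((n, n), -1.0 / (n1 * n2))
L[:n1, :n1] = 1.0 / n1 ** 2
L[n1:, n1:] = 1.0 / n2 ** 2
H = np.eye(n) - np.ones((n, n)) / n

# W: m leading eigenvectors of (K L K + mu I)^{-1} K H K
A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
vals, vecs = np.linalg.eig(A)
idx = np.argsort(-vals.real)[:m]
W = vecs[:, idx].real

Z = K @ W   # new m-dimensional representations of all samples
assert Z.shape == (n, m)
```

Standard classifiers or regressors can then be trained on the source rows of Z and applied to the target rows.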

341 | Interior point polynomial algorithms in convex programming - Nesterov, Nemirovskii - 1994

330 | regularization: A geometric framework for learning from labeled and unlabeled examples
- Belkin, Niyogi, et al.
- 2006
Citation Context: ...timization problem. Hence, the resultant SDP will typically have a very large number of constraints. To avoid this problem, we make use of the locality preserving property of the manifold regularizer [32]. First, we construct a graph with the affinity m_ij = exp(−d_ij^2/(2σ^2)) if x_i is one of the k nearest neighbors of x_j, or vice versa. Let M = [m_ij]. The graph Laplacian matrix is L = D − M, where...
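The graph construction in this context can be sketched directly; k, σ, and the point set below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
k, sigma = 3, 1.0

# Pairwise squared distances and a symmetric kNN affinity
D2 = ((X[:, None] - X[None]) ** 2).sum(-1)
M = np.zeros_like(D2)
for i in range(len(X)):
    nn = np.argsort(D2[i])[1:k + 1]          # skip self at position 0
    M[i, nn] = np.exp(-D2[i, nn] / (2 * sigma ** 2))
M = np.maximum(M, M.T)                        # "or vice versa" symmetrization

# Graph Laplacian L = D - M, with D the diagonal degree matrix
L = np.diag(M.sum(1)) - M
assert np.allclose(L.sum(1), 0)               # Laplacian rows sum to zero
assert np.allclose(L, L.T)
```

The zero row sums and symmetry are the defining properties exploited by the manifold regularizer.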

319 | A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data - Ando, Zhang

280 | Semi-Supervised Learning
- Chapelle, Schölkopf, et al.
- 2006
Citation Context: ...xists an intrinsic low-dimensional manifold underlying the high-dimensional observations. The effective use of manifold information is an important component in many semisupervised learning algorithms [29]. In this section, we extend the unsupervised TCA in Section III-C to the semisupervised learning setting. Motivated by the kernel target alignment [30], [31], a representation that maximizes its depe...

236 | On kernel-target alignment
- Cristianini, Shawe-Taylor, et al.
- 2001
Citation Context: ...nent in many semisupervised learning algorithms [29]. In this section, we extend the unsupervised TCA in Section III-C to the semisupervised learning setting. Motivated by the kernel target alignment [30], [31], a representation that maximizes its dependence with the data labels may lead to better generalization performance. Hence, we can maximize the label dependence instead of minimizing the empiric...

180 | A survey on transfer learning
- Pan, Yang
- 2010
Citation Context: ...device (the target domain). Domain adaptation can be considered as a special setting of transfer learning, which aims at transferring shared knowledge across different but related tasks or domains [3]–[5]. A major computational problem in domain adaptation is how to reduce the difference between the distributions of the source and target domain data. Intuitively, discovering a good feature representat...

160 | On the Influence of the Kernel on the Consistency of Support Vector
- Steinwart
- 2001
Citation Context: ...ring matrix, and n is the number of samples in X and Y. Similar to MMD, it can also be shown that if the RKHS is universal, HSIC asymptotically approaches zero if and only if X and Y are independent [25]. Conversely, a large HSIC value suggests strong dependence. C. Embedding Using HSIC In embedding or dimensionality reduction, it is often desirable to preserve the local data geometry while at the sa...

158 | Frustratingly easy domain adaptation
- Daumé
- 2007
Citation Context: ...s, or side information [7]) of the original data, especially for the target domain data. Recently, several approaches have been proposed to learn a common feature representation for domain adaptation [8]–[10]. Daumé III [8] designed a heuristic kernel to augment features for solving some specific domain adaptation problems in natural language processing. Blitzer et al. [9] proposed the structural cor...

152 | Domain adaptation with structural correspondence learning
- Blitzer, McDonald, et al.
- 2006
Citation Context: ...ation for domain adaptation [8]–[10]. Daumé III [8] designed a heuristic kernel to augment features for solving some specific domain adaptation problems in natural language processing. Blitzer et al. [9] proposed the structural correspondence learning (SCL) algorithm, motivated from [11], to induce correspondences among features from the different domains. This method depends on the heuristic selecti...

129 | Correcting sample selection bias by unlabeled data
- Huang, Smola, et al.
Citation Context: ...methods is that P ≠ Q, but P(YS|XS) = P(YT|XT). The problem of covariate shift adaptation is also related to domain adaptation. To address this problem, importance reweighting is a major technique [14]–[18]. Huang et al. [14] proposed a kernel-based method, known as kernel mean matching (KMM), to reweight instances in a reproducing kernel Hilbert space (RKHS). Sugiyama et al. [15] proposed another i...

113 | Learning a kernel matrix for nonlinear dimensionality reduction
- Weinberger, Sha, et al.
Citation Context: ...nted as kernel matrix K_yy) is measured by the HSIC criterion. Mathematically, this leads to the following SDP: max_{K ≽ 0} tr(HKHK_yy) subject to constraints on K. (1) In particular, (1) reduces to MVU [26] when no side information is given (i.e., K_yy = I). III. TCA As mentioned in Section II-A, most domain adaptation methods assume that P ≠ Q, but P(YS|XS) = P(YT|XT). However, in many real-world app...

109 | Analysis of representations for domain adaptation
- Ben-David, Blitzer, et al.
- 2007
Citation Context: ...domain adaptation is how to reduce the difference between the distributions of the source and target domain data. Intuitively, discovering a good feature representation across domains is crucial [3], [6]. A good feature representation should be able to reduce the difference in distributions between domains as much as possible, while at the same time preserving important properties (such as geometric...

106 | Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification - Blitzer, et al.

95 | Measuring statistical dependence with Hilbert-Schmidt norms. Algorithmic Learning Theory (pp. 63–77)
- Gretton, Bousquet, et al.
- 2005
Citation Context: ...at, when the RKHS is universal, MMD will asymptotically approach zero if and only if the two distributions are the same. 2) Hilbert–Schmidt Independence Criterion (HSIC): Related to the MMD, the HSIC [24] is a simple yet powerful nonparametric criterion for measuring the dependence between the sets X and Y. As its name implies, it computes the Hilbert–Schmidt norm of a cross-covariance operator in th...
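A minimal sketch of a biased empirical HSIC estimate, tr(K H K_yy H)/(n−1)^2, with an assumed RBF kernel and bandwidth (not the paper's experimental settings):

```python
import numpy as np

def rbf(A, gamma=1.0):
    # RBF Gram matrix on the rows of A (gamma is an assumed bandwidth)
    sq = ((A[:, None] - A[None]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def hsic(X, Y, gamma=1.0):
    """Biased empirical HSIC: tr(K H Kyy H) / (n - 1)^2."""
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K, Kyy = rbf(X, gamma), rbf(Y, gamma)
    return np.trace(K @ H @ Kyy @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y_dep = X + 0.1 * rng.normal(size=(200, 1))   # strongly dependent on X
Y_ind = rng.normal(size=(200, 1))             # independent of X

assert hsic(X, Y_dep) > hsic(X, Y_ind)   # dependence yields a larger HSIC
```

As the context notes, a near-zero value indicates independence (for a universal kernel), while a large value indicates strong dependence.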

66 | A high-performance semi-supervised learning method for text chunking - Ando, Zhang - 2005

53 | A hilbert space embedding for distributions
- Smola, Gretton, et al.
- 2007
Citation Context: ...distance. However, many of these estimators are parametric or require an intermediate density estimate. Recently, a nonparametric distance estimate was designed by embedding distributions in an RKHS [22]. Gretton et al. [23] introduced the MMD for comparing distributions based on the corresponding RKHS distance. Let the kernel-induced feature map be φ. The empirical estimate of MMD between {x1,...,xn...

52 | Integrating structured biological data by kernel maximum mean discrepancy - Borgwardt, Gretton, et al. - 2006

44 | Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation
- Sugiyama, Nakajima, et al.
- 2008
Citation Context: ...a major technique [14]–[18]. Huang et al. [14] proposed a kernel-based method, known as kernel mean matching (KMM), to reweight instances in a reproducing kernel Hilbert space (RKHS). Sugiyama et al. [15] proposed another importance reweighting algorithm, known as Kullback–Leibler importance estimation procedure (KLIEP), which is integrated with cross validation to perform model selection automatical...

39 | A kernel method for the two sample problem
- Gretton, Borgwardt, et al.
Citation Context: ...any of these estimators are parametric or require an intermediate density estimate. Recently, a nonparametric distance estimate was designed by embedding distributions in an RKHS [22]. Gretton et al. [23] introduced the MMD for comparing distributions based on the corresponding RKHS distance. Let the kernel-induced feature map be φ. The empirical estimate of MMD between {x1,...,xn1} and {y1,...,yn2}...
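A minimal sketch of the (biased) empirical squared-MMD estimate with an assumed RBF kernel; the sample sizes and bandwidth are illustrative:

```python
import numpy as np

def mmd2(X, Y, gamma=1.0):
    """Biased empirical squared MMD with an RBF kernel (assumed bandwidth)."""
    def k(A, B):
        sq = ((A[:, None] - B[None]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (300, 2))      # "source" sample
same = rng.normal(0.0, 1.0, (300, 2))     # same distribution
shifted = rng.normal(2.0, 1.0, (300, 2))  # mean-shifted distribution

assert mmd2(src, shifted) > mmd2(src, same)   # distribution gap raises MMD
```

With a universal kernel, this quantity approaches zero asymptotically when and only when the two distributions coincide, which is the criterion TCA minimizes in the learned subspace.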

38 | Transfer learning via dimensionality reduction
- Pan, Kwok, et al.
- 2008
Citation Context: ...r side information [7]) of the original data, especially for the target domain data. Recently, several approaches have been proposed to learn a common feature representation for domain adaptation [8]–[10]. Daumé III [8] designed a heuristic kernel to augment features for solving some specific domain adaptation problems in natural language processing. Blitzer et al. [9] proposed the structural correspo...

36 | Multi-task learning for HIV therapy screening - Bickel, Bogojeska, et al. - 2008

36 | Co-clustering based Classification for Out-of-domain Documents
- Dai, Xue
Citation Context: ...imental Setup: In this section, we perform cross-domain text classification experiments on a preprocessed dataset of the 20-Newsgroups [36]. In this experiment, we follow the preprocessing strategy in [37] to create six datasets from this collection. For each dataset, two top categories are chosen, one as positive and the other as negative. We then split the data based on subcategories. Different subca...

34 | A least-squares approach to direct importance estimation
- Kanamori, Hido, et al.
- 2008
Citation Context: ...ted with cross validation to perform model selection automatically. Bickel et al. [16] proposed to integrate the distribution correcting process into a kernelized logistic regression. Kanamori et al. [17] proposed a method called unconstrained least-squares importance fitting (uLSIF) to estimate the importance efficiently by formulating the direct importance estimation problem as a least-squares funct...

28 | Optimizing the kernel in the empirical feature space
- Xiong, Swamy, et al.
- 2005
Citation Context: ...n many semisupervised learning algorithms [29]. In this section, we extend the unsupervised TCA in Section III-C to the semisupervised learning setting. Motivated by the kernel target alignment [30], [31], a representation that maximizes its dependence with the data labels may lead to better generalization performance. Hence, we can maximize the label dependence instead of minimizing the empirical err...

25 | Colored maximum variance unfolding
- Song, Smola, et al.
- 2008
Citation Context: ...the difference in distributions between domains as much as possible, while at the same time preserving important properties (such as geometric properties, statistical properties, or side information [7]) of the original data, especially for the target domain data. Recently, several approaches have been proposed to learn a common feature representation for domain adaptation [8]–[10]. Daumé III [8] de...

25 | Discriminative learning under covariate shift
- Bickel, Brückner, et al.
- 2009
Citation Context: ...importance reweighting algorithm, known as Kullback–Leibler importance estimation procedure (KLIEP), which is integrated with cross validation to perform model selection automatically. Bickel et al. [16] proposed to integrate the distribution correcting process into a kernelized logistic regression. Kanamori et al. [17] proposed a method called unconstrained least-squares importance fitting (uLSIF) t...

20 | Implicitly restarted Arnoldi/Lanczos methods for large scale eigenvalue calculations
- Sorensen
- 1994
Citation Context: ...In contrast, our proposed kernel learning method requires only a simple and efficient eigenvalue decomposition. This takes only O(m(n1 + n2)^2) time when m nonzero eigenvectors are to be extracted [34]. V. EXPERIMENTS In this section, we first verify the motivations of our proposed methods for domain adaptation on some toy datasets. A. Synthetic Data As discussed in Section II-A, the optimization o...

14 | Finding stationary subspaces in multivariate time series
- Bünau, Meinecke, et al.
- 2009
Citation Context: ...may be sensitive to different applications. Most previous feature-based domain adaptation methods do not minimize the distance in distributions between domains explicitly. Recently, von Bünau et al. [12] proposed stationary subspace analysis (SSA) to match distributions in a latent space. However, SSA is focused on the identification of a stationary subspace, without considering the preservation of p...

11 | Dimensionality reduction for density ratio estimation in high-dimensional spaces
- Sugiyama, Kawanabe, et al.
- 2010
Citation Context: ...ods is that P ≠ Q, but P(YS|XS) = P(YT|XT). The problem of covariate shift adaptation is also related to domain adaptation. To address this problem, importance reweighting is a major technique [14]–[18]. Huang et al. [14] proposed a kernel-based method, known as kernel mean matching (KMM), to reweight instances in a reproducing kernel Hilbert space (RKHS). Sugiyama et al. [15] proposed another import...

11 | Inlier-based outlier detection via direct density ratio estimation
- Hido, Tsuboi, et al.
- 2008
Citation Context: ...ptation. Note that, besides covariate shift adaptation, importance estimation techniques have also been applied to various applications, such as independent component analysis [19], outlier detection [20], and change-point detection [21]. Besides reweighting methods, von Bünau et al. [12] proposed to match distributions in a latent space. More specifically, they theoretically studied the conditions un...

11 | Change-point detection in time-series data by direct density-ratio estimation
- Kawahara, Sugiyama
- 2009
Citation Context: ...iate shift adaptation, importance estimation techniques have also been applied to various applications, such as independent component analysis [19], outlier detection [20], and change-point detection [21]. Besides reweighting methods, von Bünau et al. [12] proposed to match distributions in a latent space. More specifically, they theoretically studied the conditions under which a stationary space can...

10 | Bridged refinement for transfer learning - Xing, Dai, et al. - 2007

10 | Estimating location using wi-fi
- Yang, Pan, et al.
- 2008
Citation Context: ...and expensive to obtain [1]. Moreover, once calibrated, these data can be easily outdated because the WiFi signal strength may be a function of many dynamic factors including time, device, and space [2]. To reduce the recalibration effort, we might want to adapt a localization model trained in one time period (the source domain) for a new time period (the target domain), or to adapt the localization...

8 | Bias learning, knowledge sharing
- Ghosn, Bengio
Citation Context: ...ile device (the target domain). Domain adaptation can be considered as a special setting of transfer learning, which aims at transferring shared knowledge across different but related tasks or domains [3]–[5]. A major computational problem in domain adaptation is how to reduce the difference between the distributions of the source and target domain data. Intuitively, discovering a good feature represe...

7 | Estimating squared-loss mutual information for independent component analysis. Independent Component Analysis and Signal Separation (pp
- Suzuki, Sugiyama
- 2009
Citation Context: ...g a latent space for adaptation. Note that, besides covariate shift adaptation, importance estimation techniques have also been applied to various applications, such as independent component analysis [19], outlier detection [20], and change-point detection [21]. Besides reweighting methods, von Bünau et al. [12] proposed to match distributions in a latent space. More specifically, they theoretically s...

5 | Indoor location system based on discriminant-adaptive neural network in IEEE 802.11 environments
- Fang, Lin
- 1973
Citation Context: ...irable to make the best use of any related data available. For example, in indoor WiFi localization, which requires regression learning, the labeled training data are difficult and expensive to obtain [1]. Moreover, once calibrated, these data can be easily outdated because the WiFi signal strength may be a function of many dynamic factors including time, device, and space [2]. To reduce the recalibra...

5 | A multitask learning model for online pattern recognition - Ozawa, Roy, et al. - 2009

5 | Data visualization and dimensionality reduction using kernel maps with a reference point, Neural Networks
- Suykens
- 2008
Citation Context: ...een distributions can be reduced while the data variance can be preserved. However, MMDE suffers from two major limitations: 1) MMDE is transductive, and does not generalize to out-of-sample patterns [13], and 2) MMDE learns the latent space by solving a semidefinite program (SDP), which is computationally expensive. In this paper, we propose a new feature extraction approach, called transfer componen...

3 | Direct importance estimation with model selection and its application to covariate shift adaptation - Sugiyama, Nakajima, et al. - 2008

2 | Estimating location using Wi-Fi - Yang, Pan, et al.