Correcting sample selection bias by unlabeled data
Cited by 207 (11 self)
We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
Covariate shift adaptation by importance weighted cross validation
, 2000
Cited by 122 (55 self)
A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when the outside of the training region is extrapolated. The situation where the training input points and test input points follow different distributions while the conditional distribution of output values given input points is unchanged is called the covariate shift. Under the covariate shift, standard model selection techniques such as cross validation do not work as desired since their unbiasedness is no longer maintained. In this paper, we propose a new method called importance weighted cross validation (IWCV), for which we prove its unbiasedness even under the covariate shift. The IWCV procedure is the only one that can be applied for unbiased classification under covariate shift, whereas alternatives to IWCV exist for regression. The usefulness of our proposed method is illustrated by simulations, and furthermore demonstrated in the brain-computer interface, where strong nonstationarity effects can be seen between training and test sessions. © 2000 Masashi Sugiyama, Matthias Krauledat, and Klaus-Robert Müller.
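The idea behind importance weighted cross validation can be illustrated with a minimal Python sketch. It assumes the importance weights w(x) = p_test(x) / p_train(x) are already known or estimated; the names `iwcv_score`, `fit_weighted_linear`, and `squared_loss` are hypothetical helpers for illustration and are not taken from the paper. Each held-out loss is multiplied by its importance weight before averaging.

```python
import numpy as np

def iwcv_score(X, y, w, fit, loss, k=5, seed=0):
    """Importance-weighted k-fold CV: mean of w(x) * loss over held-out points."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    fold_scores = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # The learner may also use the weights during training (an assumption here).
        predict = fit(X[train], y[train], w[train])
        per_point = loss(predict(X[held_out]), y[held_out])
        # Reweight the held-out losses by the importance before averaging.
        fold_scores.append(np.mean(w[held_out] * per_point))
    return float(np.mean(fold_scores))

def fit_weighted_linear(X, y, w):
    """Weighted least squares with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def squared_loss(pred, y):
    return (pred - y) ** 2
```

With all weights equal to one, this reduces to ordinary k-fold cross validation.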
A least-squares approach to direct importance estimation
Journal of Machine Learning Research, 2009
Cited by 78 (44 self)
We address the problem of estimating the ratio of two probability density functions, which is often referred to as the importance. The importance values can be used for various succeeding tasks such as covariate shift adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally highly efficient and simple to implement. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bounds. Numerical experiments show that the proposed method is comparable to the best existing method in accuracy, while it is computationally more efficient than competing approaches.
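A closed-form least-squares importance estimator of this kind can be sketched as follows. This is a minimal illustration, assuming a Gaussian kernel model centered on test points and a ridge penalty; the function names and the defaults for `sigma`, `lam`, and `n_centers` are hypothetical choices, and the analytic leave-one-out score mentioned in the abstract is not implemented.

```python
import numpy as np

def gaussian_kernel(X, C, sigma):
    """Pairwise Gaussian kernel values between rows of X and centers C."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ulsif_fit(X_train, X_test, sigma=1.0, lam=0.1, n_centers=100, seed=0):
    """Estimate w(x) = p_test(x) / p_train(x) with a regularized least-squares fit."""
    rng = np.random.default_rng(seed)
    n_c = min(n_centers, len(X_test))
    centers = X_test[rng.choice(len(X_test), size=n_c, replace=False)]
    K_tr = gaussian_kernel(X_train, centers, sigma)
    K_te = gaussian_kernel(X_test, centers, sigma)
    H = K_tr.T @ K_tr / len(X_train)   # second moment of basis under training inputs
    h = K_te.mean(axis=0)              # mean of basis under test inputs
    # Closed-form solution of the ridge-regularized quadratic objective.
    alpha = np.linalg.solve(H + lam * np.eye(n_c), h)
    alpha = np.maximum(alpha, 0.0)     # clip so the estimated ratio is non-negative
    return lambda X: gaussian_kernel(X, centers, sigma) @ alpha
```

The returned function evaluates the estimated importance at new points; in practice `sigma` and `lam` would be chosen by cross validation rather than fixed.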
Direct importance estimation for covariate shift adaptation
 Annals of the Institute of Statistical Mathematics
Cited by 68 (50 self)
A situation where training and test samples follow different input distributions is called covariate shift. Under covariate shift, standard learning methods such as maximum likelihood estimation are no longer consistent—weighted variants according to the ratio of test and training input densities are consistent. Therefore, accurately estimating the density ratio, called the importance, is one of the key issues in covariate shift adaptation. A naive approach to this task is to first estimate training and test input densities separately and then estimate the importance by taking the ratio of the estimated densities. However, this naive approach tends to perform poorly since density estimation is a hard task particularly in high dimensional cases. In this paper, we propose a direct importance estimation method that does not involve density estimation. Our method is equipped with a natural cross validation procedure and hence tuning parameters such as the kernel width can be objectively optimized. Furthermore, we give rigorous mathematical proofs for the convergence
Domain adaptation problems: A DASVM classification technique and a circular validation strategy
 PAMI
Cited by 54 (2 self)
This paper addresses pattern classification in the framework of domain adaptation by considering methods that solve problems in which training data are assumed to be available only for a source domain different (even if related) from the target domain of (unlabeled) test data. Two main novel contributions are proposed: 1) a domain adaptation support vector machine (DASVM) technique which extends the formulation of support vector machines (SVMs) to the domain adaptation framework and 2) a circular indirect accuracy assessment strategy for validating the learning of domain adaptation classifiers when no true labels for the target-domain instances are available. Experimental results, obtained on a series of two-dimensional toy problems and on two real data sets related to brain-computer interface and remote sensing applications, confirmed the effectiveness and the reliability of both the DASVM technique and the proposed circular validation strategy. Index Terms: Domain adaptation, transfer learning, semi-supervised learning, support vector machines, accuracy assessment, validation strategy.
Discriminative learning under covariate shift
 The Journal of Machine Learning Research
Cited by 40 (0 self)
We address classification problems for which the training instances are governed by an input distribution that is allowed to differ arbitrarily from the test distribution—problems also referred to as classification under covariate shift. We derive a solution that is purely discriminative: neither training nor test distribution are modeled explicitly. The problem of learning under covariate shift can be written as an integrated optimization problem. Instantiating the general optimization problem leads to a kernel logistic regression and an exponential model classifier for covariate shift. The optimization problem is convex under certain conditions; our findings also clarify the relationship to the known kernel mean matching procedure. We report on experiments on problems of spam filtering, text classification, and landmine detection.
Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation
Cited by 37 (21 self)
Covariate shift is a situation in supervised learning where training and test inputs follow different distributions even though the functional relation remains unchanged. A common approach to compensating for the bias caused by covariate shift is to reweight the training samples according to importance, which is the ratio of test and training densities. We propose a novel method that allows us to directly estimate the importance from samples without going through the hard task of density estimation. An advantage of the proposed method is that the computation time is nearly independent of the number of test input samples, which is highly beneficial in recent applications with large numbers of unlabeled samples. We demonstrate through experiments that the proposed method is computationally more efficient than existing approaches with comparable accuracy.
Covariate Shift by Kernel Mean Matching
Cited by 29 (1 self)
Given sets of observations of training and test data, we consider the problem of reweighting the training data such that its distribution more closely matches that of the test data. We achieve this goal by matching covariate distributions between training and test sets in a high dimensional feature space (specifically, a reproducing kernel Hilbert space). This approach does not require distribution estimation. Instead, the sample weights are obtained by a simple quadratic programming procedure. We provide a uniform convergence bound on the distance between the reweighted training feature mean and the test feature mean, a transductive bound on the expected loss of an algorithm trained on the reweighted data, and a connection to single class SVMs. While our method is designed to deal with the case of simple covariate shift (in the sense of Chapter??), we have also found benefits for sample selection bias on the labels. Our correction procedure yields its greatest and most consistent advantages when the learning algorithm returns a classifier/regressor that is “simpler” than the data might suggest.
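For orientation, the quadratic program behind kernel mean matching is commonly written as below. The notation is an assumption for illustration and is not taken from this listing: $x_i$ are training inputs, $x'_j$ are test inputs, $k$ is the kernel, $\beta_i$ are the sample weights, and $B$ and $\epsilon$ are user-chosen bounds.

```latex
\min_{\beta}\; \frac{1}{2}\,\beta^{\top} K \beta - \kappa^{\top}\beta
\quad \text{s.t.} \quad
\beta_i \in [0, B], \qquad
\Bigl|\sum_{i=1}^{n_{tr}} \beta_i - n_{tr}\Bigr| \le n_{tr}\,\epsilon,
\]
\[
K_{ij} = k(x_i, x_j), \qquad
\kappa_i = \frac{n_{tr}}{n_{te}} \sum_{j=1}^{n_{te}} k(x_i, x'_j).
```

Minimizing this objective is equivalent to minimizing the squared distance between the reweighted training feature mean and the test feature mean in the reproducing kernel Hilbert space, up to a constant that does not depend on $\beta$.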
Machine Learning for Real-Time Single-Trial EEG Analysis: From Brain-Computer Interfacing to Mental State Monitoring
, 2007
Cited by 28 (10 self)
Machine learning methods are an excellent choice for compensating the high variability in EEG when analyzing single trial data in real-time. This paper briefly reviews preprocessing and classification techniques for efficient EEG-based Brain-Computer Interfacing (BCI) and mental state monitoring applications. More specifically, this paper gives an outline of the Berlin Brain-Computer Interface (BBCI), which can be operated with minimal subject training. Also, spelling with the novel BBCI-based Hex-o-Spell text entry system, which gains communication speeds of 6-8 letters per minute, is discussed. Finally the results of a real-time arousal monitoring experiment are presented.
Relative Density-Ratio Estimation for Robust Distribution Comparison
Cited by 27 (18 self)
Divergence estimators based on direct approximation of density-ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity test. However, since density-ratio functions often possess high fluctuation, divergence estimation is still a challenging task in practice. In this paper, we propose to use relative divergences for distribution comparison, which involves approximation of relative density-ratios. Since relative density-ratios are always smoother than corresponding ordinary density-ratios, our proposed method is favorable in terms of the nonparametric convergence speed. Furthermore, we show that the proposed divergence estimator has asymptotic variance independent of the model complexity under a parametric setup, implying that the proposed estimator hardly overfits even with complex models. Through experiments, we demonstrate the usefulness of the proposed approach.
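As a brief illustration of why the relative ratio is better behaved, the relative density-ratio is usually defined as follows. The symbols $p$, $p'$, and the mixing parameter $\alpha$ are notation assumed here, not given in the listing.

```latex
r_{\alpha}(x) = \frac{p(x)}{\alpha\, p(x) + (1 - \alpha)\, p'(x)},
\qquad 0 \le \alpha < 1 .
```

For $\alpha = 0$ this is the ordinary density-ratio $p(x)/p'(x)$, which can be unbounded where $p'$ is small. For any $\alpha > 0$ the denominator is at least $\alpha\, p(x)$, so $r_{\alpha}(x) \le 1/\alpha$ everywhere.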