Results 1-10 of 69
Direct importance estimation with model selection and its application to covariate shift adaptation
In NIPS, 2008
Cited by 44 (9 self)
A situation where training and test samples follow different input distributions is called covariate shift. Under covariate shift, standard learning methods such as maximum likelihood estimation are no longer consistent, whereas weighted variants according to the ratio of test and training input densities are consistent. Therefore, accurately estimating the density ratio, called the importance, is one of the key issues in covariate shift adaptation. A naive approach to this task is to first estimate the training and test input densities separately and then estimate the importance by taking the ratio of the estimated densities. However, this naive approach tends to perform poorly since density estimation is a hard task, particularly in high-dimensional cases. In this paper, we propose a direct importance estimation method that does not involve density estimation. Our method is equipped with a natural cross-validation procedure, and hence tuning parameters such as the kernel width can be objectively optimized. Simulations illustrate the usefulness of our approach.
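The naive baseline described in this abstract (estimate both input densities, then divide) can be sketched as follows. This is only an illustration of that baseline, not the direct method the paper proposes; the 1-D Gaussian kernel density estimator, the fixed bandwidth, the function names, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def gaussian_kde_1d(samples, bandwidth):
    """Return a function that evaluates a 1-D Gaussian kernel density estimate."""
    samples = np.asarray(samples, dtype=float)

    def density(x):
        x = np.asarray(x, dtype=float)
        # Pairwise scaled differences between query points and samples.
        diffs = (x[:, None] - samples[None, :]) / bandwidth
        kernel = np.exp(-0.5 * diffs**2) / (np.sqrt(2.0 * np.pi) * bandwidth)
        return kernel.mean(axis=1)

    return density

def naive_importance(x_train, x_test, bandwidth=0.3):
    """Naive importance: ratio of two separately estimated densities,
    evaluated at the training points."""
    p_train = gaussian_kde_1d(x_train, bandwidth)
    p_test = gaussian_kde_1d(x_test, bandwidth)
    return p_test(x_train) / p_train(x_train)

# Synthetic covariate shift (assumed for illustration): the test inputs
# are shifted to the right of the training inputs.
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=500)
x_test = rng.normal(1.0, 1.0, size=500)
weights = naive_importance(x_train, x_test)
```

The resulting weights could be used to reweight per-sample training losses. In one dimension this works tolerably; the abstract's point is that the two density estimates become unreliable as the dimension grows.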
A least-squares approach to direct importance estimation
Journal of Machine Learning Research, 2009
Cited by 36 (24 self)
We address the problem of estimating the ratio of two probability density functions, which is often referred to as the importance. The importance values can be used for various succeeding tasks such as covariate shift adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally highly efficient and simple to implement. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bounds. Numerical experiments show that the proposed method is comparable to the best existing method in accuracy, while it is computationally more efficient than competing approaches.
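To make the "closed-form solution" concrete, here is a minimal sketch of a regularized least-squares density-ratio fit. The abstract gives no formulas, so the Gaussian kernel model, the choice of centers among the test points, the ridge penalty `lam`, and the clipping of negative coefficients are assumptions for illustration rather than a faithful reproduction of the paper's estimator.

```python
import numpy as np

def gaussian_design(x, centers, sigma):
    """Matrix of Gaussian kernel values between rows of x and the centers."""
    sq_dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def fit_least_squares_ratio(x_train, x_test, sigma=1.0, lam=0.1,
                            n_centers=50, seed=0):
    """Fit w(x) ~ p_test(x) / p_train(x) as a kernel expansion by solving
    a single regularized linear system (no iterative optimization)."""
    rng = np.random.default_rng(seed)
    n_centers = min(n_centers, len(x_test))
    centers = x_test[rng.choice(len(x_test), size=n_centers, replace=False)]

    phi_train = gaussian_design(x_train, centers, sigma)
    phi_test = gaussian_design(x_test, centers, sigma)

    H = phi_train.T @ phi_train / len(x_train)  # second moment under training inputs
    h = phi_test.mean(axis=0)                   # first moment under test inputs

    alpha = np.linalg.solve(H + lam * np.eye(n_centers), h)
    alpha = np.maximum(alpha, 0.0)              # keep the estimated ratio non-negative

    return lambda x: gaussian_design(x, centers, sigma) @ alpha

# Synthetic data, assumed for illustration.
rng = np.random.default_rng(1)
x_train = rng.normal(0.0, 1.0, size=(300, 1))
x_test = rng.normal(0.5, 1.0, size=(300, 1))
ratio = fit_least_squares_ratio(x_train, x_test)
weights = ratio(x_train)
```

In practice `sigma` and `lam` would be chosen by cross-validation, which the abstract says can be done analytically for the leave-one-out case; that step is omitted here.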
Single-trial analysis and classification of ERP components – a tutorial
NeuroImage
Cited by 21 (4 self)
Analyzing brain states that correspond to event-related potentials (ERPs) on a single-trial basis is a hard problem due to the high trial-to-trial variability and the unfavorable ratio between signal (ERP) and noise (artifacts and neural background activity). In this tutorial, we provide a comprehensive framework for decoding ERPs, elaborating on linear concepts, namely spatio-temporal patterns and filters as well as linear ERP classification. However, the bottleneck of these techniques is that they require an accurate covariance matrix estimation in high-dimensional sensor spaces, which is a highly intricate problem. As a remedy, we propose to use shrinkage estimators and show that appropriate regularization of linear discriminant analysis (LDA) by shrinkage yields excellent results for single-trial ERP classification that are far superior to classical LDA classification. Furthermore, we give practical hints on the interpretation of what classifiers learned from the data and demonstrate in particular that the trade-off between goodness-of-fit and model complexity in regularized LDA relates to a morphing between a difference pattern of ERPs and a spatial filter which cancels non-task-related brain activity.
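The idea of regularizing LDA by shrinkage can be sketched as below. The abstract does not say how the shrinkage strength is chosen, so the fixed `gamma`, the scaled-identity shrinkage target, the function names, and the synthetic two-class data are assumptions for illustration only.

```python
import numpy as np

def shrinkage_covariance(X, gamma):
    """Blend the empirical covariance with a scaled identity matrix.
    gamma = 0 gives the empirical estimate, gamma = 1 a spherical one."""
    emp = np.cov(X, rowvar=False)
    d = emp.shape[0]
    nu = np.trace(emp) / d  # average variance across features
    return (1.0 - gamma) * emp + gamma * nu * np.eye(d)

def fit_shrinkage_lda(X0, X1, gamma=0.1):
    """Binary LDA whose pooled covariance is replaced by a shrinkage estimate.
    Returns weights w and bias b; class 1 is predicted when x @ w + b > 0."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pool the class-centered samples (np.cov then normalizes by n - 1).
    centered = np.vstack([X0 - mu0, X1 - mu1])
    cov = shrinkage_covariance(centered, gamma)
    w = np.linalg.solve(cov, mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2.0
    return w, b

# Few trials, many features (assumed data): the empirical covariance is
# singular here, so unregularized LDA would not have a unique solution.
rng = np.random.default_rng(0)
d = 50
X0 = rng.normal(0.0, 1.0, size=(20, d))
X1 = rng.normal(0.5, 1.0, size=(20, d))
w, b = fit_shrinkage_lda(X0, X1, gamma=0.2)
```

With `gamma > 0` the blended matrix is positive definite, so the linear system is always solvable. Larger values of `gamma` pull `w` toward the plain difference of class means, which is consistent with the morphing the abstract describes.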
Agnostic Active Learning Without Constraints
Cited by 20 (3 self)
We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification.
Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation
Cited by 19 (11 self)
Covariate shift is a situation in supervised learning where training and test inputs follow different distributions even though the functional relation remains unchanged. A common approach to compensating for the bias caused by covariate shift is to reweight the training samples according to importance, which is the ratio of test and training densities. We propose a novel method that allows us to directly estimate the importance from samples without going through the hard task of density estimation. An advantage of the proposed method is that the computation time is nearly independent of the number of test input samples, which is highly beneficial in recent applications with large numbers of unlabeled samples. We demonstrate through experiments that the proposed method is computationally more efficient than existing approaches with comparable accuracy.
Covariate Shift by Kernel Mean Matching
Cited by 10 (1 self)
Given sets of observations of training and test data, we consider the problem of reweighting the training data such that its distribution more closely matches that of the test data. We achieve this goal by matching covariate distributions between training and test sets in a high-dimensional feature space (specifically, a reproducing kernel Hilbert space). This approach does not require distribution estimation. Instead, the sample weights are obtained by a simple quadratic programming procedure. We provide a uniform convergence bound on the distance between the reweighted training feature mean and the test feature mean, a transductive bound on the expected loss of an algorithm trained on the reweighted data, and a connection to single-class SVMs. While our method is designed to deal with the case of simple covariate shift (in the sense of Chapter??), we have also found benefits for sample selection bias on the labels. Our correction procedure yields its greatest and most consistent advantages when the learning algorithm returns a classifier/regressor that is “simpler” than the data might suggest.
Relative Density-Ratio Estimation for Robust Distribution Comparison
Cited by 8 (8 self)
Divergence estimators based on direct approximation of density-ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity test. However, since density-ratio functions often possess high fluctuation, divergence estimation is still a challenging task in practice. In this paper, we propose to use relative divergences for distribution comparison, which involves approximation of relative density-ratios. Since relative density-ratios are always smoother than corresponding ordinary density-ratios, our proposed method is favorable in terms of the nonparametric convergence speed. Furthermore, we show that the proposed divergence estimator has asymptotic variance independent of the model complexity under a parametric setup, implying that the proposed estimator hardly overfits even with complex models. Through experiments, we demonstrate the usefulness of the proposed approach.
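The abstract does not spell out what a relative density-ratio is. Assuming the usual mixture-denominator form, with p and p' as the two densities being compared and α as the mixing parameter (these symbols are chosen here for illustration), it can be written as:

```latex
\[
  r_\alpha(x) \;=\; \frac{p(x)}{\alpha\, p(x) + (1-\alpha)\, p'(x)},
  \qquad 0 \le \alpha < 1,
\]
\[
  r_0(x) = \frac{p(x)}{p'(x)},
  \qquad
  r_\alpha(x) \;\le\; \frac{p(x)}{\alpha\, p(x)} \;=\; \frac{1}{\alpha}
  \quad \text{for } \alpha > 0 .
\]
```

Under this form the ordinary ratio is recovered at α = 0 and can be unbounded where p' is small, while any α > 0 caps the ratio at 1/α. That cap is one way to read the abstract's claim that relative ratios are better behaved.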
Semi-Supervised Local Fisher Discriminant Analysis for Dimensionality Reduction
PAKDD, 2008
Cited by 8 (2 self)
When only a small number of labeled samples are available, supervised dimensionality reduction methods tend to perform poorly due to overfitting. In such cases, unlabeled samples could be useful in improving the performance. In this paper, we propose a semi-supervised dimensionality reduction method which preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The proposed method has an analytic form of the globally optimal solution and it can be computed based on eigendecompositions. Therefore, the proposed method is computationally reliable and efficient. We show the effectiveness of the proposed method through extensive simulations with benchmark data sets.
Adaptive importance sampling with automatic model selection in value function approximation
In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (AAAI2008), 2008
Cited by 7 (6 self)
Off-policy reinforcement learning is aimed at efficiently reusing data samples gathered in the past, which is an essential problem for physically grounded AI as experiments are usually prohibitively expensive. A common approach is to use importance sampling techniques for compensating for the bias caused by the difference between data-sampling policies and the target policy. However, existing off-policy methods often do not take the variance of value function estimators explicitly into account and therefore their performance tends to be unstable. To cope with this problem, we propose using an adaptive importance sampling technique which allows us to actively control the trade-off between bias and variance. We further provide a method for optimally determining the trade-off parameter based on a variant of cross-validation. We demonstrate the usefulness of the proposed approach through simulations.
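One common way to trade bias against variance in importance sampling is to flatten the weights with an exponent between 0 and 1. The sketch below assumes that form and, for simplicity, estimates a plain expectation rather than a value function; the exponent `nu`, the self-normalized estimator, and the synthetic Gaussian data are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def flattened_importance_estimate(values, weights, nu):
    """Self-normalized importance-weighted mean with weights raised to nu.
    nu = 0 ignores the weights (low variance, biased under a mismatch);
    nu = 1 uses the full weights (bias corrected, higher variance)."""
    flattened = np.asarray(weights, dtype=float) ** nu
    return float(np.sum(flattened * values) / np.sum(flattened))

# Samples come from a "behavior" distribution N(0, 1), but the mean is wanted
# under a "target" distribution N(1, 1), whose true mean is 1.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=5000)
weights = np.exp(x - 0.5)  # density ratio N(1, 1) / N(0, 1) evaluated at x
values = x

estimates = {nu: flattened_importance_estimate(values, weights, nu)
             for nu in (0.0, 0.5, 1.0)}
```

Intermediate values of `nu` sit between the two extremes. The abstract describes choosing this kind of trade-off parameter with a variant of cross-validation, which is not shown here.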
Knowledge Transfer on Hybrid Graph
Cited by 7 (0 self)
In machine learning problems, labeled data are often in short supply. One feasible solution to this problem is transfer learning, which can make use of labeled data from other domains to discriminate the unlabeled data in the target domain. In this paper, we propose a transfer learning framework based on similarity matrix approximation to tackle such problems. Two practical algorithms are proposed: label propagation and similarity propagation. In these methods, we build a hybrid graph based on all available data. Then the information is transferred across domains by alternately constructing the similarity matrix for different parts of the graph. Among all related methods, the similarity propagation approach can make maximum use of all available similarity information across domains. This leads to more efficient transfer and better learning results. Experiments on real-world text mining applications demonstrate the promise and effectiveness of our algorithms.