Results 1–10 of 59
Covariate shift adaptation by importance weighted cross validation
, 2007
Abstract

Cited by 122 (55 self)
A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when outside of the training region is extrapolated. The situation where the training input points and test input points follow different distributions, while the conditional distribution of output values given input points is unchanged, is called the covariate shift. Under covariate shift, standard model selection techniques such as cross validation do not work as desired since their unbiasedness is no longer maintained. In this paper, we propose a new method called importance weighted cross validation (IWCV), and prove its unbiasedness even under covariate shift. The IWCV procedure is the only one that can be applied for unbiased classification under covariate shift, whereas alternatives to IWCV exist for regression. The usefulness of the proposed method is illustrated by simulations, and furthermore demonstrated in the brain-computer interface, where strong non-stationarity effects can be seen between training and test sessions. © 2007 Masashi Sugiyama, Matthias Krauledat, and Klaus-Robert Müller.
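The weighting idea behind IWCV can be sketched in a few lines: run ordinary k-fold cross-validation, but multiply each held-out loss by the density ratio p_test(x)/p_train(x). A minimal sketch, assuming the ratio is supplied by a user-provided `importance` callable (estimating the ratio is a separate problem; all function names here are illustrative, not from the paper):

```python
import numpy as np

def iwcv_error(X, y, fit, loss, importance, k=5, rng=None):
    """Importance-weighted k-fold cross-validation (IWCV) sketch.

    `importance(x)` is assumed to return the density ratio
    p_test(x) / p_train(x) for the held-out inputs.
    """
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        val = folds[i]
        trn = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[trn], y[trn])
        w = importance(X[val])
        # Weight each held-out loss by the density ratio so the
        # estimate remains unbiased under covariate shift.
        errs.append(np.mean(w * loss(model, X[val], y[val])))
    return float(np.mean(errs))
```

With `importance` returning all ones, this reduces to ordinary k-fold cross-validation.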
Input-Dependent Estimation of Generalization Error under Covariate Shift
 Statistics & Decisions, vol. 23, no. 4, pp. 249–279
, 2005
Abstract

Cited by 61 (32 self)
A common assumption in supervised learning is that the training and test input points follow the same probability distribution. However, this assumption is not fulfilled, e.g., in interpolation, extrapolation, active learning, or classification with imbalanced data. The violation of this assumption, known as covariate shift, causes a heavy bias in standard generalization error estimation schemes such as cross-validation or Akaike's information criterion, and thus they result in poor model selection. In this paper, we propose an alternative estimator of the generalization error for the squared loss function when training and test distributions are different. The proposed generalization error estimator is shown to be exactly unbiased for finite samples if the learning target function is realizable, and asymptotically unbiased in general. We also show that, in addition to unbiasedness, the proposed estimator can accurately estimate the difference in generalization error among different models, which is a desirable property in model selection. Numerical studies show that the proposed method compares favorably with existing model selection methods in regression for extrapolation and in classification with imbalanced data.
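The general reweighting idea for squared loss under covariate shift (though not the paper's exact unbiased estimator, which is more involved) can be illustrated by weighting training residuals with the density ratio; names below are illustrative:

```python
import numpy as np

def iw_squared_risk(model, X_train, y_train, ratio):
    """Importance-weighted empirical squared risk.

    An illustration of the general weighting idea under covariate
    shift, not the paper's input-dependent estimator. `ratio(x)` is
    assumed to return the density ratio p_test(x) / p_train(x).
    """
    w = ratio(X_train)
    residuals = y_train - model(X_train)
    # Each squared residual is scaled by the density ratio so the
    # average reflects the test input distribution.
    return float(np.mean(w * residuals ** 2))
```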
Active Learning in Approximately Linear Regression Based On Conditional . . .
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Abstract

Cited by 59 (26 self)
The goal of active learning is to determine the locations of training input points so that the generalization error is minimized. We discuss the problem of active learning in linear regression scenarios. Traditional active
Active Learning with Local Models
 Neural Processing Letters
, 1998
Abstract

Cited by 25 (0 self)
In this contribution, we deal with active learning, which gives the learner the power to select training samples. We propose a novel query algorithm for local learning models, a class of learners that has not been considered in the context of active learning until now. Our query algorithm is based on the idea of selecting a query on the borderline of the actual classification. This is done by drawing on the geometrical properties of local models, which typically induce a Voronoi tessellation on the input space, so that the Voronoi vertices of this tessellation offer themselves as prospective query points. The performance of the new query algorithm is tested on the two-spirals problem with promising results. Keywords: active learning, local models, query-based learning, vector quantization
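The geometric query rule described above can be sketched with off-the-shelf computational geometry: compute the Voronoi vertices of the labeled points and prefer a vertex whose nearest labeled neighbors disagree. A rough sketch using `scipy.spatial`; the borderline test and tie-breaking here are simplifications of the paper's algorithm:

```python
import numpy as np
from scipy.spatial import Voronoi, cKDTree

def voronoi_query(X, y, n_neighbors=2):
    """Pick the next query point from the Voronoi vertices of the
    labeled set, preferring a vertex on the class borderline
    (i.e., one whose nearest labeled neighbors disagree)."""
    vor = Voronoi(X)
    tree = cKDTree(X)
    fallback = None
    for v in vor.vertices:
        _, nn = tree.query(v, k=n_neighbors)
        labels = y[np.atleast_1d(nn)]
        if labels.min() != labels.max():
            # Neighbors disagree: this vertex sits on the borderline.
            return v
        fallback = v
    return fallback
```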
Incremental active learning for optimal generalization
 Neural Computation
, 1999
Abstract

Cited by 17 (13 self)
The problem of designing input signals for optimal generalization is called active learning. In this paper, we give a two-stage sampling scheme for reducing both the bias and variance, and based on this scheme, we propose two active learning methods. One is the multi-point-search method, applicable to arbitrary models; its effectiveness is shown through computer simulations. The other is the optimal sampling method in trigonometric polynomial models, which precisely specifies the optimal sampling locations.
Kernel Affine Projection Algorithms
, 2007
Abstract

Cited by 15 (4 self)
The combination of the famed kernel trick and affine projection algorithms (APA) yields powerful nonlinear extensions, collectively named KAPA here. This paper is a follow-up study of the recently introduced kernel least-mean-square algorithm (KLMS). KAPA inherits the simplicity and online nature of KLMS while reducing its gradient noise, boosting performance. More interestingly, it provides a unifying model for several neural network techniques, including kernel least-mean-square algorithms, kernel adaline, sliding-window kernel recursive-least-squares (KRLS), and regularization networks. Therefore, many insights can be gained into the basic relations among them and the trade-off between computational complexity and performance. Several simulations illustrate its wide applicability. Keywords: affine projection algorithms, kernel methods.
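Since KAPA is presented as an extension of KLMS, a minimal KLMS sketch helps fix ideas: each new sample becomes a kernel center, and the LMS update assigns it a coefficient proportional to the instantaneous error. A sketch with a Gaussian kernel (parameter names are illustrative):

```python
import numpy as np

def klms(X, d, eta=0.5, sigma=1.0):
    """Kernel least-mean-square (KLMS) sketch with a Gaussian kernel.

    Grows one center per sample (no sparsification); returns the
    centers, their coefficients, and the instantaneous errors."""
    centers, alphas, errors = [], [], []
    for x, target in zip(X, d):
        if centers:
            C = np.array(centers)
            k = np.exp(-np.sum((C - x) ** 2, axis=1) / (2 * sigma ** 2))
            y = float(np.dot(alphas, k))
        else:
            y = 0.0
        e = target - y            # instantaneous prediction error
        centers.append(x)
        alphas.append(eta * e)    # LMS update in the kernel feature space
        errors.append(e)
    return np.array(centers), np.array(alphas), np.array(errors)
```

On a stream of repeated inputs the prediction error shrinks geometrically, which is the online-learning behavior KAPA inherits.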
The Subspace Information Criterion for Infinite Dimensional Hypothesis Spaces
 Journal of Machine Learning Research
, 2002
Abstract

Cited by 15 (14 self)
A central problem in learning is the selection of an appropriate model. This is typically done by estimating the unknown generalization error of each model in a candidate set and then choosing the model with the minimal generalization error estimate. In this article, we discuss the problem of model selection and generalization error estimation in the context of kernel regression models, e.g., kernel ridge regression, kernel subset regression, or Gaussian process regression.
Deterministic Design for Neural Network Learning: An Approach Based on Discrepancy
Abstract

Cited by 12 (5 self)
The general problem of reconstructing an unknown function from a finite collection of samples is considered for the case in which the position of each input vector in the training set is not fixed beforehand but is part of the learning process. In particular, the consistency of the Empirical Risk Minimization (ERM) principle is analyzed when the points in the input space are generated by employing a purely deterministic algorithm (deterministic learning). When the
Incremental Learning using Sensitivity Analysis
, 1999
Abstract

Cited by 12 (7 self)
A new incremental learning algorithm for function approximation problems is presented in which the neural network learner dynamically selects, during training, the most informative patterns from a candidate training set. The incremental learning algorithm uses its current knowledge about the function to be approximated, in the form of output sensitivity information, to incrementally grow the training set with the patterns that have the highest influence on the learning objective.
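The selection criterion described, ranking candidate patterns by output sensitivity, can be sketched generically: given any callable returning the model's output gradient with respect to the input, pick the candidates with the largest gradient norm. Names below are illustrative, not the paper's API:

```python
import numpy as np

def select_most_informative(model_grad, candidates, n_select):
    """Sensitivity-based selection sketch.

    Ranks candidate patterns by the norm of the model's output
    sensitivity (gradient w.r.t. the input) and returns the top-n
    patterns together with their sensitivity scores."""
    sens = np.array([np.linalg.norm(model_grad(x)) for x in candidates])
    order = np.argsort(sens)[::-1]      # most sensitive first
    return candidates[order[:n_select]], sens[order[:n_select]]
```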