Results 1–10 of 31
Covariate shift adaptation by importance weighted cross validation
, 2000
Abstract
Cited by 71 (39 self)
A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when the outside of the training region is extrapolated. The situation where the training input points and test input points follow different distributions, while the conditional distribution of output values given input points is unchanged, is called the covariate shift. Under the covariate shift, standard model selection techniques such as cross validation do not work as desired since their unbiasedness is no longer maintained. In this paper, we propose a new method called importance weighted cross validation (IWCV), for which we prove unbiasedness even under the covariate shift. The IWCV procedure is the only one that can be applied for unbiased classification under covariate shift, whereas alternatives to IWCV exist for regression. The usefulness of our proposed method is illustrated by simulations, and furthermore demonstrated in the brain-computer interface, where strong non-stationarity effects can be seen between training and test sessions. © 2000 Masashi Sugiyama, Matthias Krauledat, and Klaus-Robert Müller.
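The weighting scheme the abstract describes can be sketched in a few lines: each held-out loss is multiplied by the density ratio w(x) = p_test(x)/p_train(x) before averaging. This is an illustrative sketch only, not the authors' implementation; the `fit`, `predict`, and `loss` callables are hypothetical placeholders, and the density ratio is assumed to be known or pre-estimated (in practice it must itself be estimated from data).

```python
import numpy as np

def iwcv_error(X, y, weights, fit, predict, loss, k=5):
    """k-fold IWCV: each held-out loss is weighted by the density ratio
    w(x) = p_test(x) / p_train(x), supplied per point in `weights`."""
    n = len(y)
    idx = np.arange(n)
    fold_errors = []
    for held in np.array_split(idx, k):
        train = np.setdiff1d(idx, held)
        model = fit(X[train], y[train])
        errs = loss(y[held], predict(model, X[held]))
        # importance weighting restores unbiasedness under covariate shift
        fold_errors.append(np.mean(weights[held] * errs))
    return np.mean(fold_errors)
```

With uniform weights the procedure reduces to ordinary k-fold cross validation, which is consistent with the abstract's claim that the weighting is what repairs unbiasedness under shift.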
Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms
, 2003
Abstract
Cited by 48 (2 self)
Many clustering and segmentation algorithms suffer from the limitation that the number of clusters/segments must be specified by a human user. It is often impractical to expect a human with sufficient domain knowledge to be available to select the number of clusters/segments to return. In this paper, we investigate techniques to determine the number of clusters or segments to return from hierarchical clustering and segmentation algorithms. We propose an efficient algorithm, the L method, that finds the "knee" in a '# of clusters vs. clustering evaluation metric' graph. Using the knee is a well-known, but not particularly well-understood, method for determining the number of clusters. We explore the feasibility of this method and attempt to determine in which situations it will and will not work. We also compare the L method to existing methods based on the accuracy of the determined number of clusters and on efficiency. Our results show favorable performance on these criteria compared to the existing methods that were evaluated.
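The knee search can be illustrated with a short sketch: fit one straight line to the left part and one to the right part of the "number of clusters vs. evaluation metric" curve, and return the split that minimizes the size-weighted total fitting error. This is a simplified reading of the abstract, not the paper's exact algorithm.

```python
import numpy as np

def _line_rmse(x, y):
    """RMSE of the least-squares line through (x, y)."""
    coeffs = np.polyfit(x, y, 1)
    resid = y - np.polyval(coeffs, x)
    return np.sqrt(np.mean(resid ** 2))

def l_method_knee(xs, ys):
    """Return the x-value at the knee of the curve (xs, ys)."""
    n = len(xs)
    best_x, best_err = None, np.inf
    for c in range(2, n - 1):          # candidate split positions
        err_left = _line_rmse(xs[:c], ys[:c])
        err_right = _line_rmse(xs[c:], ys[c:])
        # weight each side's error by the fraction of points it covers
        total = (c / n) * err_left + ((n - c) / n) * err_right
        if total < best_err:
            best_x, best_err = xs[c - 1], total
    return best_x
```

On a curve that is piecewise linear with a sharp bend, the minimizing split sits at the bend, which is the behavior the L method exploits.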
Estimating the Number of Segments in Time Series Data Using Permutation Tests
 IEEE International Conference on Data Mining
, 2002
Abstract
Cited by 15 (1 self)
Segmentation is a popular technique for discovering structure in time series data. We address the largely open problem of estimating the number of segments that can be reliably discovered. We introduce a novel method for the problem, called Pete. Pete is based on permutation testing.
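The idea behind permutation testing for this problem can be sketched as follows: accept an additional segment only if the error reduction it buys is larger than the reduction obtained on permuted (structure-free) copies of the series. This is a hypothetical illustration in the spirit of the abstract, not the Pete algorithm itself; the equal-width segmenter below is a crude stand-in for an optimal one.

```python
import numpy as np

def segmentation_error(series, k):
    """Total squared error of an equal-width k-segmentation
    (illustrative stand-in for an optimal segmenter)."""
    return sum(np.sum((seg - seg.mean()) ** 2)
               for seg in np.array_split(series, k))

def significant_gain(series, k, n_perm=200, alpha=0.05, seed=0):
    """Is the error reduction from k to k+1 segments larger than what
    permuted copies of the series achieve by chance?"""
    rng = np.random.default_rng(seed)
    observed = (segmentation_error(series, k)
                - segmentation_error(series, k + 1))
    null_gains = [segmentation_error(p, k) - segmentation_error(p, k + 1)
                  for p in (rng.permutation(series) for _ in range(n_perm))]
    p_value = np.mean([g >= observed for g in null_gains])
    return p_value < alpha
```

Increasing k until the gain stops being significant yields an estimate of the number of reliably discoverable segments, which is the question the paper addresses.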
The Subspace Information Criterion for Infinite Dimensional Hypothesis Spaces
 Journal of Machine Learning Research
, 2002
Abstract
Cited by 12 (12 self)
A central problem in learning is the selection of an appropriate model. This is typically done by estimating the unknown generalization errors of a set of candidate models and then choosing the model with the minimal generalization error estimate. In this article, we discuss the problem of model selection and generalization error estimation in the context of kernel regression models, e.g., kernel ridge regression, kernel subset regression, or Gaussian process regression.
Optimal design of regularization term and regularization parameter by subspace information criterion
 Neural Networks
, 2000
Abstract
Cited by 12 (6 self)
The problem of designing the regularization term and regularization parameter for linear regression models is discussed. Previously, we derived an approximation to the generalization error called the subspace information criterion (SIC), which is an unbiased estimator of the generalization error with finite samples under certain conditions. In this paper, we apply SIC to regularization learning and use it for (a) choosing the optimal regularization term and regularization parameter from given candidates, and (b) obtaining the closed form of the optimal regularization parameter for a fixed regularization term. The effectiveness of SIC is demonstrated through computer simulations with artificial and real data.

Keywords: supervised learning, generalization error, linear regression, regularization learning, ridge regression, model selection, regularization parameter, subspace information criterion

Nomenclature:
- f(x): learning target function
- D: domain of f(x)
- x_m: m-th sample point
- y_m: m-th sample value
- ε_m: m-th noise
- (x_m, y_m): m-th training example
- M: number of training examples
- y: M-dimensional vector consisting of {y_m}_{m=1}^M
- ε: M-dimensional vector consisting of {ε_m}_{m=1}^M
- φ_p(x): p-th basis function
- θ_p: p-th coefficient
- µ: number of basis functions
- J_G: generalization error
- J_TE: training error
- J_R: regularized training error
- T: regularization matrix
- α: regularization parameter
- A: design matrix
- X_{T,α}: regularization learning matrix
- U: µ-dimensional matrix
- θ: true parameter
- θ̂_{T,α}: regularization estimate
- θ̂_u: unbiased estimate
- σ²: noise variance
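As a rough illustration of use (a), selecting the ridge parameter α by minimizing an estimated generalization error: the sketch below uses a Mallows-Cp-style unbiased risk estimate as a stand-in for the paper's SIC formula (which is not reproduced here), with A the design matrix and σ² the noise variance as in the nomenclature.

```python
import numpy as np

def select_ridge_alpha(A, y, sigma2, alphas):
    """Pick alpha minimizing ||y - A theta||^2 + 2 sigma^2 df(alpha),
    a Cp-style unbiased risk estimate (stand-in for SIC)."""
    best_alpha, best_score = None, np.inf
    for a in alphas:
        # ridge estimate: theta = (A^T A + a I)^{-1} A^T y
        G = A.T @ A + a * np.eye(A.shape[1])
        theta = np.linalg.solve(G, A.T @ y)
        resid = y - A @ theta
        # effective degrees of freedom: trace of the hat matrix
        df = np.trace(A @ np.linalg.solve(G, A.T))
        score = resid @ resid + 2.0 * sigma2 * df
        if score < best_score:
            best_alpha, best_score = a, score
    return best_alpha
```

The complexity term 2σ²·df plays the same role as SIC's bias correction: it penalizes under-regularized fits whose training error is optimistically small.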
Subspace Information Criterion for Non-Quadratic Regularizers: Model Selection for Sparse Regressors
 IEEE Transactions on Neural Networks
, 2002
Abstract
Cited by 9 (7 self)
Non-quadratic regularizers, in particular the ℓ1-norm regularizer, can yield sparse solutions that generalize well. In this work we propose the Generalized Subspace Information Criterion (GSIC), which makes it possible to predict the generalization error for this useful family of regularizers. We show that under some technical assumptions GSIC is an asymptotically unbiased estimator of the generalization error. GSIC is demonstrated to perform well in experiments with the ℓ1-norm regularizer, where we compare it with the Network Information Criterion and cross-validation in relatively large-sample cases. However, in the small-sample case, GSIC tends to fail to capture the optimal model due to its large variance. Therefore, a biased version of GSIC is also introduced, which achieves reliable model selection in the relevant and challenging scenario of high-dimensional data and few samples.
Trading Variance Reduction with Unbiasedness: The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression
 Neural Computation
, 2004
Abstract
Cited by 8 (8 self)
A well-known result by Stein (1956) shows that in particular situations, biased estimators can yield better parameter estimates than their generally preferred unbiased counterparts. This paper follows the same spirit: we stabilize unbiased generalization error estimates by regularization and thereby obtain more robust model selection criteria for learning. We trade a small bias against a larger variance reduction, which has the beneficial effect of being more precise on a single training set. We focus on the subspace information criterion (SIC), which is an unbiased estimator of the expected generalization error measured by the reproducing kernel Hilbert space norm. SIC can be applied to kernel regression, and earlier experiments showed that a small regularization of SIC has a stabilizing effect. However, ...
Learning states and rules for detecting anomalies in time series
 Applied Intelligence
Abstract
Cited by 7 (1 self)
The normal operation of a device can be characterized in different temporal states. To identify these states, we introduce a segmentation algorithm called Gecko that can determine a reasonable number of segments using our proposed L method. We then use the RIPPER classification algorithm to describe these states in logical rules. Finally, transitional logic between the states is added to create a finite state automaton. Our empirical results, on data obtained from the NASA shuttle program, indicate that the Gecko segmentation algorithm is comparable to a human expert in identifying states, and our L method performs better than the existing permutation tests method when determining the number of segments to return in segmentation algorithms. Empirical results have also shown that our overall system can track normal behavior and detect anomalies.
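The final stage described above, a finite state automaton over learned states that flags observations matching no valid state, can be sketched as a toy monitor. The state rules and transitions below are hypothetical placeholders, not output of Gecko or RIPPER.

```python
def make_monitor(rules, transitions, start):
    """rules: state -> predicate over observations.
    transitions: state -> list of allowed successor states.
    Returns a step function labeling each observation normal/anomaly."""
    state = [start]  # mutable cell so the closure can update it
    def step(obs):
        if rules[state[0]](obs):          # still in the current state
            return "normal"
        for nxt in transitions[state[0]]: # try allowed transitions
            if rules[nxt](obs):
                state[0] = nxt
                return "normal"
        return "anomaly"                  # matches no reachable state
    return step
```

Tracking normal behavior then amounts to walking the automaton; an anomaly is any observation for which no rule of the current state or its successors fires.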
Subspace Information Criterion for Image Restoration: Optimizing Parameters in Linear Filters
 IEICE Transactions on Information and Systems
, 2001
Abstract
Cited by 3 (2 self)
Most of the image restoration filters proposed so far include parameters that control the restoration properties. To bring out the optimal restoration performance, these parameters should be determined so as to minimize a certain error measure, such as the mean squared error (MSE) between the restored image and the original image. However, this is not generally possible since the unknown original image itself is required for evaluating the MSE. In this paper, we derive an estimator of the MSE called the subspace information criterion (SIC), and propose determining the parameter values so that SIC is minimized. For any linear filter, SIC gives an unbiased estimate of the expected MSE over the noise. Therefore, the proposed method is valid for any linear filter. Computer simulations with the moving-average filter demonstrate that SIC gives a very accurate estimate of the MSE in various situations, and that the proposed procedure actually gives the optimal parameter values that minimize the MSE.

Keywords: image restoration, mean squared error, subspace information criterion, moving-average filter, model selection.
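The key property, an unbiased MSE estimate for any linear filter without access to the clean image, can be illustrated with a SURE-style estimate (used here as a hedged stand-in for the paper's SIC derivation): for a linear filter y = F z applied to noisy data z = x + ε with known noise variance σ², the identity E‖Fz − x‖² = E[‖z − Fz‖²] − Mσ² + 2σ² tr(F) lets one estimate the MSE from z alone.

```python
import numpy as np

def unbiased_mse_estimate(F, z, sigma2):
    """Estimate E||F z - x||^2 / M from noisy data z alone,
    valid for any linear filter F with i.i.d. noise variance sigma2."""
    M = len(z)
    y_hat = F @ z
    return (np.sum((z - y_hat) ** 2) / M   # residual toward the data
            - sigma2                        # remove the noise floor
            + 2.0 * sigma2 * np.trace(F) / M)  # filter complexity term

def moving_average_matrix(M, w):
    """Circular moving-average filter of window width 2w+1 as a matrix."""
    F = np.zeros((M, M))
    for i in range(M):
        for j in range(-w, w + 1):
            F[i, (i + j) % M] = 1.0 / (2 * w + 1)
    return F
```

Evaluating the estimate over a grid of window widths w and taking the minimizer mirrors the parameter selection procedure the abstract describes, without ever seeing the original image.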