Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms
, 2003
Cited by 36 (1 self)
Many clustering and segmentation algorithms suffer from the limitation that the number of clusters/segments must be specified by a human user. It is often impractical to expect a human with sufficient domain knowledge to be available to select the number of clusters/segments to return. In this paper, we investigate techniques to determine the number of clusters or segments to return from hierarchical clustering and segmentation algorithms. We propose an efficient algorithm, the L method, that finds the "knee" in a 'number of clusters vs. clustering evaluation metric' graph. Using the knee is well known, but it is not a particularly well-understood method for determining the number of clusters. We explore the feasibility of this method and attempt to determine in which situations it will and will not work. We also compare the L method with existing methods in terms of efficiency and the accuracy of the number of clusters they determine. Our results show favorable performance on these criteria compared with the existing methods that were evaluated.
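The knee-finding idea can be sketched as a search over split points: fit one straight line to the points left of the split and another to the points right of it, weight each line's RMSE by the number of points it covers, and keep the split with the smallest total. This is a minimal sketch under that assumption; it covers only the basic two-line fit, and the names `l_method_knee` and `_line_fit_rmse` are hypothetical, not taken from the paper.

```python
def _line_fit_rmse(xs, ys):
    """Fit a least-squares line through (xs, ys) and return the RMSE of that fit."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return (sse / n) ** 0.5


def l_method_knee(xs, ys):
    """Return the x value (e.g. number of clusters) at the knee of the curve.

    For each split index c, the left line covers xs[:c] and the right line
    covers xs[c:]; each RMSE is weighted by its share of the points.
    """
    b = len(xs)
    if b < 4:
        raise ValueError("need at least 4 points (two per fitted line)")
    best_c, best_err = None, float("inf")
    for c in range(2, b - 1):  # keep at least two points on each side
        err = (c / b) * _line_fit_rmse(xs[:c], ys[:c]) \
            + ((b - c) / b) * _line_fit_rmse(xs[c:], ys[c:])
        if err < best_err:
            best_c, best_err = c, err
    # The knee is the last x value covered by the left-hand line.
    return xs[best_c - 1]


# Hypothetical evaluation-metric curve: steep drop until 5 clusters, then flat.
print(l_method_knee(list(range(1, 11)), [100, 80, 60, 40, 20, 10, 9, 8, 7, 6]))
```

The weighting by point count keeps a long, nearly flat tail from dominating the choice of split.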
Covariate shift adaptation by importance weighted cross validation
, 2000
Cited by 16 (8 self)
A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when the outside of the training region is extrapolated. The situation where the training and test input points follow different distributions while the conditional distribution of output values given input points is unchanged is called covariate shift. Under covariate shift, standard model selection techniques such as cross validation do not work as desired, since their unbiasedness is no longer maintained. In this paper, we propose a new method called importance weighted cross validation (IWCV) and prove that it remains unbiased even under covariate shift. The IWCV procedure is the only one that can be applied for unbiased classification under covariate shift, whereas alternatives to IWCV exist for regression. The usefulness of the proposed method is illustrated by simulations and further demonstrated in a brain-computer interface application, where strong non-stationarity effects can be seen between training and test sessions. © 2000 Masashi Sugiyama, Matthias Krauledat, and Klaus-Robert Müller.
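The core of importance weighted cross validation is that each held-out loss is multiplied by the importance w(x) = p_test(x) / p_train(x) before averaging. The sketch below assumes these weights are already known (estimating them is a separate problem) and uses simple interleaved folds; `iwcv_score`, `fit_mean`, and `squared_loss` are hypothetical names for illustration.

```python
def fit_mean(xs, ys):
    """Toy learner: always predict the mean training output."""
    m = sum(ys) / len(ys)
    return lambda x: m


def squared_loss(pred, y):
    return (pred - y) ** 2


def iwcv_score(xs, ys, weights, fit, loss, k=5):
    """Importance-weighted k-fold cross-validation estimate of test-domain loss.

    weights[i] is assumed to equal p_test(xs[i]) / p_train(xs[i]).
    With all weights equal to 1 this reduces to ordinary k-fold CV.
    """
    n = len(xs)
    if n < k:
        raise ValueError("need at least k samples")
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds, no shuffling
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        # Weighted average loss on the held-out fold.
        total += sum(weights[i] * loss(model(xs[i]), ys[i]) for i in held_out) / len(held_out)
    return total / k


xs = list(range(10))
ys = [float(x) for x in xs]
print(iwcv_score(xs, ys, [1.0] * 10, fit_mean, squared_loss))
```

Points that are more likely under the test distribution than under the training distribution thus count more toward the validation error.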
Estimating the Number of Segments in Time Series Data Using Permutation Tests
- IEEE International Conference on Data Mining
, 2002
Cited by 13 (1 self)
Segmentation is a popular technique for discovering structure in time series data. We address the largely open problem of estimating the number of segments that can be reliably discovered. We introduce a novel method for the problem, called Pete. Pete is based on permutation testing.
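To illustrate the permutation-testing idea only (this is a generic single-change-point test, not Pete itself), the sketch compares the error reduction from the best two-segment split of a series against the same statistic on shuffled copies, which destroy temporal structure. The statistic, the constant-mean segment model, and the function names are all assumptions.

```python
import random


def best_split_gain(series):
    """Drop in sum of squared errors when one constant-mean segment is
    replaced by the best pair of constant-mean segments."""
    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)

    whole = sse(series)
    best = min(sse(series[:i]) + sse(series[i:]) for i in range(1, len(series)))
    return whole - best


def permutation_p_value(series, n_perm=999, seed=0):
    """Fraction of shuffled series whose split gain is at least the observed one
    (with the usual +1 correction so the p-value is never zero)."""
    rng = random.Random(seed)
    observed = best_split_gain(series)
    shuffled = list(series)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_split_gain(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# A clear level shift should give a small p-value.
print(permutation_p_value([0.0] * 20 + [5.0] * 20, n_perm=199))
```

A small p-value suggests the extra segment reflects real structure rather than noise; repeating such a test as segments are added is one way to stop at a reliable number of segments.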
Subspace Information Criterion for Non-Quadratic Regularizers - Model Selection for Sparse Regressors
- IEEE Transactions on Neural Networks
, 2002
Cited by 6 (5 self)
Non-quadratic regularizers, in particular the ℓ1-norm regularizer, can yield sparse solutions that generalize well. In this work we propose the Generalized Subspace Information Criterion (GSIC), which predicts the generalization error for this useful family of regularizers. We show that under some technical assumptions GSIC is an asymptotically unbiased estimator of the generalization error. In experiments with the ℓ1-norm regularizer, GSIC performs well compared with the Network Information Criterion and cross-validation when the sample is relatively large. In the small-sample case, however, GSIC tends to miss the optimal model because of its large variance. We therefore also introduce a biased version of GSIC, which achieves reliable model selection in the relevant and challenging scenario of high-dimensional data and few samples.
Optimal design of regularization term and regularization parameter by subspace information criterion
- Neural Networks
, 2000
Cited by 5 (4 self)
The problem of designing the regularization term and regularization parameter for linear regression models is discussed. Previously, we derived an approximation to the generalization error called the subspace information criterion (SIC), which is an unbiased estimator of the generalization error with finite samples under certain conditions. In this paper, we apply SIC to regularization learning and use it for (a) choosing the optimal regularization term and regularization parameter from given candidates, and (b) obtaining the closed form of the optimal regularization parameter for a fixed regularization term. The effectiveness of SIC is demonstrated through computer simulations with artificial and real data.
Keywords: supervised learning, generalization error, linear regression, regularization learning, ridge regression, model selection, regularization parameter, subspace information criterion
Nomenclature:
f(x): learning target function
D: domain of f(x)
x_m: m-th sample point
y_m: m-th sample value
ɛ_m: m-th noise
(x_m, y_m): m-th training example
M: the number of training examples
y: M-dimensional vector consisting of {y_m}_{m=1}^M
ɛ: M-dimensional vector consisting of {ɛ_m}_{m=1}^M
ϕ_p(x): p-th basis function
θ_p: p-th coefficient
µ: the number of basis functions
J_G: generalization error
J_TE: training error
J_R: regularized training error
T: regularization matrix
α: regularization parameter
A: design matrix
X_{T,α}: regularization learning matrix
U: µ-dimensional matrix
θ: true parameter
θ̂_{T,α}: regularization estimate
θ̂_u: unbiased estimate
σ²: noise variance
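The reasoning behind an SIC-style criterion can be sketched with a bias-variance argument. Assume y = Aθ + ɛ with zero-mean noise of covariance σ²I, a linear estimate θ̂ = X y, and an unbiased reference estimate θ̂_u = X_u y. Then E‖θ̂ − θ̂_u‖² exceeds the squared bias of θ̂ by exactly σ² tr((X − X_u)(X − X_u)ᵀ), so ‖θ̂ − θ̂_u‖² − σ² tr((X − X_u)(X − X_u)ᵀ) + σ² tr(X Xᵀ) is unbiased for E‖θ̂ − θ‖². The sketch below is a simplification, not the paper's exact criterion: it measures error in the plain Euclidean parameter norm (ignoring the matrix U), takes X_u = pinv(A) (unbiased when A has full column rank), and takes X_{T,α} = (AᵀA + αT)⁻¹Aᵀ as the regularization learning matrix. The function names are hypothetical.

```python
import numpy as np


def ridge_matrix(A, T, alpha):
    """Assumed regularization learning matrix X_{T,alpha} = (A^T A + alpha T)^{-1} A^T."""
    return np.linalg.solve(A.T @ A + alpha * T, A.T)


def sic(A, y, X, sigma2):
    """Unbiased estimate of E||X y - theta||^2 under y = A theta + noise,
    noise covariance sigma2 * I, and full-column-rank A."""
    Xu = np.linalg.pinv(A)   # unbiased reference: E[Xu y] = theta
    D = X - Xu
    diff = D @ y             # theta_hat - theta_hat_u
    bias2 = diff @ diff - sigma2 * np.trace(D @ D.T)   # estimated squared bias
    variance = sigma2 * np.trace(X @ X.T)              # exact variance term
    return float(bias2 + variance)


def select_alpha(A, y, T, sigma2, candidates):
    """Pick the candidate regularization parameter with the smallest SIC value."""
    return min(candidates, key=lambda a: sic(A, y, ridge_matrix(A, T, a), sigma2))


rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
y = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
print(select_alpha(A, y, np.eye(3), 0.01, [0.0, 0.1, 1.0, 10.0]))
```

Note that the noise variance σ² must be known or estimated separately for this criterion to be usable.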
The Subspace Information Criterion for Infinite Dimensional Hypothesis Spaces
- Journal of Machine Learning Research
, 2002
Cited by 4 (4 self)
A central problem in learning is selection of an appropriate model. This is typically done by estimating the unknown generalization errors of a set of models to be selected from and then choosing the model with minimal generalization error estimate. In this article, we discuss the problem of model selection and generalization error estimation in the context of kernel regression models, e.g., kernel ridge regression, kernel subset regression or Gaussian process regression.
Learning states and rules for detecting anomalies in time series
- Applied Intelligence
Cited by 4 (1 self)
The normal operation of a device can be characterized in different temporal states. To identify these states, we introduce a segmentation algorithm called Gecko that can determine a reasonable number of segments using our proposed L method. We then use the RIPPER classification algorithm to describe these states in logical rules. Finally, transitional logic between the states is added to create a finite state automaton. Our empirical results, on data obtained from the NASA shuttle program, indicate that the Gecko segmentation algorithm is comparable to a human expert in identifying states, and that our L method performs better than the existing permutation-test method when determining the number of segments to return in segmentation algorithms. Empirical results have also shown that our overall system can track normal behavior and detect anomalies.
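How state rules plus transitional logic can track behavior and flag anomalies is sketched below. The threshold predicates stand in for RIPPER-learned rules, and the state names, the allowed transitions, and the policy of staying in the current state after a flagged observation are all hypothetical assumptions rather than details from the paper.

```python
from typing import Callable, Dict, List, Sequence

Rule = Callable[[float], bool]


def track(observations: Sequence[float], rules: Dict[str, Rule],
          transitions: Dict[str, List[str]], start: str) -> List[int]:
    """Run a finite state automaton over the observations and return the indices
    of observations accepted neither by the current state nor by any allowed
    successor state (these are flagged as anomalies)."""
    state = start
    anomalies = []
    for i, obs in enumerate(observations):
        if rules[state](obs):
            continue  # still in the current state
        nxt = next((s for s in transitions.get(state, []) if rules[s](obs)), None)
        if nxt is None:
            anomalies.append(i)  # unexplained observation: flag it, keep the state
        else:
            state = nxt  # legal transition to a successor state
    return anomalies


# Hypothetical states of a device, each described by a simple rule.
RULES = {
    "idle": lambda v: v < 1.0,
    "ramp": lambda v: 1.0 <= v < 5.0,
    "run": lambda v: 5.0 <= v < 10.0,
}
TRANSITIONS = {"idle": ["ramp"], "ramp": ["run"], "run": ["idle"]}

print(track([0.2, 7.0, 0.3], RULES, TRANSITIONS, "idle"))  # jumping idle -> run is not allowed
```

Separating per-state rules from the transition table lets an observation be normal in isolation yet still anomalous in context, as in the example above.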
Trading Variance Reduction with Unbiasedness - The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression
- Neural Computation
, 2004
Cited by 1 (1 self)
A well-known result by Stein (1956) shows that in particular situations, biased estimators can yield better parameter estimates than their generally preferred unbiased counterparts. In the same spirit, this paper stabilizes unbiased generalization error estimates by regularization and thereby obtains more robust model selection criteria for learning. We trade a small bias for a larger reduction in variance, which has the beneficial effect of making the estimate more precise on a single training set. We focus on the subspace information criterion (SIC), which is an unbiased estimator of the expected generalization error measured by the reproducing kernel Hilbert space norm. SIC can be applied to kernel regression, and earlier experiments showed that a small regularization of SIC has a stabilizing effect.
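The bias-for-variance trade that Stein's result refers to can be seen in the James-Stein estimator (James and Stein, 1961), the standard explicit construction: for a d-dimensional observation x ~ N(θ, σ²I) with d ≥ 3, shrinking x toward zero by the factor 1 − (d − 2)σ²/‖x‖² gives lower expected squared error than x itself. The simulation below only illustrates that classical effect; it is not the regularized SIC from this paper, and the chosen θ and trial count are arbitrary.

```python
import random


def james_stein(x, sigma2=1.0):
    """Shrink an observation vector toward zero (dominates x itself when len(x) >= 3)."""
    d = len(x)
    norm2 = sum(v * v for v in x)
    factor = 1.0 - (d - 2) * sigma2 / norm2
    return [factor * v for v in x]


def compare_risks(theta, trials=2000, seed=0):
    """Monte Carlo squared-error risk of the unbiased estimate x and of James-Stein."""
    rng = random.Random(seed)
    mse_unbiased = mse_js = 0.0
    for _ in range(trials):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]  # one noisy observation, sigma = 1
        js = james_stein(x)
        mse_unbiased += sum((a - t) ** 2 for a, t in zip(x, theta))
        mse_js += sum((a - t) ** 2 for a, t in zip(js, theta))
    return mse_unbiased / trials, mse_js / trials


print(compare_risks([0.5] * 10))  # the biased shrinkage estimate has the smaller risk
```

The shrunken estimate is biased, yet its reduced variance more than compensates, which is the same trade the regularized SIC makes for generalization error estimates.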
A Unified Method for Optimizing Linear Image Restoration Filters
- Signal Processing
, 2002
Cited by 1 (1 self)
Image restoration from degraded images lies at the foundation of image processing, pattern recognition, and computer vision, so it has been extensively studied. A large number of image restoration filters have been devised so far. It is known that a certain filter works excellently for a certain type of original image or degradation.

