Results 1 – 6 of 6
The concentration of fractional distances
IEEE Transactions on Knowledge and Data Engineering, 2007
Cited by 51 (1 self)

Abstract
Nearest neighbor search and many other numerical data analysis tools most often rely on the Euclidean distance. When data are high-dimensional, however, Euclidean distances seem to concentrate: all distances between pairs of data elements seem to be very similar. Therefore, the relevance of the Euclidean distance has been questioned in the past, and fractional norms (Minkowski-like norms with an exponent less than one) were introduced to fight the concentration phenomenon. This paper justifies the use of alternative distances to fight concentration by showing that the concentration is indeed an intrinsic property of the distances and not an artifact of a finite sample. Furthermore, an estimation of the concentration as a function of the exponent of the distance and of the distribution of the data is given. It leads to the conclusion that, contrary to what is generally admitted, fractional norms are not always less concentrated than the Euclidean norm; a counterexample is given to prove this claim. Theoretical arguments are presented which show that the concentration phenomenon can appear for real data that do not match the hypotheses of the theorems, in particular the assumption of independent and identically distributed variables. Finally, some insights about how to choose an optimal metric are given.

Index Terms—Nearest neighbor search, high-dimensional data, distance concentration, fractional distances.
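The concentration effect this abstract describes is easy to reproduce numerically. The sketch below (toy uniform data, not from the paper) measures the relative contrast (Dmax − Dmin)/Dmin of Minkowski distances from the origin as the dimensionality grows; the contrast collapses toward zero in high dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n=200, p=2.0):
    """Relative contrast (Dmax - Dmin) / Dmin of Minkowski p-distances
    from the origin to n points drawn uniformly from [0, 1]^dim."""
    X = rng.uniform(size=(n, dim))
    d = (np.abs(X) ** p).sum(axis=1) ** (1.0 / p)
    return (d.max() - d.min()) / d.min()

# Contrast shrinks as dimensionality grows: distances "concentrate",
# for the Euclidean norm (p = 2) and a fractional norm (p = 0.5) alike.
for p in (0.5, 2.0):
    for dim in (2, 20, 200):
        print(p, dim, round(relative_contrast(dim, p=p), 3))
```

The exponent p and the sample size are arbitrary illustration choices; the paper's point is precisely that how concentrated a given p-norm is depends on the data distribution, not only on p.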
Regularized Discriminant Analysis, Ridge Regression and Beyond
Cited by 7 (1 self)

Abstract
Fisher linear discriminant analysis (FDA) and its kernel extension—kernel discriminant analysis (KDA)—are well-known methods that consider dimensionality reduction and classification jointly. While widely deployed in practical problems, there are still unresolved issues surrounding their efficient implementation and their relationship with least-mean-squares procedures. In this paper, we address these issues within the framework of regularized estimation. Our approach leads to a flexible and efficient implementation of FDA as well as KDA. We also uncover a general relationship between regularized discriminant analysis and ridge regression. This relationship yields variations on conventional FDA based on the pseudoinverse and a direct equivalence to an ordinary least-squares estimator.
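The ridge-regression connection mentioned in the abstract can be checked numerically in the two-class case. The sketch below (toy Gaussian data; the label coding and λ are illustration choices, not the paper's) compares the regularized FDA direction (S_w + λI)⁻¹(m₁ − m₀) with the ridge solution on class labels coded n/n₁ and −n/n₀; by the Sherman–Morrison identity the two directions coincide up to a positive scalar:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two Gaussian classes in 5 dimensions (toy data).
n0, n1, dim, lam = 40, 60, 5, 0.1
X0 = rng.normal(0.0, 1.0, (n0, dim))
X1 = rng.normal(1.0, 1.0, (n1, dim))
X = np.vstack([X0, X1])
n = n0 + n1

# Regularized FDA direction: (S_w + lam I)^{-1} (m1 - m0).
m0, m1 = X0.mean(0), X1.mean(0)
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
w_fda = np.linalg.solve(Sw + lam * np.eye(dim), m1 - m0)

# Ridge regression on centered data with targets -n/n0 and n/n1.
Xc = X - X.mean(0)
y = np.concatenate([np.full(n0, -n / n0), np.full(n1, n / n1)])
w_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(dim), Xc.T @ y)

# Cosine similarity of the two directions (should be essentially 1).
cos = w_fda @ w_ridge / (np.linalg.norm(w_fda) * np.linalg.norm(w_ridge))
print(round(cos, 6))
```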
Implementation of Algorithms for Tuning Parameters in Regularized Least Squares Problems in System Identification
, 2013
Efficient Cross-Validation for Kernelized Least-Squares Regression with Sparse Basis Expansions
Abstract
We propose an efficient algorithm for calculating hold-out and cross-validation (CV) estimates for sparse regularized least-squares predictors. Holding out H data points with our method requires O(min(H²n, Hn²)) time, provided that a predictor with n basis vectors is already trained. In addition to holding out training examples, some of the basis vectors used to train the sparse regularized least-squares predictor with the whole training set can also be removed from the basis vector set used in the hold-out computation. In our experiments, we demonstrate the speed improvements provided by our algorithm in practice, and we empirically show the benefits of removing some of the basis vectors during the CV rounds.
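The flavor of such shortcuts can be illustrated with the classic exact leave-one-out identity for regularized least squares, eᵢ/(1 − Hᵢᵢ): a simpler relative of the hold-out formulas in the paper, sketched here on toy data and checked against explicit retraining for one point:

```python
import numpy as np

rng = np.random.default_rng(2)
n, dim, lam = 30, 4, 1.0
X = rng.normal(size=(n, dim))
y = X @ rng.normal(size=dim) + 0.1 * rng.normal(size=n)

# Hat (smoother) matrix of ridge regression: H = X (X'X + lam I)^{-1} X'.
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T)
resid = y - H @ y

# Shortcut: all n leave-one-out residuals without any retraining.
loo_fast = resid / (1.0 - np.diag(H))

# Check one point against explicitly retraining without it.
i = 0
mask = np.arange(n) != i
w = np.linalg.solve(X[mask].T @ X[mask] + lam * np.eye(dim),
                    X[mask].T @ y[mask])
loo_slow = y[i] - X[i] @ w
print(np.isclose(loo_fast[i], loo_slow))
```

The identity is exact for ridge regression (a linear smoother); the paper's contribution is the analogous, more general bookkeeping for holding out blocks of points and basis vectors at once.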
Morozov, Ivanov and Tikhonov regularization based LS-SVMs
Abstract
This paper contrasts three related regularization schemes for kernel machines using a least-squares criterion, namely Tikhonov and Ivanov regularization and Morozov's discrepancy principle. We derive the conditions for optimality in a least-squares support vector machine (LS-SVM) context, where the schemes differ in the role of the regularization parameter. In particular, the Ivanov and Morozov schemes express the trade-off between data fitting and smoothness via the trust region of the parameters and the noise level, respectively, both of which can be transformed uniquely into an appropriate regularization constant for a standard LS-SVM. This insight is employed to automatically tune the regularization constant in an LS-SVM framework based on the estimated noise level, which can be obtained, e.g., by a nonparametric differogram technique.
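Morozov's discrepancy principle can be sketched for ordinary ridge regression (a simplification of the LS-SVM setting in the paper): choose λ so that the training residual norm matches an assumed known noise level. The safety factor τ = 1.5 and the toy data are illustration choices. Since the residual norm is monotone nondecreasing in λ, a simple bisection on a log scale suffices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, dim, sigma = 100, 5, 0.5
X = rng.normal(size=(n, dim))
w_true = 3.0 * rng.normal(size=dim)
y = X @ w_true + sigma * rng.normal(size=n)

def resid_norm(lam):
    """Training residual norm of ridge regression with parameter lam."""
    w = np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T @ y)
    return np.linalg.norm(y - X @ w)

# Discrepancy principle: match the residual to the (assumed known)
# noise level, with a common safety factor tau = 1.5.
target = 1.5 * sigma * np.sqrt(n)

lo, hi = 1e-8, 1e8          # residual norm is monotone in lam
for _ in range(200):
    mid = np.sqrt(lo * hi)  # bisect on a log scale
    if resid_norm(mid) < target:
        lo = mid
    else:
        hi = mid
lam_star = np.sqrt(lo * hi)
print(round(lam_star, 4))
```

In practice the noise level σ is unknown and must itself be estimated, which is exactly where the paper plugs in a nonparametric estimator such as the differogram.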
Residual Variance Estimation in Machine Learning
Abstract
The problem of residual variance estimation consists of estimating the best possible generalization error obtainable by any model based on a finite sample of data. Even though it is a natural generalization of linear correlation, residual variance estimation in its general form has attracted relatively little attention in machine learning. In this paper, we examine four different residual variance estimators and analyze their properties both theoretically and experimentally to better understand their applicability in machine learning problems. The theoretical treatment differs from previous work by being based on a general formulation of the problem that also covers heteroscedastic noise, in contrast to previous work, which concentrates on homoscedastic and additive noise. In the second part of the paper, we demonstrate practical applications in input and model structure selection. The experimental results show that using residual variance estimators in these tasks gives good results, often with reduced computational complexity, and that the nearest neighbor estimators are simple and easy to implement.

Key words: noise variance estimation, residual variance, model structure selection, input selection, nonparametric estimator, nearest neighbour
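A minimal example of a nearest-neighbor residual variance estimator of the kind the abstract discusses: on 1-D toy data y = f(x) + ε with smooth f, the first-nearest-neighbor differences satisfy E[(yᵢ − y_NN(i))²] ≈ 2σ², so half their mean estimates the noise variance without fitting any model. The specific data and estimator form are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 2000, 0.3
x = np.sort(rng.uniform(size=n))
y = np.sin(2 * np.pi * x) + sigma * rng.normal(size=n)

# Nearest neighbour of each point along the sorted 1-D axis:
# for interior points, the closer of the left and right neighbours.
left = np.abs(x[1:-1] - x[:-2])
right = np.abs(x[2:] - x[1:-1])
nn = np.arange(1, n - 1) + np.where(right < left, 1, -1)
nn = np.concatenate([[1], nn, [n - 2]])

# First-nearest-neighbour estimate of the noise variance sigma^2.
est = 0.5 * np.mean((y - y[nn]) ** 2)
print(round(est, 3))
```

For smooth f and dense samples the bias term from f(xᵢ) − f(x_NN(i)) is negligible, so the estimate approaches the true σ² = 0.09; handling heteroscedastic noise, as the paper does, requires localizing this averaging.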