Results 1–10 of 44
KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2005
Abstract

Cited by 125 (6 self)
This paper examines the theory of kernel Fisher discriminant analysis (KFD) in a Hilbert space and develops a two-phase KFD framework, i.e., kernel principal component analysis (KPCA) plus Fisher linear discriminant analysis (LDA). This framework provides novel insights into the nature of KFD. Based on this framework, the authors propose a complete kernel Fisher discriminant analysis (CKFD) algorithm. CKFD can be used to carry out discriminant analysis in “double discriminant subspaces.” The fact that it can make full use of two kinds of discriminant information, regular and irregular, makes CKFD a more powerful discriminator. The proposed algorithm was tested and evaluated using the FERET face database and the CENPARMI handwritten numeral database. The experimental results show that CKFD outperforms other KFD algorithms.
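The two-phase framework can be sketched in plain NumPy: KPCA first projects the data into a kernel principal subspace, then an ordinary Fisher discriminant is computed on those features. This is a minimal illustration of the KPCA-plus-LDA idea only, not the authors' full CKFD algorithm (which additionally exploits the irregular discriminant information); the RBF kernel, its width, and the toy data are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # ||x - y||^2 expanded as x.x - 2 x.y + y.y
    d2 = (X**2).sum(1)[:, None] - 2 * X @ Y.T + (Y**2).sum(1)[None, :]
    return np.exp(-gamma * d2)

def kpca_features(K, n_components):
    # Phase 1: kernel PCA -- centre K in feature space, then project
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    w, V = np.linalg.eigh(Kc)
    idx = np.argsort(w)[::-1][:n_components]
    return Kc @ V[:, idx] / np.sqrt(np.maximum(w[idx], 1e-12))

def fisher_direction(Z, y):
    # Phase 2: two-class Fisher LDA in the KPCA subspace
    Z0, Z1 = Z[y == 0], Z[y == 1]
    Sw = (Z0 - Z0.mean(0)).T @ (Z0 - Z0.mean(0)) \
       + (Z1 - Z1.mean(0)).T @ (Z1 - Z1.mean(0))
    # Small ridge keeps Sw invertible (the "regular" information case)
    return np.linalg.solve(Sw + 1e-6 * np.eye(Z.shape[1]), Z1.mean(0) - Z0.mean(0))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.repeat([0, 1], 30)
Z = kpca_features(rbf_kernel(X, X), n_components=10)
scores = Z @ fisher_direction(Z, y)
acc = ((scores > scores.mean()).astype(int) == y).mean()
```

Classification then reduces to thresholding the one-dimensional projected scores.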
On overfitting in model selection and subsequent selection bias in performance evaluation
 Journal of Machine Learning Research
, 2010
Abstract

Cited by 16 (1 self)
Model selection strategies for machine learning algorithms typically involve the numerical optimisation of an appropriate model selection criterion, often based on an estimator of generalisation performance, such as k-fold cross-validation. The error of such an estimator can be broken down into bias and variance components. While unbiasedness is often cited as a beneficial quality of a model selection criterion, we demonstrate that a low variance is at least as important, as a non-negligible variance introduces the potential for overfitting in model selection as well as in training the model. While this observation is in hindsight perhaps rather obvious, the degradation in performance due to overfitting the model selection criterion can be surprisingly large, an observation that appears to have received little attention in the machine learning literature to date. In this paper, we show that the effects of this form of overfitting are often of comparable magnitude to differences in performance between learning algorithms, and thus cannot be ignored in empirical evaluation. Furthermore, we show that some common performance evaluation practices are susceptible to a form of selection bias as a result of this form of overfitting and hence are unreliable. We discuss methods to avoid overfitting in model selection and subsequent selection bias in performance evaluation, which we hope will be incorporated into best practice. While this study concentrates on cross-validation-based model selection, the findings are quite general and apply to any model selection practice involving the optimisation of a model selection criterion evaluated over a finite sample of data, including maximisation of the Bayesian evidence and optimisation of performance bounds.
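The remedy usually recommended for this selection bias is nested cross-validation: tune the hyperparameter only on inner folds of the training split, and report error only on outer test folds that never influenced that choice. The sketch below illustrates the protocol with plain NumPy; ridge regression, the fold counts, and the synthetic data are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge regression weights
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def kfold(n, k, rng):
    # Shuffle indices and split into k roughly equal folds
    return np.array_split(rng.permutation(n), k)

def cv_mse(X, y, lam, folds):
    n, errs = len(y), []
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((X[test] @ w - y[test]) ** 2))
    return np.mean(errs)

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 10))
y = X[:, 0] + 0.5 * rng.normal(size=80)
lambdas = [0.01, 0.1, 1.0, 10.0]

# Outer loop: performance estimation. Inner loop: model selection, run
# only on the outer-training data, so the outer test fold never
# influences the chosen lambda.
outer_errs = []
for test in kfold(80, 5, rng):
    train = np.setdiff1d(np.arange(80), test)
    inner = kfold(len(train), 4, rng)
    best = min(lambdas, key=lambda l: cv_mse(X[train], y[train], l, inner))
    w = ridge_fit(X[train], y[train], best)
    outer_errs.append(np.mean((X[test] @ w - y[test]) ** 2))
nested_mse = np.mean(outer_errs)
```

Reporting the inner-loop criterion itself as the performance estimate is exactly the practice the paper warns against.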
Extended Kernel Recursive Least Squares Algorithm
Abstract

Cited by 14 (2 self)
This paper presents a kernelized version of the extended recursive least squares (EX-KRLS) algorithm which implements for the first time a general linear state model in reproducing kernel Hilbert spaces (RKHS), or equivalently a general nonlinear state model in the input space. The centerpiece of this development is a reformulation of the well-known extended recursive least squares (EX-RLS) algorithm in RKHS which only requires inner product operations between input vectors, thus enabling the application of the kernel property (commonly known as the kernel trick). The first part of the paper presents a set of theorems that show the generality of the approach. The EX-KRLS is preferable to: (1) a standard kernel recursive least squares (KRLS) in applications that require tracking the state vector of general linear state-space models in the kernel space, or (2) an EX-RLS when the application requires nonlinear observation and state models. The second part of the paper evaluates the EX-KRLS on nonlinear Rayleigh multipath channel tracking and on a Lorenz system modeling problem. We show that the proposed algorithm is able to outperform the standard EX-RLS and KRLS in both simulations.
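As background, the standard KRLS recursion that EX-KRLS extends can be written in a few lines: the inverse of the growing regularised kernel matrix is updated by block inversion as each sample arrives, so only kernel evaluations (inner products) are ever needed. This is a sketch of plain KRLS without a state model or sparsification, not EX-KRLS itself; the RBF kernel and its parameters are assumptions.

```python
import numpy as np

def rbf(a, b, gamma=2.0):
    # Gaussian (RBF) kernel between two samples
    return np.exp(-gamma * np.sum((a - b) ** 2))

def krls_fit(xs, ys, gamma=2.0, lam=0.01):
    # Q tracks (K + lam*I)^{-1} as samples arrive; each new sample
    # enlarges it via the block-matrix inversion identity.
    Q = np.array([[1.0 / (rbf(xs[0], xs[0], gamma) + lam)]])
    for n in range(1, len(xs)):
        kv = np.array([rbf(xs[n], xs[i], gamma) for i in range(n)])
        z = Q @ kv
        r = rbf(xs[n], xs[n], gamma) + lam - kv @ z   # Schur complement
        Q = np.block([[Q + np.outer(z, z) / r, (-z / r)[:, None]],
                      [(-z / r)[None, :], np.array([[1.0 / r]])]])
    return Q @ ys   # expansion coefficients alpha = (K + lam*I)^{-1} y

xs = np.linspace(0, 2 * np.pi, 40)
ys = np.sin(xs)
alpha = krls_fit(xs, ys)
pred = sum(a * rbf(np.pi / 2, x) for a, x in zip(alpha, xs))
```

The learned function is evaluated anywhere as a kernel expansion over the seen inputs, which is what makes the recursion purely inner-product based.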
Time Series Prediction using DirRec Strategy
Abstract

Cited by 11 (1 self)
This paper demonstrates how the selection of prediction strategy is important in the long-term prediction of time series. Two strategies, Recursive and Direct, are already used for this purpose. This paper presents a third one, DirRec, which combines the advantages of the two existing strategies. A simple k-NN approximation method is used, and all three strategies are applied to two benchmarks: the Santa Fe and Poland Electricity Load time series.
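The three strategies differ only in what each model is trained to predict and what it receives as input: Recursive reuses one model and feeds its outputs back; Direct trains one model per horizon step; DirRec trains one model per step and also feeds earlier predictions in as extra inputs. A minimal NumPy sketch of DirRec with a hand-rolled k-NN regressor (the lag order, horizon, and toy sine series are assumptions, not the paper's benchmark setup):

```python
import numpy as np

def knn_predict(Xtr, ytr, x, k=3):
    # k-NN regression: average the targets of the k nearest training windows
    d = np.sum((Xtr - x) ** 2, axis=1)
    return ytr[np.argsort(d)[:k]].mean()

def dirrec_forecast(series, order=4, horizon=3, k=3):
    # DirRec: like Direct, one model per step ahead; like Recursive,
    # each step's model also receives the previously predicted values.
    n = len(series)
    inputs = list(series[-order:])
    preds = []
    for h in range(1, horizon + 1):
        width = order + h - 1          # input window grows by one each step
        X = np.array([series[t:t + width] for t in range(n - width)])
        y = np.array([series[t + width] for t in range(n - width)])
        p = knn_predict(X, y, np.array(inputs), k)
        preds.append(p)
        inputs.append(p)               # feed the new prediction back in
    return np.array(preds)

series = np.sin(0.1 * np.arange(203))
preds = dirrec_forecast(series[:200])
truth = series[200:]
```

Note that the step-h training windows contain true past values, while at forecast time the extra inputs are the model's own earlier predictions.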
Additive Regularization: Fusion of Training and Validation Levels in Kernel Methods
 INTERNAL REPORT 03-184, ESAT-SCD-SISTA, K.U.LEUVEN
, 2003
Abstract

Cited by 9 (7 self)
In this paper the training of Least Squares Support Vector Machines (LS-SVMs) for classification and regression and the determination of their regularization constants is reformulated in terms of additive regularization. In contrast with the classical Tikhonov scheme, a major advantage of this additive regularization mechanism is that it enables computational fusion of the training and validation levels, leading to the solution of a single set of linear equations that characterizes training and validation at once. The problem of avoiding overfitting on validation data is approached by explicitly restricting the degrees of freedom of the regularization constants. Different restriction schemes are investigated, including an ensemble model approach. The link between the Tikhonov scheme and additive regularization is explained, and an efficient cross-validation method with additive regularization is proposed. The new methods are illustrated with several examples on synthetic and real-life data sets.
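For reference, the classical Tikhonov-regularised LS-SVM that this paper takes as its starting point already reduces training to one linear system in the bias b and support values α. A NumPy sketch of that baseline (the RBF kernel and toy data are assumptions; the additive-regularisation reformulation itself is not reproduced here):

```python
import numpy as np

def rbf_gram(A, B, gamma=1.0):
    # Pairwise RBF kernel matrix between two sample sets
    d2 = (A**2).sum(1)[:, None] - 2 * A @ B.T + (B**2).sum(1)[None, :]
    return np.exp(-gamma * d2)

def lssvm_train(X, y, reg=10.0, gamma=1.0):
    # LS-SVM classifier dual: solve
    #   [[0, y^T], [y, Omega + I/reg]] [b; alpha] = [0; 1]
    # where Omega_ij = y_i y_j K(x_i, x_j).
    n = len(y)
    Omega = np.outer(y, y) * rbf_gram(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / reg
    sol = np.linalg.solve(A, np.concatenate([[0.0], np.ones(n)]))
    return sol[0], sol[1:]          # bias b, support values alpha

def lssvm_predict(X, y, b, alpha, Xnew, gamma=1.0):
    # Decision function: sign(sum_i alpha_i y_i K(x, x_i) + b)
    return np.sign(rbf_gram(Xnew, X, gamma) @ (alpha * y) + b)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (30, 2)), rng.normal(1, 1, (30, 2))])
y = np.repeat([-1.0, 1.0], 30)
b, alpha = lssvm_train(X, y)
acc = (lssvm_predict(X, y, b, alpha, X) == y).mean()
```

The additive scheme replaces the single constant 1/reg by per-sample regularization terms, which is what lets the validation equations be appended to this same linear system.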
Maximum Relative Margin and Data-Dependent Regularization
 JOURNAL OF MACHINE LEARNING RESEARCH
Abstract

Cited by 8 (1 self)
Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum margin solutions may be misled by the spread of data and preferentially separate classes along large spread directions. This article corrects these weaknesses by measuring margin not in the absolute sense but rather only relative to the spread of data in any projection direction. Maximum relative margin corresponds to a data-dependent regularization on the classification function, while maximum absolute margin corresponds to an ℓ2 norm constraint on the classification function. Interestingly, the proposed improvements only require simple extensions to existing maximum margin formulations and preserve the computational efficiency of SVMs. Through the maximization of relative margin, surprising performance gains are achieved on real-world problems such as digit, image histogram, and text classification. In addition, risk bounds are derived for the new formulation based on Rademacher averages.
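The sensitivity to large-spread directions can be seen numerically: on data whose classes are separated both along a tight axis and along a high-variance axis, maximising the worst-case absolute margin picks the high-variance direction, while normalising that margin by the projection spread picks the tight one. This toy 2-D grid search is purely illustrative; the data and the particular spread normalisation used here are assumptions, not the paper's exact RMM formulation.

```python
import numpy as np

rng = np.random.default_rng(3)
# Classes separated tightly along x, and widely but noisily along y
X0 = np.column_stack([rng.normal(-1, 0.05, 50), rng.normal(-30, 6, 50)])
X1 = np.column_stack([rng.normal(+1, 0.05, 50), rng.normal(+30, 6, 50)])
X = np.vstack([X0, X1])
y = np.repeat([-1.0, 1.0], 50)

def min_margin(w):
    # Worst-case absolute margin of the projection direction w
    return (y * (X @ w)).min()

def min_rel_margin(w):
    # Same margin, but relative to the spread of the projections
    p = X @ w
    return (y * p).min() / p.std()

angles = np.linspace(0, np.pi, 361)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
best_abs = dirs[np.argmax([min_margin(w) for w in dirs])]
best_rel = dirs[np.argmax([min_rel_margin(w) for w in dirs])]
# best_abs is dominated by the high-spread y axis; best_rel by the tight x axis
```

Rescaling the y feature would flip the absolute-margin choice, while the relative criterion is insensitive to such axis scaling, which is the affine-transformation point the abstract makes.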
Optimally regularised kernel Fisher discriminant analysis
 in Proc. 17th Int. Conf. Pattern Recognit
, 2004
Abstract

Cited by 7 (6 self)
Mika et al. [3] introduce a nonlinear formulation of Fisher’s linear discriminant, based on the now familiar “kernel trick”, demonstrating state-of-the-art performance on a wide range of real-world benchmark datasets. In this paper, we show that the usual regularisation parameter can be adjusted so as to minimise the leave-one-out cross-validation error with a computational complexity of only O(ℓ²) operations, where ℓ is the number of training patterns, rather than the O(ℓ⁴) operations required for a naïve implementation of the leave-one-out procedure. This procedure is then used to form a component of an efficient hierarchical model selection strategy where the regularisation parameter is optimised within the inner loop while the kernel parameters are optimised in the outer loop. Here SB = (m1 − m2)(m1 − m2)^T is the between-class scatter matrix, mj = (1/ℓj) Σ_{i=1}^{ℓj} x_i^j is the mean of patterns belonging to Cj, and SW = Σ_{i∈{1,2}} Σ_{j=1}^{ℓi} (x_j^i − mi)(x_j^i − mi)^T is the within-class scatter matrix. The innovation introduced by Mika et al. [3] is to construct Fisher’s linear discriminant in a fixed feature space F (φ: X → F) induced by a positive definite Mercer kernel K: X × X → R defining the inner product K(x, x′) = φ(x) · φ(x′) (see e.g. Cristianini and Shawe-Taylor [2]). Let the kernel matrices for the entire dataset, K, and for each class, K1 and K2, be defined as follows: K = [k_ij = K(x_i, x_j)], i, j = 1, …, ℓ
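The key to the O(ℓ²) complexity is that, for a kernel model trained by solving a regularised linear system, all ℓ leave-one-out residuals follow from a single fit via the hat matrix: e_loo,i = e_i / (1 − H_ii). The sketch below demonstrates this identity for kernel ridge regression, a close relative of least-squares KFD; it is the generic hat-matrix shortcut rather than the authors' exact derivation, and the RBF kernel and toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
lam = 0.5
d2 = (X**2).sum(1)[:, None] - 2 * X @ X.T + (X**2).sum(1)[None, :]
K = np.exp(-0.5 * d2)

# One fit: the hat matrix H maps targets to fitted values, f = H y
H = K @ np.linalg.inv(K + lam * np.eye(40))
resid = y - H @ y
loo_fast = resid / (1.0 - np.diag(H))   # closed-form LOO residuals

# Brute force for comparison: refit 40 times, each with one point held out
loo_slow = np.empty(40)
for i in range(40):
    m = np.arange(40) != i
    alpha = np.linalg.solve(K[np.ix_(m, m)] + lam * np.eye(39), y[m])
    loo_slow[i] = y[i] - K[i, m] @ alpha
```

The two residual vectors agree exactly, so the leave-one-out criterion can be re-evaluated for many regularisation values at negligible extra cost once an eigendecomposition of K is available.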
A novel approach to phylogenetic tree construction using stochastic optimization and clustering
 BioMed Central
, 2006
Abstract

Cited by 6 (3 self)
A novel approach to phylogenetic tree construction using stochastic optimization and clustering
Generalised Kernel Machines
Abstract

Cited by 5 (2 self)
The generalised linear model (GLM) is the standard approach in classical statistics for regression tasks where it is appropriate to measure the data misfit using a likelihood drawn from the exponential family of distributions. In this paper, we apply the kernel trick to give a nonlinear variant of the GLM, the generalised kernel machine (GKM), in which a regularised GLM is constructed in a fixed feature space implicitly defined by a Mercer kernel. The MATLAB symbolic maths toolbox is used to automatically create a suite of generalised kernel machines, including methods for automated model selection based on approximate leave-one-out cross-validation. In doing so, we provide a common framework encompassing a wide range of existing and novel kernel learning methods, and highlight their connections with earlier techniques from classical statistics. Examples including kernel ridge regression,
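One concrete GKM instance is kernel logistic regression: a GLM with the logit link, fitted by the usual IRLS (Newton) iterations, but with f = Kα in place of the linear predictor Xβ. A NumPy sketch of this instance (the RBF kernel, the ridge penalty, and the toy data are assumptions; the paper's symbolic-toolbox machinery and approximate leave-one-out selection are not reproduced):

```python
import numpy as np

def fit_klr(K, t, lam=0.1, iters=25):
    # Kernel logistic regression by IRLS: penalised Newton steps on alpha,
    # with f = K @ alpha playing the role of the GLM linear predictor.
    n = len(t)
    alpha = np.zeros(n)
    for _ in range(iters):
        f = K @ alpha
        p = 1.0 / (1.0 + np.exp(-f))
        W = np.maximum(p * (1.0 - p), 1e-10)   # IRLS weights
        z = f + (t - p) / W                    # working response
        alpha = np.linalg.solve(W[:, None] * K + lam * np.eye(n), W * z)
    return alpha

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-1, 1, (25, 2)), rng.normal(1, 1, (25, 2))])
t = np.repeat([0.0, 1.0], 25)
d2 = (X**2).sum(1)[:, None] - 2 * X @ X.T + (X**2).sum(1)[None, :]
K = np.exp(-0.5 * d2)
alpha = fit_klr(K, t)
p = 1.0 / (1.0 + np.exp(-(K @ alpha)))
acc = ((p > 0.5).astype(float) == t).mean()
```

Swapping the logit link and Bernoulli likelihood for another exponential-family pair (e.g. log link with Poisson) changes only the weight and working-response lines, which is the sense in which the GKM is a common framework.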