Results 1 – 10 of 21
KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition
 IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 54 (4 self)
Abstract—This paper examines the theory of kernel Fisher discriminant analysis (KFD) in a Hilbert space and develops a two-phase KFD framework, i.e., kernel principal component analysis (KPCA) plus Fisher linear discriminant analysis (LDA). This framework provides novel insights into the nature of KFD. Based on this framework, the authors propose a complete kernel Fisher discriminant analysis (CKFD) algorithm. CKFD can be used to carry out discriminant analysis in “double discriminant subspaces.” Because it can make full use of two kinds of discriminant information, regular and irregular, CKFD is a more powerful discriminator. The proposed algorithm was tested and evaluated using the FERET face database and the CENPARMI handwritten numeral database. The experimental results show that CKFD outperforms other KFD algorithms. Index Terms—Kernel-based methods, subspace methods, principal component analysis (PCA), Fisher linear discriminant analysis (LDA or FLD), feature extraction, machine learning, face recognition, handwritten digit recognition.
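The two-phase structure described above can be sketched with off-the-shelf components. This is a minimal illustration of the KPCA-then-LDA idea using scikit-learn, not the authors' CKFD implementation; the RBF kernel, its gamma, and the number of components are arbitrary choices here:

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Toy two-class data: one cluster near (0,0), one near (2,2)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

# Phase 1: kernel PCA maps the data into a kernel-induced subspace.
kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.5)
Z = kpca.fit_transform(X)

# Phase 2: ordinary Fisher LDA applied in that subspace.
lda = LinearDiscriminantAnalysis()
lda.fit(Z, y)
print("training accuracy:", lda.score(Z, y))
```

The point of the split is that the kernel step and the discriminant step can be reasoned about (and regularised) separately.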
Additive Regularization: Fusion of Training and Validation Levels in Kernel Methods
 INTERNAL REPORT 03-184, ESAT-SCD-SISTA, K.U. LEUVEN
, 2003
Cited by 9 (7 self)
In this paper, the training of Least Squares Support Vector Machines (LS-SVMs) for classification and regression and the determination of their regularization constants are reformulated in terms of additive regularization. In contrast with the classical Tikhonov scheme, a major advantage of this additive regularization mechanism is that it enables computational fusion of the training and validation levels, leading to a single set of linear equations that characterizes training and validation at once. The problem of avoiding overfitting on validation data is approached by explicitly restricting the degrees of freedom of the regularization constants. Different restriction schemes are investigated, including an ensemble model approach. The link between the Tikhonov scheme and additive regularization is explained, and an efficient cross-validation method with additive regularization is proposed. The new methods are illustrated with several examples on synthetic and real-life data sets.
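For reference, the classical Tikhonov-style LS-SVM training that the paper reformulates amounts to solving a single linear system. A minimal sketch of that baseline (function-estimation form on ±1 labels, with an assumed RBF kernel); the additive-regularisation variant replaces the single constant gamma by per-sample terms and is not reproduced here:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian RBF kernel matrix between the rows of A and B
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def lssvm_fit(X, y, gam=10.0, gamma_k=1.0):
    # Solve the LS-SVM dual system
    #   [ 0   1^T       ] [b]   [0]
    #   [ 1   K + I/gam ] [a] = [y]
    n = len(y)
    K = rbf_kernel(X, X, gamma_k)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gam
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]          # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, gamma_k=1.0):
    return np.sign(rbf_kernel(X_new, X_train, gamma_k) @ alpha + b)

# Toy check on two separable clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
b, alpha = lssvm_fit(X, y)
acc = (lssvm_predict(X, b, alpha, X) == y).mean()
print("train acc:", acc)
```

The "fusion" claim of the paper is that validation conditions can be appended to this same system rather than handled in an outer search loop.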
Optimally regularised kernel Fisher discriminant analysis
 in Proc. 17th Int. Conf. Pattern Recognition
, 2004
Cited by 6 (5 self)
Mika et al. [3] introduce a nonlinear formulation of Fisher’s linear discriminant, based on the now familiar “kernel trick”, demonstrating state-of-the-art performance on a wide range of real-world benchmark datasets. In this paper, we show that the usual regularisation parameter can be adjusted so as to minimise the leave-one-out cross-validation error with a computational complexity of only O(ℓ^2) operations, where ℓ is the number of training patterns, rather than the O(ℓ^4) operations required for a naïve implementation of the leave-one-out procedure. This procedure is then used to form a component of an efficient hierarchical model selection strategy in which the regularisation parameter is optimised within the inner loop while the kernel parameters are optimised in the outer loop. Here S_B = (m_1 − m_2)(m_1 − m_2)^T is the between-class scatter matrix, m_j = (1/ℓ_j) Σ_{i=1}^{ℓ_j} x_i^j is the mean of the patterns belonging to C_j, and the within-class scatter matrix is S_W = Σ_{i∈{1,2}} Σ_{j=1}^{ℓ_i} (x_j^i − m_i)(x_j^i − m_i)^T. The innovation introduced by Mika et al. [3] is to construct Fisher’s linear discriminant in a fixed feature space F (φ: X → F) induced by a positive definite Mercer kernel K: X × X → R defining the inner product K(x, x′) = φ(x) · φ(x′) (see e.g. Cristianini and Shawe-Taylor [2]). Let the kernel matrices for the entire dataset, K, and for each class, K_1 and K_2, be defined as follows: K = [k_{ij} = K(x_i, x_j)]_{i,j=1}^ℓ …
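The O(ℓ^2) leave-one-out evaluation rests on a closed-form identity for regularised kernel models. A sketch of that algebra for the closely related kernel ridge regression case (the LOO residual is r_i = α_i / C_ii with C = (K + λI)^{-1}), verified against brute-force retraining; this illustrates the identity only, not the paper's code:

```python
import numpy as np

def rbf(A, B, g=1.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-g * d)

def loo_residuals_closed_form(K, y, lam):
    # One matrix inverse, then every LOO residual in O(1) each:
    # r_i = y_i - f_{-i}(x_i) = alpha_i / C_ii, with C = (K + lam*I)^{-1}
    C = np.linalg.inv(K + lam * np.eye(len(y)))
    return (C @ y) / np.diag(C)

def loo_residuals_brute_force(K, y, lam):
    # Retrain once per held-out point (the naive, expensive route)
    n = len(y)
    r = np.empty(n)
    for i in range(n):
        m = np.arange(n) != i
        a = np.linalg.solve(K[np.ix_(m, m)] + lam * np.eye(n - 1), y[m])
        r[i] = y[i] - K[i, m] @ a
    return r

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=15)
K = rbf(X, X)
print(np.allclose(loo_residuals_closed_form(K, y, 0.1),
                  loo_residuals_brute_force(K, y, 0.1)))
```

Because the inverse is computed once, sweeping the regularisation parameter only requires cheap updates, which is what makes LOO-driven model selection practical.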
Generalised Kernel Machines
Cited by 5 (2 self)
Abstract — The generalised linear model (GLM) is the standard approach in classical statistics for regression tasks where it is appropriate to measure the data misfit using a likelihood drawn from the exponential family of distributions. In this paper, we apply the kernel trick to give a nonlinear variant of the GLM, the generalised kernel machine (GKM), in which a regularised GLM is constructed in a fixed feature space implicitly defined by a Mercer kernel. The MATLAB symbolic maths toolbox is used to automatically create a suite of generalised kernel machines, including methods for automated model selection based on approximate leave-one-out cross-validation. In doing so, we provide a common framework encompassing a wide range of existing and novel kernel learning methods, and highlight their connections with earlier techniques from classical statistics. Examples including kernel ridge regression, …
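The simplest instance of such a generalised kernel machine is kernel logistic regression: a Bernoulli likelihood plus a kernel ridge penalty, fitted by iteratively reweighted least squares. The following is a hedged sketch of that one instance, standing in for the paper's automatically generated MATLAB code:

```python
import numpy as np

def rbf(A, B, g=1.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-g * d)

def klr_fit(K, y, lam=0.1, iters=25):
    # Penalised IRLS for kernel logistic regression; y in {0,1},
    # latent function f = K @ alpha, penalty (lam/2) * alpha' K alpha.
    alpha = np.zeros(len(y))
    for _ in range(iters):
        f = np.clip(K @ alpha, -30, 30)      # clip for numerical safety
        p = 1.0 / (1.0 + np.exp(-f))
        w = np.maximum(p * (1.0 - p), 1e-8)  # IRLS weights
        z = f + (y - p) / w                  # working response
        # Newton step in dual form: (K + lam * W^{-1}) alpha = z
        alpha = np.linalg.solve(K + lam * np.diag(1.0 / w), z)
    return alpha

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.5, 0.6, (25, 2)), rng.normal(1.5, 0.6, (25, 2))])
y = np.array([0.0] * 25 + [1.0] * 25)
K = rbf(X, X)
alpha = klr_fit(K, y)
acc = (((K @ alpha) > 0) == (y > 0.5)).mean()
print("train acc:", acc)
```

Swapping the Bernoulli likelihood for another exponential-family member changes only the weight and working-response lines, which is the structural point the GKM framework exploits.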
Maximum Relative Margin and Data-Dependent Regularization
 JOURNAL OF MACHINE LEARNING RESEARCH
Cited by 5 (1 self)
Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum margin solutions may be misled by the spread of the data and preferentially separate classes along large-spread directions. This article corrects these weaknesses by measuring margin not in the absolute sense but relative to the spread of the data in any projection direction. Maximum relative margin corresponds to a data-dependent regularization on the classification function, while maximum absolute margin corresponds to an ℓ2-norm constraint on the classification function. Interestingly, the proposed improvements require only simple extensions to existing maximum margin formulations and preserve the computational efficiency of SVMs. Through the maximization of relative margin, surprising performance gains are achieved on real-world problems such as digit, image histogram, and text classification. In addition, risk bounds are derived for the new formulation based on Rademacher averages.
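The distinction can be made concrete on a toy example where the direction with the larger absolute margin has the smaller margin once measured relative to the projection spread. This is an illustrative computation of the two quantities, not the paper's optimisation problem:

```python
import numpy as np

def absolute_margin(w, X, y):
    # smallest functional margin, with w normalised to unit length
    w = w / np.linalg.norm(w)
    return np.min(y * (X @ w))

def relative_margin(w, X, y):
    # the same margin, divided by the spread of the projections on w
    w = w / np.linalg.norm(w)
    proj = X @ w
    return np.min(y * proj) / np.std(proj)

# Large spread along e1, class separation along e2
X = np.array([[5.0, 1.0], [2.0, 1.0], [-5.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# e1 wins on absolute margin (2 vs 1) but loses once the margin is
# normalised by the data spread in that direction
print(absolute_margin(e1, X, y), absolute_margin(e2, X, y))
print(relative_margin(e1, X, y), relative_margin(e2, X, y))
```

Maximum relative margin prefers e2 here, i.e., it refuses to be drawn toward the large-spread direction that absolute margin rewards.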
Building Sparse Representations and Structure Determination on LS-SVM Substrates
 NEUROCOMPUTING
, 2004
Cited by 4 (2 self)
This paper studies a method to obtain sparseness and structure detection for a class of kernel machines related to Least Squares Support Vector Machines (LS-SVMs). The key to deriving such kernel machines is to adopt a hierarchical modeling strategy. Here, the first level consists of an LS-SVM substrate, which is based upon an LS-SVM formulation with an additive regularization trade-off. This regularization trade-off is tuned at higher levels such that sparse representations and/or structure detection are obtained. The conceptual levels are kept strictly separated by working with exact optimality conditions, while the hyper-parameters guide the interaction between the levels. From a computational point of view, all levels can be fused into a single convex optimization problem. Furthermore, the principle is applied in order to optimize the validation performance of the resulting kernel machine. Sparse representations and structure detection are obtained by using an L1 regularization scheme and a measure of maximal variation, respectively, at a higher level. A number of case studies indicate the usefulness of these approaches, both with respect to the interpretability of the final model and for generalization performance.
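The sparsifying effect of an L1 penalty on kernel-expansion coefficients can be seen directly by using the kernel matrix as a design matrix for a Lasso fit. This is only an analogy to the mechanism above; the paper reaches sparseness through its additive-regularisation levels, not through scikit-learn's Lasso:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(-3, 3, 60))
y = np.sin(x) + 0.1 * rng.normal(size=60)

# Kernel expansion f(x) = sum_j c_j k(x, x_j): the kernel matrix is
# the design matrix and the expansion coefficients get an L1 penalty
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
model = Lasso(alpha=0.01, max_iter=50000).fit(K, y)

n_active = int(np.sum(model.coef_ != 0))
print(f"{n_active} of {len(x)} kernel functions kept")
```

Most coefficients are driven exactly to zero, so only a subset of training points remains in the final expansion.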
Estimating Predictive Variances with Kernel Ridge Regression
 Machine Learning Challenges
, 2006
Cited by 4 (0 self)
Abstract. In many regression tasks, in addition to an accurate estimate of the conditional mean of the target distribution, an indication of the predictive uncertainty is also required. There are two principal sources of this uncertainty: the noise process contaminating the data and the uncertainty in estimating the model parameters based on a limited sample of training data. Both can be summarised in the predictive variance, which can then be used to give confidence intervals. In this paper, we present various schemes for providing predictive variances for kernel ridge regression, especially in the case of heteroscedastic regression, where the variance of the noise process contaminating the data is a smooth function of the explanatory variables. The use of leave-one-out cross-validation is shown to eliminate the bias inherent in estimates of the predictive variance. Results obtained on all three regression tasks comprising the predictive uncertainty challenge demonstrate the value of this approach.
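One standard scheme for the model-uncertainty part is the Gaussian-process reading of kernel ridge regression, in which the ridge parameter plays the role of the noise variance. The following sketches that textbook expression; it is not the paper's heteroscedastic estimator:

```python
import numpy as np

def rbf(A, B, g=1.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-g * d)

def krr_mean_and_variance(X, y, X_new, lam=0.1):
    # GP view of KRR: lam acts as the noise variance sigma^2
    K = rbf(X, X)
    Ks = rbf(X_new, X)
    C = np.linalg.inv(K + lam * np.eye(len(X)))
    mean = Ks @ C @ y
    # prior variance, minus what the training data explain, plus noise
    var = np.diag(rbf(X_new, X_new)) - np.einsum('ij,jk,ik->i', Ks, C, Ks) + lam
    return mean, var

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, (20, 1))
y = np.sin(X[:, 0])
X_new = np.array([[0.0], [10.0]])     # near the data vs. far from it
mean, var = krr_mean_and_variance(X, y, X_new)
print(var)   # variance grows away from the training data
```

The bias the abstract refers to arises because such plug-in variances are computed from the same data used to fit the model; the LOO residuals give an estimate free of that reuse.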
Time Series Prediction using DirRec Strategy
Cited by 3 (1 self)
Abstract. This paper demonstrates how the selection of the prediction strategy matters in the long-term prediction of time series. Two strategies, known as Recursive and Direct, are already in common use for this purpose. This paper presents a third one, DirRec, which combines the advantages of the two. A simple kNN approximation method is used, and all three strategies are applied to two benchmarks: the Santa Fe and Poland Electricity Load time series.
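The strategies differ in what each model is trained on. A sketch of Recursive and DirRec with a plain linear model standing in for the paper's kNN approximator; the DirRec details follow the abstract's description (one fresh model per horizon step, with the earlier predictions appended to the input window):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def predict_recursive(series, order, horizon):
    # One one-step model; feed each prediction back in as an input.
    X = np.array([series[i:i + order] for i in range(len(series) - order)])
    model = LinearRegression().fit(X, series[order:])
    window = list(series[-order:])
    out = []
    for _ in range(horizon):
        p = model.predict([window])[0]
        out.append(p)
        window = window[1:] + [p]
    return out

def predict_dirrec(series, order, horizon):
    # DirRec: a fresh model per step whose input window grows by one,
    # so step h sees the original window plus the h-1 earlier predictions.
    preds = []
    for h in range(1, horizon + 1):
        width = order + h - 1
        X = np.array([series[i:i + width] for i in range(len(series) - width)])
        model = LinearRegression().fit(X, series[width:])
        preds.append(model.predict([list(series[-order:]) + preds])[0])
    return preds

series = np.arange(20.0)            # a ramp both strategies extend exactly
print(predict_recursive(series, 3, 4))
print(predict_dirrec(series, 3, 4))
```

The Direct strategy (one model per horizon, trained on true values only) is the h-step analogue of the first function; DirRec sits between the two by retraining per step while still reusing its own predictions.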
A kernel-induced space selection approach to model selection in KLDA
 IEEE Trans. Neural Networks
Cited by 3 (0 self)
Abstract—Model selection in kernel linear discriminant analysis (KLDA) refers to the selection of appropriate parameters of a kernel function and the regularizer. By following the principle of maximum information preservation, this paper formulates the model selection problem as one of selecting an optimal kernel-induced space in which different classes are maximally separated from each other. A scatter-matrix-based criterion is developed to measure the “goodness” of a kernel-induced space, and the kernel parameters are tuned by maximizing this criterion. The criterion is computationally efficient and is differentiable with respect to the kernel parameters. Compared with leave-one-out (LOO) or k-fold cross-validation (CV), the proposed approach achieves faster model selection, especially when the number of training samples is large or when many kernel parameters need to be tuned. To tune the regularization parameter in KLDA, our criterion is used together with the method proposed by Saadi et al. (2004). Experiments on benchmark data sets verify the effectiveness of this model selection approach. Index Terms—Kernel-induced space selection, kernel linear discriminant analysis (KLDA), kernel parameter tuning, model selection.
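The flavour of such a criterion can be sketched directly from the kernel matrix: the traces of the between- and within-class scatter matrices in the feature space are computable from block sums of K alone. This is a hedged illustration of a trace-ratio criterion of that kind, not the paper's exact formula:

```python
import numpy as np

def rbf(A, B, g=1.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-g * d)

def kernel_separability(K, y):
    # trace(S_b) / trace(S_w) in the kernel-induced feature space,
    # computed from block sums of the kernel matrix alone
    n = len(y)
    tr_total = np.trace(K) / n - K.mean()
    tr_within = 0.0
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        Kc = K[np.ix_(idx, idx)]
        tr_within += (np.trace(Kc) - Kc.sum() / len(idx)) / n
    return (tr_total - tr_within) / tr_within

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
K = rbf(X, X, g=0.5)
j_true = kernel_separability(K, y)              # informative labels
j_shuffled = kernel_separability(K, rng.permutation(y))
print(j_true, j_shuffled)
```

A criterion of this form is a smooth function of the kernel entries, and hence of the kernel parameters, which is what allows the gradient-based tuning the abstract describes.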
Preventing overfitting during model selection using Bayesian regularisation
 JMLR
, 2007
Cited by 2 (1 self)
While the model parameters of a kernel machine are typically given by the solution of a convex optimisation problem with a single global optimum, the selection of good values for the regularisation and kernel parameters is much less straightforward. Fortunately, the leave-one-out cross-validation procedure can be performed, or at least approximated, very efficiently in closed form for a wide variety of kernel learning methods, providing a convenient means for model selection. Leave-one-out cross-validation based estimates of performance, however, generally exhibit a relatively high variance and are therefore prone to over-fitting. In this paper, we investigate the novel use of Bayesian regularisation at the second level of inference, adding a regularisation term to the model selection criterion corresponding to a prior over the hyper-parameter values, where the additional regularisation parameters are integrated out analytically. Results obtained on a suite of thirteen real-world …