Results 1 - 10 of 12
Additive Regularization: Fusion of Training and Validation Levels in Kernel Methods
- Internal Report 03-184, ESAT-SCD-SISTA, K.U. Leuven, 2003
Cited by 8 (7 self)
In this paper the training of Least Squares Support Vector Machines (LS-SVMs) for classification and regression, and the determination of their regularization constants, are reformulated in terms of additive regularization. In contrast with the classical Tikhonov scheme, a major advantage of this additive regularization mechanism is that it enables computational fusion of the training and validation levels, leading to a single set of linear equations that characterizes training and validation at once. Overfitting on the validation data is addressed by explicitly restricting the degrees of freedom of the regularization constants. Different restriction schemes are investigated, including an ensemble model approach. The link between the Tikhonov scheme and additive regularization is explained, and an efficient cross-validation method with additive regularization is proposed. The new methods are illustrated with several examples on synthetic and real-life data sets.
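For reference, the classical Tikhonov-regularized LS-SVM regression that this report reformulates reduces to one linear system in the bias b and the dual weights alpha. The NumPy sketch below solves that standard system; it is not the additive-regularization scheme itself, and the RBF kernel and the values of `gamma` and `sigma` are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian RBF kernel matrix with entries exp(-||x - z||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Classical (Tikhonov) LS-SVM regression: solve the dual system
        [ 0    1^T            ] [ b     ]   [ 0 ]
        [ 1    K + I / gamma  ] [ alpha ] = [ y ]
    for the bias b and the dual weights alpha (gamma and sigma are assumed values)."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Evaluate f(x) = sum_k alpha_k K(x, x_k) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Usage on a noisy sine curve.
rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 50)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
b, alpha = lssvm_fit(X, y, gamma=10.0)
y_hat = lssvm_predict(X, b, alpha, X)
```

In this dual form the training residuals satisfy y_k - f(x_k) = alpha_k / gamma, so the single constant gamma sets how closely every data point is fitted.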
Generalised Kernel Machines
Cited by 4 (1 self)
Abstract — The generalised linear model (GLM) is the standard approach in classical statistics for regression tasks where it is appropriate to measure the data misfit using a likelihood drawn from the exponential family of distributions. In this paper, we apply the kernel trick to give a non-linear variant of the GLM, the generalised kernel machine (GKM), in which a regularised GLM is constructed in a fixed feature space implicitly defined by a Mercer kernel. The MATLAB symbolic maths toolbox is used to automatically create a suite of generalised kernel machines, including methods for automated model selection based on approximate leave-one-out cross-validation. In doing so, we provide a common framework encompassing a wide range of existing and novel kernel learning methods, and highlight their connections with earlier techniques from classical statistics. Examples including kernel ridge regression,
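As one concrete member of this family, a Bernoulli likelihood with a logit link gives kernel logistic regression. The NumPy sketch below fits it by Newton's method (iteratively reweighted least squares) in the dual variables; it is an illustration, not the paper's MATLAB toolbox, and the RBF kernel, the penalty `lam` and the 0/1 label coding are assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian RBF kernel matrix with entries exp(-||x - z||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def kernel_logistic_fit(K, y, lam=0.1, n_iter=50, tol=1e-10):
    """Minimise -sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)] + (lam / 2) alpha^T K alpha,
    where f = K alpha, p = sigmoid(f) and y_i is in {0, 1}.

    The gradient is K (p - y + lam alpha) and the Hessian is K (W K + lam I) with
    W = diag(p (1 - p)), so each Newton step solves (W K + lam I) delta = p - y + lam alpha."""
    n = K.shape[0]
    alpha = np.zeros(n)
    for _ in range(n_iter):
        p = sigmoid(K @ alpha)
        w = p * (1.0 - p)
        # w[:, None] * K is diag(w) @ K without forming the diagonal matrix.
        delta = np.linalg.solve(w[:, None] * K + lam * np.eye(n), p - y + lam * alpha)
        alpha -= delta
        if np.max(np.abs(delta)) < tol:
            break
    return alpha

# Usage: a 1-D problem where the class is the sign of x.
X = np.linspace(-2.0, 2.0, 40)[:, None]
y = (X[:, 0] > 0).astype(float)
K = rbf_kernel(X, X)
alpha = kernel_logistic_fit(K, y)
p = sigmoid(K @ alpha)
```

Swapping the likelihood and link (for example Poisson with a log link) changes only the gradient and weight terms, which is the kind of variation the GKM framework is described as generating automatically.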
Building Sparse Representations and Structure Determination on LS-SVM Substrates
- Neurocomputing, 2004
Cited by 3 (2 self)
This paper studies a method to obtain sparseness and structure detection for a class of kernel machines related to Least Squares Support Vector Machines (LS-SVMs). The key method to derive such kernel machines is to adopt a hierarchical modeling strategy. Here, the first level consists of an LS-SVM substrate, which is based upon an LS-SVM formulation with an additive regularization trade-off. This regularization trade-off is tuned at higher levels such that sparse representations and/or structure detection are obtained. The conceptual levels are kept strictly separated by working with exact optimality conditions, while the hyper-parameters guide the interaction between the levels. From a computational point of view, all levels can be fused into a single convex optimization problem. Furthermore, the principle is applied in order to optimize the validation performance of the resulting kernel machine. Sparse representations and structure detection are obtained at a higher level by using an L1 regularization scheme and a measure of maximal variation, respectively. A number of case studies indicate the usefulness of these approaches, both for the interpretability of the final model and for generalization performance.
Estimating Predictive Variances with Kernel Ridge Regression
- Machine Learning Challenges, 2006
Cited by 3 (0 self)
Abstract. In many regression tasks, in addition to an accurate estimate of the conditional mean of the target distribution, an indication of the predictive uncertainty is also required. There are two principal sources of this uncertainty: the noise process contaminating the data, and the uncertainty in estimating the model parameters from a limited sample of training data. Both can be summarised in the predictive variance, which can then be used to give confidence intervals. In this paper, we present various schemes for providing predictive variances for kernel ridge regression, especially in the case of heteroscedastic regression, where the variance of the noise process contaminating the data is a smooth function of the explanatory variables. The use of leave-one-out cross-validation is shown to eliminate the bias inherent in estimates of the predictive variance. Results obtained on all three regression tasks comprising the predictive uncertainty challenge demonstrate the value of this approach.
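A minimal sketch in the spirit of this abstract, under stated assumptions rather than as the paper's exact schemes: fit the mean with kernel ridge regression (no bias term), compute leave-one-out residuals in closed form so that each residual is not shrunk by having fitted its own target, then regress the log squared residuals on x with a second kernel ridge model. The RBF kernel, the values of `lam`, `lam_var` and `sigma`, and the Gaussian-noise correction constant are all assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian RBF kernel matrix with entries exp(-||x - z||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def krr_loo(K, y, lam):
    """Kernel ridge regression without bias, alpha = (K + lam I)^{-1} y, plus the exact
    leave-one-out residuals r_i = (y_i - yhat_i) / (1 - H_ii) with H = K (K + lam I)^{-1}."""
    C = np.linalg.inv(K + lam * np.eye(K.shape[0]))
    H = K @ C
    return C @ y, (y - H @ y) / (1.0 - np.diag(H))

# Assumes Gaussian noise: E[log eps^2] = log sigma^2 + digamma(1/2) + log 2, approx. log sigma^2 - 1.2704.
LOG_CHI2_BIAS = 1.2704

def fit_heteroscedastic_krr(X, y, lam=0.1, lam_var=1.0, sigma=1.0):
    K = rbf_kernel(X, X, sigma)
    alpha_mean, loo = krr_loo(K, y, lam)
    # Model the log of the squared LOO residuals so the variance estimate stays positive.
    z = np.log(loo**2 + 1e-12)
    z_mean = z.mean()
    alpha_var, _ = krr_loo(K, z - z_mean, lam_var)

    def predict(X_new):
        Ks = rbf_kernel(X_new, X, sigma)
        mean = Ks @ alpha_mean
        var = np.exp(Ks @ alpha_var + z_mean + LOG_CHI2_BIAS)
        return mean, var
    return predict

# Usage: noise standard deviation 0.05 for x < 0 and 0.5 for x > 0.
rng = np.random.default_rng(1)
X = np.sort(rng.uniform(-3.0, 3.0, 200))[:, None]
noise_sd = np.where(X[:, 0] < 0, 0.05, 0.5)
y = np.sin(X[:, 0]) + noise_sd * rng.standard_normal(200)
predict = fit_heteroscedastic_krr(X, y)
mean, var = predict(np.array([[-2.0], [2.0]]))
```

Modelling log variance rather than variance is one common design choice; it avoids negative variance estimates at the cost of the bias correction shown above.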
Maximum Relative Margin and Data-Dependent Regularization
- Journal of Machine Learning Research
Cited by 2 (1 self)
Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum margin solutions may be misled by the spread of data and preferentially separate classes along directions of large spread. This article corrects these weaknesses by measuring margin not in an absolute sense but relative to the spread of data in any projection direction. Maximum relative margin corresponds to a data-dependent regularization on the classification function, while maximum absolute margin corresponds to an ℓ2 norm constraint on the classification function. Interestingly, the proposed improvements only require simple extensions to existing maximum margin formulations and preserve the computational efficiency of SVMs. Through the maximization of relative margin, surprising performance gains are achieved on real-world problems such as digit, image histogram, and text classification. In addition, risk bounds are derived for the new formulation based on Rademacher averages.
Face recognition based on ordinal correlation approach
- International Conference on Intelligent Sensors, Networks and Information Processing (ISSNIP 2005), 2005
Cited by 1 (0 self)
In this paper, we propose a new face recognition system based on the ordinal correlation principle. First, we explain the ordinal similarity measure for any two images and then propose a systematic approach for face recognition based on this ordinal measure. In addition, we design an algorithm for selecting a suitable classification threshold using information obtained from the training database. Finally, experiments are conducted on the Yale datasets; the results show that the proposed face recognition approach significantly outperforms the Eigenface and 2DPCA approaches, and that the threshold selection algorithm works effectively.
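The abstract does not define its ordinal measure, so the sketch below substitutes a standard rank statistic, the Spearman correlation of pixel intensities, to illustrate the key property of ordinal measures: they depend only on the ordering of intensities and so are unchanged by monotonic brightness or contrast changes. The function names and the 0.8 threshold are hypothetical; in the paper the threshold is selected from the training database.

```python
import numpy as np

def ranks(v):
    """Rank of each element (0 = smallest). Assumes no ties, as with continuous intensities."""
    r = np.empty(v.size)
    r[np.argsort(v, kind="stable")] = np.arange(v.size)
    return r

def ordinal_similarity(img_a, img_b):
    """Spearman rank correlation between the pixels of two equally sized images (a stand-in
    for the paper's ordinal measure). Lies in [-1, 1]; any strictly increasing intensity
    transform of either image leaves it unchanged."""
    ra = ranks(np.ravel(img_a).astype(float))
    rb = ranks(np.ravel(img_b).astype(float))
    ra -= ra.mean()
    rb -= rb.mean()
    return float(ra @ rb / np.sqrt((ra @ ra) * (rb @ rb)))

def is_same_face(img_a, img_b, threshold=0.8):
    """Hypothetical decision rule: accept a match when the ordinal similarity is high enough."""
    return ordinal_similarity(img_a, img_b) >= threshold

# Usage: a random 16x16 "image" compared with monotonic transforms of itself.
rng = np.random.default_rng(0)
img = rng.random((16, 16))
print(ordinal_similarity(img, 2.0 * img + 5.0))  # contrast and brightness change
print(ordinal_similarity(img, np.sqrt(img)))     # nonlinear but monotonic change
```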
Preventing over-fitting during model selection using Bayesian regularisation
- JMLR, 2007
Cited by 1 (1 self)
While the model parameters of a kernel machine are typically given by the solution of a convex optimisation problem with a single global optimum, the selection of good values for the regularisation and kernel parameters is much less straightforward. Fortunately, the leave-one-out cross-validation procedure can be performed, or at least approximated, very efficiently in closed form for a wide variety of kernel learning methods, providing a convenient means for model selection. Leave-one-out cross-validation based estimates of performance, however, generally exhibit a relatively high variance and are therefore prone to over-fitting. In this paper, we investigate the novel use of Bayesian regularisation at the second level of inference, adding a regularisation term to the model selection criterion corresponding to a prior over the hyper-parameter values, where the additional regularisation parameters are integrated out analytically. Results obtained on a suite of thirteen real-world
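For kernel ridge regression without a bias term, the closed-form leave-one-out computation mentioned here is exact: with hat matrix H = K (K + lam I)^{-1}, the i-th LOO residual is (y_i - yhat_i) / (1 - H_ii). The NumPy sketch below uses it to choose `lam` from a grid and checks it against explicit refitting; the RBF kernel, the grid and the synthetic data are assumptions, and the paper's Bayesian regularisation of the selection criterion is not shown.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian RBF kernel matrix with entries exp(-||x - z||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def loo_residuals(K, y, lam):
    """Exact leave-one-out residuals of kernel ridge regression (no bias), from one fit."""
    H = K @ np.linalg.inv(K + lam * np.eye(K.shape[0]))
    return (y - H @ y) / (1.0 - np.diag(H))

def loo_residuals_bruteforce(K, y, lam):
    """Reference implementation: refit n times, each time holding out one point."""
    n = len(y)
    r = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        alpha = np.linalg.solve(K[np.ix_(keep, keep)] + lam * np.eye(n - 1), y[keep])
        r[i] = y[i] - K[i, keep] @ alpha
    return r

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, (60, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.standard_normal(60)
K = rbf_kernel(X, X)

# Model selection: pick the regularisation parameter with the smallest LOO mean squared error.
grid = np.logspace(-4, 2, 25)
loo_mse = [np.mean(loo_residuals(K, y, lam) ** 2) for lam in grid]
best_lam = grid[int(np.argmin(loo_mse))]
```

Minimising this criterion over many candidate hyper-parameter values is where the variance of the LOO estimate can lead to over-fitting, which is the problem the abstract addresses.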
A Cascading Support Vector Machines System for Gene Expression Data Classification
Cited by 1 (1 self)
Abstract. Microarray technology provides the ability to monitor the expression levels of thousands of genes in parallel. Gene expression data classification can be applied to disease diagnosis or prediction. We propose a novel intelligent system for the classification of multiclass gene expression data. It is based on a cascading Support Vector Machine (SVM) scheme and uses Welch’s t-test to detect differentially expressed genes. The system was applied to discriminate between normal and lung cancer subtype specimens. The overall accuracy achieved was 98.5%. The results show that the proposed system can be used efficiently for microarray data analysis.
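Welch's t-test compares two group means without assuming equal variances. The NumPy sketch below computes the statistic and the Welch-Satterthwaite degrees of freedom for every gene at once and ranks genes by |t|; the genes-by-samples layout, the hypothetical `top_genes` helper and the ranking rule are assumptions, not the paper's exact selection procedure.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom along the last axis.
    For expression matrices shaped (genes, samples), this yields one value per gene."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = a.shape[-1], b.shape[-1]
    va = a.var(axis=-1, ddof=1) / na  # squared standard error of each group mean
    vb = b.var(axis=-1, ddof=1) / nb
    t = (a.mean(axis=-1) - b.mean(axis=-1)) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
    return t, df

def top_genes(expr_a, expr_b, k=50):
    """Hypothetical selection step: indices of the k genes with the largest |t|."""
    t, _ = welch_t(expr_a, expr_b)
    return np.argsort(-np.abs(t))[:k]

# Usage: 20 genes x 10 samples per class, with gene 3 shifted in the second class.
rng = np.random.default_rng(0)
normal = rng.standard_normal((20, 10))
tumour = rng.standard_normal((20, 10))
tumour[3] += 5.0
selected = top_genes(normal, tumour, k=1)
```

The selected gene indices would then restrict the features passed to the classifier stage.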
On the Dangers of Cross-Validation. An Experimental Evaluation
Cross validation allows models to be tested using the full training set by means of repeated resampling, thus maximizing the total number of points used for testing and potentially helping to protect against overfitting. Improvements in computational power, recent reductions in the (computational) cost of classification algorithms, and the development of closed-form solutions (for performing cross validation in certain classes of learning algorithms) make it possible to test thousands or millions of variants of learning models on the data. Thus, it is now possible to calculate cross validation performance on a much larger number of tuned models than would have been possible otherwise. However, we empirically show that with such a large number of models the risk of overfitting increases and the performance estimated by cross validation is no longer an effective estimate of generalization; hence, this paper provides an empirical reminder of the dangers of cross validation. We use a closed-form solution that makes this evaluation possible for the cross validation problem of interest. In addition, through extensive experiments we expose and discuss the effects of the overuse/misuse of cross validation in various aspects, including model selection, feature selection, and data dimensionality. This is illustrated on synthetic, benchmark, and real-world data sets.
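The effect described here can be reproduced in miniature. In the NumPy sketch below the labels are random, so no model can genuinely beat 50% accuracy, yet picking the best of many candidate models by 5-fold cross-validation accuracy yields an optimistic estimate. The nearest-centroid classifier, the random feature subsets and all sizes are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def cv_accuracy_nearest_centroid(X, y, n_folds=5):
    """k-fold cross-validation accuracy of a nearest-centroid classifier (labels 0/1)."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    correct = 0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        c0 = X[train][y[train] == 0].mean(axis=0)
        c1 = X[train][y[train] == 1].mean(axis=0)
        d0 = np.sum((X[test] - c0) ** 2, axis=1)
        d1 = np.sum((X[test] - c1) ** 2, axis=1)
        correct += np.sum((d1 < d0).astype(int) == y[test])
    return correct / n

rng = np.random.default_rng(0)
n, p = 40, 100
X = rng.standard_normal((n, p))
y = rng.permutation(np.repeat([0, 1], n // 2))  # labels independent of X: true accuracy is 50%

# "Tune" over many candidate models: here, 500 random 5-feature subsets.
scores = [cv_accuracy_nearest_centroid(X[:, rng.choice(p, 5, replace=False)], y)
          for _ in range(500)]
print(f"mean CV accuracy: {np.mean(scores):.2f}, best CV accuracy: {max(scores):.2f}")
```

The gap between the mean and the best CV accuracy is selection bias; an honest estimate for the chosen model needs data not used during the search, such as an outer held-out set.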
IKM CKS Siemens Medical Solutions USA

