Results 1–10 of 17
Gaussian Processes: Iterative Sparse Approximations
PhD thesis, Aston University, 2002
Cited by 44 (1 self)
Abstract: This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author, and that no quotation from the thesis and no information derived from it may be published without proper acknowledgement. (Aston University)
Healing the relevance vector machine through augmentation
In Proc. of the 22nd International Conference on Machine Learning (ICML), 2005
Cited by 15 (1 self)
Abstract: The Relevance Vector Machine (RVM) is a sparse approximate Bayesian kernel method. It provides full predictive distributions for test cases. However, the predictive uncertainties have the unintuitive property that they get smaller the further you move away from the training cases. We give a thorough analysis. Inspired by the analogy to non-degenerate Gaussian Processes, we suggest augmentation to solve the problem. The purpose of the resulting model, RVM*, is primarily to corroborate the theoretical and experimental analysis. Although RVM* could be used in practical applications, it is no longer a truly sparse model. Experiments show that sparsity comes at the expense of worse predictive distributions. Bayesian inference based on Gaussian Processes (GPs) has become widespread in the machine learning community. However, their naïve applicability is marred by computational constraints. A number of recent publications have addressed this issue by means of sparse approximations, although ideologically sparseness is at variance with Bayesian principles. In this paper we view sparsity purely as a way to achieve computational convenience, not, as under other non-Bayesian paradigms, as a means in itself to ensure good generalization.
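The variance pathology this abstract describes is easy to reproduce in a toy degenerate model. The sketch below is illustrative only (the basis, posterior covariance, and noise level are assumptions, not taken from the paper): with localized basis functions, the predictive variance of a finite linear model collapses to the noise floor far from the training region.

```python
import numpy as np

# Hypothetical illustration: a degenerate model with localized RBF basis
# functions centred on the training region. All settings are assumptions.
centers = np.linspace(-1.0, 1.0, 5)          # basis centers on the training region

def phi(x, width=0.3):
    """RBF feature vector for a scalar input x."""
    return np.exp(-0.5 * ((x - centers) / width) ** 2)

Sigma = np.eye(len(centers))                 # assumed posterior weight covariance
noise = 0.01                                 # assumed observation-noise variance

def pred_var(x):
    """Predictive variance of the degenerate model at x."""
    f = phi(x)
    return f @ Sigma @ f + noise

# Near the data the basis contributes variance; far away phi(x) -> 0, so the
# variance shrinks to the noise floor -- the unintuitive behaviour analysed above.
near, far = pred_var(0.0), pred_var(10.0)
assert near > far
assert abs(far - noise) < 1e-6
```

A non-degenerate GP would instead grow toward its prior variance away from the data, which is the intuition behind the augmentation proposed for RVM*.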
Optimization of the SVM Kernels using an Empirical Error Minimization Scheme.
In Proc. of the International Workshop on Pattern Recognition with Support Vector Machines, 2002
Cited by 15 (3 self)
Abstract: We address the problem of optimizing kernel parameters in Support Vector Machine modelling, especially when the number of parameters is greater than one, as in polynomial kernels and KMOD, our newly introduced kernel. The present work is an extended experimental study of the framework proposed by Chapelle et al. for optimizing SVM kernels using an analytic upper bound of the error. However, our optimization scheme minimizes an empirical error estimate using a Quasi-Newton technique. The method has been shown to reduce the number of support vectors along the optimization process. To assess our contribution, the approach is further used for adapting the KMOD, RBF and polynomial kernels on synthetic data and the NIST digit image database.
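As a hedged sketch of the kind of objective such a scheme minimizes (the sigmoid squashing and all names here are my illustration, not the paper's code): a smooth, differentiable estimate of the misclassification probability can be built from the classifier margins and handed to a Quasi-Newton optimizer with respect to the kernel parameters.

```python
import numpy as np

# Illustrative smoothed empirical-error criterion (an assumption in the spirit
# of Chapelle-style smoothed error estimates, not the paper's exact formula).
def empirical_error(margins):
    """Smooth estimate of the misclassification probability.

    margins[i] = y_i * f(x_i): positive when example i is correctly classified.
    sigmoid(-margin) approaches 1 for badly misclassified points, 0 for
    confidently correct ones, so the mean tracks the 0/1 error smoothly.
    """
    return np.mean(1.0 / (1.0 + np.exp(margins)))

# Because the estimate is differentiable in the margins (and hence in the
# kernel parameters), it is a suitable objective for BFGS-type optimizers.
well_separated = np.array([5.0, 6.0, 4.0])
borderline = np.array([0.1, -0.2, 0.3])
assert empirical_error(well_separated) < empirical_error(borderline)
```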
Support Vector Machines for Handwritten Numerical String Recognition
Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition (IWFHR-9), 2004
Cited by 13 (2 self)
Abstract: In this paper we discuss the use of SVMs to recognize handwritten numerical strings. Such a problem is more complex than recognizing isolated digits, since one must deal with problems such as segmentation, overlapping, and an unknown number of digits. In order to perform our experiments, we have used a segmentation-based recognition system using heuristic over-segmentation. The contribution of this paper is twofold. Firstly, we demonstrate by experimentation that SVMs improve the overall recognition rates. Secondly, we observe that SVMs deal with outliers, such as over- and under-segmentation, better than multilayer perceptron neural networks.
Generalization and Regularization in Nonlinear Learning Systems
The Handbook of Brain Theory and Neural Networks, 1994
Cited by 12 (3 self)
Abstract: In this article we describe generalization and regularization from the point of view of multivariate function estimation in a statistical context. Multivariate function estimation is not, in principle, distinguishable from supervised machine learning. However, until fairly recently supervised machine learning and multivariate function estimation had fairly distinct groups of practitioners, with little overlap in language, literature, and the kinds of practical problems under study. In any case, we are given a training set consisting of pairs of input (feature) vectors and associated outputs {t(i), y_i} for n training or example subjects, i = 1, ..., n. From this data, it is desired to construct a map which generalizes well; that is, given a new value of t, the map will provide a reasonable prediction for the unobserved output associated with this t.
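A minimal sketch of the regularization idea discussed in this article (the data, basis, and penalty weight below are my illustrative assumptions, not the article's): a ridge penalty on a flexible polynomial fit shrinks the coefficients of the estimated map, limiting its complexity so it can generalize beyond the training pairs {t(i), y_i}.

```python
import numpy as np

# Toy supervised-learning setup: noisy observations of a smooth target.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 15)
y = np.sin(2.0 * np.pi * t) + 0.2 * rng.standard_normal(15)
Phi = np.vander(t, 10)                       # degree-9 polynomial features

def ridge_fit(lam):
    """Minimize ||y - Phi w||^2 + lam * ||w||^2 (regularized least squares)."""
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(10), Phi.T @ y)

w_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # unregularized least squares
w_ridge = ridge_fit(1e-2)

# The penalty trades a little training error for a smaller-norm, smoother map;
# the shrinkage of the coefficient vector is guaranteed for any lam > 0.
assert np.linalg.norm(w_ridge) < np.linalg.norm(w_ls)
```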
Automatic model selection for the optimization of the SVM kernels
Pattern Recognition, 2005
Cited by 11 (0 self)
Abstract: This approach aims to optimize the kernel parameters and to efficiently reduce the number of support vectors, so that the generalization error can be reduced drastically. The proposed methodology suggests the use of a new model selection criterion based on the estimation of the probability of error of the SVM classifier. For comparison, we considered two further model selection criteria: GACV ('Generalized Approximate Cross-Validation') and the VC ('Vapnik-Chervonenkis') dimension. These criteria are algebraic estimates of upper bounds of the expected error. For the former, we also propose a new minimization scheme. The experiments conducted on a two-class problem show that we can adequately choose the SVM hyperparameters using the empirical error criterion. Moreover, it turns out that the criterion produces a less complex model with fewer support vectors. For multi-class data, the optimization strategy is adapted to the one-against-one data partitioning. The approach is then evaluated on images of handwritten digits from the USPS database.
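A small sketch of the one-against-one partitioning mentioned above (the 10-class digit setting mirrors the USPS case; the code itself is illustrative): for K classes, one binary SVM is trained per unordered pair of classes, so the optimization strategy is applied K(K-1)/2 times.

```python
from itertools import combinations

# One-against-one partitioning: each unordered pair of classes gets its own
# binary classification problem (and hence its own kernel optimization run).
classes = list(range(10))                    # 10 digit classes, as in USPS
pairs = list(combinations(classes, 2))

assert len(pairs) == 10 * 9 // 2             # 45 binary classifiers for 10 digits
assert (0, 1) in pairs and (8, 9) in pairs
```

At prediction time, the 45 binary votes are typically combined by majority, which is the usual complement to this partitioning.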
Empirical Error-based Optimization of SVM Kernels: Application to Digit Image Recognition
In Proc. of the 8th IWFHR, Niagara-on-the-Lake, 2002
Cited by 5 (1 self)
Abstract: We address the problem of optimizing kernel parameters in Support Vector Machine modeling, especially when the number of parameters is greater than one, as in polynomial kernels and KMOD, our newly introduced kernel. The present work is an extended experimental study of the framework proposed by Chapelle et al. for optimizing SVM kernels using an analytic upper bound of the error. However, our optimization scheme minimizes an empirical error estimate using a Quasi-Newton optimization method. To assess our method, the approach is further used for adapting the KMOD, RBF and polynomial kernels on synthetic data and the NIST database. The method shows much faster convergence, with satisfactory results, in comparison with the simple gradient descent method.
Analysis of some methods for reduced rank Gaussian process regression
In Proc. of the Hamilton Summer School on Switching and Learning in ..., 2004
Cited by 4 (0 self)
Abstract: While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced-rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning the covariance function hyperparameters and the support set. We propose a method for learning hyperparameters for a given support set. We also ...
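The finite-linear-model view of RRGPs can be sketched in a few lines. This is a subset-of-regressors-style illustration under my own assumptions (RBF covariance, evenly spaced support set, fixed hyperparameters), not the paper's algorithm: the predictive mean is a linear model in the m covariance features k(x, s_j), and training only requires solving an m x m system.

```python
import numpy as np

# Toy regression data and an assumed support set of m = 5 points.
rng = np.random.default_rng(1)
X = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * X) + 0.1 * rng.standard_normal(50)
S = X[::10]                                   # support set (every 10th input)

def k(a, b, ell=0.2):
    """RBF covariance between two 1-D input arrays."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

noise = 0.01
Kmn = k(S, X)                                 # m x n cross-covariance
Kmm = k(S, S)                                 # m x m support covariance
A = Kmn @ Kmn.T + noise * Kmm                 # only an m x m system to solve
w = np.linalg.solve(A, Kmn @ y)               # finite weight vector

def predict(xs):
    """RRGP predictive mean: a finite sparse linear model in k(x, s_j)."""
    return k(xs, S) @ w

resid = y - predict(X)
assert np.sqrt(np.mean(resid ** 2)) < 0.3     # fits the noisy sine well
```

The per-prediction cost is O(m) rather than O(n), which is the computational point of the reduced-rank approximation.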
A Machine Learning Approach to Developing Rigid-body Dynamics Simulators for ...
Abstract: I present a machine learning based rigid-body dynamics simulator for trot gaits of the quadruped LittleDog. My contribution can be divided into two parts: the first part for the reduction of one-time-step prediction error, and the second part for a more accurate prediction over longer time scales. First, in order to reduce the one-step prediction error, I compared three regression methods: first-order linear regression (LR), locally weighted projection regression (LWPR) [1] and Gaussian process regression (GPR) [2]. Although GPR shows the highest accuracy, its cost for computation and storage (O(n^3) and O(n^2), respectively) is too high to handle the large amount of training data required in my problem. Therefore, I developed a sparse GPR, called "local" GPR (LGPR). In LGPR, the training data are divided into k clusters by k-means, and a locally full GPR model is constructed for each cluster. The prediction is made with the local model closest to a test data point, in terms of the normalized Euclidean distance. The cost for computation is tractable, approximately O(m^3) where m = n/k, making it possible to use a large amount of training data. I showed that LGPR significantly reduces the one-step prediction error. Second, to reduce the accumulation of prediction error, I proposed a projection method, called αPROJ. I found that using LGPR as it is accumulates too much prediction error over longer time scales. One of the main sources of such error turned out to be predicted states that are not within the dynamically feasible region. αPROJ projects a predicted state onto that region. I defined the region without using any LittleDog-specific information (e.g. max/min joint angles); it is defined only by the training data. Thus, this projection method can be applied to other robot systems without modification. αPROJ greatly improved the prediction over longer time scales.
The empirical results show that these key ideas led to a significant reduction of prediction error and better performance than the ODE simulator.
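The LGPR recipe in the abstract (k-means clusters, one exact GP per cluster, prediction from the nearest cluster) can be sketched as follows. This is a 1-D toy under my own assumptions (RBF kernel, fixed hyperparameters, crude Lloyd iterations), not the author's implementation:

```python
import numpy as np

# Toy data: noisy sine on a 1-D input, standing in for the robot state data.
rng = np.random.default_rng(2)
X = rng.uniform(-3.0, 3.0, 200)
y = np.sin(X) + 0.05 * rng.standard_normal(200)

def kern(a, b, ell=0.5):
    """RBF kernel between two 1-D input arrays."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

# Step 1: partition the training inputs with a few Lloyd (k-means) iterations.
k_clusters = 4
centroids = np.linspace(-3.0, 3.0, k_clusters)
for _ in range(10):
    labels = np.argmin(np.abs(X[:, None] - centroids[None, :]), axis=1)
    centroids = np.array([X[labels == j].mean() for j in range(k_clusters)])

# Step 2: fit a locally full GP per cluster; each solve is only O(m^3), m ~ n/k.
noise = 0.01
models = []
for j in range(k_clusters):
    Xj, yj = X[labels == j], y[labels == j]
    alpha = np.linalg.solve(kern(Xj, Xj) + noise * np.eye(len(Xj)), yj)
    models.append((Xj, alpha))

# Step 3: answer a query with the local model whose centroid is nearest.
def predict(x):
    j = int(np.argmin(np.abs(centroids - x)))
    Xj, alpha = models[j]
    return kern(np.array([x]), Xj) @ alpha

assert abs(predict(1.0)[0] - np.sin(1.0)) < 0.2
```

The αPROJ step would then post-process each predicted state, but since the abstract does not specify the projection beyond "defined by the training data", it is not reconstructed here.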