The Kernel Recursive Least Squares Algorithm
 IEEE Transactions on Signal Processing
, 2003
Abstract

Cited by 62 (2 self)
We present a nonlinear kernel-based version of the Recursive Least Squares (RLS) algorithm. Our Kernel-RLS (KRLS) algorithm performs linear regression in the feature space induced by a Mercer kernel, and can therefore be used to recursively construct the minimum mean squared error regressor. Sparsity of the solution is achieved by a sequential sparsification process that admits a new input sample into the kernel representation only if its feature space image cannot be sufficiently well approximated by combining the images of previously admitted samples. This sparsification procedure is crucial to the operation of KRLS, as it allows it to operate online and effectively regularizes its solutions. A theoretical analysis of the sparsification method reveals its close affinity to kernel PCA, and a data-dependent loss bound is presented, quantifying the generalization performance of the KRLS algorithm. We demonstrate the performance and scaling properties of KRLS and compare it to a state-of-the-art Support Vector Regression algorithm, using both synthetic and real data. We additionally test KRLS on two signal processing problems in which the use of traditional least-squares methods is commonplace: time series prediction and channel equalization.
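The sparsification step described above can be sketched as an approximate-linear-dependence test: a sample $x$ is admitted only if the squared feature-space residual $\delta = k(x, x) - \tilde{k}^\top \tilde{K}^{-1} \tilde{k}$ exceeds a threshold $\nu$, where $\tilde{K}$ is the Gram matrix of the admitted samples and $\tilde{k}$ holds their kernel values with $x$. This is a minimal sketch, assuming a Gaussian kernel, a hypothetical threshold `nu`, and a plain matrix inverse instead of the recursive updates an online KRLS implementation would use; the function names are illustrative only.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; the choice of kernel is an assumption."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def ald_sparsify(samples, nu=1e-3, kernel=gaussian_kernel):
    """Return the indices of samples admitted to the dictionary.

    A sample is admitted only if its feature-space image cannot be
    approximated by the images of already admitted samples to within
    squared error nu (approximate linear dependence test)."""
    dictionary = []   # admitted input samples
    admitted = []     # their indices in `samples`
    K_inv = None      # inverse Gram matrix of the dictionary
    for t, x in enumerate(samples):
        if not dictionary:
            dictionary.append(x)
            admitted.append(t)
            K_inv = np.array([[1.0 / kernel(x, x)]])
            continue
        k_vec = np.array([kernel(d, x) for d in dictionary])
        a = K_inv @ k_vec                    # best combination coefficients
        delta = kernel(x, x) - k_vec @ a     # squared approximation residual
        if delta > nu:
            dictionary.append(x)
            admitted.append(t)
            # Recompute the inverse directly (a recursive update would be cheaper).
            K = np.array([[kernel(u, v) for v in dictionary] for u in dictionary])
            K_inv = np.linalg.inv(K)
    return admitted
```

Exact and near-duplicate inputs are rejected because they add almost nothing in feature space, which is what keeps the kernel expansion small.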
Building Support Vector Machines with Reduced Classifier Complexity
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Abstract

Cited by 58 (1 self)
Support vector machines (SVMs), though accurate, are not preferred in applications requiring high classification speed, because the number of support vectors is large. To overcome this problem we devise a primal method with the following properties: (1) it decouples the idea of basis functions from the concept of support vectors; (2) it greedily finds a set of kernel basis functions of a specified maximum size ($d_{\max}$) to approximate the SVM primal cost function well; (3) it is efficient and roughly scales as $O(n\,d_{\max})$, where $n$ is the number of training examples; and (4) the number of basis functions it requires to achieve an accuracy close to the SVM accuracy is usually far less than the number of SVM support vectors.
Weighted Least Squares Support Vector Machines: robustness and sparse approximation
NEUROCOMPUTING
Abstract

Cited by 56 (15 self)
Least Squares Support Vector Machines (LS-SVM) is an SVM version that involves equality instead of inequality constraints and works with a least squares cost function. In this way the solution follows from a linear Karush-Kuhn-Tucker system instead of a quadratic programming problem. However, sparseness is lost in the LS-SVM case, and the estimation of the support values is only optimal in the case of a Gaussian distribution of the error variables. In this paper we discuss a method that can overcome these two drawbacks. We show how to obtain robust estimates for regression by applying a weighted version of LS-SVM. We also discuss a sparse approximation procedure for weighted and unweighted LS-SVM. It is basically a pruning method that prunes based on the physical meaning of the sorted support values, whereas pruning procedures for classical multilayer perceptrons require the computation of a Hessian matrix or its inverse. The methods of this paper are illustrated for RBF kernels and demonstrate how to obtain robust estimates, with selection of an appropriate number of hidden units, in the case of outliers or non-Gaussian error distributions with heavy tails.
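To illustrate the linear Karush-Kuhn-Tucker system, the sketch below solves the usual LS-SVM regression system $\begin{bmatrix} 0 & \mathbf{1}^\top \\ \mathbf{1} & \Omega + \mathrm{diag}(1/(\gamma v_k)) \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$ with an RBF kernel matrix $\Omega$, then refits once with per-sample weights $v_k$ computed from the first-stage residuals $e_k = \alpha_k / \gamma$. The Hampel-style weighting function (constants 2.5 and 3), the MAD scale estimate, and all function names are assumptions for illustration; the pruning procedure is not shown.

```python
import numpy as np

def rbf_gram(X, Z, sigma=1.0):
    """RBF kernel matrix between the rows of X and Z."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0, v=None):
    """Solve the LS-SVM KKT system; v holds optional per-sample weights."""
    n = X.shape[0]
    v = np.ones(n) if v is None else np.asarray(v, dtype=float)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                     # constraint: sum of support values = 0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_gram(X, X, sigma) + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]             # bias b, support values alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    return rbf_gram(X_new, X_train, sigma) @ alpha + b

def hampel_weights(e, c1=2.5, c2=3.0):
    """Hampel-style weights from residuals e (constants are assumptions)."""
    s = 1.4826 * np.median(np.abs(e - np.median(e)))   # robust scale (MAD)
    r = np.abs(e / s)
    return np.where(r <= c1, 1.0, np.where(r <= c2, (c2 - r) / (c2 - c1), 1e-4))

def weighted_lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Unweighted fit, then one reweighted fit that downweights outliers."""
    _, alpha = lssvm_fit(X, y, gamma, sigma)
    e = alpha / gamma                  # in LS-SVM, alpha_k = gamma * e_k
    return lssvm_fit(X, y, gamma, sigma, hampel_weights(e))
```

A sample with a very small weight receives a large diagonal regularizer, so its support value is driven toward zero and it barely influences the weighted fit.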
Computationally Efficient Face Detection
 In Proc. Intl. Conf. Computer Vision
, 2001
Abstract

Cited by 56 (10 self)
This paper describes an algorithm for finding faces within an image. The basis of the algorithm is to run an observation window at all possible positions, scales and orientations within the image. A nonlinear support vector machine is used to determine whether or not a face is contained within the observation window. The nonlinear support vector machine operates by comparing the input patch to a set of support vectors (which can be thought of as face and anti-face templates). Each support vector is scored by some nonlinear function against the observation window, and if the resulting sum is over some threshold a face is indicated. Because of the huge search space that is considered, it is imperative to investigate ways to speed up the support vector machine. Within this paper we suggest a method of speeding up the nonlinear support vector machine. A set of reduced set vectors (RVs) is calculated from the support vectors. The RVs are considered sequentially, and if at any point a face is deemed too unlikely, the sequential evaluation ceases, obviating the need to evaluate the remaining RVs. The idea is that we only need to apply a subset of the RVs to eliminate things that are obviously not a face (thus reducing the computation). The key, then, is to explore the RVs in the right order, and a method for this is proposed.
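The early-rejection idea can be sketched as follows: the decision score is accumulated one reduced set vector at a time, and evaluation stops as soon as the partial score falls below a per-stage threshold. This sketch assumes a Gaussian kernel, hypothetical reduced set vectors `rvs` with weights `betas`, and hypothetical `stage_thresholds`; how the RVs are computed and ordered, which the paper addresses, is not shown.

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    """Gaussian kernel between an observation window and one RV (assumed kernel)."""
    d = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-(d @ d) / (2.0 * sigma**2)))

def sequential_rv_classify(patch, rvs, betas, bias, stage_thresholds, sigma=1.0):
    """Evaluate reduced set vectors in order, rejecting early when possible.

    Returns (is_face, number_of_kernel_evaluations)."""
    score = bias
    for j, (z, beta) in enumerate(zip(rvs, betas)):
        score += beta * rbf(patch, z, sigma)
        if score < stage_thresholds[j]:   # face too unlikely: stop here
            return False, j + 1
    return score > 0.0, len(rvs)          # full evaluation: final decision
```

Most windows in an image contain no face, so rejecting them after a few kernel evaluations is where the savings come from.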
The Effect of the Input Density Distribution on Kernel-based Classifiers
 Proceedings of the 17th International Conference on Machine Learning
, 2000
Abstract

Cited by 50 (6 self)
The eigenfunction expansion of a kernel function $K(x, y)$ as used in support vector machines or Gaussian process predictors is studied when the input data is drawn from a distribution $p(x)$. In this case it is shown that the eigenfunctions $\{\phi_i\}$ obey the equation $\int K(x, y)\, p(x)\, \phi_i(x)\, dx = \lambda_i \phi_i(y)$. This has a number of consequences, including (i) the eigenvalues/vectors of the $n \times n$ Gram matrix $K$ obtained by evaluating the kernel at all pairs of training points $K(x_i, x_j)$ converge to the eigenvalues and eigenfunctions of the integral equation above as $n \to \infty$, and (ii) the dependence of the eigenfunctions on $p(x)$ may be useful for the class-discrimination task. We show that on a number of datasets using the RBF kernel the eigenvalue spectrum of the Gram matrix decays rapidly, and discuss how this property might be used to speed up kernel-based predictors.
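The rapid spectral decay described above is easy to observe numerically: compute the eigenvalues of an RBF Gram matrix and divide them by $n$, the usual normalization under which they approximate the eigenvalues $\lambda_i$ of the integral equation. This is a minimal sketch assuming a unit-width Gaussian kernel and Gaussian input data; the function name is illustrative.

```python
import numpy as np

def gram_spectrum(X, sigma=1.0):
    """Eigenvalues of the RBF Gram matrix divided by n, in descending order."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    K = np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))
    # eigvalsh returns ascending eigenvalues of a symmetric matrix.
    return np.linalg.eigvalsh(K)[::-1] / X.shape[0]

rng = np.random.default_rng(0)
spectrum = gram_spectrum(rng.normal(size=(200, 1)))
print(spectrum[:5])   # the leading values shrink quickly
```

Because the diagonal of the Gram matrix is all ones, the scaled eigenvalues sum to 1, so the fraction captured by the first few shows how few directions carry most of the kernel's "energy".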
Sparse representation for Gaussian process models
 Advances in Neural Information Processing Systems
, 2001
Abstract

Cited by 48 (1 self)
We develop a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world datasets indicate the efficiency of the approach.
Optimal cluster preserving embedding of nonmetric proximity data
 IEEE Trans. Pattern Analysis and Machine Intelligence
, 2003
Abstract

Cited by 43 (4 self)
For several major applications of data analysis, objects are often not represented as feature vectors in a vector space, but rather by a matrix gathering pairwise proximities. Such pairwise data often violates metricity and, therefore, cannot be naturally embedded in a vector space. Concerning the problem of unsupervised structure detection or clustering, this paper introduces a new method for embedding pairwise data into Euclidean vector spaces. We show that all clustering methods that are invariant under additive shifts of the pairwise proximities can be reformulated as grouping problems in Euclidean spaces. The most prominent property of this constant shift embedding framework is the complete preservation of the cluster structure in the embedding space. Restating pairwise clustering problems in vector spaces has several important consequences, such as the statistical description of the clusters by way of cluster prototypes, the generic extension of the grouping procedure to a discriminative prediction rule, and the applicability of standard preprocessing methods like denoising or dimensionality reduction. Index Terms: Clustering, pairwise proximity data, cost function, embedding, MDS.
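A minimal sketch of the constant shift idea, assuming `D` holds symmetric dissimilarities with a zero diagonal that should be reproduced as squared Euclidean distances: double-center $D$ into $S = -\tfrac{1}{2} Q D Q$ with $Q = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$, add the constant $-2\lambda_{\min}(S)$ to every off-diagonal entry of $D$, and read coordinates off the eigendecomposition of the shifted matrix. The function name and interface are hypothetical.

```python
import numpy as np

def constant_shift_embedding(D):
    """Embed a symmetric, zero-diagonal dissimilarity matrix D (possibly
    non-Euclidean) so that squared Euclidean distances equal D after a
    constant off-diagonal shift. Returns (coordinates, shift)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    S = -0.5 * Q @ D @ Q                           # centered similarity matrix
    lam_min = np.linalg.eigvalsh(S)[0]             # most negative eigenvalue
    shift = max(0.0, -2.0 * lam_min)               # smallest shift making S PSD
    D_shift = D + shift * (np.ones((n, n)) - np.eye(n))
    S_shift = -0.5 * Q @ D_shift @ Q               # equals S + (shift / 2) * Q
    w, V = np.linalg.eigh(S_shift)
    X = V * np.sqrt(np.clip(w, 0.0, None))         # rows are embedded points
    return X, shift
```

Because the shift is added to all off-diagonal entries alike, any clustering cost that is invariant under such additive shifts gives the same partition on the embedded points as on the original proximities.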
Constructing Descriptive and Discriminative Nonlinear Features: Rayleigh Coefficients in Kernel Feature Spaces
, 2003
Abstract

Cited by 43 (4 self)
We incorporate prior knowledge to construct nonlinear algorithms for invariant feature extraction and discrimination. Employing a unified framework in terms of a nonlinearized variant of the Rayleigh coefficient, we propose nonlinear generalizations of Fisher's discriminant and oriented PCA using support vector kernel functions. Extensive simulations show the utility of our approach.
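For the Fisher's discriminant case, a two-class kernel Fisher discriminant can be sketched by expanding the direction as $w = \sum_i \alpha_i \phi(x_i)$ and solving $(N + \mu I)\,\alpha = M_1 - M_0$, where $M_c$ holds the class-mean kernel values and $N$ is the within-class scatter expressed through the kernel matrix. The RBF kernel, the ridge term `mu`, and the function names are assumptions; oriented PCA and the general Rayleigh coefficient framework are not shown.

```python
import numpy as np

def rbf_gram(X, Z, sigma=1.0):
    """RBF kernel matrix between the rows of X and Z."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kernel_fisher_discriminant(X, y, mu=1e-3, sigma=1.0):
    """Two-class kernel Fisher discriminant; y holds labels 0 and 1.

    Returns expansion coefficients alpha of the discriminant direction."""
    K = rbf_gram(X, X, sigma)
    n = X.shape[0]
    N = np.zeros((n, n))
    means = []
    for c in (0, 1):
        Kc = K[:, y == c]                      # n x l_c block for class c
        l_c = Kc.shape[1]
        means.append(Kc.mean(axis=1))          # class-mean kernel values M_c
        centering = np.eye(l_c) - np.ones((l_c, l_c)) / l_c
        N += Kc @ centering @ Kc.T             # within-class scatter
    # The ridge term mu keeps the (usually singular) N invertible.
    return np.linalg.solve(N + mu * np.eye(n), means[1] - means[0])

def kfd_project(X_train, alpha, X_new, sigma=1.0):
    """Project new points onto the discriminant direction."""
    return rbf_gram(X_new, X_train, sigma) @ alpha
```

Since $N + \mu I$ is positive definite, the projected class-1 mean always exceeds the projected class-0 mean by $(M_1 - M_0)^\top (N + \mu I)^{-1} (M_1 - M_0) > 0$.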
An Online Kernel Change Detection Algorithm
, 2004
Abstract

Cited by 42 (9 self)
A number of abrupt change detection methods have been proposed in the past, among which are efficient model-based techniques such as the Generalized Likelihood Ratio (GLR) test. We consider the case where neither an accurate nor a tractable model can be found, using a model-free approach called Kernel change detection (KCD). KCD compares two sets of descriptors extracted online from the signal at each time instant: the immediate past set and the immediate future set. Based on the soft margin single-class Support Vector Machine (SVM), we build a dissimilarity measure in feature space between those sets, without estimating densities as an intermediary step. This dissimilarity measure is shown to be asymptotically equivalent to the Fisher ratio in the Gaussian case. Implementation issues are addressed; in particular, the dissimilarity measure can be computed online in input space. Simulation results on both synthetic signals and real music signals show the efficiency of KCD.
Mode-finding for mixtures of Gaussian distributions
 Dept. of Computer Science, University of Sheffield
, 1999
Abstract

Cited by 34 (8 self)
I consider the problem of finding all the modes of a mixture of multivariate Gaussian distributions, which has applications in clustering and regression. I derive exact formulas for the gradient and Hessian and give a partial proof that the number of modes cannot be more than the number of components, and that the modes are contained in the convex hull of the component centroids. Then, I develop two exhaustive mode search algorithms: one based on combined quadratic maximisation and gradient ascent, and the other based on a fixed-point iterative scheme. Appropriate values for the search control parameters are derived by taking into account theoretical results regarding the bounds for the gradient and Hessian of the mixture. The significance of the modes is quantified locally (for each mode) by error bars, or confidence intervals (estimated using the values of the Hessian at each mode), and globally by the sparseness of the mixture, measured by its differential entropy (estimated through bounds). I conclude with some reflections about bump-finding.
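A fixed-point scheme of this kind can be sketched by setting the mixture gradient to zero, which gives $x = \big(\sum_m p(m \mid x)\, \Sigma_m^{-1}\big)^{-1} \sum_m p(m \mid x)\, \Sigma_m^{-1} \mu_m$, and iterating that map from every component centroid. This is a minimal sketch under stated assumptions: starting only from the centroids, the tolerances, and the function names are illustrative choices, and no Hessian check is made, so it neither confirms that each returned point is a maximum nor computes error bars.

```python
import numpy as np

def gmm_posteriors(x, weights, means, covs):
    """Posterior p(m | x) for each mixture component m."""
    logs = []
    for w, mu, S in zip(weights, means, covs):
        diff = x - mu
        _, logdet = np.linalg.slogdet(S)
        # The (2*pi)^(-d/2) factor is common to all components and cancels.
        logs.append(np.log(w) - 0.5 * (logdet + diff @ np.linalg.solve(S, diff)))
    logs = np.array(logs)
    p = np.exp(logs - logs.max())            # numerically stable normalization
    return p / p.sum()

def fixed_point_mode(x0, weights, means, covs, tol=1e-10, max_iter=1000):
    """Iterate x <- (sum_m r_m S_m^-1)^-1 sum_m r_m S_m^-1 mu_m until it settles."""
    x = np.asarray(x0, dtype=float)
    precs = [np.linalg.inv(S) for S in covs]
    for _ in range(max_iter):
        r = gmm_posteriors(x, weights, means, covs)
        A = sum(rm * P for rm, P in zip(r, precs))
        b = sum(rm * P @ mu for rm, P, mu in zip(r, precs, means))
        x_new = np.linalg.solve(A, b)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def find_modes(weights, means, covs, merge_tol=1e-4):
    """Start from every centroid and merge runs that reach the same point."""
    modes = []
    for mu in means:
        m = fixed_point_mode(mu, weights, means, covs)
        if all(np.linalg.norm(m - q) > merge_tol for q in modes):
            modes.append(m)
    return modes
```

For two equal, unit-variance components whose means are closer than two standard deviations, both runs converge to the midpoint and are merged into a single mode; well-separated components keep one mode each.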