Results 1–10 of 80
The Kernel Recursive Least Squares Algorithm
 IEEE Transactions on Signal Processing
, 2003
Abstract

Cited by 117 (2 self)
We present a nonlinear kernel-based version of the Recursive Least Squares (RLS) algorithm. Our Kernel-RLS (KRLS) algorithm performs linear regression in the feature space induced by a Mercer kernel, and can therefore be used to recursively construct the minimum mean squared error regressor. Sparsity of the solution is achieved by a sequential sparsification process that admits into the kernel representation a new input sample only if its feature-space image cannot be sufficiently well approximated by combining the images of previously admitted samples. This sparsification procedure is crucial to the operation of KRLS: it allows the algorithm to operate online and effectively regularizes its solutions. A theoretical analysis of the sparsification method reveals its close affinity to kernel PCA, and a data-dependent loss bound is presented, quantifying the generalization performance of the KRLS algorithm. We demonstrate the performance and scaling properties of KRLS and compare it to a state-of-the-art Support Vector Regression algorithm, using both synthetic and real data. We additionally test KRLS on two signal processing problems in which the use of traditional least-squares methods is commonplace: time-series prediction and channel equalization.
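The admission rule described in this abstract can be sketched as an approximate-linear-dependence test: a new sample joins the dictionary only if its feature-space image is poorly approximated by the images already stored. The function names, the RBF kernel, and the tolerance `nu` below are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """A Mercer kernel (Gaussian RBF) inducing the feature space."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def ald_admit(dictionary, x_new, kernel, nu=0.1):
    """Admit x_new only if its feature-space image cannot be approximated
    by a combination of previously admitted samples within tolerance nu."""
    if not dictionary:
        return True
    K = np.array([[kernel(xi, xj) for xj in dictionary] for xi in dictionary])
    k = np.array([kernel(xi, x_new) for xi in dictionary])
    a = np.linalg.solve(K + 1e-10 * np.eye(len(dictionary)), k)
    delta = kernel(x_new, x_new) - k @ a   # squared approximation residual
    return delta > nu

# usage: stream samples, keeping the kernel representation sparse
dictionary = []
for x in np.random.default_rng(0).normal(size=(50, 2)):
    if ald_admit(dictionary, x, rbf_kernel):
        dictionary.append(x)
```

Duplicated inputs produce a near-zero residual and are rejected, which is what keeps the online update cost bounded.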
Building Support Vector Machines with Reduced Classifier Complexity
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Abstract

Cited by 82 (1 self)
Support vector machines (SVMs), though accurate, are not preferred in applications requiring great classification speed, due to the number of support vectors being large. To overcome this problem we devise a primal method with the following properties: (1) it decouples the idea of basis functions from the concept of support vectors; (2) it greedily finds a set of kernel basis functions of a specified maximum size (d_max) to approximate the SVM primal cost function well; (3) it is efficient and roughly scales as O(n·d_max), where n is the number of training examples; and (4) the number of basis functions it requires to achieve an accuracy close to the SVM accuracy is usually far less than the number of SVM support vectors.
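Property (2) can be sketched as greedy forward selection of kernel basis columns. Below, a ridge-regularized squared loss stands in for the SVM primal cost, and all names are illustrative rather than the paper's actual algorithm:

```python
import numpy as np

def greedy_basis_selection(K, y, d_max, lam=1e-3):
    """Greedily pick up to d_max kernel basis functions (columns of K)
    that best fit y under regularized least squares -- a simple
    surrogate for the SVM primal cost described in the abstract."""
    selected = []
    residual = y.copy()
    beta = np.zeros(0)
    for _ in range(d_max):
        scores = np.abs(K.T @ residual)   # correlation with current residual
        scores[selected] = -np.inf        # never re-pick a basis function
        selected.append(int(np.argmax(scores)))
        Ks = K[:, selected]
        beta = np.linalg.solve(Ks.T @ Ks + lam * np.eye(len(selected)), Ks.T @ y)
        residual = y - Ks @ beta          # refit on the chosen basis, update residual
    return selected, beta

# toy usage: RBF kernel matrix over 40 random points
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
y = np.sin(X[:, 0])
selected, beta = greedy_basis_selection(K, y, d_max=5)
```

Each iteration costs O(n) per candidate plus a small d×d solve, which is where the rough O(n·d_max) scaling comes from.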
Ranking a Random Feature For Variable And Feature Selection
JOURNAL OF MACHINE LEARNING RESEARCH 3 (2003) 1399–1414
, 2003
Abstract

Cited by 57 (9 self)
We describe a feature selection method that can be applied directly to models that are linear with respect to their parameters, and indirectly to others. It is independent of the target machine. It is closely related to classical statistical hypothesis tests, but it is more intuitive, hence more suitable for use by engineers who are not statistics experts. Furthermore, some assumptions of classical tests are relaxed. The method has been used successfully in a number of applications that are briefly described.
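One intuitive reading of the method: score every candidate feature alongside a deliberately random "probe" feature, and treat anything that cannot outscore the probe as irrelevant. The correlation score and names below are an illustrative simplification, not the paper's exact statistic:

```python
import numpy as np

def rank_with_probe(X, y, seed=0):
    """Append a random probe column to X, score all columns by absolute
    correlation with y, and keep only features that beat the probe."""
    rng = np.random.default_rng(seed)
    Xp = np.column_stack([X, rng.normal(size=X.shape[0])])
    Xc = Xp - Xp.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    probe_score = scores[-1]                 # the random feature's score
    keep = [j for j in range(X.shape[1]) if scores[j] > probe_score]
    return keep, probe_score

# toy usage: only feature 0 actually drives y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)
keep, probe_score = rank_with_probe(X, y)
```

The appeal for non-statisticians is that the threshold is set by a feature known to be useless, rather than by a significance level that must be interpreted.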
Twin Gaussian Processes for Structured Prediction
, 2010
Abstract

Cited by 56 (4 self)
... generic structured prediction method that uses Gaussian process (GP) priors on both covariates and responses, both multivariate, and estimates outputs by minimizing the Kullback-Leibler divergence between two GPs modeled as normal distributions over finite index sets of training and testing examples, emphasizing the goal that similar inputs should produce similar percepts and that this should hold, on average, between their marginal distributions. TGP captures not only the interdependencies between covariates, as in a typical GP, but also those between responses, so correlations among both inputs and outputs are accounted for. TGP is exemplified, with promising results, for the reconstruction of 3D human poses from monocular and multi-camera video sequences in the recently introduced HumanEva benchmark, where we achieve 5 cm error on average per 3D marker for models trained jointly, using data from multiple people and multiple activities. The method is fast and automatic: it requires no hand-crafting of the initial pose, camera calibration parameters, or the availability of a 3D body model associated with human subjects used for training or testing.
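The objective mentioned in this abstract builds on the closed-form KL divergence between two k-dimensional normal distributions, which for N_0(mu_0, Sigma_0) and N_1(mu_1, Sigma_1) is:

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}_0 \,\|\, \mathcal{N}_1\right)
  = \frac{1}{2}\left[
      \operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right)
      + (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0)
      - k
      + \ln\frac{\det\Sigma_1}{\det\Sigma_0}
    \right]
```

TGP plugs the GP-induced Gaussians over the training and test index sets into this expression and minimizes it over the unknown test outputs.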
Gaussian Processes – Iterative Sparse Approximations
, 2002
Abstract

Cited by 38 (1 self)
This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without proper acknowledgement. ASTON UNIVERSITY
Machine learning methods for predicting failures in hard drives: A multiple instance application
, 2005
Abstract

Cited by 35 (1 self)
We compare machine learning methods applied to a difficult real-world problem: predicting computer hard-drive failure using attributes monitored internally by individual drives. The problem is one of detecting rare events in a time series of noisy and nonparametrically distributed data. We develop a new algorithm based on the multiple-instance learning framework and the naive Bayesian classifier (mi-NB) which is specifically designed for the low false-alarm case, and is shown to have promising performance. Other methods compared are support vector machines (SVMs), unsupervised clustering, and nonparametric statistical tests (rank-sum and reverse arrangements). The failure-prediction performance of the SVM, rank-sum, and mi-NB algorithms is considerably better than the threshold method currently implemented in drives, while maintaining low false-alarm rates. Our results suggest that nonparametric statistical tests should be considered for learning problems involving detecting rare events in time series data. An appendix details the calculation of rank-sum significance probabilities in the case of discrete, tied observations, and we give new recommendations about when the exact calculation should be used instead of the commonly used normal approximation. These normal approximations may be particularly inaccurate for rare-event problems like hard-drive failures.
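The rank-sum test with the normal approximation discussed in the appendix can be sketched in a few lines. Tied observations receive average ranks; for brevity this sketch omits the tie correction in the variance, which is one reason the approximation can mislead on discrete data:

```python
import math

def rank_sum_test(sample_a, sample_b):
    """Wilcoxon rank-sum statistic for sample_a vs. sample_b, with the
    normal approximation the abstract warns can be inaccurate for rare
    events. Tied values get average ranks; no tie correction applied."""
    combined = sorted(sample_a + sample_b)
    ranks, i = {}, 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2.0  # average rank of the tie group
        i = j
    w = sum(ranks[x] for x in sample_a)         # rank sum of sample A
    n, m = len(sample_a), len(sample_b)
    mean = n * (n + m + 1) / 2.0
    var = n * m * (n + m + 1) / 12.0
    z = (w - mean) / math.sqrt(var)
    return w, z

# usage: a clearly shifted sample yields a large z-score
w, z = rank_sum_test([10, 11, 12, 13], [1, 2, 3, 4])
```

For small samples or heavily tied data, the exact distribution of the rank sum should be used instead of this z-score, as the abstract recommends.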
Sparse Kernel SVMs via CuttingPlane Training
Abstract

Cited by 28 (1 self)
We explore an algorithm for training SVMs with kernels that can represent the learned rule using arbitrary basis vectors, not just the support vectors (SVs) from the training set. This results in two benefits. First, the added flexibility makes it possible to find sparser solutions of good quality, substantially speeding up prediction. Second, the improved sparsity can also make training of kernel SVMs more efficient, especially for high-dimensional and sparse data (e.g. text classification). This has the potential to make training of kernel SVMs tractable for large training sets, where conventional methods scale quadratically due to the linear growth of the number of SVs. In addition to a theoretical analysis of the algorithm, we also present an empirical evaluation.
Fast algorithms for large scale conditional 3D prediction
 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR
, 2008
Abstract

Cited by 21 (4 self)
The potential success of discriminative learning approaches to 3D reconstruction relies on the ability to efficiently train predictive algorithms using sufficiently many examples that are representative of the typical configurations encountered in the application domain. Recent research indicates that sparse conditional Bayesian Mixture of Experts (cMoE) models (e.g. BME [21]) are adequate modeling tools that not only provide contextual 3D predictions for problems like human pose reconstruction, but can also represent multiple interpretations that result from depth ambiguities or occlusion. However, training conditional predictors requires sophisticated double-loop algorithms that scale unfavorably with the input dimension and the training set size, thus limiting their usage to 10,000 examples or less, so far. In this paper we present large-scale algorithms, referred to as fBME, that combine forward feature selection and bound optimization in order to train probabilistic BME models with one order of magnitude more data (100,000 examples and up) and more than one order of magnitude faster. We present several large-scale experiments, including monocular evaluation on the HumanEva dataset [19], demonstrating how the proposed methods overcome the scaling limitations of existing ones.
Online Manifold Regularization: A New Learning Setting and Empirical Study
Abstract

Cited by 20 (3 self)
We consider a novel “online semi-supervised learning” setting where (mostly unlabeled) data arrives sequentially in large volume, and it is impractical to store it all before learning. We propose an online manifold regularization algorithm. It differs from standard online learning in that it learns even when the input point is unlabeled. Our algorithm is based on convex programming in kernel space with stochastic gradient descent, and inherits the theoretical guarantees of standard online algorithms. However, a naïve implementation of our algorithm does not scale well. This paper focuses on efficient, practical approximations; we discuss two sparse approximations using buffering and online random projection trees. Experiments show our algorithm achieves risk and generalization accuracy comparable to standard batch manifold regularization, while each step runs quickly. Our online semi-supervised learning setting is an interesting direction for further theoretical development, paving the way for semi-supervised learning to work on real-world lifelong learning tasks.
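The buffering approximation can be sketched as an SGD learner that updates from every point, labeled or not. The class name, kernel choice, penalty weights, and the single-neighbour manifold term below are illustrative simplifications of the paper's algorithm:

```python
import numpy as np

class OnlineManifoldReg:
    """Illustrative sketch: a kernel expansion kept in a fixed-size buffer,
    updated by stochastic gradient on the hinge loss (labeled points) plus
    a manifold-smoothness penalty against one random buffered point."""

    def __init__(self, buffer_size=50, lam1=0.01, lam2=0.1, eta=0.1,
                 gamma=1.0, seed=0):
        self.buf, self.alpha = [], []
        self.buffer_size, self.lam1, self.lam2 = buffer_size, lam1, lam2
        self.eta, self.gamma = eta, gamma
        self.rng = np.random.default_rng(seed)

    def _k(self, x, y):
        return np.exp(-self.gamma * np.sum((x - y) ** 2))

    def predict(self, x):
        return sum(a * self._k(xb, x) for a, xb in zip(self.alpha, self.buf))

    def step(self, x, y=None):
        # shrink all coefficients: gradient of the RKHS regularizer
        self.alpha = [a * (1.0 - self.eta * self.lam1) for a in self.alpha]
        coef = 0.0
        if y is not None and y * self.predict(x) < 1.0:
            coef += self.eta * y              # hinge-loss subgradient (labeled)
        if self.buf:                          # smoothness vs. one buffered point
            xb = self.buf[self.rng.integers(len(self.buf))]
            coef -= self.eta * self.lam2 * self._k(x, xb) * (
                self.predict(x) - self.predict(xb))
        self.buf.append(x)
        self.alpha.append(coef)
        if len(self.buf) > self.buffer_size:  # buffering keeps cost bounded
            self.buf.pop(0)
            self.alpha.pop(0)

# usage: a mostly unlabeled stream, one label every five points
rng = np.random.default_rng(1)
model = OnlineManifoldReg(buffer_size=20)
for t in range(100):
    x = rng.normal(size=2)
    label = 1.0 if x[0] > 0 else -1.0
    model.step(x, y=label if t % 5 == 0 else None)
```

The key point the abstract makes is visible here: `step` updates the model even when `y` is `None`, and the buffer bounds per-step cost regardless of stream length.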