Results 1 - 10
of
441
Online Learning with Kernels
, 2003
"... Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little u ..."
Abstract
-
Cited by 1511 (112 self)
- Add to MetaCart
Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst case loss bounds and moreover we show the convergence of the hypothesis to the minimiser of the regularised risk functional. We present some experimental results that support the theory as well as illustrating the power of the new algorithms for online novelty detection. In addition
Probabilistic Visual Learning for Object Representation
, 1996
"... We present an unsupervised technique for visual learning which is based on density estimation in high-dimensional spaces using an eigenspace decomposition. Two types of density estimates are derived for modeling the training data: a multivariate Gaussian (for unimodal distributions) and a Mixture-of ..."
Abstract
-
Cited by 476 (13 self)
- Add to MetaCart
We present an unsupervised technique for visual learning which is based on density estimation in high-dimensional spaces using an eigenspace decomposition. Two types of density estimates are derived for modeling the training data: a multivariate Gaussian (for unimodal distributions) and a Mixture-of-Gaussians model (for multimodal distributions). These probability densities are then used to formulate a maximum-likelihood estimation framework for visual search and target detection for automatic object recognition and coding. Our learning technique is applied to the probabilistic visual modeling, detection, recognition, and coding of human faces and non-rigid objects such as hands.
Estimating the Support of a High-Dimensional Distribution
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propose a metho ..."
Abstract
-
Cited by 382 (30 self)
- Add to MetaCart
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled d...
Regularization Theory and Neural Networks Architectures
- Neural Computation
, 1995
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Ba ..."
Abstract
-
Cited by 257 (30 self)
- Add to MetaCart
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
Regularization networks and support vector machines
- Advances in Computational Mathematics
, 2000
"... Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization a ..."
Abstract
-
Cited by 215 (28 self)
- Add to MetaCart
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
Gaussian Processes for Regression
- Advances in Neural Information Processing Systems 8
, 1996
"... The Bayesian analysis of neural networks is difficult because a simple prior over weights implies a complex prior distribution over functions. In this paper we investigate the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis for fixed values of hyperparame ..."
Abstract
-
Cited by 181 (17 self)
- Add to MetaCart
The Bayesian analysis of neural networks is difficult because a simple prior over weights implies a complex prior distribution over functions. In this paper we investigate the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis for fixed values of hyperparameters to be carried out exactly using matrix operations. Two methods, using optimization and averaging (via Hybrid Monte Carlo) over hyperparameters have been tested on a number of challenging problems and have produced excellent results.
Prediction With Gaussian Processes: From Linear Regression To Linear Prediction And Beyond
- Learning and Inference in Graphical Models
, 1997
"... The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. Th ..."
Abstract
-
Cited by 160 (4 self)
- Add to MetaCart
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads in to a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems. PREDICTION WITH GAUSSIAN PROCESSES: FROM LINEAR REGRESSION TO LINEAR PREDICTION AND BEYOND 2 1 Introduction In the last decade neural networks have been used to tackle regression and classification problems, with some notable successes. It has also been widely recognized that they form a part of a wide variety of non-linear statistical techniques that can be used for...
Prior Learning and Gibbs Reaction-Diffusion
, 1997
"... This article addresses two important themes in early visual computation: rst it presents a novel theory for learning the universal statistics of natural images { a prior model for typical cluttered scenes of the world { from a set of natural images, second it proposes a general framework of designi ..."
Abstract
-
Cited by 126 (16 self)
- Add to MetaCart
This article addresses two important themes in early visual computation: rst it presents a novel theory for learning the universal statistics of natural images { a prior model for typical cluttered scenes of the world { from a set of natural images, second it proposes a general framework of designing reaction-diusion equations for image processing. We start by studying the statistics of natural images including the scale invariant properties, then generic prior models were learned to duplicate the observed statistics, based on the minimax entropy theory studied in two previous papers. The resulting Gibbs distributions have potentials of the form U(I; ; S) = P K I)(x; y)) with S = fF g being a set of lters and = f the potential functions. The learned Gibbs distributions con rm and improve the form of existing prior models such as line-process, but in contrast to all previous models, inverted potentials (i.e. (x) decreasing as a function of jxj) were found to be necessary. We nd that the partial dierential equations given by gradient descent on U(I; ; S) are essentially reaction-diusion equations, where the usual energy terms produce anisotropic diusion while the inverted energy terms produce reaction associated with pattern formation, enhancing preferred image features. We illustrate how these models can be used for texture pattern rendering, denoising, image enhancement and clutter removal by careful choice of both prior and data models of this type, incorporating the appropriate features. Song Chun Zhu is now with the Computer Science Department, Stanford University, Stanford, CA 94305, and David Mumford is with the Division of Applied Mathematics, Brown University, Providence, RI 02912. This work started when the authors were at ...
Feature-Based Methods For Large Scale Dynamic Programming
- Machine Learning
, 1994
"... We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be Combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations, that ..."
Abstract
-
Cited by 119 (6 self)
- Add to MetaCart
We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be Combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations, that is, representations that involve an arbitrarily complex feature extraction stage and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. We also apply one of these algorithms to pro- duce a computer program that plays Tetris at a respectable skill level. Furthermore, we provide a counterexample illustrating the difficulties of integrating compact representations and dynamic programming: which exemplifies the shortcomings of several methods in current practice, including Q-learning and temporal-difference learning.
Orientation histograms for hand gesture recognition
- Mitsubishi Electric Research Labs., 201
, 213
"... We present a method to recognize hand gestures, based on a pattern recognition technique developed by McConnell [16] employing histograms of local orientation. We use the orientation histogram as a feature vector for gesture classification and interpolation. For moving or ¨dynamic gestures ¨ , the h ..."
Abstract
-
Cited by 116 (2 self)
- Add to MetaCart
We present a method to recognize hand gestures, based on a pattern recognition technique developed by McConnell [16] employing histograms of local orientation. We use the orientation histogram as a feature vector for gesture classification and interpolation. For moving or ¨dynamic gestures ¨ , the histogram of the spatio-temporal gradients of image intensity form the analogous feature vector and may be useful for dynamic gesture recognition.

