Results 1  10
of
395
The Nature of Statistical Learning Theory
, 1999
"... Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based on the deve ..."
Abstract

Cited by 12976 (32 self)
 Add to MetaCart
Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems. A more
Shape Matching and Object Recognition Using Shape Contexts
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2001
"... We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by (1) solv ing for correspondences between points on the two shapes, (2) using the correspondences to estimate an aligning transform ..."
Abstract

Cited by 1787 (21 self)
 Add to MetaCart
(Show Context)
We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by (1) solv ing for correspondences between points on the two shapes, (2) using the correspondences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descriptor, the shape context, to each point. The shape context at a reference point captures the distribution of the remaining points relative to it, thus offering a globally discriminative characterization. Corresponding points on two similar shapes will have similar shape con texts, enabling us to solve for correspondences as an optimal assignment problem. Given the point correspondences, we estimate the transformation that best aligns the two shapes; reg ularized thin plate splines provide a flexible class of transformation maps for this purpose. The dissimilarity between the two shapes is computed as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning trans form. We treat recognition in a nearestneighbor classification framework as the problem of finding the stored prototype shape that is maximally similar to that in the image. Results are presented for silhouettes, trademarks, handwritten digits and the COIL dataset.
Regularization networks and support vector machines
 Advances in Computational Mathematics
, 2000
"... Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization a ..."
Abstract

Cited by 366 (38 self)
 Add to MetaCart
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
Gaussian Processes for Regression
 Advances in Neural Information Processing Systems 8
, 1996
"... The Bayesian analysis of neural networks is difficult because a simple prior over weights implies a complex prior distribution over functions. In this paper we investigate the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis for fixed values of hyperparame ..."
Abstract

Cited by 267 (21 self)
 Add to MetaCart
The Bayesian analysis of neural networks is difficult because a simple prior over weights implies a complex prior distribution over functions. In this paper we investigate the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis for fixed values of hyperparameters to be carried out exactly using matrix operations. Two methods, using optimization and averaging (via Hybrid Monte Carlo) over hyperparameters have been tested on a number of challenging problems and have produced excellent results.
An equivalence between sparse approximation and Support Vector Machines
 A.I. Memo 1606, MIT Arti cial Intelligence Laboratory
, 1997
"... This publication can be retrieved by anonymous ftp to publications.ai.mit.edu. The pathname for this publication is: aipublications/15001999/AIM1606.ps.Z This paper shows a relationship between two di erent approximation techniques: the Support Vector Machines (SVM), proposed by V.Vapnik (1995), ..."
Abstract

Cited by 248 (7 self)
 Add to MetaCart
This publication can be retrieved by anonymous ftp to publications.ai.mit.edu. The pathname for this publication is: aipublications/15001999/AIM1606.ps.Z This paper shows a relationship between two di erent approximation techniques: the Support Vector Machines (SVM), proposed by V.Vapnik (1995), and a sparse approximation scheme that resembles the Basis Pursuit DeNoising algorithm (Chen, 1995 � Chen, Donoho and Saunders, 1995). SVM is a technique which can be derived from the Structural Risk Minimization Principle (Vapnik, 1982) and can be used to estimate the parameters of several di erent approximation schemes, including Radial Basis Functions, algebraic/trigonometric polynomials, Bsplines, and some forms of Multilayer Perceptrons. Basis Pursuit DeNoising is a sparse approximation technique, in which a function is reconstructed by using a small number of basis functions chosen from a large set (the dictionary). We show that, if the data are noiseless, the modi ed version of Basis Pursuit DeNoising proposed in this paper is equivalent to SVM in the following sense: if applied to the same data set the two techniques give the same solution, which is obtained by solving the same quadratic programming problem. In the appendix we also present a derivation of the SVM technique in the framework of regularization theory, rather than statistical learning theory, establishing a connection between SVM, sparse approximation and regularization theory.
Sparse Greedy Matrix Approximation for Machine Learning
, 2000
"... In kernel based methods such as Regularization Networks large datasets pose signi cant problems since the number of basis functions required for an optimal solution equals the number of samples. We present a sparse greedy approximation technique to construct a compressed representation of the ..."
Abstract

Cited by 224 (10 self)
 Add to MetaCart
In kernel based methods such as Regularization Networks large datasets pose signi cant problems since the number of basis functions required for an optimal solution equals the number of samples. We present a sparse greedy approximation technique to construct a compressed representation of the design matrix. Experimental results are given and connections to KernelPCA, Sparse Kernel Feature Analysis, and Matching Pursuit are pointed out. 1. Introduction Many recent advances in machine learning such as Support Vector Machines [Vapnik, 1995], Regularization Networks [Girosi et al., 1995], or Gaussian Processes [Williams, 1998] are based on kernel methods. Given an msample f(x 1 ; y 1 ); : : : ; (x m ; y m )g of patterns x i 2 X and target values y i 2 Y these algorithms minimize the regularized risk functional min f2H R reg [f ] = 1 m m X i=1 c(x i ; y i ; f(x i )) + 2 kfk 2 H : (1) Here H denotes a reproducing kernel Hilbert space (RKHS) [Aronszajn, 1950],...
A Generalized Representer Theorem
 In Proceedings of the Annual Conference on Computational Learning Theory
, 2001
"... Wahba's classical representer theorem states that the solutions of certain risk minimization problems involving an empirical risk term and a quadratic regularizer can be written as expansions in terms of the training examples. We generalize the theorem to a larger class of regularizers and ..."
Abstract

Cited by 220 (18 self)
 Add to MetaCart
(Show Context)
Wahba's classical representer theorem states that the solutions of certain risk minimization problems involving an empirical risk term and a quadratic regularizer can be written as expansions in terms of the training examples. We generalize the theorem to a larger class of regularizers and empirical risk terms, and give a selfcontained proof utilizing the feature space associated with a kernel. The result shows that a wide range of problems have optimal solutions that live in the finite dimensional span of the training examples mapped into feature space, thus enabling us to carry out kernel algorithms independent of the (potentially infinite) dimensionality of the feature space.
Face Recognition From One Example View
, 1995
"... To create a poseinvariant face recognizer, one strategy is the viewbased approach, which uses a set of example views at different poses. But what if we only have one example view available, such as a scanned passport photo  can we still recognize faces under different poses? Given one example vi ..."
Abstract

Cited by 169 (5 self)
 Add to MetaCart
To create a poseinvariant face recognizer, one strategy is the viewbased approach, which uses a set of example views at different poses. But what if we only have one example view available, such as a scanned passport photo  can we still recognize faces under different poses? Given one example view at a known pose, it is still possible to use the viewbased approach by exploiting prior knowledge of faces to generate virtual views, or views of the face as seen from different poses. To represent prior knowledge, we use 2D example views of prototype faces under different rotations. We will develop examplebased techniques for applying the rotation seen in the prototypes to essentially "rotate" the single real view which is available. Next, the combined set of one real and multiple virtual views is used as example views in a viewbased, poseinvariant face recognizer. Our experiments suggest that for expressing prior knowledge of faces, 2D examplebased approaches should be considered ...
The mathematics of learning: Dealing with data
 Notices of the American Mathematical Society
, 2003
"... Draft for the Notices of the AMS Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe a key algorithm of it. 1 ..."
Abstract

Cited by 167 (18 self)
 Add to MetaCart
(Show Context)
Draft for the Notices of the AMS Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe a key algorithm of it. 1
Kernel partial least squares regression in reproducing kernel hilbert space
 Journal of Machine Learning Research
, 2001
"... A family of regularized least squares regression models in a Reproducing Kernel Hilbert Space is extended by the kernel partial least squares (PLS) regression model. Similar to principal components regression (PCR), PLS is a method based on the projection of input (explanatory) variables to the late ..."
Abstract

Cited by 151 (10 self)
 Add to MetaCart
A family of regularized least squares regression models in a Reproducing Kernel Hilbert Space is extended by the kernel partial least squares (PLS) regression model. Similar to principal components regression (PCR), PLS is a method based on the projection of input (explanatory) variables to the latent variables (components). However, in contrast to PCR, PLS creates the components by modeling the relationship between input and output variables while maintaining most of the information in the input variables. PLS is useful in situations where the number of explanatory variables exceeds the number of observations and/or a high level of multicollinearity among those variables is assumed. Motivated by this fact we will provide a kernel PLS algorithm for construction of nonlinear regression models in possibly highdimensional feature spaces. We give the theoretical description of the kernel PLS algorithm and we experimentally compare the algorithm with the existing kernel PCR and kernel ridge regression techniques. We will demonstrate that on the data sets employed kernel PLS achieves the same results as kernel PCR but uses significantly fewer, qualitatively different components. 1.