Results 1  10
of
166
The Nature of Statistical Learning Theory
, 1995
"... Abstract—Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based on ..."
Abstract

Cited by 8950 (28 self)
 Add to MetaCart
Abstract—Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
Regularization networks and support vector machines
 Advances in Computational Mathematics
, 2000
"... Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization a ..."
Abstract

Cited by 266 (33 self)
 Add to MetaCart
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
The Sample Complexity of Pattern Classification With Neural Networks: The Size of the Weights is More Important Than the Size of the Network
, 1997
"... Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the ne ..."
Abstract

Cited by 177 (15 self)
 Add to MetaCart
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a twolayer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A³ p (log n)=m (ignori...
Stability and Generalization
, 2001
"... We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leaveoneout error. The methods we use can be applied in the regression framework as well as in the classification one when the classif ..."
Abstract

Cited by 167 (6 self)
 Add to MetaCart
We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leaveoneout error. The methods we use can be applied in the regression framework as well as in the classification one when the classifier is obtained by thresholding a realvalued function. We study the stability properties of large classes of learning algorithms such as regularization based algorithms. In particular we focus on Hilbert space regularization and KullbackLeibler regularization. We demonstrate how to apply the results to SVM for regression and classification.
The mathematics of learning: Dealing with data
 Notices of the American Mathematical Society
, 2003
"... Draft for the Notices of the AMS Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe a key algorithm of it. 1 ..."
Abstract

Cited by 103 (15 self)
 Add to MetaCart
Draft for the Notices of the AMS Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe a key algorithm of it. 1
Generalization Performance of Regularization Networks and Support . . .
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 2001
"... We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks by obtaining new bounds on their covering numbers. The proofs make use of a viewpoint that is apparently novel in the field of statistical learning theory. The hy ..."
Abstract

Cited by 73 (20 self)
 Add to MetaCart
We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks by obtaining new bounds on their covering numbers. The proofs make use of a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinitedimensional unit ball in feature space into a finitedimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to theoretically explain the effect of the choice of kernel function on the generalization performance of support vector machines.
Fatshattering and the learnability of realvalued functions
 Journal of Computer and System Sciences
, 1996
"... We consider the problem of learning realvalued functions from random examples when the function values are corrupted with noise. With mild conditions on independent observation noise, we provide characterizations of the learnability of a realvalued function class in terms of a generalization of th ..."
Abstract

Cited by 61 (10 self)
 Add to MetaCart
We consider the problem of learning realvalued functions from random examples when the function values are corrupted with noise. With mild conditions on independent observation noise, we provide characterizations of the learnability of a realvalued function class in terms of a generalization of the VapnikChervonenkis dimension, the fatshattering function, introduced by Kearns and Schapire. We show that, given some restrictions on the noise, a function class is learnable in our model if and only if its fatshattering function is finite. With different (also quite mild) restrictions, satisfied for example by gaussian noise, we show that a function class is learnable from polynomially many examples if and only if its fatshattering function grows polynomially. We prove analogous results in an agnostic setting, where there is no assumption of an underlying function class. 1
On Randomized OneRound Communication Complexity
 Computational Complexity
, 1995
"... We present several results regarding randomized oneround communication complexity. Our results include a connection to the VCdimension, a study of the problem of computing the inner product of two real valued vectors, and a relation between \simultaneous" protocols and oneround protocols. Key wor ..."
Abstract

Cited by 56 (0 self)
 Add to MetaCart
We present several results regarding randomized oneround communication complexity. Our results include a connection to the VCdimension, a study of the problem of computing the inner product of two real valued vectors, and a relation between \simultaneous" protocols and oneround protocols. Key words. Communication Complexity; Oneround and simultaneous protocols; VCdimension; Subject classications. 68Q25. 1.
Introduction to Statistical Learning Theory
 In , O. Bousquet, U.v. Luxburg, and G. Rsch (Editors
, 2004
"... ..."