Results 1  10
of
53
The Nature of Statistical Learning Theory
, 1995
"... Abstract—Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based on ..."
Abstract

Cited by 8950 (28 self)
 Add to MetaCart
Abstract—Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
Regularization networks and support vector machines
 Advances in Computational Mathematics
, 2000
"... Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization a ..."
Abstract

Cited by 266 (33 self)
 Add to MetaCart
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
Scalesensitive Dimensions, Uniform Convergence, and Learnability
, 1997
"... Learnability in Valiant's PAC learning model has been shown to be strongly related to the existence of uniform laws of large numbers. These laws define a distributionfree convergence property of means to expectations uniformly over classes of random variables. Classes of realvalued functions enjoy ..."
Abstract

Cited by 208 (1 self)
 Add to MetaCart
Learnability in Valiant's PAC learning model has been shown to be strongly related to the existence of uniform laws of large numbers. These laws define a distributionfree convergence property of means to expectations uniformly over classes of random variables. Classes of realvalued functions enjoying such a property are also known as uniform GlivenkoCantelli classes. In this paper we prove, through a generalization of Sauer's lemma that may be interesting in its own right, a new characterization of uniform GlivenkoCantelli classes. Our characterization yields Dudley, Gin'e, and Zinn's previous characterization as a corollary. Furthermore, it is the first based on a simple combinatorial quantity generalizing the VapnikChervonenkis dimension. We apply this result to obtain the weakest combinatorial condition known to imply PAC learnability in the statistical regression (or "agnostic") framework. Furthermore, we show a characterization of learnability in the probabilistic concept model, solving an open problem posed by Kearns and Schapire. These results show that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
A Bayesian/information theoretic model of learning to learn via multiple task sampling
 Machine Learning
, 1997
"... Abstract. A Bayesian model of learning to learn by sampling from multiple tasks is presented. The multiple tasks are themselves generated by sampling from a distribution over an environment of related tasks. Such an environment is shown to be naturally modelled within a Bayesian context by the conce ..."
Abstract

Cited by 79 (2 self)
 Add to MetaCart
Abstract. A Bayesian model of learning to learn by sampling from multiple tasks is presented. The multiple tasks are themselves generated by sampling from a distribution over an environment of related tasks. Such an environment is shown to be naturally modelled within a Bayesian context by the concept of an objective prior distribution. It is argued that for many common machine learning problems, although in general we do not know the true (objective) prior for the problem, we do have some idea of a set of possible priors to which the true prior belongs. It is shown that under these circumstances a learner can use Bayesian inference to learn the true prior by learning sufficiently many tasks from the environment. In addition, bounds are given on the amount of information required to learn a task when it is simultaneously learnt with several other tasks. The bounds show that if the learner has little knowledge of the true prior, but the dimensionality of the true prior is small, then sampling multiple tasks is highly advantageous. The theory is applied to the problem of learning a common feature set or equivalently a lowdimensionalrepresentation (LDR) for an environment of related tasks.
Generalization Performance of Regularization Networks and Support . . .
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 2001
"... We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks by obtaining new bounds on their covering numbers. The proofs make use of a viewpoint that is apparently novel in the field of statistical learning theory. The hy ..."
Abstract

Cited by 73 (20 self)
 Add to MetaCart
We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks by obtaining new bounds on their covering numbers. The proofs make use of a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinitedimensional unit ball in feature space into a finitedimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to theoretically explain the effect of the choice of kernel function on the generalization performance of support vector machines.
On Randomized OneRound Communication Complexity
 Computational Complexity
, 1995
"... We present several results regarding randomized oneround communication complexity. Our results include a connection to the VCdimension, a study of the problem of computing the inner product of two real valued vectors, and a relation between \simultaneous" protocols and oneround protocols. Key wor ..."
Abstract

Cited by 56 (0 self)
 Add to MetaCart
We present several results regarding randomized oneround communication complexity. Our results include a connection to the VCdimension, a study of the problem of computing the inner product of two real valued vectors, and a relation between \simultaneous" protocols and oneround protocols. Key words. Communication Complexity; Oneround and simultaneous protocols; VCdimension; Subject classications. 68Q25. 1.
A unified framework for Regularization Networks and Support Vector Machines
, 1999
"... This report describers research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by theN ational Science Foundation under contractN o. IIS9800032, the O#ce ofN aval Researc ..."
Abstract

Cited by 50 (13 self)
 Add to MetaCart
This report describers research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by theN ational Science Foundation under contractN o. IIS9800032, the O#ce ofN aval Research under contractN o.N 0001493 10385 and contractN o.N 000149510600. Partial support was also provided by DaimlerBenz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T. Contents Introductic 3 2 OverviF of stati.48EF learni4 theory 5 2.1 Unifo6 Co vergence and the VapnikChervo nenkis bo und ............. 7 2.2 The metho d o Structural Risk Minimizatio ..................... 10 2.3 #unifo8 co vergence and the V # ..................... 10 2.4 Overviewo fo urappro6 h ............................... 13 3 Reproduci9 Kernel HiT ert Spaces: a briL overviE 14 4RegulariEqq.L Networks 16 4.1 Radial Basis Functio8 ................................. 19 4.2 Regularizatioz generalized splines and kernel smo oxy rs .............. 20 4.3 Dual representatio o f Regularizatio Netwo rks ................... 21 4.4 Fro regressioto 5 Support vector machiT9 22 5.1 SVMin RKHS ..................................... 22 5.2 Fro regressioto 6SRMforRNsandSVMs 26 6.1 SRMfo SVMClassificatio .............................. 28 6.1.1 Distributio dependent bo undsfo SVMC .................. 29 7 A BayesiL Interpretatiq ofRegulariTFqEL and SRM? 30 7.1 Maximum A Po terio6 Interpretatio o f ............... 30 7.2 Bayesian interpretatio o f the stabilizer in the RN andSVMfunctio6I6 ...... 32 7.3 Bayesian interpretatio o f the data term in the Regularizatio andSVMfunctioy8 33 7.4 Why a MAP interpretatio may be misleading .................... 33 Connectine between SVMs and Sparse Ap...
Smooth Boosting and Learning with Malicious Noise
 Journal of Machine Learning Research
, 2003
"... We describe a new boosting algorithm which generates only smooth distributions which do not assign too much weight to any single example. We show that this new boosting algorithm can be used to construct efficient PAC learning algorithms which tolerate relatively high rates of malicious noise. In pa ..."
Abstract

Cited by 40 (6 self)
 Add to MetaCart
We describe a new boosting algorithm which generates only smooth distributions which do not assign too much weight to any single example. We show that this new boosting algorithm can be used to construct efficient PAC learning algorithms which tolerate relatively high rates of malicious noise. In particular, we use the new smooth boosting algorithm to construct malicious noise tolerant versions of the PACmodel pnorm linear threshold learning algorithms described in [23]. The bounds on sample complexity and malicious noise tolerance of these new PAC algorithms closely correspond to known bounds for the online p...
Function Learning from Interpolation
, 1998
"... In this paper, we study a statistical property of classes of realvalued functions that we call approximation from interpolated examples. We derive a characterisation of function classes that have this property, in terms of their `fatshattering function', a notion that has proven useful in comput ..."
Abstract

Cited by 28 (15 self)
 Add to MetaCart
In this paper, we study a statistical property of classes of realvalued functions that we call approximation from interpolated examples. We derive a characterisation of function classes that have this property, in terms of their `fatshattering function', a notion that has proven useful in computational learning theory. The property is central to a problem of learning realvalued functions from random examples in which we require satisfactory performance from every algorithm that returns a function which approximately interpolates the training examples.
Nonparametric time series prediction through adaptive model selection
 Machine Learning
, 2000
"... Abstract. We consider the problem of onestep ahead prediction for time series generated by an underlying stationary stochastic process obeying the condition of absolute regularity, describing the mixing nature of process. We make use of recent results from the theory of empirical processes, and ada ..."
Abstract

Cited by 28 (0 self)
 Add to MetaCart
Abstract. We consider the problem of onestep ahead prediction for time series generated by an underlying stationary stochastic process obeying the condition of absolute regularity, describing the mixing nature of process. We make use of recent results from the theory of empirical processes, and adapt the uniform convergence framework of Vapnik and Chervonenkis to the problem of time series prediction, obtaining finite sample bounds. Furthermore, by allowing both the model complexity and memory size to be adaptively determined by the data, we derive nonparametric rates of convergence through an extension of the method of structural risk minimization suggested by Vapnik. All our results are derived for general L p error measures, and apply to both exponentially and algebraically mixing processes.