Results 1-10 of 16
The Nature of Statistical Learning Theory, 1995
Cited by 9946 (28 self)
Abstract
Statistical learning theory was introduced in the late 1960s. Until the 1990s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the mid-1990s, new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory, including both its theoretical and algorithmic aspects. The goal of this overview is to demonstrate how abstract learning theory established conditions for generalization that are more general than those discussed in classical statistical paradigms, and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
Generalization Performance of Regularization Networks and Support . . ., IEEE Transactions on Information Theory, 2001
Cited by 72 (18 self)
Abstract
We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks, by obtaining new bounds on their covering numbers. The proofs make use of a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to explain theoretically the effect of the choice of kernel function on the generalization performance of support vector machines.
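The abstract bounds entropy numbers through the eigenvalues of the integral operator induced by the kernel. A commonly used finite-sample proxy for those eigenvalues is the spectrum of the scaled Gram matrix $K/n$. The Python sketch below is only an illustration under stated assumptions, not the paper's procedure: it uses a Gaussian kernel with a hypothetical width `sigma = 0.2`, 30 evenly spaced inputs on $[0, 1]$, and a hand-written cyclic Jacobi eigenvalue routine so that it needs nothing beyond the standard library.

```python
import math


def gaussian_gram(xs, sigma):
    """Gram matrix K[i][j] = exp(-(x_i - x_j)^2 / (2 sigma^2)) for 1-D inputs."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in xs] for a in xs]


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def scaled_gram_spectrum(n=30, sigma=0.2):
    """Eigenvalues of K/n for n evenly spaced points on [0, 1] (hypothetical setup)."""
    xs = [i / (n - 1) for i in range(n)]
    gram = gaussian_gram(xs, sigma)
    # Dividing by n makes the eigenvalues comparable to those of the integral
    # operator (T f)(x) = int k(x, y) f(y) dP(y) with P uniform on [0, 1].
    return jacobi_eigenvalues([[v / n for v in row] for row in gram])


if __name__ == "__main__":
    for i, lam in enumerate(scaled_gram_spectrum()[:8]):
        print(f"lambda_{i} = {lam:.3e}")
```

The rapid decay of the printed eigenvalues is the kind of spectral information that enters the covering-number bounds described in the abstract; for a Gaussian kernel, a larger `sigma` makes the spectrum decay faster still.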
Theory of classification: A survey of some recent advances, 2005
Cited by 56 (3 self)
Abstract
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.
Quantitative stability in stochastic programming: The method of probability metrics, 2000
Cited by 30 (13 self)
Abstract
Quantitative stability of optimal values and solution sets to stochastic programming problems is studied when the underlying probability distribution varies in some metric space of probability measures. We give conditions that imply that a stochastic program is stable with respect to a minimal information (m.i.) probability metric that is naturally associated with the data of the program. Canonical metrics bounding the m.i. metric are derived for specific models, namely linear two-stage, mixed-integer two-stage, and chance-constrained models. The corresponding quantitative stability results, as well as some consequences for asymptotic properties of empirical approximations, extend earlier results in this direction. In particular, rates of convergence in probability are derived under metric entropy conditions. Finally, we study stability properties of stable investment portfolios having minimal risk with respect to the spectral measure and stability index of the underly...
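One consequence the abstract mentions concerns empirical approximations, in which the true distribution is replaced by the empirical measure of a sample. As a minimal sketch of that idea (the newsvendor model, its `cost` and `price` parameters, and the uniform demand are assumptions chosen for illustration, not models from the paper), the code below solves the sample-average version of $\min_{x \ge 0} \mathbb{E}[c x - p \min(x, D)]$ exactly. For $D \sim U(0,1)$, $c = 1$, $p = 2$, the true solution is $x^* = 0.5$ with optimal value $-0.25$, so the empirical solutions should approach these values as the sample grows.

```python
import random


def saa_newsvendor(demands, cost, price):
    """Exactly minimize (1/n) * sum_i [cost*x - price*min(x, d_i)] over x >= 0.

    The objective is convex and piecewise linear in x with kinks at the samples,
    so it suffices to evaluate it at x = 0 and at each sorted sample.
    """
    d = sorted(demands)
    n = len(d)
    best_x, best_val = 0.0, 0.0  # the objective equals 0 at x = 0
    prefix = 0.0  # running sum of the k smallest samples
    for k, x in enumerate(d, start=1):
        prefix += x
        # Samples <= x contribute themselves; the remaining n - k contribute x.
        expected_sales = (prefix + x * (n - k)) / n
        val = cost * x - price * expected_sales
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (100, 1000, 10000):
        sample = [rng.random() for _ in range(n)]
        print(n, saa_newsvendor(sample, cost=1.0, price=2.0))
```

Here the empirical minimizer is essentially the sample median, so its distance from $x^*$ shrinks at the usual $1/\sqrt{n}$ rate for i.i.d. samples.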
Entropy and the combinatorial dimension, Inventiones Mathematicae, 2003
Cited by 22 (13 self)
Abstract
We solve Talagrand’s entropy problem: the $L_2$-covering numbers of every uniformly bounded class of functions are exponential in its shattering dimension. This extends Dudley’s theorem on classes of {0,1}-valued functions, for which the shattering dimension is the Vapnik-Chervonenkis dimension. In convex geometry, the solution means that the entropy of a convex body K is controlled by the maximal dimension of a cube of a fixed side contained in the coordinate projections of K. This has a number of consequences, including the optimal version of Elton’s theorem and estimates on the uniform central limit theorem in the real-valued case.
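For {0,1}-valued classes the shattering dimension is the VC dimension, i.e. the size of the largest point set on which the class realizes every labeling. The brute-force check below illustrates this for one assumed example class, indicators of closed intervals $[a, b]$ on the real line, whose VC dimension is 2: every labeling of two points is realizable, but the labeling 1, 0, 1 of three ordered points is not. It is a toy sketch for finite point sets only; the function names are hypothetical and nothing here comes from the paper.

```python
from itertools import combinations, product


def interval_realizes(points, labels):
    """True if some closed interval [a, b] contains exactly the points labeled 1."""
    positives = [x for x, y in zip(points, labels) if y == 1]
    if not positives:
        return True  # an interval placed away from every point picks out nothing
    lo, hi = min(positives), max(positives)
    # Any interval covering the positives contains [lo, hi], so the labeling is
    # realizable iff no 0-labeled point lies inside [lo, hi].
    return all(y == 1 for x, y in zip(points, labels) if lo <= x <= hi)


def is_shattered(points):
    """True if intervals realize all 2^n labelings of the given points."""
    return all(interval_realizes(points, labels)
               for labels in product((0, 1), repeat=len(points)))


def vc_dimension_on(points):
    """Largest k such that some k-subset of `points` is shattered by intervals."""
    best = 0
    for k in range(1, len(points) + 1):
        if any(is_shattered(list(subset)) for subset in combinations(points, k)):
            best = k
    return best


if __name__ == "__main__":
    print(vc_dimension_on([0.0, 1.0, 2.0, 3.0]))  # prints 2
```

The exhaustive search is exponential in the number of points, which is fine for a demonstration but is exactly why the combinatorial bounds in the abstract matter for infinite classes.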
Preservation theorems for Glivenko-Cantelli and uniform Glivenko-Cantelli classes, in High Dimensional Probability II, Evarist Giné, 2000
Cited by 21 (9 self)
Abstract
We show that the $P$-Glivenko-Cantelli property of classes of functions $\mathcal{F}_1, \ldots, \mathcal{F}_k$ is preserved by a continuous function $\varphi$ from $\mathbb{R}^k$ to $\mathbb{R}$, in the sense that the new class of functions $x \mapsto \varphi(f_1(x), \ldots, f_k(x))$, $f_i \in \mathcal{F}_i$, $i = 1, \ldots, k$, is again a Glivenko-Cantelli class of functions if it has an integrable envelope. We also prove an analogous result for preservation of the uniform Glivenko-Cantelli property. Corollaries of the main theorem include two preservation theorems of Dudley (1998). We apply the main result to reprove a theorem of Schick and Yu (1999) concerning consistency of the NPMLE in a model for “mixed case” interval censoring. Finally, a version of the consistency result of Schick and Yu (1999) is established for a general model for “mixed case interval censoring” in which a general sample space $\mathcal{Y}$ is partitioned into sets which are members of some VC-class $\mathcal{C}$ of subsets of $\mathcal{Y}$.
1 Glivenko-Cantelli theorems. Let $(\mathcal{X}, \mathcal{A}, P)$ be a probability space, and suppose that $\mathcal{F} \subset L_1(P)$. For
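The classical Glivenko-Cantelli theorem, the prototype for the classes studied here, says that for the class of half-line indicators $\{1_{(-\infty, t]}\}$ the uniform deviation $\sup_t |F_n(t) - F(t)|$ between the empirical and true distribution functions tends to 0 almost surely. A minimal Python sketch, assuming i.i.d. samples from $U(0,1)$ so that $F(t) = t$, computes this supremum exactly from the sorted sample; the function name is hypothetical.

```python
import random


def sup_deviation_uniform(sample):
    """Exact sup_t |F_n(t) - t| for a sample from U(0, 1).

    F_n jumps by 1/n at each order statistic u_(i), so the supremum is attained
    at, or just before, one of the jumps.
    """
    u = sorted(sample)
    n = len(u)
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(u, start=1))


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000, 10000):
        sample = [rng.random() for _ in range(n)]
        print(n, round(sup_deviation_uniform(sample), 4))
```

Because $F_n$ has jumps of size $1/n$, the deviation is always at least $1/(2n)$; the theorem guarantees that it also tends to 0 as $n$ grows.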
Combinatorics of random processes and sections of convex bodies, preprint available at arXiv http://front.math.ucdavis.edu, Banach Space Bulletin http://www.math.okstate.edu/~alspach/banach and our webpages, http://www.math.ucdavis.edu/~vershynin and http
Cited by 7 (2 self)
Abstract
We find a sharp combinatorial bound for the metric entropy of sets in $\mathbb{R}^n$ and general classes of functions. This solves two basic combinatorial conjectures on empirical processes:
1. A class of functions satisfies the uniform Central Limit Theorem if the square root of its combinatorial dimension is integrable.
2. The uniform entropy is equivalent to the combinatorial dimension under minimal regularity.
Our method also constructs a nicely bounded coordinate section of a symmetric convex body in $\mathbb{R}^n$. In operator theory, this essentially proves the restricted invertibility principle of Bourgain and Tzafriri for all normed spaces.
From Uniform Laws of Large Numbers to Uniform Ergodic Theorems
Cited by 4 (1 self)
Abstract
The purpose of these lectures is to present three different approaches, each with its own methods, for establishing uniform laws of large numbers and uniform ergodic theorems for dynamical systems. The presentation follows the principle that the i.i.d. case is considered first in great detail, after which attempts are made to extend these results to more general dependence structures. The lectures begin (Chapter 1) with a review and description of classical laws of large numbers and ergodic theorems, their connection and interplay, and their infinite-dimensional extensions towards uniform theorems with applications to dynamical systems. The first approach (Chapter 2) is that of metric entropy with bracketing, which relies upon the Blum-DeHardt law of large numbers and Hoffmann-Jørgensen’s extension of it. The result extends to general dynamical systems using the uniform ergodic lemma (or Kingman’s subadditive ergodic theorem). In this context, metric entropy and majorizing measure type conditions are also considered. The second approach (Chapter 3) is that of Vapnik and Chervonenkis. It relies
Stochastic Integer Programming: Limit Theorems and Confidence Intervals, INFORMS, doi:10.1287/moor.1060.0222
Some facts about functionals of location and scatter, 2006
Cited by 1 (1 self)
Abstract
Assumptions on a likelihood function, including a local Glivenko-Cantelli condition, imply the existence of M-estimators converging to an M-functional. Scatter-matrix-valued estimators, defined on all empirical measures on $\mathbb{R}^d$ for $d \ge 2$ and equivariant under all affine transformations, including singular ones, are shown to be constants times the sample covariance matrix. So, if weakly continuous, they must be identically 0. Results are stated on existence and differentiability of location and scatter functionals, defined on a weakly dense, weakly open set of laws, via elliptically symmetric t distributions on $\mathbb{R}^d$, following up on work of Kent, Tyler, and Dümbgen.
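Affine equivariance of a scatter estimator $S$ means $S(AX + b) = A\,S(X)\,A^\top$ for every matrix $A$ and vector $b$, singular $A$ included. The sketch below only checks that the ordinary sample covariance has this property on a small, made-up 2-D data set; it does not reproduce the paper's converse result that such equivariant estimators must be multiples of the sample covariance. The data, the matrices and the helper names are assumptions for illustration.

```python
def column_means(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sample_cov(points):
    """Sample covariance matrix (divisor n - 1) of a list of vectors."""
    n, d = len(points), len(points[0])
    m = column_means(points)
    return [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in points) / (n - 1)
             for j in range(d)] for i in range(d)]


def affine_map(points, a, b):
    """Apply x -> A x + b to every point; A may be singular or non-square."""
    return [[sum(a[i][k] * p[k] for k in range(len(p))) + b[i] for i in range(len(a))]
            for p in points]


def congruence(a, s):
    """Return A S A^T."""
    d = len(s)
    return [[sum(a[i][k] * s[k][q] * a[j][q] for k in range(d) for q in range(d))
             for j in range(len(a))] for i in range(len(a))]


def is_affine_equivariant(points, a, b, tol=1e-9):
    """Check S(A X + b) == A S(X) A^T entrywise, up to floating-point tolerance."""
    lhs = sample_cov(affine_map(points, a, b))
    rhs = congruence(a, sample_cov(points))
    return all(abs(x - y) <= tol for lrow, rrow in zip(lhs, rhs) for x, y in zip(lrow, rrow))


if __name__ == "__main__":
    data = [[0.0, 1.0], [2.0, -1.0], [3.0, 4.0], [-1.0, 0.5], [1.5, 2.5]]
    singular = [[1.0, 2.0], [2.0, 4.0]]  # rank 1
    print(is_affine_equivariant(data, singular, [3.0, -1.0]))
```

The translation $b$ drops out because the covariance is computed from centered data, which is why only $A$ appears on the right-hand side.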