Results 1  10
of
237
Local Rademacher complexities
 Annals of Statistics
, 2002
"... We propose new bounds on the error of learning algorithms in terms of a datadependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a ..."
Abstract

Cited by 174 (21 self)
 Add to MetaCart
(Show Context)
We propose new bounds on the error of learning algorithms in terms of a datadependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
Convexity, Classification, and Risk Bounds
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2003
"... Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 01 loss function. The convexity makes these algorithms computationally efficien ..."
Abstract

Cited by 167 (15 self)
 Add to MetaCart
(Show Context)
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 01 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 01 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function: that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we
Empirical margin distributions and bounding the generalization error of combined classifiers
 Ann. Statist
, 2002
"... Dedicated to A.V. Skorohod on his seventieth birthday We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such ..."
Abstract

Cited by 160 (11 self)
 Add to MetaCart
Dedicated to A.V. Skorohod on his seventieth birthday We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods of the theory of Gaussian and empirical processes (comparison inequalities, symmetrization method, concentration inequalities) and they improve previous results of Bartlett (1998) on bounding the generalization error of neural networks in terms of ℓ1norms of the weights of neurons and of Schapire, Freund, Bartlett and Lee (1998) on bounding the generalization error of boosting. We also obtain rates of convergence in Lévy distance of empirical margin distribution to the true margin distribution uniformly over the classes of classifiers and prove the optimality of these rates.
Theory of classification: A survey of some recent advances
, 2005
"... The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results. ..."
Abstract

Cited by 93 (3 self)
 Add to MetaCart
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.
Introduction to Statistical Learning Theory
 In , O. Bousquet, U.v. Luxburg, and G. Rsch (Editors
, 2004
"... ..."
(Show Context)
Concentration inequalities
 ADVANCED LECTURES IN MACHINE LEARNING
, 2004
"... Concentration inequalities deal with deviations of functions of independent random variables from their expectation. In the last decade new tools have been introduced making it possible to establish simple and powerful inequalities. These inequalities are at the heart of the mathematical analysis o ..."
Abstract

Cited by 89 (1 self)
 Add to MetaCart
(Show Context)
Concentration inequalities deal with deviations of functions of independent random variables from their expectation. In the last decade new tools have been introduced making it possible to establish simple and powerful inequalities. These inequalities are at the heart of the mathematical analysis of various problems in machine learning and made it possible to derive new efficient algorithms. This text attempts to summarize some of the basic tools.
Rademacher Processes And Bounding The Risk Of Function Learning
 High Dimensional Probability II
, 1999
"... We construct data dependent upper bounds on the risk in function learning problems. The bounds are based on the local norms of the Rademacher process indexed by the underlying function class and they do not require prior knowledge about the distribution of training examples or any specific propertie ..."
Abstract

Cited by 58 (7 self)
 Add to MetaCart
(Show Context)
We construct data dependent upper bounds on the risk in function learning problems. The bounds are based on the local norms of the Rademacher process indexed by the underlying function class and they do not require prior knowledge about the distribution of training examples or any specific properties of the function class. Using Talagrand's type concentration inequalities for empirical and Rademacher processes, we show that the bounds hold with high probability that decreases exponentially fast when the sample size grows. In typical situations that are frequently encountered in the theory of function learning, the bounds give nearly optimal rate of convergence of the risk to zero. 1. Local Rademacher norms and bounds on the risk: main results Let (S; A) be a measurable space and let F be a class of Ameasurable functions from S into [0; 1]: Denote P(S) the set of all probability measures on (S; A): Let f 0 2 F be an unknown target function. Given a probability measure P 2 P(S) (also unknown), let (X 1 ; : : : ; Xn ) be an i.i.d. sample in (S; A) with common distribution P (defined on a probability space(\Omega ; \Sigma; P)). In computer learning theory, the problem of estimating f 0 ; based on the labeled sample (X 1 ; Y 1 ); : : : ; (Xn ; Yn ); where Y j := f 0 (X j ); j = 1; : : : ; n; is referred to as function learning problem. The so called concept learning is a special case of function learning. In this case, F := fI C : C 2 Cg; where C ae A is called a class of concepts (see Vapnik (1998), Vidyasagar (1996), Devroye, Gyorfi and Lugosi (1996) for the account on statistical learning theory). The goal of function learning is to find an estimate
Linearly Parameterized Bandits
, 2008
"... We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an rdimensional random vector Z ∈ Rr, where r ≥ 2. The objective is to choose a sequence of arms to minimize the cumulative regret and Bayes risk. W ..."
Abstract

Cited by 57 (0 self)
 Add to MetaCart
We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an rdimensional random vector Z ∈ Rr, where r ≥ 2. The objective is to choose a sequence of arms to minimize the cumulative regret and Bayes risk. We propose a policy based on least squares estimation and uncertainty ellipsoids, which generalizes the upper confidence index approach pioneered by Lai and Robbins (1985). The cumulative regret and Bayes risk under our proposed policy admits an upper bound of the form r √ T log 3/2 T, which is linear in the dimension r, and independent of the number of arms. We also establish Ω(r √ T) lower bounds on the regret and risk, showing that our proposed policy is nearly optimal.
Empirical graph Laplacian approximation of LaplaceBeltrami operators: Large sample results
, 2006
"... Let M be a compact Riemannian submanifold of R m of dimension d and let X1,..., Xn be a sample of i.i.d. points in M with uniform distribution. We study the random operators ∆hn,nf(p):= 1 nh d+2 ..."
Abstract

Cited by 47 (0 self)
 Add to MetaCart
(Show Context)
Let M be a compact Riemannian submanifold of R m of dimension d and let X1,..., Xn be a sample of i.i.d. points in M with uniform distribution. We study the random operators ∆hn,nf(p):= 1 nh d+2