Results 1-10 of 73
Stability and Generalization
2001
Cited by 167 (6 self)
We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error. The methods we use can be applied in the regression framework as well as in the classification one when the classifier is obtained by thresholding a real-valued function. We study the stability properties of large classes of learning algorithms such as regularization-based algorithms. In particular we focus on Hilbert space regularization and Kullback-Leibler regularization. We demonstrate how to apply the results to SVM for regression and classification.
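As a rough illustration of the stability idea for a regularization-based algorithm, the following Python sketch fits a one-dimensional ridge regressor and measures how much the learned weight moves when a single training point is removed. Everything here is an assumption made for illustration and is not taken from the paper: the scalar model, the objective (1/m) Σ (w·x − y)² + λ·w², the synthetic uniform data, and the use of the weight change as a proxy for the change in loss that formal stability definitions bound.

```python
import random

def ridge_weight(points, lam):
    """Minimise (1/m) * sum (w*x - y)^2 + lam * w^2 over a scalar w (closed form)."""
    m = len(points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, y in points)
    return sxy / (sxx + lam * m)

def max_loo_change(points, lam):
    """Largest change in the learned weight when one training point is removed."""
    w_full = ridge_weight(points, lam)
    return max(
        abs(w_full - ridge_weight(points[:i] + points[i + 1:], lam))
        for i in range(len(points))
    )

rng = random.Random(0)   # fixed seed so the run is repeatable
lam = 0.5                # hypothetical regularization strength
changes = {}
for m in (50, 200, 800):
    # Synthetic sample with inputs and targets in [-1, 1] (illustrative only).
    sample = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(m)]
    changes[m] = max_loo_change(sample, lam)
    print(m, changes[m])
```

In this toy setting the printed change would be expected to shrink roughly in proportion to 1/m for fixed λ, which is the kind of behaviour that stability-based generalization bounds rely on.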
Empirical margin distributions and bounding the generalization error of combined classifiers
Ann. Statist., 2002
Cited by 112 (8 self)
Dedicated to A.V. Skorohod on his seventieth birthday. We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods of the theory of Gaussian and empirical processes (comparison inequalities, symmetrization method, concentration inequalities) and they improve previous results of Bartlett (1998) on bounding the generalization error of neural networks in terms of ℓ1-norms of the weights of neurons and of Schapire, Freund, Bartlett and Lee (1998) on bounding the generalization error of boosting. We also obtain rates of convergence in Lévy distance of empirical margin distribution to the true margin distribution uniformly over the classes of classifiers and prove the optimality of these rates.
The Thermodynamic Limit in Mean Field Spin Glass Models
Commun. Math. Phys., 2002
Cited by 68 (18 self)
We present a simple strategy to show the existence and uniqueness of the infinite volume limit of thermodynamic quantities for a large class of mean field disordered models, such as the Sherrington-Kirkpatrick model and the Derrida p-spin model. The main argument is based on a smooth interpolation between a large system, made of $N$ spin sites, and two similar but independent subsystems, made of $N_1$ and $N_2$ sites, respectively, with $N_1 + N_2 = N$. The quenched average of the free energy turns out to be subadditive with respect to the size of the system. This immediately gives convergence of the free energy per site in the infinite volume limit. Moreover, a simple argument, based on concentration of measure, gives the almost sure convergence with respect to the external noise. Similar results also hold for the ground state energy per site.
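The step from subadditivity to convergence is the standard subadditive (Fekete) lemma. As a sketch, write $a_N$ for the quenched average of the free energy at system size $N$; this symbol is notation assumed here for illustration, and the sign convention follows the abstract's statement that the quantity is subadditive.

```latex
a_{N_1 + N_2} \le a_{N_1} + a_{N_2} \quad \text{for all } N_1, N_2 \ge 1
\qquad \Longrightarrow \qquad
\lim_{N \to \infty} \frac{a_N}{N} \;=\; \inf_{N \ge 1} \frac{a_N}{N}.
```

The limit always exists in $[-\infty, \infty)$, and it is finite whenever $a_N / N$ is bounded below.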
Concentration of the Spectral Measure for Large Matrices
2000
Cited by 65 (11 self)
We derive concentration inequalities for functions of the empirical measure of eigenvalues for large, random, self-adjoint matrices, with not necessarily Gaussian entries. The results presented apply in particular to non-Gaussian Wigner and Wishart matrices. We also provide concentration bounds for non-commutative functionals of random matrices.

1 Introduction and statement of results

Consider a random $N \times N$ Hermitian matrix $X$ with i.i.d. complex entries (except for the symmetry constraint) satisfying a moment condition. It is well known since Wigner [28] that the spectral measure of $N^{-1/2} X$ converges to the semicircle law. This observation has been generalized to a large class of matrices, e.g. sample covariance matrices of the form $X R X^*$ where $R$ is a deterministic diagonal matrix ([19]), band matrices (see [5, 16, 20]), etc. For the Wigner case, this convergence has been supplemented by Central Limit Theorems, see [15] for the case of Gaussian entries and [17], [22] for the gen...
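To make the convergence to the semicircle law concrete, here is a minimal Python sketch that computes the fourth moment of the empirical spectral measure of $N^{-1/2} X$ for a real symmetric matrix with independent ±1 entries. The semicircle law on $[-2, 2]$ has fourth moment 2, so the values should land near 2 and vary little across seeds. The choice of ±1 entries, the matrix size, the seeds, and the function name are assumptions made for illustration and do not come from the paper.

```python
import random

def wigner_fourth_moment(n, seed):
    """Return (1/n) * tr(M^4) for M = X / sqrt(n), X symmetric with +/-1 entries."""
    rng = random.Random(seed)
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Fill the upper triangle and mirror it to keep X symmetric.
            x[i][j] = x[j][i] = rng.choice((-1.0, 1.0))
    # (X^2)_{ij} is the dot product of rows i and j because X is symmetric.
    # tr(M^4) equals the sum of squared entries of M^2 = X^2 / n.
    total = 0.0
    for i in range(n):
        row_i = x[i]
        for j in range(n):
            s = sum(a * b for a, b in zip(row_i, x[j]))
            total += (s / n) ** 2
    return total / n

# Fourth spectral moment for a few independent matrices (hypothetical sizes and seeds).
moments = [wigner_fourth_moment(80, seed) for seed in range(4)]
print(moments)
```

The trace of a power is a function of the empirical eigenvalue measure, so the small spread of these values across seeds is a simple instance of the concentration phenomenon the abstract describes.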
Moment Inequalities for Functions of Independent Random Variables
Cited by 39 (9 self)
this paper is to provide such general-purpose inequalities. Our approach is based on a generalization of Ledoux's entropy method (see [26, 28]). Ledoux's method relies on abstract functional inequalities known as logarithmic Sobolev inequalities and provides a powerful tool for deriving exponential inequalities for functions of independent random variables; see Boucheron, Massart, and Lugosi [6, 7], Bousquet [8], Devroye [14], Massart [30, 31], Rio [36] for various applications. To derive moment inequalities for general functions of independent random variables, we elaborate on the pioneering work of Latała and Oleszkiewicz [25] and describe so-called φ-Sobolev inequalities, which interpolate between Poincaré's inequality and logarithmic Sobolev inequalities (see also Beckner [4] and Bobkov's arguments in [26]).

AMS 1991 subject classifications. Primary 60E15, 60C05, 28A35; Secondary 05C80.
Key words and phrases. Moment inequalities; Concentration inequalities; Empirical processes; Random graphs.
Supported by EU Working Group RANDAPX, binational PROCOPE Grant 05923XL. The work of the third author was supported by the Spanish Ministry of Science and Technology and FEDER, grant BMF200303324.
Rademacher Processes And Bounding The Risk Of Function Learning
High Dimensional Probability II, 1999
Cited by 39 (6 self)
We construct data-dependent upper bounds on the risk in function learning problems. The bounds are based on the local norms of the Rademacher process indexed by the underlying function class, and they do not require prior knowledge about the distribution of training examples or any specific properties of the function class. Using Talagrand-type concentration inequalities for empirical and Rademacher processes, we show that the bounds hold with high probability that decreases exponentially fast when the sample size grows. In typical situations that are frequently encountered in the theory of function learning, the bounds give a nearly optimal rate of convergence of the risk to zero.

1. Local Rademacher norms and bounds on the risk: main results

Let $(S, \mathcal{A})$ be a measurable space and let $\mathcal{F}$ be a class of $\mathcal{A}$-measurable functions from $S$ into $[0, 1]$. Denote by $\mathcal{P}(S)$ the set of all probability measures on $(S, \mathcal{A})$. Let $f_0 \in \mathcal{F}$ be an unknown target function. Given a probability measure $P \in \mathcal{P}(S)$ (also unknown), let $(X_1, \dots, X_n)$ be an i.i.d. sample in $(S, \mathcal{A})$ with common distribution $P$ (defined on a probability space $(\Omega, \Sigma, \mathbb{P})$). In computer learning theory, the problem of estimating $f_0$, based on the labeled sample $(X_1, Y_1), \dots, (X_n, Y_n)$, where $Y_j := f_0(X_j)$, $j = 1, \dots, n$, is referred to as the function learning problem. The so-called concept learning is a special case of function learning. In this case, $\mathcal{F} := \{I_C : C \in \mathcal{C}\}$, where $\mathcal{C} \subset \mathcal{A}$ is called a class of concepts (see Vapnik (1998), Vidyasagar (1996), Devroye, Gyorfi and Lugosi (1996) for an account of statistical learning theory). The goal of function learning is to find an estimate
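A data-dependent quantity of the kind these bounds build on can be estimated directly from a sample. The Python sketch below gives a Monte Carlo estimate of the global sup-norm of the Rademacher process, $E_\sigma \sup_{f} |n^{-1} \sum_i \sigma_i f(X_i)|$, over a small finite class of indicator functions. It is a simplification and not the paper's construction: the paper works with local norms, while the finite threshold class, the uniform sample, the number of sign draws, and all function names here are assumptions chosen for illustration.

```python
import random

def empirical_rademacher(sample, functions, draws=200, seed=0):
    """Monte Carlo estimate of E_sigma sup_f |(1/n) * sum_i sigma_i * f(x_i)|."""
    rng = random.Random(seed)
    n = len(sample)
    # Precompute f(x_i) for every function in the (finite) class.
    values = [[f(x) for x in sample] for f in functions]
    total = 0.0
    for _ in range(draws):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]  # random signs
        total += max(
            abs(sum(s * v for s, v in zip(sigma, row))) / n for row in values
        )
    return total / draws

def threshold(t):
    """Indicator of the concept C = (-inf, t], a function into [0, 1]."""
    return lambda x: 1.0 if x <= t else 0.0

data_rng = random.Random(1)
functions = [threshold(k / 10) for k in range(11)]  # hypothetical concept class
estimates = {}
for n in (25, 100, 400):
    sample = [data_rng.random() for _ in range(n)]  # i.i.d. uniform sample on [0, 1]
    estimates[n] = empirical_rademacher(sample, functions)
    print(n, estimates[n])
```

The estimate uses only the observed sample, with no knowledge of the distribution that generated it, which is the sense in which such bounds are data dependent.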
Nonasymptotic theory of random matrices: extreme singular values
Proceedings of the International Congress of Mathematicians, 2010