Rademacher penalties and structural risk minimization (2001)

by V Koltchinskii

Results 1 - 10 of 46

Optimal Aggregation of Classifiers in Statistical Learning

by Alexandre B. Tsybakov, 2001
Cited by 100 (4 self)
The problem of statistical learning can be considered as a problem of nonparametric estimation of sets, where the risk is defined by means of a specific distance function between sets associated to the misclassification error. The rates of convergence of classifiers depend on two parameters: the complexity of the class of candidate sets and the "margin" parameter. The dependence is explicitly given; in particular, the optimal rates up to O(n^{-1}) can be attained, where n is the sample size, and the proposed classifiers have the property of robustness to the margin. The main result of the paper concerns optimal aggregation of classifiers: we suggest a classifier that automatically adapts both to the complexity and to the margin, and attains the optimal fast rates, up to a logarithmic factor.

Empirical margin distributions and bounding the generalization error of combined classifiers

by V. Koltchinskii, D. Panchenko - Ann. Statist., 2002
Cited by 90 (9 self)
Dedicated to A.V. Skorohod on his seventieth birthday. We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods of the theory of Gaussian and empirical processes (comparison inequalities, symmetrization method, concentration inequalities) and they improve previous results of Bartlett (1998) on bounding the generalization error of neural networks in terms of ℓ1-norms of the weights of neurons and of Schapire, Freund, Bartlett and Lee (1998) on bounding the generalization error of boosting. We also obtain rates of convergence in Lévy distance of empirical margin distribution to the true margin distribution uniformly over the classes of classifiers and prove the optimality of these rates.

Local Rademacher complexities

by Peter L. Bartlett, Olivier Bousquet, Shahar Mendelson - Annals of Statistics, 2002
Cited by 76 (17 self)
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
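As a rough illustration of the idea in this abstract (not the authors' construction), the sketch below estimates an empirical Rademacher average by Monte Carlo for a finite function class, then restricts the class to functions with small empirical error. The function names, the finite-class setting, and the `radius` threshold are all assumptions made for illustration.

```python
import random


def empirical_rademacher(values, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma max_f (1/n) * sum_i sigma_i * f(x_i).

    `values` holds one row per function f, each row listing f(x_1), ..., f(x_n)
    on the observed sample. The finite class is an illustrative assumption.
    """
    rng = random.Random(seed)
    n = len(values[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw independent random signs sigma_i in {-1, +1}.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Supremum over the (finite) class of the signed empirical average.
        total += max(sum(s * v for s, v in zip(sigma, row)) / n for row in values)
    return total / n_draws


def local_subset(values, losses, radius):
    """Keep only the functions whose empirical loss is at most `radius`."""
    return [row for row, loss in zip(values, losses) if loss <= radius]


if __name__ == "__main__":
    values = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
    losses = [0.1, 0.5, 0.9]  # hypothetical empirical errors, one per function
    local = local_subset(values, losses, radius=0.5)
    print(empirical_rademacher(values), empirical_rademacher(local))
```

With the same seed, the estimate over the restricted subset can never exceed the estimate over the full class, since each maximum is taken over fewer functions.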

Model Selection and Error Estimation

by Peter L. Bartlett, Stephane Boucheron, Gabor Lugosi, 2001
Cited by 59 (16 self)
Abstract not found

Mechanism Design via Machine Learning

by Maria-Florina Balcan, Avrim Blum, Jason D. Hartline, Yishay Mansour - In Proc. of the 46th IEEE Symp. on Foundations of Computer Science, 2005
Cited by 39 (10 self)
We use techniques from sample-complexity in machine learning to reduce problems of incentive-compatible mechanism design to standard algorithmic questions, for a broad class of revenue-maximizing pricing problems. Our reductions imply that for these problems, given an optimal (or β-approximation) algorithm for the standard algorithmic problem, we can convert it into a (1 + ε)-approximation (or β(1 + ε)-approximation) for the incentive-compatible mechanism design problem, so long as the number of bidders is sufficiently large as a function of an appropriate measure of complexity of the comparison class of solutions. We apply these results to the problem of auctioning a digital good, to the attribute auction problem which includes a wide variety of discriminatory pricing problems, and to the problem of item-pricing in unlimited-supply combinatorial auctions. From a machine learning perspective, these settings present several challenges: in particular, the loss function is discontinuous and asymmetric, and the range of bidders' valuations may be large.

Rademacher Processes And Bounding The Risk Of Function Learning

by V. Koltchinskii, D. Panchenko - High Dimensional Probability II, 1999
Cited by 35 (7 self)
We construct data dependent upper bounds on the risk in function learning problems. The bounds are based on the local norms of the Rademacher process indexed by the underlying function class and they do not require prior knowledge about the distribution of training examples or any specific properties of the function class. Using Talagrand's type concentration inequalities for empirical and Rademacher processes, we show that the bounds hold with high probability that decreases exponentially fast when the sample size grows. In typical situations that are frequently encountered in the theory of function learning, the bounds give nearly optimal rate of convergence of the risk to zero. 1. Local Rademacher norms and bounds on the risk: main results. Let $(S, \mathcal{A})$ be a measurable space and let $\mathcal{F}$ be a class of $\mathcal{A}$-measurable functions from $S$ into $[0, 1]$. Denote by $\mathcal{P}(S)$ the set of all probability measures on $(S, \mathcal{A})$. Let $f_0 \in \mathcal{F}$ be an unknown target function. Given a probability measure $P \in \mathcal{P}(S)$ (also unknown), let $(X_1, \ldots, X_n)$ be an i.i.d. sample in $(S, \mathcal{A})$ with common distribution $P$ (defined on a probability space $(\Omega, \Sigma, \mathbb{P})$). In computer learning theory, the problem of estimating $f_0$, based on the labeled sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, where $Y_j := f_0(X_j)$, $j = 1, \ldots, n$, is referred to as function learning problem. The so called concept learning is a special case of function learning. In this case, $\mathcal{F} := \{I_C : C \in \mathcal{C}\}$, where $\mathcal{C} \subset \mathcal{A}$ is called a class of concepts (see Vapnik (1998), Vidyasagar (1996), Devroye, Gyorfi and Lugosi (1996) for the account on statistical learning theory). The goal of function learning is to find an estimate

A Hilbert space embedding for distributions

by Alex Smola, Arthur Gretton, Le Song, Bernhard Schölkopf - In Algorithmic Learning Theory: 18th International Conference, 2007
Cited by 27 (15 self)
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4], however they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods have by and large a common issue: to compute quantities such as the mutual information, entropy, or Kullback-Leibler divergence, we require sophisticated space partitioning and/or

Moment Inequalities for Functions of Independent Random Variables

by Stéphane Boucheron, Olivier Bousquet, Gábor Lugosi, Pascal Massart (CNRS-Université Paris-Sud; Max Planck; Université Paris-Sud)
Cited by 26 (8 self)
... this paper is to provide such general-purpose inequalities. Our approach is based on a generalization of Ledoux's entropy method (see [26, 28]). Ledoux's method relies on abstract functional inequalities known as logarithmic Sobolev inequalities and provides a powerful tool for deriving exponential inequalities for functions of independent random variables; see Boucheron, Massart, and Lugosi [6, 7], Bousquet [8], Devroye [14], Massart [30, 31], Rio [36] for various applications. To derive moment inequalities for general functions of independent random variables, we elaborate on the pioneering work of Latała and Oleszkiewicz [25] and describe so-called φ-Sobolev inequalities which interpolate between Poincaré's inequality and logarithmic Sobolev inequalities (see also Beckner [4] and Bobkov's arguments in [26]). AMS 1991 subject classifications: Primary 60E15, 60C05, 28A35; Secondary 05C80. Key words and phrases: Moment inequalities; Concentration inequalities; Empirical processes; Random graphs. Supported by EU Working Group RAND-APX, binational PROCOPE Grant 05923XL. The work of the third author was supported by the Spanish Ministry of Science and Technology and FEDER, grant BMF2003-03324.

Learning from multiple sources

by Koby Crammer, Michael Kearns, Jennifer Wortman, Peter Bartlett - In Advances in Neural Information Processing Systems 19, 2007
Cited by 26 (3 self)
We consider the problem of learning accurate models from multiple sources of “nearby” data. Given distinct samples from multiple data sources and estimates of the dissimilarities between these sources, we provide a general theory of which samples should be used to learn models for each source. This theory is applicable in a broad decision-theoretic learning framework, and yields general results for classification and regression. A key component of our approach is the development of approximate triangle inequalities for expected loss, which may be of independent interest. We discuss the related problem of learning parameters of a distribution from multiple data sources. Finally, we illustrate our theory through a series of synthetic simulations.

Concentration inequalities

by Stéphane Boucheron, Gábor Lugosi, Olivier Bousquet - Advanced Lectures in Machine Learning, 2004
Cited by 20 (1 self)
Concentration inequalities deal with deviations of functions of independent random variables from their expectation. In the last decade new tools have been introduced making it possible to establish simple and powerful inequalities. These inequalities are at the heart of the mathematical analysis of various problems in machine learning and made it possible to derive new efficient algorithms. This text attempts to summarize some of the basic tools.