Results 1-10 of 72
Bounds on the Sample Complexity of Bayesian Learning Using Information Theory and the VC Dimension
Machine Learning, 1994
Cited by 109 (12 self)
In this paper we study a Bayesian or average-case model of concept learning with a twofold goal: to provide more precise characterizations of learning curve (sample complexity) behavior that depend on properties of both the prior distribution over concepts and the sequence of instances seen by the learner, and to smoothly unite in a common framework the popular statistical physics and VC dimension theories of learning curves. To achieve this, we undertake a systematic investigation and comparison of two fundamental quantities in learning and information theory: the probability of an incorrect prediction for an optimal learning algorithm, and the Shannon information gain. This study leads to a new understanding of the sample complexity of learning in several existing models. 1 Introduction Consider a simple concept learning model in which the learner attempts to infer an unknown target concept f, chosen from a known concept class F of {0, 1}-valued functions over an instance space X....
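The two quantities the abstract compares can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a finite concept class with an explicit prior and noise-free labels, and all function names are hypothetical. Under those assumptions, the Bayes-optimal predictor errs with probability min(p, 1 - p), where p is the posterior mass of concepts labeling the next instance 1, and the expected information gain from seeing the label is the binary entropy of p.

```python
import math

def posterior_mass_of_one(prior, concepts, history, x):
    """Posterior probability that the target labels x as 1.

    Assumes noise-free labels and that at least one concept with positive
    prior is consistent with the history (illustrative setup only).
    """
    consistent = [(p, f) for p, f in zip(prior, concepts)
                  if all(f(xi) == yi for xi, yi in history)]
    total = sum(p for p, _ in consistent)
    return sum(p for p, f in consistent if f(x) == 1) / total

def bayes_mistake_probability(p_one):
    """Error probability of predicting the more probable label."""
    return min(p_one, 1.0 - p_one)

def expected_information_gain(p_one):
    """Binary entropy of p_one in bits: expected gain from seeing the label."""
    if p_one in (0.0, 1.0):
        return 0.0
    return -(p_one * math.log2(p_one) + (1 - p_one) * math.log2(1 - p_one))

def make_threshold(t):
    """Hypothetical concept: label 1 exactly when x >= t."""
    return lambda x: 1 if x >= t else 0
```

For example, with five threshold concepts under a uniform prior, observing that instance 1 has label 0 leaves three consistent concepts, so the posterior mass of label 1 at instance 2 is 1/3.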
Local Rademacher complexities
Annals of Statistics, 2002
Cited by 107 (18 self)
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
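To make "Rademacher averages computed from the data, on a subset of functions with small empirical error" concrete, here is a minimal sketch. It is an assumption-laden illustration rather than the paper's construction: the class is finite and given as a table of function values on the sample, the expectation over random signs is estimated by Monte Carlo, and localization uses the empirical 0-1 error with a hypothetical `radius` parameter.

```python
import random

def empirical_rademacher(function_values, num_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher average.

    function_values holds one row per function f in a finite class; each row
    lists f(x_1), ..., f(x_n) on the fixed sample.
    """
    rng = random.Random(seed)
    n = len(function_values[0])
    total = 0.0
    for _ in range(num_draws):
        # Draw independent uniform +/-1 signs, one per sample point.
        signs = [rng.choice((-1, 1)) for _ in range(n)]
        # Supremum over the class of the sign-weighted average.
        total += max(
            sum(s * v for s, v in zip(signs, row)) / n
            for row in function_values
        )
    return total / num_draws

def localize(function_values, labels, radius):
    """Keep only functions whose empirical 0-1 error is at most radius."""
    n = len(labels)
    return [row for row in function_values
            if sum(a != b for a, b in zip(row, labels)) / n <= radius]
```

Because both calls below would reuse the same seed and sample size, the sign draws are identical, so the estimate on the localized subclass can never exceed the estimate on the full class: `empirical_rademacher(localize(values, labels, r)) <= empirical_rademacher(values)`.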
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
MACHINE LEARNING JOURNAL (2008) 71:89-129, 2008
Introduction to Statistical Learning Theory
In O. Bousquet, U. v. Luxburg, and G. Rätsch (Editors), 2004
A few notes on Statistical Learning Theory
2003
Cited by 52 (10 self)
this article is on the theoretical side and not on the applicative one; hence, we shall not present examples which may be interesting from the practical point of view but have little theoretical significance. This survey is far from being complete and it focuses on problems the author finds interesting (an opinion which is not necessarily shared by the majority of the learning community). Relevant books which present a more evenly balanced approach are, for example, [1, 4, 35, 36]. The starting point of our discussion is the formulation of the learning problem. Consider a class G, consisting of real-valued functions defined on a space Ω, and assume that each g ∈ G maps Ω into [0, 1]. Let T be an unknown function, T : Ω → [0, 1], and set μ to be an unknown probability measure on Ω
Geometric Range Searching
1994
Cited by 46 (2 self)
In geometric range searching, algorithmic problems of the following type are considered: Given an n-point set P in the plane, build a data structure so that, given a query triangle R, the number of points of P lying in R can be determined quickly. Problems of this type are of crucial importance in computational geometry, as they can be used as subroutines in many seemingly unrelated algorithms. We present a survey of results and main techniques in this area.
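The query in this abstract can be stated precisely with a brute-force baseline. The sketch below is only that baseline, not one of the surveyed data structures: it answers each triangle query in time linear in n with no preprocessing, which is the cost that range-searching structures aim to beat. The function names and the convention that boundary points count as inside are assumptions made here.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_triangle(p, tri):
    """True if p lies in the closed triangle tri (either vertex order)."""
    a, b, c = tri
    d1, d2, d3 = orient(a, b, p), orient(b, c, p), orient(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is outside exactly when it is strictly left of one edge
    # and strictly right of another.
    return not (has_neg and has_pos)

def count_in_triangle(points, tri):
    """Number of points of P lying in the query triangle, by linear scan."""
    return sum(in_triangle(p, tri) for p in points)
```

A call such as `count_in_triangle(P, ((0, 0), (4, 0), (0, 4)))` scans every point of `P` once per query.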
Risk bounds for Statistical Learning
Cited by 42 (1 self)
We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We essentially focus on the binary classification framework. We extend Tsybakov’s analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes. This allows us to deal with other ways of measuring the "size" of a class of classifiers than entropy with bracketing as in Tsybakov’s work. In particular we derive new risk bounds for the ERM when the classification rules belong to some VC-class under margin conditions and discuss the optimality of those bounds in a minimax sense.
Learning with Matrix Factorization
2004
Cited by 39 (4 self)
Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent
Improved Bounds on the Sample Complexity of Learning
Journal of Computer and System Sciences, 2000
Cited by 39 (1 self)
We present two improved bounds on the sample complexity of learning. First, we present a new general upper bound on the number of examples required to estimate all of the expectations of a set of random variables uniformly well. The quality of the estimates is measured using a variant of the relative error proposed by Haussler and Pollard. We also show that our bound is within a constant factor of the best possible. Our upper bound implies improved bounds on the sample complexity of learning according to Haussler's decision-theoretic model. Next, we prove a lower bound on the sample complexity for learning according to the prediction model that is optimal to within a factor of 1+o(1). 1 Introduction Many important applied problems can be modeled as learning from random examples. Examples include text categorization [21], handwritten character recognition [15, 4, 26], speech recognition [2, 1], and virtual circuit holding times in IP-over-ATM networks [20, 17, 14]. In this paper, we p...
Nonparametric time series prediction through adaptive model selection
Machine Learning, 2000
Cited by 29 (0 self)
We consider the problem of one-step-ahead prediction for time series generated by an underlying stationary stochastic process obeying the condition of absolute regularity, which describes the mixing nature of the process. We make use of recent results from the theory of empirical processes, and adapt the uniform convergence framework of Vapnik and Chervonenkis to the problem of time series prediction, obtaining finite sample bounds. Furthermore, by allowing both the model complexity and memory size to be adaptively determined by the data, we derive nonparametric rates of convergence through an extension of the method of structural risk minimization suggested by Vapnik. All our results are derived for general L_p error measures, and apply to both exponentially and algebraically mixing processes.