Bounds on the Sample Complexity of Bayesian Learning Using Information Theory and the VC Dimension
- Machine Learning, 1994
Cited by 98 (12 self)
In this paper we study a Bayesian or average-case model of concept learning with a twofold goal: to provide more precise characterizations of learning curve (sample complexity) behavior that depend on properties of both the prior distribution over concepts and the sequence of instances seen by the learner, and to smoothly unite in a common framework the popular statistical physics and VC dimension theories of learning curves. To achieve this, we undertake a systematic investigation and comparison of two fundamental quantities in learning and information theory: the probability of an incorrect prediction for an optimal learning algorithm, and the Shannon information gain. This study leads to a new understanding of the sample complexity of learning in several existing models.
1 Introduction
Consider a simple concept learning model in which the learner attempts to infer an unknown target concept $f$, chosen from a known concept class $F$ of $\{0,1\}$-valued functions over an instance space $X$....
Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension
- 1992
Cited by 84 (4 self)
Let $V \subseteq \{0,1\}^n$ have Vapnik-Chervonenkis dimension $d$. Let $M(k/n, V)$ denote the cardinality of the largest $W \subseteq V$ such that any two distinct vectors in $W$ differ on at least $k$ indices. We show that $M(k/n, V) \le (cn/(k+d))^d$ for some constant $c$. This improves on the previous best result of $((cn/k) \log(n/k))^d$. This new result has applications in the theory of empirical processes.
The author gratefully acknowledges the support of the Mathematical Sciences Research Institute at UC Berkeley and ONR grant N00014-91-J-1162.
1 Statement of Results
Let $n$ be a natural number greater than zero. Let $V \subseteq \{0,1\}^n$. For a sequence of indices $I = (i_1, \ldots, i_k)$, with $1 \le i_j \le n$, let $V|_I$ denote the projection of $V$ onto $I$, i.e. $V|_I = \{(v_{i_1}, \ldots, v_{i_k}) : (v_1, \ldots, v_n) \in V\}$. If $V|_I = \{0,1\}^k$ then we say that $V$ shatters the index sequence $I$. The Vapnik-Chervonenkis dimension of $V$ is the size of the longest index sequence $I$ that is shattered by $V$ [VC71] (t...
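The definitions in this abstract (projection, shattering, VC dimension, and the packing number $M(k/n, V)$) can be checked by hand on very small sets. The following is a minimal brute-force sketch, not the paper's method: the function names are hypothetical, indices are 0-based rather than 1-based, and the exhaustive search is only practical for tiny $n$ and $|V|$.

```python
from itertools import combinations


def projection(V, I):
    """Project every vector in V onto the index sequence I (0-based)."""
    return {tuple(v[i] for i in I) for v in V}


def shatters(V, I):
    """V shatters I when the projection contains all 2^|I| binary patterns."""
    return len(projection(V, I)) == 2 ** len(I)


def vc_dimension(V, n):
    """Size of the largest shattered index set, by exhaustive search."""
    d = 0
    for k in range(1, n + 1):
        if any(shatters(V, I) for I in combinations(range(n), k)):
            d = k
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return d


def hamming(u, v):
    """Number of indices on which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def packing_number(V, k):
    """Largest subset of V whose distinct vectors pairwise differ on >= k indices."""
    vectors = list(V)
    for size in range(len(vectors), 0, -1):
        for W in combinations(vectors, size):
            if all(hamming(u, v) >= k for u, v in combinations(W, 2)):
                return size
    return 0


# Assumed toy example: vectors in {0,1}^3 with at most one coordinate equal to 1.
V = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)}
print(vc_dimension(V, 3))    # VC dimension d of V
print(packing_number(V, 2))  # M(k/n, V) for k = 2, n = 3
```

In this toy set every single index is shattered but no pair is, and the three unit vectors are pairwise at Hamming distance 2 while the zero vector is at distance 1 from each of them.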
Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes
- 1999
Cited by 11 (6 self)
We show that the $P$-Glivenko property of classes of functions $F_1, \ldots, F_k$ is preserved by a continuous function $\varphi$ from $\mathbb{R}^k$ to $\mathbb{R}$ in the sense that the new class of functions $x \mapsto \varphi(f_1(x), \ldots, f_k(x))$, $f_i \in F_i$, $i = 1, \ldots, k$, is again a Glivenko-Cantelli class of functions if it has an integrable envelope.
Strong Minimax Lower Bounds for Learning
- 1998
Cited by 7 (3 self)
Minimax lower bounds for concept learning state, for example, that for each sample size n and learning rule g_n, there exists a distribution of the observation X and a concept C to be learnt such that the expected error of g_n is at least a constant times V/n, where V is the VC dimension of the concept class. However, these bounds do not tell anything about the rate of decrease of the error for a fixed distribution-concept pair. In this paper we investigate minimax lower bounds in such a--stronger--sense. We show that for several natural k-parameter concept classes, including the class of linear halfspaces, the class of balls, the class of polyhedra with a certain number of faces, and a class of neural networks, for any sequence of learning rules {g_n}, there exists a fixed distribution of X and a fixed concept C such that the expected error is larger than a constant times k/n for infinitely many n. We also obtain such strong minimax lower bounds for the tail distribution of the probability of error, which extend the corresponding minimax lower bounds.
Neural Networks with Local Receptive Fields and Superlinear VC Dimension
- Neural Computation, 2002
Cited by 3 (2 self)
Local receptive field neurons comprise such well-known and widely used unit types as radial basis function neurons and neurons with center-surround receptive field. We study the Vapnik-Chervonenkis (VC) dimension of feedforward neural networks with one hidden layer of these units. For several variants of local receptive field neurons we show that the VC dimension of these networks is superlinear.
The discrepancy method in computational geometry
- In Handbook of Discrete and Computational Geometry, 2004
Cited by 2 (0 self)
Discrepancy theory investigates how uniform nonrandom structures can be. For example, given n points in the plane, how should we color them red and blue so as to minimize the difference between the number of red points and the number of blue ones within any disk? Or, how should we place n points in the unit square
Metric Entropy and Minimax Risk in Classification
- In Lecture Notes in Comp. Sci.: Studies in Logic and
, 1997
We apply recent results on the minimax risk in density estimation to the related problem of pattern classification. The notion of loss we seek to minimize is an information theoretic measure of how well we can predict the classification of future examples, given the classification of previously seen examples. We give an asymptotic characterization of the minimax risk in terms of the metric entropy properties of the class of distributions that might be generating the examples. We then use these results to characterize the minimax risk in the special case of noisy two-valued classification problems in terms of the Assouad density and the Vapnik-Chervonenkis dimension.
1 Introduction
The most basic problem in pattern recognition is the problem of classifying instances consisting of vectors of measurements into one of a finite number of types or classes. One standard example is the recognition of isolated capital characters, in which the instances are measurements on images of letters ...
Learning Using Information Theory and the VC Dimension
Abstract. In this paper we study a Bayesian or average-case model of concept learning with a twofold goal: to provide more precise characterizations of learning curve (sample complexity) behavior that depend on properties of both the prior distribution over concepts and the sequence of instances seen by the learner, and to smoothly unite in a common framework the popular statistical physics and VC dimension theories of learning curves. To achieve this, we undertake a systematic investigation and comparison of two fundamental quantities in learning and information theory: the probability of an incorrect prediction for an optimal learning algorithm, and the Shannon information gain. This study leads to a new understanding of the sample complexity of learning in several existing models.

