Rigorous learning curve bounds from statistical mechanics
- Machine Learning, 1994
Cited by 52 (9 self)
Abstract: In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.

1 Introduction

According to the Vapnik-Chervonenkis (VC) theory of learning curves [27, 26], minimizing empirical error within a function class F on a random sample of m examples leads to generalization error bounded by $\tilde{O}(d/m)$ (in the case that the target function is contained in F) or $\tilde{O}(\sqrt{d/m})$ plus the optimal generalization error achievable within F (in the general case). These bounds are universal: they hold for any class of hypothesis functions F, for any input distribution, and for any target function. The only problem-specific quantity remaining in these bounds is the VC dimension d, a measure of the complexity of the function class F. It has been shown that these bounds are essentially the best distribution-independent bounds possible, in the sense that for any function class, there exists an input distribution for which matching lower bounds on the generalization error can be given [5, 7, 22].
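To give a feel for the gap between the two VC rates quoted above, the following minimal sketch evaluates d/m and the square root of d/m for a few sample sizes. It is an illustration only: the function names `realizable_rate` and `agnostic_rate` are hypothetical, and constants and logarithmic factors are dropped, so the numbers show scaling rather than actual bounds.

```python
import math


def realizable_rate(d: int, m: int) -> float:
    """Scaling d/m (target in F); constants and log factors omitted by assumption."""
    return d / m


def agnostic_rate(d: int, m: int) -> float:
    """Scaling sqrt(d/m) (general case); constants and log factors omitted by assumption."""
    return math.sqrt(d / m)


if __name__ == "__main__":
    d = 10  # illustrative VC dimension, not taken from the paper
    for m in (100, 1_000, 10_000):
        print(m, realizable_rate(d, m), agnostic_rate(d, m))
```

With d fixed, multiplying m by 100 shrinks the first quantity by a factor of 100 but the second only by a factor of 10, which is the difference between the two regimes in the excerpt.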
Annealed Theories of Learning
- In J.-H, 1995
Cited by 9 (1 self)
We study annealed theories of learning boolean functions using a concept class of finite cardinality. The naive annealed theory can be used to derive a universal learning curve bound for zero temperature learning, similar to the inverse square root bound from the Vapnik-Chervonenkis theory. Tighter, nonuniversal learning curve bounds are also derived. A more refined annealed theory leads to still tighter bounds, which in some cases are very similar to results previously obtained using one-step replica symmetry breaking.

1. Introduction

The annealed approximation [1] has proven to be an invaluable tool for studying the statistical mechanics of learning from examples. Previously it was found that the annealed approximation gave qualitatively correct results for several models of perceptrons learning realizable rules [2]. Because of its simplicity relative to the full quenched theory, the annealed approximation has since been used in studies of more complicated multilayer architectures. ...
On the VC-dimension of neural networks with binary weights
- 1996
Abstract: We investigate the VC-dimension of the perceptron and simple two-layer networks like the committee- and the parity-machine with weights restricted to values ±1. For binary inputs, the VC-dimension is determined by atypical pattern sets, i.e. it cannot be found by replica analysis or numerical Monte Carlo sampling. For small systems, exhaustive enumerations yield exact results. For systems that are too large for enumerations, number theoretic arguments give lower bounds for the VC-dimension. For the Ising perceptron, the VC-dimension is probably larger than N/2.
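The exhaustive enumeration mentioned in the abstract can be sketched for very small systems as follows. This is a brute-force illustration, not the authors' procedure: the function name `vc_dimension` is hypothetical, the perceptron is taken to be sign(w·x) with w and x in {-1, +1}^N, and ties (possible only for even N) are broken as sign(0) = +1 by assumption.

```python
from itertools import combinations, product


def sign(v: int) -> int:
    # Assumed tie-break: sign(0) = +1. Ties occur only for even N.
    return 1 if v >= 0 else -1


def vc_dimension(n: int) -> int:
    """Brute-force VC dimension of the +/-1-weight perceptron on +/-1 inputs."""
    patterns = list(product((-1, 1), repeat=n))
    weights = list(product((-1, 1), repeat=n))
    # table[k][j] is the output of weight vector k on pattern j.
    table = [
        [sign(sum(wi * xi for wi, xi in zip(w, x))) for x in patterns]
        for w in weights
    ]
    best = 0
    # 2**n weight vectors can realize at most 2**n labelings, so d <= n.
    for d in range(1, n + 1):
        shattered = any(
            len({tuple(row[j] for j in subset) for row in table}) == 2 ** d
            for subset in combinations(range(len(patterns)), d)
        )
        if not shattered:
            break  # no set of size d is shattered, so no larger set is either
        best = d
    return best


if __name__ == "__main__":
    print(vc_dimension(3))
```

For N = 3 the enumeration returns 3; the patterns (+,+,-), (+,-,+), (-,+,+) are shattered by the eight weight vectors. The cost grows combinatorially in 2^N, which is why this approach is only feasible for small systems.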

