Results 1 
3 of
3
Rigorous learning curve bounds from statistical mechanics
 Machine Learning
, 1994
"... Abstract In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the wellestablished VapnikChervonenkis theory is that our bounds can be considerably tighter in many cases, an ..."
Abstract

Cited by 53 (9 self)
 Add to MetaCart
Abstract In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the wellestablished VapnikChervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory. 1 Introduction According to the VapnikChervonenkis (VC) theory of learning curves [27, 26], minimizing empirical error within a function class F on a random sample of m examples leads to generalization error bounded by ~O(d=m) (in the case that the target function is contained in F) or ~O(pd=m) plus the optimal generalization error achievable within F (in the general case). 1 These bounds are universal: they hold for any class of hypothesis functions F, for any input distribution, and for any target function. The only problemspecific quantity remaining in these bounds is the VC dimension d, a measure of the complexity of the function class F. It has been shown that these bounds are essentially the best distributionindependent bounds possible, in the sense that for any function class, there exists an input distribution for which matching lower bounds on the generalization error can be given [5, 7, 22].
Annealed Theories of Learning
 In J.H
, 1995
"... We study annealed theories of learning boolean functions using a concept class of finite cardinality. The naive annealed theory can be used to derive a universal learning curve bound for zero temperature learning, similar to the inverse square root bound from the VapnikChervonenkis theory. Tighter, ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
We study annealed theories of learning boolean functions using a concept class of finite cardinality. The naive annealed theory can be used to derive a universal learning curve bound for zero temperature learning, similar to the inverse square root bound from the VapnikChervonenkis theory. Tighter, nonuniversal learning curve bounds are also derived. A more refined annealed theory leads to still tighter bounds, which in some cases are very similar to results previously obtained using onestep replica symmetry breaking. 1. Introduction The annealed approximation 1 has proven to be an invaluable tool for studying the statistical mechanics of learning from examples. Previously it was found that the annealed approximation gave qualitatively correct results for several models of perceptrons learning realizable rules. 2 Because of its simplicity relative to the full quenched theory, the annealed approximation has since been used in studies of more complicated multilayer architectures. ...
On the VCdimension of neural networks with binary weights
, 1996
"... Abstract: We investigate the VCdimension of the perceptron and simple twolayer networks like the committee and the paritymachine with weights restricted to values ±1. For binary inputs, the VCdimension is determined by atypical pattern sets, i.e. it cannot be found by replica analysis or numeri ..."
Abstract
 Add to MetaCart
Abstract: We investigate the VCdimension of the perceptron and simple twolayer networks like the committee and the paritymachine with weights restricted to values ±1. For binary inputs, the VCdimension is determined by atypical pattern sets, i.e. it cannot be found by replica analysis or numerical Monte Carlo sampling. For small systems, exhaustive enumerations yield exact results. For systems that are too large for enumerations, number theoretic arguments give lower bounds for the VCdimension. For the Ising perceptron, the VCdimension is probably larger than N/2.