## Neural Networks with Quadratic VC Dimension (1996)

Citations: 44 (6 self)

### BibTeX

```bibtex
@MISC{Koiran96neuralnetworks,
  author = {Pascal Koiran and Eduardo D. Sontag},
  title  = {Neural Networks with Quadratic VC Dimension},
  year   = {1996}
}
```

### Abstract

This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles a long-standing open question, namely whether the well-known O(w log w) bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid generalization are discussed.
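The VC dimension discussed in the abstract can be made concrete with a small brute-force check: a hypothesis class shatters a point set if it realizes every possible binary labelling of the points, and the VC dimension is the size of the largest shattered set. The sketch below is not from the paper; the hypothesis class (one-dimensional threshold units over an illustrative grid of biases) and all names are assumptions chosen to keep the example tiny.

```python
def shatters(points, hypotheses):
    """True iff every +/- labelling of `points` is realized by some hypothesis."""
    labellings = {tuple(h(p) for p in points) for h in hypotheses}
    return len(labellings) == 2 ** len(points)

# Toy class: 1-D threshold units x -> [x >= b], with b on a small grid.
# (The grid is an illustrative assumption, not anything from the paper.)
thresholds = [lambda x, b=b: int(x >= b) for b in [i * 0.5 for i in range(-4, 5)]]

print(shatters([0.3], thresholds))       # a single point is shattered -> True
print(shatters([0.3, 1.2], thresholds))  # labelling (1, 0) is unrealizable -> False
```

Since no two-point set can be shattered by monotone thresholds, this class has VC dimension 1; the paper's point is that for networks with continuous activations the analogous quantity grows at least like w².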

### Citations

1745 | A theory of the learnable
- Valiant
- 1984

Citation Context: ...is question, it is necessary to first formalize the notion of learning from examples. One such formalization is based on the paradigm of probably approximately correct (PAC) learning, due to Valiant ([15]). In this framework, one starts by fitting some function f, chosen from a predetermined class F, to the given training data. The class F is often called the "hypothesis class", and for purposes of ...

830 | Estimation of dependences based on empirical data
- Vapnik
- 1982

Citation Context: ..., since there are excellent references available, both in textbook (e.g. [1, 11]) and survey paper (e.g. [10]) form, and the concept is by now very well-known. After the work of Vapnik in statistics ([16]) and of Blumer et al. in computational learning theory ([4]), one knows that a certain combinatorial quantity, called the Vapnik-Chervonenkis (VC) dimension VC(F) of the class F of interest complete...

634 | Learnability and the Vapnik-Chervonenkis dimension
- Blumer, Ehrenfeucht, et al.
- 1989

Citation Context: ...book (e.g. [1, 11]) and survey paper (e.g. [10]) form, and the concept is by now very well-known. After the work of Vapnik in statistics ([16]) and of Blumer et al. in computational learning theory ([4]), one knows that a certain combinatorial quantity, called the Vapnik-Chervonenkis (VC) dimension VC(F) of the class F of interest completely characterizes the sample sizes needed for learnability in ...

390 | On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions and universal machines
- Blum, Shub, et al.
- 1989

Citation Context: ...above, VC dimension bounded by w log w. Our construction was originally motivated by a related one, given in [6], which showed that real-number programs (in the Blum-Shub-Smale model of computation) [3] with running time T have VC dimension Ω(T²). The desired result on continuous activations is then obtained, approximating Heaviside gates by σ-nets with large weights and approximating l...

320 | What size net gives valid generalization
- Baum, Haussler
- 1989

Citation Context: ...Estimating VC(F) then becomes a central concern. Thus from now on, we speak exclusively of VC dimension, instead of the original PAC learning problem. The work of Cover ([5]) and Baum and Haussler ([2]) dealt with the computation of VC(F) when the class F consists of networks built up from hard-threshold activations and having w weights; they showed that VC(F) = O(w log w). (Conversely, Maass showed...

210 | Scale-sensitive dimensions, uniform convergence and learnability
- Alon, Ben-David, et al.
- 1997

Citation Context: ...ign gate. Hence we actually construct architectures with quadratic pseudo-dimension (and in fact with quadratic “V-dimension” since the same threshold is used for all points in the shattered set; see [1] for definitions of these and other generalizations of the VC dimension). The above formalism is too cumbersome for all proofs, so we often use obvious shortcuts. For instance, if we say “A is the net...

123 | Machine Learning: A Theoretical Approach
- Natarajan
- 1991

Citation Context: ...ts be well-sampled by the training data, so that f is an accurate fit. We omit the details of the formalization of PAC learning, since there are excellent references available, both in textbook (e.g. [1, 11]) and survey paper (e.g. [10]) form, and the concept is by now very well-known. After the work of Vapnik in statistics ([16]) and of Blumer et al. in computational learning theory ([4]), one knows th...

83 | Bounding the Vapnik-Chervonenkis dimension of concept classes parametrized by real numbers
- Goldberg, Jerrum
- submitted

Citation Context: ...Thus the problem of studying VC(F) for analog networks is an interesting and relevant issue. Two important contributions in this direction were the papers by Maass ([9]) and by Goldberg and Jerrum ([6]), which showed upper bounds on the VC dimension of networks that use piecewise polynomial activations. The last reference, in particular, established for that case an upper bound of O(w²), where, a...

65 | Feedforward nets for interpolation and classification
- Sontag

Citation Context: ...g, essentially because more memory capacity means that a given function f may be able to "memorize" in a "rote" fashion too much data, and less generalization is therefore possible. Indeed, the paper [14] showed that there are conceivable (though not very practical) neural architectures with extremely high VC dimensions. Thus the problem of studying VC(F) for analog networks is an interesting and rele...

55 | Bounds for the computational power and learning complexity of analog neural nets
- Maass
- 1993

Citation Context: ...alt with the computation of VC(F) when the class F consists of networks built up from hard-threshold activations and having w weights; they showed that VC(F) = O(w log w). (Conversely, Maass showed in [9] that there is also a lower bound of this form.) It would appear that this definitely settled the VC dimension (and hence also the sample size) question. ...

33 | Polynomial bounds for VC dimension of sigmoidal neural networks
- Karpinski, Macintyre
- 1995

Citation Context: ...owever, in contrast with the piecewise-polynomial case, there is still in that case a large gap between our Ω(w²) lower bound and the O(w⁴) upper bound which was recently established in [7].) A number of variations, dealing with Boolean inputs, or weakening the assumptions on σ, are also discussed. The last section includes some brief remarks regarding an interpretation of our results ...

32 | Computational Learning Theory: An Introduction. Cambridge Tracts
- Anthony, Biggs
- 1992

Citation Context: ...ts be well-sampled by the training data, so that f is an accurate fit. We omit the details of the formalization of PAC learning, since there are excellent references available, both in textbook (e.g. [1, 11]) and survey paper (e.g. [10]) form, and the concept is by now very well-known. After the work of Vapnik in statistics ([16]) and of Blumer et al. in computational learning theory ([4]), one knows th...

25 | Capacity problems for linear machines
- Cover
- 1968

Citation Context: ...bly is proportional to VC(F). Estimating VC(F) then becomes a central concern. Thus from now on, we speak exclusively of VC dimension, instead of the original PAC learning problem. The work of Cover ([5]) and Baum and Haussler ([2]) dealt with the computation of VC(F) when the class F consists of networks built up from hard-threshold activations and having w weights; they showed that VC(F) = O(w log w...

17 | Perspectives of current research about the complexity of learning in neural nets
- Maass
- 1994

Citation Context: ...ng data, so that f is an accurate fit. We omit the details of the formalization of PAC learning, since there are excellent references available, both in textbook (e.g. [1, 11]) and survey paper (e.g. [10]) form, and the concept is by now very well-known. After the work of Vapnik in statistics ([16]) and of Blumer et al. in computational learning theory ([4]), one knows that a certain combinatorial qu...

2 | The development of TDNN architecture for speech recognition
- Lang, Hinton
- 1988

Citation Context: ...remains O(n log n) by the counting argument of [2]. A more restrictive type of weight-sharing has been studied in the neural network literature, and proved to be useful in invariant recognition tasks [8]. A formal model is studied in [12], and it is shown that the VC dimension remains O(n log n). In this model one assumes that there is an equivalence relation between weights; this is similar to our w...

2 | Threshold network learning in the presence of equivalences
- Shawe-Taylor
- 1992

Citation Context: ...argument of [2]. A more restrictive type of weight-sharing has been studied in the neural network literature, and proved to be useful in invariant recognition tasks [8]. A formal model is studied in [12], and it is shown that the VC dimension remains O(n log n). In this model one assumes that there is an equivalence relation between weights; this is similar to our weight-sharing mechanism. However, o...

2 | Sigmoids distinguish better than Heavisides
- Sontag
- 1989

Citation Context: ...neurons. In contrast, the usually employed gradient descent learning algorithms ("backpropagation" method) rely upon continuous activations, that is, neurons with graded responses. As pointed out in [13], the use of analog activations, which allow the passing of rich (not just binary) information among levels, may result in higher memory capacity as compared with threshold nets. This has serious pote...