## Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks (1998)

### Download Links

- [discus.anu.edu.au]
- [wwwsyseng.anu.edu.au]
- DBLP

### Other Repositories/Bibliography

Venue: Neural Computation

Citations: 12 (1 self)

### BibTeX

```bibtex
@ARTICLE{Bartlett98almostlinear,
  author  = {Peter Bartlett and Vitaly Maiorov and Ron Meir},
  title   = {Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks},
  journal = {Neural Computation},
  year    = {1998},
  volume  = {10},
  pages   = {217--3}
}
```

### Abstract

We compute upper and lower bounds on the VC dimension of feedforward networks of units with piecewise polynomial activation functions. We show that if the number of layers is fixed, then the VC dimension grows as W log W, where W is the number of parameters in the network. This result stands in contrast to the case where the number of layers is unbounded, in which case the VC dimension grows as W².

1 MOTIVATION

The VC dimension is an important measure of the complexity of a class of binary-valued functions, since it characterizes the amount of data required for learning in the PAC setting (see [BEHW89, Vap82]). In this paper, we establish upper and lower bounds on the VC dimension of a specific class of multi-layered feedforward neural networks. Let F be the class of binary-valued functions computed by a feedforward neural network with W weights and k computational (non-input) units, each with a piecewise polynomial activation function. Goldberg and Jerrum [GJ95] have shown that...
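The abstract's headline comparison (W log W growth for fixed depth versus W² growth for unbounded depth) can be made concrete with a toy calculation. The constants below are illustrative placeholders, not values from the paper:

```python
import math

def fixed_depth_bound(W, c=1.0):
    """Bound of order W log W for fixed-depth piecewise polynomial
    networks (the constant c is illustrative, not from the paper)."""
    return c * W * math.log(W)

def unbounded_depth_bound(W, c=1.0):
    """Bound of order W^2 when the depth may grow with W."""
    return c * W ** 2

# The gap between the two regimes grows as W / log W.
for W in (10, 100, 1000, 10000):
    ratio = unbounded_depth_bound(W) / fixed_depth_bound(W)
    print(f"W={W:>6}: W log W = {fixed_depth_bound(W):>12.0f}, "
          f"W^2 = {unbounded_depth_bound(W):>12.0f}, ratio = {ratio:8.1f}")
```

The printed ratio, W / log W, is the factor by which fixed-depth networks are "simpler" in this measure than arbitrarily deep ones of the same size.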

### Citations

1564 | Sobolev spaces - Adams - 1975 |

Citation Context: …r > d/2. In particular, we consider the subset W_r^K of functions f ∈ W_r satisfying ‖f^(r)‖_{L_2} ≤ K. Since r > d/2, well-known embedding theorems from the theory of Sobolev spaces (see Corollary 5.16 in [Ada75]) imply that such a function f is in fact uniformly bounded. Standard techniques show that there is a bound M(K) so that, for all f ∈ W_r^K and all x ∈ R^d, |f(x)| ≤ M(K) (see, for example, Section 11 …

1022 | A Probabilistic Theory of Pattern Recognition - Devroye, Gyorfi - 1996 |

824 | Estimation of Dependencies Based on Empirical Data - Vapnik - 1982 |

635 | Learnability and the Vapnik-Chervonenkis dimension - Blumer, Ehrenfeucht, et al. - 1989 |

379 | Decision theoretic generalizations of the PAC model for neural net and other learning applications - Haussler - 1992 |

Citation Context: …−M if α < −M, M if α > M, α otherwise. The need to restrict the range of the functions f arises here for technical reasons having to do with the derivation of the estimation error (see e.g. [Hau92]). Previous work along these lines has relied on bounded parameter values. In particular, Barron [Bar94] bounds the parameters c, a and b and imposes a Lipschitz constraint on the activation function …

318 | Neural Network Learning: Theoretical Foundations - Anthony, Bartlett - 1999 |

Citation Context: …class of functions sgn(F) = {x ↦ sgn(f(x; a)) : a ∈ R^W}. Before giving the main theorem of this section, we present the following result, which is a slight improvement of a result due to Warren (see [ABar], Chapter 8). Lemma 2.1 Suppose f_1(·), f_2(·), …, f_m(·) are fixed polynomials of degree at most l in n ≤ m variables. Then the number of distinct sign vectors {sgn(f_1(a)), …

179 | The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network - Bartlett - 1998 |

Citation Context: …reduce these two terms are usually conflicting. For example, in order to derive good estimation error bounds one often assumes that the parameters are restricted to a bounded domain (see, for example, [Bar98]), which may be allowed to increase with the sample size, as in the method of sieves [GH82]. However, imposing such boundedness constraints leads to great difficulties in establishing approximation er…

120 | Multilayer feedforward networks with a nonpolynomial activation function can approximate any function - Leshno, Lin, et al. - 1993 |

Citation Context: …ing the constraints on the parameters. In this paper we consider a specific class of multi-layered feedforward neural networks composed of piecewise polynomial activation functions. Since it is known [LLPS93] that a necessary and sufficient condition for universal approximation of continuous functions over compacta by neural networks is that the transfer functions be non-polynomial, it is interesting to c…

115 | Approximation and estimation bounds for artificial neural networks - Barron - 1991 |

Citation Context: …s here for technical reasons having to do with the derivation of the estimation error (see e.g. [Hau92]). Previous work along these lines has relied on bounded parameter values. In particular, Barron [Bar94] bounds the parameters c, a and b and imposes a Lipschitz constraint on the activation function σ, while Haussler [Hau92] constrains the parameters c so that Σ_i |c_i| ≤ β < ∞, and assumes that the a…

108 | A Theory of Learning and Generalization - Vidyasagar - 1997 |

Citation Context: …nary inputs, grows as W log W. We note that for the piecewise polynomial networks considered in this work, it is easy to show that the VC dimension and pseudo-dimension are closely related (see e.g. [Vid96]), so that similar bounds (with different constants) hold for the pseudo-dimension. Independently, Sakurai has obtained similar upper bounds and improved lower bounds on the VC dimension of piecewise …

94 | Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension - Haussler - 1995 |

Citation Context: …to X, Y and {(X_i, Y_i)}_{i=1}^N. Standard results from the theory of uniform convergence of empirical measures [Pol84], together with bounds on certain covering numbers in terms of pseudo-dimension ([Hau95], Corollary 3), show that EL(f̂_n) ≤ s² + c_1 inf_{f ∈ F} L̃(f) + c_2 M Pdim(F) (log n)/n, (3) where c_1 and c_2 are absolute constants, s² = E{(Y − E(Y|X))²} is the noise variance, and L̃(f) = …
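The decomposition in bound (3) above splits the expected loss into a noise floor, an approximation term, and an estimation term that shrinks as Pdim(F) (log n)/n. A toy evaluation of the bound as the sample size grows (all constants here, including c_1, c_2, M and the pseudo-dimension value, are made-up placeholders, not values from the paper):

```python
import math

def risk_upper_bound(n, noise_var=0.25, approx_err=0.05,
                     c1=1.0, c2=1.0, M=1.0, pdim=500):
    """Toy evaluation of a bound shaped like (3):
    EL(f_n) <= s^2 + c1 * inf L(f) + c2 * M * Pdim(F) * log(n) / n.
    Every constant is an illustrative placeholder."""
    return noise_var + c1 * approx_err + c2 * M * pdim * math.log(n) / n

# The estimation term vanishes as n grows; the bound tends to s^2 + c1 * approx.
for n in (10**3, 10**4, 10**5, 10**6):
    print(f"n={n:>8}: bound = {risk_upper_bound(n):.4f}")
```

With these placeholder values the bound decreases monotonically toward the irreducible 0.30 = noise variance plus approximation error, illustrating why the pseudo-dimension (and hence the VC-dimension bounds of this paper) controls the sample-size dependence.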

74 | Nonparametric maximum likelihood estimation by the method of sieves - Geman, Hwang - 1982 |

Citation Context: …on error bounds one often assumes that the parameters are restricted to a bounded domain (see, for example, [Bar98]), which may be allowed to increase with the sample size, as in the method of sieves [GH82]. However, imposing such boundedness constraints leads to great difficulties in establishing approximation error bounds, which have not in general been surmounted to date. Recently it has been shown t…

73 | Lower bound for approximation by nonlinear manifolds - Warren - 1968 |

Citation Context: …sgn(F) = {x ↦ sgn(f(x; a)) : a ∈ R^W}. Before giving the main theorem of this section, we present the following result, which is a slight improvement (see [AB98], Chapter 8) of a result due to Warren [War68]. Lemma 2.1 Suppose f_1(·), f_2(·), …, f_m(·) are fixed polynomials of degree at most l in n ≤ m variables. Then the number of distinct sign vectors {sgn(f_1(a)), …, sgn(f_m(…
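The flavor of the Warren-type lemma quoted above can be checked empirically: for m polynomials of degree at most l in n ≤ m variables, the number of realizable sign vectors is far below the naive 2^m, and (for m ≥ n) is capped by a bound of the form (4elm/n)^n. This sketch is ours, not the paper's proof; random sampling only lower-bounds the true count of sign vectors:

```python
import itertools
import math
import random

random.seed(0)

# m random polynomials of degree <= l in n <= m variables (illustrative sizes).
n, m, l = 2, 30, 2

def random_poly(n, l):
    """A random polynomial: Gaussian coefficients on all monomials of total degree <= l."""
    exps = [e for e in itertools.product(range(l + 1), repeat=n) if sum(e) <= l]
    coeffs = [random.gauss(0, 1) for _ in exps]
    def p(a):
        return sum(c * math.prod(ai ** ei for ai, ei in zip(a, e))
                   for c, e in zip(coeffs, exps))
    return p

polys = [random_poly(n, l) for _ in range(m)]

# Sample parameter vectors a and record the sign vector (sgn f_1(a), ..., sgn f_m(a)).
signs = set()
for _ in range(5000):
    a = [random.uniform(-10, 10) for _ in range(n)]
    signs.add(tuple(1 if p(a) > 0 else -1 for p in polys))

warren = (4 * math.e * l * m / n) ** n   # Warren-style cap, up to constants
print(f"distinct sign vectors observed: {len(signs)}  "
      f"(2^m = {2**m}, Warren-style bound ≈ {warren:.0f})")
```

Here 2^m exceeds a billion, yet the observed count stays in the low thousands and under the polynomial-in-(lm/n) cap. This cell-counting argument is what drives the W log W upper bounds for fixed-depth piecewise polynomial networks.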

69 | Efficient agnostic learning of neural networks with bounded fan-in - Lee, Bartlett, et al. - 1996 |

61 | Fat-shattering and the learnability of real-valued functions - Bartlett, Long, et al. - 1996 |

Citation Context: …in that case the pseudo-dimension is essentially equivalent to another combinatorial dimension, called the fat-shattering dimension, which gives nearly matching lower bounds on estimation error (see [BLW96]). Combining the two results we obtain the total error bound EL(f̂_n) ≤ s² + c (log k)/k^{r/d} + c′ Pdim(F) (log n)/n, (4) where c and c′ are constants that depend only on K. By allowing the output bound…

56 | Functional Analysis in Normed Spaces - Kantorovich, Akilov - 1964 |

Citation Context: …mply that such a function f is in fact uniformly bounded. Standard techniques show that there is a bound M(K) so that, for all f ∈ W_r^K and all x ∈ R^d, |f(x)| ≤ M(K) (see, for example, Section 11 in [KA64]). Now for a given K, if the regression function is in W_r^K and the network class F is as described above, with k hidden units and an output bound satisfying M ≥ M(K), it has been shown in [MM97] that …

49 | Polynomial bounds for VC dimension of sigmoidal and general Pfaffian neural networks - Karpinski, Macintyre - 1997 |

Citation Context: …s been shown that the VC dimension of many types of neural networks with continuous activation functions is finite even without imposing any conditions on the magnitudes of the parameters [MS93][GJ95][KM97]. Since there is a close connection between the VC dimension and the estimation error (see Section 4), this result is significant in the context of learning. Thus, as long as the function itself is bo…

46 | Neural networks with quadratic VC dimension - Koiran, Sontag - 1997 |

Citation Context: …-input) units, each with a piecewise polynomial activation function. Goldberg and Jerrum [GJ95] have shown that VCdim(F) ≤ c_1(W² + Wk) = O(W²), where c_1 is a constant. Moreover, Koiran and Sontag [KS97] have demonstrated such a network that has VCdim(F) ≥ c_2 W² = Ω(W²), which would lead one to conclude that the bounds are in fact tight up to a constant. However, the proof used in [KS97]…

45 | Finiteness results for sigmoidal neural networks - Macintyre, Sontag - 1993 |

Citation Context: …cently it has been shown that the VC dimension of many types of neural networks with continuous activation functions is finite even without imposing any conditions on the magnitudes of the parameters [MS93][GJ95][KM97]. Since there is a close connection between the VC dimension and the estimation error (see Section 4), this result is significant in the context of learning. Thus, as long as the function …

43 | Neural Networks for Optimal Approximation of Smooth and Analytic Functions - Mhaskar - 1996 |

Citation Context: …same approximation rate holds for the class of bounded functions π_M(Σ_i c_i σ(a_i^T x + b_i)), M ≥ M(K), which is considered here. This result is only slightly worse than the best result available [Mha96] for the standard sigmoidal activation function σ(u) = 1/(1 + e^{−u}), where the approximation error is upper bounded by c/k^{r/d}. Although the constraint r > d/2 is rather restrictive, it is requi…

39 | Nonparametric estimation via empirical risk minimization - Lugosi, Zeger - 1995 |

29 | Neural nets with superlinear VC-dimension - Maass - 1994 |

15 | Tighter bounds on the VC-dimension of three-layer networks - Sakurai - 1993 |

Citation Context: …sion which is Ω(WL) for L = O(W). Maass [Maa94] shows that three-layer networks with threshold activation functions and binary inputs have VC dimension Ω(W log W), and Sakurai [Sak93] shows that this is also true for two-layer networks with threshold activation functions and real inputs. It is easy to show that these results imply similar lower bounds if the threshold activation f…

8 | Convergence of Stochastic Processes - Pollard - 1984 |

Citation Context: …s are unrestricted in size. In order to complete the derivation of total error bounds, use must be made of results for the covering number of the above class of functions. From the results of Pollard [Pol84] and Haussler [Hau92, Hau95], these numbers may be upper bounded if upper bounds are available for the pseudo-dimension. We establish below upper and lower bounds for both the VC dimension and the pse…

7 | On the near optimality of the stochastic approximation of smooth functions by neural networks - Maiorov, Meir - 1997 |

Citation Context: …|X] − f(X))² is the approximation error of f, and f̂_n is a function from the class F that approximately minimizes the sample average of the quadratic loss. Making use of recently derived bounds [MM97] on the approximation error, inf_{f ∈ F} L̃(f), which are equal, up to logarithmic factors, to those obtained for networks of units with the standard sigmoidal function σ(u) = (1 + e^{−u})^{−1}, …

6 | A Theory of Learning in Artificial Neural Networks - Anthony, Bartlett - 1999 |

Citation Context: …, and thus consider the class of functions sgn(F) = {x ↦ sgn(f(x; a)) : a ∈ R^W}. Before giving the main theorem of this section, we present the following result, which is a slight improvement (see [AB98], Chapter 8) of a result due to Warren [War68]. Lemma 2.1 Suppose f_1(·), f_2(·), …, f_m(·) are fixed polynomials of degree at most l in n ≤ m variables. Then the number of distinc…

5 | Bounding the VC dimension of concept classes parameterized by real numbers - Goldberg, Jerrum - 1995 |

Citation Context: …ass of binary-valued functions computed by a feedforward neural network with W weights and k computational (non-input) units, each with a piecewise polynomial activation function. Goldberg and Jerrum [GJ95] have shown that VCdim(F) ≤ c_1(W² + Wk) = O(W²), where c_1 is a constant. Moreover, Koiran and Sontag [KS97] have demonstrated such a network that has VCdim(F) ≥ c_2 W² = Ω(W²), which w…

2 | Tight bounds for the VC-dimension of piecewise polynomial networks - Sakurai - 1999 |

Citation Context: …(with different constants) hold for the pseudo-dimension. Independently, Sakurai has obtained similar upper bounds and improved lower bounds on the VC dimension of piecewise polynomial networks (see [Sak99]). 2 UPPER BOUNDS We begin the technical discussion with precise definitions of the VC-dimension and the class of networks considered in this work. Definition 1 Let X be a set, and A a system of subse…