Results 1–10 of 55
The Sample Complexity of Pattern Classification With Neural Networks: The Size of the Weights is More Important Than the Size of the Network
, 1997
"... Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the ne ..."
Abstract

Cited by 177 (15 self)
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A³√((log n)/m) (ignoring log A and log m factors), where m is the number of training patterns.
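As a rough illustration of the bound stated in this abstract (a sketch, not code from the paper; the function name is hypothetical), the snippet below evaluates the weight-dependent term A³√((log n)/m) for a few training-set sizes, showing that it shrinks with m and does not involve the number of weights:

```python
import math

def weight_bound_term(A, n, m):
    """Weight-dependent term A^3 * sqrt(log(n) / m) from the
    generalization bound (log A and log m factors ignored).
    A: bound on the sum of weight magnitudes per unit,
    n: input dimension, m: number of training patterns."""
    return A**3 * math.sqrt(math.log(n) / m)

# The term decreases as the number of training patterns m grows,
# independently of how many weights the network has.
for m in (100, 1000, 10000):
    print(m, weight_bound_term(A=2.0, n=64, m=m))
```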
On The Computational Power Of Neural Nets
 JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1995
"... This paper deals with finite size networks which consist of interconnections of synchronously evolving processors. Each processor updates its state by applying a "sigmoidal" function to a linear combination of the previous states of all units. We prove that one may simulate all Turing Machines by su ..."
Abstract

Cited by 156 (26 self)
This paper deals with finite size networks which consist of interconnections of synchronously evolving processors. Each processor updates its state by applying a "sigmoidal" function to a linear combination of the previous states of all units. We prove that one may simulate all Turing Machines by such nets. In particular, one can simulate any multi-stack Turing Machine in real time, and there is a net made up of 886 processors which computes a universal partial recursive function. Products (high order nets) are not required, contrary to what had been stated in the literature. Nondeterministic Turing Machines can be simulated by nondeterministic rational nets, also in real time. The simulation result has many consequences regarding the decidability, or more generally the complexity, of questions about recursive nets.
Bounding the Vapnik-Chervonenkis dimension of concept classes parameterized by real numbers
 Machine Learning
, 1995
"... Abstract. The VapnikChervonenkis (VC) dimension is an important combinatorial tool in the analysis of learning problems in the PAC framework. For polynomial learnability, we seek upper bounds on the VC dimension that are polynomial in the syntactic complexity of concepts. Such upper bounds are au ..."
Abstract

Cited by 91 (1 self)
Abstract. The Vapnik-Chervonenkis (VC) dimension is an important combinatorial tool in the analysis of learning problems in the PAC framework. For polynomial learnability, we seek upper bounds on the VC dimension that are polynomial in the syntactic complexity of concepts. Such upper bounds are automatic for discrete concept classes, but hitherto little has been known about what general conditions guarantee polynomial bounds on VC dimension for classes in which concepts and examples are represented by tuples of real numbers. In this paper, we show that for two general kinds of concept class the VC dimension is polynomially bounded in the number of real numbers used to define a problem instance. One is classes where the criterion for membership of an instance in a concept can be expressed as a formula (in the first-order theory of the reals) with fixed quantification depth and exponentially bounded length, whose atomic predicates are polynomial inequalities of exponentially bounded degree. The other is classes where containment of an instance in a concept is testable in polynomial time, assuming we may compute standard arithmetic operations on reals exactly in constant time. Our results show that in the continuous case, as in the discrete, the real barrier to efficient learning in the Occam sense is complexity-theoretic and not information-theoretic. We present examples to show how these results apply to concept classes defined by geometrical figures and neural nets, and derive polynomial bounds on the VC dimension for these classes. Keywords: Concept learning, information theory, Vapnik-Chervonenkis dimension, Milnor's theorem
Bounds for the Computational Power and Learning Complexity of Analog Neural Nets
 Proc. of the 25th ACM Symp. Theory of Computing
, 1993
"... . It is shown that high order feedforward neural nets of constant depth with piecewise polynomial activation functions and arbitrary real weights can be simulated for boolean inputs and outputs by neural nets of a somewhat larger size and depth with heaviside gates and weights from f\Gamma1; 0; 1g. ..."
Abstract

Cited by 60 (12 self)
It is shown that high order feedforward neural nets of constant depth with piecewise polynomial activation functions and arbitrary real weights can be simulated for boolean inputs and outputs by neural nets of a somewhat larger size and depth with Heaviside gates and weights from {−1, 0, 1}. This provides the first known upper bound for the computational power of the former type of neural nets. It is also shown that in the case of first-order nets with piecewise linear activation functions one can replace arbitrary real weights by rational numbers with polynomially many bits, without changing the boolean function that is computed by the neural net. In order to prove these results we introduce two new methods for reducing nonlinear problems about weights in multilayer neural nets to linear problems for a transformed set of parameters. These transformed parameters can be interpreted as weights in a somewhat larger neural net. As another application of our new proof technique we s...
Polynomial Bounds for VC Dimension of Sigmoidal and General Pfaffian Neural Networks
 JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1995
"... We introduce a new method for proving explicit upper bounds on the VC Dimension of general functional basis networks, and prove as an application, for the first time, that the VC Dimension of analog neural networks with the sigmoidal activation function oe(y) = 1=1+e \Gammay is bounded by a q ..."
Abstract

Cited by 47 (0 self)
We introduce a new method for proving explicit upper bounds on the VC Dimension of general functional basis networks, and prove as an application, for the first time, that the VC Dimension of analog neural networks with the sigmoidal activation function σ(y) = 1/(1 + e^(−y)) is bounded by a quadratic polynomial O((lm)²) in both the number l of programmable parameters and the number m of nodes. The proof method of this paper generalizes to a much wider class of Pfaffian activation functions and formulas, and also gives, for the first time, polynomial bounds on their VC Dimension. We also present some other applications of our method.
Neural Networks with Quadratic VC Dimension
, 1996
"... This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles a longstanding open question, namely whether the wellknown O(w log w) bound, known for hardthreshold nets, also held fo ..."
Abstract

Cited by 46 (7 self)
This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles a long-standing open question, namely whether the well-known O(w log w) bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid generalization are discussed.
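To make the gap between the two bounds concrete, a small sketch (an illustration only; constants and lower-order factors are ignored, and the function name is hypothetical) compares the quadratic lower bound w² for continuous activations against the O(w log w) hard-threshold bound:

```python
import math

def vc_ratio(w):
    """Ratio of the quadratic lower bound w^2 to the O(w log w)
    hard-threshold bound, with constants ignored: grows like w / log w."""
    return w**2 / (w * math.log2(w))

# The ratio grows without bound, so no O(w log w) bound can hold
# for continuous-activation nets in general.
for w in (10, 100, 1000):
    print(w, round(vc_ratio(w), 1))
```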
Finiteness Results for Sigmoidal "Neural" Networks
 In Proceedings of 25th Annual ACM Symposium on the Theory of Computing
, 1993
"... ) Angus Macintyre Mathematical Inst., University of Oxford Oxford OX1 3LB, England, UK Email: ajm@maths.ox.ac.uk Eduardo D. Sontag 3 Dept. of Mathematics, Rutgers University New Brunswick, NJ 08903 Email: sontag@hilbert.rutgers.edu Abstract Proc. 25th Annual Symp. Theory Computing , San Diego, ..."
Abstract

Cited by 44 (12 self)
Angus Macintyre, Mathematical Inst., University of Oxford, Oxford OX1 3LB, England, UK (ajm@maths.ox.ac.uk); Eduardo D. Sontag, Dept. of Mathematics, Rutgers University, New Brunswick, NJ 08903 (sontag@hilbert.rutgers.edu). This paper deals with analog circuits. It establishes the finiteness of VC dimension, teaching dimension, and several other measures of sample complexity which arise in learning theory. It also shows that the equivalence of behaviors, and the loading problem, are effectively decidable, modulo a widely believed conjecture in number theory. The results, the first ones that are independent of weight size, apply when the gate function is the "standard sigmoid" commonly used in neural networks research. The proofs rely on very recent developments in the elementary theory of real numbers with exponentiation. (Some weaker conclusions are also given for more general analytic gate functions...
Feedback Stabilization Using Two-Hidden-Layer Nets
 IEEE Trans. Neural Networks
, 1992
"... This paper compares the representational capabilities of one hidden layer and two hidden layer nets consisting of feedforward interconnections of linear threshold units. It is remarked that for certain problems two hidden layers are required, contrary to what might be in principle expected from the ..."
Abstract

Cited by 40 (6 self)
This paper compares the representational capabilities of one hidden layer and two hidden layer nets consisting of feedforward interconnections of linear threshold units. It is remarked that for certain problems two hidden layers are required, contrary to what might be in principle expected from the known approximation theorems. The differences are not based on numerical accuracy or number of units needed, nor on capabilities for feature extraction, but rather on a much more basic classification into "direct" and "inverse" problems. The former correspond to the approximation of continuous functions, while the latter are concerned with approximating one-sided inverses of continuous functions, and are often encountered in the context of inverse kinematics determination or in control questions. A general result is given showing that nonlinear control systems can be stabilized using two hidden layers, but not in general using just one. Key words: Neural nets, nonlinear control systems, feedback
Approximation theory of the MLP model in neural networks
 ACTA NUMERICA
, 1999
"... In this survey we discuss various approximationtheoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. Mathematically it is one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are appr ..."
Abstract

Cited by 39 (3 self)
In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. Mathematically it is one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are approximation-theoretic in character. Most of the research we will discuss is of very recent vintage. We will report on what has been done and on various unanswered questions. We will not be presenting practical (algorithmic) methods. We will, however, be exploring the capabilities and limitations of this model. In the first ...
Neural Nets with Superlinear VC-Dimension
 Neural Computation
, 1994
"... It has been known for quite a while that the VapnikChervonenkis dimension (VCdimension) of a feedforward neural net with linear threshold gates is at most O(w \Delta log w), where w is the total number of weights in the neural net. We show in this paper that this bound is in fact asymptotically op ..."
Abstract

Cited by 29 (8 self)
It has been known for quite a while that the Vapnik-Chervonenkis dimension (VC-dimension) of a feedforward neural net with linear threshold gates is at most O(w · log w), where w is the total number of weights in the neural net. We show in this paper that this bound is in fact asymptotically optimal. More precisely, we construct for arbitrarily large w ∈ ℕ neural nets N_w of depth 3 (i.e. with 2 layers of hidden units) that have VC-dimension Ω(w · log w). The construction exhibits a method that allows us to encode more "program bits" in the weights of a neural net than previously thought possible. The Vapnik-Chervonenkis dimension (abbreviated: VC-dimension) of a neural net N is an important measure of the expressiveness of N, i.e. for the variety of functions that can be computed by N with different choices for its weights. In particular it has been shown in [BEHW] and [EHKV] that the VC-dimension of N essentially determines the number of training examples th...