Results 1–10 of 31
Polynomial Bounds for VC Dimension of Sigmoidal and General Pfaffian Neural Networks
 JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1995
Abstract

Cited by 47 (0 self)
We introduce a new method for proving explicit upper bounds on the VC dimension of general functional basis networks, and prove as an application, for the first time, that the VC dimension of analog neural networks with the sigmoidal activation function σ(y) = 1/(1 + e^{-y}) is bounded by a quadratic polynomial O((lm)^2) in both the number l of programmable parameters and the number m of nodes. The proof method of this paper generalizes to a much wider class of Pfaffian activation functions and formulas, and also gives, for the first time, polynomial bounds on their VC dimension. We also present some other applications of our method.
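The activation function and the shape of the quadratic bound stated in this abstract can be sketched in a few lines of Python; the constant factor c and the function names below are illustrative assumptions, since the abstract states only the asymptotic order O((lm)^2):

```python
import math

def sigmoid(y: float) -> float:
    """Standard sigmoidal activation sigma(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def vc_upper_bound(l: int, m: int, c: float = 1.0) -> float:
    """Shape of the quadratic upper bound O((lm)^2) on VC dimension for a
    sigmoidal network with l programmable parameters and m nodes.
    The constant c is a placeholder; only the order is stated in the paper."""
    return c * (l * m) ** 2
```

For example, `vc_upper_bound(3, 2)` evaluates (lm)^2 = 36 with the placeholder constant c = 1.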
Neural networks for control
 in Essays on Control: Perspectives in the Theory and its Applications (H.L. Trentelman and
, 1993
Abstract

Cited by 26 (8 self)
This paper starts by placing neural net techniques in a general nonlinear control framework. After that, several basic theoretical results on networks are surveyed.
On the Complexity of Training Neural Networks with Continuous Activation Functions
, 1993
Abstract

Cited by 23 (3 self)
We deal with computational issues of loading a fixed-architecture neural network with a set of positive and negative examples. This is the first result on the hardness of loading networks which do not consist of binary-threshold neurons, but rather utilize a particular continuous activation function, commonly used in the neural network literature. We observe that the loading problem is polynomial-time if the input dimension is constant. Otherwise, however, any possible learning algorithm based on particular fixed architectures faces severe computational barriers. Similar theorems have already been proved by Megiddo and by Blum and Rivest, but for the case of binary-threshold networks only. Our theoretical results lend further justification to the use of incremental (architecture-changing) techniques for training networks rather than fixed architectures. Furthermore, they imply hardness of learnability in the probably-approximately-correct sense as well.
VC Dimension of Neural Networks
 Neural Networks and Machine Learning
, 1998
Abstract

Cited by 20 (3 self)
This paper presents a brief introduction to Vapnik–Chervonenkis (VC) dimension, a quantity which characterizes the difficulty of distribution-independent learning. The paper establishes various elementary results, and discusses how to estimate the VC dimension in several examples of interest in neural network theory. In this expository paper, we present a brief introduction to the subject of computing and estimating the VC dimension of neural network architectures. We provide precise definitions and prove several basic results, discussing also how one estimates VC dimension in several examples of interest in neural network theory. We do not address the learning- and estimation-theoretic applications of VC dimension. (Roughly, the VC dimension is a number which helps to quantify the difficulty of learning from examples. The sample complexity, that is, the number of "learning instances" that one must be exposed to in order to be reasonably certain to derive accurate p...
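The VC dimension this survey introduces is defined via shattering: a point set is shattered if every binary labeling of it is realized by some hypothesis. A small brute-force sketch makes this concrete; the one-sided threshold class and helper names below are hypothetical illustrations, not taken from the paper:

```python
def shatters(hypotheses, points) -> bool:
    """A finite point set is shattered if every +/- labeling of it is
    realized by some hypothesis (a function point -> bool)."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

# Illustrative class: one-sided thresholds x >= t on the real line.
thresholds = [lambda x, t=t: x >= t for t in [0.5, 1.5, 2.5]]

shatters(thresholds, [1.0])        # a single point is shattered
shatters(thresholds, [1.0, 2.0])   # two points are not: (+, -) is unrealizable
```

Since one point is shattered and no two points are, this class has VC dimension 1.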
Perspectives of Current Research about the Complexity of Learning on Neural Nets
, 1994
Abstract

Cited by 18 (1 self)
This paper discusses, within the framework of computational learning theory, the current state of knowledge and some open problems in three areas of research about learning on feedforward neural nets: neural nets that learn from mistakes; bounds for the Vapnik–Chervonenkis dimension of neural nets; and agnostic PAC-learning of functions on neural nets. All relevant definitions are given in this paper, and no previous knowledge about computational learning theory or neural nets is required. We refer to [RSO] for further introductory material and survey papers about the complexity of learning on neural nets. Throughout this paper we consider the following rather general notion of a (feedforward) neural net.
Probabilistic Analysis of Learning in Artificial Neural Networks: The PAC Model and its Variants
, 1997
Abstract

Cited by 18 (4 self)
There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the 'probably approximately correct' (PAC) model of learning and some of its variants. These models provide a probabilistic framework for the discussion of generalization and learning. This survey concentrates on the sample complexity questions in these models; that is, the emphasis is on how many examples should be used for training. Computational complexity considerations are briefly discussed for the basic PAC model. Throughout, the importance of the Vapnik–Chervonenkis dimension is highlighted. Particular attention is devoted to describing how the probabilistic models apply in the context of neural network learning, both for networks with binary-valued output and for networks with real-valued output.
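As an illustration of the sample-complexity questions this survey emphasizes, the classical sufficient sample size for a consistent learner over a finite hypothesis class H is m ≥ (1/ε)(ln|H| + ln(1/δ)), where ε is the accuracy and δ the confidence parameter. The function below is a sketch of that standard bound, not a formula taken from this particular paper:

```python
import math

def pac_sample_bound(h_size: int, eps: float, delta: float) -> int:
    """Classical sample size sufficient for a consistent learner over a
    finite hypothesis class H: m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)
```

For instance, with |H| = 1000, ε = 0.1, and δ = 0.05, the bound gives 100 examples; note the dependence on |H| is only logarithmic.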
Analog versus Discrete Neural Networks
 Neural Computation
, 1996
Abstract

Cited by 17 (2 self)
We show that neural networks with three-times continuously differentiable activation functions are capable of computing a certain family of n-bit Boolean functions with two gates, whereas networks composed of binary threshold functions require at least Ω(log n) gates. Thus, for a large class of activation functions, analog neural networks can be more powerful than discrete neural networks, even when computing Boolean functions. Artificial neural networks have become a popular model for machine learning and many results have been obtained regarding their application to practical problems. Typically, the network is trained to encode complex associations between inputs and outputs during supervised training cycles, where the associations are encoded by the weights of the network. Once trained, the network will compute an input/output mapping which (hopefully) is a good approximation of the original mapping. (Partially supported by NSF Grant CCR-9114545.) In thi...
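The discrete model contrasted here is the binary (linear) threshold gate, which outputs 1 exactly when the weighted sum of its inputs reaches its threshold. A minimal sketch, with illustrative weights computing two-bit AND, is:

```python
def threshold_gate(weights, bias, x):
    """Binary threshold gate: output 1 iff the weighted sum w.x >= bias."""
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total >= bias else 0

# AND of two bits with a single threshold gate (weights 1, 1; threshold 2):
threshold_gate([1, 1], 2, [1, 1])  # fires only when both inputs are 1
threshold_gate([1, 1], 2, [0, 1])  # weighted sum 1 < 2, so output 0
```

The paper's point is that replacing this discontinuous gate with a sufficiently smooth analog activation can shrink certain circuits from Ω(log n) gates to two.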
Foundations Of Recurrent Neural Networks
, 1993
Abstract

Cited by 14 (6 self)
"Artificial neural networks" provide an appealing model of computation. Such networks consist of an interconnection of a number of parallel agents, or "neurons." Each of these receives certain signals as inputs, computes some simple function, and produces a signal as output, which is in turn broadcast to the successive neurons involved in a given computation. Some of the signals originate from outside the network, and act as inputs to the whole system, while some of the output signals are communicated back to the environment and are used to encode the end result of computation. In this dissertation we focus on the "recurrent network" model, in which the underlying graph is not subject to any constraints. We investigate the computational power of neural nets, taking a classical computer science point of view. We characterize the language re...
Learning by Canonical Smooth Estimation, Part II: Learning and Choice of Model Complexity
 IEEE Transactions on Automatic Control
Abstract

Cited by 13 (2 self)
In this paper, we analyze the properties of a procedure for learning from examples. This "canonical learner" is based on a canonical error estimator developed in a companion paper. In learning problems, we observe data that consists of labeled sample points, and the goal is to find a model, or "hypothesis," from a set of candidates that will accurately predict the labels of new sample points. The expected mismatch between a hypothesis's prediction and the actual label of a new sample point is called the hypothesis's "generalization error." We compare the canonical learner with the traditional technique of finding hypotheses that minimize the relative-frequency-based empirical error estimate. We show that, for a broad class of learning problems, the set of cases for which such empirical error minimization works is a proper subset of the cases for which the canonical learner works. We derive bounds to show that the number of samples required by these two methods is comparable. We also add...
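The traditional baseline this abstract compares against, minimizing the relative-frequency-based empirical error estimate, can be sketched as follows; the function names are illustrative, not the paper's:

```python
def empirical_error(h, sample):
    """Relative-frequency estimate of generalization error: the fraction
    of labeled points (x, y) on which hypothesis h disagrees with y."""
    return sum(h(x) != y for x, y in sample) / len(sample)

def erm(hypotheses, sample):
    """Empirical error minimization: return the candidate hypothesis with
    the smallest empirical error on the observed sample."""
    return min(hypotheses, key=lambda h: empirical_error(h, sample))
```

The canonical learner of the paper replaces this relative-frequency estimate with a canonical (smoothed) error estimator; the selection step is analogous.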