Results 1 – 10 of 67
Multilayer Feedforward Networks With a Nonpolynomial Activation Function Can Approximate Any Function
, 1993
Abstract

Cited by 117 (2 self)
Several researchers have characterized the activation function under which multilayer feedforward networks can act as universal approximators. We show that most of the characterizations reported thus far in the literature are special cases of the following general result: a standard multilayer feedforward network with a locally bounded piecewise continuous activation function can approximate any continuous function to any degree of accuracy if and only if the network's activation function is not a polynomial. We also emphasize the important role of the threshold, asserting that without it the last theorem does not hold.
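A minimal sketch of why a polynomial activation cannot suffice (an illustration with invented weights, not taken from the paper): with the identity activation, itself a degree-1 polynomial, any number of stacked layers collapses to a single affine map, so only affine functions are ever representable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer net with the identity activation, i.e. a
# degree-1 polynomial: y = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(1, 5)), rng.normal(size=1)

def net(x):
    return W2 @ (W1 @ x + b1) + b2

# The composition collapses to a single affine map y = A @ x + c:
A = W2 @ W1            # shape (1, 3)
c = W2 @ b1 + b2       # shape (1,)

x = rng.normal(size=3)
assert np.allclose(net(x), A @ x + c)
```

The same collapse happens, degree for degree, with any polynomial activation, which is why non-polynomiality is the decisive condition.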
On The Problem Of Local Minima In Backpropagation
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1992
Abstract

Cited by 72 (17 self)
Supervised learning in Multi-Layered Neural Networks (MLNs) has recently been proposed through the well-known Backpropagation algorithm. This is a gradient method which can get stuck in local minima, as simple examples can show. In this paper, some conditions on the network architecture and the learning environment are proposed which ensure the convergence of the Backpropagation algorithm. It is proven in particular that the convergence holds if the classes are linearly separable. In this case, the experience gained in several experiments shows that MLNs exceed perceptrons in generalization to new examples. Index Terms: Multi-Layered Networks, learning environment, Backpropagation, pattern recognition, linearly separable classes. I. Introduction. Supervised learning in Multi-Layered Networks can be accomplished thanks to Backpropagation (BP) ([19, 25, 31]). Its application to several different subjects [25], and, particularly, to pattern recognition ([3, 6, 8, 20, 27, 29]), has bee...
Regression Modeling in BackPropagation and Projection Pursuit Learning
, 1994
Abstract

Cited by 65 (1 self)
We study and compare two types of connectionist learning methods for model-free regression problems in this paper. One is the popular backpropagation learning (BPL), well known in the artificial neural networks literature; the other is projection pursuit learning (PPL), which has emerged in recent years in the statistical estimation literature. Both the BPL and the PPL are based on projections of the data in directions determined from interconnection weights. However, unlike the fixed nonlinear activations (usually sigmoidal) used for the hidden neurons in BPL, the PPL systematically approximates the unknown nonlinear activations. Moreover, the BPL estimates all the weights simultaneously at each iteration, while the PPL estimates the weights cyclically (neuron by neuron and layer by layer) at each iteration. Although the BPL and the PPL have comparable training speed when based on a Gauss-Newton optimization algorithm, the PPL proves more parsimonious in that the PPL requires fewer hi...
Uniqueness of the Weights for Minimal Feedforward Nets with a Given Input-Output Map
, 1992
Abstract

Cited by 51 (2 self)
We show that, for feedforward nets with a single hidden layer, a single output node, and the "transfer function" tanh, the net is uniquely determined by its input-output map, up to an obvious finite group of symmetries (permutations of the hidden nodes, and changing the sign of all the weights associated to a particular hidden node), provided that the net is irreducible, i.e. that there does not exist an inner node that makes a zero contribution to the output, and there is no pair of hidden nodes that could be collapsed to a single node without altering the input-output map. Rutgers Center for Systems and Control, May 1991; revised October 1991. Research supported in part by the Air Force Office of Scientific Research (AFOSR-91-0343). The author thanks Eduardo Sontag for suggesting the problem and for his helpful comments and ideas, and an anonymous referee for suggesting how to improve the exposition at several points. Requests for reprints should be sent to Héctor J. Sussmann, Departme...
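The sign-reversal symmetry in that finite group follows from tanh being an odd function; a small numerical sketch (with hypothetical weights, not drawn from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# One hidden layer with tanh: f(x) = sum_j c_j * tanh(a_j . x + b_j)
a = rng.normal(size=(4, 3))   # input-to-hidden weights (one row per node)
b = rng.normal(size=4)        # hidden thresholds
c = rng.normal(size=4)        # hidden-to-output weights

def f(a, b, c, x):
    return c @ np.tanh(a @ x + b)

# Flip every sign associated with hidden node 0; since tanh is odd,
# c0*tanh(a0.x + b0) == (-c0)*tanh(-a0.x - b0), so the output is unchanged.
a2, b2, c2 = a.copy(), b.copy(), c.copy()
a2[0], b2[0], c2[0] = -a2[0], -b2[0], -c2[0]

x = rng.normal(size=3)
assert np.isclose(f(a, b, c, x), f(a2, b2, c2, x))
```

Permuting whole hidden nodes (rows of `a`, entries of `b` and `c` together) leaves the output unchanged for the same reason: the sum over hidden nodes is order-independent.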
On Clustering Properties of Hierarchical Self-Organizing Maps
 ARTIFICIAL NEURAL NETWORKS
, 1992
Abstract

Cited by 47 (6 self)
A very important theoretical result giving impetus to increasing interest in neural networks is that a multilayer feedforward network can approximate any function to arbitrary precision, or, as a classifier, it can form arbitrarily complex class boundaries [2]. In difficult practical classification problems, such as pattern recognition and machine vision, the class boundaries will inevitably be very complex due to variations and distortions in the input images. To reduce the amount of training data needed, the number of independent weights in the classifier must be reduced [1]. The trade-off is between the capability of the classifier and the amount of training data. In machine vision problems it is often possible to acquire large amounts of training data as long as manual classification of the objects is not required. Thus unsupervised methods can be used in the preprocessing stage without large extra cost. The essential requirement for the preprocessor is that the (unknown) class boundaries should be simpler than in the original data, while any two separable classes should remain separable. Since the class boundaries are not known, the best preprocessing can do is to follow the distributions of the data samples, or in other words, clustering.
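The idea that a preprocessor should "follow the distributions of the data samples" can be sketched with a toy one-dimensional self-organizing map; every parameter below (map size, learning rate, neighbourhood schedule, toy data) is an illustrative assumption, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: two tight, well-separated 2-D clusters.
data = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
                  rng.normal(3.0, 0.1, size=(50, 2))])

# A tiny 1-D map of 4 units. Neighbours of the winning unit are also
# pulled toward each sample, so the map follows the data distribution.
weights = rng.normal(1.5, 0.5, size=(4, 2))
eta, sigma = 0.3, 1.0
for epoch in range(20):
    for x in rng.permutation(data):
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        h = np.exp(-((np.arange(4) - bmu) ** 2) / (2 * sigma ** 2))
        weights += eta * h[:, None] * (x - weights)
    sigma *= 0.9  # shrink the neighbourhood over time

# Mean quantization error: average distance from a sample to its
# best-matching unit; small when the units sit inside the clusters.
qerr = np.mean([np.min(np.linalg.norm(weights - x, axis=1)) for x in data])
```

Replacing each sample by the index of its best-matching unit is exactly the kind of preprocessing the abstract describes: the two clusters stay separable, but the representation handed to the classifier is far simpler.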
A survey of shape similarity assessment algorithms for product design and manufacturing applications
 Journal of Computing and Information Science in Engineering
, 2003
Abstract

Cited by 45 (13 self)
This document contains the draft version of the following paper: A. Cardone, S.K. Gupta, and M. Karnik. A survey of shape similarity assessment algorithms for product design and manufacturing applications. ASME Journal of
Feedforward Neural Nets as Models for Time Series Forecasting
 ORSA Journal of Computing
, 1993
Abstract

Cited by 39 (3 self)
We have studied neural networks as models for time series forecasting, and our research compares the Box-Jenkins method against the neural network method for long- and short-term memory series. Our work was inspired by previously published works that yielded inconsistent results about comparative performance. We have since experimented with 16 time series of differing complexity using neural networks. The performance of the neural networks is compared with that of the Box-Jenkins method. Our experiments indicate that for time series with long memory, both methods produced comparable results. However, for series with short memory, neural networks outperformed the Box-Jenkins model. Because neural networks can be easily built for multiple-step-ahead forecasting, they present a better long-term forecast model than the Box-Jenkins method. We discuss the representation ability, the model building process, and the applicability of the neural net approach. Neural networks appear to provide a ...
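Multiple-step-ahead forecasting with a feedforward net is commonly done recursively: each one-step prediction is fed back into the input window for the next step. A minimal sketch with a stand-in one-step predictor (the window length and coefficients are invented for illustration; a trained net would take the predictor's place):

```python
# Hypothetical one-step-ahead predictor standing in for a trained
# feedforward net: it maps a window of p = 3 lagged values to the next
# value. The coefficients are invented for illustration only.
def one_step(window):
    return 0.5 * window[-1] + 0.3 * window[-2] + 0.1 * window[-3]

def multi_step(history, horizon):
    """Recursive multi-step-ahead forecasting: each prediction is fed
    back into the input window for the following step."""
    window = list(history[-3:])
    forecasts = []
    for _ in range(horizon):
        y = one_step(window)
        forecasts.append(y)
        window = window[1:] + [y]
    return forecasts

forecast = multi_step([1.0, 1.0, 1.0], horizon=4)
# forecast[0] = 0.5*1.0 + 0.3*1.0 + 0.1*1.0 = 0.9
```

Because the same model produces every horizon, no separate Box-Jenkins-style re-identification is needed per lead time, which is the convenience the abstract alludes to.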
Approximation theory of the MLP model in neural networks
 ACTA NUMERICA
, 1999
Abstract

Cited by 39 (3 self)
In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. Mathematically it is one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are approximation-theoretic in character. Most of the research we will discuss is of very recent vintage. We will report on what has been done and on various unanswered questions. We will not be presenting practical (algorithmic) methods. We will, however, be exploring the capabilities and limitations of this model. In the first ...
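The single-hidden-layer MLP approximants studied in this line of work are spans of ridge functions; in standard notation (assumed here, not quoted from the survey), they take the form

```latex
f(x) \;=\; \sum_{i=1}^{r} c_i \,\sigma\!\left(w_i \cdot x - \theta_i\right),
\qquad x, w_i \in \mathbb{R}^n, \quad c_i, \theta_i \in \mathbb{R},
```

where σ is the fixed activation function and r the number of hidden units; the approximation-theoretic questions concern how well such sums can approximate a target function as r grows.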
For Neural Networks, Function Determines Form
, 1992
Abstract

Cited by 31 (14 self)
This paper shows that the weights of continuous-time feedback neural networks are uniquely identifiable from input/output measurements. Under very weak genericity assumptions, the following is true: assume given two nets whose neurons all have the same nonlinear activation function σ; if the two nets have equal behaviors as "black boxes", then necessarily they must have the same number of neurons and, except at most for sign reversals at each node, the same weights. Moreover, even if the activations are not a priori known to coincide, they are shown to be also essentially determined from the external measurements. Key words: neural networks, identification from input/output data, control systems. 1. Introduction. Many recent papers have explored the computational and dynamical properties of systems of interconnected "neurons." For instance, Hopfield ([7]), Cowan ([4]), and Grossberg and his school (see e.g. [3]) have all studied devices that can be modelled by sets of nonlinear dif...
Uniqueness Of Weights For Neural Networks
 in Artificial Neural Networks with Applications in Speech and Vision
, 1993
Abstract

Cited by 23 (8 self)
Introduction. In most applications dealing with learning and pattern recognition, neural nets are employed as models whose parameters, or "weights," must be fit to training data. Gradient descent and other algorithms are used in order to minimize an error functional, which penalizes mismatches between the desired outputs and those that a candidate net with a fixed architecture and varying weights produces. There are many numerical issues that arise naturally when using such a design approach, in particular: (i) the possibility of local minima which are not globally optimal, and (ii) the possibility of multiple global minimizers. The first question has been dealt with by many different authors (see for instance [5, 13, 14]) and will not be reviewed here. Regarding point (ii), observe that there are obvious transformations that leave the behavior of a network invariant, such as interchanges of all incoming and outgoing weights between two neurons, that is, the relabeling of neu...