Results 1–10 of 39
A Practical Bayesian Framework for Backprop Networks
Neural Computation, 1991
Abstract

Cited by 398 (20 self)
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible: (1) objective comparisons between solutions using alternative network architectures
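A minimal sketch of the kind of regularized objective such a Bayesian framework evaluates (MacKay-style: a data-misfit term weighted by beta plus a weight-decay term weighted by alpha). The linear model and names here are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

def data_error(w, X, y):
    # Sum-of-squares data misfit E_D of a linear model (a stand-in
    # for a feedforward network's output error).
    return 0.5 * np.sum((X @ w - y) ** 2)

def weight_error(w):
    # Weight-decay (prior) term E_W.
    return 0.5 * np.sum(w ** 2)

def objective(w, X, y, alpha, beta):
    # Regularized objective M(w) = beta * E_D(w) + alpha * E_W(w);
    # in the Bayesian framework, alpha and beta are hyperparameters
    # whose settings (and whole architectures) can be compared
    # objectively via the evidence.
    return beta * data_error(w, X, y) + alpha * weight_error(w)
```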
Bounds for the Computational Power and Learning Complexity of Analog Neural Nets
Proc. of the 25th ACM Symp. on Theory of Computing, 1993
Abstract

Cited by 60 (12 self)
It is shown that high order feedforward neural nets of constant depth with piecewise polynomial activation functions and arbitrary real weights can be simulated for boolean inputs and outputs by neural nets of a somewhat larger size and depth with Heaviside gates and weights from {-1, 0, 1}. This provides the first known upper bound for the computational power of the former type of neural nets. It is also shown that in the case of first order nets with piecewise linear activation functions one can replace arbitrary real weights by rational numbers with polynomially many bits, without changing the boolean function that is computed by the neural net. In order to prove these results we introduce two new methods for reducing nonlinear problems about weights in multilayer neural nets to linear problems for a transformed set of parameters. These transformed parameters can be interpreted as weights in a somewhat larger neural net. As another application of our new proof technique we s...
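As a concrete illustration of the restricted gates in this result: a Heaviside (threshold) gate with weights drawn from {-1, 0, 1}. The gate definition is standard; the specific OR example is ours, not from the paper:

```python
def heaviside_gate(weights, bias, inputs):
    # Heaviside gate: outputs 1 iff the weighted sum plus bias is
    # non-negative. Here the weights are restricted to {-1, 0, 1},
    # as in the simulation result summarized above.
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s >= 0 else 0

# Boolean OR of two inputs, using weights (1, 1) and bias -1:
# heaviside_gate((1, 1), -1, (0, 0)) -> 0
```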
Evolutionary Algorithms for Neural Network Design and Training
In Proceedings of the First Nordic Workshop on Genetic Algorithms and its Applications, 1995
Abstract

Cited by 43 (1 self)
Neural networks and genetic algorithms are two relatively young research areas that have attracted steadily growing interest in recent years. Both models are inspired by nature, but whereas neural networks are concerned with the learning of an individual (phenotypic learning), evolutionary algorithms deal with a population's adaptation to a changing environment (genotypic learning). This paper focuses on the intersection of neural networks and evolutionary computation, namely on how evolutionary algorithms can be used to assist neural network design and training. The purpose of the paper is to set forth the general considerations that have to be made when designing an algorithm in this area and to give an overview of how researchers have addressed these issues in the past.
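The general approach surveyed above can be sketched as a minimal evolutionary weight-training loop: a population of weight vectors evolves by Gaussian mutation and truncation selection against a fitness function. This is an illustrative sketch of the family of methods, not any specific algorithm from the paper; the toy fitness (negative squared distance to a target vector) stands in for network error:

```python
import random

def fitness(w, target):
    # Toy fitness: negative squared distance to a target weight
    # vector (a stand-in for negative network training error).
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))

def evolve(pop, target, generations=50, sigma=0.1, rng=None):
    rng = rng or random.Random(0)
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, target), reverse=True)
        parents = pop[: len(pop) // 2]   # truncation selection
        children = [[wi + rng.gauss(0, sigma) for wi in p] for p in parents]
        pop = parents + children         # parents kept, so best never degrades
    return max(pop, key=lambda w: fitness(w, target))
```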
Evolving Optimal Neural Networks Using Genetic Algorithms with Occam's Razor
Complex Systems, 1993
Abstract

Cited by 40 (6 self)
Genetic algorithms have been used for neural networks in two main ways: to optimize the network architecture and to train the weights of a fixed architecture. While most previous work focuses on only one of these two options, this paper investigates an alternative evolutionary approach called Breeder Genetic Programming (BGP) in which the architecture and the weights are optimized simultaneously. The genotype of each network is represented as a tree whose depth and width are dynamically adapted to the particular application by specifically defined genetic operators. The weights are trained by a next-ascent hill-climbing search. A new fitness function is proposed that quantifies the principle of Occam's razor. It makes an optimal tradeoff between the error-fitting ability and the parsimony of the network. Simulation results on two benchmark problems of differing complexity suggest that the method finds minimal size networks on clean data. The experiments on noisy data show...
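A hedged sketch of how such an Occam-style fitness might look. The linear size penalty and its weight `lam` are our illustrative stand-ins, not the paper's exact formulation:

```python
def occam_fitness(error, n_units, n_weights, lam=0.01):
    # Lower is better: fitting error plus a parsimony penalty that
    # grows with network size. lam controls the error/parsimony
    # tradeoff (an illustrative choice, not the paper's value).
    return error + lam * (n_units + n_weights)
```

Between two networks with equal error, the smaller one scores better, which is the Occam's-razor preference the abstract describes.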
Accelerated Learning By Active Example Selection
International Journal of Neural Systems, 1994
Abstract

Cited by 32 (10 self)
Much previous work on training multilayer neural networks has attempted to speed up the backpropagation algorithm using more sophisticated weight modification rules, whereby all the given training examples are used in a random or predetermined sequence. In this paper we investigate an alternative approach in which the learning proceeds on an increasing number of selected training examples, starting with a small training set. We derive a measure of criticality of examples and present an incremental learning algorithm that uses this measure to select a critical subset of given examples for solving the particular task. Our experimental results suggest that the method can significantly improve training speed and generalization performance in many real applications of neural networks. This method can be used in conjunction with other variations of gradient descent algorithms.

1 Introduction

One of the most widely used methods for training multilayer feedforward neural networks is the erro...
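A minimal sketch of the selection step, using current prediction error as a stand-in criticality measure. The paper derives its own measure; this function is only illustrative of growing the training set by its hardest examples:

```python
def select_critical(errors, selected, k):
    # Return indices of the k not-yet-selected examples with the
    # largest current error; an incremental learner would add them
    # to the (initially small) training set and retrain.
    candidates = [i for i in range(len(errors)) if i not in selected]
    candidates.sort(key=lambda i: errors[i], reverse=True)
    return candidates[:k]
```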
Genetic Programming of Minimal Neural Nets Using Occam's Razor
Proceedings of the 5th International Conference on Genetic Algorithms (ICGA'93), 1993
Abstract

Cited by 26 (6 self)
A genetic programming method is investigated for optimizing both the architecture and the connection weights of multilayer feedforward neural networks. The genotype of each network is represented as a tree whose depth and width are dynamically adapted to the particular application by specifically defined genetic operators. The weights are trained by a next-ascent hill-climbing search. A new fitness function is proposed that quantifies the principle of Occam's razor. It makes an optimal tradeoff between the error-fitting ability and the parsimony of the network. We discuss the results for two problems of differing complexity and study the convergence and scaling properties of the algorithm.

1 INTRODUCTION

Optimization of neural network architectures for particular applications is important because the speed and accuracy of learning and performance are dependent on the network complexity, i.e. the type and number of units and connections, and the connectivity of units. Fo...
Neural networks, nativism, and the plausibility of constructivism
Cognition, 1993
Abstract

Cited by 24 (0 self)
Recent interest in PDP (parallel distributed processing) models is due in part to the widely held belief that they challenge many of the assumptions of classical cognitive science. In the domain of language acquisition, for example, there has been much interest in the claim that PDP models might undermine nativism. Related arguments based on PDP learning have also been given against Fodor’s anticonstructivist position, a position that has contributed to the widespread dismissal of constructivism. A limitation of many of the claims regarding PDP learning, however, is that the principles underlying this learning have not been rigorously characterized. In this paper, I examine PDP models from within the framework of Valiant’s PAC (probably approximately correct) model of learning, now the dominant model in machine learning, which applies naturally to neural network learning. From this perspective, I evaluate the implications of PDP models for nativism and Fodor’s influential anticonstructivist position. In particular, I demonstrate that, contrary to a number of claims, PDP models are nativist in a robust sense. I also demonstrate that PDP models actually serve as a good illustration of Fodor’s anticonstructivist position. While these results may at first suggest that neural network models in general are incapable of the sort of concept acquisition that is required to refute Fodor’s anticonstructivist position, I suggest
What size neural network gives optimal generalization? Convergence properties of backpropagation
1996
Abstract

Cited by 23 (2 self)
One of the most important aspects of any machine learning paradigm is how it scales according to problem size and complexity. Using a task with known optimal training error, and a prespecified maximum number of training updates, we investigate the convergence of the backpropagation algorithm with respect to a) the complexity of the required function approximation, b) the size of the network in relation to the size required for an optimal solution, and c) the degree of noise in the training data. In general, for a) the solution found is worse when the function to be approximated is more complex, for b) oversized networks can result in lower training and generalization error in certain cases, and for c) the use of committee or ensemble techniques can be more beneficial as the level of noise in the training data is increased. For the experiments we performed, we do not obtain the optimal solution in any case. We further support the observation that larger networks can produce better training and generalization error using a face recognition example where a network with many more parameters than training points generalizes better than smaller networks.
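The committee/ensemble idea mentioned for noisy training data can be sketched as simple prediction averaging, which reduces the variance contributed by noise. This is illustrative only, not the paper's experimental setup:

```python
import statistics

def committee_predict(models, x):
    # Average the predictions of several independently trained
    # models; each "model" here is just a callable returning a
    # scalar prediction for input x.
    return statistics.mean(m(x) for m in models)
```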
Neural Network Models of Sensory Integration for Improved Vowel Recognition
1990
Abstract

Cited by 23 (2 self)
Automatic speech recognizers currently perform poorly in the presence of noise. Humans, on the other hand, often compensate for noise degradation by extracting speech information from alternative sources and then integrating this information with the acoustical signal. Visual signals from the speaker’s face are one source of supplemental speech information. We demonstrate that multiple sources of speech information can be integrated at a subsymbolic level to improve vowel recognition. Feedforward and recurrent neural networks are trained to estimate the acoustic characteristics of the vocal tract from images of the speaker’s mouth. These estimates are then combined with the noise-degraded acoustic information, effectively increasing the signal-to-noise ratio and improving the recognition of these noise-degraded signals. Alternative symbolic strategies, such as direct categorization of the visual signals into vowels, are also presented. The performances of these neural networks compared favorably with human performance and with other pattern-matching and estimation techniques.
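One simple way to combine two noisy estimates of the same vocal-tract quantity is inverse-variance weighting, so the cleaner channel dominates. This is an illustrative integration rule for the idea described above, not the trained networks the paper actually uses:

```python
def fuse_estimates(acoustic, visual, var_a, var_v):
    # Weight each source by the reciprocal of its noise variance;
    # as one channel gets noisier, the fused estimate leans on the
    # other (illustrative fusion rule, not the paper's method).
    w_a, w_v = 1.0 / var_a, 1.0 / var_v
    return (w_a * acoustic + w_v * visual) / (w_a + w_v)
```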