A Practical Bayesian Framework for Backprop Networks
- Neural Computation, 1991
Cited by 347 (19 self)
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible: (1) objective comparisons between solutions using alternative network architectures
Bounds for the Computational Power and Learning Complexity of Analog Neural Nets
- Proc. of the 25th ACM Symp. Theory of Computing, 1993
Cited by 59 (12 self)
It is shown that high order feedforward neural nets of constant depth with piecewise polynomial activation functions and arbitrary real weights can be simulated for boolean inputs and outputs by neural nets of a somewhat larger size and depth with Heaviside gates and weights from {-1, 0, 1}. This provides the first known upper bound for the computational power of the former type of neural nets. It is also shown that in the case of first order nets with piecewise linear activation functions one can replace arbitrary real weights by rational numbers with polynomially many bits, without changing the boolean function that is computed by the neural net. In order to prove these results we introduce two new methods for reducing nonlinear problems about weights in multi-layer neural nets to linear problems for a transformed set of parameters. These transformed parameters can be interpreted as weights in a somewhat larger neural net. As another application of our new proof technique we s...
Evolutionary Algorithms for Neural Network Design and Training
- In Proceedings of the First Nordic Workshop on Genetic Algorithms and its Applications, 1995
Cited by 41 (1 self)
Neural networks and genetic algorithms are two relatively young research areas that were subject to a steadily growing interest during the past years. Both models are inspired by nature, but whereas neural networks are concerned with learning of an individual (phenotypic learning), evolutionary algorithms deal with a population's adaptation to a changing environment (genotypic learning). This paper focuses on the intersection of neural networks and evolutionary computation, namely on how evolutionary algorithms can be used to assist neural network design and training. The purpose of the paper is to set forth the general considerations that have to be made when designing an algorithm in this area and to give an overview on how researchers addressed these issues in the past.
Evolving Optimal Neural Networks Using Genetic Algorithms with Occam's Razor
- Complex Systems, 1993
Cited by 34 (6 self)
Genetic algorithms have been used for neural networks in two main ways: to optimize the network architecture and to train the weights of a fixed architecture. While most previous work focuses on only one of these two options, this paper investigates an alternative evolutionary approach called Breeder Genetic Programming (BGP) in which the architecture and the weights are optimized simultaneously. The genotype of each network is represented as a tree whose depth and width are dynamically adapted to the particular application by specifically defined genetic operators. The weights are trained by a next-ascent hillclimbing search. A new fitness function is proposed that quantifies the principle of Occam's razor. It makes an optimal trade-off between the error fitting ability and the parsimony of the network. Simulation results on two benchmark problems of differing complexity suggest that the method finds minimal size networks on clean data. The experiments on noisy data show...
Accelerated Learning By Active Example Selection
- International Journal of Neural Systems, 1994
Cited by 31 (10 self)
Much previous work on training multilayer neural networks has attempted to speed up the back-propagation algorithm using more sophisticated weight modification rules, whereby all the given training examples are used in a random or predetermined sequence. In this paper we investigate an alternative approach in which the learning proceeds on an increasing number of selected training examples, starting with a small training set. We derive a measure of criticality of examples and present an incremental learning algorithm that uses this measure to select a critical subset of given examples for solving the particular task. Our experimental results suggest that the method can significantly improve training speed and generalization performance in many real applications of neural networks. This method can be used in conjunction with other variations of gradient descent algorithms.
1 Introduction
One of the most widely used methods for training multilayer feedforward neural networks is the erro...
Genetic Programming of Minimal Neural Nets Using Occam's Razor
- Proceedings of the 5th International Conference on Genetic Algorithms (ICGA'93), 1993
Cited by 24 (6 self)
A genetic programming method is investigated for optimizing both the architecture and the connection weights of multilayer feedforward neural networks. The genotype of each network is represented as a tree whose depth and width are dynamically adapted to the particular application by specifically defined genetic operators. The weights are trained by a next-ascent hillclimbing search. A new fitness function is proposed that quantifies the principle of Occam's razor. It makes an optimal trade-off between the error fitting ability and the parsimony of the network. We discuss the results for two problems of differing complexity and study the convergence and scaling properties of the algorithm.
1 INTRODUCTION
Optimization of neural network architectures for particular applications is important because the speed and accuracy of learning and performance are dependent on the network complexity, i.e. the type and number of units and connections, and the connectivity of units. Fo...
Neural networks, nativism, and the plausibility of constructivism
- Cognition, 1993
Cited by 20 (0 self)
Recent interest in PDP (parallel distributed processing) models is due in part to the widely held belief that they challenge many of the assumptions of classical cognitive science. In the domain of language acquisition, for example, there has been much interest in the claim that PDP models might undermine nativism. Related arguments based on PDP learning have also been given against Fodor’s anti-constructivist position, a position that has contributed to the widespread dismissal of constructivism. A limitation of many of the claims regarding PDP learning, however, is that the principles underlying this learning have not been rigorously characterized. In this paper, I examine PDP models from within the framework of Valiant’s PAC (probably approximately correct) model of learning, now the dominant model in machine learning, and which applies naturally to neural network learning. From this perspective, I evaluate the implications of PDP models for nativism and Fodor’s influential anti-constructivist position. In particular, I demonstrate that, contrary to a number of claims, PDP models are nativist in a robust sense. I also demonstrate that PDP models actually serve as a good illustration of Fodor’s anti-constructivist position. While these results may at first suggest that neural network models in general are incapable of the sort of concept acquisition that is required to refute Fodor’s anti-constructivist position, I suggest
What size neural network gives optimal generalization? Convergence properties of backpropagation
- 1996
Cited by 17 (2 self)
One of the most important aspects of any machine learning paradigm is how it scales according to problem size and complexity. Using a task with known optimal training error, and a pre-specified maximum number of training updates, we investigate the convergence of the backpropagation algorithm with respect to a) the complexity of the required function approximation, b) the size of the network in relation to the size required for an optimal solution, and c) the degree of noise in the training data. In general, for a) the solution found is worse when the function to be approximated is more complex, for b) oversized networks can result in lower training and generalization error in certain cases, and for c) the use of committee or ensemble techniques can be more beneficial as the level of noise in the training data is increased. For the experiments we performed, we do not obtain the optimal solution in any case. We further support the observation that larger networks can produce better training and generalization error using a face recognition example where a network with many more parameters than training points generalizes better than smaller networks.
Incremental Learning using Sensitivity Analysis
- 1999
Cited by 11 (7 self)
A new incremental learning algorithm for function approximation problems is presented where the neural network learner dynamically selects during training the most informative patterns from a candidate training set. The incremental learning algorithm uses its current knowledge about the function to be approximated, in the form of output sensitivity information, to incrementally grow the training set with patterns that have the highest influence on the learning objective.
Linear and order statistics combiners for reliable pattern classification
- 1996
Cited by 9 (1 self)
Table of Contents
List of Figures
List of Tables
List of Symbols
List of Acronyms
Chapter 1. Introduction
Chapter 2. Background and Related Research
2.1 Introduction
2.2 Generalization
2.3 Statistical Background
2.4 Regularization
2.5 Motivation for Combining
2.6 Historical sketch
2.6.1 Survey of Recent Literature
2.6.2 Belief and Evidence Combining
2.6.3 Economic Forecasting
2.6.4 Stacked Generalization
2.6.5 Ensemble Methods
...

