Results 1-10 of 36
Optimal Brain Damage
Advances in Neural Information Processing Systems, 1990
Cited by 420 (5 self)
We have used information-theoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved speed of learning and/or classification. The basic idea is to use second-derivative information to make a tradeoff between network complexity and training set error. Experiments confirm the usefulness of the methods on a real-world application.
1 INTRODUCTION
Most successful applications of neural network learning to real-world problems have been achieved using highly structured networks of rather large size [for example (Waibel, 1989; LeCun et al., 1990)]. As applications become more complex, the networks will presumably become even larger and more structured. Design tools and techniques for comparing different architectures and minimizing the network size will be needed. More impor...
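As a rough illustration of how second-derivative information can rank weights for removal, the sketch below scores each weight with the diagonal second-order saliency s_k = h_kk * w_k^2 / 2 that is usually associated with this method. The abstract itself does not state the formula, so treat it as an assumption; the function names and the toy weights and Hessian diagonal are hypothetical.

```python
def saliencies(weights, hessian_diag):
    """Assumed saliency of each weight: s_k = 0.5 * h_kk * w_k**2."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_lowest(weights, hessian_diag, n_prune):
    """Return a copy of weights with the n_prune lowest-saliency entries set to 0."""
    s = saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=lambda k: s[k])
    pruned = list(weights)
    for k in order[:n_prune]:
        pruned[k] = 0.0
    return pruned


# Toy values (made up): the largest weight sits in a very flat direction
# of the error surface, so its saliency is the smallest of the four.
weights = [2.0, -0.1, 0.5, 1.5]
hessian_diag = [0.01, 8.0, 1.0, 2.0]
pruned = prune_lowest(weights, hessian_diag, n_prune=2)
```

With these toy values the two weights removed are the first and second, whereas pruning by magnitude alone would remove the second and third. In practice the pruned network would be retrained before pruning again.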
Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems
IEEE Transactions on Neural Networks, 1997
Cited by 66 (2 self)
In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole problem as a state space search, we first describe the general issues in constructive algorithms, with special emphasis on the search strategy. A taxonomy, based on the differences in the state transition mapping, the training algorithm and the network architecture, is then presented.
Keywords: Constructive algorithm, structure learning, state space search, dynamic node creation, projection pursuit regression, cascade-correlation, resource-allocating network, group method of data handling.
I. Introduction
A. Problems with Fixed Size Networks
In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among...
Speed Up Learning and Network Optimization With Extended Back Propagation
1992
Cited by 38 (0 self)
Methods to speed up learning in back propagation and to optimize the network architecture have been recently studied. This paper shows how adaptation of the steepness of the sigmoids during learning treats these two topics in a common framework. The adaptation of the steepness of the sigmoids is obtained by gradient descent. The resulting learning dynamics can be simulated by a standard network with fixed sigmoids and a learning rule whose main component is a gradient descent with adaptive learning parameters. A law linking variation on the weights to variation on the steepness of the sigmoids is discovered. Optimization of units is obtained by introducing a tendency to decay to zero in the steepness values. This decay corresponds to a decay of the sensitivity of the units. Units with low final sensitivity can be removed after a given transformation of the biases of the network. A decreasing initial distribution of the steepness values is suggested to obtain a good compromise between s...
Structural adaptation and generalization in supervised feedforward networks
Artif. Neural Networks, 1994
Cited by 31 (22 self)
This work explores diverse techniques for improving the generalization ability of supervised feedforward neural networks via structural adaptation, and introduces a new network structure with sparse connectivity. Pruning methods, which start from a large network and trim it until a satisfactory solution is reached, are studied first. Then, construction methods, which build a network from a simple initial configuration, are presented. A survey of related results from the disciplines of function approximation theory, nonparametric statistical inference and estimation theory leads to methods for principled architecture selection and estimation of prediction error. A network based on sparse connectivity is proposed as an alternative approach to adaptive networks. The generalization ability of this network is improved by partly decoupling the outputs. We perform numerical simulations and provide comparative results for both classification and regression problems to show the generalization abilities of the sparse network.
An iterative pruning algorithm for feedforward neural networks
IEEE Trans. Neural Networks, 1997
Cited by 28 (0 self)
The problem of determining the proper size of an artificial neural network is recognized to be crucial, especially for its practical implications in such important issues as learning and generalization. One popular approach tackling this problem is commonly known as pruning and consists of training a larger-than-necessary network and then removing unnecessary weights/nodes. In this paper, a new pruning method is developed, based on the idea of iteratively eliminating units and adjusting the remaining weights in such a way that the network performance does not worsen over the entire training set. The pruning problem is formulated in terms of solving a system of linear equations, and a very efficient conjugate gradient algorithm is used for solving it, in the least-squares sense. The algorithm also provides a simple criterion for choosing the units to be removed, which has proved to work well in practice. The results obtained over various test problems demonstrate the effectiveness of the proposed approach.
Index Terms: Feedforward neural networks, generalization, hidden neurons, iterative methods, least-squares methods, network pruning, pattern recognition, structure simplification.
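One way to read the idea of removing a unit and adjusting the remaining weights through a least-squares linear system is sketched below for a single linear output unit. It is an illustration under stated assumptions, not the paper's algorithm: the matrix `Y` of hidden activations (one row per training pattern), the function names, and the toy data are all hypothetical, and the unit-selection criterion mentioned in the abstract is not reproduced. The solver is a plain conjugate gradient iteration on the normal equations.

```python
def cgls(A, b, iters=50, tol=1e-12):
    """Least-squares solution of A x ~ b by conjugate gradient on A^T A x = A^T b."""
    m, n = len(A), len(A[0])

    def matvec(x):
        return [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]

    def rmatvec(r):
        return [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]

    x = [0.0] * n
    r = list(b)                      # residual b - A x (x starts at zero)
    s = rmatvec(r)                   # gradient direction A^T r
    p = list(s)
    gamma = sum(v * v for v in s)
    for _ in range(iters):
        if gamma < tol:
            break
        q = matvec(p)
        alpha = gamma / sum(v * v for v in q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(r)
        gamma_new = sum(v * v for v in s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x


def prune_unit(Y, w, h):
    """Drop hidden unit h and adjust the other output weights so that the
    net input to the output unit changes as little as possible over all rows of Y."""
    keep = [j for j in range(len(w)) if j != h]
    A = [[row[j] for j in keep] for row in Y]
    b = [w[h] * row[h] for row in Y]     # contribution that is about to be lost
    delta = cgls(A, b)
    return [w[j] + d for j, d in zip(keep, delta)]


# Toy data (made up): the third unit's activation equals unit0 + 2 * unit1,
# so it can be removed without changing the output at all.
Y = [[1.0, 0.0, 1.0], [0.0, 1.0, 2.0], [1.0, 1.0, 3.0], [2.0, 1.0, 4.0]]
w = [0.5, -1.0, 2.0]
new_w = prune_unit(Y, w, h=2)
```

When the removed unit is not an exact linear combination of the others, the same call returns the weights that minimize the squared change in net input over the training patterns.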
Benefits of Gain: Speeded learning and minimal hidden layers in backpropagation networks.
1991
Cited by 24 (1 self)
The gain of a node in a connectionist network is a multiplicative constant that amplifies or attenuates the net input to the node. The objective of this article is to explore the benefits of adaptive gains in back propagation networks. First we show that gradient descent with respect to gain greatly increases learning speed by amplifying those directions in weight space that are successfully chosen by gradient descent on weights. Adaptive gains also allow normalization of weight vectors without loss of computational capacity, and we suggest a simple modification of the learning rule that automatically achieves weight normalization. Finally, we describe a method for creating small hidden layers by making hidden node gains compete according to similarities between nodes, with the goal of improved generalization performance. Simulations show that this competition method is more effective than the special case of gain decay.
* In press: IEEE Transactions on Systems, Man and Cybernetics. S...
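To make the notion of gain concrete, the following minimal sketch applies gradient descent to the gain of a single sigmoid node while its net input is held fixed. The logistic activation, the squared-error loss, the learning rate, and all names are assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def output(net, gain):
    """Sigmoid node whose net input is scaled by a multiplicative gain."""
    return 1.0 / (1.0 + math.exp(-gain * net))


def loss(net, gain, target):
    """Assumed squared-error loss for one pattern."""
    return 0.5 * (output(net, gain) - target) ** 2


def gain_gradient(net, gain, target):
    """Derivative of the loss with respect to the gain (chain rule)."""
    y = output(net, gain)
    return (y - target) * y * (1.0 - y) * net


def train_gain(net, target, gain=1.0, lr=0.5, steps=200):
    """Plain gradient descent on the gain only; the net input stays fixed."""
    for _ in range(steps):
        gain -= lr * gain_gradient(net, gain, target)
    return gain


# Hypothetical example: the node output is too low, so the gain grows.
learned = train_gain(net=1.0, target=0.9)
```

In a full network the same derivative would be accumulated alongside the weight gradients during back propagation.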
Constructive Feedforward Neural Networks for Regression Problems: A Survey
1995
Cited by 21 (0 self)
In this paper, we review the procedures for constructing feedforward neural networks in regression problems. While standard backpropagation performs gradient descent only in the weight space of a network with fixed topology, constructive procedures start with a small network and then grow additional hidden units and weights until a satisfactory solution is found. The constructive procedures are categorized according to the resultant network architecture and the learning algorithm for the network weights.
The Hong Kong University of Science & Technology Technical Report Series, Department of Computer Science
1 Introduction
In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among them, the class of multilayer feedforward networks is perhaps the most popular. Standard backpropagation performs gradient descent only in the weight space of a network with fixed topology; this approach is analogous to ...
GAL: Networks that grow when they learn and shrink when they forget
International Journal of Pattern Recognition and Artificial Intelligence, 1991
Cited by 21 (4 self)
Learning when limited to modification of some parameters has a limited scope; the capability to modify the system structure is also needed to get a wider range of the learnable. In the case of artificial neural networks, learning by iterative adjustment of synaptic weights can only succeed if the network designer predefines an appropriate network structure, i.e., number of hidden layers, units, and the size and shape of their receptive and projective fields. This paper advocates the view that the network structure should not, as usually done, be determined by trial-and-error but should be computed by the learning algorithm. Incremental learning algorithms can modify the network structure by addition and/or removal of units and/or links. A survey of current connectionist literature is given on this line of thought. "Grow and Learn" (GAL) is a new algorithm that learns an association at one-shot due to being incremental and using a local representation. During the so-called...
An Anytime Approach To Connectionist Theory Refinement: Refining The Topologies Of Knowledge-Based Neural Networks
1995
Cited by 19 (3 self)
Many scientific and industrial problems can be better understood by learning from samples of the task at hand. For this reason, the machine learning and statistics communities devote considerable research effort to generating inductive-learning algorithms that try to learn the true "concept" of a task from a set of its examples. Oftentimes, however, one has additional resources readily available, but largely unused, that can improve the concept that these learning algorithms generate. These resources include available computer cycles, as well as prior knowledge describing what is currently known about the domain. Effective utilization of available computer time is important since for most domains an expert is willing to wait for weeks, or even months, if a learning system can produce an improved concept. Using prior knowledge is important since it can contain information not present in the current set of training examples. In this thesis, I present three "anytime" approaches to connec...
Automated Learning of Load-Balancing Strategies For A Distributed Computer System
1992
Cited by 17 (4 self)
(or derived) decision metrics are exemplified by MinLoad, which denotes the least among all the Load values.

SENDER-SIDE RULES (s)
  Possible-destinations = { site: Load(site) - Reference(s) < d(s) }
  Destination = Random(Possible-destinations)
  IF Load(s) - Reference(s) > q1(s) THEN Send
RECEIVER-SIDE RULES (r)
  IF Load(r) < q2(r) THEN Receive
Figure 3. The load-balancing policy considered in this thesis

The sender-side rules are applied by the load-balancing software at the site of arrival (s) of a task. Reference can be either 0 or MinLoad; the other parameters (d, q1, and q2) take nonnegative floating-point values. A remote destination (r) is chosen randomly from Destinations, a set of sites whose load index falls within a small neighborhood of Reference. If Destinations is the empty set, or if the rule for sending fails, then the task is executed locally at s, its site of arrival; ot...
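The sender-side and receiver-side rules of Figure 3 can be sketched as two small functions. This is an illustration under assumptions, not the thesis's implementation: the function names, the dictionary of per-site loads, and the use of one shared value of d, q1, and q2 for every site (rather than per-site values) are hypothetical, and the arrival site is excluded from the candidates because the text speaks of a remote destination. What happens after a task is sent is cut off in the source and is not modeled here.

```python
import random


def sender_decision(s, loads, d, q1, use_min_load=True, rng=random):
    """Apply the sender-side rules at arrival site s.

    Returns the chosen remote site, or s itself when the task runs locally.
    """
    reference = min(loads.values()) if use_min_load else 0.0
    # Rule for sending: only offload when s is loaded well above Reference.
    if not (loads[s] - reference > q1):
        return s
    # Candidate destinations: sites whose load lies close to Reference.
    candidates = [site for site, load in loads.items()
                  if site != s and load - reference < d]
    if not candidates:
        return s
    return rng.choice(candidates)


def receiver_accepts(r, loads, q2):
    """Receiver-side rule: accept only while the load at r is below q2."""
    return loads[r] < q2


# Hypothetical load indices for four sites.
loads = {"a": 5.0, "b": 1.0, "c": 1.2, "e": 4.0}
destination = sender_decision("a", loads, d=0.5, q1=2.0,
                              rng=random.Random(0))
```

With these toy loads, site "a" offloads to either "b" or "c", while a task arriving at "b" stays local because its load does not exceed Reference by more than q1.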