Optimal Brain Damage - Advances in Neural Information Processing Systems, 1990
Cited by 375 (5 self)
We have used information-theoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved speed of learning and/or classification. The basic idea is to use second-derivative information to make a tradeoff between network complexity and training set error. Experiments confirm the usefulness of the methods on a real-world application.
1 INTRODUCTION
Most successful applications of neural network learning to real-world problems have been achieved using highly structured networks of rather large size [for example (Waibel, 1989; LeCun et al., 1990)]. As applications become more complex, the networks will presumably become even larger and more structured. Design tools and techniques for comparing different architectures and minimizing the network size will be needed. More impor...
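The "second-derivative information" mentioned in the abstract above can be illustrated with a small sketch. It assumes a diagonal approximation of the Hessian, under which deleting weight w_k raises the training error by roughly h_kk * w_k^2 / 2; the excerpt itself does not state this formula, so treat it as an assumption. The function names and the numbers are hypothetical.

```python
def saliencies(weights, hessian_diag):
    """Estimated error increase from zeroing each weight: 0.5 * h_kk * w_k**2."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_lowest(weights, hessian_diag, n_prune):
    """Return a copy of weights with the n_prune lowest-saliency entries set to 0."""
    s = saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=lambda k: s[k])
    pruned = list(weights)
    for k in order[:n_prune]:
        pruned[k] = 0.0
    return pruned


weights = [0.1, -2.0, 0.5, 1.5]
hessian_diag = [4.0, 0.05, 1.0, 2.0]   # illustrative second derivatives (assumed)
pruned = prune_lowest(weights, hessian_diag, 2)
```

In this toy case the large weight -2.0 is removed because the error surface is nearly flat along it, whereas pruning by magnitude alone would have removed 0.1 and 0.5.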
Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems - IEEE Transactions on Neural Networks, 1997
Cited by 47 (2 self)
In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole problem as a state space search, we first describe the general issues in constructive algorithms, with special emphasis on the search strategy. A taxonomy, based on the differences in the state transition mapping, the training algorithm and the network architecture, is then presented.
Keywords: Constructive algorithm, structure learning, state space search, dynamic node creation, projection pursuit regression, cascade-correlation, resource-allocating network, group method of data handling.
I. Introduction
A. Problems with Fixed Size Networks
In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among...
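The basic constructive loop described above (start small, add hidden units until the solution is satisfactory) can be sketched as follows. This is only the simplest single-path search, not any specific algorithm from the survey; `train_and_score` is a hypothetical caller-supplied function, and `toy_error` stands in for real training.

```python
def grow_network(train_and_score, tolerance, max_hidden):
    """Add hidden units one at a time until the error is acceptable.

    train_and_score(n_hidden) is a hypothetical callback that trains a
    network with n_hidden hidden units and returns its error.
    """
    history = []
    for n_hidden in range(1, max_hidden + 1):
        error = train_and_score(n_hidden)
        history.append((n_hidden, error))
        if error <= tolerance:
            break   # satisfactory solution found; stop growing
    return history


def toy_error(n_hidden):
    """Stand-in for real training: error shrinks as units are added."""
    return 1.0 / n_hidden


history = grow_network(toy_error, tolerance=0.3, max_hidden=10)
```

The last entry of `history` gives the size at which growth stopped; if `max_hidden` is reached first, the caller can inspect the final error and decide what to do.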
Speed Up Learning and Network Optimization With Extended Back Propagation, 1992
Cited by 31 (0 self)
Methods to speed up learning in back propagation and to optimize the network architecture have been recently studied. This paper shows how adaptation of the steepness of the sigmoids during learning treats these two topics in a common framework. The adaptation of the steepness of the sigmoids is obtained by gradient descent. The resulting learning dynamics can be simulated by a standard network with fixed sigmoids and a learning rule whose main component is a gradient descent with adaptive learning parameters. A law linking variation on the weights to variation on the steepness of the sigmoids is discovered. Optimization of units is obtained by introducing a tendency to decay to zero in the steepness values. This decay corresponds to a decay of the sensitivity of the units. Units with low final sensitivity can be removed after a given transformation of the biases of the network. A decreasing initial distribution of the steepness values is suggested to obtain a good compromise between s...
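One plausible reading of the unit-removal step above: with a logistic sigmoid, a unit whose steepness has decayed to zero outputs the constant 0.5 for every input, so its contribution can be folded into the bias of the next layer. The sketch below assumes a logistic sigmoid, one input, and one linear output; the paper's own bias transformation is not given in this excerpt, and all names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def output(x, hidden, out_bias):
    """hidden: list of (w_in, bias, steepness, w_out) for one-input hidden units."""
    total = out_bias
    for w_in, bias, steepness, w_out in hidden:
        total += w_out * sigmoid(steepness * (w_in * x + bias))
    return total


def remove_insensitive(hidden, out_bias, threshold=1e-6):
    """Drop units with near-zero steepness and fold their constant output into the bias."""
    kept = []
    for unit in hidden:
        w_in, bias, steepness, w_out = unit
        if abs(steepness) < threshold:
            out_bias += w_out * sigmoid(0.0)   # sigmoid(0) = 0.5 for any input
        else:
            kept.append(unit)
    return kept, out_bias


hidden = [(1.0, 0.2, 1.5, 0.7), (2.0, -1.0, 0.0, 3.0)]
kept, new_bias = remove_insensitive(hidden, 0.1)
```

The pruned network computes the same function as the original as long as the removed steepness values are exactly zero; for small nonzero values the match is only approximate.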
An iterative pruning algorithm for feedforward neural networks - IEEE Trans. Neural Networks, 1997
Cited by 23 (0 self)
The problem of determining the proper size of an artificial neural network is recognized to be crucial, especially for its practical implications in such important issues as learning and generalization. One popular approach tackling this problem is commonly known as pruning and consists of training a larger than necessary network and then removing unnecessary weights/nodes. In this paper, a new pruning method is developed, based on the idea of iteratively eliminating units and adjusting the remaining weights in such a way that the network performance does not worsen over the entire training set. The pruning problem is formulated in terms of solving a system of linear equations, and a very efficient conjugate gradient algorithm is used for solving it, in the least-squares sense. The algorithm also provides a simple criterion for choosing the units to be removed, which has proved to work well in practice. The results obtained over various test problems demonstrate the effectiveness of the proposed approach.
Index Terms: Feedforward neural networks, generalization, hidden neurons, iterative methods, least-squares methods, network pruning, pattern recognition, structure simplification.
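The least-squares formulation above can be illustrated with a minimal sketch. It assumes a single receiving unit: when a hidden unit with outgoing weight `w_removed` is deleted, the remaining outgoing weights are adjusted by `delta` so that the receiving unit's net input changes as little as possible over the training patterns. The solver is a textbook conjugate gradient on the normal equations (CGLS), used here as a stand-in; the paper's exact solver and unit-selection criterion are not shown in this excerpt, and all data are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def rmatvec(A, y):
    """Compute A^T y for a row-major list-of-lists A."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def cgls(A, b, max_iter=100, tol=1e-20):
    """Conjugate gradient on the normal equations: minimize ||A x - b||^2."""
    x = [0.0] * len(A[0])
    r = list(b)                 # residual b - A x
    s = rmatvec(A, r)           # A^T r
    p = list(s)                 # search direction
    gamma = dot(s, s)
    for _ in range(max_iter):
        if gamma <= tol:
            break
        q = matvec(A, p)
        alpha = gamma / dot(q, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(A, r)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x


# Rows are training patterns, columns are outputs of the two units that remain.
remaining = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
removed = [1.0, 1.0, 2.0]       # outputs of the unit being removed
w_removed = 2.0
target = [w_removed * y for y in removed]
delta = cgls(remaining, target)  # adjustments to the remaining outgoing weights
```

Here the removed unit's output is an exact linear combination of the others, so the adjustment compensates for it perfectly; in general only a least-squares fit is possible.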
Constructive Feedforward Neural Networks for Regression Problems: A Survey, 1995
Cited by 21 (0 self)
In this paper, we review the procedures for constructing feedforward neural networks in regression problems. While standard back-propagation performs gradient descent only in the weight space of a network with fixed topology, constructive procedures start with a small network and then grow additional hidden units and weights until a satisfactory solution is found. The constructive procedures are categorized according to the resultant network architecture and the learning algorithm for the network weights.
The Hong Kong University of Science & Technology, Technical Report Series, Department of Computer Science
1 Introduction
In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among them, the class of multi-layer feedforward networks is perhaps the most popular. Standard back-propagation performs gradient descent only in the weight space of a network with fixed topology; this approach is analogous to ...
GAL: Networks that grow when they learn and shrink when they forget - International Journal of Pattern Recognition and Artificial Intelligence, 1991
Cited by 20 (4 self)
Learning that is limited to modifying some parameters has a limited scope; the capability to modify the system structure is also needed to widen the range of what can be learned. In the case of artificial neural networks, learning by iterative adjustment of synaptic weights can only succeed if the network designer predefines an appropriate network structure, i.e., the number of hidden layers and units, and the size and shape of their receptive and projective fields. This paper advocates the view that the network structure should not, as is usually done, be determined by trial and error but should be computed by the learning algorithm. Incremental learning algorithms can modify the network structure by addition and/or removal of units and/or links. A survey of current connectionist literature is given on this line of thought. "Grow and Learn" (GAL) is a new algorithm that learns an association in one shot because it is incremental and uses a local representation. During the so-called...
Benefits of Gain: Speeded learning and minimal hidden layers in back-propagation networks, 1991
Cited by 19 (0 self)
The gain of a node in a connectionist network is a multiplicative constant that amplifies or attenuates the net input to the node. The objective of this article is to explore the benefits of adaptive gains in back propagation networks. First we show that gradient descent with respect to gain greatly increases learning speed by amplifying those directions in weight space that are successfully chosen by gradient descent on weights. Adaptive gains also allow normalization of weight vectors without loss of computational capacity, and we suggest a simple modification of the learning rule that automatically achieves weight normalization. Finally, we describe a method for creating small hidden layers by making hidden node gains compete according to similarities between nodes, with the goal of improved generalization performance. Simulations show that this competition method is more effective than the special case of gain decay.
* In press: IEEE Transactions on Systems, Man and Cybernetics. S...
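The claim that gains allow weight normalization without loss of computational capacity can be checked with a short sketch: because the gain multiplies the net input, the length of the weight vector can be moved into the gain without changing the node's output. This assumes a logistic activation and illustrates only the equivalence, not the paper's modified learning rule; the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def node_output(weights, inputs, gain):
    """Node with a multiplicative gain applied to its net input."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(gain * net)


def normalize_into_gain(weights, gain):
    """Rescale weights to unit length and move the scale into the gain."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights], gain * norm


weights, gain = [3.0, -4.0], 0.5
unit_weights, new_gain = normalize_into_gain(weights, gain)
```

With these values the weight vector has length 5, so the normalized node uses weights of unit length and a gain of 2.5, and both nodes respond identically to any input.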
An Anytime Approach To Connectionist Theory Refinement: Refining The Topologies Of Knowledge-Based Neural Networks, 1995
Cited by 18 (3 self)
Many scientific and industrial problems can be better understood by learning from samples of the task at hand. For this reason, the machine learning and statistics communities devote considerable research effort to generating inductive-learning algorithms that try to learn the true "concept" of a task from a set of its examples. Oftentimes, however, one has additional resources readily available, but largely unused, that can improve the concept that these learning algorithms generate. These resources include available computer cycles, as well as prior knowledge describing what is currently known about the domain. Effective utilization of available computer time is important since for most domains an expert is willing to wait for weeks, or even months, if a learning system can produce an improved concept. Using prior knowledge is important since it can contain information not present in the current set of training examples. In this thesis, I present three "anytime" approaches to connec...
Automated Learning of Load-Balancing Strategies For A Distributed Computer System, 1992
Cited by 17 (4 self)
(or derived) decision metrics are exemplified by MinLoad, which denotes the least among all the Load values.

SENDER-SIDE RULES (s)
    Possible-destinations = { site: Load(site) - Reference(s) < d(s) }
    Destination = Random(Possible-destinations)
    IF Load(s) - Reference(s) > q1(s) THEN Send
RECEIVER-SIDE RULES (r)
    IF Load(r) < q2(r) THEN Receive

Figure 3. The load-balancing policy considered in this thesis

The sender-side rules are applied by the load-balancing software at the site of arrival (s) of a task. Reference can be either 0 or MinLoad; the other parameters (d, q1, and q2) take non-negative floating-point values. A remote destination (r) is chosen randomly from Destinations, a set of sites whose load index falls within a small neighborhood of Reference. If Destinations is the empty set, or if the rule for sending fails, then the task is executed locally at s, its site of arrival; ot...
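The policy in Figure 3 can be sketched in a few lines. Several points are assumptions, since the excerpt is truncated: the parameters d, q1, and q2 are treated as scalars rather than per-site values, the arrival site is excluded from the candidate set because the text speaks of a remote destination, and a task is run locally when the receiver declines. All function names are hypothetical.

```python
import random


def choose_destination(s, load, reference, d, rng=random):
    """Sender side: pick a random remote site whose load is within d of the reference."""
    candidates = [site for site in load
                  if site != s and load[site] - reference < d]
    return rng.choice(candidates) if candidates else None


def should_send(s, load, reference, q1):
    """Sender side: send only if the local load exceeds the reference by more than q1."""
    return load[s] - reference > q1


def should_receive(r, load, q2):
    """Receiver side: accept only if the receiver's load is below q2."""
    return load[r] < q2


def place_task(s, load, d, q1, q2, use_min_load=True, rng=random):
    """Return the site where a task arriving at s should run."""
    reference = min(load.values()) if use_min_load else 0.0   # MinLoad or 0
    r = choose_destination(s, load, reference, d, rng)
    if (r is not None and should_send(s, load, reference, q1)
            and should_receive(r, load, q2)):
        return r
    return s   # empty candidate set, failed send rule, or (assumed) refused receive


load = {"a": 5.0, "b": 1.0, "c": 4.0}
placed = place_task("a", load, d=0.5, q1=2.0, q2=3.0)
```

With these illustrative loads, a task arriving at the heavily loaded site "a" is moved to "b", the only site within 0.5 of MinLoad.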
A Penalty-Function Approach for Pruning Feedforward Neural Networks - Neural Computation, 1994
Cited by 13 (6 self)
This paper proposes the use of a penalty function for pruning feedforward neural networks by weight elimination. The penalty function proposed consists of two terms; the first term is to discourage the use of unnecessary connections and the second term is to prevent the weights of the connections from taking excessively large values. Simple criteria for eliminating weights from the network are also given. The effectiveness of this penalty function is tested on three well known problems. These test problems are the contiguity problem, the parity problems, and the monks problems. The resulting pruned networks obtained for many of these problems have fewer connections than previously reported in the literature.
1 Introduction
We are concerned in this paper with finding a minimal feedforward backpropagation neural network for solving the problem of distinguishing patterns from two or more sets in n-dimensional space. Backpropagation feedforward neural networks have been gaining ac...
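A two-term penalty of the kind described above might look like the sketch below. The specific form (a saturating term that pushes small weights toward zero plus a quadratic term that limits large weights), the constants `eps1`, `eps2`, and `beta`, and the magnitude threshold used for elimination are all assumptions for illustration; the excerpt gives neither the paper's formula nor its elimination criteria.

```python
def penalty(weights, eps1=0.1, eps2=1e-4, beta=10.0):
    """Two-term penalty: a saturating term plus plain quadratic weight decay."""
    # First term: close to 0 for tiny weights and close to 1 for large ones,
    # so it effectively counts (and discourages) connections that are in use.
    in_use = sum(beta * w * w / (1.0 + beta * w * w) for w in weights)
    # Second term: grows without bound, keeping weights from becoming too large.
    magnitude = sum(w * w for w in weights)
    return eps1 * in_use + eps2 * magnitude


def eliminate(weights, threshold=0.1):
    """Stand-in elimination rule: zero out weights below a magnitude threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


trained = [0.05, -0.3, 2.0]
pruned = eliminate(trained)
```

During training the penalty would be added to the error function; after training, the surviving small weights are removed and the network is typically retrained.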

