Improving the Convergence of the Backpropagation Algorithm Using Learning Rate Adaptation Methods
, 1999
Cited by 19 (13 self)
This article focuses on gradient-based backpropagation algorithms that use either a common adaptive learning rate for all weights or an individual adaptive learning rate for each weight and apply the Goldstein/Armijo line search. The learning-rate adaptation is based on descent techniques and estimates of the local Lipschitz constant that are obtained without additional error function and gradient evaluations. The proposed algorithms improve the backpropagation training in terms of both convergence rate and convergence characteristics, such as stable learning and robustness to oscillations. Simulations are conducted to compare and evaluate the convergence behavior of these gradient-based training algorithms with several popular training methods.
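The idea in this abstract, a step size derived from a local Lipschitz estimate and safeguarded by an Armijo-type test, can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the function name, the `1/(2L)` step rule, and the defaults `sigma` and `eta0` are assumptions made for the example. The Lipschitz estimate reuses the previous weights and gradient, so it costs no extra gradient evaluations.

```python
import math

def armijo_lipschitz_descent(f, grad, w, steps=100, sigma=1e-4, eta0=0.1):
    """Gradient descent with a common adaptive learning rate (illustrative sketch).

    The step is 1 / (2 * L_k), where L_k = ||g_k - g_{k-1}|| / ||w_k - w_{k-1}||
    is a local Lipschitz estimate built from quantities already computed.
    sigma and eta0 are hypothetical defaults, not values from the article.
    """
    g = grad(w)
    w_prev, g_prev = None, None
    for _ in range(steps):
        gnorm2 = sum(gi * gi for gi in g)
        if gnorm2 == 0.0:
            break  # stationary point reached
        if w_prev is None:
            eta = eta0  # no history yet: fall back to the initial rate
        else:
            dw = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_prev)))
            dg = math.sqrt(sum((a - b) ** 2 for a, b in zip(g, g_prev)))
            eta = 0.5 * dw / dg if dg > 0 else eta0
        fw = f(w)
        # Armijo-type sufficient-decrease test: halve the step until it holds.
        while f([wi - eta * gi for wi, gi in zip(w, g)]) > fw - sigma * eta * gnorm2:
            eta *= 0.5
        w_prev, g_prev = w, g
        w = [wi - eta * gi for wi, gi in zip(w, g)]
        g = grad(w)
    return w
```

A per-weight variant would keep one such estimate for each coordinate instead of a single shared one.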
Transforming Neural-Net Output Levels to Probability Distributions
- Advances in Neural Information Processing Systems 3
, 1991
Cited by 17 (0 self)
(1) The outputs of a typical multi-output classification network do not satisfy the axioms of probability; probabilities should be positive and sum to one. This problem can be solved by treating the trained network as a preprocessor that produces a feature vector that can be further processed, for instance by classical statistical estimation techniques. (2) We find that in cases of interest, neural networks are (and should be) somewhat underdetermined because the training data is always limited in quality and quantity. We present a method for computing the first two moments of the probability distribution indicating the range of outputs that are consistent with the input and the training data. It is particularly useful to combine these two ideas: we implement the ideas of section 1 using Parzen windows, where the shape and relative size of each window is computed using the ideas of section 2. This allows us to make contact between important theoretical ideas (e.g. the ensemble form...
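As a rough illustration of point (1), the sketch below treats network outputs as feature vectors and turns them into class probabilities with Gaussian Parzen windows. It is a simplification under stated assumptions: the function name and the single fixed `width` are hypothetical, whereas the abstract describes computing the shape and relative size of each window from the analysis in point (2).

```python
import math

def parzen_posteriors(x, features, labels, width=1.0):
    """Estimate class probabilities for x from labelled feature vectors.

    Each training feature vector contributes one Gaussian window of fixed
    width (an assumption of this sketch). The per-class sums are normalised,
    so the results are positive and sum to one.
    """
    scores = {}
    for f, c in zip(features, labels):
        d2 = sum((a - b) ** 2 for a, b in zip(x, f))
        scores[c] = scores.get(c, 0.0) + math.exp(-d2 / (2 * width ** 2))
    total = sum(scores.values())  # assumed non-zero: x is not far from every window
    return {c: s / total for c, s in scores.items()}
```

For example, `parzen_posteriors((0.0, 0.5), [(0, 0), (0, 1), (5, 5)], ["a", "a", "b"])` assigns nearly all of the probability mass to class `"a"`.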
The Cascade-Correlation Learning Architecture
- Advances in Neural Information Processing Systems 2
, 1990
Cited by 15 (0 self)
Cascade-Correlation is a new architecture and supervised learning algorithm for artificial neural networks. Instead of just adjusting the weights in a network of fixed topology, Cascade-Correlation begins with a minimal network, then automatically trains and adds new hidden units one by one, creating a multi-layer structure. Once a new hidden unit has been added to the network, its input-side weights are frozen. This unit then becomes a permanent feature-detector in the network, available for producing outputs or for creating other, more complex feature detectors. The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no back-propagation of error signals through the connections of the network. This research was sponsored in part by the National Science Foundation under Contract Number EET-87163...
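The abstract does not state how a candidate hidden unit is scored. In the algorithm as it is commonly described, each candidate is trained to maximise the magnitude of the covariance between its output and the network's residual error, summed over output units. A minimal sketch of that score follows; the function and argument names are illustrative, not taken from the paper.

```python
def candidate_score(values, errors):
    """Score a candidate unit by its covariance with the residual errors.

    values: candidate output V_p for each training pattern p.
    errors: for each pattern p, a list of residual errors E_p[o], one per output o.
    Returns S = sum over o of | sum over p of (V_p - mean V) * (E_p[o] - mean E[o]) |.
    """
    n = len(values)
    v_mean = sum(values) / n
    score = 0.0
    for o in range(len(errors[0])):
        e_mean = sum(e[o] for e in errors) / n
        cov = sum((v - v_mean) * (e[o] - e_mean) for v, e in zip(values, errors))
        score += abs(cov)
    return score
```

A candidate whose output tracks the remaining error gets a high score and is the one installed and frozen; a candidate uncorrelated with the error scores zero.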
Sammon’s mapping using neural networks: a comparison
- Pattern Recognition Letters
, 1997
Cited by 15 (1 self)
A well-known procedure for mapping data from a high-dimensional space onto a lower-dimensional one is Sammon’s mapping. This algorithm preserves as well as possible all inter-pattern distances. A major disadvantage of the original algorithm lies in the fact that it is not easy to map hitherto unseen points. To overcome this problem, several methods have been proposed. In this paper, we aim to compare some approaches to implement this mapping on a neural network.
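The quantity that Sammon's mapping minimises, usually called Sammon's stress, measures how well the inter-pattern distances survive the projection. The sketch below computes it for a given pair of configurations; it assumes all high-dimensional points are distinct, and the function name is illustrative.

```python
import math
from itertools import combinations

def sammon_stress(high, low):
    """Sammon's stress between a high-dimensional and a low-dimensional configuration.

    high[i] and low[i] are the coordinates of pattern i in the two spaces.
    Assumes no two high-dimensional points coincide (d_star > 0).
    """
    weighted_error = 0.0
    total = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_star = math.dist(high[i], high[j])  # original distance
        d = math.dist(low[i], low[j])         # distance after mapping
        total += d_star
        weighted_error += (d_star - d) ** 2 / d_star
    return weighted_error / total
```

A stress of zero means every pairwise distance is preserved exactly. Because the stress is defined only over the training patterns, a new point cannot be placed without re-optimising, which is the limitation the neural-network approaches address.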
Online Learning and Stochastic Approximations
, 1998
Cited by 13 (0 self)
The convergence of online learning algorithms is analyzed using the tools of the stochastic approximation theory, and proved under very weak conditions. A general framework for online learning algorithms is first presented. This framework encompasses the most common online learning algorithms in use today, as illustrated by several examples. The stochastic approximation theory then provides general results describing the convergence of all these learning algorithms at once.
1 Introduction
Almost all of the early work on Learning Systems focused on online algorithms (Hebb, 1949) (Rosenblatt, 1957) (Widrow and Hoff, 1960) (Amari, 1967) (Kohonen, 1982). In these early days, the algorithmic simplicity of online algorithms was a requirement. This is still the case when it comes to handling large, real-life training sets (LeCun et al., 1989) (Muller, Gunzinger and Guggenbuhl, 1995). The early Recursive Adaptive Algorithms were introduced during the same years (Robbins and Monro, 1951) and v...
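A very small instance of an online algorithm of this kind is estimating a mean one example at a time. The sketch below applies a stochastic gradient step on the loss (w - x)^2 / 2 with the decreasing step size 1/t, which satisfies the classical Robbins-Monro conditions (the steps sum to infinity while their squares have a finite sum). The example is illustrative and is not taken from the paper.

```python
def online_mean(stream):
    """Online estimate of a mean via stochastic gradient steps.

    Each example x is seen once. The update w <- w - gamma_t * (w - x)
    with gamma_t = 1/t reproduces the running average of the examples.
    """
    w = 0.0
    for t, x in enumerate(stream, start=1):
        gamma = 1.0 / t        # decreasing learning rate
        w -= gamma * (w - x)   # gradient of (w - x)^2 / 2 is (w - x)
    return w
```

With a constant learning rate the estimate would keep fluctuating around the mean instead of settling, which is why the step-size schedule matters in the convergence analysis.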
Shared Weights Neural Networks in Image Analysis
, 1996
Cited by 11 (6 self)
This thesis is concerned with the use of shared weights neural networks in image analysis. This type of neural network has been extensively described in the literature since 1989. It is believed that networks incorporating shared weights are capable of local, shift-invariant feature extraction due to the restrictions placed on their architecture. The first experiments were focused mainly on the neural network architectures as suggested by, amongst others, Le Cun et al. [LBD+89, LBD+90, LJB+89] and Viennet [Vie93]. These architectures basically are back-propagation neural networks. However, they restrain the number of free parameters and introduce the notion of receptive fields, combining local information into more abstract patterns at a higher level. Three of these networks were tested on the problem of handwritten digit recognition and the results were compared to those of methods based on other feature extraction or classification techniques. As an intermezzo, a second order...
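To make the notion of shared weights and receptive fields concrete, here is a one-dimensional sketch: every output unit looks at a small window of the input and all units use the same weight vector. The function name and the omission of a nonlinearity are simplifications made for this example.

```python
def shared_weight_layer(signal, kernel, bias=0.0):
    """Apply one shared weight vector to every receptive field of a 1-D input.

    Output unit i sees signal[i : i + len(kernel)]. All units share the same
    len(kernel) weights plus one bias, however long the input is.
    """
    k = len(kernel)
    return [bias + sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]
```

Shifting the input by one position shifts the feature map by one position: `shared_weight_layer([0, 0, 1, 2, 1, 0, 0, 0], [1, 0, -1])` and the same call on `[0, 0, 0, 1, 2, 1, 0, 0]` produce the same response pattern, displaced by one unit. The layer has three free weights here regardless of the input length.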
An EM Approach to Learning Sequential Behavior
, 1994
Cited by 11 (5 self)
We consider problems of sequence processing and we propose a solution based on a discrete state model. We introduce a recurrent architecture having a modular structure that allocates subnetworks to discrete states. Different subnetworks model the dynamics (state transition) and the output of the model, conditional on the previous state and an external input. The model has a statistical interpretation and can be trained by the EM or GEM algorithms, considering state trajectories as missing data. This allows temporal credit assignment to be decoupled from actual parameter estimation. The model presents similarities to hidden Markov models, but allows input sequences to be mapped to output sequences, using the same processing style as recurrent networks. For this reason we call it Input/Output HMM (IOHMM). Another remarkable difference is that IOHMMs are trained using a supervised learning paradigm (while potentially taking advantage of the EM algorithm), whereas standard HMMs are trained by an...
Variance Analysis of Sensitivity Information for Pruning Multilayer Feedforward Neural Networks
, 1999
Cited by 9 (2 self)
This paper presents an algorithm to prune feedforward neural network architectures using sensitivity analysis. Sensitivity Analysis is used to quantify the relevance of input and hidden units. A new statistical pruning heuristic is proposed, based on variance analysis, to decide which units to prune. Results are presented to show that the pruning algorithm correctly prunes irrelevant input and hidden units.
Constructive Learning Techniques for Designing Neural Network Systems
, 1997
Cited by 9 (0 self)
Contents
1. Introduction.
2. Classification. 2.1 Introduction. 2.2 The Pocket algorithm. 2.3 Tower and Cascade architectures. 2.4 Tree architectures: the Upstart algorithm. 2.5 Constructing tree and cascade architectures using dichotomies. 2.6 Constructing neural networks with a single hidden layer. 2.7 Summary.
3. Regression. 3.1 Introduction. 3.2 The Cascade Correlation Algorithm. 3.3 Node creation and node splitting algorithms. 3.4 Constructing RBF networks. 3.5 Summary.
4. Constructing Modular Architectures. 4.1 Introduction. 4.2 Neural Decision Trees. 4.3 Other approaches to constructing modular networks.
5. Reducing Network Complexity. 5.1 Introduction. 5.2 Pruning Procedures. 5.3 Summary.
6. Conclusion.
7. Appendix: algorithms for single-node learning.
1 Introduction
Neural networks have been applied to a wide range of application domains such as control, telecommun
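The Pocket algorithm named in section 2.2 of this outline is a single-node learning rule that many constructive methods build on. The listing gives no details, so the sketch below follows the algorithm as it is commonly described: run perceptron updates and keep "in the pocket" the weight vector that classifies the most training samples correctly. The function name, the random sampling scheme, and the defaults are assumptions of this example.

```python
import random

def pocket(samples, epochs=100, seed=0):
    """Pocket algorithm sketch for one threshold unit.

    samples: list of (features, label) pairs with label in {-1, +1}.
    Returns the best weight vector found (bias stored last) and the number
    of training samples it classifies correctly.
    """
    rng = random.Random(seed)
    dim = len(samples[0][0])
    w = [0.0] * (dim + 1)  # last entry is the bias

    def predict(weights, x):
        s = weights[-1] + sum(wi * xi for wi, xi in zip(weights, x))
        return 1 if s > 0 else -1

    def n_correct(weights):
        return sum(predict(weights, x) == y for x, y in samples)

    best_w, best_score = list(w), n_correct(w)
    for _ in range(epochs * len(samples)):
        x, y = rng.choice(samples)
        if predict(w, x) != y:
            # Perceptron update on a misclassified sample.
            w = [wi + y * xi for wi, xi in zip(w, x)] + [w[-1] + y]
            score = n_correct(w)
            if score > best_score:  # keep the better weights in the pocket
                best_w, best_score = list(w), score
    return best_w, best_score
```

Unlike the plain perceptron rule, the pocketed weights remain usable when the data are not linearly separable, which is what makes the rule suitable for training the individual nodes of a growing network.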
Supervised Learning of Conditional Approach: A Case Study
- University of Sussex, UK
, 1993
Cited by 8 (2 self)
Reinforcement learning regimes have been shown to be capable of learning animat behaviours such as 'obstacle avoidance' and 'wall following'. Such behaviours can usually be learned more quickly using ordinary supervised methods, since in this case the learner receives more direct feedback. However, 'conditional approach' behaviour (move in on small objects but stand clear of large ones) seems to be hard to learn even by neural network learning methods such as backpropagation. The paper presents the results of a study which investigated this behaviour and shows how the 'hardness' of the behaviour can be accounted for in statistical terms.
1 Introduction
Recently there has been increasing interest in the use of learning for the automatic acquisition of animat behaviors (e.g., [1,2]). Attention usually focusses on reinforcement methods. However, ordinary supervised methods can be used provided there is some method available for generating suitable training sets. These can be expected to...

