Results 11–20 of 49
Efficient Training of Feed-Forward Neural Networks
, 1997
Abstract

Cited by 12 (0 self)
A.2 Introduction ... 61
A.2.1 Motivation ... 61
A.3 Optimization strategy ... 62
A.4 The Backpropagation algorithm ... 63
A.5 Conjugate direction methods ... 63
A.5.1 Conjugate gradients ... 65
A.5.2 The CGL algorithm ... 67
A.5.3 The BFGS algorithm ... 67
A.6 The SCG algorithm ... 67
A.7 Test results ... 70
A.7.1 Comparison metric ...
Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
 Advances in Neural Information Processing Systems 18
, 2006
Abstract

Cited by 11 (1 self)
Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
Online Independent Component Analysis with Local Learning Rate Adaptation
 Neural Information Processing Systems
, 2000
Abstract

Cited by 9 (3 self)
Stochastic meta-descent (SMD) is a new technique for online adaptation of local learning rates in arbitrary twice-differentiable systems.
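The local learning-rate adaptation that the two SMD abstracts above describe can be sketched roughly as follows. This is a simplified, hypothetical rendering based on common presentations of stochastic meta-descent, not code from either paper; the constants, the sign conventions, and the finite-difference Hessian-vector product are all assumptions.

```python
import numpy as np

def smd_minimize(grad, w0, eta0=0.05, mu=0.01, lam=0.99, steps=200, eps=1e-6):
    """Gradient descent with SMD-style local learning-rate adaptation.

    Each parameter i keeps its own rate eta[i]. The rate grows when the
    current gradient g[i] still agrees with v[i], a decaying trace of how
    that parameter has recently been moving, and shrinks (floored at a
    factor 0.5) when they disagree. The Hessian-vector product H v needed
    for the trace update is approximated by a finite difference of grad.
    """
    w = np.asarray(w0, dtype=float).copy()
    eta = np.full_like(w, eta0)   # one learning rate per parameter
    v = np.zeros_like(w)          # trace of recent parameter movement
    for _ in range(steps):
        g = grad(w)
        # meta step: g*v < 0 means recent steps still point downhill, so grow eta
        eta *= np.maximum(0.5, 1.0 - mu * g * v)
        # finite-difference Hessian-vector product: Hv ~ (g(w + eps*v) - g(w)) / eps
        Hv = (grad(w + eps * v) - g) / eps
        v = lam * v - eta * (g + lam * Hv)
        w = w - eta * g
    return w
```

On a simple quadratic such as E(w) = ½‖w‖² (so grad(w) = w) the per-parameter rates grow while progress is steady and the iterate converges to the minimum.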
Online Step Size Adaptation
 INESC. 9 Rua Alves Redol, 1000
, 1997
"... Subcategory: online learning algorithms ..."
New classes of learning automata based schemes for adaptation of the backpropagation algorithm
, 2000
Abstract

Cited by 8 (7 self)
One popular learning algorithm for feedforward neural networks is the backpropagation (BP) algorithm, which includes the parameters learning rate (η), momentum factor (α), and steepness parameter (λ). The appropriate selection of these parameters has a large effect on the convergence of the algorithm. Many techniques that adaptively adjust these parameters have been developed to increase the speed of convergence. In this paper, we present several classes of learning automata based solutions to the problem of adapting the BP algorithm parameters. By interconnecting learning automata with the feedforward neural network, we use learning automata schemes to adjust the parameters η, α, and λ based on observation of the random response of the neural network. One important aspect of the proposed scheme is its ability to escape from local minima with high probability during the training period. The feasibility of the proposed methods is shown through simulations on several problems.
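The abstract's idea of letting a learning automaton tune a BP parameter from the network's observed response can be illustrated with a toy linear reward-inaction (L_RI) automaton that picks the learning rate for plain gradient descent. The candidate rates, the reward rule (reward when the loss decreased), and the quadratic test problem are illustrative assumptions, not the paper's schemes.

```python
import numpy as np

def train_with_automaton(grad, loss, w0, etas=(0.01, 0.05, 0.2),
                         a=0.1, steps=300, seed=0):
    """Gradient descent whose learning rate is chosen each step by a
    linear reward-inaction (L_RI) learning automaton over a small set
    of candidate rates.

    The automaton keeps a probability vector p over the candidate rates,
    samples an action, and reinforces it whenever the step lowered the
    loss (a favourable environment response); unfavourable responses
    leave p unchanged, which is the 'inaction' part of L_RI.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    p = np.full(len(etas), 1.0 / len(etas))   # action probabilities
    prev = loss(w)
    for _ in range(steps):
        i = rng.choice(len(etas), p=p)        # automaton picks a rate
        w = w - etas[i] * grad(w)
        cur = loss(w)
        if cur < prev:                        # reward: shift p toward action i
            p = p + a * (np.eye(len(etas))[i] - p)
        prev = cur
    return w, p
```

Because the reward update p + a·(e_i − p) is a convex combination, p remains a valid probability vector throughout; on a well-conditioned quadratic every candidate rate shrinks the loss, so the iterate converges while the automaton concentrates its probability mass.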
Implementation and Comparison of Growing Neural Gas, Growing Cell Structures and Fuzzy ARTMAP
, 1997
Speeding Up Fuzzy Clustering with Neural Network Techniques
 Fuzzy Systems
Abstract

Cited by 7 (2 self)
We explore how techniques that were developed to improve the training process of artificial neural networks can be used to speed up fuzzy clustering. The basic idea of our approach is to regard the difference between two consecutive steps of the alternating optimization scheme of fuzzy clustering as providing a gradient, which may be modified in the same way as the gradient of neural network backpropagation is modified in order to improve training. Our experimental results show that some methods actually lead to a considerable acceleration of the clustering process.
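The abstract's core idea, treating the difference between two consecutive alternating-optimization steps as a gradient and then modifying it like a backpropagation gradient, can be sketched for fuzzy c-means with a momentum term. This is a minimal illustration under assumed parameter values and a simple deterministic initialization, not the authors' implementation.

```python
import numpy as np

def fcm_momentum(X, k, m=2.0, beta=0.5, iters=50):
    """Fuzzy c-means where the change in the cluster centers produced by
    one alternating-optimization step is treated as a gradient step and
    augmented with a momentum term, as in backpropagation training."""
    # simple deterministic init: k roughly evenly spaced data points
    centers = X[:: max(1, len(X) // k)][:k].astype(float).copy()
    step_prev = np.zeros_like(centers)
    for _ in range(iters):
        # squared distances of every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        # fuzzy memberships u_ij proportional to d_ij^(-2/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        um = u ** m
        # plain alternating-optimization update for the centers
        new_centers = (um.T @ X) / um.sum(axis=0)[:, None]
        # momentum: reuse a fraction beta of the previous step
        step = (new_centers - centers) + beta * step_prev
        centers = centers + step
        step_prev = step
    return centers
```

On well-separated clusters the AO step differences shrink quickly, so the momentum term decays and the centers settle at the cluster means.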
Gradient Descent: Second-Order Momentum and Saturating Error
 In (Moody et al.)
, 1992
Abstract

Cited by 7 (3 self)
Batch gradient descent, Δw(t) = −η dE/dw(t), converges to a minimum of quadratic form with a time constant no better than (1/4) λ_max/λ_min, where λ_min and λ_max are the minimum and maximum eigenvalues of the Hessian matrix of E with respect to w. It was recently shown that adding a momentum term, Δw(t) = −η dE/dw(t) + α Δw(t−1), improves this to (1/4) √(λ_max/λ_min), although only in the batch case. Here we show that second-order momentum, Δw(t) = −η dE/dw(t) + α Δw(t−1) + β Δw(t−2), can lower this no further. We then regard gradient descent with momentum as a dynamic system and explore a non-quadratic error surface, showing that saturation of the error accounts for a variety of effects observed in simulations and justifies some popular heuristics. 1 INTRODUCTION Gradient descent is the bread-and-butter optimization technique in neural networks. Some people build special purpose hardware to accelerate gradient descent optimization...
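The update rule in the abstract, Δw(t) = −η dE/dw(t) + α Δw(t−1) + β Δw(t−2), is easy to run on a diagonal quadratic to see the effect of momentum on the slow eigendirection. The particular η, α, eigenvalues, and starting point below are illustrative choices, not values from the paper.

```python
import numpy as np

def descend(eta=0.5, alpha=0.0, beta=0.0, lmin=0.1, lmax=1.0, steps=100):
    """Iterate dw(t) = -eta*dE/dw(t) + alpha*dw(t-1) + beta*dw(t-2) on the
    diagonal quadratic E(w) = 0.5*(lmin*w1^2 + lmax*w2^2) and return the
    final error E(w)."""
    lam = np.array([lmin, lmax])   # Hessian eigenvalues; dE/dw = lam * w
    w = np.array([1.0, 1.0])
    dw1 = np.zeros(2)              # dw(t-1)
    dw2 = np.zeros(2)              # dw(t-2)
    for _ in range(steps):
        dw = -eta * lam * w + alpha * dw1 + beta * dw2
        w = w + dw
        dw2, dw1 = dw1, dw
    return 0.5 * np.sum(lam * w**2)
```

With η = 0.5 the slowly converging mode contracts by 1 − ηλ_min = 0.95 per step without momentum; adding α = 0.5 drives the final error many orders of magnitude lower over the same number of steps, consistent with the √(λ_max/λ_min) improvement the abstract cites.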
JETNET 3.0 – A Versatile Artificial Neural Network Package
, 1993
Abstract

Cited by 7 (2 self)
In this paper, quantities written in sans-serif denote matrices and quantities written in boldface denote vectors.
Hybrid Decision Tree
, 2002
Abstract

Cited by 6 (2 self)
In this paper, a hybrid learning approach named HDT is proposed. HDT simulates human reasoning by using symbolic learning to do qualitative analysis and using neural learning to do subsequent quantitative analysis. It generates the trunk of a binary hybrid decision tree according to the binary information gain ratio criterion in an instance space defined by only the original unordered attributes. If unordered attributes cannot further distinguish training examples falling into a leaf node whose diversity is beyond the diversity threshold, then the node is marked as a dummy node. After all dummy nodes are marked, a specific feedforward neural network named FANNC, trained in an instance space defined by only the original ordered attributes, is exploited to accomplish the learning task. Moreover, this paper distinguishes three kinds of incremental learning tasks. Two incremental learning procedures designed for example-incremental learning with different storage requirements are provided, which enables HDT to deal gracefully with data sets where new data are frequently appended. Also, a hypothesis-driven constructive induction mechanism is provided, which enables HDT to generate compact concept descriptions.
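The hybrid tree-plus-network structure described above can be sketched in a drastically reduced one-level form: split on an unordered (categorical) attribute, keep pure leaves symbolic, and hand each impure leaf ("dummy node") to a small neural model over the ordered (numeric) attributes. A single logistic unit stands in for the paper's FANNC network, and all names and thresholds here are illustrative assumptions.

```python
import numpy as np

def hdt_toy(cat, num, y, diversity_threshold=0.0, lr=0.5, epochs=200):
    """One-level toy of the HDT idea.

    cat: (n,) integer categorical attribute used for the symbolic split.
    num: (n, d) numeric attributes used by the neural part.
    y:   (n,) binary labels.
    Each categorical value becomes a leaf; leaves whose label diversity
    exceeds the threshold become dummy nodes, each handled by a logistic
    unit trained by gradient descent on the numeric attributes.
    """
    model = {}
    for v in np.unique(cat):
        idx = cat == v
        labels, counts = np.unique(y[idx], return_counts=True)
        diversity = 1.0 - counts.max() / counts.sum()  # leaf impurity
        if diversity <= diversity_threshold:
            model[v] = ("leaf", labels[counts.argmax()])
        else:
            # dummy node: train a logistic unit on the numeric attributes
            Xv = np.c_[num[idx], np.ones(idx.sum())]   # append bias column
            w = np.zeros(Xv.shape[1])
            for _ in range(epochs):
                p = 1.0 / (1.0 + np.exp(-Xv @ w))
                w -= lr * Xv.T @ (p - y[idx]) / len(p)
            model[v] = ("net", w)
    return model

def hdt_predict(model, cat_value, num_values):
    kind, param = model[cat_value]
    if kind == "leaf":
        return param
    p = 1.0 / (1.0 + np.exp(-(np.r_[num_values, 1.0] @ param)))
    return int(p > 0.5)
```

A pure categorical group is answered symbolically, while a mixed group defers to its trained logistic unit, mirroring HDT's qualitative-then-quantitative division of labour.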