Results 1 - 10 of 57
Representation of Finite State Automata in Recurrent Radial Basis Function Networks
, 1996
Cited by 36 (5 self)
In this paper we propose some techniques for injecting finite state automata into Recurrent Radial Basis Function networks (R2BF). When providing proper hints and constraining the weight space properly, we show that these networks behave as automata. A technique is suggested for forcing the learning process to develop automata representations that is based on adding a proper penalty function to the ordinary cost. Successful experimental results are shown for inductive inference of regular grammars. Keywords: Automata, backpropagation through time, high-order neural networks, inductive inference, learning from hints, radial basis functions, recurrent radial basis functions, recurrent networks. 1. Introduction The ability of learning from examples is certainly the most appealing feature of neural networks. In the last few years, several researchers have used connectionist models for solving different kinds of problems ranging from robot control to pattern recognition. Coping with optimization of functions with several thousands of variables is quite common. Surprisingly, in many practical cases, global or near global optimization is attained also with non sophisticated numerical methods. For example, successful applications of neural nets for recognition of handwritten characters (le Cun, 1989) and for phoneme discrimination (Waibel et al., 1989) have been proposed which do not report serious convergence problems. Some attempts to understand the theoretical reasons for the successes and failures of supervised learning schemes have been carried out which explain when such schemes are likely to succeed in discovering optimal solutions (Bianchini et al., 1994; Gori & Tesi, 1992; Yu, 1992), and to generalize to new examples (Baum & Haussler, 1989). These results give some ...
Training Neural Nets with the Reactive Tabu Search
Cited by 32 (7 self)
In this paper the task of training subsymbolic systems is considered as a combinatorial optimization problem and solved with the heuristic scheme of the Reactive Tabu Search. An iterative optimization process based on a "modified greedy search" component is complemented with a metastrategy to realize a discrete dynamical system that discourages limit cycles and the confinement of the search trajectory in a limited portion of the search space. The possible cycles are discouraged by prohibiting (i.e., making tabu) the execution of moves that reverse the ones applied in the most recent part of the search, for a prohibition period that is adapted in an automated way. The confinement is avoided and a proper exploration is obtained by activating a diversification strategy when too many configurations are repeated excessively often. The RTS method is applicable to nondifferentiable functions, it is robust with respect to the random initialization and effective in continuing the search after local minima. Three tests of the technique on feedforward and feedback systems are presented.
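The prohibition mechanism described in this abstract can be illustrated with a minimal sketch. It is not the authors' RTS implementation: the binary encoding, the rule that lengthens the prohibition period by one whenever a configuration is revisited, the slow decrease every ten iterations, and the names `reactive_tabu_search` and `trap_cost` are all assumptions made for illustration, and the diversification strategy is omitted.

```python
def reactive_tabu_search(cost, n_bits, iterations=100):
    """Simplified tabu search on bit strings with an adaptive prohibition period."""
    if n_bits < 2:
        raise ValueError("need at least two bits so one move is always allowed")
    x = [0] * n_bits
    best, best_cost = list(x), cost(x)
    last_flip = [-(10 ** 9)] * n_bits  # iteration at which each bit was last flipped
    visited = set()                    # configurations seen so far
    tenure = 1                         # current prohibition period (assumed start value)
    for it in range(iterations):
        key = tuple(x)
        if key in visited:
            # Repetition detected: prohibit recent moves for longer.
            tenure = min(n_bits - 1, tenure + 1)
        elif tenure > 1 and it % 10 == 0:
            # No repetition: relax the prohibition slowly (assumed schedule).
            tenure -= 1
        visited.add(key)
        # A bit is tabu if it was flipped within the last `tenure` iterations.
        allowed = [i for i in range(n_bits) if it - last_flip[i] > tenure]

        def cost_after_flip(i):
            x[i] ^= 1
            c = cost(x)
            x[i] ^= 1
            return c

        # Modified greedy step: take the best allowed move even if it worsens the cost.
        move = min(allowed, key=cost_after_flip)
        x[move] ^= 1
        last_flip[move] = it
        c = cost(x)
        if c < best_cost:
            best, best_cost = list(x), c
    return best, best_cost


def trap_cost(bits):
    """Toy nondifferentiable cost: all zeros is a local minimum, all ones is the global one."""
    return -1 if all(bits) else sum(bits)


best, best_cost = reactive_tabu_search(trap_cost, n_bits=4)
```

On this toy cost a plain greedy descent would stay at the all-zero start, since every single-bit flip increases the cost; the prohibition forces the search onward until it reaches the all-ones configuration.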
On-Line Learning Processes in Artificial Neural Networks
, 1993
Cited by 31 (4 self)
We study on-line learning processes in artificial neural networks from a general point of view. On-line learning means that a learning step takes place at each presentation of a randomly drawn training pattern. It can be viewed as a stochastic process governed by a continuous-time master equation. On-line learning is necessary if not all training patterns are available all the time. This occurs in many applications when the training patterns are drawn from a time-dependent environmental distribution. Studying learning in a changing environment, we encounter a conflict between the adaptability and the confidence of the network's representation. Minimization of a criterion incorporating both effects yields an algorithm for on-line adaptation of the learning parameter. The inherent noise of on-line learning makes it possible to escape from undesired local minima of the error potential on which the learning rule performs (stochastic) gradient descent. We try to quantify these often made cl...
Improving the Convergence of the Backpropagation Algorithm Using Learning Rate Adaptation Methods
, 1999
Cited by 27 (15 self)
This article focuses on gradient-based backpropagation algorithms that use either a common adaptive learning rate for all weights or an individual adaptive learning rate for each weight and apply the Goldstein/Armijo line search. The learning-rate adaptation is based on descent techniques and estimates of the local Lipschitz constant that are obtained without additional error function and gradient evaluations. The proposed algorithms improve the backpropagation training in terms of both convergence rate and convergence characteristics, such as stable learning and robustness to oscillations. Simulations are conducted to compare and evaluate the convergence behavior of these gradient-based training algorithms with several popular training methods.
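As a rough illustration of a learning rate driven by a local Lipschitz estimate, the sketch below reuses the two most recent weight vectors and gradients, so no extra error or gradient evaluations are needed. The specific estimate (ratio of gradient change to weight change), the step 1/(2 × estimate), and the names `descend_with_lipschitz_rate` and `quadratic_grad` are assumptions of this sketch, not details confirmed by the abstract; the Goldstein/Armijo line search is left out.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def descend_with_lipschitz_rate(grad, w0, initial_rate=0.01, steps=50):
    """Gradient descent with one common rate set from a local Lipschitz estimate."""
    w_prev = list(w0)
    g_prev = grad(w_prev)
    # The first step needs a fixed rate because no previous pair exists yet.
    w = [wi - initial_rate * gi for wi, gi in zip(w_prev, g_prev)]
    rate = initial_rate
    for _ in range(steps):
        g = grad(w)  # the only gradient evaluation in this iteration
        dw = norm([a - b for a, b in zip(w, w_prev)])
        dg = norm([a - b for a, b in zip(g, g_prev)])
        if dw > 0.0 and dg > 0.0:
            # Assumed rule: estimate = dg / dw, rate = 1 / (2 * estimate).
            rate = dw / (2.0 * dg)
        w_prev, g_prev = w, g
        w = [wi - rate * gi for wi, gi in zip(w, g)]
    return w, rate


def quadratic_grad(w):
    """Gradient of the toy error E(w) = 2 * ||w||^2, whose Lipschitz constant is 4."""
    return [4.0 * c for c in w]


w_final, final_rate = descend_with_lipschitz_rate(quadratic_grad, [1.0, -2.0])
```

On this toy quadratic the estimate equals the true constant, so the rate settles at 1/8 and each step halves the weights.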
Exponentially Many Local Minima for Single Neurons
, 1995
Cited by 27 (6 self)
We show that for a single neuron with the logistic function as the transfer function the number of local minima of the error function based on the square loss can grow exponentially in the dimension. 1 INTRODUCTION Consider a single artificial neuron with $d$ inputs. The neuron has $d$ weights $w \in \mathbb{R}^d$. The output of the neuron for an input pattern $x \in \mathbb{R}^d$ is $y = \phi(x \cdot w)$, where $\phi : \mathbb{R} \to \mathbb{R}$ is a transfer function. For a given sequence of training examples $\langle (x_t, y_t) \rangle_{1 \le t \le m}$, each consisting of a pattern $x_t \in \mathbb{R}^d$ and a desired output $y_t \in \mathbb{R}$, the goal of the training phase for neural networks consists of minimizing the error function with respect to the weight vector $w \in \mathbb{R}^d$. This function is the sum of the losses between outputs of the neuron and the desired outputs summed over all training examples. In notation, the error function is $E(w) = \sum_{t=1}^{m} L(y_t, \phi(x_t \cdot w))$, where $L : \mathbb{R} \times \mathbb{R} \to [0, \infty)$ is the loss function. A common example of a transfer function...
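The error function defined in this abstract can be written out directly. The following is a minimal sketch using the logistic transfer function and the square loss named in the text; the function names and the two training examples are hypothetical.

```python
import math


def logistic(z):
    """Logistic transfer function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def square_loss(y, y_hat):
    """Square loss between desired output y and neuron output y_hat."""
    return (y - y_hat) ** 2


def error(w, examples, phi=logistic, loss=square_loss):
    """E(w): sum over all examples (x_t, y_t) of loss(y_t, phi(x_t . w))."""
    total = 0.0
    for x, y in examples:
        activation = sum(xi * wi for xi, wi in zip(x, w))
        total += loss(y, phi(activation))
    return total


# Hypothetical training set with d = 2 inputs and m = 2 examples.
examples = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0)]
e_zero = error((0.0, 0.0), examples)
```

With zero weights the neuron outputs 0.5 for every pattern, so each of the two examples contributes a loss of 0.25.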
Learning without Local Minima in Radial Basis Function Networks
 IEEE Transactions on Neural Networks
, 1995
Cited by 25 (6 self)
Learning from examples plays a central role in artificial neural networks (ANN). However, the success of many learning schemes is not guaranteed, since algorithms like Backpropagation (BP) may get stuck in local minima, thus providing suboptimal solutions. For feedforward networks, the theoretical results reported in [5,6,15,20] show that optimal learning can be achieved provided that certain conditions on the network and the learning environment are met. A similar investigation is put forward in this paper for the case of networks using radial basis functions (RBF) [10,14]. The analysis proposed in [6] is extended naturally under the assumption that the patterns of the learning environment are separable by hyperspheres. In that case, we prove that the attached cost function is local minima free with respect to all the weights. This provides us with some theoretical foundations for a massive application of RBF in pattern recognition. Keywords Backpropagation, multilayered networks...
What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation
, 1996
Cited by 24 (2 self)
One of the most important aspects of any machine learning paradigm is how it scales according to problem size and complexity. Using a task with known optimal training error, and a prespecified maximum number of training updates, we investigate the convergence of the backpropagation algorithm with respect to a) the complexity of the required function approximation, b) the size of the network in relation to the size required for an optimal solution, and c) the degree of noise in the training data. In general, for a) the solution found is worse when the function to be approximated is more complex, for b) oversized networks can result in lower training and generalization error in certain cases, and for c) the use of committee or ensemble techniques can be more beneficial as the level of noise in the training data is increased. For the experiments we performed, we do not obtain the optimal solution in any case. We further support the observation that larger networks can produce better training and generalization error using a face recognition example where a network with many more parameters than training points generalizes better than smaller networks.
Are Multilayer Perceptrons Adequate for Pattern Recognition and Verification?
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
Cited by 23 (2 self)
Abstract: This paper discusses the ability of multilayer perceptrons (MLPs) to model the probability distribution of data in typical pattern recognition and verification problems. It is proven that multilayer perceptrons with sigmoidal units and a number of hidden units less than or equal to the number of inputs are unable to model patterns distributed in typical clusters, since these networks draw open separation surfaces in the pattern space. When using more hidden units than inputs, the separation surfaces can be closed but, unfortunately, it is proven that determining whether or not an MLP draws closed separation surfaces in the pattern space is NP-hard. The major conclusion of this paper is somewhat opposite to what is believed and reported in many application papers: MLPs are definitely not adequate for applications of pattern recognition requiring a reliable rejection and, especially, they are not adequate for pattern verification tasks. Index Terms: Multilayer perceptrons, pattern recognition, pattern verification, function approximation, closed hemisphere problem.
A note on the learning automata based algorithms for adaptive parameter selection in PSO
 Applied Soft Computing
, 2011
Unified Integration of Explicit Knowledge and Learning by Example in Recurrent Networks
 IEEE Transactions on Knowledge and Data Engineering
, 1992
Cited by 15 (1 self)
We propose a novel unified approach for integrating explicit knowledge and learning by example in recurrent networks. The explicit knowledge is represented by automaton rules, which are directly injected into the connections of a network. This can be accomplished by using a technique based on linear programming, instead of learning from random initial weights. Learning is conceived as a refinement process and is mainly responsible for uncertain information management. We present preliminary results for problems of automatic speech recognition. Index Terms: Recurrent neural networks, learning automata, automatic speech recognition. I. Introduction The resurgence of interest in connectionist models has led several researchers to investigate their application to the building of "intelligent systems". Unlike symbolic models proposed in artificial intelligence, learning plays a central role in connectionist models. Many successful applications have mainly concerned perceptual tasks (see e....