Results 11–20 of 362
Gradient calculation for dynamic recurrent neural networks: a survey
IEEE Transactions on Neural Networks, 1995
"... Abstract  We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss xedpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non xedpoint algorithms, namely backp ..."
Abstract

Cited by 135 (3 self)
We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss fixed-point learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixed-point algorithms, namely backpropagation through time, Elman's history cutoff, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the unified presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, continue with some "tricks of the trade" for training, using, and simulating continuous time and recurrent neural networks. We present some simulations, and at the end, address issues of computational complexity and learning speed.
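As a concrete illustration of one of the non-fixed-point algorithms the survey covers, here is a minimal sketch of backpropagation through time for a one-layer tanh RNN with a squared-error loss. All names (W, U, V, bptt_grads) are illustrative and not taken from the paper:

```python
# Minimal sketch of backpropagation through time (BPTT): unroll the
# recurrence forward, then propagate error derivatives backward in time.
import numpy as np

def bptt_grads(x, targets, W, U, V):
    """x: (T, n_in) inputs; targets: (T, n_out); returns (dW, dU, dV)."""
    T, n_h = x.shape[0], W.shape[0]
    h = np.zeros((T + 1, n_h))            # h[0] is the initial state
    y = np.zeros((T, V.shape[0]))
    for t in range(T):                    # forward pass, unrolled in time
        h[t + 1] = np.tanh(W @ h[t] + U @ x[t])
        y[t] = V @ h[t + 1]
    dW, dU, dV = np.zeros_like(W), np.zeros_like(U), np.zeros_like(V)
    dh_next = np.zeros(n_h)
    for t in reversed(range(T)):          # backward pass through time
        dy = y[t] - targets[t]            # squared-error loss gradient
        dV += np.outer(dy, h[t + 1])
        dh = V.T @ dy + dh_next           # local plus future contributions
        dz = dh * (1.0 - h[t + 1] ** 2)   # tanh derivative
        dW += np.outer(dz, h[t])
        dU += np.outer(dz, x[t])
        dh_next = W.T @ dz                # carry gradient to earlier steps
    return dW, dU, dV
```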
Mean Field Theory for Sigmoid Belief Networks
Journal of Artificial Intelligence Research, 1996
"... We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. ..."
Abstract

Cited by 116 (12 self)
We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics.
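The abstract gives no detail, so the following is only a sketch of the generic mean-field idea the title refers to: replace each binary stochastic unit by a deterministic mean activation and iterate the resulting self-consistency equations to a fixed point. The paper's actual treatment introduces variational parameters and rigorous bounds that are not reproduced here:

```python
# Generic mean-field fixed-point iteration for a network of sigmoid units.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_field(W, b, n_iter=50):
    """W: (n, n) weights (W[i, j] = weight from unit j to unit i); b: biases."""
    mu = np.full(b.shape, 0.5)       # initial mean activations
    for _ in range(n_iter):
        mu = sigmoid(W @ mu + b)     # self-consistency update of all means
    return mu
```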
Information Geometry of the EM and em Algorithms for Neural Networks
Neural Networks, 1995
"... In order to realize an inputoutput relation given by noisecontaminated examples, it is effective to use a stochastic model of neural networks. A model network includes hidden units whose activation values are not specified nor observed. It is useful to estimate the hidden variables from the obs ..."
Abstract

Cited by 101 (8 self)
In order to realize an input-output relation given by noise-contaminated examples, it is effective to use a stochastic model of neural networks. A model network includes hidden units whose activation values are not specified nor observed. It is useful to estimate the hidden variables from the observed or specified input-output data based on the stochastic model. Two algorithms, the EM- and em-algorithms, have so far been proposed for this purpose. The EM-algorithm is an iterative statistical technique of using the conditional expectation, and the em-algorithm is a geometrical one given by information geometry. The em-algorithm minimizes iteratively the Kullback-Leibler divergence in the manifold of neural networks. These two algorithms are equivalent in most cases. The present paper gives a unified information geometrical framework for studying stochastic models of neural networks, by focusing on the EM and em algorithms, and proves a condition which guarantees their equivalence ...
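A toy illustration of the EM iteration described above, for a two-component 1-D Gaussian mixture (a standard textbook example, not one from the paper): the E-step takes the conditional expectation over the hidden component labels, and the M-step re-estimates the parameters. In the paper's information-geometric terms, the em-algorithm performs the analogous alternating Kullback-Leibler projections:

```python
# EM for a two-component 1-D Gaussian mixture with hidden component labels.
import numpy as np

def em_gmm(x, n_iter=100):
    mu = np.array([x.min(), x.max()])     # crude initialisation
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] = P(component k | x[n])
        d = x[:, None] - mu[None, :]
        p = pi * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: maximise the expected complete-data log-likelihood
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
print(em_gmm(x))   # recovers roughly (-2, 3) means and equal mixing weights
```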
A Statistical Learning Method for Logic Programs with Distribution Semantics
In Proceedings of the 12th International Conference on Logic Programming (ICLP’95), 1995
"... When a joint distribution PF is given to a set F of facts in a logic program DB = F U R where R is a set of rules, we can further extend it to a joint distribution PDB over the set of possible least models of DB. We then define the semantics of DB with the associated distribution PF as PDB, and call ..."
Abstract

Cited by 95 (23 self)
When a joint distribution P_F is given to a set F of facts in a logic program DB = F ∪ R, where R is a set of rules, we can further extend it to a joint distribution P_DB over the set of possible least models of DB. We then define the semantics of DB with the associated distribution P_F as P_DB, and call it distribution semantics. While the ...
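A minimal sketch of distribution semantics on a toy propositional program (the facts, probabilities, and rules below are invented for illustration): enumerate the truth assignments to the probabilistic facts F, compute the least model of the resulting program by forward chaining over R, and accumulate each model's probability into P_DB:

```python
# Distribution semantics by exhaustive enumeration of the fact choices.
from itertools import product

facts = {"burglary": 0.1, "earthquake": 0.2}                  # F with P_F
rules = [("alarm", ["burglary"]), ("alarm", ["earthquake"])]  # R: head :- body

def least_model(true_facts):
    model = set(true_facts)
    changed = True
    while changed:                       # forward chaining to a fixpoint
        changed = False
        for head, body in rules:
            if head not in model and all(a in model for a in body):
                model.add(head)
                changed = True
    return frozenset(model)

p_db = {}                                # P_DB over least models
for bits in product([True, False], repeat=len(facts)):
    prob, chosen = 1.0, []
    for (f, p), b in zip(facts.items(), bits):
        prob *= p if b else (1 - p)
        if b:
            chosen.append(f)
    m = least_model(chosen)
    p_db[m] = p_db.get(m, 0.0) + prob

# e.g. the probability of "alarm" is the mass of models containing it
print(sum(p for m, p in p_db.items() if "alarm" in m))
```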
Biologically Plausible Error-driven Learning using Local Activation Differences: The Generalized Recirculation Algorithm
Neural Computation, 1996
"... The error backpropagation learning algorithm (BP) is generally considered biologically implausible because it does not use locally available, activationbased variables. A version of BP that can be computed locally using bidirectional activation recirculation (Hinton & McClelland, 1988) instead of ..."
Abstract

Cited by 94 (10 self)
The error backpropagation learning algorithm (BP) is generally considered biologically implausible because it does not use locally available, activation-based variables. A version of BP that can be computed locally using bidirectional activation recirculation (Hinton & McClelland, 1988) instead of backpropagated error derivatives is more biologically plausible. This paper presents a generalized version of the recirculation algorithm (GeneRec), which overcomes several limitations of the earlier algorithm by using a generic recurrent network with sigmoidal units that can learn arbitrary input/output mappings. However, the contrastive Hebbian learning algorithm (CHL, a.k.a. DBM or mean field learning) also uses local variables to perform error-driven learning in a sigmoidal recurrent network. CHL was derived in a stochastic framework (the Boltzmann machine), but has been extended to the deterministic case in various ways, all of which rely on problematic approximations and assumptions, le...
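A minimal sketch of the GeneRec-style weight update described above, assuming the settling dynamics have already been run and the plus-phase (target clamped) and minus-phase (network's own output) activations recorded; only these locally available activations drive learning. The function and variable names are illustrative:

```python
# GeneRec-style local learning rule: the weight change is the product of
# the sending unit's minus-phase activation and the receiving unit's
# plus-minus activation difference.
import numpy as np

def generec_update(W, pre_minus, post_plus, post_minus, lrate=0.1):
    """W[i, j]: weight from sending unit j to receiving unit i."""
    # delta_w_ij = lrate * (post_plus_i - post_minus_i) * pre_minus_j
    return W + lrate * np.outer(post_plus - post_minus, pre_minus)
```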
Interactions Between Frontal Cortex and Basal Ganglia in Working Memory: A Computational Model
2000
"... The frontal cortex and basal ganglia interact via a relatively wellunderstood and elaborate system of interconnections. In the context of motor function, these interconnections can be understood as disinhibiting or "releasing the brakes" on frontal motor action plans  the basal ganglia detect ap ..."
Abstract

Cited by 91 (16 self)
The frontal cortex and basal ganglia interact via a relatively well-understood and elaborate system of interconnections. In the context of motor function, these interconnections can be understood as disinhibiting or "releasing the brakes" on frontal motor action plans: the basal ganglia detect appropriate contexts for performing motor actions, and enable the frontal cortex to execute such actions at the appropriate time. We build on this idea in the domain of working memory through the use of computational neural network models of this circuit. In our model, the frontal cortex exhibits robust active maintenance, while the basal ganglia contribute a selective, dynamic gating function that enables frontal memory representations to be rapidly updated in a task-relevant manner. We apply the model to a novel version of the continuous performance task (CPT) that requires subroutine-like selective working memory updating, and compare and contrast our model with other existing models and th...
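A minimal sketch of the selective, dynamic gating idea (not the authors' simulation): each frontal "stripe" robustly maintains its state unless its gate opens, at which point it is rapidly overwritten by the current input. Here the gate signal is simply given; in the model it is computed by the basal ganglia from task context:

```python
# Gated working-memory update: maintain where the gate is closed,
# overwrite with the current input where the gate is open.
import numpy as np

def gated_memory_step(memory, inputs, gate_open):
    """memory, inputs: (n_stripes, n_units); gate_open: (n_stripes,) bool."""
    memory = memory.copy()
    memory[gate_open] = inputs[gate_open]   # rapid update where gate opens
    return memory                           # robust maintenance elsewhere
```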
A retrieval theory of priming in memory
Psychological Review, 1988
"... We present a theory of priming that is designed to account for phenomena usually attributed to the action of a spreading activation process. The theory assumes that a prime and target are combined at retrieval into a compound cue that is used to access memory. If the representations of the prime and ..."
Abstract

Cited by 87 (16 self)
We present a theory of priming that is designed to account for phenomena usually attributed to the action of a spreading activation process. The theory assumes that a prime and target are combined at retrieval into a compound cue that is used to access memory. If the representations of the prime and target are associated in memory, the match is greater than if they are not associated, and this greater match facilitates the response to the target. The compound cue mechanism can be implemented within the framework of several memory models; descriptions of these implementations are presented. We summarize empirical results that have been taken as evidence for a spreading activation process and show that the retrieval theory can also account for these phenomena and that, in some cases, the retrieval theory provides predictions that are more constrained than those provided by spreading activation theories. Also, two experiments are reported that address predictions about the range of priming (in terms of number of connected concepts) and the decay rate of priming (in terms of intervening items). In both cases, the retrieval theory provides a better account of the data than spreading activation. Finally, contrasts between the compound cue theory and long-term priming phenomena are presented. Because the amount of information stored in human memory ...
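A minimal sketch of a compound-cue familiarity computation in the style of a SAM-like implementation (one of the memory-model frameworks the theory can be realized in): the prime and target jointly probe memory, and familiarity is a sum over stored images of the product of the two cues' strengths. All strengths and weights below are invented for illustration:

```python
# Compound-cue familiarity: the prime and target probe memory together.
import numpy as np

def familiarity(S_prime, S_target, w_prime=0.3, w_target=0.7):
    """S_*: (n_images,) cue-to-image strengths; exponent weights sum to 1."""
    return np.sum(S_prime ** w_prime * S_target ** w_target)

# An associated prime shares strong images with the target, so the
# compound cue matches memory better and familiarity is higher.
S_target = np.array([2.0, 0.1, 0.1])
assoc_prime = np.array([1.5, 0.1, 0.1])   # related: strong on same image
unrel_prime = np.array([0.1, 1.5, 0.1])   # unrelated: strong elsewhere
assert familiarity(assoc_prime, S_target) > familiarity(unrel_prime, S_target)
```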
Spiking Boltzmann machines
In Advances in Neural Information Processing Systems, 1998
"... A Boltzmann Machine is a network of symmetrically connected, neuronlike units that make stochastic decisions about whether to be on or off. Boltzmann machines have a simple learning algorithm that allows them to discover interesting features in datasets composed of binary vectors. The learning algor ..."
Abstract

Cited by 85 (14 self)
A Boltzmann Machine is a network of symmetrically connected, neuron-like units that make stochastic decisions about whether to be on or off. Boltzmann machines have a simple learning algorithm that allows them to discover interesting features in datasets composed of binary vectors. The learning algorithm is very slow in networks with many layers of feature detectors, but it can be made much faster by learning one layer of feature detectors at a time. Boltzmann machines are used to solve two quite different computational problems. For a search problem, the weights on the connections are fixed and are used to represent the cost function of an optimization problem. The stochastic dynamics of a Boltzmann machine then allow it to sample binary state vectors that represent good solutions to the optimization problem. For a learning problem, the Boltzmann machine is shown a set of binary data vectors and it must find weights on the connections so that the data vectors are good solutions to the optimization problem defined by those weights. To solve a learning problem, Boltzmann machines make many small updates to their weights, and each update requires them to solve many different search problems. As for the stochastic dynamics of a Boltzmann machine: when unit i is given the opportunity to update its binary state, it first computes its total input, z_i, which is the sum of its own bias, b_i, and the weights on connections coming from other active units: z_i = b_i + Σ_j s_j w_ij ...
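The update rule quoted at the end of the abstract is easy to state in code. A minimal sketch, assuming a symmetric weight matrix with zero diagonal: each unit computes its total input z_i and turns on with the logistic probability of that input, so repeated sweeps sample states from the distribution defined by the weights and biases:

```python
# Stochastic unit updates (one Gibbs sweep) for a Boltzmann machine.
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(s, W, b):
    """s: (n,) binary states; W: symmetric, zero-diagonal weights; b: biases."""
    for i in range(len(s)):
        z_i = b[i] + W[i] @ s                 # total input to unit i
        p_on = 1.0 / (1.0 + np.exp(-z_i))     # logistic of total input
        s[i] = 1 if rng.random() < p_on else 0
    return s
```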
Support vector machines for speech recognition
Proceedings of the International Conference on Spoken Language Processing, 1998
"... Statistical techniques based on hidden Markov Models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative informati ..."
Abstract

Cited by 74 (2 self)
Statistical techniques based on hidden Markov Models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and overparameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
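A minimal sketch of the static classification setup described above, using an RBF-kernel SVM via scikit-learn (an assumption of this sketch; the data here is synthetic, standing in for the Deterding vowel features, and the paper's actual system further combines SVM scores with an HMM recognizer):

```python
# RBF-kernel SVM on stand-in feature vectors for a static classification task.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))             # stand-in acoustic feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # stand-in class labels

clf = SVC(kernel="rbf", C=10.0, gamma="scale")   # margin-based classifier
clf.fit(X[:150], y[:150])                        # train on the first 150
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```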