Results 1-10 of 28
An Application of Recurrent Nets to Phone Probability Estimation
IEEE Transactions on Neural Networks, 1994
Cited by 193 (8 self)
This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed
Gradient calculation for dynamic recurrent neural networks: a survey
IEEE Transactions on Neural Networks, 1995
Cited by 135 (3 self)
We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss fixed-point learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixed-point algorithms, namely backpropagation through time, Elman's history cutoff, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the unified presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, and continue with some "tricks of the trade" for training, using, and simulating continuous-time and recurrent neural networks. We present some simulations, and at the end, address issues of computational complexity and learning speed.
Input/Output HMMs for Sequence Processing
IEEE Transactions on Neural Networks, 1996
Cited by 98 (12 self)
We consider problems of sequence processing and propose a solution based on a discrete state model in order to represent past context. We introduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation we call Input/Output Hidden Markov Model (IOHMM). It can be trained by the EM or GEM algorithms, considering state trajectories as missing data, which decouples temporal credit assignment and actual parameter estimation. The model presents similarities to hidden Markov models (HMMs), but allows us to map input sequences to output sequences, using the same processing style as recurrent neural networks. IOHMMs are trained using a more discriminant learning paradigm than HMMs, while potentially taking advantage of the EM algorithm. We demonstrate that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem. Experimental results are presented for the seven Tomita grammars, showing that these adaptive models can attain excellent generalization.
A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers
1991
Cited by 77 (22 self)
This paper introduces a framework for 'curious neural controllers' which employ an adaptive world model for goal-directed online learning. First an online reinforcement learning algorithm for autonomous 'animats' is described. The algorithm is based on two fully recurrent 'self-supervised' continually running networks which learn in parallel. One of the networks learns to represent a complete model of the environmental dynamics and is called the 'model network'. It provides complete 'credit assignment paths' into the past for the second network, which controls the animat's physical actions in a possibly reactive environment. The animat's goal is to maximize cumulative reinforcement and minimize cumulative 'pain'. The algorithm has properties which allow the implementation of something like the desire to improve the model network's knowledge about the world. This is related to curiosity. It is described how the particular algorithm (as well as similar model-building algorithms) may be augmented ...
Connectionist Probability Estimation in HMM Speech Recognition
IEEE Transactions on Speech and Audio Processing, 1992
Cited by 61 (16 self)
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Hervé Bourlard. We review the basis of HMM speech recognition, and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state-of-the-art HMM system. Part I: Introduction. Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
Bayesian Neural Networks and Density Networks
Nuclear Instruments and Methods in Physics Research, A, 1994
Cited by 39 (8 self)
This paper reviews the Bayesian approach to learning in neural networks, then introduces a new adaptive model, the density network. This is a neural network for which target outputs are provided, but the inputs are unspecified. When a probability distribution is placed on the unknown inputs, a latent variable model is defined that is capable of discovering the underlying dimensionality of a data set. A Bayesian learning algorithm for these networks is derived and demonstrated. 1 Introduction to the Bayesian view of learning. A binary classifier is a parameterized mapping from an input $x$ to an output $y \in [0, 1]$; when its parameters $w$ are specified, the classifier states the probability that an input $x$ belongs to class $t = 1$, rather than the alternative $t = 0$. Consider a binary classifier which models the probability as a sigmoid function of $x$: $P(t = 1 \mid x, w, \mathcal{H}) = y(x; w, \mathcal{H}) = \frac{1}{1 + e^{-w \cdot x}}$ (1). This form of model is known to statisticians as a linear logistic model, and in the neural networks ...
An Experimental Comparison of Recurrent Neural Networks
In Advances in Neural Information Processing Systems 7, 1995
Cited by 36 (12 self)
Many different discrete-time recurrent neural network architectures have been proposed. However, there has been virtually no effort to compare these architectures experimentally. In this paper we review and categorize many of these architectures and compare how they perform on various classes of simple problems including grammatical inference and nonlinear system identification. 1 Introduction In the past few years several recurrent neural network architectures have emerged. In this paper we categorize various discrete-time recurrent neural network architectures, and perform a quantitative comparison of these architectures on two problems: grammatical inference and nonlinear system identification. 2 RNN Architectures We broadly divide these networks into two groups depending on whether or not the states of the network are guaranteed to be observable. A network with observable Also with UMIACS, University of Maryland, College Park, MD 20742 † Published in Neural Information Pr...
For Neural Networks, Function Determines Form
1992
Cited by 31 (14 self)
This paper shows that the weights of continuous-time feedback neural networks are uniquely identifiable from input/output measurements. Under very weak genericity assumptions, the following is true: Assume given two nets, whose neurons all have the same nonlinear activation function σ; if the two nets have equal behaviors as "black boxes" then necessarily they must have the same number of neurons and, except at most for sign reversals at each node, the same weights. Moreover, even if the activations are not a priori known to coincide, they are shown to be also essentially determined from the external measurements. Key words: Neural networks, identification from input/output data, control systems 1 Introduction Many recent papers have explored the computational and dynamical properties of systems of interconnected "neurons." For instance, Hopfield ([7]), Cowan ([4]), and Grossberg and his school (see e.g. [3]), have all studied devices that can be modelled by sets of nonlinear dif...
Speech Recognition using Neural Networks
1995
Cited by 29 (0 self)
This thesis examines how artificial neural networks can benefit a large-vocabulary, speaker-independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, while they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling, and HMMs perform temporal modeling. We argue that a NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic ...
A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks
Connection Science, 1989
Cited by 27 (17 self)
Most known learning algorithms for dynamic neural networks in nonstationary environments need global computations to perform credit assignment. These algorithms either are not local in time or not local in space. Those algorithms which are local in both time and space usually cannot deal sensibly with 'hidden units'. In contrast, as far as we can judge by now, learning rules in biological systems with many 'hidden units' are local in both space and time. In this paper we propose a parallel online learning algorithm which performs local computations only, yet still is designed to deal with hidden units and with units whose past activations are 'hidden in time'. The approach is inspired by Holland's idea of the bucket brigade for classifier systems, which is transformed to run on a neural network with fixed topology. The result is a feedforward or recurrent 'neural' dissipative system which is consuming 'weight-substance' and permanently trying to distribute this substance onto its co...