Results 1-10 of 24
Gradient-based learning applied to document recognition
Proceedings of the IEEE, 1998
Cited by 731 (58 self)
Abstract:
Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2D) shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTNs), allows such multi-module systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
An Application of Recurrent Nets to Phone Probability Estimation
IEEE Transactions on Neural Networks, 1994
Cited by 193 (8 self)
Abstract:
This paper presents an application of recurrent networks for phone probability estimation in large-vocabulary speech recognition. The need for efficient exploitation of context information is discussed.
Markovian Models for Sequential Data
1996
Cited by 84 (2 self)
Abstract:
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models, and Markov-switching state-space models. Finally, we discuss some of the challenges of future research in this very active area.
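The HMM basics this abstract summarizes center on the forward recursion, which computes the likelihood of an observation sequence by summing over all hidden state paths. A minimal sketch in plain Python; the two-state, two-symbol model and every probability in it are invented for illustration, not taken from the paper:

```python
# Toy 2-state, 2-symbol HMM (all numbers invented for illustration).
A  = [[0.7, 0.3], [0.4, 0.6]]   # A[i][j] = P(state j at t+1 | state i at t)
B  = [[0.9, 0.1], [0.2, 0.8]]   # B[i][k] = P(symbol k | state i)
pi = [0.5, 0.5]                 # initial state distribution

def forward(obs):
    """Likelihood P(obs) of an observation sequence, via the forward recursion."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(2)]      # initialize on first symbol
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[j][o]
                 for j in range(2)]                        # propagate, then re-weight
    return sum(alpha)

print(forward([0, 1, 0]))  # likelihood of the sequence 0, 1, 0
```

Summing `forward` over all symbol sequences of a fixed length returns 1, a quick sanity check that the recursion implements a proper distribution over sequences.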
Connectionist Probability Estimation in HMM Speech Recognition
IEEE Transactions on Speech and Audio Processing, 1992
Cited by 61 (16 self)
Abstract:
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition, and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state-of-the-art HMM system.
Bayesian computation in recurrent neural circuits
Neural Computation, 2004
Cited by 59 (4 self)
Abstract:
A large number of human psychophysical results have been successfully explained in recent years using Bayesian models. However, the neural implementation of such models remains largely unclear. In this paper, we show that a network architecture commonly used to model the cerebral cortex can implement Bayesian inference for an arbitrary hidden Markov model. We illustrate the approach using an orientation discrimination task and a visual motion detection task. In the case of orientation discrimination, we show that the model network can infer the posterior distribution over orientations and correctly estimate stimulus orientation in the presence of significant noise. In the case of motion detection, we show that the resulting model network exhibits direction selectivity and correctly computes the posterior probabilities over motion direction and position. When used to solve the well-known random-dots motion discrimination task, the model generates responses that mimic the activities of evidence-accumulating neurons in cortical areas LIP and FEF. The framework introduced in the paper posits a new interpretation of cortical activities in terms of log posterior probabilities of stimuli occurring in the natural world.
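The inference this abstract attributes to cortical circuits is, at bottom, recursive Bayesian filtering for an HMM, with unit activities read as log posteriors. A hedged sketch of that recursion; the two-state model and the observation stream are toy numbers of our own, not the paper's, and the exponentiation inside `update` is a readability shortcut (a production filter would stay in the log domain with log-sum-exp):

```python
import math

# Toy 2-state HMM filter, posteriors kept as logs (all numbers invented).
A = [[0.8, 0.2], [0.2, 0.8]]    # state transition probabilities
B = [[0.9, 0.1], [0.1, 0.9]]    # P(observation symbol | state)

def update(log_post, obs):
    """One Bayes filtering step: predict through A, weight by likelihood, renormalize."""
    pred = [sum(math.exp(log_post[i]) * A[i][j] for i in range(2)) for j in range(2)]
    unnorm = [pred[j] * B[j][obs] for j in range(2)]
    z = sum(unnorm)
    return [math.log(u / z) for u in unnorm]

log_post = [math.log(0.5), math.log(0.5)]   # flat prior over states
for o in [0, 0, 1]:                         # an invented observation stream
    log_post = update(log_post, o)
print([math.exp(lp) for lp in log_post])    # posterior over the two states
```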
Diffusion of context and credit information in Markovian models
Journal of Artificial Intelligence Research, 1995
Cited by 18 (2 self)
Abstract:
This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes the task of learning to represent long-term context for sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as learning a hidden state representation to represent long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., the transition probability matrices are sparse and the model essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.
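The ergodicity effect described above can be seen numerically: powers of a well-mixed stochastic matrix quickly converge to identical rows (the initial state is forgotten), while a near-deterministic matrix mixes slowly. A small sketch with invented matrices:

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power(M, t):
    """M raised to the t-th power, t >= 1."""
    R = M
    for _ in range(t - 1):
        R = matmul(R, M)
    return R

dense  = [[0.6, 0.4], [0.5, 0.5]]      # well inside the simplex: mixes fast
sparse = [[0.99, 0.01], [0.01, 0.99]]  # near-deterministic: mixes slowly

print(power(dense, 20))   # rows nearly identical: context about the start is gone
print(power(sparse, 20))  # rows still far apart: context survives 20 steps
```

The gap between the rows decays roughly like the second eigenvalue raised to the power t (about 0.1^t versus 0.98^t for these two matrices), which mirrors the diffusion of context the paper analyzes.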
Utterance Verification in Continuous Speech Recognition: Decoding and Training Procedures
IEEE Trans. Speech Audio Process., 2000
Cited by 12 (1 self)
Abstract:
This paper introduces a set of acoustic modeling and decoding techniques for utterance verification (UV) in hidden Markov model (HMM) based continuous speech recognition (CSR). Utterance verification in this work implies the ability to determine when portions of a hypothesized word string correspond to incorrectly decoded vocabulary words or out-of-vocabulary words that may appear in an utterance. This capability is implemented here as a likelihood ratio (LR) based hypothesis testing procedure for verifying individual words in a decoded string. Two UV techniques are presented here. The first is a procedure for estimating the parameters of UV models during training according to an optimization criterion which is directly related to the LR measure used in UV. The second technique is a speech recognition decoding procedure where the "best" decoded path is defined to be that which optimizes a LR criterion. These techniques were evaluated in terms of their ability to improve UV performance on a speech dialog task over the public switched telephone network. The results of an experimental study presented in the paper show that LR based parameter estimation results in a significant improvement in UV performance for this task. The study also found that the use of the LR based decoding procedure, when used in conjunction with models trained using the LR criterion, can provide as much as an 11% improvement in UV performance when compared to existing UV procedures. Finally, it was also found that the performance of the LR decoder was highly dependent on the use of the LR criterion in training acoustic models. Several observations are made in the paper concerning the formation of confidence measures for UV and the interaction of these techniques with statistical language models used in ASR. Index Terms—Acoustic modeling, confidence measures, discriminative training, large vocabulary continuous speech recognition, likelihood ratio, utterance verification.
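The likelihood-ratio test at the heart of the verification procedure can be sketched in a few lines. The scores, the filler-model idea, and the threshold below are invented placeholders, not the paper's trained values:

```python
# Hedged sketch of LR-based word verification: a decoded word is accepted only
# if its model out-scores an alternate ("filler") model by a margin.
# All scores and the threshold are invented for illustration.
def verify(log_lik_target, log_lik_alternate, threshold=2.0):
    """Return (accept?, log likelihood ratio) for one hypothesized word."""
    llr = log_lik_target - log_lik_alternate
    return llr >= threshold, llr

accept, llr = verify(-42.1, -47.3)   # target model clearly wins
print(accept, round(llr, 1))         # True 5.2
print(verify(-50.0, -49.0)[0])       # False: filler wins, word is rejected
```

The paper's contribution is estimating the model parameters and choosing the decoded path so as to optimize this LR measure directly, rather than thresholding scores from models trained under a plain likelihood criterion.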
Generalization And Maximum Likelihood From Small Data Sets
Proc. IEEE-SP Workshop on Neural Networks for Signal Processing, 1993
Cited by 8 (6 self)
Abstract:
An often-encountered learning problem is maximum likelihood training of exponential models. When the state is only partially specified by the training data, iterative training algorithms are used to produce a sequence of models that assign increasing likelihood to the training data. Although the performance as measured on the training set continues to improve as the algorithms progress, performance on related data sets may eventually begin to deteriorate. The cause of this behavior can be seen when the training problem is stated in the Alternating Minimization framework [1]. A modified maximum likelihood training criterion is suggested to counter this behavior. It leads to a simple modification of the learning algorithms which relates generalization to learning speed. Training Boltzmann Machines [2] and Hidden Markov Models [3, 4, 5, 6] is discussed under this modified criterion.
Probability Estimation By FeedForward Networks In Continuous Speech Recognition
In Proceedings IEEE Workshop on Neural Networks for Signal Processing, 1991
Cited by 7 (3 self)
Abstract:
We review the use of feed-forward networks as estimators of probability densities in hidden Markov modelling. In this paper we are mostly concerned with radial basis function (RBF) networks. We note the isomorphism of RBF networks to tied mixture density estimators; additionally we note that RBF networks are trained to estimate posteriors rather than the likelihoods estimated by tied mixture density estimators. We show how the neural network training should be modified to resolve this mismatch. We also discuss problems with discriminative training, particularly the problem of dealing with unlabelled training data and the mismatch between model and data priors. In continuous speech recognition we wish to estimate P(W_1^W | X_1^T, Θ), the posterior probability of a word sequence W_1^W = w_1, ..., w_W given the acoustic evidence X_1^T = x_1, ..., x_T and the parameters of the models used, Θ.
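The posterior/likelihood mismatch raised above is conventionally resolved in hybrid systems via Bayes' rule: dividing network posteriors by the class priors yields likelihoods up to a state-independent factor, which is all HMM decoding needs. A minimal sketch with invented numbers (note that this paper's own remedy modifies the network training instead):

```python
# One frame of the standard hybrid-system conversion: scaled likelihood
# P(x | state) ∝ P(state | x) / P(state). Numbers are invented for illustration.
posteriors = [0.70, 0.20, 0.10]   # network outputs P(state | x), summing to 1
priors     = [0.50, 0.30, 0.20]   # state priors, e.g. from training alignments

scaled_likelihoods = [p / q for p, q in zip(posteriors, priors)]
print(scaled_likelihoods)         # drop-in for P(x | state) in HMM decoding
```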
Hidden Neural Networks
Neural Computation, 1997
Cited by 7 (3 self)
Abstract:
A general framework for hybrids of hidden Markov models (HMMs) and neural networks (NNs), called Hidden Neural Networks (HNNs), is described. The paper begins by reviewing standard HMMs and estimation by conditional maximum likelihood, which is used by the HNN. In the HNN, the usual HMM probability parameters are replaced by the outputs of state-specific neural networks. As opposed to many other hybrids, the HNN is normalized globally and therefore has a valid probabilistic interpretation. All parameters in the HNN are estimated simultaneously according to the discriminative conditional maximum likelihood criterion. An evaluation of the HNN on the task of recognizing broad phoneme classes in the TIMIT database shows clear performance gains compared to standard HMMs tested on the same task.