Results 1 - 10 of 19
Markovian Models for Sequential Data
, 1996
Abstract

Cited by 94 (2 self)
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models and Markov switching state-space models. Finally, we discuss some of the challenges of future research in this very active area.

1 Introduction

Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many applications in artificial intelligence, pattern recognition, speech recognition, and modeling of biological ...
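The HMM basics this survey summarizes rest on the forward recursion for computing the likelihood of an observation sequence. A minimal sketch, assuming a discrete-emission HMM with invented toy parameters (not taken from the paper):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Return P(obs) under the HMM by summing over all hidden state paths.

    pi  -- initial state probabilities
    A   -- state transition matrix, A[i, j] = P(s_j | s_i)
    B   -- emission matrix, B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]           # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # alpha_t = (alpha_{t-1} A) .* b(o_t)
    return alpha.sum()

# Invented two-state, two-symbol toy model
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(forward(pi, A, B, [0, 1, 0]))     # P(observations) ~ 0.1089
```

The same recursion with logs (or per-step scaling) is what practical implementations use to avoid underflow on long sequences.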
Global Optimization of a Neural Network - Hidden Markov Model Hybrid
 IEEE Transactions on Neural Networks
, 1991
Abstract

Cited by 70 (16 self)
In this paper an original method for integrating Artificial Neural Networks (ANNs) with Hidden Markov Models (HMMs) is proposed. ANNs are suitable for performing phonetic classification, whereas HMMs have proven successful at modeling the temporal structure of the speech signal. In the approach described here, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for global optimization of all the parameters. Results of speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported.

1 Introduction

In spite of the fact that speech exhibits features that cannot be represented by a first-order Markov model, Hidden Markov Models (HMMs) of speech units (e.g., phonemes) have been used with a good degree of success in Automatic Speech Recognition (ASR) (Rabiner & Levinson 85; Lee & Hon 89). Artificial Neural Networks (ANNs) have proven to be useful for classifying speech prop...
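As a rough illustration of the hybrid idea (ANN frame scores feeding an HMM decoder), the sketch below converts per-frame class posteriors into scaled log-likelihoods by dividing by class priors (a common hybrid recipe, not this paper's specific global-optimization algorithm) and runs a Viterbi search over them; all numbers are invented:

```python
import numpy as np

def viterbi(log_pi, log_A, frame_loglik):
    """Best state path given per-frame log-likelihood scores (T x N)."""
    T, N = frame_loglik.shape
    delta = log_pi + frame_loglik[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A        # scores[i, j]: state i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + frame_loglik[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Invented ANN posteriors for 4 frames, 2 classes (rows sum to 1)
posteriors = np.array([[0.9, 0.1],
                       [0.8, 0.2],
                       [0.3, 0.7],
                       [0.1, 0.9]])
priors = np.array([0.5, 0.5])
scaled = np.log(posteriors / priors)           # scaled log-likelihoods
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.8, 0.2],
                [0.2, 0.8]])
print(viterbi(log_pi, log_A, scaled))          # [0, 0, 1, 1]
```

The transition penalty smooths the frame-level decisions: the noisy frame at t=2 is absorbed into a single switch from state 0 to state 1.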
Low Entropy Coding with Unsupervised Neural Networks
Abstract

Cited by 23 (0 self)
...ed on visual and speech data. The ability of the network to automatically generate wavelet codes from natural images is demonstrated. These bear a close resemblance to 2D Gabor functions, which have previously been used to describe physiological receptive fields, and as a means of producing compact image representations.

Keywords: neural networks, unsupervised learning, self-organisation, feature extraction, information theory, redundancy reduction, sparse coding, imaging models, occlusion, image coding, speech coding.

Declaration: This dissertation is the result of my own original work, except where reference is made to the work of others. No part of it has been submitted for any other university degree or diploma. Its length, including captions, footnotes, appendix and bibliography, is approximately 58,000 words.

Acknowledgements: I would like first and foremost to thank Richard Prager, my supervisor, fo...
Segment-Based Stochastic Models Of Spectral Dynamics For Continuous Speech Recognition
, 1992
Abstract

Cited by 22 (1 self)
This dissertation addresses the problem of modeling the joint time-spectral structure of speech for recognition. Four areas are covered in this work: segment modeling, estimation, recognition search algorithms, and extension to a more general class of models. A unified view of the acoustic models that are currently used in speech recognition is presented; the research is then focused on segment-based models that provide a better framework for modeling the intra-segmental statistical dependencies than the conventional hidden Markov models (HMMs). The validity of a linearity assumption for modeling the intra-segmental statistical dependencies is first checked, and it is shown that the basic assumption of conditionally independent observations given the underlying state sequence that is inherent to HMMs is inaccurate. Based on these results, linear models are chosen for the distribution of the observations within a segment of speech. Motivated by the original work on the stochastic segment model, a dynamical system segment model is proposed for continuous speech recognition. Training of this model is equivalent to the maximum likelihood identification of a stochastic linear system, and a simple alternative to the traditional approach is developed. This procedure is based on the Expectation-Maximization algorithm and is analogous to the Baum-Welch algorithm for HMMs, since the dynamical system segment model can be thought of as a continuous-state HMM. Recognition involves computing the probability of the innovations given by Kalman filtering. The large computational complexity of segment-based models is dealt with by the introduction of fast recognition search algorithms as alternatives to the typical Dynamic Programming search. A Split-and-Merge segmentation algorithm is...
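The recognition step described here scores a segment by the likelihood of the Kalman-filter innovations. A scalar sketch of that computation, under an invented one-dimensional linear-Gaussian model (the dissertation's models are multivariate):

```python
import math

def kalman_loglik(y, a, c, q, r, x0, p0):
    """Log-likelihood of observations y under the scalar state-space model
    x_t = a*x_{t-1} + w_t (Var w = q), y_t = c*x_t + v_t (Var v = r),
    where x0, p0 are the prior mean and variance of the first state."""
    x, p = x0, p0
    ll = 0.0
    for obs in y:
        s = c * c * p + r                   # innovation variance
        e = obs - c * x                     # innovation (prediction error)
        ll += -0.5 * (math.log(2 * math.pi * s) + e * e / s)
        k = p * c / s                       # Kalman gain
        x = x + k * e                       # measurement update
        p = (1 - k * c) * p
        x = a * x                           # time update (predict next state)
        p = a * a * p + q
    return ll

print(kalman_loglik([0.5, 0.1, -0.3], a=0.9, c=1.0, q=0.1, r=0.2,
                    x0=0.0, p0=1.0))
```

Because the innovations are independent Gaussians, their log-densities sum to the exact sequence log-likelihood, which is what makes this usable as a segment score.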
Discriminative Training of Hidden Markov Models
, 1998
Abstract

Cited by 22 (0 self)
Contents (excerpt): Abbreviations; Notation; 1 Introduction; 2 Hidden Markov Models (2.1 Definition, 2.2 HMM Modelling Assumptions, 2.3 HMM Topology, 2.4 Finding the Best Transcription, 2.5 Setting the Parameters, 2.6 Summary); 3 Objective Functions (3.1 Properties of Maximum Likelihood Estimators, 3.2 Maximum Likelihood, 3.3 Maximum Mutual Information, 3.4 Frame Discrimination, ...)
Learning Out Of Time Series With An Extended Recurrent Neural Network
 In Proceedings of the IEEE Neural Network Workshop for Signal Processing
, 1996
Abstract

Cited by 7 (7 self)
In this paper an extension to a regular recurrent neural network (ERNN) is presented. It allows the ERNN to be trained without the limitation of using input information only up to a preset future frame. It is possible to train the ERNN simultaneously in the positive and negative time directions, leading, in regression and classification experiments, to better results than merging the outputs of separate networks trained in the positive and negative time directions alone. The network structure is designed so that it can be trained with any form of backpropagation through time. The structure and training procedure of the proposed network are explained. Results for classification experiments with an ERNN trained as a classifier and regression experiments with an ERNN trained to minimize the mean squared error on artificial data are reported and compared with previous approaches using merged outputs of regular RNNs. For real data, a classification experiment mapping speech feature vectors to phone classes is reported....
Data Selection and Model Combination in Connectionist Speech Recognition
, 1997
Abstract

Cited by 4 (0 self)
...nts of training data. Boosting is a method which makes selective use of training data, and produces an ensemble with each model trained on data drawn from a different distribution. Results on the optical character recognition task suggest that boosting can provide considerable gains in classification performance. The application of boosting to acoustic modelling has been investigated, and a modified boosting procedure developed. The boosting algorithms have been applied to multilayer perceptron acoustic models, and the performance of the models assessed on a number of ARPA benchmark tasks. The results show that boosting consistently provides a 14-19% reduction in word error rate. The standard boosting techniques are not suitable for use with recurrent network acoustic models, and three new boosting algorithms have been developed for use with connectionist models with internal memory. These new boosting algorithms have also been evaluated on a number of ARPA benchmark tasks, and have been...
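The boosting procedure described (each model trained on data drawn from a different distribution) hinges on the AdaBoost-style reweighting step. A sketch of a single round with invented toy labels and predictions (not the thesis's modified procedure):

```python
import math

def adaboost_round(w, correct):
    """One AdaBoost reweighting step.

    w       -- current example weights (sum to 1)
    correct -- per-example booleans: did the weak learner get it right?
    Returns (alpha, new_weights): the model's vote in the ensemble, and
    the distribution the next model's training data is drawn from.
    """
    eps = sum(wi for wi, ok in zip(w, correct) if not ok)  # weighted error
    alpha = 0.5 * math.log((1 - eps) / eps)                # model weight
    new_w = [wi * math.exp(-alpha if ok else alpha)        # up-weight errors
             for wi, ok in zip(w, correct)]
    z = sum(new_w)                                         # normalizer
    return alpha, [wi / z for wi in new_w]

# Toy round: 5 examples, uniform weights, learner misclassifies example 2
w = [0.2] * 5
correct = [True, True, False, True, True]
alpha, w2 = adaboost_round(w, correct)
print(alpha, w2)   # the misclassified example's weight grows to 0.5
```

A defining property of this update is that the current weak learner's error on the new distribution is exactly 0.5, so the next model is forced to learn something genuinely new.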
The Use Of Recurrent Neural Networks For Classification
 IEEE Workshop on Neural Networks for Signal Processing IV
, 1994
Abstract

Cited by 4 (2 self)
Recurrent neural networks are widely used for context-dependent pattern classification tasks such as speech recognition. The feedback in these networks is generally claimed to contribute to integrating the context of the input feature vector to be classified. This paper analyses the use of recurrent neural networks for such applications. We show that the contribution of the feedback connections is primarily a smoothing mechanism, and that this is achieved by moving the class boundary of an equivalent feedforward network classifier. We also show that when the sigmoidal hidden nodes of the network operate close to saturation, switching from one class to the next is delayed, and within a class the network decisions are insensitive to the order of presentation of the input vectors.

INTRODUCTION

Many classification problems depend on the context in which class data is received, i.e., the history of previous classes. Human perception of speech is a typical example, in which coarticulation eff...
Bi-Directional Recurrent Neural Networks For Speech Recognition
, 1996
Abstract

Cited by 3 (0 self)
While many possible network architectures have been used to estimate conditional probabilities of class membership, recurrent neural networks (RNNs) have been most successful for speech recognition. In the past, optimal results were achieved by merging the outputs of two RNNs trained in each time direction. Merging the outputs of different experts to form one resulting opinion is theoretically difficult; a direct combination of the experts during training would be desirable. This paper presents a bidirectional recurrent neural network (BRNN) structure which can be trained in both time directions simultaneously and hence avoids the difficult merging process.

1. INTRODUCTION

Almost all current large-vocabulary speech recognition systems are based on Hidden Markov Models (HMMs) with parametric observation density distributions. Gaussians are usually chosen as the kernels for the distributions. The parameters of the distributions can easily be estimated with maximum likelihood methods. For optimal resu...
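A minimal numerical sketch of the bidirectional idea (invented dimensions and random weights, not the paper's architecture): two hidden recursions run over the sequence in opposite time directions, and each output frame sees both, so every prediction conditions on the whole input sequence rather than only the past.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, K, T = 3, 4, 2, 5   # input dim, hidden units per direction, classes, frames

# Invented random parameters: forward, backward, and output layers
Wf, Uf = rng.normal(0, 0.5, (H, D)), rng.normal(0, 0.5, (H, H))
Wb, Ub = rng.normal(0, 0.5, (H, D)), rng.normal(0, 0.5, (H, H))
V = rng.normal(0, 0.5, (K, 2 * H))

def brnn(x):
    """Per-frame outputs of a bidirectional RNN over sequence x (T x D)."""
    T = len(x)
    hf = np.zeros((T, H))
    hb = np.zeros((T, H))
    for t in range(T):                      # forward-in-time recursion
        prev = hf[t - 1] if t > 0 else np.zeros(H)
        hf[t] = np.tanh(Wf @ x[t] + Uf @ prev)
    for t in reversed(range(T)):            # backward-in-time recursion
        nxt = hb[t + 1] if t < T - 1 else np.zeros(H)
        hb[t] = np.tanh(Wb @ x[t] + Ub @ nxt)
    return np.concatenate([hf, hb], axis=1) @ V.T

x = rng.normal(size=(T, D))
y = brnn(x)
x2 = x.copy()
x2[-1] += 1.0                  # perturb only the LAST frame
# The FIRST output changes: it conditions on future context
print(np.allclose(brnn(x2)[0], y[0]))
```

Both directions are trained jointly through a single loss on the concatenated hidden states, which is what removes the separate merging step the abstract criticizes.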