Results 1 - 10 of 14
From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
1996
Input/output HMMs for sequence processing
IEEE Transactions on Neural Networks, 1996
Cited by 82 (12 self)
We consider problems of sequence processing and propose a solution based on a discrete state model in order to represent past context. We introduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation we call Input/Output Hidden Markov Model (IOHMM). It can be trained by the EM or GEM algorithms, considering state trajectories as missing data, which decouples temporal credit assignment and actual parameter estimation. The model presents similarities to hidden Markov models (HMMs), but allows us to map input sequences to output sequences, using the same processing style as recurrent neural networks. IOHMMs are trained using a more discriminant learning paradigm than HMMs, while potentially taking advantage of the EM algorithm. We demonstrate that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem. Experimental results are presented for the seven Tomita grammars, showing that these adaptive models can attain excellent generalization.
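To make the IOHMM idea concrete, the forward recursion below computes P(y_1..y_T | u_1..u_T) for a toy model in which each state's transition and output distributions are softmax functions of the current input. It is a minimal sketch under stated assumptions, not the authors' architecture: the single-layer linear-softmax subnetworks, the discrete output alphabet, and the names `iohmm_likelihood`, `W_trans`, and `W_out` are hypothetical, and no scaling is applied, so it only suits short sequences.

```python
import math
import random
from itertools import product


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(W, u):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]


def iohmm_likelihood(pi, W_trans, W_out, inputs, outputs):
    """Return P(y_1..y_T | u_1..u_T) for a toy discrete-output IOHMM.

    pi      -- initial state distribution, length n
    W_trans -- W_trans[j] is an n x d matrix: hypothetical linear-softmax
               transition subnetwork attached to state j
    W_out   -- W_out[i] is a k x d matrix: hypothetical output subnetwork of state i
    """
    n = len(pi)
    alpha = list(pi)  # alpha_0: state distribution before the first input
    for u, y in zip(inputs, outputs):
        # A[j][i] = P(x_t = i | x_{t-1} = j, u_t): transitions depend on the input
        A = [softmax(matvec(W_trans[j], u)) for j in range(n)]
        new_alpha = []
        for i in range(n):
            predicted = sum(alpha[j] * A[j][i] for j in range(n))
            emitted = softmax(matvec(W_out[i], u))[y]  # P(y_t | x_t = i, u_t)
            new_alpha.append(predicted * emitted)
        alpha = new_alpha
    return sum(alpha)


# Usage: for fixed inputs, the probabilities of all output sequences sum to 1.
rng = random.Random(0)
n, d, k, T = 3, 2, 2, 3  # states, input dim, output symbols, sequence length
pi = [0.5, 0.3, 0.2]
W_trans = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)] for _ in range(n)]
W_out = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)] for _ in range(n)]
inputs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(T)]
total = sum(iohmm_likelihood(pi, W_trans, W_out, inputs, ys)
            for ys in product(range(k), repeat=T))
print(round(total, 6))  # prints 1.0
```

Because the transition matrix is recomputed from u_t at every step, the same forward pass an HMM uses now yields a conditional distribution over output sequences given input sequences, which is what separates an IOHMM from an ordinary HMM.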
Global Optimization of a Neural Network - Hidden Markov Model Hybrid
IEEE Transactions on Neural Networks, 1991
Cited by 63 (16 self)
In this paper an original method for integrating Artificial Neural Networks (ANN) with Hidden Markov Models (HMM) is proposed. ANNs are well suited to phonetic classification, whereas HMMs have proven successful at modeling the temporal structure of the speech signal. In the approach described here, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for global optimization of all the parameters. Results of speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported.
1 Introduction
Although speech exhibits features that cannot be represented by a first-order Markov model, Hidden Markov Models (HMMs) of speech units (e.g., phonemes) have been used with a good degree of success in Automatic Speech Recognition (ASR) (Rabiner & Levinson 85; Lee & Hon 89). Artificial Neural Networks (ANNs) have proven to be useful for classifying speech prop...
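The step that makes joint optimization of the ANN and the HMM possible is that the HMM likelihood can be differentiated with respect to each ANN output vector y_t, and the result backpropagated into the network. For emission densities b_i, the chain rule gives ∂log L/∂y_t = Σ_i γ_t(i) ∂log b_i(y_t)/∂y_t, where γ_t(i) are the forward-backward state posteriors. The sketch below assumes diagonal-Gaussian emissions and an unscaled forward-backward pass (adequate for short toy sequences); it illustrates only this gradient, not the paper's complete training algorithm, and all function names are hypothetical.

```python
import math


def gauss_logpdf(y, mu, var):
    """Log density of a diagonal Gaussian evaluated at vector y."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (a - m) ** 2 / v)
               for a, m, v in zip(y, mu, var))


def forward_backward(pi, A, mus, variances, obs):
    """Unscaled forward-backward; returns (log-likelihood, posteriors gamma[t][i])."""
    n, T = len(pi), len(obs)
    b = [[math.exp(gauss_logpdf(o, mus[i], variances[i])) for i in range(n)] for o in obs]
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * b[0][i]
    for t in range(1, T):
        for i in range(n):
            alpha[t][i] = b[t][i] * sum(alpha[t - 1][j] * A[j][i] for j in range(n))
    for t in range(T - 2, -1, -1):
        for j in range(n):
            beta[t][j] = sum(A[j][i] * b[t + 1][i] * beta[t + 1][i] for i in range(n))
    lik = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(n)] for t in range(T)]
    return math.log(lik), gamma


def grad_loglik_wrt_obs(pi, A, mus, variances, obs):
    """d log L / d y_t = sum_i gamma_t(i) * (mu_i - y_t) / var_i for Gaussian emissions.

    In a hybrid, obs[t] would be the ANN output vector at frame t, and this
    gradient would be passed back through the network by backpropagation.
    """
    _, gamma = forward_backward(pi, A, mus, variances, obs)
    n = len(pi)
    return [[sum(gamma[t][i] * (mus[i][d] - o[d]) / variances[i][d] for i in range(n))
             for d in range(len(o))]
            for t, o in enumerate(obs)]
```

A finite-difference check on the log-likelihood returned by `forward_backward` is a convenient way to confirm such a gradient before wiring it into network training.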
Connectionist Probability Estimation in HMM Speech Recognition
IEEE Transactions on Speech and Audio Processing, 1992
Cited by 45 (9 self)
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition and point out the possible benefits of incorporating connectionist networks. We discuss some issues involved in constructing a connectionist HMM recognition system and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state-of-the-art HMM system.
Part I: Introduction
Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
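In practice, the probability-estimator view is commonly applied by dividing the network's estimated state posteriors P(q | x_t) by the state priors P(q). By Bayes' rule this equals p(x_t | q) / p(x_t), a scaled likelihood that can stand in for the HMM emission probability because p(x_t) is the same for every state at a given frame. The sketch assumes priors are estimated as relative frequencies of state labels in aligned training frames; `scaled_log_likelihoods`, `estimate_priors`, and the `floor` value are hypothetical.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Turn network outputs P(q | x_t) into log[P(q | x_t) / P(q)].

    By Bayes' rule P(q | x) / P(q) = p(x | q) / p(x); since p(x) is shared by all
    states at a frame, these values can replace log p(x | q) during decoding.
    `floor` guards against log(0) in the posteriors and is an arbitrary choice.
    Assumes every prior is strictly positive.
    """
    return [math.log(max(p, floor)) - math.log(pr) for p, pr in zip(posteriors, priors)]


def estimate_priors(frame_labels, num_states):
    """Relative frequency of each state label in (hypothetical) aligned training frames."""
    counts = [0] * num_states
    for q in frame_labels:
        counts[q] += 1
    total = len(frame_labels)
    return [c / total for c in counts]
```

Dividing by the priors removes the class-frequency bias the network learns from its training data, so that frequent states are not favored twice once the HMM's own transition probabilities are applied.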
Phoneme Probability Estimation with Dynamic Sparsely Connected Artificial Neural Networks
1997
Cited by 23 (1 self)
This paper presents new methods for training large neural networks for phoneme probability estimation. An architecture combining time-delay windows and recurrent connections is used to capture the important dynamic information of the speech signal. Because the number of connections in a fully connected recurrent network grows super-linearly with the number of hidden units, schemes for sparse connection and connection pruning are explored. It is found that sparsely connected networks outperform their fully connected counterparts with an equal number of connections. The implementation of the combined architecture and training scheme is described in detail. The networks are evaluated in a hybrid HMM/ANN system for phoneme recognition on the TIMIT database, and for word recognition on the WAXHOLM database. The achieved phone error rate of 27.8% for the standard 39-phoneme set on the core test set of the TIMIT database is within the range of the lowest error rates reported. All training and simulation softwar...
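The growth argument can be made concrete: a fully connected recurrent layer of N hidden units has N² hidden-to-hidden weights, whereas giving each unit a fixed number of recurrent inputs keeps the count at N × fan-in, which grows only linearly. The sketch below contrasts the two and adds a simple magnitude-based pruning step. The fixed-fan-in scheme and the magnitude criterion are assumptions for illustration, not necessarily the schemes explored in the paper, and the function names are hypothetical.

```python
import random


def full_recurrent_connections(num_hidden):
    """Hidden-to-hidden weights in a fully connected recurrent layer: N * N."""
    return num_hidden * num_hidden


def sparse_recurrent_mask(num_hidden, fan_in, seed=0):
    """Give each hidden unit `fan_in` distinct, randomly chosen recurrent inputs.

    Hypothetical fixed-fan-in scheme: the total is num_hidden * fan_in weights,
    i.e. linear in the number of hidden units.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_hidden), fan_in)) for _ in range(num_hidden)]


def magnitude_prune(weights, keep_fraction):
    """Keep only the largest-magnitude weights (an assumed pruning criterion).

    `weights` maps (src, dst) -> value; returns a new dict with the survivors.
    """
    k = max(1, int(len(weights) * keep_fraction))
    survivors = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
    return dict(survivors)
```

With 100 hidden units, the full layer has 10,000 recurrent weights, while a fan-in of 10 gives 1,000; comparing the two at an equal connection budget is the kind of comparison the abstract reports.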
Speech Recognition using Neural Networks
1995
Cited by 21 (0 self)
This thesis examines how artificial neural networks can benefit a large-vocabulary, speaker-independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, while also being able to learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling and HMMs perform temporal modeling. We argue that an NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic ...
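The division of labor in an NN-HMM hybrid shows up directly in decoding: the network supplies a per-frame acoustic score for each HMM state (for example a log scaled likelihood), and the HMM's transition structure supplies the temporal constraints that Viterbi search combines with those scores. The minimal log-domain sketch below assumes the per-frame scores are already computed; the name `viterbi` and the toy two-state left-to-right model mentioned afterwards are illustrative assumptions.

```python
import math

NEG_INF = -math.inf  # log(0): marks a disallowed start state or transition


def viterbi(log_init, log_trans, log_emit):
    """Most likely state sequence and its log score.

    log_init[i]     -- log P(x_1 = i)
    log_trans[j][i] -- log P(x_t = i | x_{t-1} = j)   (temporal model: the HMM)
    log_emit[t][i]  -- per-frame acoustic score for state i, e.g. a network's
                       log scaled likelihood            (acoustic model: the NN)
    """
    n, T = len(log_init), len(log_emit)
    delta = [log_init[i] + log_emit[0][i] for i in range(n)]
    backptr = []
    for t in range(1, T):
        new_delta, ptrs = [], []
        for i in range(n):
            best_j = max(range(n), key=lambda j: delta[j] + log_trans[j][i])
            new_delta.append(delta[best_j] + log_trans[best_j][i] + log_emit[t][i])
            ptrs.append(best_j)
        delta = new_delta
        backptr.append(ptrs)
    # Trace back from the best-scoring final state
    state = max(range(n), key=lambda i: delta[i])
    best_score = delta[state]
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    return list(reversed(path)), best_score
```

For a two-state left-to-right model (start in state 0, state 0 stays or advances with probability 0.5, state 1 only loops) and frame scores favoring state 0 for two frames and state 1 for the third, the decoded path is [0, 0, 1].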
Automatic Continuous Speech Recognition with Rapid Speaker Adaption for Human/Machine Interaction
1997
Cited by 7 (0 self)
This thesis presents work in three main directions of the automatic speech recognition field. The work in two of these (dynamic decoding and hybrid HMM/ANN speech recognition) has resulted in a real-time speech recognition system, currently in use in the human/machine dialogue demonstration system WAXHOLM, developed at the department. The third direction is fast unsupervised speaker adaptation, where "fast" refers to adaptation with a small amount of adaptation speech. The work in
Continuous Speech Recognition in the WAXHOLM Dialogue System
1996
Cited by 7 (0 self)
This paper presents the status of the continuous speech recognition engine of the WAXHOLM project. The engine is a software-only system written in portable C code. The design is flexible, and different modes for phonetic pattern matching are available. In particular, artificial neural networks and standard multiple Gaussian mixtures are implemented for phone probability estimation, and for research purposes, a general mode where the input consists of a phone graph also exists. A lexicon with multiple pronunciations for many words and a class bigram grammar are used. The lexicon and grammar constraints are represented by a lexical graph, optimised for efficient lexical decoding. The decoding is performed in a two-pass search. The first pass is a Viterbi beam search and the second is an A* stack-decoding search. Pruning strategies and memory management in the two passes are discussed in the report. Several different output formats are available. Results can be reported at either the word or the phoneme level, with or without time-alignment information. Multiple hypotheses can be output either as standard N-best lists or in a more compact word-graph format. Continuous speech recognition can be performed on a standard UNIX workstation in real time with a lexicon of about 1000 words.
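The first pass described above is a Viterbi beam search. A minimal token-passing sketch of the beam idea, assuming a flat state space rather than the WAXHOLM lexical graph, keeps at each frame only those hypotheses whose log score lies within `beam` of the best, and expands only the survivors at the next frame. The function name, the `beam` parameter, and the data layout are hypothetical and do not describe the engine's actual C implementation.

```python
import math

NEG_INF = -math.inf  # log(0): marks a disallowed start state or transition


def viterbi_beam(log_init, log_trans, log_emit, beam):
    """Time-synchronous Viterbi search with beam pruning.

    Returns (best state sequence, its log score) among hypotheses that survived.
    """
    n = len(log_init)
    # Active hypotheses: state -> (log score, state sequence so far)
    active = {i: (log_init[i] + log_emit[0][i], [i])
              for i in range(n) if log_init[i] > NEG_INF}
    for t in range(len(log_emit)):
        if t > 0:
            expanded = {}
            for j, (score, path) in active.items():  # only survivors are expanded
                for i in range(n):
                    if log_trans[j][i] == NEG_INF:
                        continue
                    cand = score + log_trans[j][i] + log_emit[t][i]
                    if i not in expanded or cand > expanded[i][0]:
                        expanded[i] = (cand, path + [i])  # Viterbi: keep the best entry
            active = expanded
        # Beam pruning: drop hypotheses scoring worse than (best - beam)
        best = max(score for score, _ in active.values())
        active = {i: h for i, h in active.items() if h[0] >= best - beam}
    score, path = max(active.values(), key=lambda h: h[0])
    return path, score
```

A wide beam reproduces exact Viterbi decoding; narrowing it trades accuracy for speed, which is the trade-off the pruning strategies in a real-time recognizer have to manage.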
A Hybrid Stochastic Connectionist Approach to Automatic Speech Recognition
Cited by 1 (0 self)
This report focuses on a hybrid approach, combining stochastic and connectionist methods, for continuous speech recognition. Hidden Markov Models (HMMs) are a popular stochastic approach for continuous speech, well suited to coping with the high variability found in natural utterances. On the other hand, artificial neural networks (NNs) have shown high classification power for short speech utterances. We have therefore built a hybrid system that combines the advantages of both Hidden Markov Models and Neural Networks. The basic idea is as follows: build a codebook from the output units of Time-Delay Neural Networks (TDNN) and train HMMs using the Fuzzy-VQ algorithm. We trained several discrete HMMs for the task of recognizing Japanese phonemes using just one TDNN-generated codebook. We achieved a recognition rate of 96.1%, thereby increasing the recognition rate of the discrete HMMs by 7.1%. The results clearly show the possible collaboration of two different systems aime...
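In a fuzzy-VQ discrete HMM, each observation vector (here, a TDNN output vector) is not mapped to a single nearest codeword but receives a graded membership in every codeword, and a state's observation probability becomes the membership-weighted sum of its codeword probabilities. The sketch below uses the fuzzy c-means membership formula with fuzziness exponent m = 2; this formulation and the names `fuzzy_memberships` and `fuzzy_emission` are assumptions, since the abstract does not specify the report's exact variant.

```python
import math


def fuzzy_memberships(x, codebook, m=2.0):
    """Fuzzy c-means style membership of vector x in each codeword (sums to 1).

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), with Euclidean distances d.
    m > 1 controls fuzziness; m = 2.0 is an assumed default.
    """
    d = [math.dist(x, c) for c in codebook]
    exact = [k for k, dk in enumerate(d) if dk == 0.0]
    if exact:  # x coincides with a codeword: give that codeword full membership
        return [1.0 if k == exact[0] else 0.0 for k in range(len(codebook))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** p for dj in d) for dk in d]


def fuzzy_emission(memberships, b_state):
    """Observation probability of one discrete-HMM state under fuzzy VQ.

    Computes sum_k u_k * b(k), where b_state[k] is the state's probability of codeword k.
    """
    return sum(u * b for u, b in zip(memberships, b_state))
```

With codewords at (0, 0) and (2, 0), a vector at (0.5, 0) gets memberships 0.9 and 0.1, so a state with codeword probabilities 0.7 and 0.3 scores 0.66 instead of the 0.7 a hard quantizer would assign.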
A Comparative Study On Hybrid Acoustic Phonetic Decoders Based On Artificial Neural Networks
1991
In this paper we compare two hybrid acoustic-phonetic decoders based on Artificial Neural Networks (ANN). We evaluate them on the task of recognizing stop phones in continuous speech independently of the speaker. ANNs are well suited to performing detailed phonetic distinctions. In general, techniques based on Dynamic Programming (DP), in particular Hidden Markov Models (HMMs), have proven successful at modeling the temporal structure of the speech signal. In the approach described here, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for global optimization of all the parameters of the ANN/HMM decoder. Comparative experiments using this ANN/HMM hybrid decoder and another ANN-DP hybrid are reported for the TIMIT database.
1 Introduction
Artificial Neural Networks (ANNs) effectively perform phonetic classification, but have not yet been shown to model the temporal structure of the speech signal reasonably well [Lip89, Rob90, Ben90b...

