Results 1 - 10
of
42
From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
, 1996
"... ..."
An Application of Recurrent Nets to Phone Probability Estimation
- IEEE Transactions on Neural Networks
, 1994
"... This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed ..."
Abstract
-
Cited by 165 (8 self)
- Add to MetaCart
This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed
Markovian Models for Sequential Data
, 1996
"... Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We firs ..."
Abstract
-
Cited by 69 (2 self)
- Add to MetaCart
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models and Markov switching state-space models. Finally, we discuss some of the challenges of future research in this very active area. 1 Introduction Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many applications in artificial intelligence, pattern recognition, speech recognition, and modeling of biological ...
Global Optimization of a Neural Network - Hidden Markov Model Hybrid
- IEEE Transactions on Neural Networks
, 1991
"... In this paper an original method for integrating Artificial Neural Networks (ANN) with Hidden Markov Models (HMM) is proposed. ANNs are suitable to perform phonetic classification, whereas HMMs have been proven successful at modeling the temporal structure of the speech signal. In the approach descr ..."
Abstract
-
Cited by 63 (16 self)
- Add to MetaCart
In this paper an original method for integrating Artificial Neural Networks (ANN) with Hidden Markov Models (HMM) is proposed. ANNs are suitable to perform phonetic classification, whereas HMMs have been proven successful at modeling the temporal structure of the speech signal. In the approach described here, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for global optimization of all the parameters. Results on speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported. 1 Introduction In spite of the fact that speech exhibits features that cannot be represented by a first-order Markov model, Hidden Markov Models (HMMs) of speech units (e.g., phonemes) have been used with a good degree of success in Automatic Speech Recognition (ASR) (Rabiner & Levinson 85; Lee & Hon 89). Artificial Neural Networks (ANNs) have proven to be useful for classifying speech prop...
Lexical Modeling in a Speaker Independent Speech Understanding System
, 1993
"... Over the past 40 years, significant progress has been made in the fields of speech recognition and speech understanding. Current state-of-the-art speech recognition systems are capable of achieving word-level accuracies of 90 % to 95 % on continuous speech recognition tasks using 5000 words. Even la ..."
Abstract
-
Cited by 39 (8 self)
- Add to MetaCart
Over the past 40 years, significant progress has been made in the fields of speech recognition and speech understanding. Current state-of-the-art speech recognition systems are capable of achieving word-level accuracies of 90 % to 95 % on continuous speech recognition tasks using 5000 words. Even larger systems, capable of recognizing 20,000 words are just now being developed. Speech understanding systems have recently been developed that perform fairly well within a restricted domain. While the size and performance of modern speech recognition and understanding systems are impressive, it is evident to anyone who has used these systems that the technology is primitive compared to our own human ability to understand speech. Some of the difficulties hampering progress in the fields of speech recognition and understanding stem from the many sources of variation that occur during human communication. One of the sources of variation that occurs in human communication is the different ways that words can be pronounced. There are many causes of pronunciation variation, such as: the phonetic environment in which the word occurs, the dialect of the speaker,
Statistical Trajectory Models for Phonetic Recognition
, 1994
"... The main goal of this work is to develop an alternative methodology for acoustic-- phonetic modelling of speech sounds. The approach utilizes a segment--based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Te ..."
Abstract
-
Cited by 27 (3 self)
- Add to MetaCart
The main goal of this work is to develop an alternative methodology for acoustic-- phonetic modelling of speech sounds. The approach utilizes a segment--based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Temporal behavior is modelled explicitly by creating dynamic tracks of the acoustic attributes used to represent the waveform, and by estimating the spatio--temporal correlation structure of the resulting errors. The tracks serve as templates from which synthetic segments of the acoustic attributes are generated. Scoring of an hypothesized phonetic segment is then based on the error between the measured acoustic attributes and the synthetic segments generated for each phonetic model.
Experimental Determination of Precision Requirements for Back-Propagation Training of Artificial Neural Networks
- In Proceedings 2nd International Conference on Microelectronics for Neural Networks
, 1991
"... The impact of reduced weight and output precision on the back-propagation training algorithm [Wer74, RHW86] is experimentally determined for a feed-forward multilayer perceptron. In contrast with previous such studies, the network is large with over 20,000 weights, and is trained with a large, re ..."
Abstract
-
Cited by 25 (7 self)
- Add to MetaCart
The impact of reduced weight and output precision on the back-propagation training algorithm [Wer74, RHW86] is experimentally determined for a feed-forward multilayer perceptron. In contrast with previous such studies, the network is large with over 20,000 weights, and is trained with a large, real-world data set of over 130,000 patterns to perform a difficult task, that of phoneme classification for a continuous speech recognition system. The results indicate that 16b weight values are sufficient to achieve training and classification results comparable to 32b floating point, provided that weight and bias values are scaled separately, and that rounding rather than truncation is employed to reduce the precision of intermediary values. Output precision can be reduced to 8 bits without significant effects on performance. 1 Introduction Research into artificial neural networks (ANNs) is hindered by their extensive computational demands. However, these algorithms exhibit massiv...
Speech Recognition using Neural Networks
, 1995
"... This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modelin ..."
Abstract
-
Cited by 21 (0 self)
- Add to MetaCart
This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, while they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling, and HMMs perform temporal modeling. We argue that a NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic ...
Hybrid Neural Network/hidden Markov Model Continuous-Speech Recognition
"... n M In this paper we present a hybrid multilayer perceptron (MLP)/hidde arkov model (HMM) speaker-independent continuous-speech recognib tion system, in which the advantages of both approaches are combined y using MLPs to estimate the state-dependent observation probabilities p of an HMM. New MLP a ..."
Abstract
-
Cited by 17 (7 self)
- Add to MetaCart
n M In this paper we present a hybrid multilayer perceptron (MLP)/hidde arkov model (HMM) speaker-independent continuous-speech recognib tion system, in which the advantages of both approaches are combined y using MLPs to estimate the state-dependent observation probabilities p of an HMM. New MLP architectures and training procedures are resented which allow the modeling of multiple distributions for phonetic a p classes and context-dependent phonetic classes. Comparisons with ure HMM system illustrate advantages of the hybrid approach both in terms of recognition accuracy and number of parameters required. 1. INTRODUCTION - o Hidden Markov models (HMMs) are used in most current state f-the-art continuous-speech recognition systems. This approach is u limited by the need for strong statistical assumptions that are nlikely to be valid for speech. Techniques using multilayer peri ceptrons (MLPs) for probability estimation have recently been ntroduced [1] which reduce the assumption o...
Discriminative Training of Hidden Markov Models
, 1998
"... vi Abbreviations vii Notation viii 1 Introduction 1 2 Hidden Markov Models 4 2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 HMM Modelling Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 HMM Topology . . . . . . . . . ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
vi Abbreviations vii Notation viii 1 Introduction 1 2 Hidden Markov Models 4 2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 HMM Modelling Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 HMM Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.4 Finding the Best Transcription . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.5 Setting the Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3 Objective Functions 19 3.1 Properties of Maximum Likelihood Estimators . . . . . . . . . . . . . . . . . . . 19 3.2 Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.3 Maximum Mutual Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.4 Frame Discrimination . . . . . . . . . . . . . . . . ....

