Shared-Distribution Hidden Markov Models for Speech Recognition
, 1991
Cited by 227 (5 self)
Parameter sharing plays an important role in statistical modeling since training data are usually limited. On the one hand, we would like to use models that are as detailed as possible. On the other hand, with models too detailed, we can no longer reliably estimate the parameters. Triphone generalization may force two models to be merged together when only parts of the model output distributions are similar, while the rest of the output distributions are different. This problem can be avoided if clustering is carried out at the distribution level. In this paper, a shared-distribution model is proposed to replace generalized triphone models for speaker-independent continuous speech recognition. Here, output distributions in the hidden Markov model are shared with each other if they exhibit acoustic similarity. In addition to detailed representation, it also gives us the freedom to use a large number of states for each phonetic model. Although an increase in the number of states will inc...
Bayesian Learning for Hidden Markov Model with Gaussian Mixture State Observation Densities
Cited by 32 (16 self)
An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a framework of continuous density hidden Markov model (CDHMM), Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering and corrective training. The goal is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and results applying it to parameter smoothing, speaker adaptation, speaker clustering, and corrective training are given.
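As a rough illustration of how a prior density can smooth a parameter estimate, the sketch below computes the MAP estimate of a single scalar Gaussian mean under a conjugate Gaussian prior with known variance. This is a simplified textbook case, not the paper's procedure for multivariate Gaussian mixtures in a CDHMM; the function name and the `prior_weight` parameter are assumptions introduced here.

```python
def map_gaussian_mean(samples, prior_mean, prior_weight):
    """MAP estimate of a scalar Gaussian mean with a conjugate Gaussian prior.

    Assumption: the variance is known and the prior on the mean is
    N(prior_mean, variance / prior_weight), so prior_weight behaves like
    a count of pseudo-observations located at prior_mean.
    """
    n = len(samples)
    # Interpolate between the prior mean and the data according to their weights.
    return (prior_weight * prior_mean + sum(samples)) / (prior_weight + n)


# With little adaptation data the estimate stays close to the prior mean;
# as more samples arrive it moves toward the sample mean.
few = map_gaussian_mean([2.0, 4.0], prior_mean=0.0, prior_weight=2.0)
many = map_gaussian_mean([3.0] * 100, prior_mean=0.0, prior_weight=2.0)
```

With `prior_weight` set to 0 the estimate reduces to the ordinary sample mean, which is one way to see the prior as a smoothing term.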
Identifying Non-Linguistic Speech Features
- Proc Eurospeech
Cited by 24 (13 self)
Over the last decade technological advances have been made which enable us to envision real-world applications of speech technologies. It is possible to foresee applications, for example, information centers in public places such as train stations and airports, where the spoken query is to be recognized without even prior knowledge of the language being spoken. Other applications may require accurate identification of the speaker for security reasons, including control of access to confidential information or for telephone-based transactions.
Bayesian Learning of Gaussian Mixture Densities for Hidden Markov Models
- Proc. DARPA Speech and Natural Language Workshop
, 1991
Cited by 22 (8 self)
An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a continuous density hidden Markov model (CDHMM) framework, Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering, and corrective training. The goal of this study is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the CDHMM training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and preliminary results applying to HMM parameter smoothing, speaker adaptation, and speaker clustering are given. Performance improvements were observed on tests using the DARPA RM task. For speaker adaptation, under a supervised learning mode with 2 minutes of speaker-specific training data, a 31% reduction in word error r...
A Phone-based Approach to Non-Linguistic Speech Feature Identification
- Computer Speech and Language
, 1995
Cited by 14 (9 self)
In this paper we present a general approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. The basic idea is to process the unknown speech signal by feature-specific phone model sets in parallel, and to hypothesize the feature value associated with the model set having the highest likelihood. This technique is shown to be effective for text-independent gender, speaker, and language identification. Text-independent speaker identification accuracies of 98.8% on TIMIT (168 speakers) and 99.2% on BREF (65 speakers) were obtained with one utterance per speaker, and 100% with 2 utterances for both corpora. Experiments in which speaker-specific models were estimated without using the phonetic transcriptions for the TIMIT speakers had the same identification accuracies as those obtained with the use of the transcriptions. French/English language identification is better than 99% with 2s of read, laboratory speech. On spontaneous teleph...
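The decision rule described in this abstract (score the signal against each feature-specific model set, then pick the best-scoring one) can be sketched as follows. This is a minimal sketch under stated assumptions: each "model set" is replaced by a single scalar Gaussian, whereas the paper uses phone-based acoustic likelihoods; the names `log_likelihood`, `identify`, and `model_sets` are hypothetical.

```python
import math


def log_likelihood(frames, mean, var):
    """Total log N(x; mean, var) over scalar frames.

    Assumption: a single Gaussian stands in for a full phone model set.
    """
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in frames
    )


def identify(frames, model_sets):
    """Score the frames against every model set and return the best label."""
    scores = {
        label: log_likelihood(frames, mean, var)
        for label, (mean, var) in model_sets.items()
    }
    # Hypothesize the feature value whose model set has the highest likelihood.
    best = max(scores, key=scores.get)
    return best, scores


# Toy model sets: label -> (mean, variance). The values are illustrative only.
model_sets = {"A": (0.0, 1.0), "B": (5.0, 1.0)}
label, scores = identify([4.8, 5.1, 5.3], model_sets)
```

The same argmax applies whether the labels are genders, speakers, or languages; only the model sets change.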
Speech Recognition in SRI's Resource Management and ATIS Systems
Cited by 7 (0 self)
This paper describes improvements to DECIPHER, the speech recognition component in SRI's Air Travel Information Systems (ATIS) and Resource Management systems. DECIPHER is a speaker-independent continuous speech recognition system based on hidden Markov model (HMM) technology. We show significant performance improvements in DECIPHER due to (1) the addition of tied-mixture HMM modeling, (2) rejection of out-of-vocabulary speech and background noise while continuing to recognize speech, (3) adapting to the current speaker, and (4) the implementation of N-gram statistical grammars with DECIPHER. Finally, we describe our performance in the February 1991 DARPA Resource Management evaluation (4.8 percent word error) and in the February 1991 DARPA-ATIS speech and SLS evaluations (95 sentences correct, 15 wrong of 140). We show that, for the ATIS evaluation, a well-conceived system integration can be relatively robust to speech recognition errors and to linguistic variability and errors.
Analysis of LPC/DFT Features for an HMM-based Alphadigit Recognizer
- Signal Processing Letters
, 1995
Cited by 4 (2 self)
The search for better and more robust performance of speech recognition systems is ongoing. Much of the improvement is likely to come from better acoustic feature analysis. In this letter, the results from a significant experiment are reported; these show how a warped-DFT analysis outperforms an LPC-cepstral analysis in a significant way, supporting results by other researchers for different recognition tasks. An analysis of nasal-letter performance is used to show the development of the warped-DFT feature analysis.
Keywords: Cepstral Features, ANN, HMM.
I. Introduction
Different types of hidden Markov model (HMM)-based algorithms have been used successfully in speech recognition systems along with artificial neural networks (ANN), dynamic time warping (DTW) and template matching (TM) algorithms. In all these systems, the properties of the feature set play a very crucial role. In this letter, an HMM-based explicit-duration, talker-independent, connected-alphadigit recognizer is use...
Factorization Of Language Constraints In Speech Recognition
, 1991
Cited by 2 (1 self)
Integration of language constraints into a large vocabulary speech recognition system often leads to prohibitive complexity. We propose to factor the constraints into two components. The first is characterized by a covering grammar which is small and easily integrated into existing speech recognizers. The recognized string is then decoded by means of an efficient language post-processor in which the full set of constraints is imposed to correct possible errors introduced by the speech recognizer.
Reducing errors by increasing the error rate: MLP Acoustic Modeling for Broadcast News Transcription
- DARPA Broadcast News Workshop
, 1999
We describe some aspects of a Broadcast News recognition system based on hybrid HMM/MLP acoustic modeling. These include the use of novel 'modulation spectrogram' features which are combined with conventional models at the posterior probability level, some experiments with nonlinear segment normalization, and an investigation of the interaction of model size and training set size for a multilayer perceptron (MLP) acoustic classifier. We also report preliminary results of incorporating gender-dependence into this system.
1. Background
In recent years, we and our colleagues have promoted the exploration of novel, poorly understood, but promising approaches to speech recognition [2]. While such deviations from incremental improvements might initially hurt performance, the subset of the new methods that would ultimately prove useful would not be found without such explorations. This past year, we attempted to follow this advice, while still developing a system with reasonable performanc...

