Speaker Adaptation Using Combined Transformation and Bayesian Methods
, 1994
Cited by 38 (4 self)
Adapting the parameters of a statistical speaker-independent continuous-speech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixture-density hidden Markov models the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we have recently proposed a constrained estimation technique for Gaussian mixture densities. To improve the behavior of our adaptation scheme for large amounts of adaptation data, we combine it here with Bayesian techniques. We evaluate our algorithms on the large-vocabulary Wall Street Journal corpus for nonnative speakers of American English. The recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers.
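As a rough illustration of how a transformation step and a Bayesian step can be combined, the sketch below uses a transformed speaker-independent mean as the prior of a standard MAP mean update. This is a minimal sketch under stated assumptions, not the method of the paper: the diagonal affine transform, the function names `transform_mean` and `map_mean`, and the prior weight `tau` are all hypothetical choices made for the example.

```python
def transform_mean(mean, scale, offset):
    """Apply a diagonal affine transform (scale * m + offset) to a mean vector.

    Assumption: a diagonal transform stands in for the constrained
    transformation mentioned in the abstract.
    """
    return [a * m + b for a, m, b in zip(scale, mean, offset)]


def map_mean(prior_mean, frames, posteriors, tau):
    """MAP re-estimate of a Gaussian mean.

    prior_mean : prior mean vector (here, the transformed mean)
    frames     : list of adaptation feature vectors
    posteriors : occupation probability of this Gaussian for each frame
    tau        : hypothetical prior weight; larger values trust the prior more
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    weighted_sum = [
        sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)
    ]
    # With little data the estimate stays near the prior; with a lot of
    # data it moves toward the occupancy-weighted average of the frames.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Usage: transform the speaker-independent mean, then refine it with data.
prior = transform_mean([0.0, 0.0], scale=[1.0, 1.0], offset=[1.0, 2.0])
adapted = map_mean(prior, frames=[[3.0, 4.0]], posteriors=[10.0], tau=10.0)
```

With no adaptation frames the update returns the transformed mean unchanged, and as the occupancy grows the estimate is dominated by the data, which is the large-data behavior the abstract attributes to the Bayesian component.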
An Experimental Study Of Acoustic Adaptation Algorithms
- IEEE Int'l Conference on ASSP
, 1996
Cited by 6 (1 self)
Recently there has been much interest in the area of adaptation for improved speech recognition in the presence of mismatches between the training and testing conditions. In this paper we focus on transformation-based maximum-likelihood (ML) adaptation. Some of the important adaptation parameters include whether the adaptation is performed in the feature-space or model-space, and whether the adaptation is supervised or unsupervised. An additional parameter is the adaptation data. For example, adaptation may be performed using an independent dataset or the test data itself. The latter is referred to as transcription-mode adaptation. In this paper, we experimentally study the effect of these various parameters, and report on our findings. 1. INTRODUCTION Recently, there has been much interest in the area of transformation-based ML adaptation to reduce the recognition degradation caused by acoustic mismatches between the training and testing conditions [1, 2, 3]. It is assumed that...
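The feature-space versus model-space distinction can be illustrated with a one-dimensional Gaussian and an affine transform. The sketch below is an assumption-laden toy example, not the setup of the paper: the scalar transform `a * x + b` and the function names are hypothetical. It shows that transforming the model parameters gives the same log-likelihood as applying the inverse transform to the observation, provided the Jacobian term is included.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def model_space_loglik(x, mean, var, a, b):
    """Adapt the model: mean -> a * mean + b, variance -> a^2 * variance."""
    return gaussian_logpdf(x, a * mean + b, a * a * var)


def feature_space_loglik(x, mean, var, a, b):
    """Adapt the observation with the inverse transform and keep the model.

    The -log|a| term is the Jacobian of the inverse transform; without it
    the two views differ by a constant that depends on the transform.
    """
    return gaussian_logpdf((x - b) / a, mean, var) - math.log(abs(a))


# Usage: both views score the same observation identically.
ll_model = model_space_loglik(1.3, mean=0.2, var=0.5, a=1.5, b=-0.4)
ll_feature = feature_space_loglik(1.3, mean=0.2, var=0.5, a=1.5, b=-0.4)
```

In this toy case the two views coincide exactly; in a full recognizer they generally differ in which parameters are tied and how the transform is estimated.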
Noise-Resistant Feature Extraction And Model Training For Robust Speech Recognition
- in Proceedings of the 1996 DARPA CSR Workshop
, 1996
Cited by 3 (1 self)
In this paper we report on our recent work on noise-robust feature extraction and model training to alleviate the mismatch caused by different microphones and ambient room noise in the context of the 1995 DARPA-sponsored H3 benchmark test, which used the unlimited-vocabulary North American Business News (NABN) database. We present a novel noise-robust feature extraction algorithm that is a combination of our previously developed minimum mean square error (MMSE) log-energy estimation algorithm and the probabilistic optimum filtering (POF) algorithm. We also studied an approach based on training the automatic speech recognition (ASR) system with previously collected noisy speech. While both the above approaches gave significant improvements, it was found that combining them gave the best results. We also report on a new part-of-speech (POS) language model that makes it possible to train robust POS language models that incorporate longer contexts than is possible with word-based language ...

