Telephone Speech Recognition using Neural Networks and Hidden Markov Models, 1999
Speech recognizers trained on high-quality, full-bandwidth speech usually perform worse when used in real-world environments. Telephone speech recognition is especially difficult because transmission channels have limited bandwidth. In this paper, neural network based adaptation methods are applied to telephone speech recognition, and a new unsupervised model adaptation method is proposed. The advantage of the neural network based approach is that it avoids retraining the speech recognizer on telephone speech. Furthermore, because a multi-layer neural network can compute nonlinear functions, it can accommodate the nonlinear mapping between full-bandwidth speech and telephone speech. The new unsupervised model adaptation method does not require transcriptions and can be used with the neural networks. Experimental results on TIMIT/NTIMIT corpora show that the performance of the proposed methods is comparable to that of recogni...
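The idea of a multi-layer network learning a nonlinear mapping between telephone and full-bandwidth features can be illustrated with a toy sketch. This is not the paper's method or data: the one-dimensional "features", the cubic distortion standing in for the telephone channel, the network size, and the names `forward`, `mse`, and `train` are all assumptions made for illustration.

```python
import math
import random


def forward(x, params):
    """One-hidden-layer network: tanh hidden units, linear output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2, hidden


def mse(pairs, params):
    """Mean squared error between mapped and target features."""
    return sum((forward(x, params)[0] - t) ** 2 for x, t in pairs) / len(pairs)


def train(pairs, hidden_units=4, epochs=500, lr=0.05, seed=0):
    """Fit the mapping with per-sample gradient descent on squared error."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden_units)]
    b1 = [0.0] * hidden_units
    w2 = [rng.uniform(-1, 1) for _ in range(hidden_units)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in pairs:
            y, hidden = forward(x, (w1, b1, w2, b2))
            err = y - t
            for j, h in enumerate(hidden):
                # Backpropagate through tanh before updating w2[j].
                grad_hidden = err * w2[j] * (1.0 - h * h)
                w2[j] -= lr * err * h
                w1[j] -= lr * grad_hidden * x
                b1[j] -= lr * grad_hidden
            b2 -= lr * err
    return w1, b1, w2, b2


# Hypothetical paired data: a clean feature and its distorted counterpart.
clean = [i / 10.0 for i in range(-10, 11)]
pairs = [(0.5 * c + 0.3 * c ** 3, c) for c in clean]

untrained = train(pairs, epochs=0)  # same initial weights, no updates
trained = train(pairs)
print(mse(pairs, untrained), mse(pairs, trained))
```

The sketch assumes paired clean and distorted examples are available for training, in the way TIMIT and NTIMIT provide parallel recordings; the unsupervised method mentioned in the abstract is not covered here.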
Adaptation To Environment And Speaker Using Maximum Likelihood Neural Networks
When training and testing conditions are mismatched, statistical speech recognition algorithms suffer severe degradation in recognition accuracy. The mismatch may be caused by interference from the acoustic environment in which a system is actually used, or by the speakers themselves. In this paper, a neural network based transformation approach is studied to handle the data distribution mismatch between training and testing conditions. The conditional probability computed by hidden Markov model (HMM) based recognizers is used as the objective function of the neural network. This objective maximizes the likelihood of the data from the testing environment and allows global optimization of the network when it is used with HMM-based recognizers. The new objective function can be used to transform speech feature vectors, or the mean vectors and covariance matrices of a recognizer. The proposed algorithm is evaluated on a noisy distant-talking version of the Resource Management database.
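The maximum likelihood objective described above can be sketched in a much reduced form. The example below is an assumption-laden stand-in, not the paper's algorithm: a fixed two-component Gaussian mixture replaces the HMM, a single additive bias replaces the neural network transformation, and the features are one-dimensional. The names `log_gauss`, `log_likelihood`, and `adapt_bias` are hypothetical. No transcriptions are used; the bias is moved by gradient ascent so that the transformed test frames become more likely under the fixed model.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood(frames, bias, weights, means, variances):
    """Log-likelihood of the bias-shifted frames under the fixed mixture."""
    total = 0.0
    for x in frames:
        y = x + bias
        total += math.log(sum(
            w * math.exp(log_gauss(y, m, v))
            for w, m, v in zip(weights, means, variances)))
    return total


def adapt_bias(frames, weights, means, variances, steps=200, lr=0.05):
    """Gradient ascent on the log-likelihood with respect to the bias."""
    bias = 0.0
    for _ in range(steps):
        grad = 0.0
        for x in frames:
            y = x + bias
            joint = [w * math.exp(log_gauss(y, m, v))
                     for w, m, v in zip(weights, means, variances)]
            norm = sum(joint)
            for p, m, v in zip(joint, means, variances):
                # Component posterior times d/dy of its log density.
                grad += (p / norm) * (m - y) / v
        bias += lr * grad / len(frames)
    return bias


# Hypothetical model trained on clean data, and test frames offset by +0.5.
weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [0.25, 0.25]
frames = [-1.6, -1.4, -1.5, 2.5, 2.4, 2.6]

bias = adapt_bias(frames, weights, means, variances)
before = log_likelihood(frames, 0.0, weights, means, variances)
after = log_likelihood(frames, bias, weights, means, variances)
print(bias, before, after)
```

A bias-only transform is used here because it has no scaling term; a transform that can rescale features would need additional care so that maximizing likelihood does not simply collapse all frames onto a model mean.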

