Results 1 - 10
of
10
Speaker Adaptation Using Constrained Estimation of Gaussian Mixtures
- IEEE Transactions on Speech and Audio Processing
, 1995
"... A recent trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMMs). Despite the good recognition performance that these systems achieve on average in large vocabulary applications, there is a large variability in performance across speakers. P ..."
Abstract
-
Cited by 65 (2 self)
- Add to MetaCart
A recent trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMMs). Despite the good recognition performance that these systems achieve on average in large vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMMs the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both ...
Speaker Adaptation Using Combined Transformation and Bayesian Methods
, 1994
"... Adapting the parameters of a statistical speaker-independent continuous-speech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixture-density hidden Markov models the number of component densities is typical ..."
Abstract
-
Cited by 38 (4 self)
- Add to MetaCart
Adapting the parameters of a statistical speaker-independent continuous-speech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixture-density hidden Markov models the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we have recently proposed a constrained estimation technique for Gaussian mixture densities. To improve the behavior of our adaptation scheme for large amounts of adaptation data, we combine it here with Bayesian techniques. We evaluate our algorithms on the large-vocabulary Wall Street Journal corpus for nonnative speakers of American English. The recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers.
Lightly Supervised and Unsupervised Acoustic Model Training
- Computer Speech and Language
, 2002
"... The last decade has witnessed substantial progress in speech recognition technology, with todays state-of-the-art systems being able to transcribe unrestricted broadcast news audio data with a word error of about 20%. ..."
Abstract
-
Cited by 34 (2 self)
- Add to MetaCart
The last decade has witnessed substantial progress in speech recognition technology, with todays state-of-the-art systems being able to transcribe unrestricted broadcast news audio data with a word error of about 20%.
Language Modeling With Sentence-Level Mixtures
, 1994
"... Language models play an important role in improving the accuracy of a continuous speech recognizer. In this thesis, we introduce a new statistical language model which captures long term topic dependencies of words within and across sentences. The model includes two main contributions. First, we dev ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
Language models play an important role in improving the accuracy of a continuous speech recognizer. In this thesis, we introduce a new statistical language model which captures long term topic dependencies of words within and across sentences. The model includes two main contributions. First, we develop a topic-dependent sentence-level mixture language model which takes advantage of the topic constraints in a sentence or a paragraph. Since this language model is not Markov and has a large search space, it is used only in the last stage of a multi-pass search strategy in the recognizer. Second, we introduce topic-dependent dynamic adaptation techniques in the framework of the mixture model. During the course of this thesis, we also investigate robust parameter estimation techniques, which are extremely important in light of the sparse data problems in language modeling. The model is implemented in the BU speech recognition system and provides a significant improvement in recognition accuracy. An important advantage of the framework of our model is that it is a simple extension of existing language modeling techniques that can easily be integrated with other language modeling advances.
Lattice-Based Search Strategies For Large Vocabulary Speech Recognition
, 1995
"... The design of search algorithms is an important issue in recognition, particularly for very large vocabulary, continuous speech. It is an especially crucial problem when computationally expensive knowledge sources are used in the system, as is necessary to achieve high accuracy. Recently, multi-pass ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
The design of search algorithms is an important issue in recognition, particularly for very large vocabulary, continuous speech. It is an especially crucial problem when computationally expensive knowledge sources are used in the system, as is necessary to achieve high accuracy. Recently, multi-pass search strategies have been used as a means of applying inexpensive knowledge sources early on to prune the search space for subsequent passes using more expensive knowledge sources. Three multi-pass search algorithms are investigated in this thesis work: the N-best search algorithm, a lattice dynamic programming search algorithm and a lattice local search algorithm. Both the lattice dynamic programming and lattice local search algorithms are shown to achieve comparable performance to the N-best search algorithm while running as much as 10 times faster on a 20,000 word vocabulary task. The lattice local search algorithm is also shown to have the additional advantage over the lattice dynamic programming search algorithm of allowing sentence-level knowledge sources to be incorporated into the search.
On-Line Adaptation Of Hidden Markov Models Using Incremental Estimation Algorithms
- IEEE Trans. Speech Audio Processing
, 1999
"... The mismatch that frequently occurs between the training and testing conditions of an automatic speech recognizer can be efficiently reduced by adapting the parameters of the recognizer to the testing conditions. The maximum likelihood adaptation algorithms for continuous -density hidden-Markov-mode ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
The mismatch that frequently occurs between the training and testing conditions of an automatic speech recognizer can be efficiently reduced by adapting the parameters of the recognizer to the testing conditions. The maximum likelihood adaptation algorithms for continuous -density hidden-Markov-model (HMM) based speech recognizers are fast, in the sense that a small amount of data is required for adaptation. They are, however, based on reestimating the model parameters using the batch version of the expectation-maximization (EM) algorithm. The multiple iterations required for the EM algorithm to converge make these adaptation schemes computationally expensive and not suitable for on-line applications, since multiple passes through the adaptation data are required. In this paper we show how incremental versions of the EM and the segmental k-means algorithm can be used to improve the convergence of these adaptation methods so that they can be used in on-line applications. 1. INTRODUCTIO...
Towards Task-Independent Speech Recognition
- In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
, 2001
"... Despite the considerable progress made in the last decade, speech recognition is far from a solved problem. For instance, porting a recognition system to a new task (or language) still requires substantial investment of time and money, as well as expertise in speech recognition. This paper takes a f ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Despite the considerable progress made in the last decade, speech recognition is far from a solved problem. For instance, porting a recognition system to a new task (or language) still requires substantial investment of time and money, as well as expertise in speech recognition. This paper takes a first step at evaluating to what extent a generic state-of-the-art speech recognizer can reduce the manual effort required for system development.
Improving Genericity for Task-Independent Speech Recognition
"... Although there have been regular improvements in speech recognition technology over the past decade, speech recognition is far from being a solved problem. Recognition systems are usually tuned to a particular task and porting the system to a new task (or language) is both time-consuming and expensi ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
Although there have been regular improvements in speech recognition technology over the past decade, speech recognition is far from being a solved problem. Recognition systems are usually tuned to a particular task and porting the system to a new task (or language) is both time-consuming and expensive. In this paper, issues in speech recognizer portability are addressed through the development of generic core speech recognition technology. First, the genericity of wide domain models is assessed by evaluating performance on several tasks. Then, the use of transparent methods for adapting generic models to a specific task is explored. Finally, further techniques are evaluated aiming at enhancing the genericity of the wide domain models. We show that unsupervised acoustic model adaptation and multi-source training can reduce the performance gap between task-independent and taskdependent acoustic models, and for some tasks even out-perform task-dependent acoustic models.
Genericity and Adaptability Issues for Task-Independent Speech Recognition
- ISCA ITRW 2001 Adaptation Methods For Speech Recognition, Sophia-Antipolis
, 2001
"... The last decade has witnessed major advances in core speech recognition technology, with today's systems able to recognize continuous speech from many speakers without the need for an explicit enrollment procedure. Despite these improvements, speech recognition is far from being a solved problem. Mo ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
The last decade has witnessed major advances in core speech recognition technology, with today's systems able to recognize continuous speech from many speakers without the need for an explicit enrollment procedure. Despite these improvements, speech recognition is far from being a solved problem. Most recognition systems are tuned to a particular task and porting the system to another task or language is both time-consuming and expensive.
Adaptation Of Hidden Markov Models Using Multiple Stochastic Transformations
- IEEE Trans. Speech and Audio Processing
, 1997
"... The recognition accuracy in recent large vocabulary Automatic Speech Recognition (ASR) systems is highly related to the existing mismatch between the training and test sets. For example, dialect differences across the training and testing speakers result to a significant degradation in recognition p ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
The recognition accuracy in recent large vocabulary Automatic Speech Recognition (ASR) systems is highly related to the existing mismatch between the training and test sets. For example, dialect differences across the training and testing speakers result to a significant degradation in recognition performance. Some popular adaptation approaches improve the recognition performance of speech recognizers based on hidden Markov models with continuous mixture densities by using linear transforms to adapt the means, and possibly the covariances of the mixture Gaussians. In this paper, we propose a novel adaptation technique that adapts the means and, optionally, the covariances of the mixture Gaussians by using multiple stochastic transformations. We perform both speaker and dialect adaptation experiments, and we show that our method significantly improves the recognition accuracy and the robustness of our system. The experiments are carried out with SRI's DECIPHER TM speech recognition sy...

