Results 1 - 5 of 5
Bayesian adaptive inference and adaptive training
- IEEE Transactions Speech and Audio Processing, 2007
Cited by 7 (5 self)
Abstract—Large-vocabulary speech recognition systems are often built using found data, such as broadcast news. In contrast to carefully collected data, found data normally contains multiple acoustic conditions, such as speaker or environmental noise. Adaptive training is a powerful approach to build systems on such data. Here, transforms are used to represent the different acoustic conditions, and then a canonical model is trained given this set of transforms. This paper describes a Bayesian framework for adaptive training and inference. This framework addresses some limitations of standard maximum-likelihood approaches. In contrast to the standard approach, the adaptively trained system can be directly used in unsupervised inference, rather than having to rely on initial hypotheses being present. In addition, for limited adaptation data, robust recognition performance can be obtained. The limited data problem often occurs in testing as there is no control over the amount of the adaptation data available. In contrast, for adaptive training, it is possible to control the system complexity to reflect the available data. Thus, the standard point estimates may be used. As the integral associated with Bayesian adaptive inference is intractable, various marginalization approximations are described, including a variational Bayes approximation. Both batch and incremental modes of adaptive inference are discussed. These approaches are applied to adaptive training of maximum-likelihood linear regression and evaluated on a large-vocabulary speech recognition task. Bayesian adaptive inference is shown to significantly outperform standard approaches. Index Terms—Adaptive training, Bayesian adaptation, Bayesian inference, incremental, variational Bayes.
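The contrast the abstract draws between point-estimate and Bayesian adaptive inference can be written out as follows. This is only a sketch: the symbols ($O$ for the test observations, $H$ for a word hypothesis, $\mathcal{M}$ for the canonical model, $T$ for the transform, $p(T)$ for the transform prior, $\theta$ for the hidden state sequence) are notation assumed here for illustration and are not taken from the paper.

```latex
% Notation assumed for illustration (not the paper's own):
% O = test observations, H = hypothesis, \mathcal{M} = canonical model,
% T = transform, p(T) = transform prior, \theta = hidden state sequence.
\begin{align}
  % Standard point-estimate inference: \hat{T} must first be estimated,
  % which in unsupervised mode needs an initial hypothesis.
  p(O \mid H) &\approx p(O \mid H, \hat{T}, \mathcal{M}) \\
  % Bayesian adaptive inference: marginalise over the transform.
  p(O \mid H) &= \int p(O \mid H, T, \mathcal{M})\, p(T)\, dT \\
  % Variational Bayes lower bound (Jensen's inequality) with a factorised
  % posterior q(\theta, T) = q(\theta)\, q(T).
  \log p(O \mid H) &\ge \sum_{\theta} \int q(\theta)\, q(T)\,
    \log \frac{p(O, \theta \mid H, T, \mathcal{M})\, p(T)}
              {q(\theta)\, q(T)}\, dT
\end{align}
```

The integral in the second line is the one the abstract calls intractable; the third line is the generic form of a variational lower bound, not necessarily the exact bound used by the authors.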
Adaptive Training for Large Vocabulary Continuous Speech Recognition
- 2006
Cited by 6 (2 self)
Summary: In recent years, there has been a trend towards training large vocabulary continuous speech recognition (LVCSR) systems on a large amount of found data. Found data is recorded from spontaneous speech without careful control of the recording acoustic conditions, for example, conversational telephone speech. Hence, it typically has greater variability in terms of speaker and acoustic conditions than specially collected data. Thus, in addition to the desired speech variability required to discriminate between words, it also includes various non-speech variabilities, for example, the change of speakers or acoustic environments. The standard approach to handle this type of data is to train hidden Markov models (HMMs) on the whole data set as if all data comes from a single acoustic condition. This is referred to as multi-style training, for example speaker-independent training. Effectively, the non-speech variabilities are ignored. Though good performance has been obtained with multi-style systems, these systems account for all variabilities. Improvement may be obtained if the two types of variabilities in the found data are modelled separately. Adaptive training has been proposed for this purpose. In contrast to multi-style training, a set of transforms is used to represent the non-speech variabilities. A canonical
Adaptive Training Using Structured Transforms
- in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2004
Cited by 5 (4 self)
Adaptive training is an important approach to train speech recognition systems on found, non-homogeneous, data. Standard adaptive training employs a single transform to represent unwanted acoustic variability for an utterance. A canonical model representing only the inherent speech variability may then be trained given this set of transforms. For found data there are commonly multiple acoustic factors affecting the speech signal. This paper investigates the use of multiple forms of transformations, structured transforms (ST), to represent the complex non-speech variabilities in an adaptive training framework. Two forms of transform are considered, cluster mean interpolation and constrained MLLR. Re-estimation formulae for estimating the canonical model using both maximum likelihood and minimum phone error training are presented. Experiments to compare ST to standard adaptive training schemes were performed on a conversational telephone speech task. ST were found to significantly reduce the word error rate.
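The two transform forms named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names, the diagonal-covariance Gaussian, and the toy dimensions are assumptions made for illustration. Cluster mean interpolation builds a Gaussian mean as a weighted sum of cluster means, and constrained MLLR applies an affine transform to the observation and adds the log-determinant of the transform matrix to the log-likelihood.

```python
import numpy as np


def interpolate_cluster_means(cluster_means, weights):
    """Cluster mean interpolation (hypothetical helper).

    cluster_means: (D, P) array, one column per cluster.
    weights: (P,) interpolation weights for one acoustic condition.
    Returns the adapted (D,) mean.
    """
    return cluster_means @ weights


def cmllr_log_likelihood(obs, mean, var, A, b):
    """Log-likelihood under a constrained MLLR feature transform.

    Computes log N(A @ obs + b; mean, diag(var)) + log|det A|.
    A diagonal covariance is assumed here for simplicity.
    """
    x = A @ obs + b
    _, log_abs_det = np.linalg.slogdet(A)
    d = obs.shape[0]
    quad = np.sum((x - mean) ** 2 / var)
    log_gauss = -0.5 * (d * np.log(2.0 * np.pi) + np.sum(np.log(var)) + quad)
    return log_gauss + log_abs_det


# Toy usage: two clusters in two dimensions, identity feature transform.
cluster_means = np.array([[0.0, 2.0],
                          [0.0, 4.0]])
weights = np.array([0.5, 0.5])
mean = interpolate_cluster_means(cluster_means, weights)
score = cmllr_log_likelihood(mean, mean, np.ones(2), np.eye(2), np.zeros(2))
```

Combining both in one scoring function mirrors the idea of a structured transform, where each form handles a different acoustic factor; how the paper ties the two forms to specific factors is not stated in the abstract.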
Incremental adaptation using Bayesian inference
- in Proc. ICASSP, 2006
Cited by 4 (2 self)
Adaptive training is a powerful technique to build systems on nonhomogeneous training data. Here, a canonical model representing “pure” speech variability and a set of transforms representing unwanted acoustic variabilities are both trained. To use the canonical model for recognition, a transform for the test acoustic condition is required. For some situations a robust estimate of the transform parameters may not be possible due to limited, or no, adaptation data. One solution to this problem is to view adaptive training in a Bayesian framework and marginalise out the transform parameters. Exact implementation of this Bayesian inference is intractable. Recently, lower bound approximations based on variational Bayes have been used to solve this problem for batch adaptation with limited data. This paper extends this Bayesian adaptation framework to incremental adaptation. Various lower-bound approximations and options for propagating information within this incremental framework are discussed. Experiments using adaptive models trained with both maximum likelihood and minimum phone error training are described. Using incremental Bayesian adaptation, gains were obtained over the standard approaches, especially for limited data.
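The idea of marginalising out the transform and propagating information between utterances can be illustrated on a toy case where everything is tractable. The sketch below is an assumption-laden simplification, not the paper's method: the "transform" is a single scalar bias with a Gaussian prior, the "model" is one 1-D Gaussian with known variance, and all function names are hypothetical. In this conjugate setting the marginal likelihood is exact, whereas for HMMs with linear transforms the abstract notes that lower-bound approximations are needed.

```python
import math


def update_bias_posterior(prior_mean, prior_var, residuals, noise_var):
    """Conjugate Gaussian update of the bias posterior.

    residuals are observations minus the canonical model mean.
    """
    n = len(residuals)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(residuals) / noise_var)
    return post_mean, post_var


def marginal_log_likelihood(residuals, noise_var, bias_mean, bias_var):
    """Exact log marginal likelihood with the bias integrated out.

    Uses the chain rule: each frame is scored with the predictive
    distribution given the frames before it, then the bias posterior
    is updated with that frame.
    """
    total = 0.0
    for r in residuals:
        var = noise_var + bias_var
        total += -0.5 * (math.log(2.0 * math.pi * var) + (r - bias_mean) ** 2 / var)
        bias_mean, bias_var = update_bias_posterior(bias_mean, bias_var, [r], noise_var)
    return total, (bias_mean, bias_var)


def incremental_adaptation(utterances, model_mean, noise_var, prior_mean, prior_var):
    """Score utterances in order, propagating the bias posterior forward."""
    posterior = (prior_mean, prior_var)
    scores = []
    for utt in utterances:
        residuals = [o - model_mean for o in utt]
        score, posterior = marginal_log_likelihood(residuals, noise_var, *posterior)
        scores.append(score)
    return scores, posterior


# Toy usage: the first utterance is scored with the prior alone (no adaptation data).
scores, (post_mean, post_var) = incremental_adaptation(
    [[1.0], [2.0, 3.0]], model_mean=0.0, noise_var=1.0, prior_mean=0.0, prior_var=1.0
)
```

Because the posterior after each utterance serves as the prior for the next, the final posterior matches a single batch update over all the data, and the first utterance can still be scored when no adaptation data has been seen.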
BAYESIAN ADAPTATION AND ADAPTIVELY TRAINED SYSTEMS
As the use of found data increases, more systems are being built using adaptive training. Here transforms are used to represent unwanted acoustic variability, e.g. speaker and acoustic environment changes, allowing a canonical model that models only the “pure” variability of speech to be trained. Adaptive training may be described within a Bayesian framework. By using complexity control approaches to ensure robust parameter estimates, the standard point estimate adaptive training can be justified within this Bayesian framework. However, during recognition there is usually no control over the amount of data available. It is therefore preferable to be able to use a full Bayesian approach to applying transforms during recognition rather than the standard point estimates. This paper discusses various approximations to Bayesian approaches including a new variational Bayes approximation. The application of these approaches to state-of-the-art adaptively trained systems using both CAT and MLLR transforms is then described and evaluated on a large vocabulary speech recognition task.

