## Maximum-likelihood stochastic-transformation adaptation of hidden Markov models (1999)

Venue: IEEE Transactions on Speech and Audio Processing

Citations: 9 (0 self)

### BibTeX

```bibtex
@ARTICLE{Diakoloukas99maximum-likelihoodstochastic-transformation,
  author  = {Vassilis D. Diakoloukas and Vassilios V. Digalakis},
  title   = {Maximum-likelihood stochastic-transformation adaptation of hidden Markov models},
  journal = {IEEE Trans. on Speech Audio Processing},
  year    = {1999},
  volume  = {7},
  pages   = {177--187}
}
```

### Abstract

The recognition accuracy of recent large-vocabulary automatic speech recognition (ASR) systems is highly related to the mismatch between the training and testing sets. For example, dialect differences across the training and testing speakers result in a significant degradation in recognition performance. Some popular adaptation approaches improve the recognition performance of speech recognizers based on hidden Markov models with continuous mixture densities by using linear transformations to adapt the means, and possibly the covariances, of the mixture Gaussians. The linear assumption, however, is too restrictive, and in this paper we propose a novel adaptation technique that adapts the means and, optionally, the covariances of the mixture Gaussians by using multiple stochastic transformations. We perform both speaker and dialect adaptation experiments, and we show that our method significantly improves the recognition accuracy and the robustness of our system. The experiments are carried out with SRI’s DECIPHER™ speech recognition system.

Index Terms: speaker adaptation, speech recognition, robust recognition.
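The multiple-transformation scheme the abstract describes can be sketched as follows (an illustrative reconstruction in my own notation, not the paper's equations): if a state's observation density is a mixture of Gaussians $\mathcal{N}(\mu_k, \Sigma_k)$ with weights $w_k$, and a set of affine transformations $(A_j, b_j)$ is applied stochastically with weights $\lambda_j$ estimated from the adaptation data, the adapted density takes the form

```latex
\tilde{p}(x \mid s)
  = \sum_{j=1}^{J} \lambda_j \sum_{k=1}^{K} w_k\,
    \mathcal{N}\!\bigl(x;\; A_j \mu_k + b_j,\; A_j \Sigma_k A_j^{\top}\bigr),
\qquad \lambda_j \ge 0,\ \sum_{j=1}^{J} \lambda_j = 1.
```

Setting $J = 1$ recovers a single linear (MLLR-style) transformation as a special case, which is why the stochastic scheme is strictly more expressive.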

### Citations

1212 | An Introduction to Multivariate Statistical Analysis (2nd ed.)
- Anderson
- 1984
Citation Context: ...s partial derivative with respect to the , , since this term is not functionally dependent on the parameters. and A. Maximizing The maximization of the term is similar to the one described in [1]. In [19], the joint-log-likelihood of a collection of samples drawn independently from a multivariate normal distribution with mean and covariance is described as trace where denotes the number of samples and...

622 | Maximum likelihood linear regression for speaker adaptation of HMMs
- Leggetter, Woodland
- 1995
Citation Context: ...The transformation used at each time depends on the underlying HMM state , in which case the observation densities of the adapted models can be written (3) where denotes the transpose of a matrix. In [2], the linear constraint is only applied to the means of the adapted observation densities, which become Closed-form solutions for the reestimation formulae of method (3) can be derived in the case of...

420 | Maximum likelihood estimation from incomplete data using the EM algorithm (with discussion)
- DEMPSTER, LAIRD, et al.
- 1977
Citation Context: ...ased on weight probabilities that are trained from the adaptation data. For the estimation of the transformation parameters and weight probabilities we use the expectation-maximization (EM) algorithm [4]. We evaluate our new method using SRI’s DECIPHER™ speech recognition system on dialect- and speaker-adaptation experiments, and we find that the new method significantly outperforms previous method...

110 | A maximum likelihood approach to stochastic matching for robust speech recognition
- Sankar, Lee
- 1996
Citation Context: ...tation algorithms for large-vocabulary continuous-density hidden Markov model (HMM) based speech recognizers have appeared that are based on constrained reestimation of the distribution parameters [1]–[3]. In these approaches, all the Gaussians in a single mixture, or a group of mixtures, if there is tying of transformations, are transformed using the same linear transformation. These transformation-b...

48 | Speaker adaptation using combined transformation and Bayesian methods
- Digalakis, Neumeyer
- 1996
Citation Context: ...of continuous-density HMM’s appeared concurrently in [1] and [2], and became known as maximum-likelihood linear regression (MLLR). The transformation-based approach is combined with MAP adaptation in [9] and [10], and is extended to include biases in both the mel-cepstral and linear spectral domains in [11]. More related to the work presented here is the work of Gales [12], where he deals with the is...

43 | Probabilistic optimum filtering for robust speech recognition
- Neumeyer, Weintraub
- 1994
Citation Context: ...ocus on transformation-based adaptation. Transformation techniques for mismatch compensation and model adaptation can be used in both the feature and the model spaces. Probabilistic optimum filtering [6] is a technique applied in the feature space that requires parallel recordings of the clean and noisy speech. Maximum likelihood (ML) techniques can be used to avoid the need for parallel recordings. ...

41 | A comparative study of speaker adaptation techniques
- Neumeyer, Sankar, et al.
- 1995
Citation Context: ...tion formulae for (4) are simpler, can be used for full transformation matrices, but do not adapt the covariances of the Gaussians. A comparative study of these two approaches was done in [14]. For method (3), where the affine constraint is applied to both the means and covariances of the mixture Gaussians, the following first- and second-order statistics for all multivariate normal densit...

41 | Genones: Generalized mixture tying in continuous hidden Markov model-based speech recognizers
- Digalakis, Monaco, et al.
- 1996
Citation Context: ...the speakers (half of them male) to serve as testing data and the rest composed the adaptation/training data with a total of 3814 sentences. Experiments were carried out using SRI’s DECIPHER™ system [17]. The system’s front-end was configured to output 12 cepstral coefficients, cepstral energy...

38 | Signal bias removal by maximum likelihood estimation for robust telephone speech recognition
- Rahim, Juang
Citation Context: ...ihood (ML) techniques can be used to avoid the need for parallel recordings. In the feature space, a simple transformation that consists of a single offset is estimated using ML estimation in [7] and [8], and can be regarded as a generalization of cepstral mean normalization. Stochastic matching [3] treats the offset as a random bias, and utilizes multiple shifts. Multiple transformations are typical...
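The single-offset ML estimation mentioned in this snippet pairs naturally with the EM algorithm cited above. Below is a minimal sketch (my own toy code, not the paper's implementation): EM estimates one additive bias `b` shared by all components of a fixed scalar Gaussian mixture, alternating posterior computation with a closed-form weighted-average update.

```python
import numpy as np

def em_offset(x, means, variances, weights, n_iter=20):
    """EM estimate of a single additive bias b, so that the shifted
    components N(mu_k + b, var_k) best fit the adaptation data x.
    Toy scalar illustration -- the paper estimates multiple, weighted
    stochastic transformations, not one shared offset."""
    b = 0.0
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame
        log_lik = (
            -0.5 * (x[:, None] - (means + b)) ** 2 / variances
            - 0.5 * np.log(2.0 * np.pi * variances)
            + np.log(weights)
        )
        post = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # M-step: closed-form weighted least-squares update of the bias
        b = (post * (x[:, None] - means) / variances).sum() / (post / variances).sum()
    return b
```

Replacing the scalar bias with per-transformation affine parameters and a learned weight per transformation gives the multiple-transformation flavor this paper and stochastic matching [3] pursue.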

35 | Speaker adaptation using constrained reestimation of Gaussian mixtures
- Digalakis, Rtischev, et al.
- 1995
Citation Context: ...adaptation algorithms for large-vocabulary continuous-density hidden Markov model (HMM) based speech recognizers have appeared that are based on constrained reestimation of the distribution parameters [1]–[3]. In these approaches, all the Gaussians in a single mixture, or a group of mixtures, if there is tying of transformations, are transformed using the same linear transformation. These transformati...

11 | Transformation Smoothing for Speaker and Environmental Adaptation
- Gales
- 1997
Citation Context: ...bined with MAP adaptation in [9] and [10], and is extended to include biases in both the mel-cepstral and linear spectral domains in [11]. More related to the work presented here is the work of Gales [12], where he deals with the issue of optimal component assignment to a set of transformations, and the nonlinear model-space transformations presented in [13]. This paper is structured as follows. In Se...

10 | Acoustic adaptation using nonlinear transformations of HMM parameters
- Abrash, Sankar, et al.
- 1996
Citation Context: ...he work presented here is the work of Gales [12], where he deals with the issue of optimal component assignment to a set of transformations, and the nonlinear model-space transformations presented in [13]. This paper is structured as follows. In Section II, we review the linear transformation-based adaptation methods for continuous mixture-density HMM’s which were introduced in [1] and [2]. Section II...

9 | Improved Bayesian Learning of Hidden Markov Models for Speaker Adaptation
- Chien, Lee, et al.
- 1997
Citation Context: ...nuous-density HMM’s appeared concurrently in [1] and [2], and became known as maximum-likelihood linear regression (MLLR). The transformation-based approach is combined with MAP adaptation in [9] and [10], and is extended to include biases in both the mel-cepstral and linear spectral domains in [11]. More related to the work presented here is the work of Gales [12], where he deals with the issue of op...

9 | Development of Dialect-Specific Speech Recognizers Using Adaptation Methods
- Diakoloukas, Digalakis, et al.
- 1997
Citation Context: ...del when tested on Stockholm speakers. On the other hand, its performance degraded significantly when tested on the Scanian-dialect testing set, reaching a word error rate of 25.08%. In previous work [18], we adapted the Stockholm-dialect system using (3) with diagonal transformations (method I) and (4) with structured transformations (method II). The transformation matrices in method II are block dia...

6 | Training Issues and Channel Equalization Techniques for the Construction of Telephone Acoustic Models Using a High-Quality Speech Corpus
- Neumeyer, Digalakis, et al.
- 1994
Citation Context: ...um likelihood (ML) techniques can be used to avoid the need for parallel recordings. In the feature space, a simple transformation that consists of a single offset is estimated using ML estimation in [7] and [8], and can be regarded as a generalization of cepstral mean normalization. Stochastic matching [3] treats the offset as a random bias, and utilizes multiple shifts. Multiple transformations are...

4 | The Hub and Spoke Paradigm for CSR Evaluation
- Kubala, Cohen
- 1994
Citation Context: ...ch Spoken Language Translator project [15]. We have also evaluated our algorithm in speaker adaptation experiments based on the “spoke 3” task of the large-vocabulary Wall Street Journal (WSJ) corpus [16]. The goal of this task is to improve recognition performance for nonnative speakers of American English. A. Dialect Adaptation Experiments For our dialect adaptation experiments, we used data from th...

2 | A unified maximum likelihood approach to acoustic mismatch compensation: application to noisy Lombard speech recognition
- Afify, Gong
- 1997
Citation Context: ... linear regression (MLLR). The transformation-based approach is combined with MAP adaptation in [9] and [10], and is extended to include biases in both the mel-cepstral and linear spectral domains in [11]. More related to the work presented here is the work of Gales [12], where he deals with the issue of optimal component assignment to a set of transformations, and the nonlinear model-space transforma...

2 | Spoken language translation with mid-90’s technology: A case study
- Rayner
- 1993
Citation Context: ...cted by Telia, and the recognizer used in a bidirectional speech translation system between English and Swedish that has been developed under the SRI-Telia Research Spoken Language Translator project [15]. We have also evaluated our algorithm in speaker adaptation experiments based on the “spoke 3” task of the large-vocabulary Wall Street Journal (WSJ) corpus [16]. The goal of this task is to improve...

1 | Statistical techniques for robust ASR: Review and perspectives
- Bellegarda
- 1997
Citation Context: ..., and we find that the new method significantly outperforms previous methods based on the single linear transformation. A recent literature survey of statistical techniques for robust ASR appeared in [5]. In this paper we focus on transformation-based adaptation. Transformation techniques for mismatch compensation and model adaptation can be used in both the feature and the model spaces. Probabilisti...