## The Generation And Use Of Regression Class Trees For Mllr Adaptation (1996)

Citations: 64 (8 self)

### BibTeX

@TECHREPORT{Gales96thegeneration,

author = {M. J. F. Gales},

title = {The Generation And Use Of Regression Class Trees For Mllr Adaptation},

institution = {},

year = {1996}

}

### Abstract

Maximum likelihood linear regression (MLLR) is an adaptation technique suitable for both speaker and environmental model-based adaptation. The models are adapted using a set of linear transformations, estimated in a maximum likelihood fashion from the available adaptation data. As these transformations can capture general relationships between the original model set and the current speaker, or new acoustic environment, they can be effective in adapting all the HMM distributions with limited adaptation data. Two important decisions that must be made are (i) how to cluster components together, such that they all have a similar transformation matrix, and (ii) how many transformation matrices to generate for a given block of adaptation data. This paper addresses both problems. Firstly it describes two optimal clustering techniques, in the sense of maximising the likelihood of the adaptation data. The first assigns each component to one of the regression classes. This may be used to generat...
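The core operation the abstract describes can be sketched in a few lines: each Gaussian mean is adapted by an affine transform shared across all components in the same regression class. This is an illustrative sketch, not code from the report:

```python
import numpy as np

# Illustrative sketch (not code from the report): MLLR adapts each Gaussian
# mean with an affine transform shared by all components in a regression
# class.  For a d-dimensional mean mu, the extended mean vector is
# xi = [1, mu_1, ..., mu_d]^T and the adapted mean is mu_hat = W @ xi,
# where W is a d x (d+1) transform matrix.

d = 3
rng = np.random.default_rng(0)
mu = rng.standard_normal(d)          # original component mean

# W = [0 | I] is the identity transform: zero bias, identity rotation.
W = np.hstack([np.zeros((d, 1)), np.eye(d)])

xi = np.concatenate([[1.0], mu])     # extended mean vector, shape (d+1,)
mu_hat = W @ xi                      # adapted mean, shape (d,)
```

With `W = [0 | I]` the transform leaves the mean unchanged; in adaptation, W is instead estimated from data to maximise the likelihood of the adaptation observations.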

### Citations

8932 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...s of the transformations are obtained when "hard", standard, clustering schemes are used in the regression tree. In order to solve this maximisation problem an Expectation-Maximisation (EM) technique [2] is used. The standard auxiliary function $Q(\mathcal{M}, \hat{\mathcal{M}})$ is adopted (equation 17 in the report):

$$Q(\mathcal{M}, \hat{\mathcal{M}}) = K_1 - \frac{1}{2\,L(O_T \mid \mathcal{M})} \sum_{m=1}^{M} \sum_{\tau=1}^{T} L_m(\tau) \left[ K_m + \log|\Sigma_m| + \bigl(o(\tau) - \hat{\mu}_m\bigr)^{T} \Sigma_m^{-1} \bigl(o(\tau) - \hat{\mu}_m\bigr) \right]$$

201 | Tree-based state tying for high accuracy acoustic modeling
- Young, Odell, et al.
- 1994
Citation Context: ...consisted of 36493 sentences from the SI-284 WSJ0 and WSJ1 sets, and the LIMSI 1993 WSJ lexicon and phone set were used. The standard HTK system was trained using decision-tree-based state clustering [15] to define 6399 speech states. A 12 component mixture Gaussian distribution was then trained for each tied state, a total of about 6 million parameters. For the H1 task a 65k word list and dictionary ...

180 | Hidden Markov model decomposition of speech and noise
- Varga, Moore
- 1990
Citation Context: ...small amount of speaker-specific or environment-specific adaptation data. Some environmental adaptation techniques require no speech data in the new acoustic environment to adapt the model parameters [3, 12], only noise samples. However these schemes make assumptions about the form of the acoustic environment. Other techniques can only update distributions for which observations occur in the adaptation d...

90 | Model-Based Techniques for Noise Robust Speech Recognition
- Gales
- 1995
Citation Context: ...small amount of speaker-specific or environment-specific adaptation data. Some environmental adaptation techniques require no speech data in the new acoustic environment to adapt the model parameters [3, 12], only noise samples. However these schemes make assumptions about the form of the acoustic environment. Other techniques can only update distributions for which observations occur in the adaptation d...

66 | Maximum a-posteriori estimation for multivariate Gaussian mixture observations of Markov chains
- Gauvain
- 1994
Citation Context: ...out the form of the acoustic environment. Other techniques can only update distributions for which observations occur in the adaptation data, such as those using maximum a-posteriori (MAP) estimation [5, 6]. These require a relatively large amount of adaptation data to be effective. Another approach is to estimate a set of transformations that can be applied to the model parameters. If these transformat...
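The limitation noted in this context, that MAP-style estimation only updates components with observations, falls out of the standard MAP mean update: the adapted mean interpolates between the prior mean and the data, so zero occupancy leaves the prior unchanged. A minimal sketch of that standard update, with an assumed prior-weight parameter `tau`:

```python
import numpy as np

# Sketch of the standard MAP mean update for one Gaussian component.
# `tau` is the prior weight (an assumed hyperparameter, not from the report):
# with no adaptation data (occupancy 0) the mean stays at the prior mean,
# which is why MAP needs a relatively large amount of data to be effective.

def map_mean(prior_mean, obs, gammas, tau=10.0):
    """prior_mean: (d,); obs: (T, d) observations;
    gammas: (T,) posteriors of this component for each frame."""
    occ = gammas.sum()               # total occupancy of the component
    weighted = gammas @ obs          # sum_t gamma(t) * o(t), shape (d,)
    return (tau * prior_mean + weighted) / (tau + occ)
```

With `tau=2` and two fully occupied frames at 2.0, a zero prior mean moves to `(0*2 + 4) / (2 + 2) = 1.0`, halfway between prior and data.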

65 | The development of the 1994 HTK large vocabulary speech recognition system
- Woodland
- 1994
Citation Context: ...ed for the recognition task was a gender-independent cross-word-triphone mixture-Gaussian tied-state HMM system. This was the same as the "HMM-1" model set used in the HTK 1994 ARPA evaluation system [13]. The speech was parameterised into 12 MFCCs, C1 to C12, along with normalised log-energy and the first and second differentials of these parameters. This yielded a 39-dimensional feature vector. T...

53 | A one pass decoder design for large vocabulary recognition
- Odell, Valtchev, et al.
- 1994
Citation Context: ...ed state, a total of about 6 million parameters. For the H1 task a 65k word list and dictionary was used with the trigram language model described in [13]. All decoding used a dynamic-network decoder [11] which can either operate in a single pass or rescore pre-computed word lattices. For the secondary channel experiments a PLP version of the standard MFCC models was built using single-pass retrainin...

46 | Variance compensation within the MLLR framework for robust speech recognition and speaker adaptation
- Gales, Pye, et al.
- 1996
Citation Context: ...alculate equation 15 is the sum and sum squared of the observations for each subset for each component when calculated directly. This load can be reduced by storing statistics at the base class level [4]. The use of cross-validation techniques greatly increases the computational and memory requirements for adaptation. As such it is most suited to static adaptation tasks, though may be used for increm...
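The memory saving described here, storing occupancy-weighted sums and sums of squares per base class rather than per component, can be sketched as follows (names and shapes are my own, not the report's implementation):

```python
import numpy as np

# Sketch (assumed names, not the report's code): accumulate the
# occupancy-weighted sum and sum-of-squares of observations once per base
# class.  Statistics for any node of the regression tree are then obtained
# by summing over the base classes that node dominates, instead of keeping
# separate per-component sums for every node.

def accumulate(obs, gammas, base_class, n_classes, d):
    """obs: (T, d) observations; gammas: (T, n_comp) component posteriors;
    base_class: base-class index of each component."""
    occ = np.zeros(n_classes)             # total occupancy per base class
    s1 = np.zeros((n_classes, d))         # sum of gamma * o
    s2 = np.zeros((n_classes, d))         # sum of gamma * o**2
    for c in range(gammas.shape[1]):
        b = base_class[c]
        g = gammas[:, c]
        occ[b] += g.sum()
        s1[b] += g @ obs
        s2[b] += g @ obs**2
    return occ, s1, s2
```

Memory then scales with the number of base classes rather than with components times tree nodes.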

34 | Improved Acoustic Modeling for HMMs using Linear Transform
- Leggetter
- 1995
Citation Context: ...ion matrix, which may be full, block diagonal, or diagonal. The aim is to find the transformation W that maximises the likelihood of the adaptation data. This optimisation was originally described in [7] and is described in appendix A. (2.2 Regression classes) As previously described, all components associated with a particular regression class are assumed to transform in a similar fashion, i.e. W is the ...
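For diagonal covariances, the optimisation this context refers to has a standard row-wise closed-form solution in the MLLR literature: each row of W solves a separate linear system built from occupancy- and variance-weighted outer products of the extended means. A sketch under those assumptions, with variable names of my own choosing:

```python
import numpy as np

# Sketch of the standard row-wise ML solution for the MLLR mean transform
# with diagonal covariances (following the usual derivation; names are mine).
# Row i of W solves  w_i G_i = k_i, where G_i and k_i are accumulated from
# the adaptation data and the extended mean vectors xi_m = [1, mu_m]^T.

def estimate_W(obs, gammas, means, variances):
    """obs: (T, d); gammas: (T, M) component posteriors;
    means, variances: (M, d) diagonal-Gaussian parameters.
    Returns W of shape (d, d+1)."""
    T, d = obs.shape
    M = means.shape[0]
    xi = np.hstack([np.ones((M, 1)), means])        # extended means, (M, d+1)
    occ = gammas.sum(axis=0)                        # (M,) occupancies
    W = np.zeros((d, d + 1))
    for i in range(d):                              # one linear system per row
        inv_var = 1.0 / variances[:, i]             # (M,)
        G = (xi * (occ * inv_var)[:, None]).T @ xi  # (d+1, d+1)
        z = (gammas * obs[:, [i]]).sum(axis=0)      # sum_t gamma_m(t) o_i(t)
        k = (inv_var * z) @ xi                      # (d+1,)
        W[i] = np.linalg.solve(G, k)
    return W
```

On data generated exactly by an affine shift of the means, this recovers that transform; with real adaptation data it gives the maximum-likelihood W for the class.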

30 | Iterative Unsupervised Adaptation Using Maximum Likelihood Linear Regression
- Woodland, Pye, et al.
- 1996
Citation Context: ..., thus I = . (5.2 Iterative MLLR) An alternative scheme for selecting regression classes is to use iterative MLLR. Originally the technique was used to provide improved transcriptions for adaptation [14]. Here it is used to make the best use of the adaptation data and regression class tree. It relies on the fact that when transformations are not robustly estimated, they will "learn" the transcription...
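The iterative procedure described in this context can be sketched as a simple fixed-point loop. The function names `recognise` and `estimate_transforms` are placeholders I supply, not an API from the report:

```python
# Sketch of the iterative MLLR loop: recognise the adaptation data with the
# current transforms, re-estimate the transforms from the new hypotheses,
# and stop once the hypothesised transcription is stable.  `recognise` and
# `estimate_transforms` are placeholder callables, not an API from the report.

def iterative_mllr(adapt_data, recognise, estimate_transforms, max_iters=5):
    transforms = None                   # start from the unadapted model set
    prev_hyp = None
    for _ in range(max_iters):
        hyp = recognise(adapt_data, transforms)
        if hyp == prev_hyp:             # hypotheses unchanged: converged
            break
        transforms = estimate_transforms(adapt_data, hyp)
        prev_hyp = hyp
    return transforms, prev_hyp
```

Each pass improves the hypothesised transcription used for supervision, which in turn lets more (or better) transforms be estimated from the same adaptation data.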

29 | Flexible speaker adaptation for large vocabulary speech recognition
- Leggetter, Woodland
- 1995
Citation Context: ...l set and the current speaker or new acoustic environment, they can be effective in adapting all the HMM distributions. One such transformation approach is maximum likelihood linear regression (MLLR) [8, 9] which estimates a set of linear transformations for the mean parameters of a mixture Gaussian HMM system, such that the likelihood of the adaptation data is maximised. As many components are assumed ...

19 | A study on speaker adaptation of continuous density HMM parameters
- Lin, Juang
- 1990
Citation Context: ...out the form of the acoustic environment. Other techniques can only update distributions for which observations occur in the adaptation data, such as those using maximum a-posteriori (MAP) estimation [5, 6]. These require a relatively large amount of adaptation data to be effective. Another approach is to estimate a set of transformations that can be applied to the model parameters. If these transformat...

5 | A comparative study of speaker adaptation techniques
- Neumeyer, Sankar, et al.
- 1995
Citation Context: ...on the fuzzy clustering then this is a globally optimal regression tree. (4. Transformation form) There are many forms that the transformations can take, for example diagonal, block diagonal or full [10]. The optimal regression tree will be dependent on the type of transform being considered. The more complex the transformation, the longer it takes to train the regression tree. In addition, more data...

4 | Maximum likelihood linear regression for speaker adaptation of continuous density HMMs (Computer Speech and Language)
- Leggetter, Woodland
- 1995
Citation Context: ...l set and the current speaker or new acoustic environment, they can be effective in adapting all the HMM distributions. One such transformation approach is maximum likelihood linear regression (MLLR) [8, 9] which estimates a set of linear transformations for the mean parameters of a mixture Gaussian HMM system, such that the likelihood of the adaptation data is maximised. As many components are assumed ...