## CONSTRUCTION OF MODEL-SPACE CONSTRAINTS

### BibTeX

@MISC{Nguyen_constructionof,
  author = {Patrick Nguyen and Luca Rigazio and Christian Wellekens and Jean-Claude Junqua},
  title = {Construction of Model-Space Constraints},
  year = {}
}

### Abstract

HMM systems exhibit a large amount of redundancy. Exploiting it, a technique called Eigenvoices was found to be very effective for speaker adaptation. The correlation between HMM parameters is exploited via a linear constraint called the eigenspace. This constraint is obtained through a PCA of the training speakers. In this paper, we show how PCA can be linked to the maximum-likelihood criterion. Then, we extend the method to LDA transformations and piecewise linear constraints. On the Wall Street Journal (WSJ) dictation task, we obtain a 1.7% WER improvement (15% relative) when using self-adaptation.

1\. OPTIMAL ESTIMATION OF THE EIGENSPACE. In this section, we show that the expected log-likelihood of the data is related to a sum of squared Euclidean distances in the model space. This justifies using the SVD to compute the eigenspace. First, we show that the log-likelihood of rows of MLLR matrices defines a quadratic form. Then, we define a proper normalization that reduces the ML problem to a standard least-squares problem, which can be solved by SVD.

1.1. Gaussianity of MLLR rows. Speaker-dependent models are needed to build the eigenspace. However, for large-vocabulary applications, building these models is difficult because of data sparsity and memory requirements. In practice, most systems use MLLR-adapted models [1]. MLLR transforms the model means by a matrix …

### Citations

628 | Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- Leggetter, Woodland
- 1995
Citation Context: … index refers to a Gaussian distribution. Without loss of generality, we only explore the case of a global transformation matrix. By hypothesis, the covariance is a diagonal matrix. The ML estimate [2] for the MLLR row is given in eq. (2); rearranging the terms of eq. (2) as in [3], we obtain …
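The ML estimate of an MLLR row referenced in this snippet has a standard closed form in the diagonal-covariance case, from Leggetter & Woodland [2]. The notation below (occupancies \(\gamma\), extended means \(\xi\)) is the conventional one and may differ from the paper's own symbols, which were lost in extraction.

```latex
% Standard MLLR row update under diagonal covariances [2].
% w_i: row i of the transform W;  \xi_g: extended mean of Gaussian g;
% \gamma_g(t): occupancy;  o_i(t): i-th feature;  \sigma_{g,i}^2: variance.
w_i^{\top} = G_i^{-1} k_i, \qquad
G_i = \sum_{g}\sum_{t} \frac{\gamma_g(t)}{\sigma_{g,i}^2}\,\xi_g \xi_g^{\top}, \qquad
k_i = \sum_{g}\sum_{t} \frac{\gamma_g(t)\, o_i(t)}{\sigma_{g,i}^2}\,\xi_g .
```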

102 | Rapid speaker adaptation in eigenvoice space
- Kuhn, Junqua, et al.
- 2000
Citation Context: … MLLR rows are Gaussian with the stated mean and precision. 1.2. Eigenvoices with MLLR-adapted models. To be effective for fast speaker adaptation, we choose to reduce the dimensionality of the problem [4]. We define the set of speaker transformation parameters by stacking all rows to form a supervector. We postulate that speaker sup…
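The supervector construction described in this snippet — stacking all rows of a speaker's MLLR matrix into one long vector — can be sketched as below; the feature dimension `d = 4` is an illustrative assumption.

```python
import numpy as np

d = 4  # assumed feature dimension
# An MLLR transform is d x (d + 1): d rows, each with a bias element.
W = np.arange(d * (d + 1), dtype=float).reshape(d, d + 1)

# Stack all rows into a single supervector of dimension d * (d + 1).
supervector = W.reshape(-1)

# The flattening is lossless: the matrix is recovered by reshaping.
W_back = supervector.reshape(d, d + 1)
```

Collecting one such supervector per training speaker yields the matrix whose SVD defines the eigenspace.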

61 | Cluster adaptive training of hidden Markov models
- Gales
- 2000
Citation Context: … build the eigenspace. However, for large-vocabulary applications, building these models is difficult because of data sparsity and memory requirements. In practice, most systems use MLLR-adapted models [1]. MLLR transforms the model means by a matrix; the feature space has dimension d, and each row of the transform has dimension d + 1 …

22 | Maximum likelihood eigenspace and MLLR for speech recognition in noisy environments
- Nguyen, Wellekens, et al.
- 1999
Citation Context: … successive projections and mean-square estimation steps become apparent. In the root space, the inverse correlation may be computed offline. 1.6. Reestimation of the eigenspace. As with CAT [1] and MLES [6], we can reestimate the eigenspace in the Baum-Welch algorithm. If we reestimate the eigenspace, the solution may not retain orthogonality of the eigenvectors. We embed the eigen decompositions of speaker …
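The snippet notes that reestimating the eigenspace inside Baum-Welch may not preserve orthogonality of the eigenvectors. One common remedy — an assumption here, not necessarily the paper's method — is to re-orthonormalize the updated basis, e.g. with a QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in for a reestimated eigenvoice basis (5 vectors of dim 40)
# that is no longer orthogonal after the update.
E = rng.normal(size=(5, 40))

# QR of the transpose gives orthonormal columns spanning the same space;
# transposing back recovers an orthonormal row basis.
q, _ = np.linalg.qr(E.T)
E_ortho = q.T
```

An SVD of `E` would serve equally well; QR is simply cheaper when only an orthonormal spanning set is needed.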

8 | EWAVES: an efficient decoding algorithm for lexical tree based speech recognition
- Nguyen, Rigazio, et al.
Citation Context: … diagonal covariances, pooled in 1500 mixtures. The language model (LM) for this task is the standard trigram model provided by MIT. There are about 20k words for decoding. Our recognizer, called EWAVES [8], is a lexical-tree-based, gender-independent, word-internal context-dependent, one-pass trigram Viterbi decoder with bigram LM lookahead. The system runs at about 3 times real time, with a search eff…

7 | Rapid speaker adaptation using a priori knowledge by Eigenspace analysis of MLLR parameters
- Wang, Lee, et al.
- 2001
Citation Context: … decomposition. For the case of the inverse space transformation, the solution can be obtained by direct differentiation of the objective in equation 2. This leads to an inefficient implementation. As noted in [5], one can follow the Markov chain of sufficient statistics (eq. 18). The inverse space and root space transformations have respectively: …

4 | Using maximum likelihood linear regression for segment clustering and speaker identification
- Bacchiani
Citation Context: … with a diagonal covariance matrix. The ML estimate [2] for the MLLR row elements is given in eq. (2); rearranging the terms of eq. (2) as in [3], we obtain a quadratic form, where the sum is over all rows of the transformation matrix. Eq. (6) state…