### Table 2. Word error rates for full covariance models with state dependent quadratic feature space transforms.

"... In PAGE 4: ... Further exploring the use of a quadratic feature space transform, we considered using a different transform for each HMM state. Table 2 shows the results with 680 and 10K Gaussians, respectively. In the case of 680 Gaussians, where each Gaussian has its own quadratic feature transform qj(Ajx), there was a substantial gain over the baseline full covariance model with 680 Gaussians. ... ..."
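The idea of a per-Gaussian quadratic feature transform qj(Ajx) can be sketched as follows. This is a toy illustration, not the cited paper's implementation: it assumes the transform is a quadratic polynomial expansion applied after a Gaussian-specific linear map `A`, with the Gaussian then evaluated in that expanded space.

```python
import numpy as np

def quad_expand(x):
    """Quadratic expansion of a feature vector:
    [x, all unique pairwise products x_i * x_j with i <= j]."""
    d = len(x)
    pairs = [x[i] * x[j] for i in range(d) for j in range(i, d)]
    return np.concatenate([x, pairs])

def per_gaussian_loglik(x, A, mu, Sigma):
    """Log-likelihood of x under one full-covariance Gaussian,
    evaluated in that Gaussian's own transformed space q(A @ x).
    A is the Gaussian-specific linear map; mu, Sigma live in the
    expanded space of dimension d + d*(d+1)/2."""
    z = quad_expand(A @ x)
    diff = z - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (len(z) * np.log(2 * np.pi) + logdet + quad)
```

For d-dimensional input the expanded space has dimension d + d(d+1)/2, which is why each Gaussian's covariance is modeled there rather than in the raw feature space.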

### Table 1: Error rates on swb98 for diagonal models in the CBBTCCBC feature space built with various BIC penalty weights.

2003

"... In PAGE 3: ... In fact, we varied the penalty weight and generated models of various sizes, and tested all of these models on the swb98 test data in the CBBTCCBC feature space. The results are presented in Table 1. As can be seen, the model which performed best had BFBHBJC3 Gaussians and an error rate of BFBEBMBGB1, with larger models becoming overtrained. ... In PAGE 3: ... In order to explore the benefits of using full covariance modeling, and to have a model to be used for training the SPAM basis CUCBCZCV for various BW, we next created a full covariance model with BIBDC3 Gaussians in the CBBTCCBC feature space, seeded by the diagonal model in the last line of Table 1. (The choice of BIBDC3 Gaussians, rather than a larger model, was made because of implementation constraints. ...) ..."
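Model selection with a weighted BIC penalty, as described above, amounts to penalizing log-likelihood by a tunable multiple of the usual complexity term. A minimal sketch (the function and candidate-list format are illustrative, not from the paper):

```python
import numpy as np

def bic_score(log_likelihood, n_params, n_frames, penalty_weight=1.0):
    """BIC with a tunable penalty weight: larger weights favour
    smaller models, smaller weights permit more Gaussians."""
    return log_likelihood - penalty_weight * 0.5 * n_params * np.log(n_frames)

def pick_model(candidates, n_frames, penalty_weight=1.0):
    """candidates: list of (name, log_likelihood, n_params) tuples.
    Returns the name of the model with the best penalized score."""
    return max(candidates,
               key=lambda c: bic_score(c[1], c[2], n_frames, penalty_weight))[0]
```

Sweeping `penalty_weight` and evaluating each resulting model on held-out data, as the excerpt describes, guards against the overtraining seen with the larger models.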

Cited by 7

### Table 2: Word Error Rate (percent) with Feature Space Bias Estimation

1996

"... In PAGE 21: ... It is seen from Figure 5 that the EM algorithm converges within two iterations. Table 2 gives the percentage word error rates for speakers A and B under mismatched conditions after processing with two feature space bias estimation approaches: 1) a single bias vector is estimated for the entire utterance (FS1), and 2) a separate vector is estimated for speech and silence frames (FS2). For reference, we also reproduce from Table 1 the mismatched performance (MIS) and the matched performance (MAT). ... In PAGE 23: ... In Table 3, we reproduce the results of Table 2, and also give the results for two model space bias estimation procedures: 1) a single mean and variance vector is estimated for the entire utterance (MS1), and 2) a separate mean and variance vector is estimated for speech and silence frames (MS2). As in the feature space results (Table 2), the results of Table 3 show that, in the model space too, estimating separate speech and silence bias parameters (MS2) improves the performance for both speakers in the telephone speech (TEL), when compared to estimating one set of bias parameters (MS1). Again, for the microphone speech (MIC), separate speech and silence bias parameter estimates did not result in additional improvement. ... ..."

Cited by 80

### Table 2. IHM improvements over the system developed in [1] on lectDEV. First pass with incremental VTLN and feature-space constrained MLLR (FSA) estimation and a frame shift of 10 ms, second pass with static VTLN, FSA and MLLR and 8 ms frame shift.

"... In PAGE 6: ... were applied to the CHIL training data with a weight of 4.0. Comparing the resulting system to the system used in [1], we improved our second pass result by 1.4% absolute (see Table 2, second row). For the conference meeting system, we used exactly the same acoustic models, except for one difference: the PRONLEX system was additionally adapted using MAP with a weight of 0.... ..."

### Table 2. Context feature space (Schmidt 2002).

"... In PAGE 32: ... In the work model for context by Schmidt (Schmidt 2002) there is a set of relevant features for each context, and for each relevant feature a range of determined values. The context feature space is assumed to be categorical and hierarchically organised, as presented in Table 2. There are two main categories with three subcategories each. ... In PAGE 100: ... f the repetitions are classed to a particular cluster. Table 8 shows the results. It is shown that scenarios 1 and 2 are quite similar, with little similarity between scenarios 3, 4, and 5. This is because scenarios 1 and 2 are actually quite similar, and the other scenarios differ from each other (Table 2, Publication V). Table 8. ... ..."

### Table 1: WER comparison of EMLLT bases obtained by stacking to ML trained ones, for various basis sizes.

"... Table 1 compares EMLLT models with ML trained bases to those obtained by stacking MLLT matrices for various phone classes. The latter method is restricted to generating basis sizes D that are multiples of the feature space dimension d. Preliminary experiments on the method used to generate the phone classes (manual vs. data-driven) showed no significant differences in performance. We see that ML training of bases essentially halves the number of parameters required to model the covariance. The last row of the table gives the performance of a full covariance model. The first row is an MLLT model, so both methods of training coincide. ..."

2003

"... In PAGE 4: ... Table 1. WER comparison of EMLLT bases obtained by stacking to ML trained ones for various basis sizes. The next table shows the flexibility (we can now have D < d) and benefits of using an affine subspace in the rank-one case. With D = d, the standard MLLT system in Table 1 is about 10% worse than the D = d affine rank-one subspace system in Table 2. Using an affine subspace incurs no extra cost in the evaluation of Gaussians. ... ..."
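The "stacking" construction discussed above can be illustrated in a few lines. This is a simplified sketch, not the cited system: it assumes the basis directions are the rows of the per-phone-class MLLT matrices (so D is forced to be a multiple of d), and that each Gaussian's precision matrix is a weighted sum of rank-one terms built from those directions:

```python
import numpy as np

def stacked_basis(mllt_matrices):
    """Build an EMLLT direction basis by stacking the rows of several
    d x d MLLT matrices; the basis size D = (number of matrices) * d."""
    return np.vstack(mllt_matrices)  # shape (D, d)

def emllt_precision(basis, weights):
    """Rank-one EMLLT precision model: P = sum_k w_k * a_k a_k^T,
    where a_k are the rows of `basis` and w_k the per-Gaussian weights."""
    return (basis.T * weights) @ basis
```

ML training of the basis directions removes the multiple-of-d restriction on D, which is how the smaller ML-trained bases in the table match the stacked ones with roughly half the parameters.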

Cited by 9

### Table 1. Comparison of PCA vs. ML trained bases for mean and feature-space speaker adaptation. At test time the adaptation parameters were estimated with the MAP objective under a single Gaussian prior.

"... In PAGE 3: ... Mean and covariance of d(s) (for training speakers) formed the mean and covariance of the single Gaussian prior used in MAP estimation of d(s) [4] at test time. Table 1 shows word error rate comparisons for adaptation in the PCA vs. ML trained basis for various amounts of adaptation data. ... ..."
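MAP estimation of an adaptation parameter under a single Gaussian prior, as in the excerpt, interpolates between the prior mean and the data estimate. A minimal sketch under simplifying assumptions (diagonal prior and observation variances, d(s) treated as a per-speaker mean offset; none of this notation beyond d(s) is from the paper):

```python
import numpy as np

def map_offset(residuals, prior_mean, prior_var, obs_var):
    """MAP estimate of a per-speaker mean offset d(s) under a single
    Gaussian prior N(prior_mean, diag(prior_var)), with per-dimension
    observation variance obs_var. residuals: (n, d) array of
    frame-minus-model-mean residuals for the adaptation data."""
    n = residuals.shape[0]
    precision = 1.0 / prior_var + n / obs_var       # posterior precision
    numerator = prior_mean / prior_var + residuals.sum(axis=0) / obs_var
    return numerator / precision
```

With no adaptation data the estimate falls back to the prior mean, and with ample data it approaches the ML estimate, which is why MAP in a trained basis stays robust for the small adaptation amounts compared in the table.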