CONSTRUCTION OF MODEL-SPACE CONSTRAINTS
HMM systems exhibit a large amount of redundancy. To this end, a technique called Eigenvoices was found to be very effective for speaker adaptation. The correlation between HMM parameters is exploited via a linear constraint called the eigenspace. This constraint is obtained through a PCA analysis of the training speakers. In this paper, we show how PCA can be linked to the maximum-likelihood criterion. Then, we extend the method to LDA transformations and piecewise linear constraints. On the Wall Street Journal (WSJ) dictation task, we obtain a 1.7% WER improvement (15% relative) when using self-adaptation.

1. OPTIMAL ESTIMATION OF THE EIGENSPACE

In this section, we show that the expected log-likelihood of the data is related to a sum of squared Euclidean distances in the model space. This justifies using the SVD to compute the eigenspace. First, we will show that the log-likelihood of rows of MLLR matrices defines a quadratic form. Then, we define a proper normalization to reduce the ML problem to a standard least-squares problem, which can be solved by SVD.

1.1. Gaussianity of MLLR rows

Speaker-dependent models are needed to build the eigenspace. However, for large-vocabulary applications, building these models is difficult because of data sparsity and memory requirements. In practice, most systems use MLLR-adapted models [1]. MLLR transforms model means by a matrix
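The eigenspace estimation described in this abstract can be illustrated with a minimal sketch. It assumes each training speaker is already summarized as one flat parameter vector (for instance, stacked rows of an MLLR matrix) and uses plain, unweighted PCA computed through the SVD. The normalization that the paper uses to tie PCA to the maximum-likelihood criterion is not reproduced here, and the names `estimate_eigenspace` and `project` are hypothetical.

```python
import numpy as np

def estimate_eigenspace(speaker_vectors, n_components):
    """Return (mean, basis); basis rows are the leading principal directions.

    speaker_vectors: array of shape (n_speakers, dim), one row per speaker.
    This is ordinary PCA via SVD, not the paper's ML-normalized variant.
    """
    X = np.asarray(speaker_vectors, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal right singular vectors of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(speaker_vector, mean, basis):
    """Least-squares projection of one speaker vector onto the eigenspace."""
    weights = basis @ (np.asarray(speaker_vector, dtype=float) - mean)
    return mean + basis.T @ weights
```

A new speaker is then described by the few entries of `weights` instead of the full parameter vector, which is the linear constraint the abstract refers to.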
LU FACTORIZATION FOR FEATURE TRANSFORMATION Patrick Nguyen
Linear feature space transformations are often used for speaker or environment adaptation. Usually, numerical methods are sought to obtain solutions. In this paper, we derive a closed-form solution to ML estimation of full feature transformations. Closed-form solutions are desirable because the problem is quadratic and thus blind numerical analysis may converge to poor local optima. We decompose the transformation into upper and lower triangular matrices, which are estimated alternately using the EM algorithm. Furthermore, we extend the theory to Bayesian adaptation. On the Switchboard task, we obtain a 1.6% WER improvement by combining the method with MLLR, or 4% absolute using adaptation.
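As a rough illustration of the triangular decomposition mentioned in this abstract, the sketch below factors a square transformation into a unit lower-triangular matrix and an upper-triangular matrix, and shows one consequence of that form: the log-determinant of the transformation reduces to a sum over the diagonal of the upper factor. This is a generic Doolittle factorization without pivoting, not the paper's EM estimation procedure, and the function names are hypothetical.

```python
import numpy as np

def lu_no_pivot(a):
    """Doolittle LU factorization: a = lower @ upper, lower has unit diagonal.

    Assumes every pivot is nonzero; this sketch does no row exchanges.
    """
    upper = np.array(a, dtype=float)
    n = upper.shape[0]
    lower = np.eye(n)
    for k in range(n - 1):
        if upper[k, k] == 0.0:
            raise ValueError("zero pivot; this sketch does not pivot")
        factors = upper[k + 1:, k] / upper[k, k]
        lower[k + 1:, k] = factors
        # Eliminate column k below the diagonal.
        upper[k + 1:] -= np.outer(factors, upper[k])
    return lower, upper

def log_abs_det(a):
    """log|det a| from the diagonal of the upper factor."""
    _, upper = lu_no_pivot(a)
    return float(np.sum(np.log(np.abs(np.diag(upper)))))
```

Because the lower factor has a unit diagonal, it contributes nothing to the determinant, which is why only the diagonal of the upper factor appears in `log_abs_det`.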
PIECEWISE LINEAR CONSTRAINTS FOR MODEL SPACE ADAPTATION Patrick
Setting linear constraints on HMM model space appears to be very effective for speaker adaptation. In doing so, we assume that model parameters are jointly Gaussian. While this approach has proven reasonably successful, we question its accuracy in the case of very high-dimensional parameter spaces. To address this problem, we employ a hierarchical piecewise linear model. Gross speaker variations are modeled with a linear eigenspace, subsuming the joint Gaussian model, and finer residues are modeled using another eigenspace chosen depending on the location of the first values. We perform experiments on the Wall Street Journal (WSJ) dictation task, and we observe a cumulative 1.3% WER improvement (11% relative) when using self-adaptation.

1. EIGENVOICES WITH MLLR MODELS

Using the eigenvoices approach in combination with MLLR is not a new idea. In this section, we will briefly introduce the notation and fundamental equations used in the next sections.

1.1. Gaussianity of MLLR rows

Speaker-dependent models are needed to build the eigenspace. However, for large-vocabulary applications, building these models is difficult because of data sparsity and memory requirements. In practice, most systems use MLLR-adapted models [1]. MLLR transforms model means by a matrix: We are concerned with the adaptation of mean vectors, with diagonal covariance matrices. The expected log-likelihood after the E-step of the Baum-Welch algorithm is
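The hierarchical idea in this abstract (a coarse eigenspace, then a residual eigenspace selected by where the coarse coordinates fall) can be sketched as follows. The region rule used here, the sign of the first coarse coordinate, is purely an assumption for illustration; the abstract does not say how regions are chosen. All function names are hypothetical, and plain PCA stands in for whatever estimation the paper actually uses.

```python
import numpy as np

def pca(X, k):
    """Mean and top-k principal directions (as rows) of the rows of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def fit_piecewise(X, k_global, k_local):
    """Fit a coarse eigenspace plus one residual eigenspace per region."""
    X = np.asarray(X, dtype=float)
    mean, basis = pca(X, k_global)
    coords = (X - mean) @ basis.T
    residual = (X - mean) - coords @ basis
    # Hypothetical region rule: sign of the first coarse coordinate.
    regions = (coords[:, 0] >= 0).astype(int)
    local = {int(r): pca(residual[regions == r], k_local)
             for r in np.unique(regions)}
    return mean, basis, local

def reconstruct(x, mean, basis, local):
    """Coarse projection, then a region-specific correction of the residue."""
    coords = basis @ (x - mean)
    coarse = mean + basis.T @ coords
    l_mean, l_basis = local[int(coords[0] >= 0)]
    res = (x - coarse) - l_mean
    return coarse + l_mean + l_basis.T @ (l_basis @ res)
```

On the training vectors, the total squared reconstruction error of this two-stage model cannot exceed that of the coarse eigenspace alone, since each local model is a least-squares fit to the residues in its region.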
unknown title
In this paper, we summarize systems submitted by PSTL to the evaluation. We ran Meta-Data (MD) on Switchboard (SWB) and Broadcast News (BN) data. Speech-to-text systems were built and tested on both SWB and BN systems with limited real-time constraints. For our first participation, our systems were characterized by low complexity, exploratory operating conditions, and small resources. MD systems served as the segmentation / clustering stage of STT recognizers. Recognizers performed trigram Viterbi decoding with word-internal triphone models. MLLR adaptation followed optionally. STT results underlined a relatively favourable benchmark on BN and inauspicious SWB evaluation.
SELF-ADAPTATION USING EIGENVOICES FOR LARGE-VOCABULARY CONTINUOUS SPEECH RECOGNITION Patrick Nguyen
In this paper, we present the application of eigenvoices to self-adaptation. This adaptation algorithm happens to be rather well suited for such a task. First, it is an extremely fast adaptation algorithm, and thus well tailored to work with very short amounts of adaptation data. It is also believed to be rather more tolerant of errorful recognition. A third property is the explicit aim to reduce the dimensionality, which translates into compact computation of the likelihood. This can be exploited as an embedded confidence measure to minimize the impact of errors in the transcription. Our experiments were carried out on the Wall Street Journal (WSJ) evaluation task. We reduced our word error rate (WER) by one percent absolute to 9.7%.

