Results 1–8 of 8
LARGE CORPUS EXPERIMENTS FOR BROADCAST NEWS RECOGNITION, 2003
Abstract

Cited by 2 (0 self)
This paper investigates the use of a large corpus for the training of a Broadcast News speech recognizer. A vast body of speech recognition algorithms and mathematical machinery is aimed at smoothing estimates toward accurate modeling with scant amounts of data. In most cases, this research is motivated by a real need for more data. In Broadcast News, however, a large corpus is already available to all LDC members; until recently, it has not been considered for acoustic training. We would like to pioneer the use of the largest speech corpus available (1200 h) for the acoustic training of speech recognition systems. To the best of our knowledge, it is the largest-scale acoustic training ever considered in speech recognition. We obtain a performance improvement of 1.5% absolute WER over our best standard (200 h) training.
CONSTRUCTION OF MODEL-SPACE CONSTRAINTS
Abstract
HMM systems exhibit a large amount of redundancy. To this end, a technique called Eigenvoices was found to be very effective for speaker adaptation. The correlation between HMM parameters is exploited via a linear constraint called the eigenspace. This constraint is obtained through a PCA analysis of the training speakers. In this paper, we show how PCA can be linked to the maximum-likelihood criterion. We then extend the method to LDA transformations and piecewise linear constraints. On the Wall Street Journal (WSJ) dictation task, we obtain a 1.7% WER improvement (15% relative) when using self-adaptation.
1. OPTIMAL ESTIMATION OF THE EIGENSPACE
In this section, we show that the expected log-likelihood of the data is related to a sum of squared Euclidean distances in the model space. This justifies using the SVD to compute the eigenspace. First, we show that the log-likelihood of rows of MLLR matrices defines a quadratic form. Then, we define a proper normalization that reduces the ML problem to a standard least-squares problem, which can be solved by SVD.
1.1. Gaussianity of MLLR rows
Speaker-dependent models are needed to build the eigenspace. However, for large-vocabulary applications, building these models is difficult because of data sparsity and memory requirements. In practice, most systems use MLLR-adapted models [1]. MLLR transforms model means by a matrix
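As a rough illustration of the SVD step this abstract describes, the following NumPy sketch stacks per-speaker parameter supervectors (e.g. vectorized MLLR rows), centers them, and takes the top singular vectors as the eigenspace basis. The function names and the toy data are ours, not the paper's.

```python
import numpy as np

def build_eigenspace(speaker_vectors, k):
    """Center the stacked per-speaker supervectors and take the top-k
    right singular vectors as the eigenspace basis (PCA via SVD)."""
    X = np.asarray(speaker_vectors, dtype=float)   # (n_speakers, dim)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                            # basis: (k, dim)

def project(vector, mean, basis):
    """Reconstruct a speaker vector from its eigenspace coordinates
    (a least-squares projection, since the basis rows are orthonormal)."""
    w = basis @ (vector - mean)                    # low-dim coordinates
    return mean + basis.T @ w

# toy data: 20 hypothetical speakers, 6-dimensional supervectors
rng = np.random.default_rng(0)
speakers = rng.normal(size=(20, 6))
mean, basis = build_eigenspace(speakers, k=3)
recon = project(speakers[0], mean, basis)
```

Any vector already lying in the span of the basis (plus the mean) is reconstructed exactly, which is the sense in which the projection is optimal in least squares.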
LU FACTORIZATION FOR FEATURE TRANSFORMATION Patrick Nguyen
Abstract
Linear feature-space transformations are often used for speaker or environment adaptation. Usually, numerical methods are sought to obtain solutions. In this paper, we derive a closed-form solution to ML estimation of full feature transformations. Closed-form solutions are desirable because the problem is quadratic, and blind numerical optimization may converge to poor local optima. We decompose the transformation into upper and lower triangular matrices, which are estimated alternately using the EM algorithm. Furthermore, we extend the theory to Bayesian adaptation. On the Switchboard task, we obtain a 1.6% WER improvement by combining the method with MLLR, or 4% absolute using adaptation.
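The paper's contribution is the alternating EM estimation of the two triangular factors; the sketch below only illustrates the decomposition itself, a plain Doolittle LU factorization (no pivoting) of a full transform into a unit-lower factor and an upper factor, so that the transform can be applied in two triangular steps. The example matrix is arbitrary.

```python
import numpy as np

def lu_factor_transform(A):
    """Doolittle LU factorization without pivoting: A = L @ U with
    L unit-lower-triangular and U upper-triangular. Assumes the
    leading principal minors of A are nonzero."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(i, n):                     # fill row i of U
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        for j in range(i + 1, n):                 # fill column i of L
            L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
    return L, U

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])                        # a toy feature transform
L, U = lu_factor_transform(A)
x = np.array([1.0, 2.0])                          # a toy feature vector
y = L @ (U @ x)                                   # apply in two triangular steps
```

Fixing one factor and estimating the other keeps each subproblem triangular, which is what makes the alternating closed-form updates in the paper tractable.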
PIECEWISE LINEAR CONSTRAINTS FOR MODEL SPACE ADAPTATION Patrick
Abstract
Setting linear constraints on the HMM model space appears to be very effective for speaker adaptation. In doing so, we assume that model parameters are jointly Gaussian. While this approach has proven reasonably successful, we question its accuracy in the case of very high-dimensional parameter spaces. To address this problem, we employ a hierarchical piecewise linear model. Gross speaker variations are modeled with a linear eigenspace, subsuming the joint Gaussian model, and finer residues are modeled using another eigenspace chosen according to the location in the first one. We perform experiments on the Wall Street Journal (WSJ) dictation task, and we observe a cumulative 1.3% WER improvement (11% relative) when using self-adaptation.
1. EIGENVOICES WITH MLLR MODELS
Using the eigenvoices approach in combination with MLLR is not a new idea. In this section, we briefly introduce the notation and fundamental equations used in the next sections.
1.1. Gaussianity of MLLR rows
Speaker-dependent models are needed to build the eigenspace. However, for large-vocabulary applications, building these models is difficult because of data sparsity and memory requirements. In practice, most systems use MLLR-adapted models [1]. MLLR transforms model means by a matrix: we are concerned with the adaptation of mean vectors with diagonal covariance matrices. The expected log-likelihood after the E-step of the Baum-Welch algorithm is
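A minimal sketch of the two-level idea in this abstract: project onto a global (coarse) eigenspace, select a residual eigenspace from the coarse coordinates, and model the residue with it. The nearest-centroid selection rule and all names here are our illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def hierarchical_project(v, mean, coarse_basis, residual_bases, centroids):
    """Two-level piecewise linear approximation of a speaker vector v:
    a global eigenspace models gross variation, then a residual
    eigenspace (picked by nearest centroid in coarse coordinates)
    models the finer residue."""
    w = coarse_basis @ (v - mean)                  # gross speaker variation
    coarse = mean + coarse_basis.T @ w
    idx = int(np.argmin(np.linalg.norm(centroids - w, axis=1)))
    r_basis = residual_bases[idx]                  # cluster-specific basis
    residue = v - coarse
    return coarse + r_basis.T @ (r_basis @ residue), idx

# toy 3-D example: one coarse direction, two hypothetical residual clusters
mean = np.zeros(3)
coarse_basis = np.array([[1.0, 0.0, 0.0]])
residual_bases = [np.array([[0.0, 1.0, 0.0]]),
                  np.array([[0.0, 0.0, 1.0]])]
centroids = np.array([[-1.0], [1.0]])
v = np.array([2.0, 0.5, 0.7])
approx, idx = hierarchical_project(v, mean, coarse_basis, residual_bases, centroids)
```

The piecewise choice lets different regions of the coarse space use different residual subspaces, which is what relaxes the single joint-Gaussian assumption.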
unknown title
Abstract
In this paper, we summarize the systems submitted by PSTL to the evaluation. We ran MetaData (MD) on Switchboard (SWB) and Broadcast News (BN) data. Speech-to-text (STT) systems were built and tested on both SWB and BN with limited real-time constraints. For our first participation, our systems were characterized by low complexity, exploratory operating conditions, and small resources. MD systems served as the segmentation/clustering stage of the STT recognizers. The recognizers performed trigram Viterbi decoding with word-internal triphone models, optionally followed by MLLR adaptation. The STT results show a relatively favourable benchmark on BN and an inauspicious SWB evaluation.
SELF-ADAPTATION USING EIGENVOICES FOR LARGE-VOCABULARY CONTINUOUS SPEECH RECOGNITION Patrick Nguyen
Abstract
In this paper, we present the application of eigenvoices to self-adaptation. This adaptation algorithm happens to be rather well-suited for such a task. First, it is an extremely fast adaptation algorithm, and thus well tailored to very short amounts of adaptation data. It is also believed to be rather more tolerant of errorful recognition. A third property is its explicit dimensionality reduction, which translates into compact computation of the likelihood. This can be exploited as an embedded confidence measure to minimize the impact of errors in the transcription. Our experiments were carried out on the Wall Street Journal (WSJ) evaluation task. We reduced our word error rate (WER) by one percent absolute, to 9.7%.
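The self-adaptation setting this abstract relies on is a generic unsupervised loop: decode with the current model, adapt on the recognizer's own (possibly errorful) hypothesis, and re-decode. The sketch below shows only that loop skeleton; `decode` and `adapt` are placeholders for a recognizer and a fast adaptation step such as eigenvoice projection, and the toy stand-ins are ours.

```python
def self_adapt(decode, adapt, audio, model, passes=2):
    """Unsupervised self-adaptation loop: the model is adapted on its
    own hypothesis, then used to decode again."""
    hyp = decode(model, audio)
    for _ in range(passes):
        model = adapt(model, audio, hyp)   # supervision = own output
        hyp = decode(model, audio)
    return hyp, model

# toy stand-ins: the "model" is a scalar pulled halfway toward the "audio"
hyp, model = self_adapt(lambda m, a: m,
                        lambda m, a, h: m + (a - m) / 2,
                        audio=8.0, model=0.0, passes=2)
```

Fast adapters matter here precisely because each pass re-runs adaptation on a short hypothesis, which is why the abstract emphasizes speed and tolerance of transcription errors.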
Departamento de Sistemas Informáticos y Computación
Abstract
In this paper, we propose a family of Viterbi algorithms specialized for lexical-tree-based FSA and HMM acoustic models. We present two algorithms to decode a tree lexicon with left-to-right models, with or without skips, and another algorithm that takes a directed acyclic graph as input and performs error-correcting decoding. They store the set of active states topologically sorted in contiguous memory queues. The number of basic operations needed to update each hypothesis is reduced, and better memory locality is obtained, reducing the expected number of cache misses and achieving a speedup over other implementations.
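A toy version of the core update: for a left-to-right HMM without skips, the active scores can be kept in one contiguous array and updated in place in reverse topological order, so no second buffer is needed. Real tree-lexicon decoders are far more involved; the shared transition scores here are a simplifying assumption of ours.

```python
import math

def viterbi_left_to_right(log_obs, log_self, log_next):
    """Viterbi for a left-to-right HMM (self-loop or advance-by-one).
    log_obs[t][s] is the observation log-likelihood of state s at time t.
    Updating s from last to first reuses the previous frame's scores
    in place, keeping the active states in one contiguous array."""
    T, S = len(log_obs), len(log_obs[0])
    score = [-math.inf] * S
    score[0] = log_obs[0][0]              # paths must start in state 0
    for t in range(1, T):
        for s in range(S - 1, -1, -1):    # reverse order: no extra buffer
            stay = score[s] + log_self
            enter = score[s - 1] + log_next if s > 0 else -math.inf
            score[s] = max(stay, enter) + log_obs[t][s]
    return score[S - 1]                   # best path ending in last state

log_half = math.log(0.5)                  # toy transition probabilities
best = viterbi_left_to_right([[0.0, 0.0], [0.0, 0.0]], log_half, log_half)
```

The reverse-order trick is the array analogue of the topologically sorted queues in the paper: every state reads only scores that have not yet been overwritten this frame.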