In this paper we report on the LIMSI Wall Street Journal system, which was evaluated in the November 1993 test. The recognizer uses continuous density HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on newspaper texts for language modeling. Decoding is carried out in two forward acoustic passes. The first pass is a time-synchronous graph search, which is shown to remain viable with vocabularies of up to 20k words when used with bigram back-off language models. The second pass, which makes use of a word graph generated with the bigram, incorporates a trigram language model. Acoustic modeling uses cepstrum-based features, context-dependent phone models (intra- and interword), phone duration models, and sex-dependent models. The official Nov93 evaluation results are given for vocabularies of up to 64,000 words, as well as results on the Nov92 5k and 20k test material.

1. Introduction

Our speech recognition research focuses on developing reco...
In Proc. 1994 ARPA Spoken Language Technology Workshop