Incremental Language Models For Speech Recognition Using Finite-State Transducers
- in Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio, 2001
Cited by 10 (2 self)
to speech recognition, we investigate a novel decoding strategy to deal with very large n-gram language models often used in large-vocabulary systems. In particular, we present an alternative to full, static expansion and optimization of the finite-state transducer network. This alternative is useful when the individual knowledge sources, modeled as transducers, are too large to be composed and optimized. While the recognition decoder perceives a single, weighted finite-state transducer, we apply a divide-and-conquer technique to split the language model into two parts which add up exactly to the original language model. We investigate the merits of these 'incremental language models' and present some initial results.
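The abstract states only that the two parts "add up exactly" to the original language model; it does not say how the split is made. As a minimal sketch of one way such a split could work, the Python below assumes a small unigram part plus a residual correction, with weights expressed as negative log probabilities. The toy probabilities and all function names are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy bigram model P(word | previous word); not from the paper.
BIGRAM = {
    ("the", "cat"): 0.5,
    ("the", "dog"): 0.25,
    ("a", "cat"): 0.125,
}

# Hypothetical unigram model, assumed here to be the small first part.
UNIGRAM = {"cat": 0.25, "dog": 0.125}


def cost(probability):
    """Negative log probability, a common arc weight in weighted transducers."""
    return -math.log(probability)


def full_cost(prev, word):
    """Cost assigned by the original, unsplit language model."""
    return cost(BIGRAM[(prev, word)])


def first_part(word):
    """Cost from the small part (assumed: a unigram model)."""
    return cost(UNIGRAM[word])


def second_part(prev, word):
    """Residual cost, defined so that both parts sum to the original cost."""
    return full_cost(prev, word) - first_part(word)


def split_cost(prev, word):
    """Total cost seen by a decoder that applies both parts in sequence."""
    return first_part(word) + second_part(prev, word)
```

Because the residual is defined as the difference between the full cost and the first part, the sum matches the original model for every n-gram up to floating-point rounding, whatever first part is chosen.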
Automatic Transcription Of English Broadcast News
- Proc. of the DARPA Broadcast News Transcription and Understanding Workshop, 1998
Cited by 2 (1 self)
In this paper the Philips Broadcast News transcription system is described. The Broadcast News task aims at the recognition of "found" speech in radio and television broadcasts without any additional side information (e.g. speaking style, background conditions). The system was derived from the Philips continuous mixture density crossword HMM system, using MFCC features and Laplacian densities. A segmentation was performed to obtain sentence-like partitions of the broadcasts. Using data-driven clustering, the obtained segments were grouped into clusters with similar acoustic conditions for adaptation purposes. Gender-independent word-internal and crossword triphone models were trained on 70 hours of the HUB4 training data. No focus-condition-specific training was applied. Channel and speaker normalization was done by mean and variance normalization as well as VTN and MLLR. The transcription was produced by an adaptive multiple pass decoder starting with phrase-bigram decoding using word-...
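The mean and variance normalization mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the statistics are computed over one group of feature frames; whether the Philips system computed them per segment, per cluster, or per speaker is not stated here. The function name and the toy frames are hypothetical.

```python
import math


def mean_variance_normalize(frames):
    """Scale each feature dimension to zero mean and unit variance.

    `frames` is a list of equal-length feature vectors (for example MFCCs).
    A dimension with zero variance is only mean-shifted, to avoid dividing by zero.
    """
    count = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / count for d in range(dims)]
    stds = []
    for d in range(dims):
        variance = sum((frame[d] - means[d]) ** 2 for frame in frames) / count
        stds.append(math.sqrt(variance) if variance > 0 else 1.0)
    return [
        [(frame[d] - means[d]) / stds[d] for d in range(dims)]
        for frame in frames
    ]


# Hypothetical two-dimensional features for three frames.
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
normalized = mean_variance_normalize(frames)
```

After normalization, each dimension of `normalized` has zero mean, and every dimension that varied in the input has unit variance, which removes constant offsets and scale differences between recording conditions.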

