Audio-visual automatic speech recognition: An overview
Issues in Visual and Audio-visual Speech Processing, 2004
Cited by 41 (0 self)
We have made significant progress in automatic speech recognition (ASR) for well-defined applications like dictation and medium vocabulary transaction processing tasks in relatively controlled environments. However, ASR performance has yet to reach the level required for speech to become a truly pervasive user interface. Indeed, even in “clean” acoustic environments, and for a variety of tasks, state-of-the-art ASR system
USING LIKELIHOOD L-STATISTIC AS CONFIDENCE MEASURE IN AUDIO-VISUAL SPEECH RECOGNITION
This paper describes recent work on decision fusion in audio-visual speech recognition. A novel approach is proposed for combining information from the audio and video channels. For simplicity, we consider only the frame-level phonetic classification problem, using two single-stream Gaussian Mixture Models (GMMs). The audio and video streams are adaptively weighted using the present sample confidence value together with a cumulative mean of the sample confidence values over past frames. The confidence values for the audio and video decisions are computed using an L-statistic (a linear combination of order statistics) of the log-likelihoods against the phone models. Experiments on a database of about 15,000 sentences of large-vocabulary continuous speech show that the proposed approach yields better classification accuracy than other approaches.
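The weighting scheme in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the L-statistic coefficients nor the exact weighting formula, so the coefficient vector `coeffs`, the mixing factor `alpha`, the normalization of the two confidences into a single audio weight, and the function names are all assumptions made for illustration. With `coeffs = (1, -1)` the L-statistic reduces to the gap between the best and second-best phone log-likelihoods.

```python
def l_statistic(log_likelihoods, coeffs):
    """Linear combination of order statistics (values sorted in descending order)."""
    ordered = sorted(log_likelihoods, reverse=True)
    return sum(c * x for c, x in zip(coeffs, ordered))


def fuse_frames(audio_ll, video_ll, coeffs=(1.0, -1.0), alpha=0.5):
    """Pick one phone index per frame from two streams of per-phone log-likelihoods.

    audio_ll, video_ll: one list of per-phone log-likelihoods per frame.
    coeffs: assumed L-statistic coefficients (not taken from the paper).
    alpha: assumed mix of present confidence vs. cumulative mean of past frames.
    """
    decisions = []
    sum_a = sum_v = 0.0  # running sums of past confidence values
    for t, (la, lv) in enumerate(zip(audio_ll, video_ll)):
        ca = l_statistic(la, coeffs)  # present audio confidence
        cv = l_statistic(lv, coeffs)  # present video confidence
        # Cumulative mean over past frames; the first frame has no history,
        # so it falls back to the present value.
        mean_a = sum_a / t if t else ca
        mean_v = sum_v / t if t else cv
        sa = alpha * ca + (1 - alpha) * mean_a
        sv = alpha * cv + (1 - alpha) * mean_v
        total = sa + sv
        # Assumed normalization: audio weight in [0, 1], video gets the rest.
        w = min(1.0, max(0.0, sa / total)) if total > 0 else 0.5
        fused = [w * a + (1 - w) * v for a, v in zip(la, lv)]
        decisions.append(max(range(len(fused)), key=fused.__getitem__))
        sum_a += ca
        sum_v += cv
    return decisions
```

In this sketch a stream whose top phone stands well clear of the runner-up receives the larger weight, and the cumulative mean keeps a single noisy frame from swinging the weights abruptly.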

