Recent advances in the automatic recognition of audio-visual speech (2003)
| Venue: | PROC. IEEE |
| Citations: | 64 - 10 self |
BibTeX
@INPROCEEDINGS{Potamianos03recentadvances,
author = {Gerasimos Potamianos and Chalapathy Neti and Guillaume Gravier and Ashutosh Garg and Andrew W. Senior},
title = {Recent advances in the automatic recognition of audio-visual speech},
booktitle = {PROC. IEEE},
year = {2003},
pages = {1306--1326},
}
Abstract
Visual speech information from the speaker’s mouth region has been shown to improve the noise robustness of automatic speech recognizers, thus promising to extend their usability in the human-computer interface. In this paper, we review the main components of audio-visual automatic speech recognition and present novel contributions in two main areas: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and second, audio-visual speech integration. On the latter topic, we discuss new work on combining feature and decision fusion, the modeling of audio-visual speech asynchrony, and the incorporation of modality reliability estimates into the bimodal recognition process. We also briefly touch upon the issue of audio-visual adaptation. We apply our algorithms to three multi-subject bimodal databases, ranging from small- to large-vocabulary recognition tasks, recorded in both visually controlled and visually challenging environments. Our experiments demonstrate that the visual modality improves automatic speech recognition over all conditions and data considered, though less so for visually challenging environments and large-vocabulary tasks.







