Results 1 - 10 of 13
Extraction of Visual Features for Lipreading
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2002
"... The multi-modal nature of speech is often ignored in human-computer interaction but lip deformation, and other body such as head and arm motion all convey additional infor-mation. We integrate speech cues from many sources and this improves intelligibility, es-pecially when the acoustic signal is de ..."
Abstract
-
Cited by 101 (7 self)
- Add to MetaCart
(Show Context)
The multi-modal nature of speech is often ignored in human-computer interaction, but lip deformation and other body motion, such as head and arm movement, all convey additional information. We integrate speech cues from many sources and this improves intelligibility, especially when the acoustic signal is degraded. This paper shows how this additional, often complementary, visual speech information can be used for speech recognition. Three methods for parameterising lip image sequences for recognition using hidden Markov models are compared. Two of these are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape, or of shape and appearance respectively. The third, bottom-up, method uses a non-linear scale-space analysis to form features directly from the pixel intensity. All methods are compared on a multi-talker visual speech recognition task of isolated letters.
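The "principal component analysis of shape" step in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the landmark count, frame count, and number of retained modes are arbitrary assumptions, and the landmark data is randomly generated as a stand-in for tracked lip contours.

```python
import numpy as np

# Hypothetical data: 200 video frames, each with 20 (x, y) lip-contour
# landmarks flattened to a 40-dimensional shape vector. In the paper the
# vectors would come from a fitted inner/outer lip contour model; here we
# use random data purely to show the feature-extraction mechanics.
rng = np.random.default_rng(0)
shapes = rng.normal(size=(200, 40))

# PCA of shape: centre the vectors, form the covariance matrix, and keep
# the eigenvectors with the largest eigenvalues as "modes of variation".
mean_shape = shapes.mean(axis=0)
centred = shapes - mean_shape
cov = centred.T @ centred / (len(shapes) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort to descending variance
modes = eigvecs[:, order[:10]]           # retain 10 modes (illustrative)

# Lipreading feature vector for each frame: the projection of its centred
# shape onto the retained modes. These per-frame vectors would then be fed
# to a hidden Markov model for recognition.
features = centred @ modes               # shape (200, 10)
```

The same recipe extends to the shape-and-appearance variant by concatenating grey-level samples from inside the lip contour onto each shape vector before the PCA.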
Lipreading Using Shape, Shading And Scale
- IN AUDITORY-VISUAL SPEECH PROCESSING
, 1998
"... This paper compares three methods of lipreading for visual and audio-visual speech recognition. Lip shape information is obtained using an Active Shape Model (ASM) lip tracker but is not as effective as modelling the combined shape and enclosed greylevel surface using an Active Appearance Model (AAM ..."
Abstract
-
Cited by 23 (2 self)
- Add to MetaCart
This paper compares three methods of lipreading for visual and audio-visual speech recognition. Lip shape information is obtained using an Active Shape Model (ASM) lip tracker but is not as effective as modelling the combined shape and enclosed greylevel surface using an Active Appearance Model (AAM). A nontracked alternative is a nonlinear transform of the image using a multiscale spatial analysis (MSA). This performs almost identically to AAMs in both visual and audio-visual recognition tasks on a multi-talker database of isolated letters.
Signing for the deaf using virtual humans
, 2000
"... predominantly by the Independent Television Commission and more recently by the UK Post Office also, has investigated the feasibility of using virtual signing as a communication medium for presenting information to the Deaf. We describe and demonstrate the underlying virtual signer technology, and d ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
(Show Context)
predominantly by the Independent Television Commission and more recently by the UK Post Office also, has investigated the feasibility of using virtual signing as a communication medium for presenting information to the Deaf. We describe and demonstrate the underlying virtual signer technology, and discuss the language processing techniques and discourse models which have been investigated for information communication in a transaction application in Post Offices, and for presentation of more general textual material such as subtitles accompanying television programmes. 1 Background. Recent advances in multi-media technology have led to an increased interest in virtual humans. Rendered off-line, they are regularly used in the entertainment industry. In addition, standards are emerging for driving moving virtual humans over networks [1, 10]. In particular, MPEG-4 (Version 2) provides two alternatives, "The Body" [4] and, through the adoption of VRML as a multimedia object, H-anim [8]. To deliver readable sign language the virtual human has to present movements, gestures and expressions
Visual Speech: A Physiological Or Behavioural Biometric?
, 2001
"... This paper addresses an issue concerning the current classification of biometrics into either physiological or behavioural. We offer clarification on this issue and propose additional qualifications for a biometric to be classed as behavioural. It is observed that dynamics play a key role in the qu ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
This paper addresses an issue concerning the current classification of biometrics into either physiological or behavioural. We offer clarification on this issue and propose additional qualifications for a biometric to be classed as behavioural. It is observed that dynamics play a key role in the qualification of these terminologies. These points are illustrated by practical experiments based around visual speech. Two sets of speaker recognition experiments are considered: the first uses lip profiles as both a physiological and a behavioural biometric; the second uses the inherent dynamics of visual speech to locate key facial features. Experimental results using short, consistent test and training segments from video recordings give the following recognition error rates: physiological - lips 2%, face circles 11%; behavioural - lips 15%, voice 11%.
Visual Speech for Speaker Recognition and Robust Face Detection
, 2001
"... This thesis is in two parts. Part I considers 'Visual Speech for Speaker Recognition , and Part II considers Face DetectionL Both parts are connected via the human face. In Part I ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
This thesis is in two parts. Part I considers 'Visual Speech for Speaker Recognition', and Part II considers 'Face Detection'. Both parts are connected via the human face. In Part I
Lipreading Using Shape, Shading and Scale
- ISCA Archive
"... This paper compares three methods of lipreading for visual and audio-visual speech recognition. Lip shape information is obtained using an Active Shape Model (ASM) lip tracker but is not as effective as modelling the combined shape and enclosed greylevel surface using an Active Appearance Model (AAM ..."
Abstract
- Add to MetaCart
(Show Context)
This paper compares three methods of lipreading for visual and audio-visual speech recognition. Lip shape information is obtained using an Active Shape Model (ASM) lip tracker but is not as effective as modelling the combined shape and enclosed greylevel surface using an Active Appearance Model (AAM). A nontracked alternative is a nonlinear transform of the image using a multiscale spatial analysis (MSA). This performs almost identically to AAMs in both visual and audio-visual recognition tasks on a multi-talker database of isolated letters.
unknown title
"... Towards a low bandwidth talking face using appearance models. ..."