Extraction of Visual Features for Lipreading
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
Cited by 36 (1 self)
The multi-modal nature of speech is often ignored in human-computer interaction, but lip deformation and other body motion, such as head and arm motion, all convey additional information. We integrate speech cues from many sources and this improves intelligibility, especially when the acoustic signal is degraded. This paper shows how this additional, often complementary, visual speech information can be used for speech recognition. Three methods for parameterising lip image sequences for recognition using hidden Markov models are compared. Two of these are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape, or of shape and appearance, respectively. The third, bottom-up, method uses a non-linear scale-space analysis to form features directly from the pixel intensity. All methods are compared on a multi-talker visual speech recognition task of isolated letters.
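As a rough illustration of what "features from a principal component analysis of shape" can mean, the sketch below projects toy lip contours onto their main mode of variation. It is not the paper's model: the four-landmark contour, the function names, and the use of power iteration in place of a full eigen-decomposition are all assumptions made for this example.

```python
import math
import random

def mouth(width, opening):
    """Hypothetical contour: left corner, top, right corner, bottom as x,y pairs."""
    return [-width, 0.0, 0.0, opening, width, 0.0, 0.0, -opening]

def mean_shape(shapes):
    """Average each coordinate over all training shapes."""
    n = len(shapes)
    return [sum(s[i] for s in shapes) / n for i in range(len(shapes[0]))]

def covariance(shapes, mean):
    """Sample covariance matrix of the centred shape vectors."""
    n, d = len(shapes), len(mean)
    centred = [[s[i] - mean[i] for i in range(d)] for s in shapes]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_mode(cov, iterations=200):
    """Power iteration: approximates the eigenvector with the largest eigenvalue."""
    d = len(cov)
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def shape_feature(shape, mean, mode):
    """Project a centred shape onto one mode of variation (a scalar feature)."""
    return sum((shape[i] - mean[i]) * mode[i] for i in range(len(mean)))

# Toy training set: fixed mouth width, opening varies from 1 to 5.
shapes = [mouth(4.0, h) for h in (1.0, 2.0, 3.0, 4.0, 5.0)]
mean = mean_shape(shapes)
mode = leading_mode(covariance(shapes, mean))

# One feature per frame; here it is proportional to (opening - 3), up to sign.
features = [shape_feature(s, mean, mode) for s in shapes]
```

In a recogniser, a sequence of such feature values (usually several modes per frame) would form the observation vectors passed to the hidden Markov models.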
Signing for the deaf using virtual humans
- In IEE Colloquium on Speech and Language Processing, 2000
Cited by 8 (2 self)
Research at Televirtual (Norwich) and the University of East Anglia, funded predominantly by the Independent Television Commission and more recently by the UK Post Office also, has investigated the feasibility of using virtual signing as a communication medium for presenting information to the Deaf. We describe and demonstrate the underlying virtual signer technology, and discuss the language processing techniques and discourse models which have been investigated for information communication in a transaction application in Post Offices, and for presentation of more general textual material in texts such as subtitles accompanying television programmes. 1 Background. Recent advances in multi-media technology have led to an increased interest in virtual humans. Rendered off-line, they are regularly used in the entertainment industry. In addition, standards are emerging for driving moving virtual humans over networks [1, 10]. In particular, MPEG-4 (Version 2) provides two alternatives, “The Body” [4] and,
Visual Speech: A Physiological Or Behavioural Biometric?
- 2001
Cited by 3 (0 self)
This paper addresses an issue concerning the current classification of biometrics into either physiological or behavioural. We offer clarification on this issue and propose additional qualifications for a biometric to be classed as behavioural. It is observed that dynamics play a key role in the qualification of these terminologies. These are illustrated by practical experiments based around visual speech. Two sets of speaker recognition experiments are considered: the first uses lip profiles as both a physiological and a behavioural biometric, the second uses the inherent dynamics of visual speech to locate key facial features. Experimental results using short, consistent test and training segments from video recordings give recognition error rates as: physiological - lips 2% and face circles 11%; behavioural - lips 15% and voice 11%.
Facial Analysis and Synthesis
- Vrije Universiteit Brussel, Dept, 2006
Cited by 2 (1 self)
To my son to remind me of my dreams; to my husband to support me in pursuing my dreams; to my mother to guide me towards my dreams; to my family and friends to tell me to believe in my dreams; to my colleagues to help me realize my dreams on a professional level.
Towards a Low Bandwidth Talking Face Using Appearance Models
- 2001
Cited by 1 (0 self)
The paper is motivated by the need to develop low bandwidth virtual humans capable of delivering audio-visual speech and sign language at a quality comparable to high bandwidth video. The number of bits required for animating a virtual human is significantly reduced by using an appearance model combined with parameter compression. A new perceptual method is introduced and used to evaluate the quality of the synthesised sequences. It appears that 3.6 kbits/s can still yield acceptable quality.
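To put the quoted bit rate in perspective, the short calculation below converts it into a per-frame budget. It reads the figure as 3.6 kbit per second; the frame rate of 25 frames per second and the 8 bits per parameter are assumptions chosen for illustration and are not taken from the paper.

```python
def bits_per_frame(bitrate_bits_per_s, frames_per_s):
    """Average number of bits available to describe one animation frame."""
    return bitrate_bits_per_s / frames_per_s

BITRATE = 3600          # 3.6 kbit/s, the rate quoted in the abstract
FRAME_RATE = 25         # assumed frame rate (not stated in the abstract)
BITS_PER_PARAM = 8      # assumed quantisation of each model parameter

budget = bits_per_frame(BITRATE, FRAME_RATE)   # 144.0 bits per frame
max_params = int(budget // BITS_PER_PARAM)     # 18 parameters per frame
```

Under these assumptions each frame would have to be described by no more than 18 uncompressed parameters, which suggests why a compact appearance model and further parameter compression are needed.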
Visual Speech for Speaker Recognition and Robust Face Detection
This thesis is in two parts. Part I considers 'Visual Speech for Speaker Recognition', and Part II considers 'Face Detection'. Both parts are connected via the human face.
Hough Transform-based Mouth Localization for Audio-Visual Speech Recognition
- Fanelli et al.
We present a novel method for mouth localization in the context of multimodal speech recognition where audio and visual cues are fused to improve the speech recognition accuracy. While facial feature points like mouth corners or lip contours are commonly used to estimate at least scale, position, and orientation of the mouth, we propose a Hough transform-based method. Instead of relying on a predefined sparse subset of mouth features, it casts probabilistic votes for the mouth center from several patches in the neighborhood and accumulates the votes in a Hough image. This makes the localization more robust as it does not rely on the detection of a single feature. In addition, we exploit the different shape properties of eyes and mouth in order to localize the mouth more efficiently. Using the rotation invariant representation of the iris, scale and orientation can be efficiently inferred from the localized eye positions. The superior accuracy of our method and quantitative improvements for audio-visual speech recognition over monomodal approaches are demonstrated on two datasets.
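The voting step described in this abstract can be sketched as follows. This is only a minimal illustration of accumulating weighted votes in a Hough image and reading off the peak. The vote list is hand-written and hypothetical, and the way the paper learns the offsets and weights from image patches is not reproduced here.

```python
def accumulate_votes(height, width, votes):
    """Build a Hough image from votes.

    Each vote is (patch_row, patch_col, d_row, d_col, weight): a patch at
    (patch_row, patch_col) votes, with the given weight, for a mouth centre
    displaced by (d_row, d_col) from the patch position.
    """
    hough = [[0.0] * width for _ in range(height)]
    for r, c, dr, dc, w in votes:
        vr, vc = r + dr, c + dc
        if 0 <= vr < height and 0 <= vc < width:  # drop votes outside the image
            hough[vr][vc] += w
    return hough

def peak(hough):
    """Return the (row, col) of the cell with the largest accumulated weight."""
    best = max((val, r, c)
               for r, row in enumerate(hough)
               for c, val in enumerate(row))
    return best[1], best[2]

# Hypothetical votes: three patches agree on (6, 5), one stray vote lands
# elsewhere, and one vote falls outside the image and is ignored.
votes = [
    (4, 3, 2, 2, 0.9),
    (7, 8, -1, -3, 0.8),
    (6, 2, 0, 3, 0.7),
    (1, 1, 1, 1, 0.6),
    (0, 0, -1, 0, 0.5),
]
hough = accumulate_votes(10, 10, votes)
centre = peak(hough)
```

Because the estimate is the maximum over many accumulated votes, a single misleading patch (the stray vote above) does not move the detected centre, which matches the robustness argument in the abstract.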

