Multi-Modal Tracking of Faces for Video Communications
, 1997
Cited by 92 (24 self)
This paper describes a system which uses multiple visual processes to detect and track faces for video compression and transmission. The system is based on an architecture in which a supervisor selects and activates visual processes in a cyclic manner. Control of visual processes is made possible by a confidence factor which accompanies each observation. Fusion of results into a unified estimation for tracking is made possible by estimating a covariance matrix with each observation. Visual processes for face tracking are described using blink detection, normalised color histogram matching, and cross correlation (SSD and NCC). Ensembles of visual processes are organised into processing states so as to provide robust tracking. Transition between states is determined by events detected by processes. The result of face detection is fed into a recursive estimator (Kalman filter). The output from the estimator drives a PD controller for a pan/tilt/zoom camera. The resulting system provides robus...
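The abstract above says each observation carries a covariance so that results can be fused into one estimate. A minimal sketch of that idea follows, assuming independent observations with diagonal covariances and inverse-variance weighting. The function name `fuse_observations` and the data layout are hypothetical and are not taken from the paper, which may use a different fusion rule.

```python
def fuse_observations(observations):
    """Fuse independent position estimates by inverse-variance weighting.

    observations: list of (mean, variance) pairs, where mean and variance
    are equal-length lists of floats (one entry per dimension).
    Assumption: covariances are diagonal, so each dimension fuses separately.
    """
    dims = len(observations[0][0])
    fused_mean = []
    fused_var = []
    for d in range(dims):
        # Precision (1 / variance) adds up across independent observations.
        precision = sum(1.0 / var[d] for _, var in observations)
        weighted = sum(mean[d] / var[d] for mean, var in observations)
        fused_var.append(1.0 / precision)
        fused_mean.append(weighted / precision)
    return fused_mean, fused_var


if __name__ == "__main__":
    # Two hypothetical face-position estimates (x, y) with their variances.
    colour_estimate = ([10.0, 20.0], [1.0, 4.0])
    correlation_estimate = ([14.0, 20.0], [1.0, 4.0])
    print(fuse_observations([colour_estimate, correlation_estimate]))
```

An observation with a smaller variance pulls the fused estimate towards itself, which is one way a less confident process can contribute without dominating the result.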
Feature-Based Human Face Detection
- IMAGE AND VISION COMPUTING
, 1996
Cited by 66 (3 self)
Human face detection has always been an important problem for face, expression and gesture recognition. Though numerous attempts have been made to detect and localize faces, these approaches have made assumptions that restrict their extension to more general cases. We identify that the key factor in a generic and robust system is that of using a large amount of image evidence, related and reinforced by model knowledge through a probabilistic framework. In this paper, we propose a feature-based algorithm for detecting faces that is sufficiently generic and is also easily extensible to cope with more demanding variations of the imaging conditions. The algorithm detects feature points from the image using spatial filters and groups them into face candidates using geometric and gray level constraints. A probabilistic framework is then used to reinforce probabilities and to evaluate the likelihood of the candidate as a face. We provide results to support the validity of the approach and demo...
Robust Face Tracking using Color
- 4th IEEE International Conference on Automatic Face and Gesture Recognition
, 2000
Cited by 49 (14 self)
In this paper we discuss a new robust tracking technique applied to histograms of intensity normalized color. This technique supports a video codec based on orthonormal basis coding. Orthonormal basis coding can be very efficient when the images to be coded have been normalized in size and position. However, an imprecise tracking procedure can have a negative impact on the efficiency and the quality of reconstruction of this technique, since it may increase the size of the required basis space. The face tracking procedure described in this paper has certain advantages, such as greater stability, higher precision, and less jitter, over conventional tracking techniques using color histograms. In addition to those advantages, the features of the tracked object such as mean and variance are mathematically describable.
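To make "histograms of intensity normalized color" concrete, here is a minimal sketch under stated assumptions: each pixel is reduced to chromaticity (r, g) = (R, G) / (R + G + B), and the chromaticities are counted in a 2-D histogram. The function name `chromaticity_histogram`, the bin count, and the handling of black pixels are illustrative choices, not details from the paper.

```python
def chromaticity_histogram(pixels, bins=8):
    """Build a normalised 2-D histogram over intensity-normalised colour.

    pixels: iterable of (R, G, B) tuples with non-negative values.
    Returns a bins x bins list of lists whose entries sum to 1
    (or all zeros if no pixel has a defined chromaticity).
    """
    hist = [[0.0] * bins for _ in range(bins)]
    counted = 0
    for r, g, b in pixels:
        total = r + g + b
        if total == 0:
            continue  # chromaticity is undefined for pure black pixels
        rn, gn = r / total, g / total
        # Clamp so that a chromaticity of exactly 1.0 lands in the last bin.
        i = min(int(rn * bins), bins - 1)
        j = min(int(gn * bins), bins - 1)
        hist[i][j] += 1.0
        counted += 1
    if counted:
        hist = [[v / counted for v in row] for row in hist]
    return hist


if __name__ == "__main__":
    # The second pixel is the first one at twice the brightness.
    hist = chromaticity_histogram([(100, 50, 50), (200, 100, 100)])
    print(hist[4][2])
```

Because both pixels share the same chromaticity, they fall into the same bin, which illustrates why dividing out intensity makes the histogram less sensitive to brightness changes.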
LAFTER: Lips and Face Real Time Tracker
, 1997
Cited by 47 (1 self)
This paper describes an active-camera real-time system for tracking, shape description, and classification of the human face and mouth using only an SGI Indy computer. The system is based on use of 2-D blob features, which are spatially-compact clusters of pixels that are similar in terms of low-level image properties. Patterns of behavior (e.g., facial expressions and head movements) can be classified in real-time using Hidden Markov Model (HMM) methods. The system has been tested on hundreds of users and has demonstrated extremely reliable and accurate performance. Typical classification accuracies are near 100%.
1. Introduction
This paper describes a real-time system for accurate tracking and shape description, and classification of the human face and mouth using 2-D blob features and Hidden Markov Models (HMMs). All of the experimental apparatus described here is real-time, at 20 to 30 frames per second, and runs on SGI Indy workstations without any special-purpose hardware. The n...
Colour Model Selection and Adaptation in Dynamic Scenes
, 1998
Cited by 46 (2 self)
We use colour mixture models for real-time colour-based object localisation, tracking and segmentation in dynamic scenes. Within such a framework, we address the issues of model order selection, modelling scene background and model adaptation in time. Experimental results are given to demonstrate our approach in different scale and lighting conditions.
1 Introduction
Colour has been used in machine vision for tasks such as segmentation [1, 2], tracking [3] and recognition [4, 5]. Colour offers many advantages over geometric information in dynamic vision such as robustness under partial occlusion, rotation in depth, scale changes and resolution changes. Furthermore, using colour enables real-time performance on modest hardware platforms [1]. Swain and Ballard [5] described a scheme which used histograms for modelling the colours of an object. The colour space was quantised through the histogram's structure which comprised a number of "bins". An algorithm known as "histogram intersect...
Things That See
- Communications of the ACM
, 2000
Cited by 43 (3 self)
[...] convergence and ubiquity. At the same time, inexpensive computing power is enabling a quiet revolution in the machine perception of human action. In the near future, we expect machine perception to converge with ubiquitous computing and communication. Exploring machine vision for human-computer interaction.
[Communications of the ACM, March 2000/Vol. 43, No. 3, p. 55]
Figure 1. Interacting with the Magic Board (iihm.imag.fr/demos/magicboard/). (a) The apparatus of the Magic Board: physical whiteboard, workstation, video projector, video camera; (b) Selecting a physical drawing with the finger; (c) Copying the selected drawing; (d) Completing the drawing with physical markers; (e) The menu at the top of the physical board to facilitate reinitialization.
What Can Machine Vision Do For You? Machine vision is the observation of an environment using cameras. It differs from image
Head Pose Estimation in Computer Vision: A Survey
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2008
Cited by 40 (6 self)
The capacity to estimate the head pose of another person is a common human ability that presents a unique challenge for computer vision systems. Compared to face detection and recognition, which have been the primary foci of face-related vision research, identity-invariant head pose estimation has fewer rigorously evaluated systems or generic solutions. In this paper, we discuss the inherent difficulties in head pose estimation and present an organized survey describing the evolution of the field. Our discussion focuses on the advantages and disadvantages of each approach and spans 90 of the most innovative and characteristic papers that have been published on this topic. We compare these systems by focusing on their ability to estimate coarse and fine head pose, highlighting approaches that are well suited for unconstrained environments.
Comprehensive Colour Image Normalization
, 1998
Cited by 40 (5 self)
The same scene viewed under two different illuminants induces two different colour images. If the two illuminants are the same colour but are placed at different positions then corresponding rgb pixels are related by simple scale factors. In contrast if the lighting geometry is held fixed but the colour of the light changes then it is the individual colour channels (e.g. all the red pixel values or all the green pixels) that are a scaling apart. It is well known that the image dependencies due to lighting geometry and illuminant colour can be respectively removed by normalizing the magnitude of the rgb pixel triplets (e.g. by calculating chromaticities) and by normalizing the lengths of each colour channel (by running the 'grey-world' colour constancy algorithm). However, neither normalization suffices to account for changes in both the lighting geometry and illuminant colour. In this paper we present a new comprehensive image normalization which removes image dependency on lighting...
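The two well-known normalizations named in the abstract above can be sketched as follows. This is only an illustration of those two baseline steps, assuming an image stored as a flat list of (r, g, b) tuples. The function names are hypothetical, and the sketch does not attempt to reproduce the comprehensive normalization the paper goes on to propose, since the abstract is cut off before describing it.

```python
def normalize_pixels(image):
    """Chromaticity normalization: divide each rgb triplet by its own sum.

    This removes a per-pixel scale factor (a change in lighting geometry).
    """
    out = []
    for r, g, b in image:
        s = r + g + b
        out.append((r / s, g / s, b / s) if s else (0.0, 0.0, 0.0))
    return out


def normalize_channels(image):
    """Grey-world normalization: divide each channel by its image-wide mean.

    This removes a per-channel scale factor (a change in illuminant colour).
    """
    n = len(image)
    means = [sum(p[c] for p in image) / n for c in range(3)]
    return [
        tuple(p[c] / means[c] if means[c] else 0.0 for c in range(3))
        for p in image
    ]


if __name__ == "__main__":
    image = [(10.0, 20.0, 30.0), (40.0, 10.0, 50.0)]
    # Same scene with each pixel scaled differently (lighting geometry).
    shaded = [(20.0, 40.0, 60.0), (20.0, 5.0, 25.0)]
    # Same scene with each channel scaled differently (illuminant colour).
    tinted = [(20.0, 10.0, 90.0), (80.0, 5.0, 150.0)]
    print(normalize_pixels(image), normalize_pixels(shaded))
    print(normalize_channels(image), normalize_channels(tinted))
```

Each function cancels only its own kind of scaling, which matches the abstract's point that neither normalization alone handles a simultaneous change in lighting geometry and illuminant colour.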
Modeling Focus of Attention for Meeting Indexing Based on Multiple Cues
- IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2002
Cited by 39 (6 self)
A user's focus of attention plays an important role in human-computer interaction applications, such as a ubiquitous computing environment and intelligent space, where the user's goal and intent have to be continuously monitored. In this paper, we are interested in modeling people's focus of attention in a meeting situation. We propose to model participants' focus of attention from multiple cues. We have developed a system to estimate participants' focus of attention from gaze directions and sound sources. We employ an omnidirectional camera to simultaneously track participants' faces around a meeting table and use neural networks to estimate their head poses. In addition, we use microphones to detect who is speaking. The system predicts participants' focus of attention from acoustic and visual information separately. The system then combines the output of the audio- and video-based focus of attention predictors. We have evaluated the system using the data from three recorded meetings. The acoustic information has provided an 8% relative error reduction on average compared to only using one modality. The focus of attention model can be used as an index for a multimedia meeting record. It can also be used for analyzing a meeting.
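The abstract does not say how the audio- and video-based predictors are combined. As one possible illustration only, the sketch below assumes each predictor outputs a probability per candidate target and merges them with a weighted linear sum. The function names, the `video_weight` parameter, and the participant labels are all hypothetical.

```python
def combine_predictions(video_probs, audio_probs, video_weight=0.5):
    """Linearly combine two per-target probability estimates.

    video_probs, audio_probs: dicts mapping a target label to a probability.
    video_weight: assumed mixing weight in [0, 1]; audio gets the remainder.
    Returns a dict of combined probabilities renormalised to sum to 1.
    """
    targets = set(video_probs) | set(audio_probs)
    combined = {}
    for target in targets:
        pv = video_probs.get(target, 0.0)
        pa = audio_probs.get(target, 0.0)
        combined[target] = video_weight * pv + (1.0 - video_weight) * pa
    total = sum(combined.values())
    if total > 0:
        combined = {t: p / total for t, p in combined.items()}
    return combined


def predict_focus(video_probs, audio_probs, video_weight=0.5):
    """Return the target with the highest combined probability."""
    combined = combine_predictions(video_probs, audio_probs, video_weight)
    return max(combined, key=combined.get)


if __name__ == "__main__":
    video = {"participant_A": 0.6, "participant_B": 0.4}  # from head pose
    audio = {"participant_A": 0.2, "participant_B": 0.8}  # from who is speaking
    print(predict_focus(video, audio, video_weight=0.5))
```

In practice the mixing weight would have to be chosen on held-out meeting data, and other combination rules (for example a product of probabilities) are equally plausible.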

