Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
Cited by 276 (17 self)
Abstract
The use of hand gestures provides an attractive alternative to cumbersome interface devices for human-computer interaction (HCI). In particular, visual interpretation of hand gestures can help in achieving the ease and naturalness desired for HCI. We survey the literature on vision-based hand gesture recognition within the context of its role in HCI. The number of approaches to video-based hand gesture recognition has grown in recent years. Thus, the need for systematization and analysis of different aspects of gestural interaction has developed. We discuss a complete model of hand gestures that possesses both spatial and dynamic properties of human hand gestures and can accommodate all their natural types. Two classes of models that have been employed for interpretation of hand gestures for HCI are considered. The first utilizes 3D models of the human hand, while the second relies on the appearance of the human hand in the image. Investigation of model parameters and analysis feat...
Recognition without Correspondence using Multidimensional Receptive Field Histograms
International Journal of Computer Vision, 2000
Cited by 176 (15 self)
Abstract
The appearance of an object is composed of local structure. This local structure can be described and characterized by a vector of local features measured by local operators such as Gaussian derivatives or Gabor filters. This article presents a technique where appearances of objects are represented by the joint statistics of such local neighborhood operators. As such, this represents a new class of appearance-based techniques for computer vision. Based on joint statistics, the paper develops techniques for the identification of multiple objects at arbitrary positions and orientations in a cluttered scene. Experiments show that these techniques can identify over 100 objects in the presence of major occlusions. Most remarkably, the techniques have low complexity and therefore run in real time.
1. Introduction
The paper proposes a framework for the statistical representation of the appearance of arbitrary 3D objects. This representation consists of a probability density function or jo...
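A minimal sketch of the idea, assuming first-order Gaussian derivatives at a single scale as the local operators, a 2D joint histogram of their responses as the appearance representation, and histogram intersection as the comparison measure. These choices, the function names, and the parameter values (sigma, bins, value_range) are illustrative assumptions, not the paper's exact configuration, which also covers Gabor filters and probabilistic recognition. Images are assumed to be grayscale arrays scaled to [0, 1] and larger than the filter kernel.

```python
import numpy as np


def gaussian_kernels(sigma):
    """Sampled 1D Gaussian and its first derivative, truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    dg = -x / sigma ** 2 * g  # derivative of the normalized Gaussian
    return g, dg


def separable_filter(image, row_kernel, col_kernel):
    """Convolve every row with row_kernel, then every column with col_kernel."""
    rows = np.apply_along_axis(np.convolve, 1, image, row_kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, col_kernel, mode="same")


def receptive_field_histogram(image, sigma=1.0, bins=16, value_range=(-0.5, 0.5)):
    """Normalized joint histogram of the (d/dx, d/dy) Gaussian-derivative responses.

    Hypothetical helper: the bin count and value range are assumptions chosen so
    that responses of [0, 1] images at sigma=1 fall inside the histogram.
    """
    g, dg = gaussian_kernels(sigma)
    dx = separable_filter(image, dg, g)  # differentiate along x, smooth along y
    dy = separable_filter(image, g, dg)  # smooth along x, differentiate along y
    hist, _, _ = np.histogram2d(dx.ravel(), dy.ravel(), bins=bins,
                                range=[value_range, value_range])
    return hist / hist.sum()


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return float(np.minimum(h1, h2).sum())
```

Because the histogram discards pixel positions, matching needs no point correspondences between model and scene; the trade-off is that spatial layout is captured only indirectly, through the shape of the local filter responses.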
Face Detection With Information-Based Maximum Discrimination
In Computer Vision and Pattern Recognition, 1997
Cited by 65 (7 self)
Abstract
In this paper we present a visual learning technique that maximizes the discrimination between positive and negative examples in a training set. We demonstrate our technique in the context of face detection against complex backgrounds without color or motion information, which has proven to be a challenging problem. We use a family of discrete Markov processes to model the face and background patterns and estimate the probability models using the data statistics. Then, we convert the learning process into an optimization, selecting the Markov process that optimizes the information-based discrimination between the two classes. The detection process is carried out by computing the likelihood ratio using the probability model obtained from the learning procedure. We show that because of the discrete nature of these models, the detection process is almost two orders of magnitude less computationally expensive than neural network approaches. However, no improvement in terms of correct-answ...
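To make the detection step concrete, here is a sketch assuming a single first-order Markov chain over quantized intensity values read in a fixed scan order, with add-one (Laplace) smoothing of the counts. The paper instead selects among a family of Markov processes by maximizing an information-based discrimination measure; that selection step is omitted here, and the class and function names are hypothetical.

```python
import math
from collections import Counter


class MarkovChain:
    """First-order Markov model over symbols 0..num_symbols-1 (add-one smoothing)."""

    def __init__(self, num_symbols, sequences):
        self.k = num_symbols
        self.n_seqs = len(sequences)
        self.initial = Counter(seq[0] for seq in sequences)  # first-symbol counts
        self.transitions = Counter()  # counts of (previous, next) pairs
        self.from_counts = Counter()  # how often each symbol has a successor
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.transitions[(prev, nxt)] += 1
                self.from_counts[prev] += 1

    def log_likelihood(self, seq):
        """log P(seq) under the smoothed initial and transition probabilities."""
        ll = math.log((self.initial[seq[0]] + 1) / (self.n_seqs + self.k))
        for prev, nxt in zip(seq, seq[1:]):
            ll += math.log((self.transitions[(prev, nxt)] + 1)
                           / (self.from_counts[prev] + self.k))
        return ll


def log_likelihood_ratio(seq, face_model, background_model):
    """Positive values favor the face model, negative values the background."""
    return face_model.log_likelihood(seq) - background_model.log_likelihood(seq)
```

A window would be labeled a face when the log-likelihood ratio exceeds a threshold chosen on held-out data. With the log-probabilities precomputed into tables, scoring a window costs one lookup and one addition per pixel, which illustrates why discrete models of this kind can be cheap to evaluate.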
Tracking Faces
In Proceedings of International Conference on Automatic Face & Gesture Recognition, 1996
Cited by 30 (9 self)
Abstract
Robust tracking and segmentation of faces is a prerequisite for face analysis and recognition. In this paper, we describe an approach to this problem which is well suited to surveillance applications with poorly constrained viewing conditions. It integrates motion-based tracking with model-based face detection to produce segmented face sequences from complex scenes containing several people. The motion of moving image contours was estimated using temporal convolution, and a temporally consistent list of moving objects was maintained. Objects were tracked using Kalman filters. Faces were detected using a neural network. The essence of the system is that the motion tracker focuses the attention of a face detection network, whilst the latter is used to aid the tracking process.
1 Introduction
In order to analyse and recognise people's faces in realistically unconstrained environments, robust tracking and segmentation is a prerequisite. This provides a sequence of face images normalise...
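The abstract does not state the motion model used in the Kalman filters. The sketch below assumes a constant-velocity model on 2D image position with position-only measurements; the noise settings (process_var, meas_var), the initial covariance, and the class name are illustrative assumptions.

```python
import numpy as np


class ConstantVelocityKalman:
    """Hypothetical 2D tracker with state [px, py, vx, vy]."""

    def __init__(self, x0, y0, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])  # start at rest at (x0, y0)
        self.P = np.eye(4) * 10.0               # broad initial uncertainty
        self.F = np.array([[1, 0, dt, 0],       # position += velocity * dt
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],        # only position is observed
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var
        self.R = np.eye(2) * meas_var

    def predict(self):
        """Propagate state and covariance one time step; return predicted position."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the prediction with a measured position z = (px, py)."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In a tracker of this kind, the predicted position can define the region in which the face detector is run, and an accepted detection can be fed back as the measurement; this is one way to realize the mutual support between tracker and detector that the abstract describes.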
Classification of Hand Postures Against Complex Backgrounds Using Elastic Graph Matching
2002
Cited by 23 (4 self)
Abstract
A system for person-independent classification of hand postures against complex backgrounds in video images is presented. The system employs elastic graph matching, which has already been successfully applied for object and face recognition. We use the bunch graph technique to model variance in hand posture appearance between different subjects and variance in backgrounds. Our system does not need a separate segmentation stage but closely integrates finding the object boundaries with posture classification.
Appearance-Based Hand Sign Recognition from Intensity Image Sequences
2000
Cited by 19 (1 self)
Abstract
In this paper, we present a new approach to recognizing hand signs. In this approach, motion recognition (the hand movement) is tightly coupled with spatial recognition (hand shape). The system uses multiclass, multidimensional discriminant analysis to automatically select the most discriminating linear features for gesture classification. A recursive partition tree approximator is proposed to perform the classification. This approach, combined with our previous work on hand segmentation, forms a new framework that addresses the three key aspects of hand sign interpretation: the hand shape, the location, and the movement. The framework has been tested on the recognition of 28 different hand signs. The experimental results show that the system achieved a 93.2% recognition rate for test sequences that were not used in the training phase. It is shown that our approach performs better than nearest-neighbor classification in the eigen-subspace.
1 Introduction
The ability to i...
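The discriminant-analysis step can be sketched as classic multiclass Fisher linear discriminant analysis: build the within-class scatter S_w and between-class scatter S_b, then keep the leading eigenvectors of S_w^{-1} S_b. This sketch assumes S_w is invertible; with raw image vectors it usually is not, so a principal component projection is commonly applied first. The function name is hypothetical, and the recursive partition tree classifier is not shown.

```python
import numpy as np


def most_discriminating_features(X, labels, n_components):
    """Return a (d, n_components) matrix of Fisher discriminant directions.

    X is an (n_samples, d) array; at most (number of classes - 1) directions
    carry discriminating information.
    """
    labels = np.asarray(labels)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        X_c = X[labels == c]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += len(X_c) * (diff @ diff.T)
    # Solve S_b w = lambda S_w w via the eigenvectors of S_w^{-1} S_b.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]  # largest discriminant ratio first
    return eigvecs[:, order[:n_components]].real
```

Projecting samples onto these directions (X @ W) gives the low-dimensional features on which a classifier, such as the tree the abstract proposes or the nearest-neighbor baseline it compares against, would then operate.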
Learning to Recognize Volcanoes on Venus
1998
Cited by 18 (2 self)
Abstract
Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of JARtool, a trainable software system that learns to recognize volcanoes in a large data set of Venusian imagery. A machine learning approach is used because it is much easier for geologists to identify examples of volcanoes in the imagery than to specify domain knowledge as a set of pixel-level constraints. This approach can also provide portability to other domains without the need for explicit reprogramming; the user simply supplies the system with a new set of training examples. We show how the development of such a system requires a completely different set of skills from those required for applying machine learning to "toy world" domains. This paper discusses important aspects of the application process not commonly encountered in the "toy world," including obtaining labeled training d...
Speech-Gesture Driven Multimodal Interfaces for Crisis Management
Cited by 13 (4 self)
Abstract
Emergency response requires strategic assessment of risks, decisions, and communications that are time-critical, while requiring teams of individuals to have fast access to large volumes of complex information and to technologies that enable tightly coordinated work. The access to this information by crisis management (CM) teams in emergency operations centers can be facilitated through various human-computer interfaces. Unfortunately, these interfaces are hard to use, require extensive training, and often impede rather than support teamwork. Dialogue-enabled devices based on natural, multimodal interfaces have the potential to make a variety of information technology tools accessible during crisis management. This paper establishes the importance of multimodal interfaces in various aspects of crisis management and explores many issues in realizing successful speech-gesture driven, dialogue-enabled interfaces for CM. The paper ...
Real-Time Input of 3D Pose and Gestures of a User's Hand and Its Applications for HCI
Proc. 2001 IEEE Virtual Reality Conference, 2001
Cited by 13 (0 self)
Abstract
In this paper, we introduce a method for tracking a user's hand in 3D and recognizing the hand's gesture in real time without the use of any invasive devices attached to the hand. Our method uses multiple cameras to determine the position and orientation of a user's hand moving freely in 3D space. In addition, the method identifies predetermined gestures quickly and robustly by using a neural network trained beforehand. This paper also describes the results of a user study of the proposed method and its use in several types of applications, including 3D object handling for a desktop system and 3D walk-through for a large immersive display system.
Generalized likelihood ratio-based face detection and extraction of mouth features
Pattern Recognition Letters, 1997
Cited by 11 (0 self)
Abstract
In this paper we describe a system to reliably localize the position of the speaker’s face and mouth in videophone sequences. A statistical scheme based on a subspace method is presented for detecting human faces under varying poses. We propose a new matching criterion based on the Generalized Likelihood Ratio. The criterion is optimized efficiently with respect to similarity, affine, or perspective transform parameters using a coarse-to-fine search strategy combined with a simulated annealing algorithm. Moreover, we propose to extract a vector of geometrical features (four points) on the outline of the mouth. The extraction consists of analyzing amplitude projections in the regions of the mouth. All the computations are performed on H.263-coded frames at QCIF spatial resolution. To this end, we propose algorithms adapted to the poor quality of the images and suitable for later real-time application.

