Fast 2D hand tracking with flocks of features and multi-cue integration
- In IEEE Workshop on Real-Time Vision for Human-Computer Interaction (at CVPR), 2004
Cited by 62 (3 self)
This paper introduces “Flocks of Features,” a fast tracking method for non-rigid and highly articulated objects such as hands. It combines KLT features and a learned foreground color distribution to facilitate 2D position tracking from a monocular view. The tracker’s benefits lie in its speed, its robustness against background noise, and its ability to track objects that undergo arbitrary rotations and vast and rapid deformations. We demonstrate tracker performance on hand tracking with a non-stationary camera in unconstrained indoor and outdoor environments. The tracker yields a more than threefold improvement over a CamShift tracker in the number of frames tracked before the target was lost, and often more than an order of magnitude improvement in the fraction of particular test sequences tracked successfully.
Social Signal Processing: State-of-the-art and future perspectives of an emerging domain
- In Proceedings of the ACM International Conference on Multimedia, 2008
Cited by 27 (7 self)
The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence – the ability to recognize human social signals and social behaviours such as politeness and disagreement – in order to become more effective and more efficient. Although each one of us understands the importance of social signals in everyday life situations, and in spite of recent advances in machine analysis of relevant behavioural cues like blinks, smiles, crossed arms, laughter, and similar, design and development of automated systems for Social Signal Processing (SSP) are rather difficult. This paper surveys past efforts to solve these problems by computer, summarizes the relevant findings in social psychology, and proposes a set of recommendations for enabling the development of the next generation of socially-aware computing.
Tracking using flocks of features, with application to assisted handwashing
- British Machine Vision Conference (BMVC), 2006
Cited by 11 (2 self)
This paper describes a method for tracking in the presence of distractors, changes in shape, and occlusions. An object is modeled as a flock of features describing its approximate shape. The flock’s dynamics keep it spatially localised and moving in concert, but also well distributed across the object being tracked. A recursive Bayesian estimation of the density of the object is approximated with a set of samples. The method is demonstrated on two simple examples, and is applied to an assistive system that tracks the hands and the towel during a handwashing task.
Kernel-based Recognition of Human Actions Using Spatiotemporal Salient Points
Cited by 11 (0 self)
This paper addresses the problem of human action recognition by introducing a sparse representation of image sequences as a collection of spatiotemporal events that are localized at points that are salient both in space and time. We detect the spatiotemporal salient points by measuring the variations in the information content of pixel neighborhoods not only in space but also in time. We derive a suitable distance measure between the representations, which is based on the Chamfer distance, and we optimize this measure with respect to a number of temporal and scaling parameters. In this way we achieve invariance against scaling, while at the same time, we eliminate the temporal differences between the representations. We use Relevance Vector Machines (RVM) in order to address the classification problem. We propose new kernels for use by the RVM, which are specifically tailored to the proposed spatiotemporal salient point representation. The basis of these kernels is the optimized Chamfer distance of the previous step. We present results on real image sequences from a small database depicting people performing 19 aerobic exercises.
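To make the Chamfer-based comparison in the abstract above more concrete, here is a minimal sketch of a plain symmetric Chamfer distance between two sets of spatiotemporal points. It is an illustration only: the function names, the (x, y, t) tuple layout, and the averaging convention are assumptions, and the paper's optimization over temporal and scaling parameters is not reproduced.

```python
import math


def directed_chamfer(source, target):
    """Mean distance from each source point to its nearest target point."""
    if not source or not target:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for p in source:
        # Nearest-neighbour distance from p to the target set (brute force).
        total += min(math.dist(p, q) for q in target)
    return total / len(source)


def symmetric_chamfer(a, b):
    """Average of the two directed distances, so the measure is symmetric."""
    return 0.5 * (directed_chamfer(a, b) + directed_chamfer(b, a))


# Hypothetical salient points given as (x, y, t) tuples.
points_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
points_b = [(0.0, 0.0, 0.0)]
print(symmetric_chamfer(points_a, points_b))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is usually acceptable for sparse salient-point sets; a spatial index would be the obvious substitute for larger sets.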
Classifying offensive sites based on image content
- In Computer Vision and Image Understanding, 2004
Motion Divergence Fields for Dynamic Hand Gesture Recognition
Cited by 1 (0 self)
Although it is in general difficult to track articulated hand motion, exemplar-based approaches provide a robust solution for hand gesture recognition. Presumably, a rich set of dynamic hand gestures is needed for a meaningful recognition system. How to build the visual representation for the motion patterns is the key for scalable recognition. We propose a novel representation based on the divergence map of the gestural motion field, which transforms motion patterns into spatial patterns. Given the motion divergence maps, we leverage modern image feature detectors to extract salient spatial patterns, such as Maximally Stable Extremal Regions (MSER). A local descriptor is extracted from each region to capture the local motion pattern. The descriptors from gesture exemplars are subsequently indexed using a pre-trained vocabulary tree. New gestures are then matched efficiently with the database gestures with a TF-IDF scheme. Our extensive experiments on a large hand gesture database with 10 categories and 1050 video samples validate the efficacy of the extracted motion patterns for gesture recognition. The proposed approach achieves an overall recognition rate of 97.62%, while the average recognition time is only 34.53 ms.
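The divergence map mentioned in the abstract above can be illustrated with a small finite-difference sketch. This is not the authors' implementation: the function name, the nested-list layout (u for horizontal flow, v for vertical flow, indexed as [row][column]), and the border handling are all assumptions made for the example.

```python
def divergence(u, v):
    """Divergence du/dx + dv/dy of a 2D flow field on a unit-spaced grid.

    u[r][c] is the horizontal flow and v[r][c] the vertical flow at row r,
    column c. Central differences are used in the interior and one-sided
    differences at the borders.
    """
    rows, cols = len(u), len(u[0])
    if rows < 2 or cols < 2:
        raise ValueError("the flow field must be at least 2x2")

    def diff(f, r, c, dr, dc):
        # Clamp the two sample positions so border pixels fall back to a
        # one-sided difference; span is the actual distance between samples.
        r0, r1 = max(r - dr, 0), min(r + dr, rows - 1)
        c0, c1 = max(c - dc, 0), min(c + dc, cols - 1)
        span = (r1 - r0) + (c1 - c0)
        return (f[r1][c1] - f[r0][c0]) / span

    return [
        [diff(u, r, c, 0, 1) + diff(v, r, c, 1, 0) for c in range(cols)]
        for r in range(rows)
    ]


# A flow that expands away from the origin: u = x, v = y.
u = [[float(c) for c in range(3)] for _ in range(3)]
v = [[float(r) for _ in range(3)] for r in range(3)]
div_map = divergence(u, v)  # every entry is 2.0 for this expanding field
```

Positive values mark regions where the motion spreads outward and negative values where it converges, which is what lets a motion pattern be treated as a spatial pattern for region detectors.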
Spatiotemporal Salient Points for Visual Recognition of Human Actions
- IEEE Trans. Systems, Man and Cybernetics Part B, 2005
Cited by 1 (1 self)
This paper addresses the problem of human action recognition by introducing a sparse representation of image sequences as a collection of spatiotemporal events that are localized at points that are salient both in space and time. We detect the spatiotemporal salient points by measuring the variations in the information content of pixel neighborhoods not only in space but also in time. We introduce an appropriate distance metric between two collections of spatiotemporal salient points that is based on the Chamfer distance and an iterative linear time warping technique that deals with time expansion or time compression issues. We propose a classification scheme that is based on Relevance Vector Machines and on the proposed distance measure. We present results on real image sequences from a small database depicting people performing 19 aerobic exercises.
Resolving Hand Over Face Occlusion
Cited by 1 (0 self)
This paper presents a method to segment the hand over complex backgrounds, such as the face. The similar colors and texture of the hand and face make the problem particularly challenging. Our method is based on the concept of an image force field. In this representation each individual image location consists of a vector value which is a nonlinear combination of the remaining pixels in the image. We introduce and develop a novel physics-based feature that is able to measure regional structure in the image, thus avoiding the problem of local pixel-based analysis, which breaks down under our conditions. The regional image structure changes in the occluded region during occlusion. Elsewhere the regional structure remains relatively constant. We model the regional image structure at all image locations over time using a Mixture of Gaussians (MoG) to detect the occluded region in the image. We have tested the method on a number of sequences demonstrating the versatility of the proposed approach.
A Vision-based Remote Control
Cited by 1 (0 self)
This chapter presents a vision-based system for touch-free interaction with a display at a distance. A single camera is fixed on top of the screen and is pointing towards the user. An attention mechanism allows the user to start the interaction and control a screen pointer by moving their hand in a fist pose directed at the camera. On-screen items can be chosen by a selection mechanism. Current sample applications include browsing video collections as well as viewing a gallery of 3D objects, which the user can rotate with their hand motion. We have included an up-to-date review of hand tracking methods, and comment on the merits and shortcomings of previous approaches. The proposed tracker uses multiple cues (appearance, color, and motion) for robustness. As the space of possible observation models is generally too large for exhaustive online search, we select models that are suitable for the particular tracking task at hand. During a training stage, various off-the-shelf trackers are evaluated. From this data, different methods of fusing them online are investigated, including parallel and cascaded tracker evaluation. For the case of fist tracking, combining a small number of observers in a cascade results in an efficient algorithm that is used in our gesture interface. The system has been on public display at conferences where over a hundred users have engaged with it.
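The cascaded evaluation mentioned in the abstract above can be pictured with a generic sketch in which cheap observers prune candidates before costlier ones run. Everything here is hypothetical: the function name, the (score function, threshold) pairing, and the toy observers are assumptions for illustration and are not taken from the chapter.

```python
def cascade_select(candidates, observers):
    """Filter candidates through observers ordered from cheapest to costliest.

    Each observer is a (score_fn, threshold) pair. A candidate survives a
    stage only if its score reaches the threshold, so later (more expensive)
    observers see fewer candidates. Returns the survivor with the highest
    score at the last stage, or None if every candidate is rejected.
    """
    survivors = list(candidates)
    best = None
    for score_fn, threshold in observers:
        scored = [(score_fn(c), c) for c in survivors]
        scored = [(s, c) for s, c in scored if s >= threshold]
        if not scored:
            return None  # the whole cascade rejects this frame
        survivors = [c for _, c in scored]
        best = max(scored, key=lambda sc: sc[0])[1]
    return best


# Hypothetical observers over candidate (x, y) positions.
def cheap_color_check(p):
    return 1.0 if p[0] >= 5 else 0.0


def costly_appearance_check(p):
    return -abs(p[0] - 6)


positions = [(0, 0), (5, 5), (9, 9)]
stages = [(cheap_color_check, 0.5), (costly_appearance_check, -2)]
choice = cascade_select(positions, stages)  # (5, 5) survives both stages
```

A parallel fusion scheme would instead score every candidate with every observer and combine the scores; the cascade trades some of that information for fewer expensive evaluations.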