Parametric Hidden Markov Models for Gesture Recognition
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1999
Cited by 114 (3 self)
A new method for the representation, recognition, and interpretation of parameterized gesture is presented. By parameterized gesture we mean gestures that exhibit a systematic spatial variation; one example is a point gesture where the relevant parameter is the two-dimensional direction. Our approach is to extend the standard hidden Markov model method of gesture recognition by including a global parametric variation in the output probabilities of the HMM states. Using a linear model of dependence, we formulate an expectation-maximization (EM) method for training the parametric HMM. During testing, a similar EM algorithm simultaneously maximizes the output likelihood of the PHMM for the given sequence and estimates the quantifying parameters. Using visually derived and directly measured three-dimensional hand position measurements as input, we present results that demonstrate the recognition superiority of the PHMM over standard HMM techniques, as well as greater robustness in parameter estimation with respect to noise in the input features. Last, we extend the PHMM to handle arbitrary smooth (nonlinear) dependencies. The nonlinear formulation requires the use of a generalized expectation-maximization (GEM) algorithm for both training and the simultaneous recognition of the gesture and estimation of the value of the parameter. We present results on a pointing gesture, where the nonlinear approach permits the natural spherical coordinate parameterization of pointing direction. Index Terms: Gesture recognition, hidden Markov models, expectation-maximization algorithm, time-series modeling, computer vision.
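The linear dependence described in this abstract can be illustrated with a small sketch. It assumes Gaussian output densities whose means vary linearly with the gesture parameter, and it assumes the per-frame state responsibilities (`gamma`) have already been computed by a forward-backward pass that is not shown. All names here (`phmm_state_means`, `estimate_theta`, `W`, `mu_bar`, `gamma`) are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def phmm_state_means(W, mu_bar, theta):
    """Per-state output means under an assumed linear dependence on theta.

    W:      (n_states, d, p) linear maps, one per state (hypothetical layout)
    mu_bar: (n_states, d)    offset mean of each state
    theta:  (p,)             global gesture parameter
    Returns an (n_states, d) array: W[j] @ theta + mu_bar[j] for each state j.
    """
    return W @ theta + mu_bar

def estimate_theta(X, gamma, W, mu_bar, cov):
    """Weighted least-squares update for theta with responsibilities held fixed.

    X:     (T, d)           observation sequence
    gamma: (T, n_states)    state responsibilities (assumed given)
    cov:   (n_states, d, d) output covariance of each state
    Solves A theta = b, the stationary point of the expected log-likelihood
    of Gaussian outputs with means W[j] @ theta + mu_bar[j].
    """
    p = W.shape[2]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for j in range(W.shape[0]):
        precision = np.linalg.inv(cov[j])
        weight = gamma[:, j].sum()
        # Responsibility-weighted sum of residuals from the offset mean.
        residual = gamma[:, j] @ (X - mu_bar[j])
        A += weight * W[j].T @ precision @ W[j]
        b += W[j].T @ precision @ residual
    return np.linalg.solve(A, b)
```

In an EM loop, an update like `estimate_theta` would alternate with recomputing `gamma` under the current estimate of the parameter; the sketch covers only the single update step.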
A state-based approach to the representation and recognition of gesture
- IEEE Trans. Patt. Analy. and Mach. Intell
, 1997
Cited by 69 (6 self)
A state-based technique for the representation and recognition of gesture is presented. We define a gesture to be a sequence of states in a measurement or configuration space. For a given gesture, these states are used to capture both the repeatability and variability evidenced in a training set of example trajectories. Using techniques for computing a prototype trajectory of an ensemble of trajectories, we develop methods for defining configuration states along the prototype and for recognizing gestures from an unsegmented, continuous stream of sensor data. The approach is illustrated by application to a range of gesture-related sensory data: the two-dimensional movements of a mouse input device, the movement of the hand measured by a magnetic spatial position and orientation sensor, and, lastly, the changing eigenvector projection coefficients computed from an image sequence.
Invariant features for 3-D gesture recognition
, 1996
Cited by 53 (9 self)
Ten different feature vectors are tested in a gesture recognition task which utilizes 3D data gathered in real-time from stereo video cameras, and HMMs for learning and recognition of gestures. Results indicate velocity features are superior to positional features, and partial rotational invariance is sufficient for good performance.
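The contrast between positional and velocity features can be sketched as follows. The abstract does not list the ten feature vectors, so the two functions below are illustrative assumptions only: a finite-difference velocity, and a feature that is unchanged by rotation about one axis (taken here, as an assumption, to be the vertical z axis). The names `velocity_features` and `partial_invariant_features` are hypothetical.

```python
import numpy as np

def velocity_features(positions, dt=1.0):
    """Finite-difference velocities from a (T, 3) array of 3D positions.

    Returns a (T-1, 3) array; dt is the assumed sampling interval.
    """
    positions = np.asarray(positions, dtype=float)
    return np.diff(positions, axis=0) / dt

def partial_invariant_features(positions, dt=1.0):
    """Velocity features unchanged by rotation about the z axis.

    Each row is (horizontal speed, vertical velocity). Treating z as the
    vertical axis is an assumption made for this sketch.
    """
    v = velocity_features(positions, dt)
    horizontal_speed = np.hypot(v[:, 0], v[:, 1])
    return np.column_stack([horizontal_speed, v[:, 2]])
```

Either feature sequence could then be used as the observation sequence for an HMM in place of raw positions.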
Recognition and interpretation of parametric gesture
- In International Conference on Computer Vision
, 1998
Cited by 51 (3 self)
A new method for the representation, recognition, and interpretation of parameterized gesture is presented. By parameterized gesture we mean gestures that exhibit a meaningful variation; one example is a point gesture where the important parameter is direction. Our approach is to extend the standard hidden Markov model method of gesture recognition by including a global parametric variation in the output probabilities of the states of the HMM. Using a linear model to derive the theory, we formulate an expectation-maximization (EM) method for training the parametric HMM. During testing, the parametric HMM simultaneously recognizes the gesture and estimates the quantifying parameters. Using visually derived and directly measured three-dimensional hand position measurements as input, we present results on two different movements (a size gesture and a point gesture) and show robustness with respect to noise in the input features.
Active Face Tracking and Pose Estimation in an Interactive Room
, 1996
Cited by 46 (6 self)
We demonstrate real-time face tracking and pose estimation in an unconstrained office environment with an active foveated camera. Using vision routines previously implemented for an interactive environment, we determine the spatial location of a user's head and guide an active camera to obtain foveated images of the face. Faces are analyzed using a set of eigenspaces indexed over both pose and world location. Closed loop feedback from the estimated facial location is used to guide the camera when a face is present in the foveated view. Our system can detect the head pose of an unconstrained user in real-time as he or she moves about an open room.
Recognizing Planned, Multiperson Action
- Computer Vision and Image Understanding
, 2001
Cited by 41 (2 self)
This paper demonstrates how highly structured, multiperson action can be recognized from noisy perceptual data using visually grounded goal-based primitives and low-order temporal relationships that are integrated in a probabilistic framework. The representation, which is motivated by work in model-based object recognition and probabilistic plan recognition, makes four principal assumptions: (1) the goals of individual agents are natural atomic representational units for specifying the temporal relationships between agents engaged in group activities, (2) a high-level description of temporal structure of the action using a small set of low-order temporal and logical constraints is adequate for representing the relationships between the agent goals for highly structured, multiagent action recognition, (3) Bayesian networks provide a suitable mechanism for integrating multiple sources of uncertain visual perceptual feature evidence, and (4) an automatically generated Bayesian
Movement, Activity, and Action: The Role of Knowledge in the Perception of Motion
- Royal Society Workshop on Knowledge-based Vision in Man and Machine
, 1997
Cited by 39 (3 self)
We present several approaches to the machine perception of motion and discuss the role and levels of knowledge in each. In particular we describe different techniques of motion understanding as focusing on one of movement, activity, or action. Movements are the most atomic primitives, requiring no contextual or sequence knowledge to be recognized; movement is often addressed using either view-invariant or view-specific geometric techniques. Activity refers to sequences of movements or states, where the only real knowledge required is the statistics of the sequence; much of the recent work in gesture understanding falls within this category of motion perception. Finally, actions are larger scale events which typically include interaction with the environment and causal relationships; action understanding straddles the gray division between perception and cognition, computer vision and artificial intelligence. We illustrate these levels with examples drawn mostly from our work in unders...
A Wearable Computer Based American Sign Language Recognizer
, 1997
Cited by 38 (0 self)
Modern wearable computer designs package workstation-level performance in systems small enough to be worn as clothing. These machines enable technology to be brought where it is needed the most for the handicapped: everyday mobile environments. This paper describes a research effort to make a wearable computer that can recognize (with the possible goal of translating) sentence-level American Sign Language (ASL) using only a baseball-cap-mounted camera for input. Current accuracy exceeds 97% per word on a 40-word lexicon.
Recognizing workshop activity using body worn microphones and accelerometers
- In Pervasive Computing
, 2004
Cited by 37 (9 self)
The paper presents a technique to automatically track the progress of maintenance or assembly tasks using body-worn sensors. The technique is based on a novel way of combining data from accelerometers with simple frequency-matching sound classification. This includes the intensity analysis of signals from microphones at different body locations to correlate environmental sounds with user activity. To evaluate our method we apply it to activities in a wood shop. On a simulated assembly task our system can successfully segment and identify most shop activities in a continuous data stream with zero false positives and 84.4% accuracy.
Machine recognition of human activities: A survey
, 2008
Cited by 31 (0 self)
The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction, and video summarization require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing (robustness against errors in low-level processing, view- and rate-invariant representations at mid-level processing, and semantic representation of human activities at higher-level processing) make this problem hard to solve. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) “actions” and 2) “activities.” “Actions” are characterized by simple motion patterns typically executed by a single human. “Activities” are more complex and involve coordinated actions among a small number of humans. We will discuss several approaches and classify them according to their ability to handle varying degrees of complexity as interpreted above. We begin with a discussion of approaches to model the simplest of action classes known as atomic or primitive actions that do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher-level representation of complex activities.

