## Parametric Hidden Markov Models for Gesture Recognition (1999)

Venue: | IEEE Transactions on Pattern Analysis and Machine Intelligence |

Citations: | 145 - 3 self |

### BibTeX

@ARTICLE{Wilson99parametrichidden,

author = {Andrew D. Wilson and Aaron F. Bobick},

title = {Parametric Hidden Markov Models for Gesture Recognition},

journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},

year = {1999},

volume = {21},

pages = {884--900}

}

### Abstract

A new method for the representation, recognition, and interpretation of parameterized gesture is presented. By parameterized gesture we mean gestures that exhibit a systematic spatial variation; one example is a point gesture where the relevant parameter is the two-dimensional direction. Our approach is to extend the standard hidden Markov model method of gesture recognition by including a global parametric variation in the output probabilities of the HMM states. Using a linear model of dependence, we formulate an expectation-maximization (EM) method for training the parametric HMM. During testing, a similar EM algorithm simultaneously maximizes the output likelihood of the PHMM for the given sequence and estimates the quantifying parameters. Using visually derived and directly measured three-dimensional hand position measurements as input, we present results that demonstrate the recognition superiority of the PHMM over standard HMM techniques, as well as greater robustness in parameter estimation with respect to noise in the input features. Last, we extend the PHMM to handle arbitrary smooth (nonlinear) dependencies. The nonlinear formulation requires the use of a generalized expectation-maximization (GEM) algorithm for both training and the simultaneous recognition of the gesture and estimation of the value of the parameter. We present results on a pointing gesture, where the nonlinear approach permits the natural spherical coordinate parameterization of pointing direction.

Index Terms: Gesture recognition, hidden Markov models, expectation-maximization algorithm, time-series modeling, computer vision.
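The "linear model of dependence" mentioned in the abstract can be illustrated with a minimal sketch. It assumes Gaussian state outputs whose mean is a linear function of the gesture parameter, and it assumes the state posteriors are already available (in a full system they would come from a forward-backward pass). The names `W`, `mu_bar`, `theta`, and `gammas` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def state_mean(W, mu_bar, theta):
    """Assumed linear dependence: the state's output mean shifts linearly with theta."""
    return W @ theta + mu_bar

def log_gaussian(x, mean, cov):
    """Log density of a multivariate normal, used as the state output probability."""
    d = x - mean
    k = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + maha)

def estimate_theta(xs, gammas, Ws, mu_bars, covs):
    """Weighted least-squares update of theta, given state posteriors gammas[t, j].

    This corresponds to one maximization step at test time under the assumed
    linear-Gaussian model; the posteriors are taken as given here.
    """
    p = Ws[0].shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for t, x in enumerate(xs):
        for j in range(len(Ws)):
            Wt_Sinv = Ws[j].T @ np.linalg.inv(covs[j])
            A += gammas[t, j] * (Wt_Sinv @ Ws[j])
            b += gammas[t, j] * (Wt_Sinv @ (x - mu_bars[j]))
    return np.linalg.solve(A, b)
```

In an EM loop, `log_gaussian` with `state_mean` would supply the output probabilities for the expectation step, and `estimate_theta` would then refine the parameter; the two steps alternate until the likelihood stops improving.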

### Citations

4828 |
Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context ...om sequence to sequence. When necessary, we write the value of θ associated with a particular sequence k as θ_k. For readers familiar with graphical model representations of HMMs (for example, see [3]), Fig. 2 shows the PHMM architecture as a Bayes network. The diagram makes explicit the fact that the output nodes (labeled x_t) depend upon θ. Bengio and Frasconi's [2] Input Output HMM (IOHMM) is... |

958 |
Visual Learning and Recognition of 3-D Objects from Appearance
- Murase, Nayar
- 1995
Citation Context ... the whole movement sequence. They apply their technique to show more robust recognition in the face of varying walking direction and style. They do not address parameter extraction. Murase and Nayar [19] parameterize meaningful variation in the appearance of images by computing a representation of the nonlinear manifold of the images in an eigenspace of the images. Their work is similar to ours in th... |

905 |
An Introduction to Bayesian Networks
- Jensen
- 1996
Citation Context ...3. Given the graphical model equivalent in Fig. 2, it is possible to exactly solve for the best value of θ using the standard inference algorithm [16]. The computational complexity of that algorithm is equivalent to that of evaluating the likelihood of the model for all values of θ, where θ is discretized to some adequate precision. Particularly for... |

834 | A tutorial on hidden Markov models
- Rabiner, Juang
- 1989
Citation Context ...er as noise. 3 PARAMETRIC HIDDEN MARKOV MODELS 3.1 Defining Parameterized Gesture Parametric HMMs explicitly model the dependence on the parameter of interest. We begin with the usual HMM formulation [22] and change the form of the output probability distribution (usually a normal distribution or a mixture model) to depend on the gesture parameter to be estimated. As in previous approaches to gesture ... |

429 |
Hand and Mind: What Gestures Reveal About Thought
- McNeill
- 1992
Citation Context ...we employed HMMs that model the temporal properties of movement to recognize two broad classes of natural, spontaneous gesture. These models were constructed in accordance with natural gesture theory [18], [11]. Campbell and Bobick [10] search for orthogonal projections of the feature space to find the most diagnostic projections in order to classify ballet steps. In each of these cases, the goal is t... |

406 | Maximum Likelihood Linear Transformations For HMM-Based Speech Recognition
- Gales
- 1998
Citation Context ... is given for the training set. Last, we mention that, in the speech recognition community, a number of models for speaker adaptation in HMM-based speech recognition systems have been proposed. Gales [14], for example, examines a number of transformations on the means and covariances of HMM output distributions. These transformations are trained against a new speaker speaking a known utterance. Our mod... |

293 |
Recognizing human action in time-sequential images using hidden Markov model
- Yamato, Ohya, et al.
- 1992
Citation Context ...ned the training set. HMMs forego the construction of a prototype in exchange for an expectation/maximization method of determining a stochastic sequence of states to represent gesture. Yamato et al. [32] first used HMMs in vision to recognize tennis strokes. Schlenzig et al. [23] used HMMs and a rotation-invariant image representation to recognize hand gestures from video. Starner and Pentland [24] a... |

279 | Visual recognition of American Sign Language using Hidden Markov models
- Starner, Pentland
- 1995
Citation Context .... [32] first used HMMs in vision to recognize tennis strokes. Schlenzig et al. [23] used HMMs and a rotation-invariant image representation to recognize hand gestures from video. Starner and Pentland [24] applied HMMs to recognize ASL sentences, and Campbell et al. [9] used HMMs to recognize Tai Chi movements. The present work is based on the HMM framework, which we summarize in the appendix. None of ... |

167 | Parameterized modeling and recognition of activities
- Yacoob, Black
- 1999
Citation Context ... and relaxed). The distance between the hands conveys the size of the fish. earlier work in that the goal is to recover a parameterization of the systematic variation of the gesture. Yacoob and Black [31], as well as Bobick and Davis [6], model the variation within a class of human movement using linear principal components analysis. The space of variation is defined by a single linear transformation ... |

125 | Recognition of human body motion using phase space constraints
- Campbell, Bobick
- 1995
Citation Context ...temporal properties of movement to recognize two broad classes of natural, spontaneous gesture. These models were constructed in accordance with natural gesture theory [18], [11]. Campbell and Bobick [10] search for orthogonal projections of the feature space to find the most diagnostic projections in order to classify ballet steps. In each of these cases, the goal is to eliminate the systematic varia... |

97 | A.: "Real-Time Self-Calibrating Stereo Person Tracking Using 3D Shape Estimation from Blob Features
- Azarbayejani, Pentland
Citation Context ...ize Gesture To test the ability of the parametric HMM to learn the parameterization, 30 examples of the type depicted in Fig. 1 were collected using the Stereo Interactive Virtual Environment (STIVE) [1], a research computer vision system utilizing wide baseline stereo cameras and flesh tracking (see Fig. 3). STIVE is able to compute the three-dimensional position of the head and hands at a frame rat... |

82 | A state-based approach to the representation and recognition of gesture
- Bobick, Wilson
- 1997
Citation Context ...re two techniques based on dynamic programming. Darrell and Pentland [12] applied DTW to match image template correlation scores against models to recognize hand gestures from video. In previous work [5], we represented gesture as a deterministic sequence of states through some configuration or feature space and employed a DTW parsing algorithm to recognize the gestures. The states were found by firs... |

71 | Surface learning with applications to lip-reading
- Bregler, Omohundro
- 1994
Citation Context ...ty of approaches to learning a nonlinear manifold in some feature space representing systematic variation. One of these techniques has been applied to the task of lip reading by Bregler and Omohundro [7]. Bishop et al. [4] have also introduced techniques to learn latent parameterizations. Their system begins with an assumption of the dimensionality of the parameterization and uses an expectation-maxim... |

65 | Invariant features for 3-D gesture recognition
- Campbell, Becker, et al.
- 1996
Citation Context ...enzig et al. [23] used HMMs and a rotation-invariant image representation to recognize hand gestures from video. Starner and Pentland [24] applied HMMs to recognize ASL sentences, and Campbell et al. [9] used HMMs to recognize Tai Chi movements. The present work is based on the HMM framework, which we summarize in the appendix. None of the approaches mentioned above consider the effect of a systemati... |

59 | Recognition and interpretation of parametric gesture - Wilson, Bobick - 1998 |

58 | An Appearance-Based Representation of Action
- Bobick, Davis
- 1996
Citation Context ...n the hands conveys the size of the fish. earlier work in that the goal is to recover a parameterization of the systematic variation of the gesture. Yacoob and Black [31], as well as Bobick and Davis [6], model the variation within a class of human movement using linear principal components analysis. The space of variation is defined by a single linear transformation on the whole movement sequence. T... |

58 | A novel environment for situated vision and behavior
- Darrell, Maes, et al.
- 1994
Citation Context ...re to recover the parameter: Wait until the hands are in the middle of the gesture space and have low velocity, then calculate the distance between the hands. Similar approaches are used in the ALIVE [13] and Perseus [17] systems. The typical approach of these systems is to first identify static configurations of the user's body that are diagnostic of the gesture and, then, use an unrelated method to ... |

54 | Learning visual behavior for gesture analysis
- Wilson, Bobick
- 1995
Citation Context ... example, is subject to complex grammatical processes that operate on multiple simultaneous levels [21]. One approach is to explicitly model the space of variation exhibited by a class of signals. In [27], we apply HMMs to the task of hand gesture recognition from video by training an eigenvector basis set of the images at each state. An image's membership to each state is a function of the residual o... |

43 | Separating Style and Content
- Tenenbaum, Freeman
- 1997
Citation Context ...xtract some component of the style of the movement. Second, the parameterized technique presented is domain-independent and is applicable to any sequence parsing problem where some context or style ([25]) spans an entire sequence. The PHMM framework has been generalized to handle nonlinear dependencies of the state output distributions on the parameterization θ. We have shown that, where the linear ... |

24 | Understanding people pointing: The Perseus system
- Kahn, Swain
- 1995
Citation Context ... parameter: Wait until the hands are in the middle of the gesture space and have low velocity, then calculate the distance between the hands. Similar approaches are used in the ALIVE [13] and Perseus [17] systems. The typical approach of these systems is to first identify static configurations of the user's body that are diagnostic of the gesture and, then, use an unrelated method to extract the param... |

21 |
Temporal Classification of Natural Gesture and Application to Video Coding
- Wilson, Bobick, et al.
- 1997
Citation Context ...e variation between instances is treated as noise. When it is too difficult to approximate the noise or the noise is systematic, it is often effective to look for diagnostic features. For example, in [30], we employed HMMs that model the temporal properties of movement to recognize two broad classes of natural, spontaneous gesture. These models were constructed in accordance with natural gesture theor... |

20 |
Vision based hand gesture interpretation using recursive estimation
- Schlenzig, Hunter, et al.
- 1994
Citation Context ... for an expectation/maximization method of determining a stochastic sequence of states to represent gesture. Yamato et al. [32] first used HMMs in vision to recognize tennis strokes. Schlenzig et al. [23] used HMMs and a rotation-invariant image representation to recognize hand gestures from video. Starner and Pentland [24] applied HMMs to recognize ASL sentences, and Campbell et al. [9] used HMMs to ... |

17 | EM optimization of latent-variable density models
- Bishop, Svensén, et al.
- 1996
Citation Context ... learning a nonlinear manifold in some feature space representing systematic variation. One of these techniques has been applied to the task of lip reading by Bregler and Omohundro [7]. Bishop et al. [4] have also introduced techniques to learn latent parameterizations. Their system begins with an assumption of the dimensionality of the parameterization and uses an expectation-maximization framework t... |

14 | Nonlinear phmms for the interpretation of parameterized gesture - Wilson, Bobick - 1998 |

10 | Gesture and the poetics of prose
- Cassell, McNeill
- 1991
Citation Context ...loyed HMMs that model the temporal properties of movement to recognize two broad classes of natural, spontaneous gesture. These models were constructed in accordance with natural gesture theory [18], [11]. Campbell and Bobick [10] search for orthogonal projections of the feature space to find the most diagnostic projections in order to classify ballet steps. In each of these cases, the goal is to elim... |

7 |
Motion analysis of grammatical processes in a visual gestural language
- Poizner, Klima, et al.
- 1983
Citation Context ...d. In human communication, sometimes how a gesture is performed carries significant meaning. ASL, for example, is subject to complex grammatical processes that operate on multiple simultaneous levels [21]. One approach is to explicitly model the space of variation exhibited by a class of signals. In [27], we apply HMMs to the task of hand gesture recognition from video by training an eigenvector basis... |

4 |
Space-Time Gestures
- Darrell, Pentland
- 1993
Citation Context ...g times to give the best match average over the length of the gesture. Dynamic time warping (DTW) and Hidden Markov models (HMMs) are two techniques based on dynamic programming. Darrell and Pentland [12] applied DTW to match image template correlation scores against models to recognize hand gestures from video. In previous work [5], we represented gesture as a deterministic sequence of states through... |

3 |
Adaptive
- Jacobs, Jordan, et al.
- 1991
Citation Context ...ace. In learning, the problem of allocating training examples labeled by a continuous variable to one of a discrete set of models is eliminated by uniting the models in a mixture of experts framework [15]. In testing, the parameter is extracted by finding the best match among the models and looking up its associated parameter value. The dependency of the movement's form on the parameter is thus remove... |

2 |
An Input Output HMM
- Bengio, Frasconi
- 1995
Citation Context ...s of HMMs (for example, see [3]), Fig. 2 shows the PHMM architecture as a Bayes network. The diagram makes explicit the fact that the output nodes (labeled x_t) depend upon θ. Bengio and Frasconi's [2] Input Output HMM (IOHMM) is a similar architecture that maps input sequences to output sequences using a recurrent neural net, which, by the Markov assumption, need only consider the current and prev... |

1 |
An Introduction to Hidden Markov Models
- Rabiner, Juang
- 1986
Citation Context ... 3 PARAMETRIC HIDDEN MARKOV MODELS 3.1 Defining Parameterized Gesture Parametric HMMs explicitly model the dependence on the parameter of interest. We begin with the usual HMM formulation [22] and change the form of the output probability distribution (usually a normal distribution or a mixture model) to depend on the gesture parameter to be estimated. As in previous approaches to gesture ... |
