## Unsupervised learning of human motion (2003)

### Download Links

- [www.vision.caltech.edu]
- [vision.caltech.edu]

### Other Repositories/Bibliography

- DBLP

Venue: IEEE Trans. PAMI

Citations: 73 (1 self)

### BibTeX

@ARTICLE{Song03unsupervisedlearning,

author = {Yang Song and Luis Goncalves and Pietro Perona},

title = {Unsupervised learning of human motion},

journal = {IEEE Trans. PAMI},

year = {2003},

volume = {25},

number = {7},

pages = {814--827}

}

### Abstract

An unsupervised learning algorithm is presented that automatically obtains, from unlabeled training data, a probabilistic model of an object composed of a collection of parts (a moving human body in our examples). The training data include both useful “foreground” features and features that arise from irrelevant background clutter; the correspondence between parts and detected features is unknown. The joint probability density function of the parts is represented by a mixture of decomposable triangulated graphs, which allow for fast detection. To learn the model structure as well as the model parameters, an EM-like algorithm is developed in which the labeling of the data (part assignments) is treated as hidden variables. The unsupervised learning technique is not limited to decomposable triangulated graphs. The efficiency and effectiveness of the algorithm are demonstrated by applying it to generate models of human motion automatically from unlabeled image sequences and by testing the learned models on a variety of sequences.

Index Terms: Unsupervised learning, human motion, decomposable triangulated graph, probabilistic models, greedy search, EM algorithm, mixture models.
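To make the "fast detection" remark more concrete, here is a minimal sketch, not the authors' implementation. It assumes scalar part positions, a jointly Gaussian density with known mean and covariance, and a list of triangles (A_t, B_t, C_t) given in elimination order; the function names `cholesky`, `gauss_logpdf`, `marginal_logpdf` and `dtg_logpdf` are hypothetical. It evaluates the factorization quoted in the citation context for [9]: one conditional term per eliminated triangle plus one joint term for the last triangle, so only marginals over at most three parts are ever needed.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gauss_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian, computed via the Cholesky factor."""
    n = len(x)
    low = cholesky(cov)
    z = []  # solves low @ z = x - mean by forward substitution
    for i in range(n):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((x[i] - mean[i] - s) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + sum(v * v for v in z))


def marginal_logpdf(x, mean, cov, idx):
    """Log-density of the Gaussian marginal over the parts listed in idx."""
    return gauss_logpdf(
        [x[i] for i in idx],
        [mean[i] for i in idx],
        [[cov[i][j] for j in idx] for i in idx],
    )


def dtg_logpdf(x, mean, cov, triangles):
    """Sum of log P(x_A | x_B, x_C) over eliminated triangles plus the last triangle's joint term."""
    *eliminated, last = triangles
    total = marginal_logpdf(x, mean, cov, list(last))
    for a, b, c in eliminated:
        # log P(a | b, c) = log P(a, b, c) - log P(b, c)
        total += marginal_logpdf(x, mean, cov, [a, b, c])
        total -= marginal_logpdf(x, mean, cov, [b, c])
    return total


# Illustrative (made-up) four-part model: part 0 is eliminated first, triangle (1, 2, 3) is last.
mean = [0.0, 1.0, 2.0, 3.0]
cov = [
    [2.0, 0.5, 0.2, 0.0],
    [0.5, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.5, 0.4],
    [0.0, 0.1, 0.4, 1.0],
]
print(dtg_logpdf([0.3, 0.8, 2.5, 2.9], mean, cov, [(0, 1, 2), (1, 2, 3)]))
```

Each conditional is computed as a difference of two log-marginals, which keeps the sketch short; with a single triangle the result reduces to the ordinary joint Gaussian log-density.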

### Citations

8938 | Elements of Information Theory - Cover, Thomas - 1991
Citation Context: ...sumption that the priors $P(G)$ are equal for different decomposable triangulated graphs, then our goal is to find the structure $G$ which can maximize $P(X \mid G)$. By (2), $P(X \mid G)$ can be computed as follows [9], [7], [12], [20], [22]: $\log P(X \mid G) = \sum_{n=1}^{N} \log P(X^n \mid G) = \sum_{n=1}^{N} \left[ \sum_{t=1}^{T-1} \log P(X^n_{A_t} \mid X^n_{B_t}, X^n_{C_t}) + \log P(X^n_{A_T}, X^n_{B_T}, X^n_{C_T}) \right]$ (5) $\cong -N \sum_{t=1}^{T-1} h(X_{A_t} \mid X_{B_t}, X_{C_t}) - N\, h(X_{A_T}, X_{B_T}, X_{C_T})$ (6)...

8835 | Introduction to algorithms - Cormen, Leiserson, et al. - 1990

8606 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977
Citation Context: ...und Parts Observed. This section develops an algorithm searching for the best decomposable triangulated model from unlabeled data, which is inspired by the idea of the expectation-maximization (or EM, [10], [34]) algorithm. The algorithm we propose does not guarantee the same convergence properties as EM although it works well in practice. Assume that we have a data set of N samples $X = \{X^1, X^2, \ldots, X^N\}$...

5116 | Neural Networks for Pattern Recognition - Bishop - 1995
Citation Context: ...n is really dominant among all the possible labelings so that hard assignment for labelings can be used. This is similar to the situation of K-means versus mixture of Gaussian for clustering problems [3]. Note that the best labeling is used to update the parameters of the probability density function (mean and covariance under Gaussian assumption). Therefore, in case of several labelings with close l...

954 | An introduction to Bayesian networks - Jensen - 1996
Citation Context: ...phs and General Graphical Models. For general graphical models, the labeling problem is the most-probable-configuration problem on the graph and can be solved through max-propagation on junction trees [18], [21], [28]. The dynamic programming algorithm [2] and the max-propagation algorithm essentially have the same order of complexity which is determined by the maximum clique size of the graph. The max...

672 | Approximating Discrete Probability Distributions with Dependence Trees - Chow, Liu - 1968
Citation Context: ...ed by a decomposable triangulated graph, is such that it allows efficient detection and labeling of the body. Structure learning of graphical models has been previously studied by a number of authors [7], [12], [20], [22], [13]. The main contribution of this paper, apart from the specifics of the application, is that our method is unsupervised: it is based on unlabeled training data. The training seq...

647 | Learning in Graphical Models - Jordan, editor - 1998
Citation Context: ...d General Graphical Models. For general graphical models, the labeling problem is the most-probable-configuration problem on the graph and can be solved through max-propagation on junction trees [18], [21], [28]. The dynamic programming algorithm [2] and the max-propagation algorithm essentially have the same order of complexity which is determined by the maximum clique size of the graph. The maximum c...

612 | The visual analysis of human movement: A survey - Gavrila - 1999
Citation Context: ...ties, are important areas in computer vision with potential applications to medicine, entertainment, and security. To this end, a number of models of human motion have been proposed in the literature [14]. “Strong” models represent explicitly the kinematics and dynamics of the human body [25], [24], [4], [15], [16], [5], [31], while “weak” models represent its phenomenological spatio-temporal appearan...

471 | Visual perception of biological motion and a model for its analysis - Johansson - 1973
Citation Context: ...periments because they are easier to obtain compared to body segments that may be hard to detect in case of severe occlusion. Another reason is that psychophysics experiments (Johansson’s experiments [19]) show that the human visual system can perceive vivid human motion from moving dots representing the motion of the main joints of the human body. But, the algorithms can also be applied to other type...

462 | Detection and tracking of point features - Tomasi, Kanade - 1991
Citation Context: ...m the specifics of the application, is that our method is unsupervised: it is based on unlabeled training data. The training sequence contains a number of bottom-up features (Tomasi and Kanade points [30] in our implementation) which are unlabeled, i.e., we do not know which features are associated to the body, which to background clutter, which features correspond to which features across image frame...

292 | Unsupervised learning of models for recognition - Weber, Welling, et al.
Citation Context: ...or each sample. If the labeling for each $X^n$ is taken as a hidden variable, then the idea of the EM algorithm can be used to learn the probability structure and parameters. Our method was inspired by [32], but while they assumed a jointly Gaussian probability density function, here we learn the probabilistic independence structure. Let $h_n$ denote the labeling for $X^n$. If $X^n$ contains $n_k$ features, then...

215 | Being Bayesian about network structure: a Bayesian approach to structure discovery - Friedman, Koller - 2003
Citation Context: ...iangulated graph, is such that it allows efficient detection and labeling of the body. Structure learning of graphical models has been previously studied by a number of authors [7], [12], [20], [22], [13]. The main contribution of this paper, apart from the specifics of the application, is that our method is unsupervised: it is based on unlabeled training data. The training sequence contains a number ...

182 | Nonserial dynamic programming - Bertele, Brioschi - 1972
Citation Context: ...al models, the labeling problem is the most-probable-configuration problem on the graph and can be solved through max-propagation on junction trees [18], [21], [28]. The dynamic programming algorithm [2] and the max-propagation algorithm essentially have the same order of complexity which is determined by the maximum clique size of the graph. The maximum clique size for a decomposable triangulated gr...

168 | Visual tracking of high dof articulated structures: An application to human handtracking - Rehg, Kanade - 1994
Citation Context: ...ment, and security. To this end, a number of models of human motion have been proposed in the literature [14]. “Strong” models represent explicitly the kinematics and dynamics of the human body [25], [24], [4], [15], [16], [5], [31], while “weak” models represent its phenomenological spatio-temporal appearance [23], [35], [27], [26]. Strong models have the advantage of incorporating more information a...

119 | Tracking of persons in monocular image sequences - Wachter, Nagel - 1997
Citation Context: ...end, a number of models of human motion have been proposed in the literature [14]. “Strong” models represent explicitly the kinematics and dynamics of the human body [25], [24], [4], [15], [16], [5], [31], while “weak” models represent its phenomenological spatio-temporal appearance [23], [35], [27], [26]. Strong models have the advantage of incorporating more information and, in principle, tolerate l...

117 | Learning with mixtures of trees - Meila, Jordan
Citation Context: ...ble triangulated graph, is such that it allows efficient detection and labeling of the body. Structure learning of graphical models has been previously studied by a number of authors [7], [12], [20], [22], [13]. The main contribution of this paper, apart from the specifics of the application, is that our method is unsupervised: it is based on unlabeled training data. The training sequence contains a n...

97 | Detecting activities - Polana, Nelson - 1993

78 | Incremental recognition of pedestrians from image sequences - Rohr - 1993
Citation Context: ...ertainment, and security. To this end, a number of models of human motion have been proposed in the literature [14]. “Strong” models represent explicitly the kinematics and dynamics of the human body [25], [24], [4], [15], [16], [5], [31], while “weak” models represent its phenomenological spatio-temporal appearance [23], [35], [27], [26]. Strong models have the advantage of incorporating more informa...

70 | 3D position, attitude and shape input using video tracking of hands and lips - Blake, Isard - 1994
Citation Context: ...and security. To this end, a number of models of human motion have been proposed in the literature [14]. “Strong” models represent explicitly the kinematics and dynamics of the human body [25], [24], [4], [15], [16], [5], [31], while “weak” models represent its phenomenological spatio-temporal appearance [23], [35], [27], [26]. Strong models have the advantage of incorporating more information and, i...

64 | Towards Detection of Human Motion - Song, Feng, et al. - 2000
Citation Context: ...resent explicitly the kinematics and dynamics of the human body [25], [24], [4], [15], [16], [5], [31], while “weak” models represent its phenomenological spatio-temporal appearance [23], [35], [27], [26]. Strong models have the advantage of incorporating more information and, in principle, tolerate lower signal-to-noise ratios and be allowed to reconstruct 3D body pose and motion from 2D images. Weak ...

57 | Human tracking with mixtures of trees - Ioffe, Forsyth - 2001
Citation Context: ...an provide the most accurate approximation, among all the graphs with less or similar computational cost. Another type of widely used graphs in modeling conditional (in)dependence is trees [7], [22], [17], whose maximum clique size is two. There exist efficient algorithms [8] to obtain the maximum spanning tree. Therefore, trees have computational advantages over decomposable triangulated graphs. But,...

54 | Graphical templates for model registration - Amit, Kong - 1996

50 | Monocular Tracking of the Human Arm - Goncalves, Bernardo, et al. - 1995
Citation Context: ...ecurity. To this end, a number of models of human motion have been proposed in the literature [14]. “Strong” models represent explicitly the kinematics and dynamics of the human body [25], [24], [4], [15], [16], [5], [31], while “weak” models represent its phenomenological spatio-temporal appearance [23], [35], [27], [26]. Strong models have the advantage of incorporating more information and, in prin...

46 | Maximum likelihood bounded Tree-Width markov networks - Srebro - 2001

30 | Unsupervised learning of models for object recognition - Weber - 2000
Citation Context: ...an be present at different locations in the scene. In order to make the Gaussian assumption reasonable, translations need to be removed from the positions. Therefore, we use a local coordinate system [33] for each triangle $(A_t, B_t, C_t)$, i.e., we take one body part (for example $A_t$) as the origin, and use relative positions for other body parts. More formally, let $x$ denote a vector of positions $x = (x_{A_t}, x$...

25 | Tracking People with Twists and Exponential - Bregler, Malik - 1998
Citation Context: ...this end, a number of models of human motion have been proposed in the literature [14]. “Strong” models represent explicitly the kinematics and dynamics of the human body [25], [24], [4], [15], [16], [5], [31], while “weak” models represent its phenomenological spatio-temporal appearance [23], [35], [27], [26]. Strong models have the advantage of incorporating more information and, in principle, tole...

20 | Parameterized Modelling and Recognition of Activities, Computer Vision and Image Understanding - Yacoob, Black - 1999
Citation Context: ...” models represent explicitly the kinematics and dynamics of the human body [25], [24], [4], [15], [16], [5], [31], while “weak” models represent its phenomenological spatio-temporal appearance [23], [35], [27], [26]. Strong models have the advantage of incorporating more information and, in principle, tolerate lower signal-to-noise ratios and be allowed to reconstruct 3D body pose and motion from 2D i...

11 | Learning Bayesian networks from data - Friedman, Koller - 2001
Citation Context: ... a decomposable triangulated graph, is such that it allows efficient detection and labeling of the body. Structure learning of graphical models has been previously studied by a number of authors [7], [12], [20], [22], [13]. The main contribution of this paper, apart from the specifics of the application, is that our method is unsupervised: it is based on unlabeled training data. The training sequence ...

9 | Learning Bayesian networks is NP-hard (Technical Report MSR-TR-94-17) - Chickering, Geiger, et al. - 1994
Citation Context: ...hat the search for the optimal decomposable triangulated graph is equivalent to the search for the optimal graph with treewidth not greater than three. It is proven that the latter problem is NP-hard [6], [29]. Therefore, the search of the optimal decomposable triangulated graph is NP-hard. 5. Note that $X^n$ in Section 3 is different from other sections. Here, $X^n$ is a sample from a probability distri...

7 | What: A Real Time System for Detecting and Tracking - Haritaoglu, Harwood, et al. - 1998
Citation Context: ...y. To this end, a number of models of human motion have been proposed in the literature [14]. “Strong” models represent explicitly the kinematics and dynamics of the human body [25], [24], [4], [15], [16], [5], [31], while “weak” models represent its phenomenological spatio-temporal appearance [23], [35], [27], [26]. Strong models have the advantage of incorporating more information and, in principle,...

6 | Monocular Perception of Biological Motion - Song, Goncalves, et al. - 2001
Citation Context: ... (31) (32) [Running header: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, July 2003] Fig. 2. Decomposable triangulated models for motion capture data. (a) Hand-constructed model [27]. (b) Model obtained from greedy search (Section 3.2). (c) Decomposable triangulated model grown from a maximum spanning tree [7], [22], [17]. The solid lines are edges from the maximum spanning tree ...

6 | A Probabilistic Approach to Human Motion Detection and Labeling - Song - 2003
Citation Context: ...ral Graphical Models. For general graphical models, the labeling problem is the most-probable-configuration problem on the graph and can be solved through max-propagation on junction trees [18], [21], [28]. The dynamic programming algorithm [2] and the max-propagation algorithm essentially have the same order of complexity which is determined by the maximum clique size of the graph. The maximum clique ...

1 | Labeling Human Motion Using Mixtures of Trees, Univ. of California at Berkeley, personal communication - Fowlkes - 2001
Citation Context: ...mposable triangulated graphs. But, decomposable triangulated graphs are more suitable for our application because they have better graph connectivity in dealing with occlusion [28]. With a tree graph [11], if there is a single occlusion, the detection result may be split into two or more separate components, whereas with a triangulated graph, even if two adjacent parts (vertices) are occluded, the det...

1 | An Introduction to Graphical Models - Jordan

1 | EM-Algorithm, Class Notes at California Inst - Welling - 2000
Citation Context: ...rts Observed. This section develops an algorithm searching for the best decomposable triangulated model from unlabeled data, which is inspired by the idea of the expectation-maximization (or EM, [10], [34]) algorithm. The algorithm we propose does not guarantee the same convergence properties as EM although it works well in practice. Assume that we have a data set of N samples $X = \{X^1, X^2, \ldots, X^N\}$...
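
The hard-assignment idea that appears in the citation contexts above (the single best labeling is used to update the mean and covariance, much as K-means relates to a mixture of Gaussians) can be illustrated with a toy loop. This is a simplified sketch under strong assumptions, not the paper's algorithm: parts are modeled as independent 1-D Gaussians rather than a decomposable triangulated graph, every sample is assumed to contain all parts plus some clutter, the best labeling is found by brute force, and all names (`log_normal`, `best_labeling`, `hard_em`) and data values are made up for illustration.

```python
import math
from itertools import permutations


def log_normal(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def best_labeling(features, means, variances):
    """Hard E-step: choose distinct feature indices, one per part, with the highest log-likelihood."""
    best, best_score = None, -math.inf
    for idx in permutations(range(len(features)), len(means)):
        score = sum(
            log_normal(features[i], means[p], variances[p]) for p, i in enumerate(idx)
        )
        if score > best_score:
            best, best_score = idx, score
    return best


def hard_em(samples, means, variances, iterations=10, var_floor=1e-3):
    """Alternate between picking the best labeling per sample and re-estimating the part parameters."""
    means, variances = list(means), list(variances)
    labelings = []
    for _ in range(iterations):
        labelings = [best_labeling(f, means, variances) for f in samples]
        for p in range(len(means)):
            vals = [f[lab[p]] for f, lab in zip(samples, labelings)]
            m = sum(vals) / len(vals)
            v = sum((x - m) ** 2 for x in vals) / len(vals)
            # The variance floor is an ad hoc guard against collapse, not taken from the paper.
            means[p], variances[p] = m, max(v, var_floor)
    return means, variances, labelings


# Each sample holds two part positions (near 0 and near 10) in unknown order plus one clutter feature.
samples = [
    [0.1, 9.8, 5.0],
    [10.2, -0.2, 3.0],
    [0.3, 7.0, 10.1],
    [9.9, 0.0, 6.0],
]
means, variances, labelings = hard_em(samples, [1.0, 9.0], [4.0, 4.0])
print(means, labelings)
```

Brute-force enumeration is only workable for a handful of features; the citation contexts indicate that the paper instead relies on dynamic programming over the triangulated graph to find the best labeling.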