## A Bayesian computer vision system for modeling human interactions (2000)

Venue: IEEE Transactions on Pattern Analysis and Machine Intelligence

Citations: 355 (6 self)

### BibTeX

@ARTICLE{Oliver00abayesian,
    author = {Nuria M. Oliver and Barbara Rosario and Alex P. Pentland},
    title = {A Bayesian computer vision system for modeling human interactions},
    journal = {IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE},
    year = {2000},
    volume = {22},
    number = {8},
    pages = {831--843}
}

### Abstract

We describe a real-time computer vision and machine learning system for modeling and recognizing human behaviors in a visual surveillance task [1]. The system is particularly concerned with detecting when interactions between people occur and classifying the type of interaction. Examples of interesting interaction behaviors include following another person, altering one's path to meet another, and so forth. Our system combines top-down with bottom-up information in a closed feedback loop, with both components employing a statistical Bayesian approach [2]. We propose and compare two different state-based learning architectures, namely, HMMs and CHMMs for modeling behaviors and interactions. The CHMM model is shown to work much more efficiently and accurately. Finally, to deal with the problem of limited training data, a synthetic "Alife-style" training system is used to develop flexible prior models for recognizing human interactions. We demonstrate the ability to use these a priori models to accurately classify real human behaviors and interactions with no additional tuning or training.
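
The state-based classification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain discrete HMM rather than a CHMM, and the behavior labels (`follow`, `approach`), the two-symbol observation alphabet, and all probabilities are hypothetical values chosen only for illustration. Each candidate behavior gets its own HMM, and a sequence is assigned to the model under which it is most likely, computed with the scaled forward algorithm.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM via the scaled forward pass.

    start[i]    : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    n = len(start)
    log_lik = 0.0
    alpha = []
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [start[j] * emit[j][symbol] for j in range(n)]
        else:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify(obs, models):
    """Pick the label whose HMM gives the observation sequence the highest likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))


# Hypothetical toy models (assumed values, not from the paper).
# Symbol 0 = "distance steady", symbol 1 = "distance shrinking".
TRANS = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]
MODELS = {
    "follow": ([0.9, 0.1], TRANS, EMIT),
    "approach": ([0.1, 0.9], TRANS, EMIT),
}

if __name__ == "__main__":
    print(classify([1, 1, 1, 1], MODELS))
    print(classify([0, 0, 0, 0], MODELS))
```

In this sketch the two models differ only in their start distribution, which keeps the example small; a realistic system would learn separate transition and emission parameters per behavior and would work with continuous feature vectors rather than two discrete symbols.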

### Citations

4273 | A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
- Rabiner
- 1989
Citation Context ...and evidence from data, believing that the Bayesian approach provides the best framework for coping with small data sets and novel behaviors. Graphical models [6], such as Hidden Markov Models (HMMs) [22] and Coupled Hidden Markov Models (CHMMs) [5, 4, 19], seem most appropriate for modeling and classifying human behaviors because they offer dynamic time warping, a well-understood training algorithm, ...

1123 | Pfinder: Real-time tracking of the human body
- Wren, Azarbayejani
- 1997
Citation Context ...etect and track the pedestrians in the scene. We use 2-D blob features for modeling each pedestrian. The notion of "blobs" as a representation for image features has a long history in computer vision [20, 15, 2, 26, 18], and has had many different mathematical ...

854 | A tutorial on learning with bayesian networks
- Heckerman
- 1995
Citation Context ...agent's instantaneous behavior is compact, there is still the problem of managing all this information over time. Statistical directed acyclic graphs (DAGs) or probabilistic inference networks (PINs) [7, 13] can provide a computationally efficient solution to these problems. HMMs and their extensions, such as CHMMs, can be viewed as a particular, simple case of temporal PIN or DAG. PINs consist of a set ...

489 | Factorial Hidden Markov Models
- Ghahramani, Jordan
- 1998
Citation Context ...e than one simultaneous state variable. It is well known that the exact solution of extensions of the basic HMM to 3 or more chains is intractable. In those cases approximation techniques are needed ([23, 12, 24, 25]). However, it is also known that there exists an exact solution for the case of 2 interacting chains, as it is our case [23, 4]. We therefore use two Coupled Hidden Markov Models (CHMMs) for modeling...

369 | Coupled Hidden Markov Models for complex action recognition
- Brand, Oliver, et al.
- 1997
Citation Context ...esian approach provides the best framework for coping with small data sets and novel behaviors. Graphical models [6], such as Hidden Markov Models (HMMs) [22] and Coupled Hidden Markov Models (CHMMs) [5, 4, 19], seem most appropriate for modeling and classifying human behaviors because they offer dynamic time warping, a well-understood training algorithm, and a clear Bayesian semantics for both individual (...

247 | Operations for learning with graphical models
- Buntine
- 1994
Citation Context ...eling that includes both prior knowledge and evidence from data, believing that the Bayesian approach provides the best framework for coping with small data sets and novel behaviors. Graphical models [6], such as Hidden Markov Models (HMMs) [22] and Coupled Hidden Markov Models (CHMMs) [5, 4, 19], seem most appropriate for modeling and classifying human behaviors because they offer dynamic time warpi...

228 | A Tutorial on Hidden Markov Models and Selected
- Rabiner
- 1989
Citation Context ...nd evidence from data, believing that the Bayesian approach provides the best framework for coping with small data sets and novel behaviors. Graphical models [11], such as Hidden Markov Models (HMMs) [12] and Coupled Hidden Markov Models (CHMMs) [13], [14], [15], seem most appropriate for modeling and classifying human behaviors because they offer dynamic time warping, a well-understood training algor...

208 | Probabilistic visual learning for object detection. Paper presented at the The
- Moghaddam, Pentland
- 1995
Citation Context ...round images $B_i = \Phi_{M_b} X_i$ to model the static parts of the scene, pertaining to the background. Therefore, by computing and thresholding the Euclidean distance (distance from feature space, DFFS [16]) between the input image and the projected image we can detect the moving objects present in the scene: $D_i = |I_i - B_i| > t$, where $t$ is a given threshold. Note that it is easy to adaptively p...

172 | A guide to the literature on learning probabilistic networks from data
- Buntine
- 1996
Citation Context ...agent's instantaneous behavior is compact, there is still the problem of managing all this information over time. Statistical directed acyclic graphs (DAGs) or probabilistic inference networks (PINs) [7, 13] can provide a computationally efficient solution to these problems. HMMs and their extensions, such as CHMMs, can be viewed as a particular, simple case of temporal PIN or DAG. PINs consist of a set ...

167 | Probabilistic independence networks for hidden markov probability models
- Smyth, Heckerman, et al.
- 1997
Citation Context ...e than one simultaneous state variable. It is well known that the exact solution of extensions of the basic HMM to 3 or more chains is intractable. In those cases approximation techniques are needed ([23, 12, 24, 25]). However, it is also known that there exists an exact solution for the case of 2 interacting chains, as it is our case [23, 4]. We therefore use two Coupled Hidden Markov Models (CHMMs) for modeling...

144 | Bayesian updating in recursive graphical models by local computation
- Jensen, Lauritzen, et al.
- 1990
Citation Context ...ling causal (temporal) influences between their hidden state variables. The graphical representation of CHMMs is shown in Fig. 4. Exact maximum a posteriori (MAP) inference is an $O(TN^4)$ computation [34], [30]. We have developed a deterministic $O(TN^2)$ algorithm for maximum entropy approximations to state and parameter values in CHMMs. From the graph it can be seen that for each chain, the state at ...

69 | From image sequences towards conceptual descriptions
- Nagel
- 1988
Citation Context ... interaction. Over the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([10], [3], [21], [8], [17], [14], [9], [11]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a higher level interpretati...

68 | Active Perception vs. Passive Perception
- Bajcsy
- 1985
Citation Context ... of the CHMM formulation is presented in the appendix. 2 System Overview Our system employs a static camera with wide field-of-view watching a dynamic outdoor scene (the extension to an active camera [1] is straightforward and planned for the next version). A real-time computer vision system segments moving objects from the learned scene. The scene description method allows variations in lighting, we...

67 | Automated symbolic traffic scene analysis using belief networks
- Huang, Koller, et al.
- 1994
Citation Context ...action. Over the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([10], [3], [21], [8], [17], [14], [9], [11]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a higher level interpretation mod...

61 | Coupled hidden Markov models for modeling interacting processes
- Brand
- 1997
Citation Context ...esian approach provides the best framework for coping with small data sets and novel behaviors. Graphical models [6], such as Hidden Markov Models (HMMs) [22] and Coupled Hidden Markov Models (CHMMs) [5, 4, 19], seem most appropriate for modeling and classifying human behaviors because they offer dynamic time warping, a well-understood training algorithm, and a clear Bayesian semantics for both individual (...

60 | Modeling and prediction of human behavior
- Pentland, Liu
- 1999
Citation Context ...the type of interaction. Over the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([10], [3], [21], [8], [17], [14], [9], [11]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a higher level i...

52 | Boltzmann chains and hidden Markov models
- Saul, Jordan
- 1995
Citation Context ...e than one simultaneous state variable. It is well known that the exact solution of extensions of the basic HMM to 3 or more chains is intractable. In those cases approximation techniques are needed ([23, 12, 24, 25]). However, it is also known that there exists an exact solution for the case of 2 interacting chains, as it is our case [23, 4]. We therefore use two Coupled Hidden Markov Models (CHMMs) for modeling...

50 | Advanced Visual Surveillance Using Bayesian Networks, Int'l Conf. Computer Vision
- Buxton, Gong
- 1995
Citation Context ...pe of interaction. Over the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([10], [3], [21], [8], [17], [14], [9], [11]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a higher level interp...

34 | Active gesture recognition using partially observable markov decision processes
- Darrell, Pentland
- 1996
Citation Context ...ssifying the type of interaction. Over the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([10], [3], [21], [8], [17], [14], [9], [11]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a highe...

31 | What is going on? A High-Level Interpretation of a Sequence
- Castel, Chaudron, et al.
- 1996
Citation Context ...n. Over the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([10], [3], [21], [8], [17], [14], [9], [11]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a higher level interpretation module ...

22 | The Representation Space Paradigm of Concurrent Evolving Object Descriptions
- Bobick, Bolles
- 1992
Citation Context ...etect and track the pedestrians in the scene. We use 2-D blob features for modeling each pedestrian. The notion of "blobs" as a representation for image features has a long history in computer vision [20, 15, 2, 26, 18], and has had many different mathematical ...

21 | Computers seeing action
- Bobick
- 1997
Citation Context ...ing the type of interaction. Over the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([10], [3], [21], [8], [17], [14], [9], [11]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a higher le...

18 | Building qualitative event models automatically from visual input. In: ICCV'98
- Fernyhough, Cohn, et al.
- 1998
Citation Context ...er the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([10], [3], [21], [8], [17], [14], [9], [11]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a higher level interpretation module that c...

16 | An Unsupervised Clustering Approach to Spatial Pre-processing
- Kauth, Pentland, et al.
- 1977
Citation Context ...etect and track the pedestrians in the scene. We use 2-D blob features for modeling each pedestrian. The notion of "blobs" as a representation for image features has a long history in computer vision [20, 15, 2, 26, 18], and has had many different mathematical ...

16 | Mean field networks that learn to discriminate temporally distorted strings
- Williams, Hinton
- 1990

13 | Graphical models for recognizing human interactions
- Oliver, Rosario, et al.
- 1998
Citation Context ...esian approach provides the best framework for coping with small data sets and novel behaviors. Graphical models [6], such as Hidden Markov Models (HMMs) [22] and Coupled Hidden Markov Models (CHMMs) [5, 4, 19], seem most appropriate for modeling and classifying human behaviors because they offer dynamic time warping, a well-understood training algorithm, and a clear Bayesian semantics for both individual (...

12 | Towards Perceptual Intelligence: Statistical Modeling of Human Individual and Interactive Behaviors
- Oliver
- 2000
Citation Context ...n, altering one's path to meet another, and so forth. Our system combines top-down with bottom-up information in a closed feedback loop, with both components employing a statistical Bayesian approach [2]. We propose and compare two different state-based learning architectures, namely, HMMs and CHMMs for modeling behaviors and interactions. The CHMM model is shown to work much more efficiently and acc...

11 | Classification by clustering
- Pentland
- 1976

9 | Lafter: Lips and face tracking
- Oliver, Berard, et al.
- 1997

8 | Pfinder: Real-Time Tracking
- Wren, Azarbayejani, et al.
- 1995
Citation Context ...he pedestrians in the scene. We use 2D blob features for modeling each pedestrian. The notion of "blobs" as a representation for image features has a long history in computer vision [19], [20], [21], [22], [23] and has had many different mathematical definitions. In our usage, it is a compact set of pixels that share some visual properties that are not shared by the surrounding pixels. These propertie...

6 | Pfinder: Real-time Tracking of the Human Body
- Wren, Azarbayejani, et al.
- 1997
Citation Context ...that produces blob descriptions that characterize each person's shape. We have also experimented with modeling the background by using a mixture of Gaussian distributions at each pixel, as in Pfinder [25]. However, we finally opted for the eigenbackground method because it offered good results and less computational load. 3.2 Tracking The trajectories of each blob are computed and saved into a dynamic...

4 | Probabilistic Visual Learning for Object Detection
- Moghaddam, Pentland
- 1995
Citation Context ...igenbackground images $B_i = \Phi_{M_b} X_i$ to model the static parts of the scene, pertaining to the background. Therefore, by computing and thresholding the Euclidean distance (distance from feature space, DFFS [24]) between the input image and the projected image, we can detect the moving objects present in the scene: $D_i = |I_i - B_i| > t$, where $t$ is a given threshold. Note that it is easy to adaptively perform the e...

2 | A Synthetic Agent System for Modeling Human Interactions
- Rosario, Oliver, et al.
- 1999
Citation Context ...Fig. 1. Top-down and bottom-up processing loop. To specify the priors in our system, we have developed a framework for building and training models of the behaviors of interest using synthetic agents [16], [17]. Simulation with the agents yields synthetic data that is used to train prior models. These prior models are then used recursively in a Bayesian framework to fit real behavioral data. This appr...

2 | Speechreading: An Overview of Image
- Stork, Hennecke
- 1996
Citation Context ...mple is the sensor fusion problem: Multiple channels carry complementary information about different components of a system, e.g., acoustical signals from speech and visual features from lip tracking [32]. In [29], a generalization of HMMs with coupling at the outputs is presented. These are Factorial HMMs (FHMMs) where the state variable is factored into multiple state variables. They have a clear re...

2 | Blob: An Unsupervised Clustering Approach to Spatial Preprocessing of MSS Imagery, 11th Int'l Symp. Remote Sensing of the Environment
- Kauth, Pentland, et al.
- 1977
Citation Context ... and track the pedestrians in the scene. We use 2D blob features for modeling each pedestrian. The notion of "blobs" as a representation for image features has a long history in computer vision [19], [20], [21], [22], [23] and has had many different mathematical definitions. In our usage, it is a compact set of pixels that share some visual properties that are not shared by the surrounding pixels. The...

1 | Computers Seeing Action
- Bobick
- 1996
Citation Context ...ing the type of interaction. Over the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([3], [4], [5], [6], [7], [8], [9], [10]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a higher level interp...

1 | Modeling and Prediction of Human Behavior, Defense
- Pentland, Liu
- 1997
Citation Context ...he type of interaction. Over the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([3], [4], [5], [6], [7], [8], [9], [10]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a higher level interpretat...

1 | What is Going On? A High Level Interpretation of
- Castel, Chaudron, et al.
- 1996
Citation Context ...on. Over the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([3], [4], [5], [6], [7], [8], [9], [10]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a higher level interpretation module that clas...

1 | Building Qualitative Event Models Automatically from
- Fernyhough, Cohn, et al.
- 1998
Citation Context ...ver the last decade there has been growing interest within the computer vision and machine learning communities in the problem of analyzing human behavior in video ([3], [4], [5], [6], [7], [8], [9], [10]). Such systems typically consist of a low- or mid-level computer vision system to detect and segment a moving object (human or car, for example) and a higher level interpretation module that classifies...

1 | Operations for Learning with Graphical Models
- Buntine
- 1994
Citation Context ...eling that includes both prior knowledge and evidence from data, believing that the Bayesian approach provides the best framework for coping with small data sets and novel behaviors. Graphical models [11], such as Hidden Markov Models (HMMs) [12] and Coupled Hidden Markov Models (CHMMs) [13], [14], [15], seem most appropriate for modeling and classifying human behaviors because they offer dynamic time...

1 | Graphical Models for Recognizing Human
- Oliver, Rosario, et al.
- 1998
Citation Context ...h provides the best framework for coping with small data sets and novel behaviors. Graphical models [11], such as Hidden Markov Models (HMMs) [12] and Coupled Hidden Markov Models (CHMMs) [13], [14], [15], seem most appropriate for modeling and classifying human behaviors because they offer dynamic time warping, a well-understood training algorithm, and a clear Bayesian semantics for both individual (...

1 | Active Perception vs. Passive Perception
- Bajcsy
- 1985
Citation Context ... of the CHMM formulation is presented in the Appendix. 2 SYSTEM OVERVIEW Our system employs a static camera with wide field-of-view watching a dynamic outdoor scene (the extension to an active camera [18] is straightforward and planned for the next version). A real-time computer vision system segments moving objects from the learned scene. The scene description method allows variations in lighting, we...

1 | Classification by Clustering
- Pentland
- 1976
Citation Context ...detect and track the pedestrians in the scene. We use 2D blob features for modeling each pedestrian. The notion of "blobs" as a representation for image features has a long history in computer vision [19], [20], [21], [22], [23] and has had many different mathematical definitions. In our usage, it is a compact set of pixels that share some visual properties that are not shared by the surrounding pixel...

1 | Lafter: Lips and Face Tracking
- Oliver, Bérard, et al.
- 1997
Citation Context ...estrians in the scene. We use 2D blob features for modeling each pedestrian. The notion of "blobs" as a representation for image features has a long history in computer vision [19], [20], [21], [22], [23] and has had many different mathematical definitions. In our usage, it is a compact set of pixels that share some visual properties that are not shared by the surrounding pixels. These properties coul...

1 | Hidden Markov Decision Trees
- Jordan, Ghahramani, et al.
- 1996
Citation Context ...oubly connected nodes are integrated out. A limited class of graphs can be recursively decimated, obtaining correlations for any connected pair of nodes. Finally, Hidden Markov Decision Trees (HMDTs) [33] are a decision tree with Markov temporal structure (see Fig. 5). The model is intractable for exact calculations. Thus, the authors use variational approximations. They consider three distributions for the a...

1 | Advanced visual surveillance using Bayesian networks
- Buxton, Gong
- 1995