Coupled hidden Markov models for complex action recognition
, 1996
Cited by 283 (16 self)
We present algorithms for coupling and training hidden Markov models (HMMs) to model interacting processes, and demonstrate their superiority to conventional HMMs in a vision task classifying two-handed actions. HMMs are perhaps the most successful framework in perceptual computing for modeling and classifying dynamic behaviors, popular because they offer dynamic time warping, a training algorithm, and a clear Bayesian semantics. However, the Markovian framework makes strong restrictive assumptions about the system generating the signal: that it is a single process having a small number of states and an extremely limited state memory. The single-process model is often inappropriate for vision (and speech) applications, resulting in low ceilings on model performance. Coupled HMMs provide an efficient way to resolve many of these problems, and offer superior training speeds, model likelihoods, and robustness to initial conditions.
Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic
- Journal of Artificial Intelligence Research
, 2001
Cited by 75 (2 self)
This paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile.
Specific-to-General Learning for Temporal Events with Application to Learning . . .
- Journal of Artificial Intelligence Research
, 2002
Cited by 23 (2 self)
We develop, analyze, and evaluate a novel, supervised, specific-to-general learner for a simple temporal logic and use the resulting algorithm to learn visual event definitions from video sequences. First, we introduce a simple, propositional, temporal, event-description language called AMA that is sufficiently expressive to represent many events yet sufficiently restrictive to support learning. We then give algorithms, along with lower and upper complexity bounds, for the subsumption and generalization problems for AMA formulas. We present a positive-examples-only specific-to-general learning method based on these algorithms. We also present a polynomial-time-computable "syntactic" subsumption test that implies semantic subsumption without being equivalent to it. A generalization algorithm based on syntactic subsumption can be used in place of semantic generalization to improve the asymptotic complexity of the resulting learning algorithm.
Learning concise models of human activity from ambient video via a structure-inducing M-step estimator
, 1997
Cited by 7 (4 self)
We introduce a method for structure discovery in data and use it to learn a normative theory about the behavior of the visual world from coarse image representations. The theory takes the form of a concise probabilistic automaton (specifically, a continuous-output hidden Markov model, or HMM), but the induction method applies generally to any conditional probability model. The learning algorithm introduces and exploits an entropic prior for fast, simultaneous estimation of model structure and parameters. Although not motivated as such, the prior and its maximum a posteriori (MAP) estimator can be understood as an exact formulation of minimum description length (MDL) for Bayesian point estimation; we present an exact solution for the MAP estimator, which thus folds MDL into the M-step of expectation-maximization (EM) algorithms. Consequently, there is no speculative or wasted computation as in search-based MDL approaches. In contrast to conventionally trained HMMs, entropically trained mod...
Visualizing Competitive Behaviors in Multi-User Virtual Environments
- In Proc. Viz’04
, 2004
Cited by 7 (1 self)
Figure 1: In first-person games, observation modes are typically restricted to an over-the-shoulder chase camera (left) or a floating-player view (center). Both views make it very difficult to understand complex team-oriented actions that have an inherent global nature. We present a novel game observation system (right) that extracts high-level semantic information about the action taking place in a game and displays it visually. By emphasizing important low-level details and overlaying them with high-level action summaries, we provide a unique and insightful new view of the environment and behaviors therein. Using our system, it can now be seen that the red team is holding the bridge at the center of the map against a frontal assault by blue, but is also being flanked from the North by a lone blue player.
We present a system for enhancing observation of user interactions in virtual environments. In particular, we focus on analyzing behavior patterns in the popular team-based first-person perspective game Return to Castle Wolfenstein: Enemy Territory. This game belongs to a genre characterized by two moderate-sized teams (usually 6 to 12 players each) competing over a set of objectives. Our system allows spectators to visualize global features such as
Deleted interpolation using a hierarchical Bayesian grammar network for recognizing human activity
- In Proceedings of the 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Oct 2005
Cited by 4 (1 self)
From the viewpoint of an intelligent video surveillance system, the high-level recognition of human activity requires a priori hierarchical domain knowledge as well as a means of reasoning based on that knowledge. We approach the problem of human activity recognition based on the understanding that activities are hierarchical, temporally constrained and temporally overlapped. While stochastic grammars and graphical models have been widely used for the recognition of human activity, methods combining hierarchy and complex queries have been limited. We propose a new method of merging and implementing the advantages of both approaches to recognize activities in real time. To address the hierarchical nature of human activity recognition, we implement a hierarchical Bayesian network (HBN) based on a stochastic context-free grammar (SCFG). The HBN is applied to digressive substrings of the current string of evidence via deleted interpolation (DI) to calculate the probability distribution of overlapped activities in the current string. Preliminary results from the analysis of activity sequences from a video surveillance camera show the validity of our approach.
Action Summary for Computer Games:
- Proc. of 2nd Int’l Conf. Application and Development of Computer Games
, 2003
As action in computer games is becoming more and more complex, the possibilities for entertaining players and spectators are increasing. This paper introduces methods for the automatic extraction and evaluation of action scenes in computer games. Selection strategies are also presented that allow for the automatic generation of summaries after the game (to be shown as a sequence of images), or to provide timing information to a camera for live spectator mode viewing of the action.

