Exploiting Human Actions and Object Context for Recognition Tasks
, 1999
Cited by 87 (6 self)
Our goal is to exploit human motion and object context to perform action recognition and object classification. Towards this end, we introduce a framework for recognizing actions and objects by measuring image-, object- and action-based information from video. Hidden Markov models are combined with object context to classify hand actions, which are aggregated by a Bayesian classifier to summarize activities. We also use Bayesian methods to differentiate the class of unknown objects by evaluating detected actions along with low-level, extracted object features. Our approach is appropriate for locating and classifying objects under a variety of conditions including full occlusion. We show experiments where both familiar and previously unseen objects are recognized using action and context information. 1. Introduction This paper proposes a novel approach to human activity recognition that uses context information of particular objects in the scene. We define classes that contain object-s...
Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic
- Journal of Artificial Intelligence Research
, 2001
Cited by 75 (2 self)
This paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile.
Specific-to-General Learning for Temporal Events with Application to Learning . . .
- Journal of Artificial Intelligence Research
, 2002
Cited by 23 (2 self)
We develop, analyze, and evaluate a novel, supervised, specific-to-general learner for a simple temporal logic and use the resulting algorithm to learn visual event definitions from video sequences. First, we introduce a simple, propositional, temporal, event-description language called AMA that is sufficiently expressive to represent many events yet sufficiently restrictive to support learning. We then give algorithms, along with lower and upper complexity bounds, for the subsumption and generalization problems for AMA formulas. We present a positive-examples-only specific-to-general learning method based on these algorithms. We also present a polynomial-time-computable "syntactic" subsumption test that implies semantic subsumption without being equivalent to it. A generalization algorithm based on syntactic subsumption can be used in place of semantic generalization to improve the asymptotic complexity of the resulting learning algorithm. Finally...
Visual Event Classification via Force Dynamics
, 2000
Cited by 22 (3 self)
This paper presents an implemented system, called LEONARD, that classifies simple spatial motion events, such as pick up and put down, from video input. Unlike previous systems that classify events based on their motion profile, LEONARD uses changes in the state of force-dynamic relations, such as support, contact, and attachment, to distinguish between event types. This paper presents an overview of the entire system, along with the details of the algorithm that recovers force-dynamic interpretations using prioritized circumscription and a stability test based on a reduction to linear programming. This paper also presents an example illustrating the end-to-end performance of LEONARD classifying an event from video input. Introduction People can describe what they see. If someone were to pick up a block and ask you what you saw, you could say The person picked up the block. In doing so, you describe both objects, like people and blocks, and events, like pickings up. Most...
Visual Event Perception
- In Proceedings of the NEC Research Symposium
, 1999
Cited by 6 (1 self)
This paper presents a novel framework for training models to recognise simple spatial-motion events, such as those described by the verbs pick up, put down, push, pull, drop, tip, and tap and classifying novel observations into previously trained classes. Simple colour- and motion-based segmentation and tracking techniques are used to produce a time series of feature vectors constructed from the 2D object positions, orientations, shapes, and sizes. Hidden Markov models are trained on this time series data and used to classify novel occurrences into previously trained classes. The particular choice of features used allows the system to construct meaningful semantic representations of the event classes that it has learned. KEYWORDS: Event classification, Motion analysis, Segmentation, Tracking, Learning, Hidden Markov models, Lexical semantics 1 Introduction People can describe what they see. If I were to pick up a block and ask you what you saw, you could say Jeff picked up the...
Learning, detection and representation of multi-agent events in videos
, 2007
Cited by 5 (0 self)
In this paper, we model multi-agent events in terms of a temporally varying sequence of sub-events, and propose a novel approach for learning, detecting and representing events in videos. The proposed approach has three main steps. First, in order to learn the event structure from training videos, we automatically encode the sub-event dependency graph, which is the learnt event model that depicts the conditional dependency between sub-events. Second, we pose the problem of event detection in novel videos as clustering the maximally correlated sub-events using normalized cuts. The principal assumption made in this work is that the events are composed of a highly correlated chain of sub-events that have high weights (association) within the cluster and relatively low weights (disassociation) between the clusters. The event detection does not require prior knowledge of the number of agents [...] event model should extend to representations related to human understanding of events. Therefore, we propose an extension of CASE representation of natural languages that allows a plausible means of interface between users and the computer. We show results of learning, detection, and representation of events for videos in the meeting, surveillance, and railroad monitoring domains.
Vision-Based Recognition of Actions using Context
, 2000
Cited by 4 (1 self)
In this dissertation, we address the problem of recognizing human interactions with objects from video. Methods for recognizing these activities using human motion and information about objects are developed for practical, real-time systems. We introduce a framework, called ObjectSpaces, that sorts, stores, and manages data acquired using low-level vision techniques into intuitive classes. Our framework decomposes the recognition process into layers, i.e., a low-level layer for routine hand and object tracking and a high-level layer for domain-specific representation of activities. Segmenting recognition tasks and information in this way encourages model reuse and provides the flexibility to use a single framework in a variety of domains. We present several ways of...
Physics-Based Human Motion Modeling for People Tracking: A Short Tutorial
, 2009
Physics-based models have proved to be effective in modeling how people move and interact with the environment. Such dynamical models are prevalent in computer graphics and robotics, where they allow physically plausible animation and/or simulation of humanoid motion. Similar models have also proved useful in biomechanics, allowing clinically meaningful analysis of human motion in terms of muscle and ground reaction forces. In computer vision the use of such models (e.g., as priors for video-based human pose tracking) has been limited. Most prior models in vision, to date, take the form of kinematic priors that can effectively be learned from motion capture data, but are inherently unable to explicitly account for physical plausibility of recovered motions (e.g., consistency with gravity, ground interactions, inertia, etc.). As a result many current methods suffer from visually unpleasant artifacts (e.g., out-of-plane rotations, foot skate, etc.), especially when one is limited to monocular observations. Recently, physics-based prior models have been shown to address some of these issues. We posit that physics-based prior models are among the next important steps in developing more robust methods to track human motion over time. That said, the models involved are conceptually challenging and carry a high overhead for those unfamiliar with Newtonian mechanics; furthermore, good references that address practical issues of importance (particularly as they apply to vision problems) are scarce. This document will cover the motivation for the use of physics-based models for tracking of articulated objects (e.g., people), as well as the formalism required for someone unfamiliar with these models to easily get started. This document is part of the larger set of materials (slides, notes, and Matlab code) that will allow a capable novice to proceed along this innovative research path.
Platform for Rapid-Prototyping of Computer Vision and Interaction
Examiner was Yngve Sundblad
This Master's project has considered the development of, and experimentation with, "unencumbered interaction" through the use of camera input. The set of deliverables for the work includes a software library implementing various image processing interaction routines and the results of user interaction with the working system. The implemented image processing routines embody the notion of user interaction in a spatial environment by the integration of simple image processing with appropriate and engaging interaction. While most of the image processing techniques are applications of textbook solutions, their inclusion in environments, interaction paradigms, and other settings (virtual and physical) is novel and required investigation and iterative development. The work takes advantage of assumptions about the user and setting, so fundamental vision problems (e.g. foreground/background segmentation and motion flow direction) fall to the background, while the parameters of user interaction come into focus. The work was carried out primarily in the Java language, the DIVE virtual environment system, and the KidPad cooperative drawing tool. A Java toolbox package, Motion Studio, was developed for the purpose of future ease of prototyping, expanding the current system, and implementing distributed solutions. The work is incorporated into the greater work of two international projects, one building storytelling spaces for children, and the other a multi-user information space. The computer vision routines developed form a core set of basic interaction methods applicable in multiple research settings.

