Results 1 - 10 of 59
Contextual Priming for Object Detection
IJCV, 2003
Cited by 132 (16 self)
There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context driven focus of attention and automatic scale-selection on real-world scenes.
Detecting Unusual Activity in Video
2004
Cited by 76 (0 self)
We present an unsupervised technique for detecting unusual activity in a large video set using many simple features. No complex activity models and no supervised feature selections are used. We divide the video into equal length segments and classify the extracted features into prototypes, from which a prototype-segment co-occurrence matrix is computed. Motivated by a similar problem in document-keyword analysis, we seek a correspondence relationship between prototypes and video segments which satisfies the transitive closure constraint. We show that an important sub-family of correspondence functions can be reduced to co-embedding prototypes and segments to N-D Euclidean space. We prove that an efficient, globally optimal algorithm exists for the co-embedding problem. Experiments on various real-life videos have validated our approach.
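The segmentation and counting step described in this abstract can be illustrated with a minimal sketch. It assumes each frame has already been assigned a prototype index; the function name `cooccurrence_matrix`, the per-frame input format, and the toy data are hypothetical and not taken from the paper. The matrix here is stored with one row per segment, and the co-embedding step is not shown.

```python
from collections import Counter


def cooccurrence_matrix(prototype_ids, segment_length, num_prototypes):
    """Count how often each prototype occurs in each equal-length segment.

    prototype_ids: one prototype index per frame (assumed input format).
    Returns a list of rows, one per segment, each with num_prototypes counts.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    matrix = []
    for start in range(0, len(prototype_ids), segment_length):
        # Tally prototypes seen inside this segment only.
        counts = Counter(prototype_ids[start:start + segment_length])
        matrix.append([counts.get(p, 0) for p in range(num_prototypes)])
    return matrix


# Toy example: 8 frames, 3 prototypes, segments of 4 frames each.
frame_prototypes = [0, 0, 1, 2, 2, 2, 1, 0]
C = cooccurrence_matrix(frame_prototypes, segment_length=4, num_prototypes=3)
```

A segment whose row differs strongly from all other rows would be a candidate for unusual activity, although the paper's actual criterion relies on the co-embedding rather than on raw counts.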
Simultaneous Tracking & Activity Recognition (STAR) Using Many Anonymous, Binary Sensors
2004
Cited by 45 (1 self)
Automatic health monitoring helps enable independent living for the elderly by providing specific information to caregivers. This goal, called aging in place, is increasingly important as an unprecedented portion of the population enters old age. I introduce the simultaneous tracking and activity recognition (STAR) problem, whose solution provides this key information. I propose using data from many minimally invasive sensors commonly found in home security systems to provide simultaneous room-level tracking and recognition of many of the activities of daily living (ADLs). ADLs have been chosen by physicians to gauge the severity of cognitive and physical ailments. I describe a Rao-Blackwellised particle filter for room-level tracking, rudimentary activity recognition, and data association, as well as a Monte Carlo EM approach for online parameter learning. I demonstrate results from experiments in an instrumented home and on simulated data. Proposed extensions improve the approach and add more complex activity recognition. We discuss how to integrate a growing vocabulary of activities into the tracker.
Increasing the Opportunities for Aging in Place
2000
Cited by 40 (0 self)
A growing social problem in the U.S. and elsewhere is supporting older adults who want to continue living independently as opposed to moving to an institutional care setting. The "Aging in Place" project strives to delay taking that first step away from the family home. Through the careful placement of technological support we believe older adults can continue living in their own homes longer.
Machine recognition of human activities: A survey
2008
Cited by 31 (0 self)
The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction and video summarization require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing (robustness against errors in low-level processing, view and rate-invariant representations at mid-level processing, and semantic representation of human activities at higher-level processing) make this problem hard to solve. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) “actions” and 2) “activities.” “Actions” are characterized by simple motion patterns typically executed by a single human. “Activities” are more complex and involve coordinated actions among a small number of humans. We will discuss several approaches and classify them according to their ability to handle varying degrees of complexity as interpreted above. We begin with a discussion of approaches to model the simplest of action classes known as atomic or primitive actions that do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher level representation of complex activities.
Recognizing Multitasked Activities from Video Using Stochastic Context-Free Grammar
In Proc. AAAI National Conf. on AI, 2002
Cited by 27 (0 self)
In this paper, we present techniques for recognizing complex, multitasked activities from video. Visual information like image features and motion appearances, combined with domain-specific information like object context, is used initially to label events. Each action event is represented with a unique symbol, allowing for a sequence of interactions to be described as an ordered symbolic string. Then, a model of stochastic context-free grammar (SCFG), which is developed using underlying rules of an activity, is used to provide the structure for recognizing semantically meaningful behavior over extended periods. Symbolic strings are parsed using the Earley-Stolcke algorithm to determine the most likely semantic derivation for recognition. Parsing substrings allows us to recognize patterns that describe high-level, complex events taking place over segments of the video sequence.
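The idea of turning labeled events into a symbolic string and scoring it against a stochastic grammar can be sketched as follows. This is not the Earley-Stolcke parser named in the abstract: as a simpler stand-in, it uses a Viterbi-style CYK recursion over a toy grammar in Chomsky normal form. The event names, symbols, rules, and probabilities are all hypothetical and chosen only for illustration.

```python
def viterbi_cyk(symbols, lexical, binary, start="S"):
    """Return the probability of the most likely derivation of `symbols`.

    lexical: {(nonterminal, terminal): probability}
    binary:  {(parent, left_child, right_child): probability}
    Returns 0.0 when the string cannot be derived from `start`.
    """
    n = len(symbols)
    if n == 0:
        return 0.0
    # best[i][j] maps a nonterminal to its best score over symbols[i..j].
    best = [[{} for _ in range(n)] for _ in range(n)]
    for i, sym in enumerate(symbols):
        for (lhs, terminal), p in lexical.items():
            if terminal == sym:
                best[i][i][lhs] = max(p, best[i][i].get(lhs, 0.0))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between left and right child
                for (lhs, left, right), p in binary.items():
                    if left in best[i][k] and right in best[k + 1][j]:
                        score = p * best[i][k][left] * best[k + 1][j][right]
                        if score > best[i][j].get(lhs, 0.0):
                            best[i][j][lhs] = score
    return best[0][n - 1].get(start, 0.0)


# Hypothetical event vocabulary: each detected event gets a unique symbol.
EVENT_SYMBOLS = {"pick_up": "p", "put_down": "q"}

# Toy grammar (assumed): an activity is one or more pick-up/put-down pairs.
LEXICAL = {("PICK", "p"): 1.0, ("PLACE", "q"): 1.0}
BINARY = {("S", "PICK", "PLACE"): 0.7, ("S", "S", "S"): 0.3}

events = ["pick_up", "put_down", "pick_up", "put_down"]
string = [EVENT_SYMBOLS[e] for e in events]
prob = viterbi_cyk(string, LEXICAL, BINARY)
```

An Earley-style parser would additionally handle grammars that are not in Chomsky normal form and supports incremental, left-to-right parsing, which matters for video streams; the CYK form is used here only because it is compact.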
Expectation Grammars: Leveraging High-Level Expectations for Activity
In Workshop on Event Mining, Event Detection, and Recognition in Video, held in conjunction with Computer Vision and Pattern Recognition, 2003
Cited by 26 (5 self)
Video-based recognition and prediction of a temporally extended activity can benefit from a detailed description of high-level expectations about the activity. Stochastic grammars allow for an efficient representation of such expectations and are well-suited for the specification of temporally well-ordered activities. In this paper, we extend stochastic grammars by adding event parameters, state checks, and sensitivity to an internal scene model. We present an implemented system that uses human-specified grammars to recognize a person performing the Towers of Hanoi task from a video sequence by analyzing object interaction events. Experimental results from several videos show robust recognition of the full task and its constituent sub-tasks even though no appearance models of the objects in the video are provided. These experiments include videos of the task performed with different shaped objects and with distracting and extraneous interactions.
A scalable approach to activity recognition based on object use
In Proceedings of the International Conference on Computer Vision (ICCV), Rio de, 2007
Cited by 23 (3 self)
We propose an approach to activity recognition based on detecting and analyzing the sequence of objects that are being manipulated by the user. In domains such as cooking, where many activities involve similar actions, object-use information can be a valuable cue. In order for this approach to scale to many activities and objects, however, it is necessary to minimize the amount of human-labeled data that is required for modeling. We describe a method for automatically acquiring object models from video without any explicit human supervision. Our approach leverages sparse and noisy readings from RFID tagged objects, along with common-sense knowledge about which objects are likely to be used during a given activity, to bootstrap the learning process. We present a dynamic Bayesian network model which combines RFID and video data to jointly infer the most likely activity and object labels. We demonstrate that our approach can achieve activity recognition rates of more than 80% on a real-world dataset consisting of 16 household activities involving 33 objects with significant background clutter. We show that the combination of visual object recognition with RFID data is significantly more effective than the RFID sensor alone. Our work demonstrates that it is possible to automatically learn object models from video of household activities and employ these models for activity recognition, without requiring any explicit human labeling.
Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition
Cited by 23 (4 self)
Interpretation of images and videos containing humans interacting with different objects is a daunting task. It involves understanding scene/event, analyzing human movements, recognizing manipulable objects, and observing the effect of the human movement on those objects. While each of these perceptual tasks can be conducted independently, recognition rate improves when interactions between them are considered. Motivated by psychological studies of human perception, we present a Bayesian approach which integrates various perceptual tasks involved in understanding human-object interactions. Previous approaches to object and action recognition rely on static shape/appearance feature matching and motion analysis, respectively. Our approach goes beyond these traditional approaches and applies spatial and functional constraints on each of the perceptual elements for coherent semantic interpretation. Such constraints allow us to recognize objects and actions when the appearances are not discriminative enough. We also demonstrate the use of such constraints in recognition of actions from static images without using any motion information. Index Terms: Action recognition, object recognition, functional recognition.
Recognizing Multitasked Activities using Stochastic Context-Free Grammar
In Proceedings of AAAI Conference, 2001
Cited by 20 (4 self)
In this paper, we present techniques for characterizing complex, multi-tasked activities that require both exemplars and models. Exemplars are used to represent object context, image features, and motion appearances to label domain-specific events. Then, by representing each event with a unique symbol, a sequence of interactions can be described as an ordered symbolic string. A model of stochastic context-free grammar, which is developed using underlying rules of an activity, provides the structure for recognizing semantically meaningful behavior over extended periods. Symbolic strings are parsed using the Earley-Stolcke algorithm to determine the most likely semantic derivation for recognition. Parsing substrings allows us to recognize patterns that describe high-level, complex events taking place over segments of the video sequence. We introduce new parsing strategies to enable error detection and recovery in stochastic context-free grammar and methods of quantifying group and individual behavior in activities with separable roles. We show through experiments with a popular card game how high-level narratives of multi-player games as well as identification of player strategies and behavior can be extracted in real-time using vision.

