Mining models of human activities from the web
- In Proceedings of the Thirteenth International World Wide Web Conference (WWW '04)
, 2004
Cited by 18 (3 self)
The ability to determine what day-to-day activity (such as cooking pasta, taking a pill, or watching a video) a person is performing is of interest in many application domains. A system that can do this requires models of the activities of interest, but model construction does not scale well: humans must specify low-level details, such as segmentation and feature selection of sensor data, and high-level structure, such as spatio-temporal relations between states of the model, for each and every activity. As a result, previous practical activity recognition systems have been content to model a tiny fraction of the thousands of human activities that are potentially useful to detect. In this paper, we present an approach to sensing and modeling activities that provides scalability for a much larger class of activities than before. We show how a new class of sensors, based on Radio
Ontology and Taxonomy Collaborated Framework for Meeting Classification
, 2004
Cited by 18 (2 self)
A framework for classification of meeting videos is proposed in this paper. Our goal is to utilize this framework to analyze human motion data to perform automatic meeting classification. We use a rule-based system and state machine to analyze the videos, utilize three levels of context hierarchy, namely movements (and their attributes), events (actions), and behavior to identify the activities and classify the meeting type based on the meeting ontology. We also define a meeting ontology that is determined by the knowledge base of various meeting sequences. This ontology validates and refines the taxonomy based on the hierarchy of events and behaviors, and regroups similar meetings in one category, refining the classes. Ontology is the process of determining the class of a meeting video based on relationships, and taxonomy is the categorization of meetings based on certain criteria. The rule-based system is the primary framework manager, which recognizes behaviors based on the events detected by the state machine. It also periodically rolls back the state machine from an erroneous state space to a stable state. The state machine detects the events using a sliding temporal window of human movements. Our approach is appropriate for classifying meetings in complex sequences involving various actions and partial occlusion of tracked objects. Our framework is unique and scalable, with the capability to add new meeting types to the framework with little or no modification to the current framework. Using our framework, we are able to correctly classify various meeting sequences such as voting, argument, presentation, and object passing in our experiments. This framework is applicable to automated video surveillance, video segmentation and retrieval (multimedia), human computer in...
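The abstract above says that events are detected by a state machine running over a sliding temporal window of movements. The following is a minimal sketch of that general idea, not the authors' implementation: the movement labels (`raise_hand`, `idle`), the window length, and the count threshold are all hypothetical values chosen for illustration.

```python
from collections import deque


def detect_events(movements, window=5, min_count=3):
    """Return (frame_index, label) pairs for detected events.

    A two-level sketch: a sliding window smooths per-frame movement
    labels, and a tiny state machine emits an event only when the
    dominant movement changes. All labels and thresholds here are
    hypothetical.
    """
    if 2 * min_count <= window:
        # Requiring a strict majority guarantees at most one dominant label.
        raise ValueError("min_count must exceed half the window length")

    recent = deque(maxlen=window)  # the sliding temporal window
    state = "idle"                 # current state of the state machine
    events = []

    for index, label in enumerate(movements):
        recent.append(label)

        # Count how often each movement occurs inside the window.
        counts = {}
        for movement in recent:
            counts[movement] = counts.get(movement, 0) + 1

        # A non-idle movement dominates if it fills enough of the window.
        dominant = "idle"
        for movement, count in counts.items():
            if movement != "idle" and count >= min_count:
                dominant = movement

        # Transition: emit an event when entering a new non-idle state.
        if dominant != state:
            if dominant != "idle":
                events.append((index, dominant))
            state = dominant

    return events


# One missed frame (index 3) does not prevent the detection.
frames = ["idle", "raise_hand", "raise_hand", "idle", "raise_hand",
          "idle", "idle", "idle", "idle"]
print(detect_events(frames))
```

Because the window tolerates a missing frame, a brief tracking dropout (for example, a partial occlusion) does not by itself suppress the event in this sketch.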
Common Sense Based Joint Training of Human Activity Recognizers
- In: Proceedings of the 20th International Joint Conference on Artificial Intelligence
, 2007
Cited by 18 (2 self)
Given sensors to detect object use, commonsense priors of object usage in activities can reduce the need for labeled data in learning activity models. It is often useful, however, to understand how an object is being used, i.e., the action performed on it. We show how to add personal sensor data (e.g., accelerometers) to obtain this detail, with little labeling and feature selection overhead. By synchronizing the personal sensor data with object-use data, it is possible to use easily specified commonsense models to minimize labeling overhead.
Content-Based Video Retrieval by Integrating Spatio-Temporal and Stochastic Recognition of Events
- In proceedings of IEEE Intl. Workshop on Detection and Recognition of Events in Video
, 2001
Cited by 16 (7 self)
As amounts of publicly available video data grow, the need to query this data efficiently becomes significant. Consequently, content-based retrieval of video data turns out to be a challenging and important problem. In this paper, we address the specific aspect of inferring semantics automatically from raw video data. In particular, we introduce a new video data model that supports the integrated use of two different approaches for mapping low-level features to high-level concepts. Firstly, the model is extended with a rule-based approach that supports spatio-temporal formalization of high-level concepts, and then with a stochastic approach. Furthermore, results on real tennis video data are presented, demonstrating the validity of both approaches, as well as advantages of their integrated use.
Towards Automatic Analysis of Social Interaction Patterns . . .
- MIR'04
, 2004
Cited by 14 (1 self)
In this paper, we propose an ontology-based approach for analyzing social interaction patterns in a nursing home from video. Social interaction patterns are broken into individual activities and behavior events using a multi-level context hierarchy ontology framework. To take advantage of an ontology in representing how social interactions evolve, we design and refine the ontology based on knowledge gained from 80 hours of video recorded in the public spaces of a nursing home. The ontology is implemented using a dynamic Bayesian network to statistically model the multi-level concepts defined in the ontology. We have developed a prototype system to illustrate the proposed concept. Experimental results have demonstrated the feasibility of the proposed approach. The objective of this research is to automatically create concise and comprehensive reports of activities and behaviors of patients to support physicians and caregivers in a nursing facility.
Sensor-Based Understanding of Daily Life via Large-Scale Use of Common Sense
Cited by 13 (1 self)
The use of large quantities of common sense has long been thought to be critical to the automated understanding of the world. To this end, various groups have collected repositories of common sense in machine-readable form. However, efforts to apply these large bodies of knowledge to enable correspondingly large-scale sensor-based understanding of the world have been few. Challenges have included semantic gaps between facts in the repositories and phenomena detected by sensors, fragility of reasoning in the face of noise, incompleteness of repositories, and slowness of reasoning with these large repositories. We show how to address these problems with a combination of novel sensors, probabilistic representation, web-scale information retrieval and approximate reasoning. In particular, we show how to use the 50,000-fact hand-entered OpenMind Indoor Common Sense database to interpret sensor traces of day-to-day activities with 88% accuracy (which is easy) and 32/53% precision/recall (which is not).
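The abstract contrasts accuracy ("easy") with precision/recall ("not") without explaining why. One common reason, offered here as an assumption rather than the authors' own explanation, is class imbalance: when a given activity is rare, correctly rejecting it most of the time inflates accuracy. The sketch below uses hypothetical confusion counts that are not taken from the paper.

```python
def accuracy_precision_recall(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for one activity treated as one-vs-rest.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for a rare activity (10 positives in 100 samples).
acc, prec, rec = accuracy_precision_recall(tp=5, fp=10, fn=5, tn=80)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

With these toy counts the many true negatives dominate the accuracy figure, while precision and recall, which ignore true negatives, stay low.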
Tracking Discontinuous Motion using Bayesian Inference
- In European Conference on Computer Vision
, 2000
Cited by 12 (2 self)
Robustly tracking people in visual scenes is an important task for surveillance, human-computer interfaces and visually mediated interaction. Existing attempts at tracking a person's head and hands deal with ambiguity, uncertainty and noise by intrinsically assuming a consistently continuous visual stream and/or exploiting depth information. We present a method for tracking the head and hands of a human subject from a single view with no constraints on the continuity of motion. Hence the tracker is appropriate for real-time applications in which the availability of visual data is constrained, and motion is discontinuous. Rather than relying on spatio-temporal continuity and complex 3D models of the human body, a Bayesian Belief Network deduces the body part positions by fusing colour, motion and coarse intensity measurements with contextual semantics. 1 Introduction Tracking human body parts and motion is a challenging but essential task for modelling, recognition and i...
ARGMode - Activity Recognition Using Graphical Models
, 2003
Cited by 12 (0 self)
This paper presents a new framework for tracking and recognizing complex multi-agent activities using probabilistic tracking coupled with graphical models for recognition. We employ a statistical feature-based particle filter to robustly track multiple objects in cluttered environments. Both color and shape characteristics are used to differentiate and track different objects so that low-level visual information can be reliably extracted for recognition of complex activities. Such extracted spatio-temporal features are then used to build temporal graphical models for characterization of these activities. We demonstrate, through examples in different scenarios, the generalizability and robustness of our framework.
Probabilistic Motion Parameter Models for Human Activity Recognition
, 2002
Cited by 9 (1 self)
A novel method for human activity recognition is presented. Given a video sequence containing human activity, the motion parameters of each frame are first computed using different motion parameter models. The likelihood of these observed motion parameters is optimally approximated, based directly on a multivariate Gaussian probabilistic model. The dynamic change of motion parameter likelihood in a video sequence is characterized using a continuous density hidden Markov model. Activity recognition is then posed as a motion parameter maximum likelihood estimation problem. Experimental results show that the method proposed here works well in recognizing such complex human activities as sitting, getting up from a chair, and some martial art actions.
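The recognition step described above, choosing the activity whose hidden Markov model assigns the highest likelihood to the observed motion parameters, can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's method: observations are scalar rather than multivariate, and the two models with their parameters are hypothetical toy values.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def make_model(pi, trans, means, variances):
    """Build a Gaussian-emission HMM; probabilities must be nonzero."""
    return {
        "log_pi": [math.log(p) for p in pi],
        "log_A": [[math.log(a) for a in row] for row in trans],
        "means": means,
        "vars": variances,
    }


def sequence_log_likelihood(obs, model):
    """Forward algorithm in log space: log P(obs | model)."""
    n = len(model["log_pi"])
    emit = lambda s, x: log_gauss(x, model["means"][s], model["vars"][s])
    alpha = [model["log_pi"][s] + emit(s, obs[0]) for s in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[r] + model["log_A"][r][s] for r in range(n)])
            + emit(s, x)
            for s in range(n)
        ]
    return logsumexp(alpha)


def classify(obs, models):
    """Pick the activity whose model maximizes the sequence likelihood."""
    return max(models, key=lambda name: sequence_log_likelihood(obs, models[name]))


# Hypothetical two-state models over a scalar motion parameter.
MODELS = {
    "sitting": make_model([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
                          [0.0, 0.2], [0.05, 0.05]),
    "getting_up": make_model([0.9, 0.1], [[0.7, 0.3], [0.1, 0.9]],
                             [0.5, 1.5], [0.1, 0.1]),
}

print(classify([0.05, 0.1, 0.0, 0.15], MODELS))
print(classify([0.4, 0.9, 1.4, 1.6], MODELS))
```

Working in log space avoids numerical underflow on long sequences, which matters once per-frame likelihoods are multiplied over many video frames.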
Working with robots and objects: Revisiting deictic reference for achieving spatial common ground
- In Proceedings of Human-Robot Interaction
, 2006
Cited by 9 (0 self)
Robust joint visual attention is necessary for achieving a common frame of reference between humans and robots interacting multimodally in order to work together on real-world spatial tasks involving objects. We make a comprehensive examination of one component of this process that is often otherwise implemented in an ad hoc fashion: the ability to correctly determine the object referent from deictic reference including pointing gestures and speech. We develop a modular spatial reasoning framework based around decomposition and resynthesis of speech and gesture into a language of pointing and object labeling that supports multimodal and unimodal access in both real-world and mixed-reality workspaces, accounts for the need to discriminate and sequence identical and proximate objects, assists in overcoming inherent precision limitations in deictic gesture, and assists in the extraction of those gestures. We further discuss an implementation of the framework that has been deployed on two humanoid robot platforms to date.

