Gesture Recognition
Cited by 2223 (28 self)
Introduction A primary goal of virtual environments is to support natural, efficient, powerful, and flexible interaction. If the interaction technology is overly obtrusive, awkward, or constraining, the user's experience with the synthetic environment is severely degraded. If the interaction itself draws attention to the technology, rather than the task at hand, or imposes a high cognitive load on the user, it becomes a burden and an obstacle to a successful virtual environment experience. The traditional two-dimensional, keyboard- and mouse-oriented graphical user interface (GUI) is not well-suited for virtual environments. Instead, synthetic environments provide the opportunity to utilize several different sensing modalities and technologies and integrate them into the user experience. Devices which sense body position and orientation, direction of gaze, speech and sound, facial expression, galvanic skin response, and other aspects of human behavior or state can be used to mediate c
The Recognition of Human Movement Using Temporal Templates
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001
Cited by 304 (5 self)
ras) moving? but, rather, "What is happening?" Unfortunately, this new labeling problem is not as well-defined as the previously addressed questions of geometry. Bobick [6] considers the range of motion interpretation problems and proposes a taxonomy of approaches. At the top and intermediate levels, action and activity, respectively, are situations in which knowledge other than the immediate motion is required to generate the appropriate label. The most primitive level, however, is movement: a motion whose execution is consistent and easily characterized by a definite space-time trajectory in some feature space. Such consistency of execution implies that for a given viewing condition there is consistency of appearance. Put simply, movements can be described by their appearance. This paper presents a novel, appearance-based approach to the recognition of human movement. Our work stands in contrast to many recent efforts to recover the full three-dimensional reconstruction of th
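The excerpt above is cut off before it describes the representation itself. As a rough illustration of what an appearance-based temporal template can look like, the sketch below builds a motion-history image: each pixel stores how recently motion was seen there, so a single image summarizes where and when movement occurred. The function name `update_mhi`, the binary `motion_mask` input, and the decay-by-one rule are assumptions made for this example and are not taken from the excerpt.

```python
def update_mhi(mhi, motion_mask, tau):
    """Return the next motion-history image (illustrative sketch).

    mhi:         2-D list of ints, the current history values
    motion_mask: 2-D list of 0/1 flags, 1 where motion was detected this frame
    tau:         value assigned to pixels that moved in the current frame

    Pixels that moved are set to tau; all other pixels decay by one,
    floored at zero, so older motion appears dimmer than recent motion.
    """
    return [
        [tau if moved else max(0, h - 1) for h, moved in zip(h_row, m_row)]
        for h_row, m_row in zip(mhi, motion_mask)
    ]


# A blob moving left to right across a 1x3 image over three frames.
history = [[0, 0, 0]]
for mask in ([[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]):
    history = update_mhi(history, mask, tau=3)
```

After the three frames, the brightest pixel marks the most recent position and the gradient encodes the direction of travel, which is the kind of appearance cue a template matcher could compare against stored examples.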
Recognition of visual activities and interactions by stochastic parsing
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
Cited by 170 (5 self)
This paper describes a probabilistic syntactic approach to the detection and recognition of temporally extended activities and interactions between multiple agents. The fundamental idea is to divide the recognition problem into two levels. The lower level detections are performed using standard independent probabilistic event detectors to propose candidate detections of low-level features. The outputs of these detectors provide the input stream for a stochastic context-free grammar parsing mechanism. The grammar and parser provide longer range temporal constraints, disambiguate uncertain low-level detections, and allow the inclusion of a priori knowledge about the structure of temporal events in a given domain. To achieve such a system we: 1) provide techniques for generating a discrete symbol stream from continuous low-level detectors; 2) extend stochastic context-free parsing to handle uncertainty in the input symbol stream; 3) augment a run-time parsing algorithm to enforce intersymbol constraints such as requiring temporal consistency between primitives; and 4) extend the consistency filtering to maintain consistent multiobject interactions. We develop a real-time system and demonstrate the approach in several experiments on gesture recognition and in video surveillance. In the surveillance application, we show how the system correctly interprets activities of multiple, interacting objects.
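The two-level idea in this abstract can be sketched in a few lines: low-level detectors emit a likelihood for each candidate symbol at each time step, and a stochastic context-free grammar picks the most probable overall interpretation. The sketch below is not the paper's parser; it swaps in a Viterbi-style CYK pass over a grammar in Chomsky normal form for brevity, and it omits the temporal-consistency and multiobject filtering steps. The function name, the dictionary layout, and the toy "up then down" gesture grammar are all assumptions made for illustration.

```python
from collections import defaultdict


def viterbi_cyk(observations, lexical, binary, start="S"):
    """Probability of the best parse of an uncertain symbol stream.

    observations: list of dicts, terminal -> detector likelihood at that step
    lexical:      dict, (nonterminal, terminal) -> rule probability
    binary:       dict, (parent, left, right) -> rule probability
    """
    n = len(observations)
    # best[i][j][A] = best probability that A derives observations i..j-1
    best = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Lexical rules: weight each rule by the detector's likelihood.
    for i, obs in enumerate(observations):
        for (nt, term), p_rule in lexical.items():
            p = p_rule * obs.get(term, 0.0)
            if p > best[i][i + 1][nt]:
                best[i][i + 1][nt] = p

    # Binary rules: combine adjacent spans, keeping the best split.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), p_rule in binary.items():
                    p = (p_rule
                         * best[i][k].get(left, 0.0)
                         * best[k][j].get(right, 0.0))
                    if p > best[i][j][parent]:
                        best[i][j][parent] = p

    return best[0][n].get(start, 0.0)


# Toy grammar (hypothetical): a gesture is an upstroke followed by a downstroke.
lexical = {("UP", "up"): 1.0, ("DOWN", "down"): 1.0}
binary = {("S", "UP", "DOWN"): 1.0}

# Noisy detector output: each step carries likelihoods for both symbols.
stream = [{"up": 0.9, "down": 0.1}, {"up": 0.2, "down": 0.8}]
score = viterbi_cyk(stream, lexical, binary)  # about 0.72
```

Because the grammar only accepts "up" followed by "down", the parse commits to the 0.9 and 0.8 readings and ignores the competing low-likelihood symbols, which is how the grammar disambiguates uncertain low-level detections.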
Human Computing and Machine Understanding of Human Behavior: A Survey
Survey, Proc. ACM Int’l Conf. Multimodal Interfaces, 2006
Cited by 54 (25 self)
A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. If this prediction is to come true, then next generation computing, which we will call human computing, should be about anticipatory user interfaces that should be human-centered, built for humans based on human models. They should transcend the traditional keyboard and mouse to include natural, human-like interactive functions including understanding and emulating certain human behaviors such as affective and social signaling. This article discusses a number of components of human behavior, how they might be integrated into computers, and how far we are from realizing the front end of human computing, that is, how far are we from enabling computers to understand human behavior.
Recognizing Planned, Multiperson Action
Computer Vision and Image Understanding, 2001
Cited by 41 (2 self)
This paper demonstrates how highly structured, multiperson action can be recognized from noisy perceptual data using visually grounded goal-based primitives and low-order temporal relationships that are integrated in a probabilistic framework. The representation, which is motivated by work in model-based object recognition and probabilistic plan recognition, makes four principal assumptions: (1) the goals of individual agents are natural atomic representational units for specifying the temporal relationships between agents engaged in group activities, (2) a high-level description of temporal structure of the action using a small set of low-order temporal and logical constraints is adequate for representing the relationships between the agent goals for highly structured, multiagent action recognition, (3) Bayesian networks provide a suitable mechanism for integrating multiple sources of uncertain visual perceptual feature evidence, and (4) an automatically generated Bayesian
Machine recognition of human activities: A survey
2008
Cited by 31 (0 self)
The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction, and video summarization require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences, from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing (robustness against errors in low-level processing, view- and rate-invariant representations at midlevel processing, and semantic representation of human activities at higher-level processing) make this problem hard to solve. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) “actions” and 2) “activities.” “Actions” are characterized by simple motion patterns typically executed by a single human. “Activities” are more complex and involve coordinated actions among a small number of humans. We will discuss several approaches and classify them according to their ability to handle varying degrees of complexity as interpreted above. We begin with a discussion of approaches to model the simplest of action classes, known as atomic or primitive actions, that do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher-level representation of complex activities.
Toward scalable activity recognition for sensor networks
In Lecture Notes in Computer Science, 2006
Cited by 27 (2 self)
Sensor networks hold the promise of truly intelligent buildings: buildings that adapt to the behavior of their occupants to improve productivity, efficiency, safety, and security. To be practical, such a network must be economical to manufacture, install, and maintain. Similarly, the methodology must be efficient and must scale well to very large spaces. Finally, to be widely acceptable, it must be inherently privacy-sensitive. We propose to address these requirements by employing networks of passive infrared (PIR) motion detectors. PIR sensors are inexpensive, reliable, and require very little bandwidth. They also protect privacy since they are neither capable of directly identifying individuals nor of capturing identifiable imagery or audio. However, with an appropriate analysis methodology, we show that they are capable of providing useful contextual information. The methodology we propose supports scalability by adopting a hierarchical framework that splits computation into localized, distributed tasks. To support our methodology we provide theoretical justification for the method that grounds it in the action recognition literature. We also present quantitative results on a dataset that we have recorded from a 400 square meter wing of our laboratory. Specifically, we report quantitative results that show better than 90% recognition performance for low-level activities such as walking, loitering, and turning. We also present experimental results for mid-level activities such as visiting and meeting.
Aware Community Portals: Shared Information Appliances for . . .
Personal and Ubiquitous Computing, 2001
Cited by 17 (2 self)
People wish to maintain a level of awareness of timely information, including presence of others in the workplace and other social settings. We believe this provides better exchange, coordination and contact within a community, especially as people work in asynchronous times and distributed locations. The challenge is to develop lightweight techniques for awareness, interaction and communication using 'shared information appliances'. In this paper, we describe the design of an exploratory responsive display projected within a shared workspace at the MIT Media Lab. The system uses visual sensing to provide relevant information and constructs traces of people's activity over time. Such 'aware portals' may be deployed in casual workplace domains, distributed workgroups, and everyday public spaces.
Integrating Perceptual and Cognitive Modeling for Adaptive and Intelligent Human-Computer Interaction
Proc. of the IEEE, 2002
Cited by 15 (0 self)
This paper describes technology and tools for intelligent human-computer interaction (IHCI) where human cognitive, perceptual, motor, and affective factors are modeled and used to adapt the H-C interface. IHCI emphasizes that human behavior encompasses both apparent human behavior and the hidden mental state behind behavioral performance. IHCI expands on the interpretation of human activities, known as W4 (what, where, when, who). While W4 only addresses the apparent perceptual aspect of human behavior, the W5+ technology for IHCI described in this paper addresses also the why and how questions, whose solution requires recognizing specific cognitive states. IHCI integrates parsing and interpretation of nonverbal information with a computational cognitive model of the user, which, in turn, feeds into processes that adapt the interface to enhance operator performance and provide for rational decision-making. The technology proposed is based on a general four-stage interactive framework, which moves from parsing the raw sensory-motor input, to interpreting the user's motions and emotions, to building an understanding of the user's current cognitive state. It then diagnoses various problems in the situation and adapts the interface appropriately. The interactive component of the system improves processing at each stage. Examples of perceptual, behavioral, and cognitive tools are described throughout the paper. Adaptive and intelligent HCI are important for novel applications of computing, including ubiquitous and human-centered computing.
Discriminative models for static human-object interactions
In: Workshop on Structured Models in Computer Vision, 2010
Cited by 11 (2 self)
We advocate an approach to activity recognition based on modeling contextual interactions between postured human bodies and nearby objects. We focus on the difficult task of recognizing actions from static images and formulate the problem as a latent structured labeling problem. We develop a unified, discriminative model for such context-based action recognition, building on recent techniques for learning large-scale discriminative models. The resulting contextual models learned by our system outperform previously published results on a database of sports actions.

