A system for learning statistical motion patterns
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2006
Cited by 42 (0 self)
permission from the publisher. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. © 2006 IEEE. Copyright and all rights therein are retained by authors or by other copyright holders. All persons downloading this information are expected to adhere to the terms and constraints invoked by copyright. This document or any part thereof may not be reposted without the explicit permission of the copyright holder. Citation for this copy:
Multimodal human computer interaction: A survey
, 2005
Cited by 38 (2 self)
In this paper we review the major approaches to Multimodal Human Computer Interaction, giving an overview of the field from a computer vision perspective. In particular, we focus on body, gesture, gaze, and affective interaction (facial expression recognition and emotion in audio). We discuss user and task modeling, and multimodal fusion, highlighting challenges, open issues, and emerging applications for Multimodal Human Computer Interaction (MMHCI) research.
A sensory grammar for inferring behaviors in sensor networks
- In Proceedings of Information Processing in Sensor Networks (IPSN)
, 2006
Cited by 30 (17 self)
The ability of a sensor network to parse out observable activities into a set of distinguishable actions is a powerful feature that can potentially enable many applications of sensor networks to everyday life situations. In this paper we introduce a framework that uses a hierarchy of Probabilistic Context Free Grammars (PCFGs) to perform such parsing. The power of the framework comes from the hierarchical organization of grammars that allows the use of simple local sensor measurements for reasoning about more macroscopic behaviors. Our presentation describes how to use a set of phonemes to construct grammars and how to achieve distributed operation using a messaging model. The proposed framework is flexible. It can be mapped to a network hierarchy or can be applied sequentially and across the network to infer behaviors as they unfold in space and time. We demonstrate this functionality by inferring simple motion patterns using a sequence of simple direction vectors obtained from our camera sensor network testbed.
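The parsing step described in this abstract can be illustrated with a minimal sketch, assuming a single-level grammar in Chomsky normal form scored with the standard inside (CYK) algorithm. The symbols `N` and `E` stand in for direction-vector phonemes, and the rule names and probabilities are hypothetical; they are not taken from the paper, whose framework is hierarchical and distributed.

```python
from collections import defaultdict

# Hypothetical toy grammar: a "turn" is a run of north steps followed by a run of east steps.
# Lexical rules map a nonterminal to one phoneme; binary rules map it to two nonterminals.
LEXICAL = {
    ("NRun", "N"): 0.6, ("NStep", "N"): 1.0,
    ("ERun", "E"): 0.6, ("EStep", "E"): 1.0,
}
BINARY = {
    ("S", "NRun", "ERun"): 1.0,
    ("NRun", "NStep", "NRun"): 0.4,
    ("ERun", "EStep", "ERun"): 0.4,
}

def inside_probability(tokens, lexical=LEXICAL, binary=BINARY, start="S"):
    """Probability that the PCFG derives `tokens` from `start` (inside algorithm)."""
    n = len(tokens)
    # chart[(i, j)][A] = P(A derives tokens[i:j])
    chart = defaultdict(lambda: defaultdict(float))
    for i, tok in enumerate(tokens):
        for (lhs, word), p in lexical.items():
            if word == tok:
                chart[(i, i + 1)][lhs] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for (lhs, left, right), p in binary.items():
                    pl = chart[(i, k)].get(left, 0.0)
                    pr = chart[(k, j)].get(right, 0.0)
                    if pl and pr:
                        chart[(i, j)][lhs] += p * pl * pr
    return chart[(0, n)].get(start, 0.0)
```

A sequence such as `["N", "N", "E"]` receives a nonzero probability under this toy grammar, while `["E", "N"]` receives zero. In a hierarchy of the kind the abstract describes, the recognized symbol could in turn serve as a phoneme for a higher-level grammar.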
The function space of an activity
- in Proc. Comput. Vis. Pattern Recognit
Cited by 28 (8 self)
An activity consists of an actor performing a series of actions in a pre-defined temporal order. An action is an individual atomic unit of an activity. Different instances of the same activity may consist of varying relative speeds at which the various actions are executed, in addition to other intra- and inter-person variabilities. Most existing algorithms for activity recognition are not very robust to intra- and inter-personal changes of the same activity, and are extremely sensitive to warping of the temporal axis due to variations in speed profile. In this paper, we provide a systematic approach to learn the nature of such time warps while simultaneously allowing for the variations in descriptors for actions. For each activity we learn an ‘average’ sequence that we denote as the nominal activity trajectory. We also learn a function space of time warpings for each activity separately. The model can be used to learn individual-specific warping patterns so that it may also be used for activity based person identification. The proposed model leads us to algorithms for learning a model for each activity, clustering activity sequences and activity recognition that are robust to temporal, intra- and inter-person variations. We provide experimental results using two datasets.
Human action recognition using distribution of oriented rectangular patches
- IN: WORKSHOP ON HUMAN MOTION
, 2007
Cited by 16 (5 self)
We describe a “bag-of-rectangles” method for representing and recognizing human actions in videos. In this method, each human pose in an action sequence is represented by oriented rectangular patches extracted over the whole body. Then, spatial oriented histograms are formed to represent the distribution of these rectangular patches. In order to carry the information from the spatial domain described by the bag-of-rectangles descriptor to the temporal domain for recognition of the actions, four different methods are proposed: (i) frame-by-frame voting, which recognizes the actions by matching the descriptors of each frame, (ii) global histogramming, which extends the idea of the Motion Energy Image proposed by Bobick and Davis by rectangular patches, (iii) a classifier-based approach using SVMs, and (iv) adaptation of Dynamic Time Warping on the temporal representation of the descriptor. The detailed experiments are carried out on the action dataset of Blank et al. High success rates (100%) prove that with a very simple and compact representation, we can achieve robust recognition of human actions, compared to complex representations.
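Method (iv) builds on Dynamic Time Warping. The following is a minimal sketch of the textbook DTW recurrence, not the paper's adaptation: it assumes each frame is summarized by a single number and uses absolute difference as the local cost, whereas the paper works with histogram descriptors. The function name `dtw_distance` and the `dist` parameter are illustrative choices.

```python
from math import inf

def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Minimal accumulated cost of aligning sequence `a` with sequence `b`."""
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because one frame may align with several frames of the other sequence, a slowed-down copy of an action (for example `[1, 2, 2, 3]` against `[1, 2, 3]`) incurs no extra cost. A different per-frame descriptor can be plugged in by passing a suitable `dist` function.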
A Lightweight Camera Sensor Network Operating on Symbolic Information
Cited by 14 (2 self)
This paper provides an overview of the research aspects of our DSC06 demonstration. We present a new camera sensor network for behavior recognition. Two new technologies are explored: biologically inspired address-event image sensors and sensory grammars. This paper explains how these two technologies are used together and reports on the current status of our prototyping effort. The application of the resulting system in assisted living is also described.
Hidden Markov models for optical flow analysis in crowds
- International Conference on Pattern Recognition
, 2006
Cited by 12 (0 self)
This paper presents an event detector for emergencies in crowds. Assuming a single camera and a dense crowd, we rely on optical flow instead of tracking statistics as a feature to extract information from the crowd video data. The optical flow features are encoded with Hidden Markov Models to allow for the detection of emergency or abnormal events in the crowd. In order to increase the detection sensitivity, a local modelling approach is used. The results with simulated crowds show the effectiveness of the proposed approach in detecting abnormalities in dense crowds.
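One common way to use an HMM for this kind of detection is to score each observation window by its likelihood under a model of normal behaviour and flag windows that score poorly. The sketch below shows only that scoring step, using the scaled forward algorithm. It assumes the optical flow has already been quantized into discrete symbols; the abstract does not state the feature encoding or the decision rule, so both are assumptions here, as is the function name.

```python
from math import log

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM.

    start[s]    : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # propagate one step, then weight by the emission probability
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik
```

A window whose log-likelihood falls well below the values seen on normal training data would be a candidate abnormal event; the threshold itself has to be chosen empirically.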
Fisher. Modelling crowd scenes for event detection
- In International Conference on Pattern Recognition, Hong Kong
, 2006
Cited by 12 (0 self)
This work presents an automatic technique for detection of abnormal events in crowds. Crowd behaviour is difficult to predict and might not be easily semantically translated. Moreover, it is difficult to track individuals in the crowd using state-of-the-art tracking algorithms. Therefore we characterise crowd behaviour by observing the crowd optical flow and use unsupervised feature extraction to encode normal crowd behaviour. The unsupervised feature extraction applies spectral clustering to find the optimal number of models to represent normal motion patterns. The motion models are HMMs to cope with the variable number of motion samples that might be present in each observation window. The results on simulated crowds demonstrate the effectiveness of the approach for detecting crowd emergency scenarios.
Principal Axis-Based Correspondence between Multiple Cameras for People Tracking
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2006
Cited by 11 (0 self)
Visual surveillance using multiple cameras has attracted increasing interest in recent years. Correspondence between multiple cameras is one of the most important and basic problems which visual surveillance using multiple cameras brings. In this paper, we propose a simple and robust method, based on principal axes of people, to match people across multiple cameras. The correspondence likelihood reflecting the similarity of principal axis pairs is constructed according to the relationship between “ground-points” detected in each camera view and the intersections of principal axes detected in different camera views and transformed to the same view. Our method has the following desirable properties: (1) Camera calibration is not needed. (2) Accurate motion detection and segmentation are less critical due to the robustness of the principal axis-based feature to noise. (3) Based on the fused data derived from correspondence results, positions of people in each camera view can be accurately located even when the people are partially occluded in all views. The experimental results on several real video sequences from outdoor environments have demonstrated the effectiveness, efficiency and robustness of our method.
Probabilistic framework for automated analysis of exposure to road collisions
- Transportation Research Record
Cited by 10 (8 self)
The advent of powerful sensing technologies, especially video sensors and computer vision techniques, has allowed for the collection of large quantities of detailed traffic data. They allow us to further advance towards completely automated systems for road safety analysis. This paper presents a comprehensive probabilistic framework for automated road safety analysis. Building upon traffic conflict techniques and the concept of the safety hierarchy, it provides computational definitions of the probability of collision for road users involved in an interaction. It proposes new definitions for individual road users and aggregated measures over time. This allows the interpretation of traffic from a safety perspective, studying all interactions and their relationship to safety. New and more relevant exposure measures can be derived from this work, and traffic conflicts can be detected. A complete vision-based system is implemented to ...
Road safety is characterized by the absence of accidents, i.e. collisions between road users. Safety is traditionally measured by the number of collisions, or rather its expected number at a given time. Traffic safety diagnosis has been traditionally undertaken using historical collision data. However, there are well-recognized problems of availability and quality

