Results 1 - 10 of 28
Anticipating human activities using object affordances for reactive robotic response
Cited by 44 (15 self)
An important aspect of human perception is anticipation, which we use extensively in our day-to-day activities when interacting with other humans as well as with our surroundings. Anticipating which activities a human will do next (and how) can enable an assistive robot to plan ahead for reactive responses in human environments. Furthermore, anticipation can even improve the detection accuracy of past activities. The challenge, however, is two-fold: we need to capture the rich context for modeling the activities and object affordances, and we need to anticipate the distribution over a large space of future human activities. In this work, we represent each possible future using an anticipatory temporal conditional random field (ATCRF) that models the rich spatial-temporal relations through object affordances. We then consider each ATCRF as a particle and represent the distribution over the potential futures using a set of particles. In extensive evaluation on the CAD-120 human activity RGB-D dataset, we first show that anticipation improves the state-of-the-art detection results. For new subjects (not seen in the training set), we obtain an activity anticipation accuracy (defined as whether one of the top three predictions actually happened) of 75.4%, 69.2% and 58.1% for anticipation times of 1, 3 and 10 seconds, respectively. Finally, we also use our algorithm on a robot for performing a few reactive responses.
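The anticipation metric quoted in this abstract (whether one of the top three predicted activities actually happened) can be made concrete with a short sketch; the function name and toy scores below are illustrative and not from the paper:

```python
def topk_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label is among the k highest-scored
    predicted classes. `scores` is a list of per-class score rows,
    `labels` the true class index per sample."""
    hits = 0
    for row, label in zip(scores, labels):
        # indices of the k highest-scored classes for this sample
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in topk:
            hits += 1
    return hits / len(labels)

# Toy example: 2 samples over 5 hypothetical activity classes.
scores = [[0.1, 0.5, 0.2, 0.1, 0.1],
          [0.3, 0.1, 0.1, 0.4, 0.1]]
print(topk_accuracy(scores, [1, 0], k=3))  # both true labels rank in the top 3
```

The same function with k=1 recovers ordinary classification accuracy, which is why top-k accuracy is a strictly more forgiving measure for anticipation.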
Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera
- In: CVPR, 2013
Cited by 27 (0 self)
Local spatio-temporal interest points (STIPs) and the resulting features from RGB videos have proven successful at activity recognition that can handle cluttered backgrounds and partial occlusions. In this paper, we propose their counterpart in depth video and show its efficacy on activity recognition. We present a filtering method to extract STIPs from depth videos (called DSTIPs) that effectively suppresses noisy measurements. Further, we build a novel depth cuboid similarity feature (DCSF) to describe the local 3D depth cuboid around the DSTIPs with an adaptable supporting size. We test this feature on the activity recognition application using the public MSRAction3D and MSRDailyActivity3D datasets and our own dataset. Experimental evaluation shows that the proposed approach outperforms state-of-the-art activity recognition algorithms on depth videos, and the framework is more widely applicable than existing approaches. We also give detailed comparisons with other features and an analysis of the choice of parameters as guidance for applications.
Fusing Spatiotemporal Features and Joints for 3D Action Recognition.
- In CVPRW, 2013
"... We present a novel approach to 3D ..."
Two-person Interaction Detection Using Body-Pose Features and Multiple Instance Learning
Cited by 9 (1 self)
Human activity recognition has the potential to impact a wide range of applications, from surveillance to human-computer interfaces to content-based video retrieval. Recently, the rapid development of inexpensive depth sensors (e.g. Microsoft Kinect) provides adequate accuracy for real-time full-body human tracking for activity recognition applications. In this paper, we create a complex human activity dataset depicting two-person interactions, including synchronized video, depth and motion capture data. Moreover, we use our dataset to evaluate various features typically used for indexing and retrieval of motion capture data, in the context of real-time detection of interaction activities via Support Vector Machines (SVMs). Experimentally, we find that the geometric relational features based on distances between all pairs of joints outperform other feature choices. For whole-sequence classification, we also explore techniques related to Multiple Instance Learning (MIL), in which the sequence is represented by a bag of body-pose features. We find that the MIL-based classifier outperforms SVMs when the sequences extend temporally around the interaction of interest.
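The pairwise joint-distance feature this abstract reports as the best performer is straightforward to sketch; the helper name and toy skeleton coordinates below are invented for illustration:

```python
import itertools
import math

def pairwise_joint_distances(skeleton):
    """Relational pose feature: Euclidean distance between every pair of
    3D joints in one skeleton frame. For J joints this yields
    J*(J-1)/2 distance values."""
    return [math.dist(p, q) for p, q in itertools.combinations(skeleton, 2)]

# Toy 4-joint skeleton -> 6 pairwise distances.
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(pairwise_joint_distances(frame))
```

Because the feature depends only on inter-joint distances, it is invariant to translation and rotation of the whole body, which plausibly explains why it indexes interactions well.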
A survey on human motion analysis from depth data
- In: Time-of-Flight and Depth Imaging. Sensors, Algorithms, and Applications
, 2013
Cited by 9 (2 self)
Human pose estimation has been actively studied for decades. While traditional approaches rely on 2D data like images or videos, the development of Time-of-Flight cameras and other depth sensors created new opportunities to advance the field. We give an overview of recent approaches that perform human motion analysis, which includes depth-based and skeleton-based activity recognition, head pose estimation, facial feature detection, facial performance capture, hand pose estimation and hand gesture recognition. While the focus is on approaches using depth data, we also discuss traditional image-based methods to provide a broad overview of recent developments in these areas.
Privacy preserving automatic fall detection for elderly using rgbd cameras
- In Computers Helping People with Special
, 2012
Cited by 6 (0 self)
In this paper, we propose a new privacy-preserving automatic fall detection method to facilitate the independence of older adults living in the community, reduce risks, and enhance the quality of life in at-home activities of daily living (ADLs) by using RGBD cameras. Our method can recognize 5 activities: standing, fall from standing, fall from chair, sit on chair, and sit on floor. The main analysis is based on the 3D depth information due to its advantages in handling illumination changes and protecting identity. If the monitored person is out of the range of a 3D camera, RGB video is employed to continue the activity monitoring. Furthermore, we design a hierarchical classification scheme to robustly recognize the 5 activities. Experimental results on our database, collected under conditions with normal lighting, without lighting, and out of depth range, demonstrate the effectiveness of the proposed method.
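A hierarchical classification scheme of this kind can be sketched as a two-stage rule: a coarse posture decision first, then a fall-versus-intentional decision; the thresholds and feature names below are illustrative assumptions, not values from the paper:

```python
def classify_event(torso_height_m, downward_speed_mps):
    """Two-stage sketch of a hierarchical activity classifier.
    Stage 1: coarse posture from torso height above the floor.
    Stage 2: fall vs. intentional transition from downward speed."""
    if torso_height_m > 1.0:          # upright torso -> standing
        return "standing"
    if downward_speed_mps > 1.5:      # low torso reached quickly -> fall
        return "fall"
    return "sit"                      # low torso, slow controlled descent

print(classify_event(0.4, 2.0))  # fast drop to the floor classified as a fall
```

Splitting the decision this way keeps each stage simple and lets ambiguous cases (e.g. sit-on-floor vs. fall) be separated by a single dynamic cue rather than one monolithic classifier.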
Simplex-based 3D spatio-temporal feature description for action recognition
- In IEEE Conference on Computer Vision and Pattern Recognition
, 2014
Cited by 4 (1 self)
We present a novel feature description algorithm to describe 3D local spatio-temporal features for human action recognition. Our descriptor avoids the singularity and limited discrimination power issues of traditional 3D descriptors by quantizing and describing visual features in the simplex topological vector space. Specifically, given a feature's support region containing a set of 3D visual cues, we decompose the cues' orientation into three angles, transform the decomposed angles into the simplex space, and describe them in that space. Then, quadrant decomposition is performed to improve discrimination, and a final feature vector is composed from the resulting histograms. We develop intuitive visualization tools for analyzing feature characteristics in the simplex topological vector space. Experimental results demonstrate that our novel simplex-based orientation decomposition (SOD) descriptor substantially outperforms traditional 3D descriptors on the KTH, UCF Sports, and Hollywood-2 benchmark action datasets. In addition, the results show that our SOD descriptor is a superior individual descriptor for action recognition.
3D reconstruction of freely moving persons for re-identification with a depth sensor
- In IEEE International Conference on Robotics and Automation
, 2014
Cited by 3 (1 self)
In this work, we describe a novel method for creating 3D models of persons freely moving in front of a consumer depth sensor, and we show how these models can be used for long-term person re-identification. To overcome the problem of the different poses a person can assume, we exploit the information provided by skeletal tracking algorithms to warp every point cloud frame to a standard pose in real time. Then, the warped point clouds are merged together to compose the model. Re-identification is performed by matching body shapes in terms of whole point clouds warped to a standard pose with the described method. We compare this technique with a classification method based on a descriptor of skeleton features and with a mixed approach which exploits both skeleton and shape features. We report experiments on two datasets we acquired for RGB-D re-identification, which use different skeletal tracking algorithms and which are made publicly available to foster research in this new branch.
Fuzzy segmentation and recognition of continuous human activities
- In IEEE International Conference on Robotics and Automation, in press
Cited by 2 (2 self)
Most previous research has focused on classifying single human activities contained in segmented videos. However, in real-world scenarios, human activities are inherently continuous, and gradual transitions always exist between temporally adjacent activities. In this paper, we propose a Fuzzy Segmentation and Recognition (FuzzySR) algorithm to explicitly model this gradual transition. Our goal is to simultaneously segment a given video into events and recognize the activity contained in each event. Specifically, our algorithm uniformly partitions the video into a sequence of non-overlapping blocks, each of which lasts a short period of time. Then, a multi-variable time series is formed by concatenating the block-level human activity summaries that are computed using topic models over each block's local spatio-temporal features. By representing an event as a fuzzy set that has fuzzy boundaries to model gradual transitions, our algorithm is able to segment the video into a sequence of fuzzy events. By incorporating all block summaries contained in an event, the proposed algorithm determines the most appropriate activity category for each event. We evaluate our algorithm's performance using two real-world benchmark datasets that are widely used in the machine vision community. We also demonstrate our algorithm's effectiveness in important robotics applications, such as intelligent service robotics. For all datasets used, our algorithm achieves promising continuous human activity segmentation and recognition results.
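The block partitioning and fuzzy event boundaries described in this abstract can be sketched as follows; the trapezoidal membership function is a simple stand-in for the paper's fuzzy sets, and all parameter values are illustrative:

```python
def partition_into_blocks(n_frames, block_len):
    """Uniformly partition a video of n_frames into non-overlapping blocks,
    returned as (start, end) frame-index pairs; a short tail block is kept."""
    return [(s, min(s + block_len, n_frames))
            for s in range(0, n_frames, block_len)]

def fuzzy_membership(block_idx, core_start, core_end, fade=2):
    """Trapezoidal membership of a block in a fuzzy event: 1.0 inside the
    event core [core_start, core_end], decaying linearly to 0 over `fade`
    blocks on either side to model gradual activity transitions."""
    if core_start <= block_idx <= core_end:
        return 1.0
    gap = core_start - block_idx if block_idx < core_start else block_idx - core_end
    return max(0.0, 1.0 - gap / fade)

# A 100-frame clip in 30-frame blocks, with an event core on blocks 1..2.
print(partition_into_blocks(100, 30))
print([fuzzy_membership(b, 1, 2) for b in range(4)])
```

A block near an event boundary thus contributes partially to both neighboring events, which is the behavior the abstract attributes to fuzzy boundaries.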
Learning Therapy Strategies from Demonstration Using Latent Dirichlet Allocation
Cited by 1 (1 self)
The use of robots in stroke rehabilitation has become a popular trend in rehabilitation robotics. However, despite the acknowledged value of customized service for individual patients, research on programming adaptive therapy for individual patients has received little attention. The goal of the current study is to model teletherapy sessions in the form of a generative process for autonomous therapy that approximates the demonstrations of the therapist. The resulting autonomous programs for therapy may imitate the strategy that the therapist might have employed and reinforce therapeutic exercises between teletherapy sessions. We propose to encode the therapist's decision criteria in terms of the patient's motor performance features. Specifically, in this work, we apply Latent Dirichlet Allocation to the batch data collected during teletherapy sessions between a single stroke patient and a single therapist. Using the resulting models, the therapeutic exercise targets are generated and are verified with the same therapist who generated the data.
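Latent Dirichlet Allocation operates on counts of discrete "words", so continuous motor-performance features must first be discretized into a bag-of-words representation; a minimal sketch of that preprocessing step (the feature names and bin settings are invented for illustration, not taken from the paper):

```python
from collections import Counter

def features_to_bag_of_words(sessions, n_bins=4, lo=0.0, hi=1.0):
    """Discretize each continuous motor-performance feature into one of
    n_bins equal-width bins over [lo, hi), producing the per-session word
    counts that an LDA implementation expects as input.
    Each session is a dict mapping feature name -> list of observed values."""
    width = (hi - lo) / n_bins
    bags = []
    for session in sessions:
        bag = Counter()
        for name, values in session.items():
            for v in values:
                b = min(int((v - lo) / width), n_bins - 1)  # clamp the top edge
                bag[f"{name}_bin{b}"] += 1                  # e.g. "reach_speed_bin3"
        bags.append(bag)
    return bags

sessions = [{"reach_speed": [0.1, 0.9], "smoothness": [0.5]}]
print(features_to_bag_of_words(sessions))
```

The resulting counts can then be fed to any standard LDA implementation (e.g. scikit-learn's `LatentDirichletAllocation`) to recover session-level topics analogous to therapy strategies.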