Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation. (2013)

by H S Koppula, A Saxena
Venue: ICML
Results 1 - 10 of 18

Infinite latent conditional random fields for modeling environments through humans

by Yun Jiang, Ashutosh Saxena - In RSS, 2013
"... Abstract—Humans cast a substantial influence on their en-vironments by interacting with it. Therefore, even though an environment may physically contain only objects, it cannot be modeled well without considering humans. In this paper, we model environments not only through objects, but also through ..."
Cited by 15 (4 self)
Abstract—Humans cast a substantial influence on their environments by interacting with them. Therefore, even though an environment may physically contain only objects, it cannot be modeled well without considering humans. In this paper, we model environments not only through objects, but also through latent human poses and human-object interactions. However, the number of potential human poses is large and unknown, and the human-object interactions vary not only in type but also in which human pose relates to each object. In order to handle such properties, we present Infinite Latent Conditional Random Fields (ILCRFs) that model a scene as a mixture of CRFs generated from Dirichlet processes. Each CRF represents one possible explanation of the scene. In addition to visible object nodes and edges, it generatively models the distribution of different CRF structures over the latent human nodes and corresponding edges. We apply the model to the challenging application of robotic scene arrangement. In extensive experiments, we show that our model significantly outperforms state-of-the-art results. We further use our algorithm on a robot for placing objects in a new scene.

Citation Context

... observed, affordances can be used to predict 3D geometry [6], improve human robot interactions [24], detect and anticipate human activity [21, 19, 20]. While these works focus on different problems and require the presence of humans, they all demonstrate the advantages of considering object affordances. V. EXPERIMENTS In our application, the scenes...
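The abstract above describes modeling a scene as a mixture of CRFs whose structures are generated from Dirichlet processes. As a rough, self-contained illustration of only the Dirichlet-process ingredient (not the authors' ILCRF inference), the sketch below draws cluster assignments from a Chinese restaurant process, where each cluster stands in for one candidate "explanation" of a scene; the function name and parameters are hypothetical.

```python
# Minimal Chinese-restaurant-process sketch of the Dirichlet-process idea that
# ILCRFs build on: each sample either joins an existing latent "explanation"
# (CRF structure) or opens a brand-new one.  Purely illustrative; the actual
# ILCRF inference is far richer (Gibbs sampling over latent human poses and
# human-object edges).
import random

def crp_assignments(n_samples, alpha=1.0, seed=0):
    """Draw cluster assignments from a Chinese restaurant process."""
    rng = random.Random(seed)
    counts = []            # counts[k] = how many samples use explanation k
    assignments = []
    for _ in range(n_samples):
        # joining explanation k has weight counts[k];
        # opening a new explanation has weight alpha
        weights = counts + [alpha]
        r = rng.uniform(0, sum(weights))
        k, acc = 0, 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(counts):
            counts.append(1)   # new explanation
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

print(crp_assignments(10))   # e.g. [0, 0, 1, 0, 1, 0, ...]
```

With a concentration alpha near 1, most samples reuse a handful of explanations while new ones occasionally appear, which is the open-ended behaviour that lets ILCRFs avoid fixing the number of latent human poses in advance.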

Modeling High-Dimensional Humans for Activity Anticipation using Gaussian Process Latent CRFs

by Yun Jiang, Ashutosh Saxena
"... Abstract—For robots, the ability to model human configura-tions and temporal dynamics is crucial for the task of anticipating future human activities, yet requires conflicting properties: On one hand, we need a detailed high-dimensional description of human configurations to reason about the physica ..."
Cited by 4 (1 self)
Abstract—For robots, the ability to model human configurations and temporal dynamics is crucial for the task of anticipating future human activities, yet requires conflicting properties: on one hand, we need a detailed high-dimensional description of human configurations to reason about the physical plausibility of the prediction; on the other hand, we need a compact representation to be able to parsimoniously model the relations between the human and the environment. We therefore propose a new model, GP-LCRF, which admits both the high-dimensional and low-dimensional representations of humans. It assumes that the high-dimensional representation is generated from a latent variable corresponding to its low-dimensional representation using a Gaussian process. The generative process not only defines the mapping function between the high- and low-dimensional spaces, but also models a distribution of humans embedded as a potential function in GP-LCRF along with other potentials to jointly model the rich context among humans, objects and the activity. Through extensive experiments on activity anticipation, we show that our GP-LCRF consistently outperforms state-of-the-art results and reduces the predicted human trajectory error by 11.6%.

Citation Context

...[6, 8, 13, 12] or simplify a human configuration to a 2D point for navigation tasks [3, 18, 44, 23] or to a 3D trajectory of one hand while keeping the rest of the body static, neglecting kinematic constraints [20, 19]. In these works, human motions are underrepresented and would fail when a more elaborate human motion prediction is required. In this work, we design a model that can handle the two competing require...
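The GP-LCRF abstract above assumes the high-dimensional human configuration is generated from a low-dimensional latent variable through a Gaussian process. A minimal way to see that idea is the standard GPLVM log-likelihood, which scores how well latent points Z explain observed pose vectors Y; the sketch below is generic GPLVM machinery, not the paper's model (which additionally couples this term with CRF potentials), and the toy data are invented.

```python
# Toy GPLVM-style objective: score how well low-dimensional latent points Z
# explain high-dimensional pose vectors Y under a GP prior with an RBF kernel.
import numpy as np

def rbf_kernel(Z, lengthscale=1.0, variance=1.0, noise=1e-3):
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2) + noise * np.eye(len(Z))

def gplvm_log_likelihood(Y, Z):
    """log p(Y | Z), treating each output dimension as an independent GP draw."""
    N, D = Y.shape
    K = rbf_kernel(Z)
    _, logdet = np.linalg.slogdet(K)
    Kinv = np.linalg.inv(K)
    return -0.5 * (D * logdet + np.trace(Y.T @ Kinv @ Y) + N * D * np.log(2 * np.pi))

# invented example: 20 poses of dimension 45 (e.g. 15 joints x 3), 2-D latent space
rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 45))
Z = rng.normal(size=(20, 2))
print(gplvm_log_likelihood(Y, Z))
```

Maximizing this quantity over Z (and the kernel hyperparameters) is what ties the compact latent representation to physically plausible high-dimensional poses.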

From stochastic grammar to Bayes network: Probabilistic parsing of complex activity

by Nam N. Vo, Aaron F. Bobick - In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
"... We propose a probabilistic method for parsing a tempo-ral sequence such as a complex activity defined as compo-sition of sub-activities/actions. The temporal structure of the high-level activity is represented by a string-length lim-ited stochastic context-free grammar. Given the grammar, a Bayes ne ..."
Cited by 3 (0 self)
We propose a probabilistic method for parsing a temporal sequence such as a complex activity defined as a composition of sub-activities/actions. The temporal structure of the high-level activity is represented by a string-length limited stochastic context-free grammar. Given the grammar, a Bayes network, which we term Sequential Interval Network (SIN), is generated where the variable nodes correspond to the start and end times of component actions. The network integrates information about the duration of each primitive action, visual detection results for each primitive action, and the activity's temporal structure. At any moment in time during the activity, message passing is used to perform exact inference, yielding the posterior probabilities of the start and end times for each different activity/action. We provide demonstrations of this framework being applied to vision tasks such as action prediction, classification of the high-level activities or temporal segmentation of a test sequence; the method is also applicable in the Human Robot Interaction domain where continual prediction of human action is needed.

Citation Context

...ly infeasible to derive the distribution of the start and end of actions at arbitrary points in the past or future (prediction) using all available observations up till the current time. Koppula et al. [13] introduce the Anticipatory Temporal Conditional Random Field, which is an undirected graphical model designed to run online like a DBN. Prediction is done by extending the network into the future. Most r...
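One ingredient of the Sequential Interval Network described above is propagating duration information through a chain of sub-activities to obtain distributions over start and end times. The sketch below shows only that ingredient in its simplest form: for strictly sequential actions, the start-time distribution of each action is the convolution of the preceding duration distributions. This is a simplified stand-in for SIN's exact message passing (which also fuses per-action detector likelihoods), and the duration pmfs are made up.

```python
# Simplified illustration of duration propagation in interval models like SIN:
# if actions run back-to-back, the distribution over when action k starts is the
# convolution of the duration distributions of actions 1..k-1.
import numpy as np

def start_time_distributions(duration_pmfs):
    """duration_pmfs: list of 1-D arrays, each a pmf over duration in frames."""
    starts = [np.array([1.0])]          # the first action starts at frame 0
    for pmf in duration_pmfs[:-1]:
        starts.append(np.convolve(starts[-1], pmf))   # end of one = start of next
    return starts

# three actions with invented duration pmfs over {0, 1, 2, 3} frames
pmfs = [np.array([0.0, 0.2, 0.5, 0.3]),
        np.array([0.0, 0.6, 0.4]),
        np.array([0.0, 0.1, 0.9])]
for k, s in enumerate(start_time_distributions(pmfs)):
    print(f"P(start of action {k}) =", np.round(s, 3))
```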

Exemplar-based Recognition of Human-Object Interactions

by Jian-fang Hu, Wei-shi Zheng, Jianhuang Lai, Shaogang Gong, Tao Xiang
"... This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI ..."
Cited by 2 (2 self)

Citation Context

... capture the structured information of HOI. In [32], complex interactions are modelled by using velocity histories of tracked keypoints. Recently, some works have been proposed to model HOI in RGB-D videos [33], [34], [35]. For instance, [29] presents a method for categorising manipulated objects and tracking 3D articulated hand pose in the context of each other in order to recognise the interactions betwee...

Physically Grounded Spatio-Temporal Object Affordances

by Ashutosh Saxena
"... Abstract. Objects in human environments support various functional-ities which govern how people interact with their environments in order to perform tasks. In this work, we discuss how to represent and learn a functional understanding of an environment in terms of object affor-dances. Such an under ..."
Cited by 2 (2 self)
Abstract. Objects in human environments support various functionalities which govern how people interact with their environments in order to perform tasks. In this work, we discuss how to represent and learn a functional understanding of an environment in terms of object affordances. Such an understanding is useful for many applications such as activity detection and assistive robotics. Starting with a semantic notion of affordances, we present a generative model that takes a given environment and human intention into account, and grounds the affordances in the form of spatial locations on the object and temporal trajectories in the 3D environment. The probabilistic model also allows uncertainties and variations in the grounded affordances. We apply our approach on RGB-D videos from the Cornell Activity Dataset, where we first show that we can successfully ground the affordances, and we then show that learning such affordances improves performance in the labeling tasks.

Citation Context

...luated our approach on the CAD-120 dataset [23], which has 4 subjects performing 120 high-level activities, and each high-level activity is a sequence of sub-activities. We take the labeling output of [24] and modify it by including the temporal boundaries computed as above. This gives us a new segmentation hypothesis, which we label using the full energy function described in [24]. ...

Learning Therapy Strategies from Demonstration Using Latent Dirichlet Allocation

by Hee-tae Jung, Richard G. Freedman, Tammie Foster, Yu-kyong Choe, Shlomo Zilberstein, Roderic A. Grupen
"... The use of robots in stroke rehabilitation has become a pop-ular trend in rehabilitation robotics. However, despite the ac-knowledged value of customized service for individual pa-tients, research on programming adaptive therapy for indi-vidual patients has received little attention. The goal of the ..."
Cited by 1 (1 self)
The use of robots in stroke rehabilitation has become a popular trend in rehabilitation robotics. However, despite the acknowledged value of customized service for individual patients, research on programming adaptive therapy for individual patients has received little attention. The goal of the current study is to model teletherapy sessions in the form of a generative process for autonomous therapy that approximates the demonstrations of the therapist. The resulting autonomous programs for therapy may imitate the strategy that the therapist might have employed and reinforce therapeutic exercises between teletherapy sessions. We propose to encode the therapist's decision criteria in terms of the patient's motor performance features. Specifically, in this work, we apply Latent Dirichlet Allocation on the batch data collected during teletherapy sessions between a single stroke patient and a single therapist. Using the resulting models, the therapeutic exercise targets are generated and are verified with the same therapist who generated the data.

Citation Context

...also been widely used outside of text analysis by applying the bag-of-words assumption to other collections of objects, including pixel regions [24] for semantic image analysis, streams of sensor data [8, 12, 18, 27] for activity recognition, and sequences of images [3, 25] for activity recognition and segmentation. PROPOSED APPROACH Learning Therapy Strategies The fundamental idea behind our approach is that the...
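The abstract above applies Latent Dirichlet Allocation to batch teletherapy data under a bag-of-features view. As a hedged sketch of that step only, the snippet below runs scikit-learn's LatentDirichletAllocation on an invented session-by-feature count matrix; the vocabulary, counts, and number of topics are assumptions, not the paper's data or settings.

```python
# Sketch of the bag-of-features LDA step: sessions are treated as "documents"
# over a discretized vocabulary of motor-performance features, and LDA recovers
# latent "strategies" as topic distributions.  All numbers are invented.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# rows = therapy sessions, columns = counts of discretized performance features
X = np.array([
    [5, 0, 1, 0, 2, 0],
    [4, 1, 0, 0, 3, 0],
    [0, 3, 0, 4, 0, 2],
    [1, 2, 0, 5, 0, 3],
])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
session_topics = lda.fit_transform(X)       # per-session strategy mixture
print(np.round(session_topics, 2))
# per-strategy distribution over features (rows normalized to sum to 1)
print(np.round(lda.components_ / lda.components_.sum(axis=1, keepdims=True), 2))
```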

Learning to Recognize Human Activities from Soft Labeled Data

by Ninghang Hu, Zhongyu Lou, Gwenn Englebienne
"... Abstract—An activity recognition system is a very important component for assistant robots, but training such a system usually requires a large and correctly labeled dataset. Most of the previous works only allow training data to have a single activity label per segment, which is overly restrictive ..."
Cited by 1 (0 self)
Abstract—An activity recognition system is a very important component for assistant robots, but training such a system usually requires a large and correctly labeled dataset. Most previous works only allow training data to have a single activity label per segment, which is overly restrictive because the labels are not always certain. It is, therefore, desirable to allow multiple labels for ambiguous segments. In this paper, we introduce the method of soft labeling, which allows annotators to assign multiple, weighted labels to data segments. This is useful in many situations, e.g. when the labels are uncertain, when part of the labels are missing, or when multiple annotators assign inconsistent labels. We treat the activity recognition task as a sequential labeling problem. Latent variables are embedded to exploit sub-level semantics for better estimation. We propose a novel method for learning model parameters from soft-labeled data in a max-margin framework. The model is evaluated on a challenging dataset (CAD-120), which is captured by an RGB-D sensor mounted on the robot. To simulate the uncertainty in data annotation, we randomly change the labels for transition segments. The results show significant improvement over the state-of-the-art approach.

Citation Context

...tiple segmentation hypotheses are considered when predicting activities, and a two-step learning algorithm is applied to combine predictions from different segmentation hypotheses. Koppula and Saxena [8] also combine multiple segmentation hypotheses by majority voting, which makes it possible to express the uncertainty over labels at test time but still ties the model to use the provided labels for t...
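The soft-labeling idea above replaces one hard label per segment with a weighted distribution over labels. The paper learns such models in a max-margin latent-CRF framework; as a much simpler stand-in that shows only how a soft label enters a training loss, the sketch below computes a cross-entropy against a weighted label distribution. The class names and numbers are invented.

```python
# Tiny illustration of soft labeling: instead of one hard label per segment,
# the annotator supplies a weighted distribution over labels and the training
# loss is taken against that distribution.  The paper's actual learning is
# max-margin with latent variables; this cross-entropy stand-in is illustrative.
import numpy as np

def soft_label_cross_entropy(logits, soft_labels):
    """logits: (n_classes,) scores; soft_labels: (n_classes,) weights summing to 1."""
    z = logits - logits.max()                       # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -(soft_labels * log_probs).sum()

# an ambiguous transition segment: 70% "reaching", 30% "moving", 0% "placing"
soft = np.array([0.7, 0.3, 0.0])
print(soft_label_cross_entropy(np.array([2.0, 1.0, -1.0]), soft))
```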

TO MODEL 3D ENVIRONMENTS

by unknown authors, 2015
"... The ability to correctly reason about human environment is critical for personal robots. For example, if a robot is asked to tidy a room, it needs to detect object types, such as shoes and books, and then decides where to place them properly. Sometimes being able to anticipate human-environment inte ..."
The ability to correctly reason about human environments is critical for personal robots. For example, if a robot is asked to tidy a room, it needs to detect object types, such as shoes and books, and then decide where to place them properly. Sometimes being able to anticipate human-environment interactions is also desirable. For example, the robot would not put any object on the chair if it understands that humans would sit on it. The idea of modeling object-object relations has been widely leveraged in many scene understanding applications. For instance, the object found in front of a monitor is more likely to be a keyboard because of the high correlation of the two objects. However, as the objects are designed by humans and for human usage, when we reason about a human environment, we reason about it through an interplay between the environment, objects and humans. For example, the objects, monitor and keyboard, are strongly spatially correlated only because a human types on the keyboard while watching the monitor. The key idea of this thesis is to model environments not only through objects, but also through ...

Citation Context

... (the citation to [76] appears in a results table comparing Chance, ATCRF-KGS [75], ATCRF [76], HighDim-LCRF and PPCA-LCRF on activity-anticipation metrics) ...

"Important Stuff, Everywhere!" Activity Recognition with Salient Proto-Objects as Context

by Lukas Rybok , Boris Schauerte , Ziad Al-Halah , Rainer Stiefelhagen
"... Abstract Object information is an important cue to discriminate between activities that draw part of their meaning from context. Most of current work either ignores this information or relies on specific object detectors. However, such object detectors require a significant amount of training data ..."
Abstract. Object information is an important cue to discriminate between activities that draw part of their meaning from context. Most current work either ignores this information or relies on specific object detectors. However, such object detectors require a significant amount of training data and complicate the transfer of the action recognition framework to novel domains with different objects and object-action relationships. Motivated by recent advances in saliency detection, we propose to use proto-objects to detect object candidate regions in videos without any need of prior knowledge. Our experimental evaluation on three publicly available data sets shows that the integration of proto-objects and simple motion features substantially improves recognition performance, outperforming the state-of-the-art.

Citation Context

...utperforms all other approaches by at least 4.3% (relative improvement), including Koppula et al.'s recently proposed state-of-the-art method [16]. The only exception is the work of Koppula and Saxena [17], which however relies on ground-truth object tracks and is thus not comparable to our approach. The confusion matrix in Fig. 3 reveals that most of the problems of our approach lie in confusing activ...

Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models

by Ashesh Jain, Bharad Raghavan, Shane Soh, Ashutosh Saxena (Brain of Things, Inc.)
"... Advanced Driver Assistance Systems (ADAS) have made driving safer over the last decade. They prepare vehicles for unsafe road conditions and alert drivers if they per-form a dangerous maneuver. However, many accidents are unavoidable because by the time drivers are alerted, it is already too late. A ..."
Advanced Driver Assistance Systems (ADAS) have made driving safer over the last decade. They prepare vehicles for unsafe road conditions and alert drivers if they perform a dangerous maneuver. However, many accidents are unavoidable because by the time drivers are alerted, it is already too late. Anticipating maneuvers beforehand can alert drivers before they perform the maneuver and also give ADAS more time to avoid or prepare for the danger. In this work we anticipate driving maneuvers a few seconds before they occur. For this purpose we equip a car with cameras and a computing device to capture the driving context from both inside and outside of the car. We propose an Autoregressive Input-Output HMM to model the contextual information along with the maneuvers. We evaluate our approach on a diverse data set with 1180 miles of natural freeway and city driving and show that we can anticipate maneuvers 3.5 seconds before they occur with over 80% F1-score in real-time.

Citation Context

...-the-shelf available face detection and tracking algorithms for robustness required for anticipation (Section 5). Learning temporal models. Temporal models are commonly used to model human activities [14, 23, 40, 41]. These models have been used in both discriminative and generative fashions. The discriminative temporal models are mostly inspired by the Conditional Random Field (CRF) [18], which captures the tempo...
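The abstract above models maneuvers with an Autoregressive Input-Output HMM. As a deliberately simplified stand-in, the sketch below runs a plain HMM forward filter that maintains a belief over the latent maneuver as discrete context observations arrive; it omits the autoregressive and input-output structure of the paper's model, and all probabilities are invented.

```python
# Plain HMM forward filter as a much-simplified stand-in for the paper's
# Autoregressive Input-Output HMM: keep a running belief over the latent
# maneuver (e.g. "keep lane" vs "lane change") as discrete observations of the
# driving context arrive.  All probabilities below are invented.
import numpy as np

T = np.array([[0.95, 0.05],     # transition: keep-lane -> {keep-lane, change}
              [0.10, 0.90]])    #             change    -> {keep-lane, change}
E = np.array([[0.8, 0.2],       # emission: P(obs | keep-lane), obs in {0, 1}
              [0.3, 0.7]])      #           P(obs | change)

def forward_filter(observations, prior=np.array([0.9, 0.1])):
    belief = prior.copy()
    beliefs = []
    for o in observations:
        belief = (belief @ T) * E[:, o]   # predict with T, then correct with E
        belief /= belief.sum()
        beliefs.append(belief.copy())
    return beliefs

# e.g. the driver repeatedly glances toward the mirror (obs = 1)
for b in forward_filter([0, 1, 1, 1]):
    print(np.round(b, 3))
```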
