Results 1 - 10 of 18
Infinite latent conditional random fields for modeling environments through humans
- in RSS, 2013
"... Abstract—Humans cast a substantial influence on their en-vironments by interacting with it. Therefore, even though an environment may physically contain only objects, it cannot be modeled well without considering humans. In this paper, we model environments not only through objects, but also through ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
(Show Context)
Abstract—Humans cast a substantial influence on their environments by interacting with them. Therefore, even though an environment may physically contain only objects, it cannot be modeled well without considering humans. In this paper, we model environments not only through objects, but also through latent human poses and human-object interactions. However, the number of potential human poses is large and unknown, and the human-object interactions vary not only in type but also in which human pose relates to each object. In order to handle such properties, we present Infinite Latent Conditional Random Fields (ILCRFs) that model a scene as a mixture of CRFs generated from Dirichlet processes. Each CRF represents one possible explanation of the scene. In addition to visible object nodes and edges, it generatively models the distribution of different CRF structures over the latent human nodes and corresponding edges. We apply the model to the challenging application of robotic scene arrangement. In extensive experiments, we show that our model significantly outperforms state-of-the-art results. We further use our algorithm on a robot for placing objects in a new scene.
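To make the nonparametric piece concrete: a Dirichlet-process mixture lets the number of components (here, candidate CRF structures) grow with the data. Below is a minimal Python sketch of the Chinese restaurant process such a mixture induces; it illustrates the general mechanism only, not the authors' ILCRF inference, and reading components as CRF structures is an assumption for readability.

import numpy as np

def crp_sample_assignments(n_items, alpha, rng):
    """Chinese restaurant process: sample mixture-component assignments.

    In an ILCRF-style model, each component would index one candidate
    CRF structure, i.e. one possible 'explanation' of the scene.
    """
    assignments = [0]                      # first item starts component 0
    counts = [1]
    for _ in range(1, n_items):
        # Existing components are chosen proportionally to their size;
        # a new component opens with probability proportional to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)               # open a new component
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

rng = np.random.default_rng(0)
print(crp_sample_assignments(10, alpha=1.0, rng=rng))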
Modeling High-Dimensional Humans for Activity Anticipation using Gaussian Process Latent CRFs
"... Abstract—For robots, the ability to model human configura-tions and temporal dynamics is crucial for the task of anticipating future human activities, yet requires conflicting properties: On one hand, we need a detailed high-dimensional description of human configurations to reason about the physica ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
Abstract—For robots, the ability to model human configurations and temporal dynamics is crucial for the task of anticipating future human activities, yet requires conflicting properties: On one hand, we need a detailed high-dimensional description of human configurations to reason about the physical plausibility of the prediction; on the other hand, we need a compact representation to be able to parsimoniously model the relations between the human and the environment. We therefore propose a new model, GP-LCRF, which admits both the high-dimensional and low-dimensional representations of humans. It assumes that the high-dimensional representation is generated from a latent variable corresponding to its low-dimensional representation using a Gaussian process. The generative process not only defines the mapping function between the high- and low-dimensional spaces, but also models a distribution of humans embedded as a potential function in GP-LCRF along with other potentials to jointly model the rich context among humans, objects and the activity. Through extensive experiments on activity anticipation, we show that our GP-LCRF consistently outperforms state-of-the-art results and reduces the predicted human trajectory error by 11.6%.
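The core generative assumption, that a high-dimensional pose is produced from a low-dimensional latent point through a Gaussian process, can be sketched with plain GP regression. The toy data, dimensions, and kernel parameters below are assumptions; this shows only the latent-to-pose mapping, not the full GP-LCRF with its CRF potentials.

import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between row vectors in A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

# Toy data: 2-D latent points Z paired with high-dimensional poses Y
# (e.g., 30 joint coordinates); the mapping is invented for illustration.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))                            # latent inputs
W = rng.normal(size=(2, 30))
Y = np.tanh(Z @ W) + 0.05 * rng.normal(size=(50, 30))   # observed poses

K = rbf_kernel(Z, Z) + 1e-4 * np.eye(len(Z))   # jitter for stability
K_inv = np.linalg.inv(K)

def predict_pose(z_new):
    """GP posterior mean: map a new latent point to a full pose."""
    k_star = rbf_kernel(z_new[None, :], Z)      # (1, 50)
    return (k_star @ K_inv @ Y).ravel()         # (30,)

print(predict_pose(np.array([0.3, -0.8])).shape)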
From stochastic grammar to Bayes network: Probabilistic parsing of complex activity
- in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
"... We propose a probabilistic method for parsing a tempo-ral sequence such as a complex activity defined as compo-sition of sub-activities/actions. The temporal structure of the high-level activity is represented by a string-length lim-ited stochastic context-free grammar. Given the grammar, a Bayes ne ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
We propose a probabilistic method for parsing a temporal sequence such as a complex activity defined as a composition of sub-activities/actions. The temporal structure of the high-level activity is represented by a string-length limited stochastic context-free grammar. Given the grammar, a Bayes network, which we term Sequential Interval Network (SIN), is generated where the variable nodes correspond to the start and end times of component actions. The network integrates information about the duration of each primitive action, visual detection results for each primitive action, and the activity's temporal structure. At any moment in time during the activity, message passing is used to perform exact inference, yielding the posterior probabilities of the start and end times for each different activity/action. We provide demonstrations of this framework being applied to vision tasks such as action prediction, classification of high-level activities, and temporal segmentation of a test sequence; the method is also applicable in the Human-Robot Interaction domain, where continual prediction of human action is needed.
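A worked miniature of the interval-network idea: if the actions form a chain, the start-time distribution of each action is its predecessor's start distribution convolved with that predecessor's duration distribution. The duration parameters and the purely linear chain below are simplifying assumptions; the paper's SIN additionally fuses per-action visual detections.

import numpy as np

T = 100                                   # discrete time steps
def discretized_gaussian(mean, std, length):
    """Hypothetical per-action duration distribution over step counts."""
    d = np.arange(length)
    p = np.exp(-0.5 * ((d - mean) / std) ** 2)
    return p / p.sum()

durations = [discretized_gaussian(m, 3.0, T) for m in (10, 25, 15)]

# Forward pass over a linear chain of actions: convolve the start-time
# distribution with the duration distribution to get the end-time
# distribution, which is the next action's start-time distribution.
start = np.zeros(T)
start[0] = 1.0                            # activity begins at t = 0
for k, dur in enumerate(durations):
    end = np.convolve(start, dur)[:T]     # P(end time of action k)
    print(f"action {k}: most likely end time = {end.argmax()}")
    start = end                           # next action starts here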
Exemplar-based Recognition of Human-Object Interactions
"... This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
(Show Context)
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
Physically Grounded Spatio-Temporal Object Affordances
"... Abstract. Objects in human environments support various functional-ities which govern how people interact with their environments in order to perform tasks. In this work, we discuss how to represent and learn a functional understanding of an environment in terms of object affor-dances. Such an under ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
(Show Context)
Abstract. Objects in human environments support various functionalities which govern how people interact with their environments in order to perform tasks. In this work, we discuss how to represent and learn a functional understanding of an environment in terms of object affordances. Such an understanding is useful for many applications such as activity detection and assistive robotics. Starting with a semantic notion of affordances, we present a generative model that takes a given environment and human intention into account, and grounds the affordances in the form of spatial locations on the object and temporal trajectories in the 3D environment. The probabilistic model also allows uncertainties and variations in the grounded affordances. We apply our approach to RGB-D videos from the Cornell Activity Dataset, where we first show that we can successfully ground the affordances, and we then show that learning such affordances improves performance in labeling tasks.
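As a toy illustration of grounding an affordance as a spatial distribution, one can score candidate points on an object's surface with a Gaussian centered on a learned anchor region. The anchor, covariance, and function names below are hypothetical stand-ins; the paper's generative model is richer and also grounds temporal trajectories.

import numpy as np

def affordance_heatmap(points, anchor, cov):
    """Score 3-D points on an object surface with a Gaussian density
    centered at a learned affordance anchor (e.g., a graspable region).
    A toy stand-in for grounding an affordance as a spatial
    distribution; 'anchor' and 'cov' would be learned from data.
    """
    diff = points - anchor
    cov_inv = np.linalg.inv(cov)
    maha = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    norm = np.sqrt((2 * np.pi) ** 3 * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

rng = np.random.default_rng(1)
surface = rng.uniform(-0.1, 0.1, size=(500, 3))   # candidate points (m)
anchor = np.array([0.05, 0.0, 0.02])              # assumed grasp region
scores = affordance_heatmap(surface, anchor, 1e-3 * np.eye(3))
print("best candidate:", surface[scores.argmax()])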
Learning Therapy Strategies from Demonstration Using Latent Dirichlet Allocation
"... The use of robots in stroke rehabilitation has become a pop-ular trend in rehabilitation robotics. However, despite the ac-knowledged value of customized service for individual pa-tients, research on programming adaptive therapy for indi-vidual patients has received little attention. The goal of the ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
The use of robots in stroke rehabilitation has become a popular trend in rehabilitation robotics. However, despite the acknowledged value of customized service for individual patients, research on programming adaptive therapy for individual patients has received little attention. The goal of the current study is to model teletherapy sessions in the form of a generative process for autonomous therapy that approximates the demonstrations of the therapist. The resulting autonomous programs for therapy may imitate the strategy that the therapist might have employed and reinforce therapeutic exercises between teletherapy sessions. We propose to encode the therapist's decision criteria in terms of the patient's motor performance features. Specifically, in this work, we apply Latent Dirichlet Allocation to the batch data collected during teletherapy sessions between a single stroke patient and a single therapist. Using the resulting models, the therapeutic exercise targets are generated and verified with the same therapist who generated the data.
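A minimal sketch of the modeling step using scikit-learn's LDA implementation, assuming each session is represented as counts of discretized motor-performance features. The feature binning and counts here are invented for illustration; the paper's actual encoding may differ.

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy count matrix: rows are therapy sessions, columns are counts of
# discretized motor-performance features (hypothetical bins).
rng = np.random.default_rng(0)
X = rng.poisson(lam=2.0, size=(30, 12))

lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(X)        # per-session topic mixtures
print(theta[0])                     # e.g., strategy mix of session 0
print(lda.components_.shape)        # (3, 12): topic-feature weights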
Learning to Recognize Human Activities from Soft Labeled Data
"... Abstract—An activity recognition system is a very important component for assistant robots, but training such a system usually requires a large and correctly labeled dataset. Most of the previous works only allow training data to have a single activity label per segment, which is overly restrictive ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract—An activity recognition system is a very important component for assistant robots, but training such a system usually requires a large and correctly labeled dataset. Most previous works only allow training data to have a single activity label per segment, which is overly restrictive because the labels are not always certain. It is, therefore, desirable to allow multiple labels for ambiguous segments. In this paper, we introduce the method of soft labeling, which allows annotators to assign multiple weighted labels to data segments. This is useful in many situations, e.g. when the labels are uncertain, when part of the labels are missing, or when multiple annotators assign inconsistent labels. We treat the activity recognition task as a sequential labeling problem. Latent variables are embedded to exploit sub-level semantics for better estimation. We propose a novel method for learning model parameters from soft-labeled data in a max-margin framework. The model is evaluated on a challenging dataset (CAD-120), which is captured by an RGB-D sensor mounted on the robot. To simulate the uncertainty in data annotation, we randomly change the labels for transition segments. The results show significant improvement over the state-of-the-art approach.
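The paper's learning is max-margin; as a simpler stand-in that still captures the essential idea of training against multiple weighted labels rather than one hard label, here is a soft-label cross-entropy loss. The activity names and weights below are made up for illustration.

import numpy as np

def soft_label_cross_entropy(logits, soft_targets):
    """Cross-entropy against a weighted label distribution.

    A simplified substitute for the paper's max-margin objective:
    the target is a distribution over labels, not a single class.
    """
    z = logits - logits.max(axis=1, keepdims=True)   # stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(soft_targets * log_probs).sum(axis=1).mean()

# One ambiguous transition segment: 70% 'reaching', 30% 'moving'.
logits = np.array([[2.0, 1.5, -1.0]])
soft = np.array([[0.7, 0.3, 0.0]])
print(soft_label_cross_entropy(logits, soft))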
TO MODEL 3D ENVIRONMENTS
, 2015
"... The ability to correctly reason about human environment is critical for personal robots. For example, if a robot is asked to tidy a room, it needs to detect object types, such as shoes and books, and then decides where to place them properly. Sometimes being able to anticipate human-environment inte ..."
Abstract
- Add to MetaCart
(Show Context)
The ability to correctly reason about human environments is critical for personal robots. For example, if a robot is asked to tidy a room, it needs to detect object types, such as shoes and books, and then decide where to place them properly. Sometimes being able to anticipate human-environment interactions is also desirable. For example, the robot would not put any object on the chair if it understands that humans would sit on it. The idea of modeling object-object relations has been widely leveraged in many scene understanding applications. For instance, the object found in front of a monitor is more likely to be a keyboard because of the high correlation of the two objects. However, as the objects are designed by humans and for human usage, when we reason about a human environment, we reason about it through an interplay between the environment, objects and humans. For example, the objects, monitor and keyboard, are strongly spatially correlated only because a human types on the keyboard while watching the monitor. The key idea of this thesis is to model environments not only through objects, but also through ...
"Important Stuff, Everywhere!" Activity Recognition with Salient Proto-Objects as Context
"... Abstract Object information is an important cue to discriminate between activities that draw part of their meaning from context. Most of current work either ignores this information or relies on specific object detectors. However, such object detectors require a significant amount of training data ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract Object information is an important cue to discriminate between activities that draw part of their meaning from context. Most current work either ignores this information or relies on specific object detectors. However, such object detectors require a significant amount of training data and complicate the transfer of the action recognition framework to novel domains with different objects and object-action relationships. Motivated by recent advances in saliency detection, we propose to use proto-objects to detect object candidate regions in videos without any need of prior knowledge. Our experimental evaluation on three publicly available data sets shows that the integration of proto-objects and simple motion features substantially improves recognition performance, outperforming the state-of-the-art.
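One widely used way to obtain proto-object candidate regions without any object detector is spectral-residual saliency (Hou and Zhang, 2007). The sketch below implements that baseline, which may differ from the exact saliency method the paper builds on.

import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(gray):
    """Spectral-residual saliency: highlight proto-object regions.

    'gray' is a 2-D float image; returns a saliency map of the
    same shape, peaking on compact, distinctive regions.
    """
    f = np.fft.fft2(gray)
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    residual = log_amp - uniform_filter(log_amp, size=3)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return gaussian_filter(sal, sigma=2.5)

img = np.zeros((64, 64)); img[20:30, 40:50] = 1.0   # a toy "object"
sal = spectral_residual_saliency(img)
print("peak saliency at:", np.unravel_index(sal.argmax(), sal.shape))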
Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models
"... Advanced Driver Assistance Systems (ADAS) have made driving safer over the last decade. They prepare vehicles for unsafe road conditions and alert drivers if they per-form a dangerous maneuver. However, many accidents are unavoidable because by the time drivers are alerted, it is already too late. A ..."
Abstract
- Add to MetaCart
(Show Context)
Advanced Driver Assistance Systems (ADAS) have made driving safer over the last decade. They prepare vehicles for unsafe road conditions and alert drivers if they perform a dangerous maneuver. However, many accidents are unavoidable because by the time drivers are alerted, it is already too late. Anticipating maneuvers beforehand can alert drivers before they perform the maneuver and also give ADAS more time to avoid or prepare for the danger. In this work we anticipate driving maneuvers a few seconds before they occur. For this purpose we equip a car with cameras and a computing device to capture the driving context from both inside and outside of the car. We propose an Autoregressive Input-Output HMM to model the contextual information along with the maneuvers. We evaluate our approach on a diverse data set with 1180 miles of natural freeway and city driving and show that we can anticipate maneuvers 3.5 seconds before they occur with over 80% F1-score in real-time.
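A simplified stand-in for the anticipation machinery: an ordinary discrete HMM forward filter that tracks a belief over maneuver states and fires an alert once the belief crosses a threshold. The states, matrices, and threshold are invented; the paper's AIO-HMM additionally conditions on continuous inside/outside driving context and autoregressive outputs.

import numpy as np

# Toy discrete HMM filter over maneuver states:
# 0 = drive straight, 1 = left turn, 2 = right turn.
A = np.array([[0.90, 0.05, 0.05],     # state transition matrix
              [0.10, 0.85, 0.05],
              [0.10, 0.05, 0.85]])
B = np.array([[0.70, 0.15, 0.15],     # P(observation | state);
              [0.20, 0.70, 0.10],     # observations: discretized
              [0.20, 0.10, 0.70]])    # head-pose / context cues
belief = np.array([1.0, 0.0, 0.0])    # start in 'drive straight'

observations = [0, 1, 1, 1]           # driver glances left repeatedly
for obs in observations:
    belief = B[:, obs] * (A.T @ belief)   # predict, then update
    belief /= belief.sum()
    if belief[1] > 0.8:                   # anticipate before maneuver
        print("anticipated: left turn, confidence", round(belief[1], 2))
print("final belief:", np.round(belief, 3))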