Results 1 - 10 of 12
Robust face landmark estimation under occlusion
Abstract - Cited by 33 (3 self)
Human faces captured in real-world conditions present large variations in shape and occlusions due to differences in pose, expression, use of accessories such as sunglasses and hats, and interactions with objects (e.g. food). Current face landmark estimation approaches struggle under such conditions since they fail to provide a principled way of handling outliers. We propose a novel method, called Robust Cascaded Pose Regression (RCPR), which reduces exposure to outliers by detecting occlusions explicitly and using robust shape-indexed features. We show that RCPR improves on previous landmark estimation methods on three popular face datasets (LFPW, LFW and HELEN). We further explore RCPR’s performance by introducing a novel face dataset focused on occlusion, composed of 1,007 faces presenting a wide range of occlusion patterns. RCPR reduces failure cases by half on all four datasets, while at the same time detecting face occlusions with an 80/40% precision/recall.
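The abstract above describes a cascade of regressors that repeatedly refine a landmark estimate using features sampled at the current landmark positions. A minimal toy sketch of that cascaded update, not the authors' RCPR code: the linear per-stage "regressor" and the raw-pixel features are illustrative assumptions, and occlusion handling is omitted.

```python
import numpy as np

def shape_indexed_features(image, landmarks):
    """Sample pixel intensities at the current landmark estimates
    (a stand-in for RCPR's robust shape-indexed features)."""
    h, w = image.shape
    xs = np.clip(landmarks[:, 0].astype(int), 0, w - 1)
    ys = np.clip(landmarks[:, 1].astype(int), 0, h - 1)
    return image[ys, xs]

def run_cascade(image, init_landmarks, stages):
    """Each stage maps the current features to an additive landmark update."""
    shape = init_landmarks.copy()
    for W in stages:  # W: (n_landmarks * 2, n_landmarks) weight matrix
        f = shape_indexed_features(image, shape)
        shape = shape + (W @ f).reshape(-1, 2)
    return shape
```

Because features are re-indexed by the evolving shape, each stage sees progressively better-aligned evidence, which is the core idea cascaded pose regression relies on.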
Finding Group Interactions in Social Clutter
Abstract - Cited by 1 (0 self)
We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space; the extent of their interaction is localized in time; and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pairwise interactions; and it includes efficient algorithms for optimally distinguishing participants from bystanders in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.
Understanding Classifier Errors by Examining Influential Neighbors
Abstract
Modern supervised learning algorithms can learn very accurate and complex discriminating functions. But when these classifiers fail, this complexity can also be a drawback because there is no easy, intuitive way to diagnose why they are failing and remedy the problem. This important question has received little attention. To address this problem, we propose a novel method to analyze and understand a classifier’s errors. Our method centers around a measure of how much influence a training example has on the classifier’s prediction for a test example. To understand why a classifier is mispredicting the label of a given test example, the user can find and review the most influential training examples that caused this misprediction, allowing them to focus their attention on relevant areas of the data space. This will aid the user in determining if and how the training data is inconsistently labeled or lacking in diversity, or if the feature representation is insufficient. As computing the influence of each training example is computationally impractical, we propose a novel distance metric to approximate influence for boosting classifiers that is fast enough to be used interactively. We also show several novel use paradigms of our distance metric. Through experiments, we show that it can be used to find incorrectly or inconsistently labeled training examples, to find specific areas of the data space that need more training data, and to gain insight into which features are missing from the current representation.
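One way to approximate "influence" for a boosting classifier, in the spirit of the abstract above, is to compare examples in the space of weak-learner outputs: training points that agree with the test point on most weak learners plausibly drove its prediction. This is a hypothetical sketch, not the paper's actual metric; the stump representation and Hamming distance are illustrative assumptions.

```python
import numpy as np

def weak_learner_codes(X, stumps):
    """Binary code per example: the output of each decision stump,
    given as (feature_index, threshold) pairs."""
    return np.array([[x[f] > t for f, t in stumps] for x in X], dtype=float)

def most_influential(train_X, test_x, stumps, k=3):
    """Rank training examples by agreement with the test point
    across all weak learners (Hamming distance on stump codes)."""
    codes = weak_learner_codes(train_X, stumps)
    test_code = weak_learner_codes(test_x[None, :], stumps)[0]
    dists = np.abs(codes - test_code).sum(axis=1)
    return np.argsort(dists)[:k]
```

Reviewing the returned indices against their labels is the intended diagnostic loop: mislabeled or atypical neighbors suggest where the training data, rather than the model, is at fault.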
Real-time Crowd Tracking using Parameter Optimized Mixture of Motion Models
, 2014
Abstract
We present a novel, real-time algorithm to track the trajectory of each pedestrian in moderately dense crowded scenes. Our formulation is based on an adaptive particle-filtering scheme that uses a combination of various multi-agent heterogeneous pedestrian simulation models. We automatically compute the optimal parameters for each of these different models based on prior tracked data and use the best model as a motion prior for our particle-filter based tracking algorithm. We also use our “mixture of motion models” for adaptive particle selection and to accelerate the performance of the online tracking algorithm. The motion model parameter estimation is formulated as an optimization problem, and we use an approach that solves this combinatorial optimization problem in a model-independent manner and is hence scalable to any multi-agent pedestrian motion model. We evaluate the performance of our approach on different crowd video datasets and highlight the improvement in accuracy over homogeneous motion models and a baseline mean-shift based tracker. In practice, our formulation can compute trajectories of tens of pedestrians on a multi-core desktop CPU in real time and offers higher accuracy compared to prior real-time pedestrian tracking algorithms.
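The predict/update loop of a particle filter with a motion-model prior, as described above, can be sketched in a few lines. This is a generic toy version, not the paper's system: the two "pedestrian" motion models, the Gaussian observation likelihood, and all constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, motion_model):
    """Propagate particles with the chosen motion model plus process noise."""
    return motion_model(particles) + rng.normal(scale=0.1, size=particles.shape)

def update(particles, obs):
    """Weight particles by proximity to the observed position, then resample."""
    w = np.exp(-np.linalg.norm(particles - obs, axis=1) ** 2)
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# Two toy motion models; a real mixture would pick the best model per agent
# based on prior tracked data, as the abstract describes.
constant_velocity = lambda p: p + np.array([1.0, 0.0])
stand_still = lambda p: p

particles = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
for obs in [np.array([1.0, 0.0]), np.array([2.0, 0.0])]:
    particles = predict(particles, constant_velocity)
    particles = update(particles, obs)
```

The "mixture" aspect would replace the fixed `constant_velocity` choice with a per-pedestrian selection among candidate models scored against recent tracks.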
Burgos-Artizzu et al.: Merging Pose Estimates Across Space and Time
Abstract
Numerous ‘non-maximum suppression’ (NMS) post-processing schemes have been proposed for merging multiple independent object detections. We propose a generalization of NMS beyond bounding boxes to merge multiple pose estimates in a single frame. The final estimates are centroids rather than medoids as in standard NMS, thus being more accurate than any of the individual candidates. Using the same mathematical framework, we extend our approach to the multi-frame setting, merging multiple independent pose estimates across space and time and outputting both the number and pose of the objects present in a scene. Our approach sidesteps many of the inherent challenges associated with full tracking (e.g. objects entering/leaving a scene, extended periods of occlusion, etc.). We show its versatility by applying it to two distinct state-of-the-art pose estimation algorithms in three domains: human bodies, faces and mice. Our approach improves both detection accuracy (by helping disambiguate correspondences) and pose estimation quality, and is computationally efficient.
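The centroid-vs-medoid distinction above can be made concrete with a small sketch: group nearby pose candidates greedily as in standard NMS, but output each group's score-weighted centroid instead of keeping only the highest-scoring member. This is an illustrative single-frame toy under assumed Euclidean grouping, not the paper's multi-frame framework.

```python
import numpy as np

def merge_pose_estimates(candidates, radius=20.0):
    """Greedy NMS-style grouping of pose candidates, but each group is
    summarized by its score-weighted centroid rather than a single medoid.
    candidates: list of (pose, score), pose shaped (n_parts, 2)."""
    candidates = sorted(candidates, key=lambda c: -c[1])
    merged = []
    while candidates:
        pose, _ = candidates[0]
        group = [c for c in candidates
                 if np.linalg.norm(c[0] - pose) < radius]
        weights = np.array([s for _, s in group])
        poses = np.stack([p for p, _ in group])
        centroid = (weights[:, None, None] * poses).sum(0) / weights.sum()
        merged.append(centroid)
        candidates = [c for c in candidates
                      if np.linalg.norm(c[0] - pose) >= radius]
    return merged
```

Averaging within a group lets independent, noisy candidates cancel each other's errors, which is why the merged centroid can beat every individual estimate.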
HHMI Janelia Farm
Abstract
Automatically classifying the behavior of humans and animals from video is one of the most interesting and challenging fields of computer vision [3, 1, 6]. Most of the successful human behavior recognition works use, as features for classification, information extracted from a direct representation of the scene (visual ...
Mouse Behavior Recognition with The Wisdom of Crowd
, 2013
Abstract
In this thesis, we designed and implemented a crowdsourcing system to annotate mouse behaviors in videos; this involves the development of a novel clip-based video labeling tool, which is more efficient than traditional labeling tools on crowdsourcing platforms, as well as the design of probabilistic inference algorithms that predict the true labels and the workers’ expertise from multiple workers’ responses. Our algorithms are shown to perform better than the majority-vote heuristic. We also carried out extensive experiments to determine the effectiveness of our labeling tool, inference algorithms and the overall system.
Matching Mixtures of Trajectories for Human Action Recognition
Abstract
A learning-based framework for action representation and recognition relying on the description of an action by time series of optical flow motion features is presented. In the learning step, the motion curves representing each action are clustered using Gaussian mixture modeling (GMM). In the recognition step, the optical flow curves of a probe sequence are also clustered using a GMM; then each probe sequence is projected onto the training space and the probe curves are matched to the learned curves using a non-metric similarity function based on the longest common subsequence, which is robust to noise and provides an intuitive notion of similarity between trajectories. Also, canonical time warping is utilized to find an alignment between the mean trajectories. Finally, the probe sequence is categorized to the learned action with the maximum similarity using a nearest neighbor classification scheme. We also present a variant of the method where the lengths of the time series are reduced by dimensionality reduction in both training and test phases, in order to smooth out the outliers, which are common in these types of sequences. Experimental results on the Weizmann, KTH, UCF Sports and UCF YouTube action databases demonstrate the effectiveness of the proposed method.
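The longest-common-subsequence similarity mentioned above has a standard dynamic-programming form: two trajectory points "match" when they lie within a tolerance of each other, which is what makes the measure robust to outlier points. A minimal sketch, with the matching threshold `eps` an assumed parameter (the paper's exact variant may differ):

```python
import numpy as np

def lcss(a, b, eps=1.0):
    """Longest common subsequence length for 2-D trajectories:
    points match when within eps of each other (non-metric, noise-robust)."""
    n, m = len(a), len(b)
    dp = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if np.linalg.norm(a[i - 1] - b[j - 1]) < eps:
                dp[i, j] = dp[i - 1, j - 1] + 1
            else:
                dp[i, j] = max(dp[i - 1, j], dp[i, j - 1])
    return dp[n, m]

def similarity(a, b, eps=1.0):
    """Normalized LCSS similarity in [0, 1]."""
    return lcss(a, b, eps) / min(len(a), len(b))
```

Unlike Euclidean distance, a single wild outlier point only skips one match instead of dominating the score, which is the property the abstract highlights.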
A DEPTH-MAP APPROACH FOR AUTOMATIC MICE BEHAVIOR RECOGNITION
Abstract
Animal behavior assessment plays an important role in basic and clinical neuroscience. Although assessing the higher functional level of the nervous system is already possible, behavioral tests are extremely complex to design and analyze. Animals’ responses are often evaluated manually, making the process subjective, extremely time consuming, poorly reproducible and potentially fallible. The main goal of the present work is to evaluate the use of consumer depth cameras, such as Microsoft’s Kinect, for detection of behavioral patterns of mice. The hypothesis is that depth information should enable a more feasible and robust method for automatic behavior recognition. Thus, we introduce our depth-map based approach, comprising mouse segmentation, body-like per-frame feature extraction and per-frame classification given temporal context, to prove the usability of this methodology.
Detecting Social Actions of Fruit Flies
Abstract
We describe a system that tracks pairs of fruit flies and automatically detects and classifies their actions. We compare experimentally the value of a frame-level feature representation with the more elaborate notion of ‘bout features’ that capture the structure within actions. Similarly, we compare a simple sliding window classifier architecture with a more sophisticated structured output architecture, and find that window-based detectors outperform the much slower structured counterparts and approach human performance. In addition, we test our top-performing detector on the CRIM13 mouse dataset, finding that it matches the performance of the best published method. Our Fly-vs-Fly dataset contains 22 hours of video showing pairs of fruit flies engaging in 10 social interactions in three different contexts; it is fully annotated by experts, and published with articulated pose trajectory features.
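The sliding-window detection scheme contrasted above is simple enough to sketch: score every fixed-length window of per-frame classifier outputs and report the windows that clear a threshold as action bouts. This is a generic illustration under assumed mean-score aggregation, not the paper's detector.

```python
import numpy as np

def sliding_window_detect(frame_scores, win_len, thresh):
    """Score each window of win_len frames by its mean per-frame score;
    return (start, end) index pairs for windows above thresh."""
    bouts = []
    for start in range(len(frame_scores) - win_len + 1):
        if np.mean(frame_scores[start:start + win_len]) > thresh:
            bouts.append((start, start + win_len))
    return bouts
```

Overlapping above-threshold windows would normally be merged by non-maximum suppression before reporting final bouts; that step is omitted here for brevity.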