Real-time articulated hand pose estimation using semi-supervised transductive regression forests

by Danhang Tang, Tsz-ho Yu, Tae-kyun Kim
Venue: Proc. ICCV (2013)

Results 1 - 10 of 26

Realtime and Robust Hand Tracking from Depth

by Chen Qian
Abstract - Cited by 15 (1 self)
We present a realtime hand tracking system using a depth sensor. It tracks a fully articulated hand under large viewpoints in realtime (25 FPS on a desktop without using a GPU) and with high accuracy (error below 10 mm). To our knowledge, it is the first system that achieves such robustness, accuracy, and speed simultaneously, as verified on challenging real data. Our system is made of several novel techniques. We model a hand simply using a number of spheres and define a fast cost function. Those are critical for realtime performance. We propose a hybrid method that combines gradient based and stochastic optimization methods to achieve fast convergence and good accuracy. We present new finger detection and hand initialization methods that greatly enhance the robustness of tracking.

Citation Context

...h and local optimization, but rely on inconvenient setup (a color glove in [21] and multiple cameras in [25]). Other realtime and robust systems are limited in recognizing discrete hand gestures only [31, 5, 6, 29] without optimization, supporting a small number of DOFs [22], or under a fixed viewpoint [12]. Those limitations above are due to difficult tradeoffs between the system complexity and targeted goals....

Multi-scale deep learning for gesture detection and localization

by Natalia Neverova
Abstract - Cited by 6 (1 self)

In-air Gestures Around Unmodified Mobile Devices

by Jie Song, Fabrizio Pece, Sean Fanello, Shahram Izadi, Cem Keskin, Otmar Hilliges
Abstract - Cited by 6 (0 self)
Figure 1: Touch input is expressive but can occlude large parts of the screen (A). We propose a machine learning based algorithm for gesture recognition expanding the interaction space around the mobile device (B), adding in-air gestures and hand-part tracking (D) to commodity off-the-shelf mobile devices, relying only on the device’s camera (and no hardware modifications). We demonstrate a number of compelling interactive scenarios including bi-manual input to mapping and gaming applications (C+D). The algorithm runs in real time and can even be used on ultra-mobile devices such as smartwatches (E).
We present a novel machine learning based algorithm extending the interaction space around mobile devices. The technique uses only the RGB camera now commonplace on off-the-shelf mobile devices. Our algorithm robustly recognizes a wide range of in-air gestures, supporting user variation, and varying lighting conditions. We demonstrate that our algorithm runs in real-time on unmodified mobile devices, including resource-constrained smartphones and smartwatches. Our goal is not to replace the touchscreen as primary input device, but rather to augment and enrich the existing interaction vocabulary using gestures. While touch input works well for many scenarios, we demonstrate numerous interaction tasks such as mode switches, application and task management, menu selection and certain types of navigation, where such input can be either complemented or better served by in-air gestures. This removes screen real-estate issues on small touchscreens, and allows input to be expanded to the 3D space around the device. We present results for recognition accuracy (93% test and 98% train), impact of memory footprint and other model parameters. Finally, we report results from preliminary user evaluations, discuss advantages and limitations and conclude with directions for future work.
Author Keywords: HCI; mobile interaction; mobile gestures; gesture recognition; mobile computing; random forests

Citation Context

...n real-time [19, 30, 35]. The current state of the art can be broken down into methods relying on model-fitting and temporal tracking [30, 35], and those leveraging per-pixel hand part classification [19, 38]. Our algorithm is designed for the recognition of rich and varied gestures and detection of salient hand parts (i.e., fingertips) rather than full hand pose estimation. It is important to note though...

Latent-Class Hough Forests for 3D Object Detection and Pose Estimation

by Alykhan Tejani, Danhang Tang, Rigas Kouskouridas, Tae-kyun Kim
Abstract - Cited by 4 (0 self)
In this paper we propose a novel framework, Latent-Class Hough Forests, for 3D object detection and pose estimation in heavily cluttered and occluded scenes. Firstly, we adapt the state-of-the-art template matching feature, LINEMOD [14], into a scale-invariant patch descriptor and integrate it into a regression forest using a novel template-based split function. In training, rather than explicitly collecting representative negative samples, our method is trained on positive samples only and we treat the class distributions at the leaf nodes as latent variables. During the inference process we iteratively update these distributions, providing accurate estimation of background clutter and foreground occlusions and thus a better detection rate. Furthermore, as a by-product, the latent class distributions can provide accurate occlusion-aware segmentation masks, even in the multi-instance scenario. In addition to an existing public dataset, which contains only single-instance sequences with large amounts of clutter, we have collected a new, more challenging, dataset for multiple-instance detection containing heavy 2D and 3D clutter as well as foreground occlusions. We evaluate the Latent-Class Hough Forest on both of these datasets where we outperform state-of-the-art methods.

Citation Context

...t the synthetic training images have null space in the background whereas the testing patches will not. Thus, doing a naive holistic patch comparison, or the two-dimension/two-pixel tests (as used in [29, 8, 33]) can lead to test patches taking the incorrect route at split functions. iii) LINEMOD [14], in its current form, is not a scale-invariant descriptor; this gives rise to further issues, such as should...

Real-time Hand Tracking Using a Sum of Anisotropic Gaussians Model

by Srinath Sridhar, Helge Rhodin, Hans-peter Seidel, Antti Oulasvirta, Christian Theobalt
Abstract - Cited by 3 (0 self)
Real-time marker-less hand tracking is of increasing importance in human-computer interaction. Robust and accurate tracking of arbitrary hand motion is a challenging problem due to the many degrees of freedom, frequent self-occlusions, fast motions, and uniform skin color. In this paper, we propose a new approach that tracks the full skeleton motion of the hand from multiple RGB cameras in real-time. The main contributions include a new generative tracking method which employs an implicit hand shape representation based on Sum of Anisotropic Gaussians (SAG), and a pose fitting energy that is smooth and analytically differentiable making fast gradient based pose optimization possible. This shape representation, together with a full perspective projection model, enables more accurate hand modeling than a related baseline method from literature. Our method achieves better accuracy than previous methods and runs at 25 fps. We show these improvements both qualitatively and quantitatively on publicly available datasets.

Body Parts Dependent Joint Regressors for Human Pose Estimation in Still Images

by Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van Gool
Abstract - Cited by 3 (1 self)
In this work, we address the problem of estimating 2D human pose from still images. Articulated body pose estimation is challenging due to the large variation in body poses and appearances of the different body parts. Recent methods that rely on the pictorial structure framework have been shown to be very successful in solving this task. They model the body part appearances using discriminatively trained, independent part templates and the spatial relations of the body parts using a tree model. Within such a framework, we address the problem of obtaining better part templates which are able to handle a very high variation in appearance. To this end, we introduce parts dependent body joint regressors which are random forests that operate over two layers. While the first layer acts as an independent body part classifier, the second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This helps to overcome typical ambiguities of tree structures, such as self-similarities of legs and arms. In addition, we introduce a novel dataset termed FashionPose that contains over 7,000 images with a challenging variation of body part appearances due to a large variation of dressing styles. In the experiments, we demonstrate that the proposed parts dependent joint regressors outperform independent classifiers or regressors. The method also performs better than or similar to the state of the art in terms of accuracy, while running at a couple of frames per second.
Index Terms: Human pose estimation, fashion, random forest, regression, classification

Citation Context

...applications including real-time face analysis from depth data [52] and 2d images [53], model fitting [54], multi-object segmentation [55], object detection [56], and articulated hand pose estimation [57]. 3 PICTORIAL STRUCTURE As a human body model, we use a classical pictorial structure framework [4]. However, instead of using a limb representation for the body configuration, we use a joint represen...

Cascaded hand pose regression

by Xiao Sun, Yichen Wei, Shuang Liang, Xiaoou Tang, Jian Sun - In CVPR
Abstract - Cited by 2 (0 self)
We extend the previous 2D cascaded object pose regression work [9] in two aspects so that it works better for 3D articulated objects. Our first contribution is 3D pose-indexed features that generalize the previous 2D parameterized features and achieve better invariance to 3D transformations. Our second contribution is a principled hierarchical regression that is adapted to the articulated object structure. It is therefore more accurate and faster. Comprehensive experiments verify the state-of-the-art accuracy and efficiency of the proposed approach on the challenging 3D hand pose estimation problem, on a public dataset and our new dataset.

Citation Context

...e better aligned in their local coordinate frame than the global palm coordinate frame. 3.2. 3D Pose Indexed Features Similar to previous depth based learning methods for human body [18, 29] and hand [6, 39, 10, 34, 22], we use the pixel difference features, i.e., a feature is the difference of two random pixels’ depth value, I(u1) − I(u2). The key to achieve certain geometric invariance is how to parameterize ui(i =...
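
The pixel difference feature quoted in this citation context, I(u1) − I(u2), can be illustrated with a minimal sketch. The function name `pixel_difference`, the row-major list-of-rows storage, and the toy depth values are assumptions made for illustration; the excerpt does not say how the paper stores images or parameterizes the pixel positions.

```python
def pixel_difference(depth, u1, u2):
    """Return I(u1) - I(u2) for a depth image stored as a list of rows.

    u1 and u2 are (row, col) pixel coordinates. The symbols follow the
    citation context; the storage layout is an assumption of this sketch.
    """
    (r1, c1), (r2, c2) = u1, u2
    return depth[r1][c1] - depth[r2][c2]


# Tiny hypothetical depth image in millimetres (values are made up).
depth = [
    [500, 510, 900],
    [505, 520, 900],
    [900, 900, 900],
]

# Difference between two pixels: 500 - 520.
feature = pixel_difference(depth, (0, 0), (1, 1))
```

In this sketch the two pixel positions are passed in directly. The excerpt indicates that the choice of how to parameterize those positions is what yields geometric invariance, and that part is not reproduced here.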

Depth-based hand pose estimation: data, methods, and challenges

by James Steven, Grégory Rogez, Yi Yang, Jamie Shotton, Deva Ramanan
Abstract - Cited by 1 (0 self)
Hand pose estimation has matured rapidly in recent years. The introduction of commodity depth sensors and a multitude of practical applications have spurred new advances. We provide an extensive analysis of the state of the art, focusing on hand pose estimation from a single depth frame. To do so, we have implemented a considerable number of systems, and will release all software and evaluation code. We summarize important conclusions here: (1) Pose estimation appears roughly solved for scenes with isolated hands. However, methods still struggle to analyze cluttered scenes where hands may be interacting with nearby objects and surfaces. To spur further progress we introduce a challenging new dataset with diverse, cluttered scenes. (2) Many methods evaluate themselves with disparate criteria, making comparisons difficult. We define consistent evaluation criteria, rigorously motivated by human experiments. (3) We introduce a simple nearest-neighbor baseline that outperforms most existing systems. This implies that most systems do not generalize beyond their training sets. This also reinforces the under-appreciated point that training data is as important as the model itself. We conclude with directions for future progress.

Citation Context

... practical applications, for example sign language recognition [14], visual interfaces [17], and driver analysis [20]. Recently introduced consumer depth cameras have spurred a flurry of new advances [4, 14, 15, 17, 25, 33, 36, 38, 42]. Motivation: Recent methods have demonstrated impressive results. But differing (often in-house) testsets, varying performance criteria, and annotation errors impede reliable comparisons [19]. In the...

Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose

by Danhang Tang, Jonathan Taylor, Pushmeet Kohli, Cem Keskin, Tae-kyun Kim, Jamie Shotton
Abstract - Cited by 1 (1 self)
We address the problem of hand pose estimation, formulated as an inverse problem. Typical approaches optimize an energy function over pose parameters using a ‘black box’ image generation procedure. This procedure knows little about either the relationships between the parameters or the form of the energy function. In this paper, we show that we can significantly improve upon black box optimization by exploiting high-level knowledge of the parameter structure and using a local surrogate energy function. Our new framework, hierarchical sampling optimization, consists of a sequence of predictors organized into a kinematic hierarchy. Each predictor is conditioned on its ancestors, and generates a set of samples over a subset of the pose parameters. The highly-efficient surrogate energy is used to select among samples. Having evaluated the full hierarchy, the partial pose samples are concatenated to generate a full-pose hypothesis. Several hypotheses are generated using the same procedure, and finally the original full energy function selects the best result. Experimental evaluation on three publicly available datasets shows that our method is particularly impressive in low-compute scenarios where it significantly outperforms all other state-of-the-art methods.

Structured Semi-supervised Forest for Facial Landmarks Localization with Face Mask Reasoning

by Xuhui Jia, Heng Yang, Angran Lin, Kwok-ping Chan, Ioannis Patras
Abstract - Cited by 1 (0 self)
Despite the great success of recent facial landmarks localization approaches, the presence of occlusions significantly degrades the performance of the systems. However, very few works have addressed this problem explicitly due to the high diversity of occlusion in the real world. In this paper, we address the face mask reasoning and facial landmarks localization in a unified Structured Decision Forests framework. We first assign a portion of the face dataset with face masks, i.e., for each face image we give each pixel a label to indicate whether it belongs to the face or not. Then we incorporate such additional information of dense pixel labelling into training the Structured Classification-Regression Decision Forest. The classification nodes aim at decreasing the variance of the pixel labels of the patches by using our proposed structured criterion while the regression nodes aim at decreasing the variance of the displacements between the patches and the facial landmarks. The proposed framework allows us to predict the face mask and facial landmarks locations jointly. We test the model on face images from several datasets with significant occlusion. The proposed method 1) yields promising results in face mask reasoning; 2) improves the existing Decision Forests approaches in facial landmark localization, aided by the face mask reasoning.

Citation Context

...tor and M is the face mask). Thus, we have two objectives: first, localization of the landmarks and second, the structured labels of different classes (face or non-face). Similar to the hybrid forests [32], we use two separate types of split nodes that optimize different objective functions. The first type of node is for regression and the second type is for classification. Specifically, for a given no...
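
The abstract above says regression nodes aim to decrease the variance of the displacements between patches and facial landmarks. A generic variance-reduction score for one candidate split might be sketched as follows. This is a textbook regression-forest criterion offered only as an illustration, not the paper's structured criterion; the function names and the displacement values are hypothetical.

```python
def total_variance(vectors):
    """Sum of per-dimension variances of a list of equal-length vectors."""
    n = len(vectors)
    if n == 0:
        return 0.0
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    return sum(
        sum((v[d] - means[d]) ** 2 for v in vectors) / n for d in range(dims)
    )


def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two children."""
    n = len(parent)
    weighted = (
        len(left) * total_variance(left) + len(right) * total_variance(right)
    ) / n
    return total_variance(parent) - weighted


# Hypothetical patch-to-landmark displacements (dx, dy) in pixels.
displacements = [(-10.0, 0.0), (-12.0, 0.0), (10.0, 0.0), (12.0, 0.0)]

# A split that separates left-pointing from right-pointing displacements.
good_gain = variance_reduction(
    displacements, displacements[:2], displacements[2:]
)

# A split that mixes both directions in each child.
poor_gain = variance_reduction(
    displacements,
    [displacements[0], displacements[2]],
    [displacements[1], displacements[3]],
)
```

Under this score, a split node would prefer the first partition, since it groups patches whose displacements agree. How the paper combines such a regression objective with its classification nodes is not specified in the excerpt.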
