Supervised Descent Method and its Applications to Face Alignment
Cited by 25 (1 self)
Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that 2nd order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, 2nd order descent methods have two main drawbacks: (1) The function might not be analytically differentiable and numerical approximations are impractical. (2) The Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function. During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian or the Hessian. We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code is available at www.humansensing.cs.cmu.edu/intraface.
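The train/test split described in this abstract can be illustrated with a one-dimensional toy problem. The sketch below is not the authors' implementation: it assumes a scalar unknown, a hypothetical function `cube` standing in for the nonlinear function, and a single learned scalar step per stage in place of the learned descent directions. All function and variable names are illustrative.

```python
def train_sdm(h, x_true, x0, n_stages=5):
    """Learn one scalar step per stage by least squares (1-D stand-in for SDM)."""
    targets = [h(x) for x in x_true]   # y_i = h(x*_i), the observation for sample i
    xs = [x0] * len(x_true)            # all samples start from the same initial guess
    steps = []
    for _ in range(n_stages):
        residuals = [y - h(x) for x, y in zip(xs, targets)]
        deltas = [xt - x for xt, x in zip(x_true, xs)]
        denom = sum(r * r for r in residuals)
        # Closed-form minimizer of sum_i (delta_i - r_k * residual_i)^2 over r_k.
        r_k = sum(d * r for d, r in zip(deltas, residuals)) / denom if denom > 0 else 0.0
        steps.append(r_k)
        xs = [x + r_k * r for x, r in zip(xs, residuals)]
    return steps


def apply_sdm(h, y, x0, steps):
    """Run the learned updates; no Jacobian or Hessian of h is evaluated."""
    x = x0
    for r_k in steps:
        x = x + r_k * (y - h(x))
    return x


def cube(x):
    """Hypothetical nonlinear function used only for this illustration."""
    return x ** 3


train_points = [1.0 + 0.1 * i for i in range(11)]   # assumed optima in [1, 2]
steps = train_sdm(cube, train_points, x0=1.5)
estimate = apply_sdm(cube, cube(1.8), x0=1.5, steps=steps)
```

Because each stage's step is the least-squares optimum over the training samples, the summed squared error on those samples cannot increase from one stage to the next; how well the learned steps transfer to unseen points depends on the function and the sampling, which this toy does not address.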
3D Shape Regression for Real-time Facial Animation
Cited by 23 (3 self)
We present a real-time performance-driven facial animation system based on 3D shape regression. In this system, the 3D positions of facial landmark points are inferred by a regressor from 2D video frames of an ordinary web camera. From these 3D points, the pose and expressions of the face are recovered by fitting a user-specific blendshape model to them. The main technical contribution of this work is the 3D regression algorithm that learns an accurate, user-specific face alignment model from an easily acquired set of training data, generated from images of the user performing a sequence of predefined facial poses and expressions. Experiments show that our system can accurately recover 3D face shapes even for fast motions, non-frontal faces, and exaggerated expressions. In addition, some capacity to handle partial occlusions and changing lighting conditions is demonstrated.
Sieving Regression Forest Votes for Facial Feature Detection in the Wild
Cited by 14 (3 self)
In this paper we propose a method for the localization of multiple facial features on challenging face images. In the regression forests (RF) framework, observations (patches) that are extracted at several image locations cast votes for the localization of several facial features. In order to filter out votes that are not relevant, we pass them through two types of sieves, that are organised in a cascade, and which enforce geometric constraints. The first sieve filters out votes that are not consistent with a hypothesis for the location of the face center. Several sieves of the second type, one associated with each individual facial point, filter out distant votes. We propose a method that adjusts on-the-fly the proximity threshold of each second type sieve by applying a classifier which, based on middle-level features extracted from voting maps for the facial feature in question, makes a sequence of decisions on whether the threshold should be reduced or not. We validate our proposed method on two challenging datasets with images collected from the Internet in which we obtain state of the art results without resorting to explicit facial shape models. We also show the benefits of our method for proximity threshold adjustment especially on 'difficult' face images.
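The two-stage sieve cascade described in this abstract can be sketched roughly as follows. This is a simplified illustration under assumptions, not the paper's method: the vote layout (`patch`, `center_vote`, `point_vote`), the fixed thresholds, and the mean-based final estimate are all hypothetical, and the classifier that adapts the proximity threshold is omitted.

```python
from math import hypot


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return hypot(p[0] - q[0], p[1] - q[1])


def sieve_votes(votes, face_center, center_tol, proximity):
    """Keep votes that pass both sieves (assumed, simplified forms)."""
    kept = []
    for v in votes:
        # Sieve 1: the vote must agree with the face-center hypothesis.
        if dist(v["center_vote"], face_center) > center_tol:
            continue
        # Sieve 2: the voting patch must lie close to the point it votes for.
        if dist(v["patch"], v["point_vote"]) > proximity:
            continue
        kept.append(v)
    return kept


def localize(votes):
    """Average the surviving votes (a stand-in for mode finding on a voting map)."""
    if not votes:
        return None
    n = len(votes)
    return (sum(v["point_vote"][0] for v in votes) / n,
            sum(v["point_vote"][1] for v in votes) / n)


# Hypothetical votes for a single facial point.
votes = [
    {"patch": (8, 9), "center_vote": (50, 50), "point_vote": (10, 10)},
    {"patch": (14, 10), "center_vote": (51, 49), "point_vote": (12, 10)},
    {"patch": (11, 11), "center_vote": (90, 20), "point_vote": (30, 30)},  # inconsistent center
    {"patch": (80, 80), "center_vote": (50, 50), "point_vote": (11, 40)},  # distant patch
]
kept = sieve_votes(votes, face_center=(50, 50), center_tol=5, proximity=20)
estimate = localize(kept)  # (11.0, 10.0) for the votes above
```

In this toy the third vote is rejected by the first sieve and the fourth by the second, so only the two nearby, center-consistent votes contribute to the estimate.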
Exemplar-based Graph Matching for Robust Facial Landmark Localization
Cited by 13 (2 self)
Localizing facial landmarks is a fundamental step in facial image analysis. However, the problem is still challenging due to the large variability in pose and appearance, and the existence of occlusions in real-world face images. In this paper, we present exemplar-based graph matching (EGM), a robust framework for facial landmark localization. Compared to conventional algorithms, EGM has three advantages: (1) an affine-invariant shape constraint is learned online from similar exemplars to better adapt to the test face; (2) the optimal landmark configuration can be directly obtained by solving a graph matching problem with the learned shape constraint; (3) the graph matching problem can be optimized efficiently by linear programming. To the best of our knowledge, this is the first attempt to apply a graph matching technique for facial landmark localization. Experiments on several challenging datasets demonstrate the advantages of EGM over state-of-the-art methods.
Correlation filters for object alignment
In IEEE Conference on Computer Vision and Pattern Recognition, 2013
Cited by 9 (0 self)
Alignment of 3D objects from 2D images is one of the most important and well studied problems in computer vision. A typical object alignment system consists of a landmark appearance model which is used to obtain an initial shape and a shape model which refines this initial shape by correcting the initialization errors. Since errors in landmark initialization from the appearance model propagate through the shape model, it is critical to have a robust landmark appearance model. While there has been much progress in designing sophisticated and robust shape models, there has been relatively less progress in designing robust landmark detection models. In this paper we present an efficient and robust landmark detection model which is designed specifically to minimize localization errors thereby leading to state-of-the-art object alignment performance. We demonstrate the efficacy and speed of the proposed approach on the challenging task of multi-view car alignment.
Facial landmark detection by deep multi-task learning
In ECCV, 94–108, 2014
Cited by 7 (3 self)
Facial landmark detection has long been impeded by the problems of occlusion and pose variation. Instead of treating the detection task as a single and independent problem, we investigate the possibility of improving detection robustness through multi-task learning. Specifically, we wish to optimize facial landmark detection together with heterogeneous but subtly correlated tasks, e.g., head pose estimation and facial attribute inference. This is non-trivial since different tasks have different learning difficulties and convergence rates. To address this problem, we formulate a novel tasks-constrained deep model, with task-wise early stopping to facilitate learning convergence. Extensive evaluations show that the proposed task-constrained learning (i) outperforms existing methods, especially in dealing with faces with severe occlusion and pose variation, and (ii) reduces model complexity drastically compared to the state-of-the-art method based on cascaded deep model [21].
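The idea of stopping tasks individually, since they converge at different rates, can be illustrated with a generic patience rule. The abstract does not state the paper's actual stopping criterion, so the sketch below is an assumption: a task is halted once its validation loss has failed to improve over the last `patience` epochs. The task names and loss values are hypothetical.

```python
def tasks_to_stop(val_losses, patience=3):
    """Return tasks whose validation loss has not improved for `patience` epochs.

    This is a generic patience rule, assumed for illustration only.
    """
    stopped = set()
    for task, history in val_losses.items():
        if len(history) <= patience:
            continue  # not enough epochs yet to judge this task
        best_before = min(history[:-patience])
        # No value in the recent window beats the earlier best: stop the task.
        if min(history[-patience:]) >= best_before:
            stopped.add(task)
    return stopped


# Hypothetical per-epoch validation losses for two auxiliary tasks.
history = {
    "pose": [0.90, 0.70, 0.60, 0.61, 0.62, 0.63],   # has plateaued
    "smile": [0.80, 0.75, 0.70, 0.66, 0.63, 0.60],  # still improving
}
to_stop = tasks_to_stop(history)  # {"pose"}
```

A training loop would drop the loss terms of the stopped tasks from the joint objective while continuing to optimize the remaining ones.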
Alternating Regression Forests for Object Detection and Pose Estimation
Cited by 5 (1 self)
We present Alternating Regression Forests (ARFs), a novel regression algorithm that learns a Random Forest by optimizing a global loss function over all trees. This inter-relates the information of single trees during the training phase and results in more accurate predictions. ARFs can minimize any differentiable regression loss without sacrificing the appealing properties of Random Forests, like low computational complexity during both training and testing. Inspired by recent developments for classification [19], we derive a new algorithm capable of dealing with different regression loss functions, discuss its properties and investigate the relations to other methods like Boosted Trees. We evaluate ARFs on standard machine learning benchmarks, where we observe better generalization power compared to both standard Random Forests and Boosted Trees. Moreover, we apply the proposed regressor to two computer vision applications: object detection and head pose estimation from depth images. ARFs outperform the Random Forest baselines in both tasks, illustrating the importance of optimizing a common loss function for all trees.
Accurate fully automatic femur segmentation in pelvic radiographs using regression voting
In MICCAI, 2012
Cited by 3 (2 self)
Extraction of bone contours from radiographs plays an important role in disease diagnosis, pre-operative planning, and treatment analysis. We present a fully automatic method to accurately segment the proximal femur in anteroposterior pelvic radiographs. A number of candidate positions are produced by a global search with a detector. Each is then refined using a statistical shape model together with local detectors for each model point. Both global and local models use Random Forest regression to vote for the optimal positions, leading to robust and accurate results. The performance of the system is evaluated using a set of 519 images. We show that the fully automated system is able to achieve a mean point-to-curve error of less than 1mm for 98% of all 519 images. To the best of our knowledge, this is the most accurate automatic method for segmenting the proximal femur in radiographs yet reported.
Learn to Combine Multiple Hypotheses for Accurate Face Alignment
Cited by 3 (0 self)
In this paper, we present the details of our method in attending the 300 Faces in-the-wild (300W) challenge. We build our method on the cascade regression framework, where a series of regressors are utilized to progressively refine the shape initialized by a face detector. In cascade regression, we use the HOG feature in a multi-scale manner, where the large pose variation is handled in early stages by the HOG feature at large scale, and then the shape is refined at later stages with the HOG feature at small scale. We observe that the performance of the cascade regression method decreases when the initialization provided by the face detector is not accurate enough (for faces with large appearance variations, face detection is still a challenging problem). To handle the problem, we propose to generate multiple hypotheses, and then learn to rank or combine these hypotheses to get the final result. The parameters in both learn to rank and learn to combine can be learned in a structural SVM framework. Despite the simplicity of our method, it achieves state-of-the-art performance on LFPW, and dramatically outperforms the baseline AAM on the 300-W challenge.
Body Parts Dependent Joint Regressors for Human Pose Estimation in Still Images
Cited by 3 (1 self)
In this work, we address the problem of estimating 2d human pose from still images. Articulated body pose estimation is challenging due to the large variation in body poses and appearances of the different body parts. Recent methods that rely on the pictorial structure framework have been shown to be very successful in solving this task. They model the body part appearances using discriminatively trained, independent part templates and the spatial relations of the body parts using a tree model. Within such a framework, we address the problem of obtaining better part templates which are able to handle a very high variation in appearance. To this end, we introduce parts dependent body joint regressors which are random forests that operate over two layers. While the first layer acts as an independent body part classifier, the second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This helps to overcome typical ambiguities of tree structures, such as self-similarities of legs and arms. In addition, we introduce a novel dataset termed FashionPose that contains over 7,000 images with a challenging variation of body part appearances due to a large variation of dressing styles. In the experiments, we demonstrate that the proposed parts dependent joint regressors outperform independent classifiers or regressors. The method also performs better than or similar to the state-of-the-art in terms of accuracy, while running at a couple of frames per second. Index Terms: Human pose estimation, fashion, random forest, regression, classification