Results 1 - 10 of 82
Structured Forests for Fast Edge Detection
Cited by 66 (1 self)
Edge detection is a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image patches to learn both an accurate and computationally efficient edge detector. We formulate the problem of predicting local edge masks in a structured learning framework applied to random decision forests. Our novel approach to learning decision trees robustly maps the structured labels to a discrete space on which standard information gain measures may be evaluated. The result is an approach that obtains real-time performance that is orders of magnitude faster than many competing state-of-the-art approaches, while also achieving state-of-the-art edge detection results on the BSDS500 segmentation dataset and NYU Depth dataset. Finally, we show the potential of our approach as a general-purpose edge detector by showing our learned edge models generalize well across datasets.
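The key idea above — mapping structured labels (edge masks) to a discrete space so that ordinary information gain applies — can be sketched roughly as follows. This is a minimal, assumed illustration in NumPy, not the authors' implementation: the pairwise-agreement encoding, the number of sampled pixel pairs, and the one-shot clustering step are all simplifications.

```python
import numpy as np

def discretize_structured_labels(masks, n_pairs=64, n_clusters=2, seed=0):
    """Map structured labels (binary edge masks) to discrete class ids.

    Hypothetical sketch: sample random pixel pairs, encode each mask by
    whether the two pixels agree, then assign the binary codes to a few
    cluster centers so standard information gain can be evaluated.
    """
    rng = np.random.default_rng(seed)
    flat = masks.reshape(len(masks), -1)
    i = rng.integers(0, flat.shape[1], n_pairs)
    j = rng.integers(0, flat.shape[1], n_pairs)
    codes = (flat[:, i] == flat[:, j]).astype(float)  # pairwise-agreement encoding
    # crude one-step clustering of the codes -> discrete labels
    centers = codes[rng.choice(len(codes), n_clusters, replace=False)]
    d = ((codes[:, None, :] - centers[None]) ** 2).sum(-1)
    return d.argmin(1)

def entropy(labels):
    """Shannon entropy (bits) of a discrete label vector."""
    p = np.bincount(labels) / len(labels)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()
```

With masks reduced to discrete labels, a split's quality is just the usual gain: `entropy(labels) - weighted sum of entropy(child labels)`.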
Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
Cited by 48 (3 self)
We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art.
Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
Cited by 38 (3 self)
We propose a novel approach to both learning and detecting local contour-based representations for mid-level features. Our features, called sketch tokens, are learned using supervised mid-level information in the form of hand-drawn contours in images. Patches of human-generated contours are clustered to form sketch token classes and a random forest classifier is used for efficient detection in novel images. We demonstrate our approach on both top-down and bottom-up tasks. We show state-of-the-art results on the top-down task of contour detection while being over 200× faster than competing methods. We also achieve large improvements in detection accuracy for the bottom-up tasks of pedestrian and object detection as measured on INRIA [5] and PASCAL [10], respectively. These gains are due to the complementary information provided by sketch tokens to low-level features such as gradient histograms.
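The token-learning step — clustering patches of human-drawn contours into classes — might look like the following bare-bones k-means sketch. This is illustrative only; the descriptor, distance, and token count used in the actual paper are assumptions here.

```python
import numpy as np

def learn_sketch_tokens(contour_patches, n_tokens=3, n_iter=10, seed=0):
    """Cluster contour patches into 'sketch token' classes via plain k-means.

    contour_patches: (N, H, W) array of binary contour patches.
    Returns (assignments, cluster centers); the assignments would serve as
    class labels for a downstream random forest classifier.
    """
    rng = np.random.default_rng(seed)
    X = contour_patches.reshape(len(contour_patches), -1).astype(float)
    centers = X[rng.choice(len(X), n_tokens, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)   # squared distances
        assign = d.argmin(1)
        for k in range(n_tokens):                          # recompute centers
            if (assign == k).any():
                centers[k] = X[assign == k].mean(0)
    return assign, centers
```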
Person re-identification: What features are important - in ECCV Workshop on Person Re-identification, 2012
Cited by 38 (7 self)
State-of-the-art person re-identification methods seek robust person matching through combining various feature types. Often, these features are implicitly assigned a single vector of global weights, which is assumed to be universally good for all individuals, independent of their different appearances. In this study, we show that certain features play a more important role than others under different circumstances. Consequently, we propose a novel unsupervised approach for learning a bottom-up feature importance, so features extracted from different individuals are weighted adaptively, driven by their unique and inherent appearance attributes. Extensive experiments on two public datasets demonstrate that attribute-sensitive feature importance facilitates more accurate person matching when it is fused together with global weights obtained using existing methods.
Human Pose Estimation using Body Parts Dependent Joint Regressors
Cited by 31 (6 self)
In this work, we address the problem of estimating 2D human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that takes dependencies between body parts into account already during joint localization, and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body-parts-dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.
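The two-layer idea — a part classifier whose class distributions feed a second-stage joint regressor — can be sketched with stand-in learners. This is an assumed simplification: a nearest-centroid classifier and a least-squares regressor replace the paper's two random forests, purely to show the data flow between the layers.

```python
import numpy as np

def two_layer_regression(X, part_labels, joint_targets, X_test):
    """Layer 1: soft class distributions over body parts (nearest-centroid,
    softmax over negative squared distances). Layer 2: regress joint
    locations from those distributions (least squares)."""
    classes = np.unique(part_labels)
    centroids = np.stack([X[part_labels == c].mean(0) for c in classes])

    def class_dist(A):
        d = ((A[:, None] - centroids[None]) ** 2).sum(-1)
        p = np.exp(-d)
        return p / p.sum(1, keepdims=True)

    P = class_dist(X)                                       # layer-1 output
    W, *_ = np.linalg.lstsq(P, joint_targets, rcond=None)   # layer-2 regressor
    return class_dist(X_test) @ W
```

The point of the construction is that the second layer sees *distributions*, not hard part decisions, so co-occurring part evidence can still influence the joint estimate.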
Sieving Regression Forest Votes for Facial Feature Detection in the Wild
Cited by 14 (3 self)
In this paper we propose a method for the localization of multiple facial features on challenging face images. In the regression forests (RF) framework, observations (patches) that are extracted at several image locations cast votes for the localization of several facial features. In order to filter out votes that are not relevant, we pass them through two types of sieves, organised in a cascade, which enforce geometric constraints. The first sieve filters out votes that are not consistent with a hypothesis for the location of the face center. Several sieves of the second type, one associated with each individual facial point, filter out distant votes. We propose a method that adjusts on-the-fly the proximity threshold of each second-type sieve by applying a classifier which, based on middle-level features extracted from voting maps for the facial feature in question, makes a sequence of decisions on whether the threshold should be reduced or not. We validate our proposed method on two challenging datasets with images collected from the Internet, on which we obtain state-of-the-art results without resorting to explicit facial shape models. We also show the benefits of our method for proximity threshold adjustment, especially on 'difficult' face images.
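The two-sieve cascade described above can be sketched for a single facial point as follows. This is a hedged illustration: the thresholds are fixed constants here, whereas the paper adjusts the second one on-the-fly with a learned classifier, and the robust estimate used between sieves is an assumption.

```python
import numpy as np

def sieve_votes(votes, face_center, center_offsets, tau_center, tau_point):
    """Cascade of two vote sieves for one facial point.

    votes:          (N, 2) candidate locations cast by patches.
    center_offsets: (N, 2) the face-center location each vote implies.
    Sieve 1 keeps votes consistent with the face-center hypothesis;
    sieve 2 keeps votes near a robust (median) current estimate.
    """
    keep = np.linalg.norm(center_offsets - face_center, axis=1) < tau_center
    votes = votes[keep]                                   # sieve 1
    est = np.median(votes, axis=0)
    keep = np.linalg.norm(votes - est, axis=1) < tau_point
    return votes[keep].mean(axis=0)                       # final localization
```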
Improved information gain estimates for decision tree induction - in ICML, 2012
Cited by 10 (0 self)
Ensembles of classification and regression trees remain popular machine learning methods because they define flexible nonparametric models that predict well and are computationally efficient both during training and testing. During induction of decision trees one aims to find predicates that are maximally informative about the prediction target. To select good predicates most approaches estimate an information-theoretic scoring function, the information gain, both for classification and regression problems. We point out that the common estimation procedures are biased and show that by replacing them with improved estimators of the discrete and the differential entropy we can obtain better decision trees. In effect our modifications yield improved predictive performance and are simple to implement in any decision tree code.
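The bias the paper targets is easy to demonstrate: the naive plug-in entropy estimator systematically underestimates entropy on small samples. One classical correction in the same spirit as the paper's improved estimators (though not necessarily the one the authors use) is the Miller-Madow adjustment:

```python
import numpy as np

def plugin_entropy(labels):
    """Naive plug-in (maximum-likelihood) entropy estimate in nats.
    This is the biased estimator commonly used inside information gain."""
    p = np.bincount(labels) / len(labels)
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def miller_madow_entropy(labels):
    """Miller-Madow correction: add (K - 1) / (2N), where K is the number
    of observed classes and N the sample size, to offset the downward
    bias of the plug-in estimate."""
    k = np.count_nonzero(np.bincount(labels))
    return plugin_entropy(labels) + (k - 1) / (2 * len(labels))
```

Dropping either estimator into the usual gain formula, `H(parent) - sum_c (n_c / n) * H(child_c)`, changes which predicates a tree prefers, especially deep in the tree where node samples are small.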
Decision Forests for Tissue-specific Segmentation of High-grade Gliomas in Multi-channel MR
Cited by 7 (2 self)
We present a method for automatic segmentation of high-grade gliomas and their subregions from multi-channel MR images. Besides segmenting the gross tumor, we also differentiate between active cells, necrotic core, and edema. Our discriminative approach is based on decision forests using context-aware spatial features, and integrates a generative model of tissue appearance by using the probabilities obtained by tissue-specific Gaussian mixture models as additional input for the forest. Our method classifies the individual tissue types simultaneously, which has the potential to simplify the classification task. The approach is computationally efficient and of low model complexity. The validation is performed on a labeled database of 40 multi-channel MR images, including DTI. We assess the effects of using DTI, and varying the amount of training data. Our segmentation results are highly accurate, and compare favorably to the state of the art.
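The generative-discriminative coupling — tissue-model posteriors fed to the forest as extra input channels — can be sketched as below. This is an assumed simplification: a single Gaussian per tissue stands in for the paper's full Gaussian mixture models.

```python
import numpy as np

def tissue_posteriors(intensities, means, stds, priors):
    """Per-voxel posterior probability of each tissue class under a
    one-Gaussian-per-tissue model (simplified stand-in for a GMM).

    intensities: (N,) voxel intensities from one MR channel.
    means, stds, priors: (K,) per-tissue Gaussian parameters and priors.
    Returns an (N, K) posterior array to append to the forest's features.
    """
    x = intensities[:, None]
    lik = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    post = lik * priors
    return post / post.sum(1, keepdims=True)
```

The forest then sees both raw channels and these posterior channels, so its split functions can threshold "probability of edema" as easily as raw intensity.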
Privileged information-based conditional regression forests for facial feature detection - in IEEE Conf. Automatic Face and Gesture Recognition, 2013
Cited by 7 (4 self)
In this paper we propose a method that utilises privileged information, that is, information available only during the training phase, in order to train Regression Forests for facial feature detection. Our method chooses the split functions at some randomly chosen internal tree nodes according to the information gain calculated from the privileged information, such as head pose or gender. In this way the training patches arrive at leaves that tend to have low variance both in displacements to facial points and in privileged information. At each leaf node, we learn both the probability of the privileged information and regression models conditioned on it. During testing, the marginal probability of privileged information is estimated and the facial feature locations are localised using the appropriate conditional regression models. The proposed model is validated by comparison with very recent methods on two challenging datasets, namely Labelled Faces in the Wild and Labelled Face Parts in the Wild.
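The test-time behavior of such a leaf — the privileged variable z (e.g. head pose) is unobserved, so the leaf marginalises its conditional regressors over the stored distribution of z — can be sketched as follows. The linear per-z regressors are a hypothetical stand-in for whatever model the leaves actually store.

```python
import numpy as np

def leaf_predict(x, leaf_models, leaf_prior):
    """Marginalised leaf prediction: sum_z P(z) * regressor_z(x).

    x:           (D,) feature vector of the test patch.
    leaf_models: list of (D, D) linear regressors, one per value of z.
    leaf_prior:  (K,) probability of each z value stored at this leaf.
    """
    preds = np.stack([W @ x for W in leaf_models])   # one prediction per z
    return (leaf_prior[:, None] * preds).sum(0)      # expectation over z
```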
Video synopsis by heterogeneous multi-source correlation - in Proceedings of the 14th IEEE International Conference on Computer Vision, 2013
Cited by 7 (5 self)
Generating a coherent synopsis for a surveillance video stream remains a formidable challenge due to the ambiguity and uncertainty inherent to visual observations. In contrast to existing video synopsis approaches that rely on visual cues alone, we propose a novel multi-source synopsis framework capable of correlating visual data and independent non-visual auxiliary information to better describe and summarise subtle physical events in complex scenes. Specifically, our unsupervised framework is capable of seamlessly uncovering latent correlations among heterogeneous types of data sources, despite the non-trivial heteroscedasticity and dimensionality discrepancy problems. Additionally, the proposed model is robust to partial or missing non-visual information. We demonstrate the effectiveness of our framework on two crowded public surveillance datasets.