Results 1 - 9 of 9
Sliding Shapes for 3D Object Detection in Depth Images
"... Abstract. The depth information of RGB-D sensors has greatly simplified some common challenges in computer vision and enabled breakthroughs for several tasks. In this paper, we propose to use depth maps for object detection and de-sign a 3D detector to overcome the major difficulties for recognition ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
Abstract. The depth information of RGB-D sensors has greatly simplified some common challenges in computer vision and enabled breakthroughs for several tasks. In this paper, we propose to use depth maps for object detection and design a 3D detector to overcome the major difficulties for recognition, namely the variations of texture, illumination, shape, viewpoint, clutter, occlusion, self-occlusion and sensor noise. We take a collection of 3D CAD models and render each CAD model from hundreds of viewpoints to obtain synthetic depth maps. For each depth rendering, we extract features from the 3D point cloud and train an Exemplar-SVM classifier. During testing and hard-negative mining, we slide a …
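A minimal sketch of the render-and-score pipeline this abstract outlines, using NumPy and scikit-learn. The functions `render_depth`, `depth_to_points`, `point_cloud_feature`, and `sliding_windows` are hypothetical placeholders for the paper's rendering, point-cloud, feature, and 3D window-enumeration steps, and the exemplar training is reduced to a plain linear SVM:

```python
# Hypothetical sketch of the render / train / slide pipeline described above.
import numpy as np
from sklearn.svm import LinearSVC

def train_exemplar_svms(cad_models, negatives, n_views=100):
    """One linear SVM per synthetic depth rendering (Exemplar-SVM style)."""
    exemplars = []
    for model in cad_models:
        for view in range(n_views):
            depth = render_depth(model, view)               # synthetic depth map (placeholder)
            pos = point_cloud_feature(depth_to_points(depth))
            X = np.vstack([pos[None, :], negatives])        # one positive vs. many negatives
            y = np.r_[1, np.zeros(len(negatives))]
            exemplars.append(LinearSVC(C=1.0).fit(X, y))
    return exemplars

def detect(scene_points, exemplars, threshold=0.5):
    """Slide a 3D window through the scene and keep high-scoring windows."""
    detections = []
    for box, feat in sliding_windows(scene_points):         # 3D boxes + their features (placeholder)
        scores = [svm.decision_function(feat[None, :])[0] for svm in exemplars]
        if max(scores) > threshold:
            detections.append((box, max(scores)))
    return detections
```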
Virtual View Networks for Object Reconstruction
"... All that structure from motion algorithms “see ” are sets of 2D points. We show that these impoverished views of the world can be faked for the purpose of reconstructing objects in challenging settings, such as from a single im-age, or from a few ones far apart, by recognizing the object and getting ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
(Show Context)
All that structure from motion algorithms “see” are sets of 2D points. We show that these impoverished views of the world can be faked for the purpose of reconstructing objects in challenging settings, such as from a single image, or from a few images far apart, by recognizing the object and getting help from a collection of images of other objects from the same class. We synthesize virtual views by computing geodesics on novel networks connecting objects with similar viewpoints, and introduce techniques to increase the specificity and robustness of factorization-based object reconstruction in this setting. We report accurate object shape reconstruction from a single image on challenging PASCAL VOC data, which suggests that the current domain of applications of rigid structure-from-motion techniques may be significantly extended.
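An illustrative sketch of the geodesic idea (not the paper's exact network construction), assuming a toy Euclidean distance between viewpoint descriptors and using networkx for shortest paths:

```python
# Build a graph whose nodes are object instances and whose edges link instances
# with similar viewpoints, then walk a geodesic between two instances to obtain
# a chain of intermediate ("virtual") views to feed a factorization-based
# reconstruction. Distances and k are toy assumptions.
import numpy as np
import networkx as nx

def viewpoint_graph(poses, k=5):
    """Connect each instance to its k nearest neighbors in viewpoint space."""
    G = nx.Graph()
    for i in range(len(poses)):
        d = np.linalg.norm(poses - poses[i], axis=1)
        for j in np.argsort(d)[1:k + 1]:                 # skip self at distance 0
            G.add_edge(i, int(j), weight=float(d[j]))
    return G

def virtual_view_path(G, src, dst):
    """Geodesic between two instances: the sequence of views to interpolate over."""
    return nx.shortest_path(G, src, dst, weight="weight")
```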
Learning Features and Parts for Fine-Grained Recognition (Invited Paper)
"... Abstract—This paper addresses the problem of fine-grained recognition: recognizing subordinate categories such as bird species, car models, or dog breeds. We focus on two major challenges: learning expressive appearance descriptors and lo-calizing discriminative parts. To this end, we propose an obj ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Abstract—This paper addresses the problem of fine-grained recognition: recognizing subordinate categories such as bird species, car models, or dog breeds. We focus on two major challenges: learning expressive appearance descriptors and localizing discriminative parts. To this end, we propose an object representation that detects important parts and describes fine-grained appearances. The part detectors are learned in a fully unsupervised manner, based on the insight that images with similar poses can be automatically discovered for fine-grained classes in the same domain. The appearance descriptors are learned using a convolutional neural network. Our approach requires only image-level class labels, without any use of part annotations or segmentation masks, which may be costly to obtain. We show experimentally that combining these two insights is an effective strategy for fine-grained recognition.
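A rough sketch of the "images with similar poses can be automatically discovered" step, assuming CNN features have already been extracted for every training image; the paper's actual part-detector learning is more involved than this nearest-neighbor proxy:

```python
# Find pose-similar images by cosine similarity over precomputed CNN features.
import numpy as np

def pose_neighbors(features, query_idx, k=10):
    """Indices of the k images whose CNN features are most similar to the query,
    used here as a stand-in for 'images sharing the same pose'."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = f @ f[query_idx]
    order = np.argsort(-sims)
    return [int(i) for i in order if i != query_idx][:k]
```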
Car Make and Model Recognition using 3D Curve Alignment
"... We present a new approach for recognizing the make and model of a car from a single image. While most pre-vious methods are restricted to fixed or limited viewpoints, our system is able to verify a car’s make and model from an arbitrary view. Our model consists of 3D space curves obtained by backpro ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
We present a new approach for recognizing the make and model of a car from a single image. While most previous methods are restricted to fixed or limited viewpoints, our system is able to verify a car’s make and model from an arbitrary view. Our model consists of 3D space curves obtained by backprojecting image curves onto silhouette-based visual hulls and then refining them using three-view curve matching. These 3D curves are then matched to 2D image curves using a 3D view-based alignment technique. We present two different methods for estimating the pose of a car, which we then use to initialize the 3D curve matching. Our approach is able to verify the exact make and model of a car over a wide range of viewpoints in cluttered scenes.
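A toy stand-in for the view-based 3D-to-2D curve alignment score described here, assuming sampled 3D curve points, a candidate 3x4 camera matrix, and detected 2D image curve points; the paper's matching is richer than this nearest-point distance:

```python
# Project the model's 3D curve points with a candidate camera and measure how
# close they land to detected image curve points (lower score = better fit).
import numpy as np
from scipy.spatial import cKDTree

def alignment_score(points_3d, P, image_curve_pts):
    """points_3d: (N, 3) curve samples; P: 3x4 camera; image_curve_pts: (M, 2)."""
    X = np.hstack([points_3d, np.ones((len(points_3d), 1))])   # homogeneous coordinates
    proj = (P @ X.T).T
    proj = proj[:, :2] / proj[:, 2:3]                           # perspective divide
    dists, _ = cKDTree(image_curve_pts).query(proj)             # nearest image curve point
    return float(dists.mean())
```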
A Coarse-to-Fine Model for 3D Pose Estimation and Sub-category Recognition
"... Despite the fact that object detection, 3D pose estima-tion, and sub-category recognition are highly correlated tasks, they are usually addressed independently from each other because of the huge space of parameters. To jointly model all of these tasks, we propose a coarse-to-fine hier-archical repr ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Despite the fact that object detection, 3D pose estimation, and sub-category recognition are highly correlated tasks, they are usually addressed independently from each other because of the huge space of parameters. To jointly model all of these tasks, we propose a coarse-to-fine hierarchical representation, where each level of the hierarchy represents objects at a different level of granularity. The hierarchical representation prevents performance loss, which is often caused by the increase in the number of parameters (as we consider more tasks to model), and the joint modeling enables resolving ambiguities that exist in independent modeling of these tasks. We augment the PASCAL3D+ [34] dataset with annotations for these tasks and show that our hierarchical model is effective in joint modeling of object detection, 3D pose estimation, and sub-category recognition.
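A toy illustration of coarse-to-fine inference, where each finer decision only searches within the hypothesis selected at the coarser level; the score containers and their shapes are hypothetical, not the paper's actual model:

```python
# Pick object category, then a viewpoint bin within that category, then a
# sub-category within that (category, viewpoint) pair.
import numpy as np

def coarse_to_fine(scores_category, scores_pose, scores_subcat):
    """scores_pose[c] and scores_subcat[(c, p)] are scores conditioned on the
    coarser choices; all of these are assumed to be precomputed arrays."""
    c = int(np.argmax(scores_category))
    p = int(np.argmax(scores_pose[c]))
    s = int(np.argmax(scores_subcat[(c, p)]))
    return c, p, s
```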
Hyper-class Augmented and Regularized Deep Learning for Fine-grained Image Classification
"... Deep convolutional neural networks (CNN) have seen tremendous success in large-scale generic object recogni-tion. In comparison with generic object recognition, fine-grained image classification (FGIC) is much more chal-lenging because (i) fine-grained labeled data is much more expensive to acquire ..."
Abstract
- Add to MetaCart
(Show Context)
Deep convolutional neural networks (CNNs) have seen tremendous success in large-scale generic object recognition. In comparison with generic object recognition, fine-grained image classification (FGIC) is much more challenging because (i) fine-grained labeled data is much more expensive to acquire (usually requiring domain expertise); (ii) there exist large intra-class and small inter-class variances. Most recent work exploiting deep CNNs for image recognition with small training data adopts a simple strategy: pre-train a deep CNN on a large-scale external dataset (e.g., ImageNet) and fine-tune it on the small-scale target data to fit the specific classification task. In this paper, beyond the fine-tuning strategy, we propose a systematic framework for learning a deep CNN that addresses the challenges from two new perspectives: (i) identifying easily annotated hyper-classes inherent in the fine-grained data, acquiring a large number of hyper-class-labeled images from readily available external sources (e.g., image search engines), and formulating the problem as multi-task learning; (ii) a novel learning model that exploits a regularization between the fine-grained recognition model and the hyper-class recognition model. We demonstrate the success of the proposed framework on two small-scale fine-grained datasets (Stanford Dogs and Stanford Cars) and on a large-scale car dataset that we collected.
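One possible PyTorch sketch of the multi-task-plus-regularization idea: a shared trunk with a fine-grained head and a hyper-class head, and a coupling term that pulls each fine-grained classifier toward the classifier of its hyper-class. The module names, the `fine_to_hyper` mapping, and the exact form of the regularizer are illustrative assumptions, not the paper's formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperClassNet(nn.Module):
    def __init__(self, trunk, feat_dim, n_fine, n_hyper):
        super().__init__()
        self.trunk = trunk                          # shared convolutional features
        self.fine = nn.Linear(feat_dim, n_fine)     # fine-grained classes
        self.hyper = nn.Linear(feat_dim, n_hyper)   # easily annotated hyper-classes

    def forward(self, x):
        f = self.trunk(x)
        return self.fine(f), self.hyper(f)

def joint_loss(model, x, y_fine, y_hyper, fine_to_hyper, lam=1e-3):
    """fine_to_hyper: LongTensor mapping each fine class index to its hyper-class."""
    logits_fine, logits_hyper = model(x)
    loss = F.cross_entropy(logits_fine, y_fine) + F.cross_entropy(logits_hyper, y_hyper)
    # Illustrative coupling: each fine-grained weight vector stays close to the
    # weight vector of the hyper-class it belongs to.
    reg = ((model.fine.weight - model.hyper.weight[fine_to_hyper]) ** 2).sum()
    return loss + lam * reg
```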
ShapeNet: An Information-Rich 3D Model Repository
"... Authors listed alphabetically We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of ob-jects. ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the Word-Net taxonomy. It is a collection of datasets providi ..."
Abstract
- Add to MetaCart
(Show Context)
Authors listed alphabetically. We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects. ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. It is a collection of datasets providing many semantic annotations for each 3D model, such as consistent rigid alignments, parts and bilateral symmetry planes, …
Fine-Grained Recognition without Part Annotations: Supplementary Material
"... In the main text we showed that large gains from using a VGGNet [5] architecture on the CUB-2011 [6] dataset. We show a similar comparison on the cars-196 [3] dataset in Tab. 1. As before, using a VGGNet architecture leads to large gains. Particularly striking is the gain from fine-tuning a VGGNet o ..."
Abstract
- Add to MetaCart
(Show Context)
In the main text we showed the large gains from using a VGGNet [5] architecture on the CUB-2011 [6] dataset. We show a similar comparison on the cars-196 [3] dataset in Tab. 1. As before, using a VGGNet architecture leads to large gains. Particularly striking is the gain from fine-tuning a VGGNet on cars-196: a basic R-CNN goes from 57.4% to 88.4% accuracy through fine-tuning alone, a much larger jump than the already sizeable gain from fine-tuning a CaffeNet [2]. 2. Additional Visualizations. The visualizations in this section are expanded versions of figures from the main text. 2.1. Pose Nearest Neighbors. In Fig. 1 we show more examples of nearest neighbors using conv4 features, which is our heuristic for measuring the difference in pose between different images (cf. Fig. 4 …
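A minimal PyTorch/torchvision fine-tuning sketch of the kind of setup these numbers refer to; the `weights` argument depends on the torchvision version, `train_loader` is an assumed cars-196 DataLoader, and the hyper-parameters are placeholders rather than the authors' settings:

```python
# Fine-tune an ImageNet-pretrained VGG16 on a 196-class fine-grained dataset.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.vgg16(weights="IMAGENET1K_V1")   # ImageNet-pretrained backbone
model.classifier[6] = nn.Linear(4096, 196)                  # new head for 196 car classes
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in train_loader:                         # assumed cars-196 DataLoader
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```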
Spatial, Temporal and Spatio-Temporal Correspondence for Computer Vision Problems, 2014
"... Many computer vision problems, such as object classification, motion estimation or shape registration rely on solving the correspondence problem. Existing al-gorithms to solve spatial or temporal correspondence problems are usually NP-hard, difficult to approximate, lack flexible models and mechanis ..."
Abstract
- Add to MetaCart
Many computer vision problems, such as object classification, motion estimation or shape registration, rely on solving the correspondence problem. Existing algorithms to solve spatial or temporal correspondence problems are usually NP-hard, difficult to approximate, and lack flexible models and mechanisms for feature weighting. This proposal addresses the correspondence problem in computer vision, and proposes two new spatio-temporal correspondence problems and three algorithms to solve spatial, temporal and spatio-temporal matching between video and other sources. The main contributions of the thesis are: (1) Factorial graph matching (FGM). FGM extends existing work on graph matching (GM) by finding an exact factorization of the affinity matrix. Four benefits follow from this factorization: (a) there is no need to compute the costly (in space and time) pairwise affinity matrix; (b) it provides a unified framework that reveals commonalities and differences between GM methods. Moreover, the factorization provides a clean connection with other matching algorithms such as …
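A small NumPy check of why such a factorization matters: the full pairwise affinity matrix of graph matching has size (n1·n2) x (n1·n2), but Kronecker-structured products with it can be evaluated without ever forming it. The factors below are generic toy matrices used only to verify the identity, not FGM's actual factorization:

```python
# Verify that kron(B.T, A) @ vec(X) == vec(A @ X @ B) with column-major vec,
# so the large Kronecker matrix never needs to be materialized.
import numpy as np

n1, n2 = 4, 5
A = np.random.rand(n1, n1)                 # toy per-graph factor
B = np.random.rand(n2, n2)                 # toy per-graph factor
X = np.random.rand(n1, n2)                 # candidate correspondence matrix

big = np.kron(B.T, A)                      # (n1*n2) x (n1*n2): avoid building this in practice
lhs = big @ X.ravel(order="F")             # naive product with the big matrix
rhs = (A @ X @ B).ravel(order="F")         # same result computed from the small factors
assert np.allclose(lhs, rhs)
```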