Results 1 - 4 of 4
Matching-CNN meets KNN: Quasi-parametric human parsing - CVPR
Abstract - Cited by 2 (1 self)
Both parametric and non-parametric approaches have demonstrated encouraging performance in the human parsing task, namely segmenting a human image into several semantic regions (e.g., hat, bag, left arm, face). In this work, we aim to develop a new solution with the advantages of both methodologies, namely supervision from annotated data and the flexibility to use newly annotated (possibly uncommon) images, and present a quasi-parametric human parsing model. Under the classic K Nearest Neighbor (KNN)-based non-parametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image. Given a testing image, we first retrieve its KNN images from the annotated/manually-parsed human image corpus. Then each semantic region in each KNN image is matched with confidence to the testing image using M-CNN, and the matched regions from all KNN images are further fused, followed by a superpixel smoothing procedure to obtain the ultimate human parsing result. The M-CNN differs from the classic CNN [12] in that tailored cross-image matching filters are introduced to characterize the matching between the testing image and the semantic region of a KNN image. The cross-image matching filters are defined at different convolutional layers, each aiming to capture a particular range of displacements. Comprehensive evaluations over a large dataset with 7,700 annotated human images demonstrate the significant performance gain of the quasi-parametric model over the state of the art [29, 30] for the human parsing task.
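The retrieve-match-fuse pipeline described above can be sketched at a high level. This is only an illustrative sketch: the image features, the matching step (here plain confidence-weighted voting), and the omitted superpixel smoothing are placeholders for the paper's learned M-CNN components.

```python
import numpy as np

def knn_retrieve(query_feat, corpus_feats, k=3):
    """Retrieve indices of the k nearest annotated images by L2 distance
    in some (assumed, precomputed) feature space."""
    dists = np.linalg.norm(corpus_feats - query_feat, axis=1)
    return np.argsort(dists)[:k]

def fuse_region_votes(region_masks, confidences, n_labels):
    """Fuse the matched semantic-region masks transferred from all KNN
    images by confidence-weighted voting; returns a per-pixel label map.
    region_masks: list of (label, binary mask) pairs already warped into
    the testing image's frame (the warping itself is not shown)."""
    h, w = region_masks[0][1].shape
    votes = np.zeros((n_labels, h, w))
    for (label, mask), conf in zip(region_masks, confidences):
        votes[label] += conf * mask          # accumulate weighted evidence
    return votes.argmax(axis=0)              # highest-vote label per pixel
```

In the paper, the per-region confidences and displacements come from the M-CNN's cross-image matching filters rather than a fixed weighting.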
Joint object and part segmentation using deep learned potentials - In ICCV, 2015
Abstract - Cited by 1 (1 self)
Segmenting semantic objects from images and parsing them into their respective semantic parts are fundamental steps towards detailed object understanding in computer vision. In this paper, we propose a joint solution that tackles semantic object and part segmentation simultaneously, in which higher object-level context is provided to guide part segmentation, and more detailed part-level localization is utilized to refine object segmentation. Specifically, we first introduce the concept of semantic compositional parts (SCP), in which similar semantic parts are grouped and shared among different objects. A two-stream fully convolutional network (FCN) is then trained to provide the SCP and object potentials at each pixel. At the same time, a compact set of segments can also be obtained from the SCP predictions of the network. Given the potentials and the generated segments, in order to explore long-range context, we finally construct an efficient fully connected conditional random field (FCRF) to jointly predict the final object and part labels. Extensive evaluation on three different datasets shows that our approach can mutually enhance the performance of object and part segmentation, and outperforms the current state of the art on both tasks.
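As a toy illustration of coupling the two label spaces (not the paper's FCRF, which adds pairwise terms and learned potentials), one can restrict a per-pixel argmax to object-part pairs allowed by a compatibility table; the array shapes and the `compat` matrix here are assumptions of this sketch.

```python
import numpy as np

def joint_labels(obj_potentials, part_potentials, compat):
    """Pick, per pixel, the (object, part) pair maximizing the sum of the
    two unary potentials, restricted to compatible pairs (compat[o, p] == 1).
    obj_potentials: (n_obj, H, W); part_potentials: (n_part, H, W)."""
    n_obj, h, w = obj_potentials.shape
    n_part = part_potentials.shape[0]
    # Score every (object, part) combination at every pixel.
    score = obj_potentials[:, None] + part_potentials[None, :]
    # Forbid incompatible pairs (e.g. a "wheel" part on a "person" object).
    score = np.where(compat[:, :, None, None].astype(bool), score, -np.inf)
    flat = score.reshape(n_obj * n_part, h, w).argmax(axis=0)
    return flat // n_part, flat % n_part     # object map, part map
```

The point of the coupling is visible even in this toy form: a strong object potential can veto a part label that scores well in isolation, and vice versa.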
Zoom Better to See Clearer: Human Part Segmentation with Auto Zoom Net - Under review as a conference paper at ICLR 2016
Abstract
Parsing human regions into semantic parts, e.g., body, head and arms, from a natural image is challenging yet fundamental for computer vision and widely applicable in industry. One major difficulty in handling such a problem is the high flexibility of scale and location of a human instance and its corresponding parts, making the parsing task either lack boundary details or suffer from local confusions. To tackle such problems, in this work we propose the "Auto-Zoom Net" (AZN) for human part parsing, a unified fully convolutional neural network structure that (1) parses each human instance into detailed parts and (2) predicts the locations and scales of human instances and their corresponding parts. In our unified network, the two tasks are mutually beneficial. The score maps obtained for parsing help estimate the locations and scales of human instances and their parts. With the predicted locations and scales, our model "zooms" the region to the right scale to further refine the parsing. In practice, we perform the two tasks iteratively so that detailed human parts are gradually recovered. We conduct extensive experiments on the challenging PASCAL-Person-Part segmentation benchmark, and show that our approach significantly outperforms state-of-the-art parsing techniques, especially for instances and parts at small scale.
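The iterative zoom loop can be sketched as below; `parse_fn` and `locate_fn` stand in for the network's parsing branch and its location/scale prediction branch, and both names are assumptions of this sketch (in the actual AZN the two tasks share one FCN).

```python
import numpy as np

def auto_zoom_parse(image, parse_fn, locate_fn, n_iters=2):
    """Parse, estimate the instance box from the score maps, zoom into
    that box, and re-parse at the refined scale, repeating n_iters times."""
    score = parse_fn(image)                  # initial coarse parsing
    crop = image
    box = (0, 0, image.shape[0], image.shape[1])
    for _ in range(n_iters):
        box = locate_fn(score)               # predicted (y0, x0, y1, x1)
        y0, x0, y1, x1 = box
        crop = crop[y0:y1, x0:x1]            # "zoom" into the instance
        score = parse_fn(crop)               # refine parsing at this scale
    return score, box
```

Each pass tightens the crop around the instance, so later parses operate at a scale where small parts occupy enough pixels to be segmented cleanly.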
Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes
Abstract
We address the problem of describing people based on fine-grained clothing attributes. This is an important problem for many practical applications, such as identifying target suspects or finding missing people based on detailed clothing descriptions in surveillance videos or consumer photos. We approach this problem by first mining clothing images with fine-grained attribute labels from online shopping stores. A large-scale dataset is built with about one million images and fine-detailed attribute sub-categories, such as various shades of color (e.g., watermelon red, rosy red, purplish red), clothing types (e.g., down jacket, denim jacket), and patterns (e.g., thin horizontal stripes, houndstooth). As these images are taken in ideal pose/lighting/background conditions, it is unreliable to directly use them as training data for attribute prediction in the domain of unconstrained images captured, for example, by mobile phones or surveillance cameras. In order to bridge this gap, we propose a novel double-path deep domain adaptation network to model the data from the two domains jointly. Several alignment cost layers placed in between the two columns ensure the consistency of the two domain features and the feasibility of predicting unseen attribute categories in one of the domains. Finally, to achieve a working system with automatic human body alignment, we trained an enhanced RCNN-based detector to localize human bodies in images. Our extensive experimental evaluation demonstrates the effectiveness of the proposed approach for describing people based on fine-grained clothing attributes.
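A minimal sketch of the alignment idea, assuming a mean-squared difference between paired intermediate features of the two columns as the alignment cost and a simple weighted sum with the task losses; the paper's actual alignment layers and loss weighting are not specified here, so `lam` and both function names are illustrative assumptions.

```python
import numpy as np

def alignment_cost(feats_shop, feats_street):
    """Sum of mean squared differences between paired intermediate
    features of the two columns (one term per chosen layer pair)."""
    return sum(np.mean((a - b) ** 2) for a, b in zip(feats_shop, feats_street))

def total_loss(task_losses, feats_shop, feats_street, lam=0.1):
    """Attribute-prediction losses from both domains plus the weighted
    alignment cost that pulls the two feature paths together."""
    return sum(task_losses) + lam * alignment_cost(feats_shop, feats_street)
```

Pulling the two columns' features together is what lets attribute categories labeled only in the shop domain transfer to unconstrained street images.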