Results 1 - 10 of 31
HOGgles: Visualizing Object Detection Features
"... We introduce algorithms to visualize feature spaces used by object detectors. The tools in this paper allow a human to put on ‘HOG goggles ’ and perceive the visual world as a HOG based object detector sees it. We found that these visualizations allow us to analyze object detection systems in new wa ..."
Abstract
-
Cited by 24 (1 self)
- Add to MetaCart
(Show Context)
We introduce algorithms to visualize feature spaces used by object detectors. The tools in this paper allow a human to put on ‘HOG goggles’ and perceive the visual world as a HOG-based object detector sees it. We found that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector’s failures. For example, when we visualized the features for high-scoring false alarms, we discovered that, although they are clearly wrong in image space, they look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and indicates that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors. By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.
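For readers who want a quick approximation of the ‘HOG goggles’ idea, the standard gradient-glyph rendering in scikit-image gives a rough sense of what the feature space retains. Note this is the conventional HOG visualization, not the inversion algorithms introduced in the paper; the example image is an arbitrary built-in sample.

```python
# Minimal sketch: render an image as its HOG gradient glyphs using
# scikit-image's built-in visualization. This is the standard HOG glyph
# rendering, not the paper's feature-inversion algorithms.
from skimage import data, color
from skimage.feature import hog
import matplotlib.pyplot as plt

image = color.rgb2gray(data.astronaut())
features, hog_image = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    visualize=True,
)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(image, cmap="gray"); ax1.set_title("input")
ax2.imshow(hog_image, cmap="gray"); ax2.set_title("HOG glyphs")
plt.show()
```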
Parzen Discriminant Analysis
"... In this paper, we propose a non-parametric Discriminant Analysis method (no assumption on the distributions of classes), called Parzen Discriminant Analysis (PDA). Through a deep investigation on the non-parametric density estimation, we find that minimizing/maximizing the distances between each dat ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
(Show Context)
In this paper, we propose a non-parametric discriminant analysis method (making no assumption on the distributions of classes), called Parzen Discriminant Analysis (PDA). Through a detailed investigation of non-parametric density estimation, we find that minimizing/maximizing the distances between each data sample and its nearby similar/dissimilar samples is equivalent to minimizing an upper bound of the Bayesian error rate. Based on this theoretical analysis, we define our criterion as maximizing the average local dissimilarity scatter with respect to a fixed average local similarity scatter. All local scatters are calculated in fixed-size local regions, resembling the idea of Parzen estimation. Experiments on the UCI machine learning database show that our method substantially outperforms other related neighbor-based non-parametric methods.
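One way to read this criterion is as a generalized eigenproblem on two local scatter matrices. Below is a rough NumPy sketch under that reading; the window radius, regularizer, and function names are our own hypothetical choices, not the authors' code.

```python
# Hypothetical reconstruction of a PDA-style criterion: accumulate local
# similarity and dissimilarity scatters inside a fixed-size Parzen window,
# then maximize dissimilarity scatter for a fixed similarity scatter via
# a generalized eigenproblem.
import numpy as np
from scipy.linalg import eigh

def parzen_da(X, y, radius, n_components):
    n, d = X.shape
    S_sim = np.zeros((d, d))   # average local similarity scatter
    S_dis = np.zeros((d, d))   # average local dissimilarity scatter
    for i in range(n):
        diff = X - X[i]
        near = np.linalg.norm(diff, axis=1) <= radius
        near[i] = False
        for j in np.where(near)[0]:
            outer = np.outer(diff[j], diff[j])
            if y[j] == y[i]:
                S_sim += outer
            else:
                S_dis += outer
    # maximize dissimilarity w.r.t. fixed similarity scatter:
    # generalized eigenproblem  S_dis v = lambda * S_sim v
    evals, evecs = eigh(S_dis, S_sim + 1e-6 * np.eye(d))
    return evecs[:, np.argsort(evals)[::-1][:n_components]]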
Fast direct super-resolution by simple functions
- In ICCV
"... The goal of single-image super-resolution is to gener-ate a high-quality high-resolution image based on a given low-resolution input. It is an ill-posed problem which re-quires exemplars or priors to better reconstruct the missing high-resolution image details. In this paper, we propose to split the ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
(Show Context)
The goal of single-image super-resolution is to generate a high-quality high-resolution image from a given low-resolution input. It is an ill-posed problem that requires exemplars or priors to better reconstruct the missing high-resolution image details. In this paper, we propose to split the feature space into numerous subspaces and collect exemplars to learn priors for each subspace, thereby creating effective mapping functions. Splitting the input space makes it feasible to use simple functions for super-resolution and efficient to generate high-resolution results. High-quality high-resolution images are reconstructed based on the learned priors. Experimental results demonstrate that the proposed algorithm performs efficiently and effectively compared with state-of-the-art methods.
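To make the split-then-map idea concrete, here is a hedged sketch: cluster low-resolution patch features with k-means and fit one simple linear regressor per subspace. The array names and hyperparameters are placeholders, not the paper's pipeline.

```python
# Sketch of the split-input-space idea: route each low-res patch to a
# subspace (cluster) and apply that subspace's simple linear mapping to
# predict the high-res patch. Placeholder data and settings throughout.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

def train_subspace_sr(lr_patches, hr_patches, n_clusters=512):
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(lr_patches)
    regressors = {}
    for c in range(n_clusters):
        mask = km.labels_ == c
        if mask.any():
            regressors[c] = Ridge(alpha=1.0).fit(lr_patches[mask], hr_patches[mask])
    return km, regressors

def upscale_patch(lr_patch, km, regressors):
    c = int(km.predict(lr_patch[None])[0])           # pick the subspace
    return regressors[c].predict(lr_patch[None])[0]  # apply its simple function
```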
Beta process joint dictionary learning for coupled feature spaces with application to single image super-resolution
- In: CVPR (2013)
"... This paper addresses the problem of learning over-complete dictionaries for the coupled feature spaces, where the learned dictionaries also reflect the relationship be-tween the two spaces. A Bayesian method using a beta pro-cess prior is applied to learn the over-complete dictionar-ies. Compared to ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
(Show Context)
This paper addresses the problem of learning over-complete dictionaries for coupled feature spaces, where the learned dictionaries also reflect the relationship between the two spaces. A Bayesian method using a beta process prior is applied to learn the over-complete dictionaries. Compared to previous coupled feature space dictionary learning algorithms, our algorithm not only provides dictionaries customized to each feature space, but also yields a more consistent and accurate mapping between the two feature spaces. This is due to the unique property of the beta process model that the sparse representation can be decomposed into values and dictionary atom indicators. The proposed algorithm is able to learn sparse representations that correspond to the same dictionary atoms with the same sparsity but different values in coupled feature spaces, thus producing a consistent and accurate mapping between coupled feature spaces. Another advantage of the proposed method is that the number of dictionary atoms and their relative importance may be inferred non-parametrically. We compare the proposed approach to several state-of-the-art dictionary learning methods by applying it to single image super-resolution. The experimental results show that dictionaries learned by our method produce the best super-resolution results compared to other state-of-the-art methods.
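The decomposition the abstract leans on can be shown in a few lines. Below is a toy illustration (random dictionaries, made-up sizes, no beta process inference) of how a shared binary indicator vector gives coupled signals the same sparsity pattern with different weights.

```python
# Toy illustration of the value/indicator decomposition: binary atom
# indicators z are shared across the coupled spaces, while the real-valued
# weights differ per space, so paired signals use the same atoms with the
# same sparsity but different values. Not the authors' Gibbs sampler.
import numpy as np

rng = np.random.default_rng(0)
d_lo, d_hi, n_atoms = 25, 100, 64            # hypothetical patch dims, dict size

D_lo = rng.standard_normal((d_lo, n_atoms))  # low-res dictionary
D_hi = rng.standard_normal((d_hi, n_atoms))  # high-res dictionary

z = rng.random(n_atoms) < 0.05               # shared binary atom indicators
s_lo = rng.standard_normal(n_atoms)          # space-specific weights
s_hi = rng.standard_normal(n_atoms)

x_lo = D_lo @ (z * s_lo)                     # coupled signals share support z
x_hi = D_hi @ (z * s_hi)
```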
Inverting and visualizing features for object detection. arXiv
, 2012
"... This paper presents methods to visualize feature spaces commonly used in object detection. The tools in this paper allow a human to put on “feature space glasses ” and see the visual world as a computer might see it. We found that these “glasses ” allow us to gain insight into the behavior of comput ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
(Show Context)
This paper presents methods to visualize feature spaces commonly used in object detection. The tools in this paper allow a human to put on “feature space glasses” and see the visual world as a computer might see it. We found that these “glasses” allow us to gain insight into the behavior of computer vision systems. We show a variety of experiments with our visualizations, such as examining the linear separability of recognition in HOG space, generating high-scoring “super objects” for an object detector, and diagnosing false positives. We pose the visualization problem as one of feature inversion, i.e. recovering the natural image that generated a feature descriptor. We describe four algorithms to tackle this task, with different trade-offs in speed, accuracy, and scalability. Our most successful algorithm uses ideas from sparse coding to learn a pair of dictionaries that enable regression between HOG features and natural images, and can invert features at interactive rates. We believe these visualizations are useful tools to add to an object detector researcher’s toolbox, and code is available.
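The paired-dictionary regression described at the end of this abstract can be approximated in a few lines of scikit-learn. The sketch below is our own reconstruction with hypothetical training arrays, not the authors' released code.

```python
# Sketch of paired-dictionary feature inversion: learn a sparse-coding
# dictionary over HOG descriptors, fit a paired image dictionary by least
# squares against the same codes, then invert a new descriptor by
# sparse-coding it and decoding with the image dictionary.
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

def fit_paired(hog_train, img_train, n_atoms=256):
    dl = DictionaryLearning(n_components=n_atoms, transform_algorithm="lasso_lars")
    codes = dl.fit_transform(hog_train)        # sparse codes of HOG vectors
    D_hog = dl.components_
    # image dictionary sharing those codes: min ||img_train - codes @ D_img||
    D_img, *_ = np.linalg.lstsq(codes, img_train, rcond=None)
    return D_hog, D_img

def invert(hog_feat, D_hog, D_img, alpha=1.0):
    code = sparse_encode(hog_feat[None], D_hog, algorithm="lasso_lars", alpha=alpha)
    return (code @ D_img)[0]                   # estimated image pixels
```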
Coupled dictionary and feature space learning with applications to cross-domain image synthesis and recognition
- In: ICCV (2013)
"... Cross-domain image synthesis and recognition are typi-cally considered as two distinct tasks in the areas of com-puter vision and pattern recognition. Therefore, it is not clear whether approaches addressing one task can be eas-ily generalized or extended for solving the other. In this paper, we pro ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
(Show Context)
Cross-domain image synthesis and recognition are typically considered two distinct tasks in the areas of computer vision and pattern recognition. Therefore, it is not clear whether approaches addressing one task can be easily generalized or extended to solve the other. In this paper, we propose a unified model for coupled dictionary and feature space learning. The proposed learning model not only derives a common feature space for associating cross-domain image data for recognition purposes; the derived feature space also jointly updates the dictionaries in each image domain for improved representation. This is why our method can be applied to both cross-domain image synthesis and recognition problems. Experiments on a variety of synthesis and recognition tasks, such as single image super-resolution, cross-view action recognition, and sketch-to-photo face recognition, verify the effectiveness of our proposed learning model.
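As a loose illustration of the "common feature space" half of this model, here is a sketch that substitutes canonical correlation analysis for the paper's coupled learning. CCA is only a stand-in, and the arrays are hypothetical paired samples from two domains.

```python
# Stand-in sketch: learn paired linear projections of two image domains
# into a shared space with CCA, then match queries from domain A against
# a gallery from domain B in that space. Simplification of the paper's model.
import numpy as np
from sklearn.cross_decomposition import CCA

def fit_common_space(X_a, X_b, dim=32):
    # X_a, X_b: paired training samples; dim must not exceed either feature dim
    return CCA(n_components=dim).fit(X_a, X_b)

def cross_domain_match(cca, queries_a, gallery_b):
    za, zb = cca.transform(queries_a, gallery_b)   # both mapped to shared space
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    return np.argmax(za @ zb.T, axis=1)            # nearest gallery item per query
```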
A Statistical Prediction Model Based on Sparse Representations for Single Image Super-Resolution
"... We address single image super-resolution using a statistical prediction model based on sparse representations of low and high resolution image patches. The suggested model allows us to avoid any invariance assumption, which is a common practice in sparsity-based approaches treating this task. Predic ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
(Show Context)
We address single image super-resolution using a statistical prediction model based on sparse representations of low- and high-resolution image patches. The suggested model allows us to avoid any invariance assumption, which is common practice in sparsity-based approaches to this task. Prediction of high-resolution patches is obtained via MMSE estimation, and the resulting scheme has the useful interpretation of a feedforward neural network. To further enhance performance, we suggest data clustering and cascading several levels of the basic algorithm. We suggest a training scheme for the resulting network and demonstrate the capabilities of our algorithm, showing its advantages over existing methods based on a low- and high-resolution dictionary pair in terms of computational complexity, numerical criteria, and visual appearance. The suggested approach offers a desirable compromise between low computational complexity and reconstruction quality when compared with state-of-the-art methods for single image super-resolution.
Index Terms: Dictionary learning, feedforward neural networks, MMSE estimation, nonlinear prediction, single image super-resolution, sparse representations, statistical models, restricted Boltzmann machine, zooming, deblurring.
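The "feedforward neural network" interpretation mentioned here can be sketched directly: a thresholding encoder followed by a linear decoder. In the toy below the weights are random placeholders, whereas in the paper they would come from the trained statistical model.

```python
# Toy sketch of the feedforward reading of the predictor: a low-res patch
# is encoded by a soft-thresholding layer and decoded by a linear layer
# into a high-res patch. Random placeholder weights, hypothetical sizes.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(1)
d_lo, d_hi, n_hidden = 25, 100, 256
W = rng.standard_normal((n_hidden, d_lo)) * 0.1  # encoder weights
b = np.full(n_hidden, 0.05)                      # per-unit thresholds
V = rng.standard_normal((d_hi, n_hidden)) * 0.1  # linear decoder

x_lo = rng.standard_normal(d_lo)                 # a low-res patch
h = soft_threshold(W @ x_lo, b)                  # sparse hidden activations
x_hi = V @ h                                     # predicted high-res patch
```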
Accurate Blur Models vs. Image Priors in Single Image Super-Resolution
"... Over the past decade, single image Super-Resolution (SR) research has focused on developing sophisticated image priors, leading to significant advances. Estimating and incorporating the blur model, that relates the high-res and low-res images, has received much less attention, however. In particular ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
Over the past decade, single image Super-Resolution (SR) research has focused on developing sophisticated image priors, leading to significant advances. Estimating and incorporating the blur model that relates the high-res and low-res images has received much less attention, however. In particular, the reconstruction constraint, namely that the blurred and downsampled high-res output should approximately equal the low-res input image, has been either ignored or applied with default fixed blur models. In this work, we examine the relative importance of the image prior and the reconstruction constraint. First, we show that an accurate reconstruction constraint combined with a simple gradient regularization achieves SR results almost as good as those of state-of-the-art algorithms with sophisticated image priors. Second, we study both empirically and theoretically the sensitivity of SR algorithms to the blur model assumed in the reconstruction constraint. We find that an accurate blur model is more important than a sophisticated image prior. Finally, using real camera data, we demonstrate that the default blur models of various SR algorithms may differ from the camera blur, typically leading to oversmoothed results. Our findings highlight the importance of accurately estimating camera blur when reconstructing raw low-res images acquired by an actual camera.
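The reconstruction constraint the abstract refers to can be written down as a small iterative scheme. The sketch below assumes a Gaussian blur of width sigma as a placeholder blur model, approximates the adjoint of the blur-and-downsample operator by upsampling and re-blurring, and uses a crude smoothness regularizer in place of the paper's gradient prior.

```python
# Sketch: gradient descent on ||downsample(blur(x)) - y||^2 plus a simple
# smoothness regularizer. Gaussian blur width sigma is a placeholder for
# whatever blur model is assumed or estimated.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def sr_reconstruct(y, scale=2, sigma=1.2, lam=1e-3, steps=200, lr=1.0):
    x = zoom(y, scale, order=1)                      # bilinear initialization
    for _ in range(steps):
        sim = zoom(gaussian_filter(x, sigma), 1.0 / scale, order=1)
        resid = sim - y                              # data-term residual
        # approximate adjoint: upsample the residual, then re-blur
        grad = gaussian_filter(zoom(resid, scale, order=1), sigma)
        reg = x - gaussian_filter(x, 1.0)            # penalize local variation
        x -= lr * (grad + lam * reg)
    return x
```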
Deeply coupled auto-encoder networks for cross-view classification. arXiv preprint arXiv:1402.2031
, 2014
"... The comparison of heterogeneous samples extensively exists in many applications, especially in the task of im-age classification. In this paper, we propose a simple but effective coupled neural network, called Deeply Coupled Autoencoder Networks (DCAN), which seeks to build two deep neural networks, ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
The comparison of heterogeneous samples arises in many applications, especially in the task of image classification. In this paper, we propose a simple but effective coupled neural network, called Deeply Coupled Autoencoder Networks (DCAN), which builds two deep neural networks, coupled with each other at every corresponding layer. In DCAN, each deep structure is developed by stacking multiple discriminative coupled auto-encoders, i.e., denoising auto-encoders trained with a maximum margin criterion consisting of intra-class compactness and an inter-class penalty. This single-layer component makes our model simultaneously preserve local consistency and enhance its discriminative capability. With an increasing number of layers, the coupled networks gradually narrow the gap between the two views. Extensive experiments on cross-view image classification tasks demonstrate the superiority of our method over state-of-the-art methods.
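A one-layer sketch of the coupling idea, assuming PyTorch is available: two auto-encoders, one per view, plus a loss term pulling paired hidden codes together. The plain MSE coupling stands in for the paper's maximum-margin criterion, and denoising corruption is omitted.

```python
# Sketch of one coupled auto-encoder layer: per-view reconstruction losses
# plus a coupling term on paired hidden codes. MSE coupling is a stand-in
# for the paper's max-margin criterion; no denoising corruption here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledAE(nn.Module):
    def __init__(self, d_in, d_hid):
        super().__init__()
        self.enc_a = nn.Linear(d_in, d_hid)
        self.dec_a = nn.Linear(d_hid, d_in)
        self.enc_b = nn.Linear(d_in, d_hid)
        self.dec_b = nn.Linear(d_hid, d_in)

    def forward(self, xa, xb):
        ha = torch.relu(self.enc_a(xa))
        hb = torch.relu(self.enc_b(xb))
        return self.dec_a(ha), self.dec_b(hb), ha, hb

def coupled_loss(xa, xb, ra, rb, ha, hb, lam=1.0):
    # reconstruct each view; pull paired codes together across views
    return F.mse_loss(ra, xa) + F.mse_loss(rb, xb) + lam * F.mse_loss(ha, hb)
```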
Visualizing Object Detection Features
, 2013
"... We introduce algorithms to visualize feature spaces used by object detectors. The tools in this paper allow a human to put on ‘HOG goggles ’ and perceive the visual world as a HOG based object detector sees it. We found that these visualizations allow us to analyze object detection systems in new wa ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
We introduce algorithms to visualize feature spaces used by object detectors. The tools in this paper allow a human to put on ‘HOG goggles’ and perceive the visual world as a HOG-based object detector sees it. We found that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector’s failures. For example, when we visualized high-scoring false alarms, we discovered that, although they are clearly wrong in image space, they look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and indicates that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors. By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.