A MULTI-LEVEL MIXTURE-OF-EXPERTS FRAMEWORK FOR PEDESTRIAN CLASSIFICATION
, 2011
Cited by 16 (3 self)
Notwithstanding many years of progress, pedestrian recognition is still a difficult but important problem. We present a novel multi-level Mixture-of-Experts approach to combine information from multiple features and cues with the objective of improved pedestrian classification. On pose-level, shape cues based on Chamfer shape matching provide sample-dependent priors for a certain pedestrian view. On modality-level, we represent each data sample in terms of image intensity, (dense) depth and (dense) flow. On feature-level, we consider histograms of oriented gradients (HOG) and local binary patterns (LBP). Multilayer perceptrons (MLP) and linear support vector machines (linSVM) are used as expert classifiers. Experiments are performed on a unique real-world multimodality dataset captured from a moving vehicle in urban traffic. This dataset has been made public for research purposes. Our results show a significant performance boost of up to a factor of 42 in reduction of false positives at constant detection rates of our approach compared to a baseline intensity-only HOG/linSVM approach.
Robust multi-resolution pedestrian detection in traffic scenes
- In CVPR
, 2013
Cited by 15 (2 self)
The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution-aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution-aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles; therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).
S.M.: Poseshop: human image database construction and personalized content synthesis
- IEEE Trans. Vis. Comput. Graph
, 2013
Cited by 14 (9 self)
We present PoseShop, a pipeline to construct a segmented human image database with minimal manual intervention. By downloading, analyzing, and filtering massive amounts of human images from the Internet, we achieve a database which contains 400 thousand human figures that are segmented out of their background. The human figures are organized based on action semantics and clothes attributes, and indexed by the shape of their poses. They can be queried using either a silhouette sketch or a skeleton to find a given pose. We demonstrate applications for this database for multiframe personalized content synthesis in the form of comic strips, where the main character is the user or his/her friends. We address the two challenges of such synthesis, namely personalization and consistency over a set of frames, by introducing head swapping and clothes swapping techniques. We also demonstrate an action correlation analysis application to show the usefulness of the database for vision applications. Index Terms: Image database, image composition
MEVBench: A Mobile Computer Vision Benchmarking Suite
- In Proceedings of the IEEE International Symposium on Workload Characterization
, 2011
Cited by 13 (0 self)
The growth in mobile vision applications, coupled with the performance limitations of mobile platforms, has led to a growing need to understand computer vision applications. Computationally intensive mobile vision applications, such as augmented reality or object recognition, place significant performance and power demands on existing embedded platforms, often leading to degraded application quality. With a better understanding of this growing application space, it will be possible to more effectively optimize future embedded platforms. In this work, we introduce and evaluate a custom benchmark suite for mobile embedded vision applications named MEVBench. MEVBench provides a wide range of mobile vision applications such as face detection, feature classification, object tracking and feature extraction. To better understand mobile vision processing characteristics at the architectural level, we analyze single and multithread implementations of many algorithms to evaluate performance, scalability, and memory characteristics. We provide insights into the major areas where architecture can improve the performance of these applications in embedded systems.
... large growth in vision applications as mobile devices such as tablets and smartphones gain more capable imaging devices. This, coupled with the proliferation of smartphones and tablets, is leading to mobile computer vision becoming a key application domain in embedded computing.
Figure 1. Augmented Reality. The figure shows an example of augmented reality available on mobile platforms. The left image shows the original scene. In the right image a red cube frame has been rendered in proper perspective as though attached to the marker. Current mobile computing devices are capable of rendering detailed objects into the scene.
The benefits of dense stereo for pedestrian detection
- IEEE Transactions on Intelligent Transportation Systems
, 2011
Cited by 12 (2 self)
This paper presents a novel pedestrian detection system for intelligent vehicles. We propose the use of dense stereo for both the generation of regions of interest and pedestrian classification. Dense stereo allows the dynamic estimation of camera parameters and the road profile, which, in turn, provides strong scene constraints on possible pedestrian locations. For classification, we extract spatial features (gradient orientation histograms) directly from dense depth and intensity images. Both modalities are represented in terms of individual feature spaces, in which discriminative classifiers (linear support vector machines) are learned. We refrain from the construction of a joint feature space but instead employ a fusion of depth and intensity on the classifier level. Our experiments involve challenging image data captured in complex urban environments (i.e., undulating roads and speed bumps). Our results show a performance improvement by up to a factor of 7.5 at the classification level and up to a factor of 5 at the tracking level (reduction in false alarms at constant detection rates) over a system with static scene constraints and intensity-only classification. Index Terms: Active safety, computer vision, intelligent vehicles, pedestrian detection.
High-level fusion of depth and intensity for pedestrian classification
- In Proc. DAGM
, 2009
Cited by 12 (4 self)
This paper presents a novel approach to pedestrian classification which involves a high-level fusion of depth and intensity cues. Instead of utilizing depth information only in a pre-processing step, we propose to extract discriminative spatial features (gradient orientation histograms and local receptive fields) directly from (dense) depth and intensity images. Both modalities are represented in terms of individual feature spaces, in each of which a discriminative model is learned to distinguish between pedestrians and non-pedestrians. We refrain from the construction of a joint feature space, but instead employ a high-level fusion of depth and intensity at classifier-level. Our experiments on a large real-world dataset demonstrate a significant performance improvement of the combined intensity-depth representation over depth-only and intensity-only models (factor four reduction in false positives at comparable detection rates). Moreover, high-level fusion outperforms low-level fusion using a joint feature space approach.
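The classifier-level fusion described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy weights and features, and the equal-weight averaging rule are all assumptions. The abstract states only that one discriminative model is learned per modality and that their outputs are fused at classifier level.

```python
def linear_score(weights, bias, features):
    """Decision value of a linear classifier: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def fuse_classifier_level(depth_score, intensity_score,
                          w_depth=0.5, w_intensity=0.5):
    """Combine per-modality decision values (assumed rule: weighted sum)."""
    return w_depth * depth_score + w_intensity * intensity_score


# Toy, made-up models and features; each modality keeps its own feature space.
intensity_score = linear_score([0.5, -0.25], 0.1, [1.0, 2.0])  # intensity features
depth_score = linear_score([1.0], -1.0, [2.0])                 # depth features

fused = fuse_classifier_level(depth_score, intensity_score)
is_pedestrian = fused > 0.0  # hypothetical decision threshold at zero
```

A low-level (joint feature space) alternative would instead concatenate the depth and intensity features and train a single classifier on the combined vector.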
A New Benchmark for Stereo-Based Pedestrian Detection
Cited by 12 (2 self)
Pedestrian detection is a rapidly evolving area in the intelligent vehicles domain. Stereo vision is an attractive sensor for this purpose. But unlike for monocular vision, there are no realistic, large-scale benchmarks available for stereo-based pedestrian detection, to provide a common point of reference for evaluation. This paper introduces the Daimler Stereo-Vision Pedestrian Detection benchmark, which consists of several thousands of pedestrians in the training set, and a 27-min test drive through urban environment and associated vehicle data. The data, including ground truth, is made publicly available for non-commercial purposes. The paper furthermore quantifies the benefit of stereo vision for ROI generation and localization; at equal detection rates, false positives are reduced by a factor of 4-5 with stereo over mono, using the same HOG/linSVM classification component.
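Several abstracts in this listing report a reduction in false positives "by a factor of N at equal detection rates". One plausible way to compute such a figure is sketched below. The function name, the thresholding rule, and the toy scores are assumptions for illustration only and do not reproduce the benchmark's evaluation protocol.

```python
import math


def false_positives_at_detection_rate(scores, labels, detection_rate):
    """Count false positives at the highest threshold that still detects
    the requested fraction of positive samples (label 1)."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1),
                       reverse=True)
    if not positives:
        raise ValueError("need at least one positive sample")
    needed = max(1, math.ceil(detection_rate * len(positives)))
    threshold = positives[needed - 1]
    return sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)


# Made-up scores: 4 pedestrians (label 1) followed by 6 non-pedestrians (label 0).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
mono_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.65, 0.62, 0.61, 0.2, 0.1]
stereo_scores = [0.9, 0.8, 0.7, 0.4, 0.75, 0.3, 0.3, 0.2, 0.1, 0.1]

fp_mono = false_positives_at_detection_rate(mono_scores, labels, 0.75)
fp_stereo = false_positives_at_detection_rate(stereo_scores, labels, 0.75)
reduction_factor = fp_mono / fp_stereo  # assumes fp_stereo > 0
```

With these toy scores, both systems detect 3 of 4 pedestrians; the mono system admits 4 false positives and the stereo system 1, giving a reduction factor of 4.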
Error Analysis in a Stereo Vision-Based Pedestrian Detection Sensor for Collision Avoidance Applications
, 2010
"... sensors ..."
Ten years of pedestrian detection, what have we learned
- In ECCV Workshops
, 2014
Cited by 10 (1 self)
Paper-by-paper results make it easy to miss the forest for the trees. We analyse the remarkable progress of the last decade by discussing the main ideas explored in the 40+ detectors currently present in the Caltech pedestrian detection benchmark. We observe that there exist three families of approaches, all currently reaching similar detection quality. Based on our analysis, we study the complementarity of the most promising ideas by combining multiple published strategies. This new decision forest detector achieves the current best known performance
Dense Stereo-based ROI Generation for Pedestrian Detection
Cited by 8 (2 self)
This paper investigates the benefit of dense stereo for the ROI generation stage of a pedestrian detection system. Dense disparity maps allow an accurate estimation of the camera height, pitch angle and vertical road profile, which in turn enables a more precise specification of the areas on the ground where pedestrians are to be expected. An experimental comparison between sparse and dense stereo approaches is carried out on image data captured in complex urban environments (i.e. undulating roads, speed bumps). The ROI generation stage, based on dense stereo and specific camera and road parameter estimation, results in a detection performance improvement of a factor of five over the state-of-the-art based on ROI generation by sparse stereo. Interestingly, the added processing cost of computing dense disparity maps is at least partially amortized by the fewer ROIs that need to be processed at the system level.