Detecting pedestrians using patterns of motion and appearance (2003)

by P Viola, M Jones, D Snow

Results 1 - 10 of 575

Histograms of Oriented Gradients for Human Detection

by Navneet Dalal, Bill Triggs - In CVPR, 2005
Cited by 3735 (9 self)
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.

Citation Context

...ure sets for human detection, showing that locally normalized Histogram of Oriented Gradient (HOG) descriptors provide excellent performance relative to other existing feature sets including wavelets [17,22]. The proposed descriptors are reminiscent of edge orientation histograms [4,5], SIFT descriptors [12] and shape contexts [1], but they are computed on a dense grid of uniformly spaced cells and they ...

Object Tracking: A Survey

by Alper Yilmaz, Omar Javed, Mubarak Shah, 2006
Cited by 701 (7 self)
The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.

One-shot learning of object categories

by Li Fei-Fei, Rob Fergus, Pietro Perona - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 364 (20 self)
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.

Citation Context

...rning: How could we estimate models of categories from very few, one in the limit, training examples? Most researchers have focused on special-interest categories: human faces [34], [36], pedestrians [37], handwritten digits [24], and automobiles [34], [13]. Instead, we wish to develop techniques that apply equally well to any category that a human would readily recognize. With this objective in mind,...

Human detection using oriented histograms of flow and appearance

by Navneet Dalal, Bill Triggs, Cordelia Schmid - In ECCV, 2006
Cited by 283 (20 self)
Detecting humans in films and videos is a challenging problem owing to the motion of the subjects, the camera and the background and to variations in pose, appearance, clothing, illumination and background clutter. We develop a detector for standing and moving people in videos with possibly moving cameras and backgrounds, testing several different motion coding schemes and showing empirically that orientated histograms of differential optical flow give the best overall performance. These motion-based descriptors are combined with our Histogram of Oriented Gradient appearance descriptors. The resulting detector is tested on several databases including a challenging test set taken from feature films and containing wide ranges of pose, motion and background variations, including moving cameras and backgrounds. We validate our results on two challenging test sets containing more than 4400 human examples. The combined detector reduces the false alarm rate by a factor of 10 relative to the best appearance-based detector, for example giving false alarm rates of 1 per 20,000 windows tested at 8% miss rate on our Test Set 1.

Citation Context

...e camera and the background are essentially static. This greatly simplifies the problem because the mere presence of motion already provides a strong cue for human presence. For example, Viola et al. [23] find that including motion features markedly increases the overall performance of their system, but they assume a fixed surveillance camera viewing a largely static scene. In our case, we wanted a de...

Computer Vision: Algorithms and Applications

by Richard Szeliski, 2010
Cited by 252 (2 self)

Geometric context from a single image

by Derek Hoiem, Alexei A Efros, Martial Hebert - In Proc. Int. Conf. on Computer Vision, 2005
Cited by 250 (36 self)

Citation Context

...a global recognition gestalt. In contrast, most existing computer vision systems attempt to recognize objects using local information alone. For example, currently popular object detection algorithms [26, 32, 33] assume that all relevant information about an object is contained within a small window in the image plane (objects are found by exhaustively scanning over all locations and scales). Note that typica...

Recovering human body configurations: Combining segmentation and recognition

by Greg Mori, Xiaofeng Ren, Alexei A. Efros, Jitendra Malik - In CVPR, 2004
Cited by 215 (8 self)
The goal of this work is to take an image such as the one in Figure 1(a), detect a human figure, and localize his joints and limbs (b) along with their associated pixel masks (c). In this work we attempt to tackle this problem in a general setting. The dataset we use is a collection of sports news photographs of baseball players, varying dramatically in pose and clothing. The approach that we take is to use segmentation to guide our recognition algorithm to salient bits of the image. We use this segmentation approach to build limb and torso detectors, the outputs of which are assembled into human figures. We present quantitative results on torso localization, in addition to shortlisted full body configurations.

Citation Context

... large number of parameters in their models lead to difficult tracking problems in high dimensional spaces. More recent developments in pedestrian detection, such as Mohan et al. [7] and Viola et al. [18], are fairly successful in detecting people in common standing poses. However, these template-based window-scanning approaches do not localize joint positions, and it is not clear whether they generali...

Pictorial structures revisited: People detection and articulated pose estimation

by Mykhaylo Andriluka, Stefan Roth, Bernt Schiele - In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009
Cited by 211 (17 self)
Non-rigid object detection and articulated pose estimation are two related and challenging problems in computer vision. Numerous models have been proposed over the years and often address different special cases, such as pedestrian detection or upper body pose estimation in TV footage. This paper shows that such specialization may not be necessary, and proposes a generic approach based on the pictorial structures framework. We show that the right selection of components for both appearance and spatial modeling is crucial for general applicability and overall performance of the model. The appearance of body parts is modeled using densely sampled shape context descriptors and discriminatively trained AdaBoost classifiers. Furthermore, we interpret the normalized margin of each classifier as likelihood in a generative model. Non-Gaussian relationships between parts are represented as Gaussians in the coordinate system of the joint between parts. The marginal posterior of each part is inferred using belief propagation. We demonstrate that such a model is equally suitable for both detection and pose estimation tasks, outperforming the state of the art on three recently proposed datasets.

Citation Context

... model [4, 6, 15], which is a powerful and general, yet simple generative body model that allows for exact and efficient inference of the part constellations. We also build upon strong part detectors [1, 13, 24], which have shown to enable object and people detection in challenging scenes, but have not yet proven to enable state-of-the-art articulated pose estimation. While previous work has either focused o...

Pedestrian Detection: An Evaluation of the State of the Art

by Piotr Dollár, Christian Wojek, Bernt Schiele, Pietro Perona - Submission to IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 174 (10 self)
Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. However, multiple datasets and widely varying evaluation protocols are used, making direct comparisons difficult. To address these shortcomings, we perform an extensive evaluation of the state of the art in a unified framework. We make three primary contributions: (1) we put together a large, well-annotated and realistic monocular pedestrian detection dataset and study the statistics of the size, position and occlusion patterns of pedestrians in urban scenes, (2) we propose a refined per-frame evaluation methodology that allows us to carry out probing and informative comparisons, including measuring performance in relation to scale and occlusion, and (3) we evaluate the performance of sixteen pre-trained state-of-the-art detectors across six datasets. Our study allows us to assess the state of the art and provides a framework for gauging future efforts. Our experiments show that despite significant progress, performance still has much room for improvement. In particular, detection is disappointing at low resolutions and for partially occluded pedestrians.

Monocular Pedestrian Detection: Survey and Experiments

by Markus Enzweiler, Dariu M. Gavrila, 2008
Cited by 153 (13 self)
Pedestrian detection is a rapidly evolving area in computer vision with key applications in intelligent vehicles, surveillance and advanced robotics. The objective of this paper is to provide an overview of the current state of the art from both methodological and experimental perspective. The first part of the paper consists of a survey. We cover the main components of a pedestrian detection system and the underlying models. The second (and larger) part of the paper contains a corresponding experimental study. We consider a diverse set of state-of-the-art systems: wavelet-based AdaBoost cascade [74], HOG/linSVM [11], NN/LRF [75] and combined shape-texture detection [23]. Experiments are performed on an extensive dataset captured on-board a vehicle driving through urban environment. The dataset includes many thousands of training samples as well as a 27 minute test sequence involving more than 20000 images with annotated pedestrian locations. We consider a generic evaluation setting and one specific to pedestrian detection on-board a vehicle. Results indicate a clear advantage of HOG/linSVM at higher image resolutions and lower processing speeds, and a superiority of the wavelet-based AdaBoost cascade approach at lower image resolutions and (near) real-time processing speeds. The dataset (8.5GB) is made public for benchmarking purposes.