Results 1 - 10
of
150
Rapid object detection using a boosted cascade of simple features
- ACCEPTED CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2001
, 2001
"... This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "Inte ..."
Abstract
-
Cited by 1371 (6 self)
- Add to MetaCart
This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers[6]. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
A Model of Saliency-based Visual Attention for Rapid Scene Analysis
, 1998
"... A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing salie ..."
Abstract
-
Cited by 688 (50 self)
- Add to MetaCart
A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail. Index terms: Visual attention, scene analysis, feature extraction, target detection, visual search. \Pi I. Introduction Primates have a remarkable ability to interpret complex scenes in real time, despite the limited speed of the neuronal hardware available for such tasks. Intermediate and higher visual processes appear to select a subset of the available sensory information before further processing [1], most likely to reduce the complexity of scene analysis [2]. This selection appears to be implemented in the ...
Robust Real-time Object Detection
- International Journal of Computer Vision
, 2001
"... This paper describes a visual object detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image ” which allows the features ..."
Abstract
-
Cited by 570 (4 self)
- Add to MetaCart
This paper describes a visual object detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image ” which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features and yields extremely efficient classifiers [6]. The third contribution is a method for combining classifiers in a “cascade ” which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. A set of experiments in the domain of face detection are presented. The system yields face detection performace comparable to the best previous systems [18, 13, 16, 12, 1]. Implemented on a conventional desktop, face detection proceeds at 15 frames per second. 1.
Contextual Priming for Object Detection
- IJCV
, 2003
"... There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple framework f ..."
Abstract
-
Cited by 132 (16 self)
- Add to MetaCart
There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context driven focus of attention and automatic scale-selection on real-world scenes.
Spatiotemporal Sensitivity and Visual Attention for Efficient Rendering of Dynamic Environments
, 2001
"... INTRODUCTION Global illumination is the physically accurate calculation of lighting in an environment. It is computationally expensive for static environments and even more so for dynamic environments. Not only are many images required for an animation, but the calculation involved increases with th ..."
Abstract
-
Cited by 61 (1 self)
- Add to MetaCart
INTRODUCTION Global illumination is the physically accurate calculation of lighting in an environment. It is computationally expensive for static environments and even more so for dynamic environments. Not only are many images required for an animation, but the calculation involved increases with the presence of moving objects. In static environments, global illumination algorithms can precompute a lighting solution and reuse it whenever the viewpoint changes, but in dynamic environments, any moving object or light potentially affects the illumination of every other object in a scene. To guarantee accuracy, the algorithm has to recompute the entire lighting solution for each frame. This paper describes a perceptually-based technique that can dramatically reduce this computational load. The technique may also be used in image based rendering, geometry level of detail selection, realistic image synthesis, video telephony and video compression. Perceptually-based rendering operat
A principled approach to detecting surprising events in video
- in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR
, 2005
"... Primates demonstrate unparalleled ability at rapidly orienting towards important events in complex dynamic environments. During rapid guidance of attention and gaze towards potential objects of interest or threats, often there is no time for detailed visual analysis. Thus, heuristic computations are ..."
Abstract
-
Cited by 60 (5 self)
- Add to MetaCart
Primates demonstrate unparalleled ability at rapidly orienting towards important events in complex dynamic environments. During rapid guidance of attention and gaze towards potential objects of interest or threats, often there is no time for detailed visual analysis. Thus, heuristic computations are necessary to locate the most interesting events in quasi real-time. We present a new theory of sensory surprise, which provides a principled and computable shortcut to important information. We develop a model that computes instantaneous low-level surprise at every location in video streams. The algorithm significantly correlates with eye movements of two humans watching complex video clips, including television programs (17,936 frames, 2,152 saccadic gaze shifts). The system allows more sophisticated and time-consuming image analysis to be efficiently focused onto the most surprising subsets of the incoming data. 1.
Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search
- PSYCHOLOGICAL REVIEW
, 2006
"... Many experiments have shown that the human visual system makes extensive use of contextual information for facilitating object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an or ..."
Abstract
-
Cited by 58 (4 self)
- Add to MetaCart
Many experiments have shown that the human visual system makes extensive use of contextual information for facilitating object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an original approach of attentional guidance by global scene context. The model comprises 2 parallel pathways; one pathway computes local features (saliency) and the other computes global (scenecentered) features. The contextual guidance model of attention combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.
Modeling Global Scene Factors in Attention
- JOSA - A
, 2003
"... this paper a statistical framework for incorporating contextual information in the search task is proposed ..."
Abstract
-
Cited by 56 (6 self)
- Add to MetaCart
this paper a statistical framework for incorporating contextual information in the search task is proposed
Animat Vision: Active Vision in Artificial Animals
, 1997
"... this paper [4] and it is further developed in [5]. ..."
Abstract
-
Cited by 51 (8 self)
- Add to MetaCart
this paper [4] and it is further developed in [5].

