Building the gist of a scene: the role of global image features in recognition
Progress in Brain Research, 2006
Cited by 66 (4 self)
Humans can recognize the gist of a novel image in a single glance, independent of its complexity. How is this remarkable feat accomplished? Based on behavioral and computational evidence, this paper describes a formal approach to the representation and the mechanism of scene gist understanding, based on scene-centered, rather than object-centered, primitives. We show that the structure of a scene image can be estimated by the mean of global image features, providing a statistical summary of the spatial layout properties (Spatial Envelope representation) of the scene. Global features are based on configurations of spatial scales and are estimated without invoking segmentation or grouping operations. The scene-centered approach is not an alternative to local image analysis but would serve as a feed-forward and parallel pathway of visual processing, able to quickly constrain local feature analysis and enhance object recognition in cluttered natural scenes.
Top-Down Control of Visual Attention in Object Detection
2003
Cited by 60 (5 self)
Current computational models of visual attention focus on bottom-up information and ignore scene context. However, studies in visual cognition show that humans use context to facilitate object detection in natural scenes by directing their attention or eyes to diagnostic regions. Here we propose a model of attention guidance based on global scene configuration. We show that the statistics of low-level features across the scene image determine where a specific object (e.g. a person) should be located. Human eye movements show that regions chosen by the top-down model agree with regions scrutinized by human observers performing a visual search task for people. The results validate the proposition that top-down information from visual context modulates the saliency of image regions during the task of object detection. Contextual information provides a shortcut for efficient object detection systems.
Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search
PSYCHOLOGICAL REVIEW, 2006
Cited by 58 (4 self)
Many experiments have shown that the human visual system makes extensive use of contextual information for facilitating object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an original approach of attentional guidance by global scene context. The model comprises 2 parallel pathways: one pathway computes local features (saliency) and the other computes global (scene-centered) features. The contextual guidance model of attention combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.
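The two-pathway idea in this abstract can be illustrated with a minimal Python sketch. The abstract does not state how the pathways are combined, so the pointwise product, the exponent `gamma`, and the names `combine_maps`, `argmax_2d`, `saliency`, and `context_prior` are all assumptions made for illustration, not the authors' implementation.

```python
def combine_maps(saliency, context_prior, gamma=1.0):
    """Weight a local saliency map by a global context prior.

    Hypothetical rule: pointwise product, then normalisation to sum to 1.
    `gamma` (an assumed parameter) controls how much saliency contributes.
    """
    combined = [
        [(s ** gamma) * c for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(saliency, context_prior)
    ]
    total = sum(sum(row) for row in combined)
    if total == 0:
        return combined  # nothing to normalise
    return [[v / total for v in row] for row in combined]


def argmax_2d(grid):
    """Return the (row, column) of the largest value in a 2-D list."""
    positions = [(y, x) for y in range(len(grid)) for x in range(len(grid[0]))]
    return max(positions, key=lambda p: grid[p[0]][p[1]])


# Toy 2x2 maps: the top-left cell is the most salient locally, but the
# context prior favours the bottom-left cell.
saliency = [[0.9, 0.1], [0.5, 0.5]]
context_prior = [[0.1, 0.1], [0.7, 0.1]]
guidance = combine_maps(saliency, context_prior)
print(argmax_2d(guidance))
```

In this toy case the context prior overrules the most salient cell, which is the qualitative behaviour the abstract attributes to contextual guidance.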
Saliency detection: A spectral residual approach
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR07). IEEE Computer Society, 2007
Cited by 58 (1 self)
The ability of the human visual system to detect visual saliency is extraordinarily fast and reliable. However, computational modeling of this basic intelligent behavior still remains a challenge. This paper presents a simple method for visual saliency detection. Our model is independent of features, categories, or other forms of prior knowledge of the objects. By analyzing the log-spectrum of an input image, we extract the spectral residual of the image in the spectral domain, and propose a fast method to construct the corresponding saliency map in the spatial domain. We test this model on both natural pictures and artificial images such as psychological patterns. The results indicate fast and robust saliency detection by our method.
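The log-spectrum idea in this abstract can be sketched in a few lines of plain Python. The abstract gives no implementation details, so several choices here are assumptions: the residual is taken as the log-amplitude spectrum minus its 3x3 local average, the original phase is kept, the result is squared, and no final smoothing is applied. The helper names (`dft`, `dft2`, `box3`, `spectral_residual_saliency`) are hypothetical, and the naive DFT is only practical for tiny images.

```python
import cmath
import math


def dft(seq, inverse=False):
    """Naive 1-D discrete Fourier transform (O(n^2), for small inputs only)."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [
        sum(seq[k] * cmath.exp(sign * 2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft2(img, inverse=False):
    """2-D DFT computed separably: rows first, then columns."""
    rows = [dft(r, inverse) for r in img]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def box3(mat):
    """3x3 mean filter with wrap-around borders (assumed window size)."""
    h, w = len(mat), len(mat[0])
    return [
        [sum(mat[(y + dy) % h][(x + dx) % w]
             for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9
         for x in range(w)]
        for y in range(h)
    ]


def spectral_residual_saliency(img, eps=1e-12):
    """Saliency map from the residual of the log-amplitude spectrum."""
    spec = dft2(img)
    log_amp = [[math.log(abs(v) + eps) for v in row] for row in spec]
    phase = [[cmath.phase(v) for v in row] for row in spec]
    smooth = box3(log_amp)
    # Residual amplitude combined with the original phase.
    recon = [
        [cmath.exp((la - sm) + 1j * ph) for la, sm, ph in zip(lr, sr, pr)]
        for lr, sr, pr in zip(log_amp, smooth, phase)
    ]
    back = dft2(recon, inverse=True)
    return [[abs(v) ** 2 for v in row] for row in back]


# Toy input: a flat grey 8x8 image with one bright pixel at row 2, column 5.
image = [[0.1] * 8 for _ in range(8)]
image[2][5] = 1.0
saliency = spectral_residual_saliency(image)
peak = max(((y, x) for y in range(8) for x in range(8)),
           key=lambda p: saliency[p[0]][p[1]])
print(peak)
```

For real images, an FFT routine from a numerical library and a smoothing step on the output map would be the more practical choice; the pure-Python version is kept only so the sketch is self-contained.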
Manipulation in human environments
in Int’l Conf Humanoid Robots. IEEE, 2006
Cited by 35 (1 self)
Robots that work alongside us in our homes and workplaces could extend the time an elderly person can live at home, provide physical assistance to a worker on an assembly line, or help with household chores. In order to assist us in these ways, robots will need to successfully perform manipulation tasks within human environments. Human environments present special challenges for robot manipulation since they are complex, dynamic, uncontrolled, and difficult to perceive reliably. In this paper we present a behavior-based control system that enables a humanoid robot, Domo, to help a person place objects on a shelf. Domo is able to physically locate the shelf, socially cue a person to hand it an object, grasp the object that has been handed to it, transfer the object to the hand that is closest to the shelf, and place the object on the shelf. We use this behavior-based control system to illustrate three themes that characterize our approach to manipulation in human environments. The first theme, cooperative manipulation, refers to the advantages that can be gained by having the robot work with a person to cooperatively perform manipulation tasks. The second theme, task relevant features, emphasizes the benefits of carefully selecting the aspects of the world that are to be perceived and acted upon during a manipulation task. The third theme, let the body do the thinking, encompasses several ways in which a robot can use its body to simplify manipulation tasks.
[Fig. 1: The humanoid robot Domo used in this paper.]
Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes
Visual Cognition, 2005
Quantitative modelling of perceptual salience at human eye position
Visual Cognition, 2006
Cited by 14 (0 self)
We investigate the extent to which a simple model of bottom-up attention and salience may be embedded within a broader computational framework, and compared with human eye movement data. We focus on quantifying whether increased simulation realism significantly affects quantitative measures of how well the model may predict where in video clips humans direct their gaze. We hence compare three variants of the model, tested with 15 video clips of natural scenes shown to three observers. We measure model-predicted salience at the locations gazed at by the observers, compared to random locations. The first variant simply processes the raw video clips. The second adds a gaze-contingent foveation filter. The third further attempts to realistically simulate dynamic human vision by embedding the video frames within a larger background, and shifting them to eye position. Our main finding is that increasing simulation realism significantly improves the predictive ability of the model. Better emulating the details of how a visual stimulus is captured by a constantly rotating retina during active vision has a significant positive impact on quantitative comparisons between model and
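The comparison of salience at gazed locations against random locations can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say how random locations are drawn or which statistic is reported, so uniform sampling, the use of plain means, and the names `mean_salience` and `gaze_vs_random` are hypothetical.

```python
import random


def mean_salience(salience_map, points):
    """Average map value over a list of (row, column) points."""
    return sum(salience_map[y][x] for y, x in points) / len(points)


def gaze_vs_random(salience_map, gaze_points, n_random=200, seed=0):
    """Return (mean salience at gaze points, mean salience at random points).

    Random points are drawn uniformly over the map (an assumed baseline).
    """
    rng = random.Random(seed)
    h, w = len(salience_map), len(salience_map[0])
    random_points = [(rng.randrange(h), rng.randrange(w)) for _ in range(n_random)]
    return (mean_salience(salience_map, gaze_points),
            mean_salience(salience_map, random_points))


# Toy 10x10 map: low salience everywhere except at three gazed locations.
gaze = [(1, 2), (4, 7), (8, 3)]
toy_map = [[0.2] * 10 for _ in range(10)]
for y, x in gaze:
    toy_map[y][x] = 0.9

at_gaze, at_random = gaze_vs_random(toy_map, gaze)
print(at_gaze, at_random)
```

A model that predicts gaze well should give a clearly higher value at gazed locations than at random ones; comparing variants of a model then amounts to comparing how large that gap is.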
Visual causes versus correlates of attentional selection in dynamic scenes
2006
Cited by 13 (3 self)
What are the visual causes, rather than mere correlates, of attentional selection and how do they compare to each other during natural vision? To address these questions, we first strung together semantically unrelated dynamic scenes into MTV-style video clips, and performed eye tracking experiments with human observers. We then quantified predictions of saccade target selection based on seven bottom-up models, including intensity variance, orientation contrast, intensity contrast, color contrast, flicker contrast, motion contrast, and integrated saliency. On average, all tested models predicted saccade target selection well above chance. Dynamic models were particularly predictive of saccades that were most likely bottom-up driven, namely those initiated shortly after scene onsets, leading to maximal interobserver similarity. Static models showed mixed results in these circumstances, with intensity variance and orientation contrast featuring particularly weak prediction accuracy (lower than their own average, and approximately 4 times lower than dynamic models). These results indicate that dynamic visual cues play a dominant causal role in attracting attention. In comparison, some static visual cues play a weaker causal role, while other static cues are not causal at all, and may instead reflect top-down causes.
Modeling search for people in 900 scenes: A combined source model of eye guidance
Visual Cognition, 2009
Cited by 12 (2 self)
How predictable are human eye movements during search in real world scenes? We recorded 14 observers' eye movements as they performed a search task (person detection) in 912 outdoor scenes. Observers were highly consistent in the regions fixated during search, even when the target was absent from the scene. These eye movements were used to evaluate computational models of search guidance from three sources: saliency, target features, and scene context. Each of these models independently outperformed a cross-image control in predicting human fixations. Models that combined sources of guidance ultimately predicted 94% of human agreement, with the scene context component providing the most explanatory power. None of the models, however, could reach the precision and fidelity of an attentional map defined by human fixations. This work puts forth a benchmark for computational models of search in real world scenes. Further improvements in
Please address all correspondence to Aude Oliva, Department of Brain and Cognitive

