Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search (2006)

by A Torralba, A Oliva, M S Castelhano, J M Henderson
Venue: Psychological Review

Results 1 - 10 of 53

Bayesian surprise attracts human attention

by Laurent Itti, et al., 2009
Cited by 78 (5 self)
Abstract not found

Integrating Visual and Range Data for Robotic Object Detection

by Stephen Gould, Paul Baumstarck, Morgan Quigley
Cited by 13 (2 self)
The problem of object detection and recognition is a notoriously difficult one, and one that has been the focus of much work in the computer vision and robotics communities. Most work has concentrated on systems that operate purely on visual inputs (i.e., images) and largely ignores other sensor modalities. However, despite the great progress made down this track, the goal of high-accuracy object detection for robotic platforms in cluttered real-world environments remains elusive. Instead of relying on information from the image alone, we present a method that exploits the multiple sensor modalities available on a robotic platform. In particular, our method augments a 2-d object detector with 3-d information from a depth sensor to produce a “multi-modal object detector.” We demonstrate our method on a working robotic system and evaluate its performance on a number of common household/office objects.

Modeling search for people in 900 scenes: A combined source model of eye guidance

by Krista A. Ehinger, Barbara Hidalgo-Sotelo, Antonio Torralba, Aude Oliva - Visual Cognition, 2009
Cited by 12 (2 self)
How predictable are human eye movements during search in real-world scenes? We recorded 14 observers' eye movements as they performed a search task (person detection) in 912 outdoor scenes. Observers were highly consistent in the regions fixated during search, even when the target was absent from the scene. These eye movements were used to evaluate computational models of search guidance from three sources: saliency, target features, and scene context. Each of these models independently outperformed a cross-image control in predicting human fixations. Models that combined sources of guidance ultimately predicted 94% of human agreement, with the scene context component providing the most explanatory power. None of the models, however, could reach the precision and fidelity of an attentional map defined by human fixations. This work puts forth a benchmark for computational models of search in real-world scenes. Further improvements in ...

Static and Space-time Visual Saliency Detection by Self-Resemblance

by Hae Jong Seo, Peyman Milanfar
Cited by 12 (2 self)
We present a novel unified framework for both static and space-time saliency detection. Our method is a bottom-up approach and computes so-called local regression kernels (i.e., local descriptors) from the given image (or a video), which measure the likeness of a pixel (or voxel) to its surroundings. Visual saliency is then computed using the said “self-resemblance” measure. The framework results in a saliency map where each pixel (or voxel) indicates the statistical likelihood of saliency of a feature matrix given its surrounding feature matrices. As a similarity measure, matrix cosine similarity (a generalization of cosine similarity) is employed. State-of-the-art performance is demonstrated on commonly used human eye fixation data (static scenes [5] and dynamic scenes [16]) and some psychological patterns.
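
The matrix cosine similarity named in this abstract is commonly defined as the Frobenius inner product of two feature matrices divided by the product of their Frobenius norms. The following is a minimal sketch of that measure only, assuming plain nested lists as matrices; the function name and the example matrices are illustrative, and the sketch does not reproduce the local regression kernels or the step that turns similarities into a saliency value.

```python
import math


def matrix_cosine_similarity(a, b):
    """Frobenius inner product of a and b over the product of their Frobenius norms."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    # Sum of element-wise products, i.e. trace(a^T b).
    inner = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = math.sqrt(sum(x * x for row in a for x in row))
    norm_b = math.sqrt(sum(y * y for row in b for y in row))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("similarity is undefined for an all-zero matrix")
    return inner / (norm_a * norm_b)


# Hypothetical feature matrices, chosen only to exercise the function.
center = [[1.0, 2.0], [3.0, 4.0]]
scaled = [[2.0, 4.0], [6.0, 8.0]]        # same direction, different magnitude
opposite = [[-1.0, -2.0], [-3.0, -4.0]]  # opposite direction

same_direction = matrix_cosine_similarity(center, scaled)
opposite_direction = matrix_cosine_similarity(center, opposite)
```

Like the vector form, the measure ignores overall magnitude: a matrix and any positive multiple of it score 1, and a matrix and its negation score -1.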

SUN: Top-down saliency using natural statistics

by Christopher Kanan, Mathew H. Tong, Lingyun Zhang, Garrison W. Cottrell
Cited by 11 (1 self)
When people try to find particular objects in natural scenes they make extensive use of knowledge about how and where objects tend to appear in a scene. Although many forms of such “top-down” knowledge have been incorporated into saliency map models of visual search, surprisingly, the role of object appearance has been infrequently investigated. Here we present an appearance-based saliency model derived in a Bayesian framework. We compare our approach with both bottom-up saliency algorithms and the state-of-the-art Contextual Guidance model of Torralba et al. (2006) at predicting human fixations. Although the two top-down approaches use very different types of information, they achieve similar performance, each substantially better than the purely bottom-up models. Our experiments reveal that a simple model of object appearance can predict human fixations quite well, even making the same mistakes as people.

Predicting human gaze using low-level saliency . . .

by Moran Cerf, Wolfgang Einhäuser, Jonathan Harel, Christof Koch
Cited by 11 (0 self)
Abstract not found

Image saliency by isocentric curvedness and color

by Roberto Valenti, Nicu Sebe, Theo Gevers - In ICCV, 2009
Cited by 9 (4 self)
In this paper we propose a novel computational method to infer visual saliency in images. The method is based on the idea that salient objects should have local characteristics, such as edges, color, or shape, that differ from the rest of the scene. By using a novel operator, these characteristics are combined to infer global information. The obtained information is used as a weighting for the output of a segmentation algorithm so that the salient object in the scene can easily be distinguished from the background. The proposed approach is fast and does not require any learning. The experiments show that the system can enhance interesting objects in images and that it correctly locates the same object annotated by humans with an F-measure of 85.61% when the object size is known, and 79.19% when the object size is unknown, improving the state-of-the-art performance on a public dataset.
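
The abstract reports localization quality as an F-measure but does not say how it is computed. One common way to score a predicted object box against a human-annotated box is sketched below, under stated assumptions: boxes are axis-aligned `(x0, y0, x1, y1)` tuples, precision and recall are area ratios of the overlap, and `beta=1.0` gives the balanced harmonic mean. The function names, the box format, and the choice of `beta` are all hypothetical and may differ from the paper's evaluation protocol.

```python
def box_area(box):
    """Area of an axis-aligned box (x0, y0, x1, y1); zero if the box is degenerate."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def overlap_area(a, b):
    """Area of the intersection of two axis-aligned boxes."""
    return box_area((max(a[0], b[0]), max(a[1], b[1]),
                     min(a[2], b[2]), min(a[3], b[3])))


def f_measure(predicted, annotated, beta=1.0):
    """F-measure of a predicted box against an annotated box.

    precision = overlap / predicted area, recall = overlap / annotated area.
    """
    inter = overlap_area(predicted, annotated)
    if inter == 0:
        return 0.0  # no overlap: both precision and recall are zero
    precision = inter / box_area(predicted)
    recall = inter / box_area(annotated)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical boxes: the prediction covers the left half of a shifted annotation.
half_overlap = f_measure((0, 0, 10, 10), (5, 0, 15, 10))
exact_match = f_measure((0, 0, 10, 10), (0, 0, 10, 10))
```

A prediction that is too large lowers precision, and one that is too small lowers recall, so the score rewards boxes that match the annotation in both position and size.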

Visual Saliency Model for Robot Cameras

by Nicholas J. Butko, Lingyun Zhang, Garrison W. Cottrell, Javier R. Movellan
Cited by 8 (2 self)
Recent years have seen an explosion of research on the computational modeling of human visual attention in task-free conditions, i.e., given an image, predict where humans are likely to look. This area of research could potentially provide general-purpose mechanisms for robots to orient their cameras in open-ended conditions. One difficulty is that most current models of visual saliency are computationally very expensive and not suited to the real-time implementations needed for robotic applications. Here we propose a very fast approximation to a Bayesian model of visual saliency recently proposed in the literature. The approximation can run in real time on current computers at very little computational cost, leaving plenty of CPU cycles for other tasks. We empirically evaluate the potential usefulness of the visual saliency model to control saccades of a camera in social robotics situations. We found that this simple general-purpose saliency model doubled the success rate of the camera: it captured images of people 70% of the time, compared to a 35% success rate when the camera was controlled using an open-loop scheme. After 3 saccades (camera movements), the robot was 96% likely to capture at least one person. The results suggest that visual saliency models may provide a useful front end for camera control in robotics applications.

Fixations on Low-Resolution Images

by Tilke Judd, Frédo Durand, Antonio Torralba
Cited by 5 (1 self)
When an observer looks at an image, his eyes fixate on a few select points. Fixations from different observers are often consistent: observers tend to look at the same locations. We investigate how image resolution affects fixation locations and consistency across humans through an eye tracking experiment. We showed 168 natural images and 25 pink noise images at different resolutions to 64 observers. Each image was shown at eight resolutions (height between 4-512 pixels) and upsampled to 860x1024 pixels for display. The total amount of visual information available ranged from 1/8 to 16 cycles per degree respectively. We measure how well one observer's fixations predict another observer's fixations on the same image at different resolutions using the area under the receiver operating characteristic (ROC) curves as a metric. We found that: 1) Fixations from lower-resolution images can predict fixations on higher-resolution images. 2) Human fixations are biased towards the center for all resolutions and this bias is stronger at lower resolutions. 3) Human fixations become more consistent as resolution increases until around 16-64px (1/2 to 2 cycles per degree) after which consistency remains relatively constant despite the spread of fixations away from the center. 4) Fixation consistency depends on image complexity.
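
The abstract's metric, the area under the ROC curve for one observer's fixations predicting another's, can be illustrated with a small sketch. It assumes a coarse grid of cells, a box-shaped neighborhood in place of whatever smoothing the study used, and every non-fixated cell as a negative; the grid size, `radius`, function names, and fixation coordinates are all hypothetical. The AUC is computed as the probability that a cell fixated by the second observer receives a higher score than a cell that observer did not fixate.

```python
def fixation_map(fixations, height, width, radius=1):
    """Score map: each (row, col) fixation adds 1 to every cell within `radius`."""
    grid = [[0.0] * width for _ in range(height)]
    for row, col in fixations:
        for r in range(max(0, row - radius), min(height, row + radius + 1)):
            for c in range(max(0, col - radius), min(width, col + radius + 1)):
                grid[r][c] += 1.0
    return grid


def roc_auc(score_map, target_fixations):
    """Area under the ROC curve, computed as a rank statistic (ties count half)."""
    targets = set(target_fixations)
    positives = [score_map[r][c] for r, c in targets]
    negatives = [score_map[r][c]
                 for r in range(len(score_map))
                 for c in range(len(score_map[0]))
                 if (r, c) not in targets]
    if not positives or not negatives:
        raise ValueError("need at least one fixated and one non-fixated cell")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical fixations on a 6x6 grid.
observer_a = [(1, 1), (4, 4)]
observer_b = [(1, 2), (4, 3)]   # close to observer A's fixations
observer_c = [(0, 5), (5, 0)]   # far from observer A's fixations

map_a = fixation_map(observer_a, height=6, width=6)
auc_similar = roc_auc(map_a, observer_b)
auc_different = roc_auc(map_a, observer_c)
```

An AUC of 0.5 corresponds to chance, so observers who look at similar locations score above 0.5 against each other's maps, and observers who look elsewhere score below it.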

Nonparametric Bottom-Up Saliency Detection by Self-Resemblance

by Hae Jong Seo, Peyman Milanfar
Cited by 4 (3 self)
We present a novel bottom-up saliency detection algorithm. Our method computes so-called local regression kernels (i.e., local features) from the given image, which measure the likeness of a pixel to its surroundings. Visual saliency is then computed using the said “self-resemblance” measure. The framework results in a saliency map where each pixel indicates the statistical likelihood of saliency of a feature matrix given its surrounding feature matrices. As a similarity measure, matrix cosine similarity (a generalization of cosine similarity) is employed. State-of-the-art performance is demonstrated on commonly used human eye fixation data [3] and some psychological patterns.