Results 1 - 10 of 63
Learning to Predict Where Humans Look
"... For many applications in graphics, design, and human computer interaction, it is essential to understand where humans look in a scene. Where eye tracking devices are not a viable option, models of saliency can be used to predict fixation locations. Most saliency approaches are based on bottom-up com ..."
Cited by 211 (4 self)
For many applications in graphics, design, and human-computer interaction, it is essential to understand where humans look in a scene. Where eye tracking devices are not a viable option, models of saliency can be used to predict fixation locations. Most saliency approaches are based on bottom-up computation that does not consider top-down image semantics and often does not match actual eye movements. To address this problem, we collected eye tracking data of 15 viewers on 1003 images and used this database as training and testing examples to learn a model of saliency based on low-, middle-, and high-level image features. This large database of eye tracking data is publicly available with this paper.
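As a rough illustration of the learning setup sketched in this abstract, the snippet below trains a linear classifier on per-pixel feature vectors labeled by fixations. The feature maps, balanced sampling scheme, and LinearSVC choice are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: learn a fixation predictor from per-pixel features and gaze labels
# (hypothetical feature set; not the paper's actual features or training setup).
import numpy as np
from sklearn.svm import LinearSVC

def train_saliency_model(feature_maps, fixation_maps, n_samples=10_000, seed=0):
    """feature_maps: list of (H, W, D) arrays, one per training image.
    fixation_maps: list of (H, W) binary arrays (1 = fixated location)."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for feats, fix in zip(feature_maps, fixation_maps):
        H, W, D = feats.shape
        flat_feats = feats.reshape(-1, D)
        flat_fix = fix.reshape(-1)
        # Sample an equal number of positive (fixated) and negative pixels.
        pos = np.flatnonzero(flat_fix > 0)
        neg = np.flatnonzero(flat_fix == 0)
        pos = rng.choice(pos, size=min(len(pos), n_samples), replace=False)
        neg = rng.choice(neg, size=len(pos), replace=False)
        idx = np.concatenate([pos, neg])
        X.append(flat_feats[idx])
        y.append(flat_fix[idx] > 0)
    return LinearSVC(C=1.0).fit(np.vstack(X), np.concatenate(y))

def predict_saliency(model, feats):
    """Return a (H, W) saliency map from a trained model and feature stack."""
    H, W, D = feats.shape
    return model.decision_function(feats.reshape(-1, D)).reshape(H, W)
```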
SUN: A Bayesian framework for saliency using natural statistics
Journal of Vision, 2008
"... We propose a definition of saliency by considering what the visual system is trying to optimize when directing attention. The resulting model is a Bayesian framework from which bottom-up saliency emerges naturally as the self-information of visual features, and overall saliency (incorporating top-do ..."
Cited by 143 (4 self)
We propose a definition of saliency by considering what the visual system is trying to optimize when directing attention. The resulting model is a Bayesian framework from which bottom-up saliency emerges naturally as the self-information of visual features, and overall saliency (incorporating top-down information with bottom-up saliency) emerges as the pointwise mutual information between the features and the target when searching for a target. An implementation of our framework demonstrates that our model’s bottom-up saliency maps perform as well as or better than existing algorithms in predicting people’s fixations in free viewing. Unlike existing saliency measures, which depend on the statistics of the particular image being viewed, our measure of saliency is derived from natural image statistics, obtained in advance from a collection of natural images. For this reason, we call our model SUN (Saliency Using Natural statistics). A measure of saliency based on natural image statistics, rather than based on a single test image, provides a straightforward explanation for many search asymmetries observed in humans; the statistics of a single test image lead to predictions that are not consistent with these asymmetries. In our model, saliency is computed locally, which is consistent with the neuroanatomy of the early visual system and results in an efficient algorithm with few free parameters.
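A minimal sketch of the bottom-up part of this idea follows: saliency as the self-information of feature responses under statistics gathered from natural images in advance. Per-dimension histograms and an independence assumption stand in for the paper's actual density model; feature extraction is assumed given.

```python
# Sketch: bottom-up saliency as self-information, -log p(F), with p estimated
# from a corpus of natural images ahead of time (crude histogram density model).
import numpy as np

def fit_feature_histograms(natural_feats, n_bins=64):
    """natural_feats: (N, D) feature responses pooled over many natural images."""
    hists = []
    for d in range(natural_feats.shape[1]):
        counts, edges = np.histogram(natural_feats[:, d], bins=n_bins, density=True)
        hists.append((counts + 1e-8, edges))
    return hists

def self_information_saliency(feats, hists):
    """feats: (H, W, D) feature responses of the test image.
    Sums -log p over feature dimensions, assuming independence."""
    H, W, D = feats.shape
    sal = np.zeros((H, W))
    for d in range(D):
        counts, edges = hists[d]
        idx = np.clip(np.digitize(feats[..., d], edges) - 1, 0, len(counts) - 1)
        sal -= np.log(counts[idx])
    return sal
```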
Static and Space-time Visual Saliency Detection by Self-Resemblance
"... We present a novel unified framework for both static and space-time saliency detection. Our method is a bottom-up approach and computes so-called local regression kernels (i.e., local descriptors) from the given image (or a video), which measure the likeness of a pixel (or voxel) to its surroundings ..."
Cited by 70 (5 self)
We present a novel unified framework for both static and space-time saliency detection. Our method is a bottom-up approach and computes so-called local regression kernels (i.e., local descriptors) from the given image (or video), which measure the likeness of a pixel (or voxel) to its surroundings. Visual saliency is then computed using this “self-resemblance” measure. The framework results in a saliency map where each pixel (or voxel) indicates the statistical likelihood of saliency of a feature matrix given its surrounding feature matrices. As a similarity measure, matrix cosine similarity (a generalization of cosine similarity) is employed. State-of-the-art performance is demonstrated on commonly used human eye fixation data (static scenes [5] and dynamic scenes [16]) and some psychological patterns.
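For concreteness, a small sketch of matrix cosine similarity and a self-resemblance style score is given below; the local regression kernel descriptors and the bandwidth value are assumed inputs, not reproduced from the paper.

```python
# Sketch: matrix cosine similarity and a self-resemblance style saliency score.
import numpy as np

def matrix_cosine_similarity(A, B):
    """Frobenius inner product of two feature matrices, normalized by their
    Frobenius norms (a matrix generalization of cosine similarity)."""
    return np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B) + 1e-12)

def self_resemblance_saliency(center, surround, sigma=0.07):
    """center: (n, m) feature matrix at a pixel; surround: list of feature
    matrices from neighboring locations; sigma is an illustrative bandwidth.
    Saliency is high when the center is dissimilar to its surroundings."""
    sims = np.array([matrix_cosine_similarity(center, F) for F in surround])
    # Kernel-style aggregation of similarities; low likeness => high saliency.
    likeness = np.sum(np.exp((sims - 1.0) / (sigma ** 2)))
    return 1.0 / likeness
```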
Saliency estimation using a non-parametric low-level vision model
CVPR
"... Many successful models for predicting attention in a scene involve three main steps: convolution with a set of filters, a center-surround mechanism and spatial pooling to construct a saliency map. However, integrating spatial information and justifying the choice of various parame-ter values remain ..."
Cited by 28 (0 self)
Many successful models for predicting attention in a scene involve three main steps: convolution with a set of filters, a center-surround mechanism and spatial pooling to construct a saliency map. However, integrating spatial information and justifying the choice of various parameter values remain open problems. In this paper we show that an efficient model of color appearance in human vision, which contains a principled selection of parameters as well as an innate spatial pooling mechanism, can be generalized to obtain a saliency model that outperforms state-of-the-art models. Scale integration is achieved by an inverse wavelet transform over the set of scale-weighted center-surround responses. The scale-weighting function (termed ECSF) has been optimized to better replicate psychophysical data on color appearance, and the appropriate sizes of the center-surround inhibition windows have been adjusted by training a Gaussian Mixture Model on eye-fixation data, thus avoiding ad-hoc parameter selection. Additionally, we conclude that the extension of a color appearance model to saliency estimation adds to the evidence for a common low-level visual front-end for different visual tasks.
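The following is only a schematic reading of the scale-weighted center-surround plus inverse wavelet transform step; the Gaussian scale weighting below is an illustrative stand-in for the fitted ECSF, and real opponent-color channels are assumed to be supplied by the caller.

```python
# Sketch: weight wavelet detail bands by a scale function, then reconstruct
# (hypothetical weighting; not the paper's fitted ECSF or color model).
import numpy as np
import pywt

def wavelet_saliency(channel, wavelet="db4", levels=5):
    coeffs = pywt.wavedec2(channel, wavelet, level=levels)
    weighted = [coeffs[0] * 0.0]  # drop the approximation band
    for i, (cH, cV, cD) in enumerate(coeffs[1:], start=1):
        scale = levels - i + 1
        # Placeholder scale weighting, peaked at mid scales, in place of ECSF.
        w = np.exp(-((scale - levels / 2.0) ** 2) / 2.0)
        weighted.append((cH * w, cV * w, cD * w))
    recon = pywt.waverec2(weighted, wavelet)
    return np.abs(recon[: channel.shape[0], : channel.shape[1]])
```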
Exploiting Local and Global Patch Rarities for Saliency Detection
"... We introduce a saliency model based on two key ideas. The first one is considering local and global image patch rarities as two complementary processes. The second one is based on our observation that for different images, one of the RGB and Lab color spaces outperforms the other in saliency detecti ..."
Cited by 26 (3 self)
We introduce a saliency model based on two key ideas. The first is considering local and global image patch rarities as two complementary processes. The second is based on our observation that, for different images, one of the RGB and Lab color spaces outperforms the other in saliency detection. We propose a framework that measures patch rarities in each color space and combines them in a final map. For each color channel, the input image is first partitioned into non-overlapping patches, and each patch is then represented by a vector of coefficients that linearly reconstruct it from a learned dictionary of patches from natural scenes. Next, two measures of saliency (local and global) are calculated and fused to indicate the saliency of each patch. Local saliency is the distinctiveness of a patch from its surrounding patches. Global saliency is the inverse of a patch’s probability of occurrence over the entire image. The final saliency map is built by normalizing and fusing the local and global saliency maps of all channels from both color systems. Extensive evaluation over four benchmark eye-tracking datasets shows the significant advantage of our approach over 10 state-of-the-art saliency models.
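A toy sketch of the local/global patch rarity idea follows; the learned sparse-coding dictionary and the per-channel fusion are not reproduced, and `codes` is assumed to be precomputed patch coefficients laid out on a grid.

```python
# Sketch: local rarity (distance to spatial neighbors) and global rarity
# (low probability of occurrence over the whole image) from patch descriptors.
import numpy as np
from scipy.spatial.distance import cdist

def patch_rarity(codes, rows, cols, local_radius=2):
    """codes: (rows*cols, k) patch coefficient vectors in row-major grid order."""
    dists = cdist(codes, codes)                       # pairwise patch distances
    sim = np.exp(-dists)
    # Global rarity: low average similarity to all other patches in the image.
    global_rarity = -np.log(sim.mean(axis=1) + 1e-12)
    # Local rarity: average distance to spatially neighboring patches.
    local_rarity = np.zeros(len(codes))
    grid = np.arange(len(codes)).reshape(rows, cols)
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - local_radius), min(rows, r + local_radius + 1)
            c0, c1 = max(0, c - local_radius), min(cols, c + local_radius + 1)
            nbrs = grid[r0:r1, c0:c1].ravel()
            nbrs = nbrs[nbrs != grid[r, c]]
            local_rarity[grid[r, c]] = dists[grid[r, c], nbrs].mean()
    # Fuse after normalizing each map to [0, 1].
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-12)
    return norm(local_rarity) + norm(global_rarity)
```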
Visual Saliency Model for Robot Cameras
"... Abstract — Recent years have seen an explosion of research on the computational modeling of human visual attention in task free conditions, i.e., given an image predict where humans are likely to look. This area of research could potentially provide general purpose mechanisms for robots to orient th ..."
Cited by 24 (4 self)
Recent years have seen an explosion of research on the computational modeling of human visual attention in task-free conditions, i.e., given an image, predict where humans are likely to look. This area of research could potentially provide general-purpose mechanisms for robots to orient their cameras in open-ended conditions. One difficulty is that most current models of visual saliency are computationally very expensive and not suited to the real-time implementations needed for robotic applications. Here we propose a very fast approximation to a Bayesian model of visual saliency recently proposed in the literature. The approximation can run in real time on current computers at very little computational cost, leaving plenty of CPU cycles for other tasks. We empirically evaluate the potential usefulness of the visual saliency model to control saccades of a camera in social robotics situations. We found that this simple general-purpose saliency model doubled the success rate of the camera: it captured images of people 70% of the time, compared to a 35% success rate when the camera was controlled using an open-loop scheme. After 3 saccades (camera movements), the robot was 96% likely to capture at least one person. The results suggest that visual saliency models may provide a useful front end for camera control in robotics applications.
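As a minimal illustration of saliency-driven camera control in this spirit, the sketch below saccades to the most salient location and applies a simple inhibition-of-return mask before the next saccade; the fast saliency map itself is assumed to come from elsewhere.

```python
# Sketch: pick the next camera saccade target from a saliency map, then
# suppress that region so subsequent saccades explore new locations.
import numpy as np

def next_saccade(saliency, inhibited, radius=20):
    """saliency, inhibited: (H, W) float arrays; returns (row, col) target
    and updates `inhibited` in place."""
    masked = saliency * (1.0 - inhibited)
    r, c = np.unravel_index(np.argmax(masked), masked.shape)
    yy, xx = np.ogrid[: saliency.shape[0], : saliency.shape[1]]
    inhibited[(yy - r) ** 2 + (xx - c) ** 2 <= radius ** 2] = 1.0
    return r, c
```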
I-POMDP: An infomax model of eye movement
In Proceedings of the International Conference on Development and Learning, 2008
"... Abstract—Modeling eye-movements during search is important for building intelligent robotic vision systems, and for understanding how humans select relevant information and structure behavior in real time. Previous models of visual search (VS) rely on the idea of “saliency maps ” which indicate like ..."
Cited by 23 (6 self)
Modeling eye movements during search is important for building intelligent robotic vision systems, and for understanding how humans select relevant information and structure behavior in real time. Previous models of visual search (VS) rely on the idea of “saliency maps” which indicate likely locations for targets of interest. In these models the eyes move to locations with maximum saliency. This approach has several drawbacks: (1) It assumes that oculomotor control is a greedy process, i.e., every eye movement is planned as if no further eye movements would be possible after it. (2) It does not account for temporal dynamics and how information is integrated over time. (3) It does not provide a formal basis to understand how optimal search should vary as a function of the operating characteristics of the visual system. To address these limitations, we reformulate the problem of VS as an Information-gathering Partially Observable Markov Decision Process (I-POMDP). We find that the optimal control law depends heavily on the Foveal-Peripheral Operating Characteristic (FPOC) of the visual system.
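A toy one-step infomax controller in this spirit is sketched below: a belief over candidate target locations is updated from noisy observations, and the next fixation maximizes expected entropy reduction. This is a greedy illustration, not the full I-POMDP policy; the hit and false-alarm rates are placeholders for a real FPOC.

```python
# Sketch: greedy infomax fixation selection over a belief about target location.
import numpy as np

def entropy(b):
    b = b[b > 0]
    return -np.sum(b * np.log(b))

def posterior(b, fix, seen, p_hit=0.9, p_fa=0.1):
    """Bayes update of belief b (normalized float array) after fixating cell
    `fix` and either detecting (seen=True) or not detecting the target."""
    like = np.full_like(b, p_fa if seen else 1 - p_fa)
    like[fix] = p_hit if seen else 1 - p_hit
    post = b * like
    return post / post.sum()

def infomax_fixation(b, p_hit=0.9, p_fa=0.1):
    """Choose the fixation with the largest expected entropy reduction."""
    gains = []
    for fix in range(len(b)):
        p_seen = b[fix] * p_hit + (1 - b[fix]) * p_fa
        exp_H = (p_seen * entropy(posterior(b, fix, True, p_hit, p_fa))
                 + (1 - p_seen) * entropy(posterior(b, fix, False, p_hit, p_fa)))
        gains.append(entropy(b) - exp_H)
    return int(np.argmax(gains))
```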
Boosting bottom-up and top-down visual features for saliency estimation
In CVPR, 2012
"... Abstract Despite significant recent progress, the best available visual saliency models still lag behind human performance in predicting eye fixations in free-viewing of natural scenes. Majority of models are based on low-level visual features and the importance of top-down factors has not yet been ..."
Cited by 21 (3 self)
Despite significant recent progress, the best available visual saliency models still lag behind human performance in predicting eye fixations in free viewing of natural scenes. The majority of models are based on low-level visual features, and the importance of top-down factors has not yet been fully explored or modeled. Here, we combine low-level features such as orientation, color, and intensity, as well as the saliency maps of the best previous bottom-up models, with top-down cognitive visual features (e.g., faces, humans, cars, etc.) and learn a direct mapping from those features to eye fixations using Regression, SVM, and AdaBoost classifiers. Through extensive experiments over three benchmark eye-tracking datasets using three popular evaluation scores, we show that our boosting model outperforms 27 state-of-the-art models and is so far the closest model to the accuracy of the human model for fixation prediction. Furthermore, our model successfully detects the most salient object in a scene without sophisticated image processing such as region segmentation.
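As a hedged sketch of the "learn a direct mapping from stacked feature maps to fixations" step, the snippet below trains a boosted classifier over concatenated channels; the actual features, object detectors, and training protocol of the paper are assumed to exist elsewhere.

```python
# Sketch: boosted classifier from stacked bottom-up and top-down feature maps
# (e.g., color, intensity, orientation, prior saliency maps, face/person maps)
# to fixation labels.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_boosted_saliency(stacked_maps, fixations):
    """stacked_maps: (N, H, W, D) feature channels per image.
    fixations: (N, H, W) binary fixation maps."""
    X = stacked_maps.reshape(-1, stacked_maps.shape[-1])
    y = fixations.reshape(-1) > 0
    return AdaBoostClassifier(n_estimators=100).fit(X, y)

def boosted_saliency_map(clf, feats):
    """feats: (H, W, D) channels for one image; returns a probability map."""
    H, W, D = feats.shape
    return clf.predict_proba(feats.reshape(-1, D))[:, 1].reshape(H, W)
```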
A simple method for detecting salient regions
Pattern Recognition, 2009
"... A simple method for detecting salient regions in images is proposed. It requires only edge detection, threshold decomposition, the distance transform, and thresholding. Moreover, it avoids the need for setting any parameter values. Experiments show that the resulting regions are relatively coarse, b ..."
Cited by 19 (2 self)
A simple method for detecting salient regions in images is proposed. It requires only edge detection, threshold decomposition, the distance transform, and thresholding. Moreover, it avoids the need for setting any parameter values. Experiments show that the resulting regions are relatively coarse, but overall the method is surprisingly effective and has the benefit of easy implementation. Quantitative tests were carried out on Liu et al.’s dataset of 5000 images. Although the ratings of our simple method were not as good as those of their approach, which involved an extensive training stage, they were comparable to several other popular methods from the literature. Further tests on Kootstra and Schomaker’s dataset of 99 images also showed promising results.
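One plausible reading of the pipeline (edge detection, threshold decomposition, distance transform, thresholding) is sketched below; the Sobel magnitude, number of decomposition levels, and the final mean-based threshold are simplifications standing in for the parameter-free procedure described.

```python
# Sketch: salient regions via edge magnitude, threshold decomposition, summed
# distance transforms, and a final threshold.
import numpy as np
from scipy import ndimage

def simple_salient_regions(gray, n_levels=8):
    """gray: 2-D float image in [0, 1]; returns a binary salient-region mask."""
    gx = ndimage.sobel(gray, axis=1)
    gy = ndimage.sobel(gray, axis=0)
    mag = np.hypot(gx, gy)
    acc = np.zeros_like(gray)
    # Threshold decomposition: one binary edge map per level of the edge
    # magnitude, each followed by a distance transform to the nearest edge.
    for t in np.linspace(mag.min(), mag.max(), n_levels + 2)[1:-1]:
        edges = mag >= t
        acc += ndimage.distance_transform_edt(~edges)
    sal = acc.max() - acc          # close to many strong edges => salient
    return sal > sal.mean()        # simplistic stand-in for their thresholding
```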
How to find interesting locations in video: a spatiotemporal interest point detector learned from human eye movements
In Proceedings of the 29th Annual Symposium of the German Association for Pattern Recognition (DAGM 2007), 2007
"... Abstract. Interest point detection in still images is a well-studied topic in computer vision. In the spatiotemporal domain, however, it is still unclear which features indicate useful interest points. In this paper we approach the problem by learning a detector from examples: we record eye movement ..."
Cited by 19 (0 self)
Interest point detection in still images is a well-studied topic in computer vision. In the spatiotemporal domain, however, it is still unclear which features indicate useful interest points. In this paper we approach the problem by learning a detector from examples: we record eye movements of human subjects watching video sequences and train a neural network to predict which locations are likely to become eye movement targets. We show that our detector outperforms current spatiotemporal interest point architectures on a standard classification dataset.
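A minimal sketch of the learn-from-gaze idea: spatiotemporal descriptors labeled by whether observers fixated them are fed to a small neural network. Descriptor extraction and the network architecture are illustrative assumptions, not the paper's configuration.

```python
# Sketch: train a small network to score spatiotemporal locations by how
# likely they are to attract an eye movement.
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_gaze_detector(descriptors, fixated):
    """descriptors: (N, D) spatiotemporal feature vectors sampled from videos.
    fixated: (N,) booleans, True where a descriptor coincided with a fixation."""
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
    return net.fit(descriptors, fixated)

def interest_score(net, descriptors):
    # Probability that each location would become an eye movement target.
    return net.predict_proba(descriptors)[:, 1]
```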