Results 1 - 10
of
207
A Model of Saliency-based Visual Attention for Rapid Scene Analysis
, 1998
"... A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing salie ..."
Abstract
-
Cited by 688 (50 self)
- Add to MetaCart
A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail. Index terms: Visual attention, scene analysis, feature extraction, target detection, visual search. \Pi I. Introduction Primates have a remarkable ability to interpret complex scenes in real time, despite the limited speed of the neuronal hardware available for such tasks. Intermediate and higher visual processes appear to select a subset of the available sensory information before further processing [1], most likely to reduce the complexity of scene analysis [2]. This selection appears to be implemented in the ...
Contextual Priming for Object Detection
- IJCV
, 2003
"... There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple framework f ..."
Abstract
-
Cited by 132 (16 self)
- Add to MetaCart
There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context driven focus of attention and automatic scale-selection on real-world scenes.
A Context-Dependent Attention System for a Social Robot
, 1999
"... This paper presents part of an on-going project to integrate perception, attention, drives, emotions, behavior arbitration, and expressive acts for a robot designed to interact socially with humans. We present the design of a visual attention system based on a model of human visual search beha ..."
Abstract
-
Cited by 116 (18 self)
- Add to MetaCart
This paper presents part of an on-going project to integrate perception, attention, drives, emotions, behavior arbitration, and expressive acts for a robot designed to interact socially with humans. We present the design of a visual attention system based on a model of human visual search behavior from Wolfe (1994). The attention system integrates perceptions (motion detection, color saliency, and face popouts) with habituation e#ects and influences from the robot's motivational and behavioral state to create a context-dependent attention activation map. This activation map is used to direct eye movements and to satiate the drives of the motivational system.
Large Datasets at a Glance: Combining Textures and Colors in Scientific Visualization
- IEEE Transactions on Visualization and Computer Graphics
, 1999
"... This paper presents a new method for using texture and color to visualize multivariate data elements arranged on an underlying height field. We combine simple texture patterns with perceptually uniform colors to increase the number of attribute values we can display simultaneously. Our technique bui ..."
Abstract
-
Cited by 83 (20 self)
- Add to MetaCart
This paper presents a new method for using texture and color to visualize multivariate data elements arranged on an underlying height field. We combine simple texture patterns with perceptually uniform colors to increase the number of attribute values we can display simultaneously. Our technique builds multicolored perceptual texture elements (or pexels) to represent each data element. Attribute values encoded in an element are used to vary the appearance of its pexel. Texture and color patterns that form when the pexels are displayed can be used to rapidly and accurately explore the dataset. Our pexels are built by varying three separate texture dimensions: height, density, and regularity. Results from computer graphics, computer vision, and human visual psychophysics have identified these dimensions as important for the formation of perceptual texture patterns. The pexels are colored using a selection technique that controls color distance, linear separation, and color category. Prop...
Theory of Mind for a Humanoid Robot
- AUTONOMOUS ROBOTS
, 2002
"... If we are to build human-like robots that can interact naturally with people, our robots must know not only about the properties of objects but also the properties of animate agents in the world. One of the fundamental social skills for humans is the attribution of beliefs, goals, and desires to o ..."
Abstract
-
Cited by 82 (3 self)
- Add to MetaCart
If we are to build human-like robots that can interact naturally with people, our robots must know not only about the properties of objects but also the properties of animate agents in the world. One of the fundamental social skills for humans is the attribution of beliefs, goals, and desires to other people. This set of skills has often been called a “theory of mind.” This paper presents the theories of Leslie (1994) and Baron-Cohen (1995) on the development of theory of mind in human children and discusses the potential application of both of these theories to building robots with similar capabilities. Initial implementation details and basic skills (such as finding faces and eyes and distinguishing animate from inanimate stimuli) are introduced. I further speculate on the usefulness of a robotic implementation in evaluating and comparing these two models.
A Comparison of Feature Combination Strategies for Saliency-Based Visual Attention Systems
- Journal of Electronic Imaging
, 1999
"... Bottom-up or saliency-based visual attention allows primates to detect non-specific conspicuous targets in cluttered scenes. A classical metaphor, derived from electrophysiological and psychophysical studies, describes attention as a rapidly shiftable "spotlight". The model described here reproduces ..."
Abstract
-
Cited by 81 (15 self)
- Add to MetaCart
Bottom-up or saliency-based visual attention allows primates to detect non-specific conspicuous targets in cluttered scenes. A classical metaphor, derived from electrophysiological and psychophysical studies, describes attention as a rapidly shiftable "spotlight". The model described here reproduces the attentional scanpaths of this spotlight: Simple multi-scale "feature maps" detect local spatial discontinuities in intensity, color, orientation or optical flow, and are combined into a unique "master" or "saliency" map. The saliency map is sequentially scanned, in order of decreasing saliency, by the focus of attention. We study the problem of combining feature maps, from different visual modalities and with unrelated dynamic ranges (such as color and motion), into a unique saliency map. Four combination strategies are compared using three databases of natural color images: (1) Simple normalized summation, (2) linear combination with learned weights, (3) global non-linear normalization...
How to build robots that make friends and influence people
- Proc. IROS99, Kyonjiu, Korea
, 1999
"... In order to interact socially with a human, a robot must convey intentionality, that is, the human must believe that the robot has beliefs, desires, and intentions. We have constructed a robot which exploits natural human social tendencies to convey intentionality through motor actions and facial ex ..."
Abstract
-
Cited by 63 (7 self)
- Add to MetaCart
In order to interact socially with a human, a robot must convey intentionality, that is, the human must believe that the robot has beliefs, desires, and intentions. We have constructed a robot which exploits natural human social tendencies to convey intentionality through motor actions and facial expressions. We present results on the integration of perception, attention, motivation, behavior, and motor systems which allow the robot to engage in infant-like interactions with a human caregiver. 1
Top-Down Control of Visual Attention in Object Detection
, 2003
"... Current computational models of visual attention focus on bottom-up information and ignore scene context. However, studies in visual cognition show that humans use context to facilitate object detection in natural scenes by directing their attention or eyes to diagnostic regions. Here we propose a m ..."
Abstract
-
Cited by 60 (5 self)
- Add to MetaCart
Current computational models of visual attention focus on bottom-up information and ignore scene context. However, studies in visual cognition show that humans use context to facilitate object detection in natural scenes by directing their attention or eyes to diagnostic regions. Here we propose a model of attention guidance based on global scene configuration. We show that the statistics of low-level features across the scene image determine where a specific object (e.g. a person) should be located. Human eye movements show that regions chosen by the top-down model agree with regions scrutinized by human observers performing a visual search task for people. The results validate the proposition that top-down information from visual context modulates the saliency of image regions during the task of object detection. Contextual information provides a shortcut for efficient object detection systems.
Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search
- PSYCHOLOGICAL REVIEW
, 2006
"... Many experiments have shown that the human visual system makes extensive use of contextual information for facilitating object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an or ..."
Abstract
-
Cited by 58 (4 self)
- Add to MetaCart
Many experiments have shown that the human visual system makes extensive use of contextual information for facilitating object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an original approach of attentional guidance by global scene context. The model comprises 2 parallel pathways; one pathway computes local features (saliency) and the other computes global (scenecentered) features. The contextual guidance model of attention combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.
Automatic Thumbnail Cropping and its Effectiveness
, 2003
"... Thumbnail images provide users of image retrieval and browsing systems with a method for quickly scanning large numbers of images. Recognizing the objects in an image is important in many retrieval tasks, but thumbnails generated by shrinking the original image often render objects illegible. We stu ..."
Abstract
-
Cited by 56 (6 self)
- Add to MetaCart
Thumbnail images provide users of image retrieval and browsing systems with a method for quickly scanning large numbers of images. Recognizing the objects in an image is important in many retrieval tasks, but thumbnails generated by shrinking the original image often render objects illegible. We study the ability of computer vision systems to detect key components of images so that intelligent cropping, prior to shrinking, can render objects more recognizable. We evaluate automatic cropping techniques l) based on a method that detects salient portions of general images, and 2) based on automatic face detection. Our user study shows that these methods result in small thumbnails that are substantially more recognizable and easier to find in the context of visual search.

