Results 1 - 10 of 25
80 million tiny images: a large dataset for non-parametric object and scene recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence
Building the gist of a scene: the role of global image features in recognition
Progress in Brain Research, 2006
Cited by 66 (4 self)
Humans can recognize the gist of a novel image in a single glance, independent of its complexity. How is this remarkable feat accomplished? Based on behavioral and computational evidence, this paper describes a formal approach to the representation and the mechanism of scene gist understanding, based on scene-centered, rather than object-centered, primitives. We show that the structure of a scene image can be estimated by means of global image features, providing a statistical summary of the spatial layout properties (Spatial Envelope representation) of the scene. Global features are based on configurations of spatial scales and are estimated without invoking segmentation or grouping operations. The scene-centered approach is not an alternative to local image analysis but would serve as a feed-forward and parallel pathway of visual processing, able to quickly constrain local feature analysis and enhance object recognition in cluttered natural scenes.
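The idea of a global descriptor built from configurations of spatial scales, computed without any segmentation, can be illustrated with a toy sketch. This is not the Spatial Envelope implementation (which relies on oriented band-pass filter energy); as a simplified stand-in, it averages gradient energy over a coarse grid at several scales. All names (`global_descriptor`, `grid`, `scales`) are hypothetical, and images are assumed to be grayscale lists of lists large enough that every grid cell is non-empty at the coarsest scale.

```python
from typing import List

Image = List[List[float]]


def downsample(img: Image) -> Image:
    """Halve the resolution by averaging 2x2 blocks (moves to a coarser spatial scale)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def gradient_energy(img: Image) -> Image:
    """Squared horizontal plus vertical finite differences; last row/column stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            dx = img[y][x + 1] - img[y][x]
            dy = img[y + 1][x] - img[y][x]
            out[y][x] = dx * dx + dy * dy
    return out


def grid_means(energy: Image, grid: int) -> List[float]:
    """Average energy inside each cell of a grid x grid partition (coarse spatial layout)."""
    h, w = len(energy), len(energy[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [energy[y][x]
                    for y in range(gy * h // grid, (gy + 1) * h // grid)
                    for x in range(gx * w // grid, (gx + 1) * w // grid)]
            feats.append(sum(cell) / len(cell))
    return feats


def global_descriptor(img: Image, grid: int = 4, scales: int = 2) -> List[float]:
    """Concatenate grid-averaged energy from fine to coarse scales: no segmentation involved."""
    feats: List[float] = []
    for _ in range(scales):
        feats.extend(grid_means(gradient_energy(img), grid))
        img = downsample(img)
    return feats
```

On a 16x16 image of alternating vertical stripes, the top-left fine-scale cell reports an energy of 1.0 while every coarse-scale cell reports 0.0, because 2x2 averaging erases the stripes: the descriptor records at which spatial scale the structure lives, not which objects are present.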
Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search
Psychological Review, 2006
Cited by 58 (4 self)
Many experiments have shown that the human visual system makes extensive use of contextual information for facilitating object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an original approach to attentional guidance by global scene context. The model comprises two parallel pathways: one computes local features (saliency) and the other computes global (scene-centered) features. The contextual guidance model of attention combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.
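A minimal sketch of the two-pathway combination, assuming both pathways have already produced maps over the same grid of image locations: the local saliency map is raised to an exponent `gamma` (a hypothetical parameter here) and multiplied by a context prior describing where the target tends to appear given the scene's global features, then normalized into a distribution over fixation locations. This follows a multiplicative saliency-times-context form in spirit only; the published model's exact terms and its top-down target-feature component are omitted, and all function names are illustrative.

```python
from typing import List, Tuple

Map = List[List[float]]


def contextual_guidance(saliency: Map, context_prior: Map, gamma: float = 0.5) -> Map:
    """Combine local saliency with a global-context prior over locations.

    saliency[y][x]: bottom-up saliency from local features (assumed >= 0).
    context_prior[y][x]: assumed probability of the target at (y, x) given the scene gist.
    gamma < 1 flattens the saliency term so that context can dominate.
    """
    combined = [[(s ** gamma) * c for s, c in zip(srow, crow)]
                for srow, crow in zip(saliency, context_prior)]
    total = sum(sum(row) for row in combined)
    if total == 0:
        raise ValueError("saliency and context prior have no overlapping support")
    # Normalize so the map sums to 1 (a distribution over candidate fixations).
    return [[v / total for v in row] for row in combined]


def most_likely_fixation(guidance: Map) -> Tuple[int, int]:
    """Return the (row, column) with the highest combined score."""
    cells = ((y, x) for y in range(len(guidance)) for x in range(len(guidance[0])))
    return max(cells, key=lambda p: guidance[p[0]][p[1]])
```

With the strongest saliency peak in the top-right corner but a context prior that only allows the target in the bottom row (for instance, searching for pedestrians in a street scene), the predicted fixation moves to the bottom row: context suppresses salient but implausible locations.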
From appearance to context-based recognition: Dense labeling in small images
2008
Cited by 14 (5 self)
Traditionally, object recognition is performed based solely on the appearance of the object. However, relevant information also exists in the scene surrounding the object. As supported by our human studies, this contextual information is necessary for accurate recognition in low-resolution images. This scenario, with impoverished appearance information, provides a more appropriate venue for studying the role of context in recognition than images of higher resolution. In this paper, we explore the role of context for dense scene labeling in small images. Given a segmentation of an image, our algorithm assigns each segment to an object category based on the segment's appearance and contextual information. We explicitly model context between object categories through the use of relative location and relative scale, in addition to co-occurrence. We perform recognition tests on low- and high-resolution images, which vary significantly in the amount of appearance information present, using just the object appearance information, the combination of appearance and context, and just context without object appearance information (blind recognition). We also perform these tests in human studies and analyze our findings to reveal interesting patterns. With the use of our context model, our algorithm achieves state-of-the-art performance on the MSRC and Corel datasets.
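To make the role of co-occurrence concrete, here is a hedged sketch of context-based relabeling: each segment starts with its best appearance label, and labels are then revised so that pairwise co-occurrence probabilities with the other segments' current labels also count. The paper additionally models relative location and relative scale, which this sketch leaves out; the function name, the `weight` parameter, the iteration cap and the 1e-6 floor for unseen label pairs are all assumptions.

```python
import math
from typing import Dict, FrozenSet, List


def relabel_with_context(appearance: List[Dict[str, float]],
                         cooccur: Dict[FrozenSet[str], float],
                         weight: float = 1.0,
                         iterations: int = 5) -> List[str]:
    """Greedy relabeling: log appearance score plus weighted log co-occurrence.

    appearance[i]: appearance-only probabilities for segment i (assumed > 0).
    cooccur: probability that two labels appear in the same image, keyed by label pair.
    """
    # Start from the appearance-only decision for every segment.
    labels = [max(scores, key=scores.get) for scores in appearance]
    for _ in range(iterations):
        changed = False
        for i, scores in enumerate(appearance):
            def total(label: str) -> float:
                # Context term: how well this label co-occurs with the other segments' labels.
                context = sum(math.log(cooccur.get(frozenset((label, labels[j])), 1e-6))
                              for j in range(len(labels)) if j != i)
                return math.log(scores[label]) + weight * context
            best = max(scores, key=total)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # converged: no segment wants to switch
            break
    return labels
```

In a toy scene, a segment whose appearance slightly favors "car" (0.55 vs. 0.45 for "boat") is relabeled "boat" once its neighbours are confidently "water" and "sky", because "boat" co-occurs with "water" far more often than "car" does.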
Task and context determine where you look
Cited by 11 (2 self)
The deployment of human gaze has been almost exclusively studied independent of any specific ongoing task and limited to two-dimensional picture viewing. This contrasts with its use in everyday life, which mostly consists of purposeful tasks where gaze is crucially involved. To better understand the deployment of gaze under such circumstances, we devised a series of experiments in which subjects navigated along a walkway in a virtual environment and executed combinations of approach and avoidance tasks. The position of the body and the gaze were monitored during the execution of the task combinations, and the dependence of gaze on the ongoing tasks, as well as on the visual features of the scene, was analyzed. Gaze distributions were compared to a random gaze allocation strategy as well as to a specific “saliency model.” Gaze distributions showed high similarity across subjects. Moreover, the precise fixation locations on the objects depended on the ongoing task to the point that the specific tasks could be predicted from the subject's fixation data. By contrast, gaze allocation according to a random or a saliency model did not predict the executed fixations or the observed dependence of fixation locations on the specific task.
What do we perceive in a glance of a real-world scene
Journal of Vision
Cited by 10 (1 self)
What do we see when we glance at a natural scene, and how does it change as the glance becomes longer? We asked naive subjects to report in a free-form format what they saw when looking at briefly presented real-life photographs. Our subjects received no specific information as to the content of each stimulus. Thus, our paradigm differs from previous studies where subjects were cued before a picture was presented and/or were probed with multiple-choice questions. In the first stage, 90 novel grayscale photographs were foveally shown to a group of 22 native-English-speaking subjects. The presentation time was chosen at random from a set of seven possible times (from 27 to 500 ms). A perceptual mask followed each photograph immediately. After each presentation, subjects reported what they had just seen as completely and truthfully as possible. In the second stage, another group of naive individuals was instructed to score each of the descriptions produced by the subjects in the first stage. Individual scores were assigned to more than a hundred different attributes. We show that within a single glance, much object- and scene-level information is perceived by human subjects. The richness of our perception, though, seems asymmetrical. Subjects tend to have a propensity toward perceiving natural scenes as being outdoor rather than indoor. The reporting of sensory- or feature-level information of a scene (such as shading and shape) consistently precedes the reporting of the semantic-level information. But once subjects recognize more semantic-level ...
How long to get to the "gist" of real-world natural scenes
Visual Cognition, 2005
Cited by 7 (0 self)
This study aimed at assessing the processing time of a natural scene in a fast categorization task of its context or “gist”. In Experiment 1, human subjects performed 4 go/no-go categorization tasks in succession with colour pictures of real-world scenes belonging to 2 natural categories, “sea” and “mountain”, and 2 artificial categories, “indoor” and “urban”. Experiment 2 used colour and grey-level scenes in the same tasks to assess the role of colour cues in performance. Pictures were flashed for 26 ms. Both experiments showed that the gist of real-world scenes can be extracted with high accuracy (>90%), short median RT (400–460 ms) and early responses triggered with latencies as short as 260–300 ms. Natural scenes were processed faster than artificial scenes. Categories for which colour could have a diagnostic value were processed faster in colour than in grey. Finally, processing speed is compared for scene and object categorization tasks. Natural scenes are more than a simple collection of objects. However, much of the research on scene processing has been devoted to the understanding of object processing in scenes, leaving aside the question of how we process the whole ...
ARTSCENE: A neural system for natural scene classification
Journal of ..., 2003
Cited by 6 (2 self)
How do humans rapidly recognize a scene? How can neural models capture this biological competence to achieve state-of-the-art scene classification? The ARTSCENE neural system classifies natural scene photographs by using multiple spatial scales to efficiently accumulate evidence for gist and texture. ARTSCENE embodies a coarse-to-fine Texture Size Ranking Principle whereby spatial attention processes multiple scales of scenic information, from global gist to local textures, to learn and recognize scenic properties. The model can incrementally learn and rapidly predict scene identity from gist information alone, and then accumulate learned evidence from scenic textures to refine this hypothesis. The model shows how texture-fitting allocations of spatial attention, called attentional shrouds, can facilitate scene recognition, particularly when they include a border of adjacent textures. Using grid gist plus three shroud textures on a benchmark photograph dataset, ARTSCENE discriminates 4 landscape scene categories (coast, forest, mountain and countryside) with up to 91.85% correct on a test set, outperforms alternative models in the literature that use biologically implausible computations, and outperforms component systems that use either gist or texture information alone.
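The coarse-to-fine idea (predict scene identity from gist first, then refine the hypothesis with texture evidence) can be sketched as simple evidence accumulation. This is not the ARTSCENE network, which is built on Adaptive Resonance Theory learning and attentional shrouds; the additive vote scheme, the per-category evidence dictionaries and all names here are assumptions for illustration.

```python
from typing import Dict, List


def coarse_to_fine_classify(gist_evidence: Dict[str, float],
                            texture_evidence: List[Dict[str, float]]) -> List[str]:
    """Return the running scene hypothesis after gist alone, then after each texture.

    gist_evidence: assumed per-category evidence from the global (gist) pass.
    texture_evidence: assumed per-category evidence from successive texture regions,
    ordered from coarse to fine.
    """
    totals = dict(gist_evidence)
    # Rapid first guess from gist information alone.
    hypotheses = [max(totals, key=totals.get)]
    for texture in texture_evidence:
        # Accumulate evidence; a later texture can overturn the gist-based guess.
        for category, votes in texture.items():
            totals[category] = totals.get(category, 0.0) + votes
        hypotheses.append(max(totals, key=totals.get))
    return hypotheses
```

For example, a gist pass that mildly favours "coast" (0.6 vs. 0.4 for "mountain") yields an initial "coast" hypothesis, which switches to "mountain" once a texture region adds 0.3 of mountain evidence.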
The contributions of color to recognition memory for natural scenes
Journal of Experimental Psychology: Learning, Memory and Cognition, 2002
Cited by 5 (0 self)
The authors used a recognition memory paradigm to assess the influence of color information on visual memory for images of natural scenes. Subjects performed 5%–10% better for colored than for black-and-white images, independent of exposure duration. Experiment 2 indicated little influence of contrast once the images were suprathreshold, and Experiment 3 revealed that performance worsened when images were presented in color and tested in black and white, or vice versa, leading to the conclusion that the surface property color is part of the memory representation. Experiments 4 and 5 exclude the possibility that the superior recognition memory for colored images results solely from attentional factors or saliency. Finally, the recognition memory advantage disappears for falsely colored images of natural scenes: The improvement in recognition memory depends on the color congruence of presented images with learned knowledge about the color gamut found within natural scenes. The results can be accounted for within a multiple memory systems framework.
Holistic context modeling using semantic co-occurrences
In Proceedings of IEEE CVPR, 2009
Cited by 5 (4 self)
We present a simple framework to model contextual relationships between visual concepts. The new framework combines ideas from previous object-centric methods (which model contextual relationships between objects in an image, such as their co-occurrence patterns) and scene-centric methods (which learn a holistic context model from the entire image, known as its “gist”). This is accomplished without demarcating individual concepts or regions in the image. First, using the output of a generic appearance-based concept detection system, a semantic space is formulated, where each axis represents a semantic feature. Next, context models are learned for each of the concepts in the semantic space, using mixtures of Dirichlet distributions. Finally, an image is represented as a vector of posterior concept probabilities under these contextual concept models. It is shown that these posterior probabilities are remarkably noise-free and provide an effective model of the contextual relationships between semantic concepts in natural images. This is further demonstrated through an experimental evaluation on two vision tasks, namely scene classification and image annotation, on benchmark datasets. The results show that, besides being quite simple to compute, the proposed context models outperform state-of-the-art systems in both tasks.
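The final step, representing an image by its posterior concept probabilities under per-concept context models, can be sketched with the standard Dirichlet log-density. For brevity each concept is modelled by a single Dirichlet rather than a mixture of Dirichlets, the concept prior is assumed uniform, and the semantic vector (the appearance-based detector's concept probabilities) must be strictly positive; the concept names, axes and parameter values used below are hypothetical.

```python
import math
from typing import Dict, List


def dirichlet_logpdf(x: List[float], alpha: List[float]) -> float:
    """Log density of a point x on the probability simplex under Dirichlet(alpha).

    Requires every x_i > 0 and every alpha_i > 0.
    """
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x)))


def contextual_posteriors(semantic_vector: List[float],
                          context_models: Dict[str, List[float]]) -> Dict[str, float]:
    """Posterior over concepts, assuming a uniform prior and one Dirichlet per concept."""
    logs = {c: dirichlet_logpdf(semantic_vector, alpha) for c, alpha in context_models.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logs.values())
    unnormalized = {c: math.exp(v - m) for c, v in logs.items()}
    z = sum(unnormalized.values())
    return {c: u / z for c, u in unnormalized.items()}
```

With semantic axes (sky, water, building), a hypothetical "beach" model with alpha = [4, 5, 1] and a "street" model with alpha = [2, 1, 7], the vector [0.4, 0.5, 0.1] is assigned almost entirely to "beach". The resulting dictionary of posteriors is the kind of contextual feature vector that would then feed a scene classifier or annotator.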

