Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
International Journal of Computer Vision, 2001
Cited by 351 (41 self)
In this paper, we propose a computational model of the recognition of real-world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low-dimensional representation of the scene, which we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
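The abstract says the perceptual dimensions can be estimated from spectral, coarsely localized information. A minimal sketch of that idea, assuming a grayscale image given as a square list of lists whose side is divisible by the grid size: split the image into blocks, measure how much spectral energy in each block comes from horizontal versus vertical intensity variation, and read out a dimension as a weighted sum of those block features. The helper names, the two-orientation binning, and the linear read-out weights are illustrative assumptions, not the paper's actual filters or learned templates.

```python
import cmath
import math


def signed_freq(k, n):
    """Map a DFT index k in [0, n) to a signed frequency in (-n/2, n/2]."""
    return k if k <= n // 2 else k - n


def oriented_energy(block):
    """Return (horizontal, vertical) spectral energy of a square block.

    'horizontal' is energy from intensity changes along rows (vertical
    edges); 'vertical' is energy from changes down columns (horizontal
    edges). Uses a naive O(n^4) 2-D DFT, skips the DC term, and ignores
    exactly diagonal frequencies.
    """
    n = len(block)
    horiz = vert = 0.0
    for u in range(n):          # frequency index down the columns
        for v in range(n):      # frequency index along the rows
            if u == 0 and v == 0:
                continue
            coeff = sum(block[x][y] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
                        for x in range(n) for y in range(n))
            power = abs(coeff) ** 2
            fu, fv = abs(signed_freq(u, n)), abs(signed_freq(v, n))
            if fv > fu:
                horiz += power
            elif fu > fv:
                vert += power
    return horiz, vert


def spatial_envelope_features(image, grid=2):
    """Coarsely localized features: for each grid block (row-major order),
    the share of oriented energy due to horizontal intensity variation.
    Flat blocks with no oriented energy get a neutral 0.5."""
    size = len(image) // grid
    feats = []
    for bi in range(grid):
        for bj in range(grid):
            block = [row[bj * size:(bj + 1) * size]
                     for row in image[bi * size:(bi + 1) * size]]
            h, v = oriented_energy(block)
            total = h + v
            feats.append(h / total if total > 1e-12 else 0.5)
    return feats


def estimate_dimension(feats, weights, bias=0.0):
    """Hypothetical linear read-out of one perceptual dimension
    (e.g. openness); the weights would have to be learned from ratings."""
    return bias + sum(w * f for w, f in zip(weights, feats))
```

The naive DFT keeps the sketch dependency-free; a practical version would use an FFT and finer orientation and scale bins.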
A Conceptual Framework for Indexing Visual Information at Multiple Levels
In Proceedings of SPIE Internet Imaging, 2000
Cited by 32 (10 self)
In this paper, we present a conceptual framework for indexing different aspects of visual information. Our framework unifies concepts from the literature in diverse fields such as cognitive psychology, library sciences, art, and the more recent content-based retrieval. We present multiple-level structures for visual and non-visual information. The ten-level visual structure presented provides a systematic way of indexing images based on syntax (e.g., color, texture) and semantics (e.g., objects, events), and includes distinctions between general concept and visual concept. We define different types of relations (e.g., syntactic, semantic) at different levels of the visual structure, and also use a semantic information table to summarize important aspects related to an image. While the focus is on the development of a conceptual indexing structure, our aim is also to bring together the knowledge from various fields, unifying the issues that should be considered when building ...
A semantic typicality measure for natural scene categorization
Pattern Recognition Symposium, DAGM, 2004
Cited by 27 (0 self)
We propose an approach to categorize real-world natural scenes based on a semantic typicality measure. The proposed typicality measure allows grading the similarity of an image with respect to a scene category. We argue that such a graded decision is appropriate and justified both from a human’s perspective and from the image-content point of view. The method combines bottom-up information of local semantic concepts with the typical semantic content ...
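The abstract is cut off before it says how the graded decision is computed. One common way to obtain a graded score, assumed here rather than taken from the paper, is to compare an image's vector of local semantic-concept frequencies with a prototype per category and turn the distances into normalized scores. `prototype`, `typicality`, and the `scale` parameter are illustrative names.

```python
import math


def prototype(vectors):
    """Category prototype: element-wise mean of its members' concept vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def typicality(vec, prototypes, scale=1.0):
    """Graded membership of one image in every category (an assumption):
    a softmax over negative squared distances to the category prototypes."""
    scores = {c: math.exp(-sq_dist(vec, p) / scale) for c, p in prototypes.items()}
    total = sum(scores.values())
    if total == 0.0:  # every exp underflowed: fall back to uniform grades
        return {c: 1.0 / len(scores) for c in scores}
    return {c: s / total for c, s in scores.items()}
```

Unlike a hard nearest-prototype decision, the output keeps a grade for every category, so an image halfway between two prototypes receives roughly equal scores for both.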
Natural Scene Retrieval Based on a Semantic Modeling Step
In CIVR, 2004
Cited by 23 (1 self)
In this paper, we present an approach for the retrieval of natural scenes based on a semantic modeling step. Semantic modeling stands for the classification of local image regions into semantic classes such as grass, rocks, or foliage, and the subsequent summary of this information in so-called concept-occurrence vectors. Using this semantic representation, images from the scene categories coasts, rivers/lakes, forests, plains, mountains, and sky/clouds are retrieved. We compare two implementations of the method quantitatively on a visually diverse database of natural scenes. In addition, the semantic modeling approach is compared to retrieval based on low-level features computed directly on the image. The experiments show that semantic modeling does in fact lead to better retrieval performance.
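The two stages described above can be sketched as follows, assuming the local-region classifier has already labeled each cell of a regular grid with a concept name. The concept list is illustrative (the abstract names only grass, rocks, and foliage), and ranking by L1 distance is an assumption, since the abstract does not name the retrieval distance.

```python
from collections import Counter

# Illustrative concept vocabulary; not necessarily the paper's exact list.
CONCEPTS = ["sky", "water", "grass", "foliage", "rocks", "sand"]


def concept_occurrence_vector(region_labels):
    """Fraction of local regions assigned to each concept in CONCEPTS.

    region_labels is a grid (list of rows) of concept names produced by an
    assumed local-region classifier; labels outside CONCEPTS still count
    toward the total but get no entry of their own.
    """
    counts = Counter(label for row in region_labels for label in row)
    total = sum(counts.values())
    return [counts[c] / total for c in CONCEPTS]


def l1(a, b):
    """L1 (city-block) distance between two concept-occurrence vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(query_vec, database, k=3):
    """Names of the k database images whose vectors are closest to the query."""
    ranked = sorted(database, key=lambda name: l1(query_vec, database[name]))
    return ranked[:k]
```

Because the vector only records how much of each concept appears, two coast images with the sky and water in different places still map to nearby points.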
A Conceptual Framework for Incorporating Cognitive Principles into Geographic Database Representation
International Journal of Geographical Information Science, 2000
Cited by 14 (1 self)
The advancement of GIS data models to allow the effective utilization of very large heterogeneous geographic databases requires a new approach that incorporates models of human cognition. The ultimate goal is to provide a cooperative human-computer environment for spatial analysis. We describe the Pyramid framework as an example of this new approach, within the context of some important aspects of how humans conceptually store spatial information. The proposed framework provides the means to create multiple structural interpretations of observed geographic data and the ability to build knowledge hierarchies through the application of data mining and other statistical techniques.
Function from visual analysis and physical interaction: a methodology for recognition of generic classes of objects
1997
Biologically Inspired Mobile Robot Vision Localization
IEEE Transactions on Robotics
Cited by 8 (6 self)
We present a robot localization system using biologically inspired vision. Our system models two extensively studied human visual capabilities: (1) extracting the “gist” of a scene to produce a coarse localization hypothesis, and (2) refining it by locating salient landmark points in the scene. Gist is computed here as a holistic statistical signature of the image, yielding abstract scene classification and layout. Saliency is computed as a measure of interest at every image location, efficiently directing the time-consuming landmark identification process towards the most likely candidate locations in the image. The gist features and salient regions are then further processed using a Monte Carlo localization algorithm to allow the robot to estimate its position. We test the system in three different outdoor environments, each with its own challenges: a building complex (38.4 × 54.86 m area, 13,966 testing images), a vegetation-filled park (82.3 × 109.73 m area, 26,397 testing images), and an open-field park (137.16 × 178.31 m area, 34,711 testing images). The system is able to localize, on average, to within 0.98, 2.63, and 3.46 m, respectively, even with multiple kidnapped-robot instances.
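The abstract names Monte Carlo localization without giving its internals. Below is a generic one-dimensional particle-filter step in which each particle's weight comes from how well the observed gist vector matches a gist vector stored for that map position. `gist_map`, the Gaussian likelihood, and the noise parameters are assumptions for illustration, and the salient-landmark refinement stage is left out.

```python
import math
import random


def mcl_step(particles, motion, observed_gist, gist_map, rng,
             motion_noise=0.5, sigma=1.0):
    """One Monte Carlo localization update on a 1-D path.

    particles:     list of candidate positions (metres along the path)
    motion:        odometry reading for this step (metres)
    observed_gist: gist feature vector of the current camera frame
    gist_map:      assumed function position -> expected gist vector
    rng:           a random.Random instance, so runs are reproducible
    """
    # 1. Motion update: shift every particle by odometry plus Gaussian noise.
    moved = [p + motion + rng.gauss(0.0, motion_noise) for p in particles]
    # 2. Measurement update: Gaussian likelihood of the gist mismatch.
    weights = []
    for p in moved:
        d2 = sum((a - b) ** 2 for a, b in zip(observed_gist, gist_map(p)))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)  # no usable evidence: keep all equally
    # 3. Resample in proportion to the weights (with replacement).
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate_position(particles):
    """Point estimate: mean of the particle cloud."""
    return sum(particles) / len(particles)
```

Starting from particles spread over the whole path, which is also how a kidnapped robot would be re-initialized, repeated steps concentrate the cloud around positions whose stored gist agrees with what the camera sees.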
Gist: A mobile robotics application of context-based vision in outdoor environment
In Proceedings of the IEEE CVPR Workshop on Attention and Performance in Computer Vision, 2005
Cited by 4 (1 self)
We present context-based scene recognition for mobile robotics applications. Without temporal filtering, our classifier differentiates outdoor scenes from a variety of locations on a college campus relatively well, using a set of features that together capture the “gist” of the scene. We evaluate classification accuracy on 1551 frames filmed outdoors along a path, divided into four and into twelve legs, obtaining classification rates of 67.96 percent and 48.61 percent, respectively. We also test the scalability of the features by comparing the four-leg results with those for a longer path divided into eleven legs, which yields a classification rate of 55.08 percent. Finally, we put forth some ideas to improve the theoretical strength of the gist features.
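The abstract reports per-frame classification rates but does not specify the classifier. As an assumption, a nearest-centroid rule is swapped in below to show how frames described by gist vectors are assigned to path legs and how a percentage classification rate is computed; the function names and the squared Euclidean distance are illustrative.

```python
def train_centroids(features, labels):
    """Average the gist vectors of each leg (class label) into one centroid."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, x in enumerate(f):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in s] for y, s in sums.items()}


def classify(feature, centroids):
    """Assign a frame to the leg whose centroid is nearest (squared Euclidean)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(feature, centroids[y])))


def classification_rate(predicted, actual):
    """Percentage of frames whose predicted leg equals the true leg."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)
```

Dividing the same path into more legs gives each class fewer, more similar-looking frames, which is one plausible reason a rate can drop from the four-leg to the twelve-leg setting.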
The Foundations and Architecture of Autotutor
Lecture Notes in Computer Science, 1998
Cited by 3 (1 self)
The Tutoring Research Group at the University of Memphis is developing an intelligent tutoring system that takes advantage of recent technological advances in the areas of semantic processing of natural language, world knowledge representation, multimedia interfaces, and fuzzy descriptions. The tutoring interaction is based on in-depth studies of human tutors, both skilled and unskilled. Latent semantic analysis will be used to semantically process, and provide a representation for, the student's contributions. Fuzzy production rules select appropriate topics and tutor dialogue moves from a rich curriculum script. The production rules will implement a variety of tutoring styles, from a basic untrained tutor to one that uses sophisticated pedagogical strategies. The tutor will be evaluated on the naturalness of its interaction with Turing-style tests, by comparing different tutoring styles, and by judging learning outcomes.
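The abstract pairs latent semantic analysis (to represent student contributions) with fuzzy production rules (to pick dialogue moves). A minimal sketch of how the two could connect, assuming LSA term vectors have already been computed (toy values in practice), a contribution is represented as the sum of its term vectors, and the rules use triangular membership functions over the cosine match score. The move names, breakpoints, and `RULES` table are hypothetical and are not AutoTutor's actual curriculum script or rule base.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def text_vector(words, term_vectors, dim):
    """Sum the (assumed, precomputed) LSA vectors of the known words."""
    vec = [0.0] * dim
    for w in words:
        for i, x in enumerate(term_vectors.get(w, ())):
            vec[i] += x
    return vec


def triangle(x, lo, peak, hi):
    """Triangular fuzzy membership: 0 outside (lo, hi), 1 at peak."""
    if x <= lo or x >= hi:
        return 0.0
    return (x - lo) / (peak - lo) if x <= peak else (hi - x) / (hi - peak)


# Hypothetical rule base: fuzzy set over match quality -> dialogue move.
RULES = {
    "hint": (-0.01, 0.0, 0.5),              # poor match
    "prompt": (0.2, 0.5, 0.8),              # partial match
    "positive feedback": (0.5, 1.0, 1.01),  # good match
}


def select_move(student_words, expected_words, term_vectors, dim):
    """Fire the rule whose membership for the match score is highest."""
    score = cosine(text_vector(student_words, term_vectors, dim),
                   text_vector(expected_words, term_vectors, dim))
    score = min(max(score, 0.0), 1.0)  # clamp to the rules' domain
    return max(RULES, key=lambda move: triangle(score, *RULES[move]))
```

The overlapping membership functions let a borderline answer partially satisfy two rules; only the strongest one fires here, which is the simplest defuzzification choice.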
Estimating scene typicality from human ratings and image features
Cited by 3 (2 self)
Scenes, like objects, are visual entities that can be categorized into functional and semantic groups. One of the core concepts of human categorization is the idea that category membership is graded: some exemplars are more typical than others. Here, we obtain human typicality rankings for more than 120,000 images from 706 scene categories through an online rating task on Amazon Mechanical Turk. We use these rankings to identify the most typical examples of each scene category. Using computational models of scene classification based on global image features, we find that images that are rated as more typical examples of their category are more likely to be classified correctly. This indicates that the most typical scene examples contain the diagnostic visual features that are relevant for their categorization. Objectless, holistic representations of scenes might serve as a good basis for understanding how semantic categories are defined in terms of perceptual representations.
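The central finding above compares typicality ratings with classifier correctness. A minimal way to run such a comparison, assuming one typicality rating and one correct/incorrect outcome per image (both hypothetical inputs), is a median split; the abstract does not describe the paper's actual statistical analysis.

```python
def accuracy_by_typicality(ratings, correct):
    """Split images at the median typicality rating and return the fraction
    classified correctly in the less typical and the more typical half."""
    if len(ratings) != len(correct) or len(ratings) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    order = sorted(range(len(ratings)), key=lambda i: ratings[i])
    half = len(order) // 2

    def accuracy(indices):
        return sum(1 for i in indices if correct[i]) / len(indices)

    return accuracy(order[:half]), accuracy(order[half:])
```

A higher value for the second (more typical) half would be consistent with the reported effect; with an odd number of images the extra one falls into the more typical half.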

