Visual Semantics: Extracting Visual Information from Text Accompanying Pictures
- In Proceedings of AAAI-94, 1994
Cited by 36 (8 self)
This research explores the interaction of textual and photographic information in document understanding. The problem of performing general-purpose vision without a priori knowledge is difficult at best. The use of collateral information in scene understanding has been explored in computer vision systems that use scene context in the task of object identification. The work described here extends this notion by defining visual semantics, a theory of systematically extracting picture-specific information from text accompanying a photograph. Specifically, this paper discusses the multi-stage processing of textual captions with the following objectives: (i) predicting which objects (implicitly or explicitly mentioned in the caption) are present in the picture and (ii) generating constraints useful in locating/identifying these objects. The implementation and use of a lexicon specifically designed for the integration of linguistic and visual information is discussed. Finally, the research d...
Computational Models for Integrating Linguistic and Visual Information: A Survey
- Artificial Intelligence Review, 1995
Cited by 23 (0 self)
This paper surveys research in developing computational models for integrating linguistic and visual information. It begins with a discussion of systems which have been actually implemented and continues with computationally motivated theories of human cognition. Since existing research spans several disciplines (e.g., natural language understanding, computer vision, knowledge representation), as well as several application areas, an important contribution of this paper is to categorize existing research based on inputs and objectives. Finally, some key issues related to integrating information from two such diverse sources are outlined and related to existing research. Throughout, the key issue addressed is the correspondence problem, namely how to associate visual events with words and vice versa. 1 Introduction Much has been said about the necessity of linking language and vision in order for a system to exhibit intelligent behaviour [Win73, Wal81]. A complete natural-language und...
A Computational Theory of Vocabulary Acquisition
- Natural Language Processing and Knowledge Representation: Language for Knowledge and Knowledge for Language (Menlo Park, CA/Cambridge, 1998
Cited by 22 (11 self)
As part of an interdisciplinary project to develop a computational cognitive model of a reader of narrative text, we are developing a computational theory of how natural-language-understanding systems can automatically acquire new vocabulary by determining from context the meaning of words that are unknown, misunderstood, or used in a new sense. 'Context' includes surrounding text, grammatical information, and background knowledge, but no external sources. Our thesis is that the meaning of such a word can be determined from context, can be revised upon further encounters with the word, "converges" to a dictionary-like definition if enough context has been provided and there have been enough exposures to the word, and eventually "settles down" to a "steady state" that is always subject to revision upon further encounters with the word. The system is being implemented in the SNePS knowledge-representation and reasoning system. This essay is forthcoming as a chapter in Iwanska, Łucja, & S...
Qualitative Spatial Reasoning about Objects in Motion: Application to Physics Problem Solving
- In IEEE Conf. on AI for Applications
Cited by 15 (3 self)
This paper describes an ongoing project to develop a theory of qualitative spatial reasoning which merges a simple, intuitive description of the spatial extent, relative position, and orientation of objects with existing methods for qualitative reasoning about dynamically changing worlds. We are applying our theories within a system for problem solving about the magnetic fields domain. We describe methods for integrating diagram and text input to a problem solver, methods of abstraction for modeling the spatial extents of objects, and a method for modeling spatial relations between objects through inequalities on extremal points which directly allows reasoning about the effects of translational motion. 1 Introduction The goal of qualitative reasoning is to draw useful conclusions from incomplete knowledge, particularly for problems where methods relying on numerically precise inputs may be inapplicable. Textbooks, such as Resnick and Halliday's Physics text [17], are an interesting s...
A Computational Theory of Vocabulary Expansion
- In Proceedings of the 19th Annual Conference of the Cognitive Science Society, 1997
Cited by 15 (7 self)
As part of an interdisciplinary project to develop a computational cognitive model of a reader of narrative text, we are developing a computational theory of how natural-language-understanding systems can automatically expand their vocabulary by determining from context the meaning of words that are unknown, misunderstood, or used in a new sense. 'Context' includes surrounding text, grammatical information, and background knowledge, but no external sources. Our thesis is that the meaning of such a word can be determined from context, can be revised upon further encounters with the word, "converges" to a dictionary-like definition if enough context has been provided and there have been enough exposures to the word, and eventually "settles down" to a "steady state" that is always subject to revision upon further encounters with the word. The system is being implemented in the SNePS knowledge-representation and reasoning system. This document is a slightly modified version (containing the...
From Pictures to Words: Generating Locative Descriptions of Objects in an Image
- In Proceedings of the Image Understanding Workshop, 1994
Cited by 9 (2 self)
In this paper we describe a system that integrates image processing and natural language processing for tasks that involve communicating visual information. The system determines information about the spatial relationship of objects in images and conveys it in the form of an English sentence. We are exploring the applicability of this system to two tasks: landmark navigation and the generation of descriptions of abnormal densities in radiographs. Our previous work described a computational model of preposition semantics and a method for handling some of the ambiguities associated with natural language. Here we concentrate on generating optimal locative expressions for object pairs. In describing the system we will explain the methodologies it employs to achieve its goals. We will illustrate the system's use of these methodologies through several examples for each task.
Use of Multimedia Input in Automated Image Annotation and Content-Based Retrieval
- Presented at Conference on Storage and Retrieval Techniques for Image Databases, SPIE ’95, 1995
Cited by 7 (0 self)
This research explores the interaction of linguistic and photographic information in an integrated text/image database. By utilizing linguistic descriptions of a picture (speech and text input) coordinated with pointing references to the picture, we extract information useful in two aspects: image interpretation and image retrieval. In the image interpretation phase, objects and regions mentioned in the text are identified; the annotated image is stored in a database for future use. We incorporate techniques from our previous research on photo understanding using accompanying text: a system, PICTION, which identifies human faces in a newspaper photograph based on the caption. In the image retrieval phase, images matching natural language queries are presented to a user in a ranked order. This phase combines the output of (i) the image interpretation/annotation phase, (ii) statistical text retrieval methods, and (iii) image retrieval methods (e.g., colour indexing). The system allows bo...
The Use of Document Structure Analysis to Retrieve Information From Documents in Digital Libraries
- Proc. SPIE, Document Recognition IV, Volume 3027, pp. 207-218, 1997
Cited by 5 (0 self)
This paper describes an approach to retrieving information from document images stored in a digital library by means of knowledge-based layout analysis and logical structure derivation techniques. Queries on document image content are categorized in terms of the type of information that is desired (e.g., articles on a given topic), and are parsed to determine the type of document from which information is desired, the syntactic level of the information desired, and the level of analysis required to extract the information. Using these clauses in the query, a set of salient documents is retrieved, layout analysis and logical structure derivation are performed on the retrieved documents (using DeLoS, a document logical structure derivation system developed at CEDAR), and the documents are then analyzed in detail to extract the relevant logical components. A "document browser" application, being developed based on this approach, allows a user to interactively specify queries on the docu...
From Imagery to Salience: Locative Expressions in Context
- 1995
Cited by 4 (0 self)
Alicia Abella. This thesis gives a conceptual framework for representing, manipulating, measuring, and communicating ideas about topological (non-metric) spatial locations, object spatial contexts, and user expectations of relationships. This thesis articulates a theory of spatial relations, how they are represented as fuzzy predicates internally, and how they can be appropriately augmented or filtered using prior knowledge in order to produce natural language statements about location and space. This work quantifies the notions of context and vagueness, so that all spatial relations are measurably accurate, provably efficient, and matched to users' expectations. The system combines variable aspects of computer science and linguistics in such a way so as to be extensible to many environments. The system is demonstrated both in a landmark navigation task and in a medical task, two very separate domains.
Use of Captions and other Collateral Text in Understanding Photos
- Artificial Intelligence Review, 1994
Cited by 4 (0 self)
This research explores the interaction of textual and photographic information in image understanding. Specifically, it presents a computational model whereby textual captions are used as collateral information in the interpretation of the corresponding photographs. The final understanding of the picture and caption reflects a consolidation of the information obtained from each of the two sources and can thus be used in intelligent information retrieval tasks. The problem of building a general-purpose computer vision system without a priori knowledge is very difficult at best. The concept of using collateral information in scene understanding has been explored in systems that use general scene context in the task of object identification. The work described here extends this notion by incorporating picture-specific information. A multi-stage system, PICTION, which uses captions to identify humans in an accompanying photograph, is described. This provides a computationally less expensive ...

