Results 1 - 10 of 45
Semiotic Schemas: A Framework for Grounding Language in Action and Perception, 2005
Cited by 58 (10 self)
A theoretical framework for grounding language is introduced that provides a computational path from sensing and motor action to words and speech acts. The approach combines concepts from semiotics and schema theory to develop a holistic approach to linguistic meaning. Schemas serve as structured beliefs that are grounded in an agent’s physical environment through a causal-predictive cycle of action and perception. Words and basic speech acts are interpreted in terms of grounded schemas. The framework reflects lessons learned from implementations of several language processing robots. It provides a basis for the analysis and design of situated, multimodal communication systems that straddle symbolic and non-symbolic realms.
Choosing words in computer-generated weather forecasts - Artificial Intelligence, 2005
Cited by 37 (15 self)
One of the main challenges in automatically generating textual weather forecasts is choosing appropriate English words to communicate numeric weather data. A corpus-based analysis of how humans write forecasts showed that there were major differences in how individual writers performed this task, that is, in how they translated data into words. These differences included both different preferences between potential near-synonyms that could be used to express information, and also differences in the meanings that individual writers associated with specific words. Because we thought these differences could confuse readers, we built our SumTime-Mousam weather-forecast generator to use consistent data-to-word rules, which avoided words that were used by only a few people and words that were interpreted differently by different people. An evaluation by forecast users suggested that they preferred SumTime-Mousam’s texts to human-generated texts, in part because of better word choice; this may be the first time that an evaluation has shown that NLG texts are better than human-authored texts.
Key words: natural language processing, natural language generation, language and the world, information presentation, weather forecasts, lexical choice, idiolect
Preprint submitted to Elsevier Science 2 June 2005
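The idea of a consistent data-to-word rule in the abstract above can be illustrated with a minimal sketch. The thresholds, the word choices, and the function name `describe_wind_change` are hypothetical and chosen only for illustration; they are not SumTime-Mousam's actual rules. The point is that one fixed table maps a numeric change to exactly one word, so the same data always yields the same wording.

```python
# Hypothetical rule table: (lower bound, upper bound, word).
# The thresholds and words are illustrative assumptions, not SumTime-Mousam's rules.
WIND_CHANGE_RULES = [
    (float("-inf"), -2.0, "easing"),
    (-2.0, 2.0, "steady"),
    (2.0, float("inf"), "increasing"),
]


def describe_wind_change(start_knots: float, end_knots: float) -> str:
    """Map a numeric change in wind speed to a single, consistently chosen word."""
    delta = end_knots - start_knots
    for low, high, word in WIND_CHANGE_RULES:
        # Half-open intervals [low, high) so every finite delta matches exactly one rule.
        if low <= delta < high:
            return word
    raise ValueError(f"no rule covers a change of {delta!r} knots")


print(describe_wind_change(10.0, 20.0))
print(describe_wind_change(20.0, 10.0))
print(describe_wind_change(10.0, 11.0))
```

Keeping the rules in one table, rather than spread across writer-specific preferences, is what makes the word choice consistent in this sketch.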
Mental Imagery for a Conversational Robot, 2004
Cited by 36 (17 self)
To build robots that engage in fluid face-to-face spoken conversations with people, robots must have ways to connect what they say to what they see. A critical aspect of how language connects to vision is that language encodes points of view. The meaning of my left and your left differs due to an implied shift of visual perspective. The connection of language to vision also relies on object permanence. We can talk about things that are not in view. For a robot to participate in situated spoken dialog, it must have the capacity to imagine shifts of perspective, and it must maintain object permanence. We present a set of representations and procedures that enable a robotic manipulator to maintain a “mental model” of its physical environment by coupling active vision to physical simulation. Within this model, “imagined” views can be generated from arbitrary perspectives, providing the basis for situated language comprehension and production. An initial application of mental imagery for spatial language understanding for an interactive robot is described.
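The claim that "my left" and "your left" differ because of a shift in visual perspective can be shown with a small geometric sketch. This is not the paper's method, which couples active vision to a physical simulator; it is a simplified 2D illustration, and the names `Viewer` and `side_of` are hypothetical. Each viewer has a position and a facing direction in a shared world frame, and the sign of a 2D cross product tells whether an object lies to that viewer's left or right.

```python
import math
from dataclasses import dataclass


@dataclass
class Viewer:
    x: float
    y: float
    heading_rad: float  # facing direction, counterclockwise from the +x axis


def side_of(viewer: Viewer, obj_x: float, obj_y: float, tol: float = 1e-9) -> str:
    """Return which side of the viewer's facing direction the object lies on."""
    # Unit vector along the viewer's facing direction.
    hx, hy = math.cos(viewer.heading_rad), math.sin(viewer.heading_rad)
    # Vector from the viewer to the object in the shared world frame.
    dx, dy = obj_x - viewer.x, obj_y - viewer.y
    # 2D cross product: positive means counterclockwise of the heading (left).
    cross = hx * dy - hy * dx
    if cross > tol:
        return "left"
    if cross < -tol:
        return "right"
    return "in line"


# A robot and a person face each other along the y axis; one object sits between them.
robot = Viewer(0.0, 0.0, math.pi / 2)    # faces +y
human = Viewer(0.0, 2.0, -math.pi / 2)   # faces -y
print(side_of(robot, 1.0, 1.0))
print(side_of(human, 1.0, 1.0))
```

In this configuration the same object is on the robot's right and on the person's left, which is why a spatial word has to be interpreted relative to a chosen point of view.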
Probabilistic grounding of situated speech using plan recognition and reference resolution - In Proceedings of the International Conference on Multimodal Interfaces, 2005
Cited by 24 (8 self)
Situated, spontaneous speech may be ambiguous along acoustic, lexical, grammatical and semantic dimensions. To understand such a seemingly difficult signal, we propose to model the ambiguity inherent in acoustic signals and in lexical and grammatical choices using compact, probabilistic representations of multiple hypotheses. To resolve semantic ambiguities we propose a situation model that captures aspects of the physical context of an utterance as well as the speaker’s intentions, in our case represented by recognized plans. In a single, coherent Framework for Understanding Situated Speech (FUSS) we show how these two influences, acting on an ambiguous representation of the speech signal, complement each other to disambiguate form and content of situated speech. This method produces promising results in a game-playing environment and leaves room for other types of situation models.
Coupling Perception and Simulation: Steps Towards Conversational Robotics - In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2003
Cited by 22 (15 self)
Human cognition makes extensive use of visualization and imagination. As a first step towards giving a robot similar abilities, we have built a robotic system that uses a perceptually-coupled physical simulator to produce an internal world model of the robot's environment. Real-time perceptual coupling ensures that the model is constantly kept in synchronization with the physical environment as the robot moves and obtains new sense data. This model allows the robot to be aware of objects no longer in its field of view (a form of "object permanence"), as well as to visualize its environment through the eyes of the user by enabling virtual shifts in point of view using synthetic vision operating within the simulator. This architecture provides a basis for our long-term goals of developing conversational robots that can ground the meaning of spoken language in terms of sensorimotor representations.
Situated language understanding as filtering perceived affordances - Cognitive Science, 2007
Cited by 14 (4 self)
We introduce a computational theory of situated language understanding in which the meaning of words and utterances depends on the physical environment and the goals and plans of communication partners. According to the theory, concepts that ground linguistic meaning are neither internal nor external to language users, but instead span the objective-subjective boundary. To model the possible interactions between subject and object, the theory relies on the notion of perceived affordances: structured units of interaction that can be used for prediction at multiple levels of abstraction. Language understanding is treated as a process of filtering perceived affordances. The theory accounts for many aspects of the situated nature of human language use and provides a unified solution to a number of demands on any theory of language understanding including conceptual combination, prototypicality effects, and the generative nature of lexical items. To support the theory, we describe an implemented system that understands verbal commands situated in a virtual gaming environment. The implementation uses probabilistic hierarchical plan recognition to generate perceived affordances. The system has been evaluated on its ability to correctly interpret free-form spontaneous verbal commands recorded from unrehearsed game play between human players. The system is able to “step into the shoes” of human players and correctly respond to a broad range of verbal commands in which linguistic meaning depends on social and physical context. We quantitatively compare the system’s predictions in response to direct player commands with the actions taken by human players and show generalization to unseen data across a range of situations and verbal constructions.
Connecting language to the world - Artificial Intelligence, 2005
Cited by 14 (5 self)
1 Language in the World
How does language relate to the non-linguistic world? If an agent is able to communicate linguistically and is also able to directly perceive and/or act on the world, how do perception, action, and language interact with and influence each other? Such questions are surely amongst the most important in Cognitive Science and Artificial Intelligence (AI). Language, after all, is a central aspect of the human mind – indeed it may be what distinguishes us from other species. There is sometimes a tendency in the academic world to study language in isolation, as a formal system with rules for well-constructed sentences; or to focus on how language relates to formal notations such as symbolic logic. But language did not evolve as an isolated system or as a way of communicating symbolic logic; it presumably evolved as a mechanism for exchanging information about the world, ultimately providing the medium for cultural transmission across generations. Motivated by these observations, the goal of this special issue is to bring together research in AI that focuses on relating language to the physical world. Language is of course also used to communicate about non-physical referents, but the ubiquity of physical metaphor in language [21] suggests that grounding in the physical world provides the foundations of semantics.
Incremental, multi-level processing for comprehending situated dialogue in human-robot interaction - In Language and Robots: Proceedings from the Symposium (LangRo’2007) IJCAI01, 2007
Learning for semantic parsing using statistical machine translation techniques. Doctoral Dissertation Proposal, 2005
Cited by 7 (1 self)
Semantic parsing is the construction of a complete, formal, symbolic meaning representation of a sentence. While it is crucial to natural language understanding, the problem of semantic parsing has received relatively little attention from the machine learning community. Recent work on natural language understanding has mainly focused on shallow semantic analysis, such as word-sense disambiguation and semantic role labeling. Semantic parsing, on the other hand, involves deep semantic analysis in which word senses, semantic roles and other components are combined to produce useful meaning representations for a particular application domain (e.g. database query). Prior research in machine learning for semantic parsing is mainly based on inductive logic programming or deterministic parsing, which lack some of the robustness that characterizes statistical learning. Existing statistical approaches to semantic parsing, however, are mostly concerned with relatively simple application domains in which a meaning representation is no more than a single semantic frame. In this proposal, we present a novel statistical approach to semantic parsing, WASP, which can handle meaning representations with a nested structure. The WASP algorithm learns a semantic parser given a set of sentences annotated with their correct meaning representations. The parsing model is based on the
A simple method for resolution of definite reference in a shared visual context - In Procs of SIGdial, 2008
Cited by 6 (3 self)
We present a method for resolving definite exophoric reference to visually shared objects that is based on a) an automatically learned, simple mapping of words to visual features (“visual word semantics”), b) an automatically learned, semantically-motivated utterance segmentation (“visual grammar”), and c) a procedure that, given an utterance, uses b) to combine a) to yield a resolution. We evaluated the method both on a pre-recorded corpus and in an online setting, where it performed with 81% (chance: 14%) and 66% accuracy, respectively. This is comparable to results reported in related work on simpler settings.

