AIBO's first words. The social learning of language and meaning
, 2001
Cited by 88 (9 self)
This paper explores the hypothesis that language communication in its very first stage is bootstrapped in a social learning process under the strong influence of culture. A concrete framework for social learning has been developed based on the notion of a language game. Autonomous robots have been programmed to behave according to this framework. We show experiments that demonstrate why there has to be a causal role of language on category acquisition; partly by showing that it leads effectively to the bootstrapping of communication and partly by showing that other forms of learning do not generate categories usable in communication or make information assumptions which cannot be satisfied.
Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic
- Journal of Artificial Intelligence Research
, 2001
Cited by 75 (2 self)
This paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile.
Human Action Detection Using PNF Propagation of Temporal Constraints
- In Proc. of the Conference on Computer Vision and Pattern Recognition
, 1997
Cited by 41 (6 self)
In this paper we develop a representation for the temporal structure inherent in human actions and demonstrate an effective method for using that representation to detect the occurrence of actions. The temporal structure of the action, sub-actions, events, and sensor information is described using a constraint network based on Allen's interval algebra. We map these networks onto a simpler, 3-valued domain (past, now, fut) network, called a PNF-network, to allow fast detection of actions and sub-actions. The occurrence of an action is computed by considering the minimal domain of its PNF-network, under constraints imposed by the current state of the sensors and the previous states of the network. We illustrate the approach with examples, showing that a major advantage of PNF propagation is the detection and removal of situations inconsistent with the temporal structure of the action. We also examine a method to increase the robustness of PNF propagation in the case of faulty sensors.
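The idea of restricting each node to a subset of {past, now, fut} and pruning values that violate a temporal relation can be sketched as follows. This is a minimal illustration and not the paper's algorithm: the `ALLOWED` tables, the two relations chosen, the node names, and the simple arc-consistency loop in `propagate` are all assumptions made for the example.

```python
P, N, F = "past", "now", "fut"

# Assumed projection of two Allen relations onto allowed (A, B) state pairs.
ALLOWED = {
    # A ends strictly before B starts.
    "before": {(P, P), (P, N), (P, F), (N, F), (F, F)},
    # A starts after B starts and ends before B ends.
    "during": {(P, P), (P, N), (N, N), (F, N), (F, F)},
}


def propagate(domains, constraints):
    """Prune node domains until every remaining value has a supporting pair."""
    domains = {name: set(values) for name, values in domains.items()}
    changed = True
    while changed:
        changed = False
        for a, relation, b in constraints:
            pairs = {
                (x, y)
                for x, y in ALLOWED[relation]
                if x in domains[a] and y in domains[b]
            }
            new_a = {x for x, _ in pairs}
            new_b = {y for _, y in pairs}
            if new_a != domains[a] or new_b != domains[b]:
                domains[a], domains[b] = new_a, new_b
                changed = True
    return domains


# Hypothetical action structure: reach before grasp, grasp during pick_up.
constraints = [("reach", "before", "grasp"), ("grasp", "during", "pick_up")]

# A sensor reports that the grasp is happening now; the other nodes are unknown.
observed = {"reach": {P, N, F}, "grasp": {N}, "pick_up": {P, N, F}}
result = propagate(observed, constraints)

# Contradictory sensor readings: reach and grasp cannot both be happening now.
conflict = propagate(
    {"reach": {N}, "grasp": {N}, "pick_up": {P, N, F}}, constraints
)
```

With the grasp observed as `now`, the pass narrows `reach` to `past` and `pick_up` to `now`. In the contradictory case the domains of `reach` and `grasp` become empty, which is how an inconsistent situation shows up in this sketch.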
Movement, Activity, and Action: The Role of Knowledge in the Perception of Motion
- Royal Society Workshop on Knowledge-based Vision in Man and Machine
, 1997
Cited by 39 (3 self)
We present several approaches to the machine perception of motion and discuss the role and levels of knowledge in each. In particular we describe different techniques of motion understanding as focusing on one of movement, activity, or action. Movements are the most atomic primitives, requiring no contextual or sequence knowledge to be recognized; movement is often addressed using either view-invariant or view-specific geometric techniques. Activity refers to sequences of movements or states, where the only real knowledge required is the statistics of the sequence; much of the recent work in gesture understanding falls within this category of motion perception. Finally, actions are larger-scale events which typically include interaction with the environment and causal relationships; action understanding straddles the gray division between perception and cognition, computer vision and artificial intelligence. We illustrate these levels with examples drawn mostly from our work in unders...
The Computational Perception of Scene Dynamics
- Computer Vision and Image Understanding
, 1995
Cited by 36 (3 self)
Understanding observations of interacting objects requires one to reason about the force-dynamic relations between objects. We present an implemented computational theory that derives force-dynamic interpretations directly from camera input. Interpretations are expressed in terms of assertions about the kinematic and dynamic properties of objects. The feasibility of interpretations can be determined relative to Newtonian mechanics by a reduction to linear programming. Multiple feasible solutions are compared using a preference hierarchy to select plausible interpretations. We provide computational examples to demonstrate that our ontology is sufficiently rich to describe a wide variety of image sequences.
KEYWORDS: Motion understanding, Scene dynamics, Perceptual inference, Knowledge-based perception, Domain theory, View-based representations.
Submitted.
1 Introduction
Both AI and psychology researchers have argued for the need to represent "causal" information about the world in ...
A Maximum-Likelihood Approach to Visual Event Classification
- In Proceedings of the Fourth European Conference on Computer Vision
Cited by 33 (7 self)
This paper presents a novel framework, based on maximum likelihood, for training models to recognise simple spatial-motion events, such as those described by the verbs pick up, put down, push, pull, drop, and throw, and classifying novel observations into previously trained classes. The model that we employ does not presuppose prior recognition or tracking of 3D object pose, shape, or identity. We describe our general framework for using maximum-likelihood techniques for visual event classification, the details of the generative model that we use to characterise observations as instances of event types, and the implemented computational techniques used to support training and classification for this generative model. We conclude by illustrating the operation of our implementation on a small example.
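The train-then-classify pattern described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's generative model: a per-class diagonal Gaussian stands in for it, and the two-dimensional feature vectors, the training values, and the function names are hypothetical.

```python
import math


def fit_gaussian(samples):
    """Maximum-likelihood mean and variance per feature dimension."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    variances = [
        # Floor the variance so the log-likelihood stays finite.
        max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances


def log_likelihood(x, model):
    """Log density of x under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(x, models):
    """Return the event label whose model gives x the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))


# Hypothetical features, e.g. (mean vertical velocity, mean horizontal velocity).
training = {
    "pick up": [(0.9, 0.1), (1.1, 0.0), (1.0, 0.2)],
    "put down": [(-1.0, 0.1), (-0.8, 0.0), (-1.2, 0.2)],
}
models = {label: fit_gaussian(samples) for label, samples in training.items()}

label_up = classify((0.95, 0.1), models)
label_down = classify((-0.9, 0.1), models)
```

Training fits one model per event type from labeled examples, and a novel observation is assigned to the class whose model scores it highest. A sequence model would replace `fit_gaussian` and `log_likelihood` while `classify` stays the same.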
A Multimodal Learning Interface for Grounding Spoken Language in Sensory Perceptions
- ACM Transactions on Applied Perception
, 2004
Cited by 27 (4 self)
Most speech interfaces are based on natural language processing techniques that use pre-defined symbolic representations of word meanings and process only linguistic information. To understand and use language like their human counterparts in multimodal human-computer interaction, computers need to acquire spoken language and map it to other sensory perceptions. This paper presents a multimodal interface that learns to associate spoken language with perceptual features by being situated in users' everyday environments and sharing user-centric multisensory information. The learning interface is trained in unsupervised mode, in which users perform everyday tasks while providing natural language descriptions of their behaviors. We collect acoustic signals in concert with multisensory information from non-speech modalities, such as the user's perspective video, gaze positions, head directions and hand movements. The system first estimates users' focus of attention from eye and head cues. Attention, as represented by gaze fixation, is used for spotting the target object of user interest. Attention switches are calculated and used to segment an action sequence into action units, which are then categorized by mixture hidden Markov models. A multimodal learning algorithm is developed to spot words from continuous speech and then associate them with perceptually grounded meanings extracted from visual perception and action. Successful learning has been demonstrated in experiments on three natural tasks: "unscrewing a jar", "stapling a letter" and "pouring water".
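The step of cutting an action sequence at attention switches can be illustrated with a small sketch. It assumes, for the example only, that attention has already been reduced to one fixated-object label per video frame; the function name and the labels are hypothetical, and the paper's actual estimation from eye and head cues is not reproduced here.

```python
from itertools import groupby


def segment_by_attention(fixations):
    """Split a per-frame list of fixated objects into (target, start, end) units.

    A new unit starts whenever the fixated target changes; `end` is exclusive.
    """
    segments = []
    start = 0
    for target, run in groupby(fixations):
        length = len(list(run))
        segments.append((target, start, start + length))
        start += length
    return segments


# Hypothetical per-frame gaze targets while unscrewing a jar.
frames = ["jar", "jar", "lid", "lid", "lid", "jar"]
units = segment_by_attention(frames)
```

Each resulting unit covers a stretch of frames with a single attention target, which is the granularity at which a later classifier could label the action.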
Visual Event Classification via Force Dynamics
, 2000
Cited by 22 (3 self)
This paper presents an implemented system, called LEONARD, that classifies simple spatial motion events, such as pick up and put down, from video input. Unlike previous systems that classify events based on their motion profile, LEONARD uses changes in the state of force-dynamic relations, such as support, contact, and attachment, to distinguish between event types. This paper presents an overview of the entire system, along with the details of the algorithm that recovers force-dynamic interpretations using prioritized circumscription and a stability test based on a reduction to linear programming. This paper also presents an example illustrating the end-to-end performance of LEONARD classifying an event from video input.
Introduction
People can describe what they see. If someone were to pick up a block and ask you what you saw, you could say "The person picked up the block." In doing so, you describe both objects, like people and blocks, and events, like pickings up. Most...
Appearance-Based Motion Recognition of Human Actions
- MIT Media Lab, M.S. Thesis
, 1996
Cited by 14 (0 self)
A new view-based approach to the representation and recognition of action is presented. The work is motivated by the observation that a human observer can easily and instantly recognize action in extremely low resolution imagery with no strong features or information about the three-dimensional structure of the scene. Our underlying representations for action are view-based descriptions of the coarse image motion. Using these descriptions, we propose an appearance-based recognition strategy embedded within a hypothesize-and-test paradigm. A binary motion region (BMR) image is initially computed to act as an index into the action library. The BMR grossly describes the spatial distribution of motion energy for a given view of a given action. Any stored BMRs that plausibly match the unknown input BMR are then tested for a coarse, categorical agreement with a known motion model of the action. We have developed two motion-based methods for the verification of the hypothesized actions. The ...
Connecting language to the world
- Artificial Intelligence
, 2005
Cited by 14 (5 self)
1 Language in the World
How does language relate to the non-linguistic world? If an agent is able to communicate linguistically and is also able to directly perceive and/or act on the world, how do perception, action, and language interact with and influence each other? Such questions are surely amongst the most important in Cognitive Science and Artificial Intelligence (AI). Language, after all, is a central aspect of the human mind; indeed, it may be what distinguishes us from other species. There is sometimes a tendency in the academic world to study language in isolation, as a formal system with rules for well-constructed sentences; or to focus on how language relates to formal notations such as symbolic logic. But language did not evolve as an isolated system or as a way of communicating symbolic logic; it presumably evolved as a mechanism for exchanging information about the world, ultimately providing the medium for cultural transmission across generations. Motivated by these observations, the goal of this special issue is to bring together research in AI that focuses on relating language to the physical world. Language is of course also used to communicate about non-physical referents, but the ubiquity of physical metaphor in language [21] suggests that grounding in the physical world provides the foundations of semantics.

