Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic
- Journal of Artificial Intelligence Research, 2001
Cited by 75 (2 self)
This paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile.
Situation recognition: Representation and algorithms
- 1993
Cited by 63 (4 self)
The situation recognition system, to which this paper is devoted, receives as input a stream of time-stamped events; it performs recognition of instances of occurring situations, as they are developing, and it generates as output deduced events and actions to trigger. It is mainly a temporal reasoning system. It is predictive in the sense that it predicts forthcoming events relevant to its task, it focuses its attention on them and it maintains their temporal windows of relevance. Its main functionality is to recognize efficiently complex temporal patterns on the fly, while they are taking place. This system has been tested for the surveillance of an environment by a multisensory perception machine; it is being applied to monitoring a complex dynamic system.
Grounding Language in Perception
- Artificial Intelligence Review, 1994
Cited by 51 (6 self)
This paper describes an implemented computer program that recognizes the occurrence of simple spatial motion events in simulated video input. The program receives an animated line-drawing as input and produces as output a semantic representation of the events occurring in that animation. This paper suggests that the notions of support, contact, and attachment are crucial to specifying many simple spatial motion event types and presents a logical notation for describing classes of events that incorporates such notions as primitives. It then suggests that the truth values of such primitives can be recovered from perceptual input by a process of counterfactual simulation, predicting the effect of hypothetical changes to the world on the immediate future. Finally, it suggests that such counterfactual simulation is performed using knowledge of naive physical constraints such as substantiality, continuity, gravity, and ground plane. This paper describes the algorithms that incorporate these ideas in the program and illustrates the operation of the program on sample input.
The Computational Perception of Scene Dynamics
- Computer Vision and Image Understanding, 1995
Cited by 36 (3 self)
Understanding observations of interacting objects requires one to reason about the force-dynamic relations between objects. We present an implemented computational theory that derives force-dynamic interpretations directly from camera input. Interpretations are expressed in terms of assertions about the kinematic and dynamic properties of objects. The feasibility of interpretations can be determined relative to Newtonian mechanics by a reduction to linear programming. Multiple feasible solutions are compared using a preference hierarchy to select plausible interpretations. We provide computational examples to demonstrate that our ontology is sufficiently rich to describe a wide variety of image sequences.
KEYWORDS: Motion understanding, Scene dynamics, Perceptual inference, Knowledge-based perception, Domain theory, View-based representations. Submitted.
1 Introduction: Both AI and psychology researchers have argued for the need to represent "causal" information about the world in ...
Specific-to-General Learning for Temporal Events with Application to Learning . . .
- Journal of Artificial Intelligence Research, 2002
Cited by 23 (2 self)
We develop, analyze, and evaluate a novel, supervised, specific-to-general learner for a simple temporal logic and use the resulting algorithm to learn visual event definitions from video sequences. First, we introduce a simple, propositional, temporal, event-description language called AMA that is sufficiently expressive to represent many events yet sufficiently restrictive to support learning. We then give algorithms, along with lower and upper complexity bounds, for the subsumption and generalization problems for AMA formulas. We present a positive-examples-only specific-to-general learning method based on these algorithms. We also present a polynomial-time-computable "syntactic" subsumption test that implies semantic subsumption without being equivalent to it. A generalization algorithm based on syntactic subsumption can be used in place of semantic generalization to improve the asymptotic complexity of the resulting learning algorithm. Finally
Causal reconstruction
- Massachusetts Institute of Technology, AI Lab, memo, 1993
Cited by 13 (0 self)
Causal reconstruction is the task of reading a written causal description of a physical behavior, forming an internal model of the described activity, and demonstrating comprehension through question answering. This task is difficult because written descriptions often do not specify exactly how referenced events fit together. This article (1) characterizes the causal reconstruction problem, (2) presents a representation called transition space, which portrays events in terms of "transitions," or collections of changes expressible in everyday language, and (3) describes a program called PATHFINDER, which uses the transition space representation to perform causal reconstruction on simplified English descriptions of physical activity. PATHFINDER works by identifying partial matches between the representations of events and using these matches to form causal chains, fill causal gaps, and merge overlapping accounts of activity. By applying transformations to events prior to matching, PATHFINDER is also able to handle a range of discontinuities arising from a writer's use of analogy or abstraction.
Learning Word-to-Meaning Mappings
- 1997
Cited by 3 (0 self)
Children face five central difficulties when learning the vocabulary of their native language: learning from multi-word utterances, bootstrapping from an empty mental lexicon, referential uncertainty, noise, and homonymy. These difficulties are modeled formally via a simplified lexical acquisition task called the mapping problem. Algorithms for solving this mapping problem are developed, based on the intuitive notions of cross-situational learning and the principle of contrast. Computer simulation demonstrates that these techniques are effective in solving this mapping problem. This motivates the hypothesis that children use such techniques, inter alia, when learning language.
Introduction: When acquiring their native language, children learn a lexicon that maps words to their meanings. On one hand, this task must be easy. Most children within the same linguistic community learn the same lexicon, despite the fact that each child hears widely different sets of utterances in wid...
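The two notions named in the abstract above, cross-situational learning and the principle of contrast, can be illustrated with a toy sketch. This is not the paper's algorithm: it ignores noise, homonymy, and multi-symbol meanings, and the function names, the data layout, and the example utterances are all hypothetical choices made for illustration. Each situation pairs the words of an utterance with the set of meaning symbols the learner could plausibly observe; a word's candidate meanings are intersected across situations, and a meaning already pinned to one word is then ruled out for the others.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data layout: (words in the utterance, observed meaning symbols).
Situation = Tuple[List[str], Set[str]]


def cross_situational_learn(situations: List[Situation]) -> Dict[str, Set[str]]:
    """Intersect each word's candidate meanings across every situation it occurs in."""
    candidates: Dict[str, Set[str]] = {}
    for words, meanings in situations:
        for word in words:
            if word in candidates:
                candidates[word] &= meanings
            else:
                candidates[word] = set(meanings)
    return candidates


def apply_contrast(candidates: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Toy principle of contrast: a meaning settled for one word is removed
    from the candidate sets of all other still-ambiguous words."""
    resolved = {word: set(meanings) for word, meanings in candidates.items()}
    changed = True
    while changed:
        changed = False
        # Meanings already pinned to exactly one word.
        settled = {next(iter(m)): w for w, m in resolved.items() if len(m) == 1}
        for word, meanings in resolved.items():
            if len(meanings) > 1:
                taken = {x for x in meanings if x in settled and settled[x] != word}
                if taken:
                    meanings -= taken
                    changed = True
    return resolved


if __name__ == "__main__":
    situations = [
        (["john", "walks"], {"JOHN", "WALK"}),
        (["john", "runs"], {"JOHN", "RUN"}),
        (["mary", "walks"], {"MARY", "WALK"}),
    ]
    lexicon = apply_contrast(cross_situational_learn(situations))
    for word in sorted(lexicon):
        print(word, sorted(lexicon[word]))
```

In this example, intersection alone settles "john" and "walks", while "runs" and "mary" each keep two candidates until the contrast step removes the meanings already claimed by other words.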

