Results 1 - 10 of 10
Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic
- Journal of Artificial Intelligence Research, 2001
Cited by 75 (2 self)
This paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile.
Movement, Activity, and Action: The Role of Knowledge in the Perception of Motion
- Royal Society Workshop on Knowledge-based Vision in Man and Machine, 1997
Cited by 39 (3 self)
We present several approaches to the machine perception of motion and discuss the role and levels of knowledge in each. In particular we describe different techniques of motion understanding as focusing on one of movement, activity, or action. Movements are the most atomic primitives, requiring no contextual or sequence knowledge to be recognized; movement is often addressed using either view-invariant or view-specific geometric techniques. Activity refers to sequences of movements or states, where the only real knowledge required is the statistics of the sequence; much of the recent work in gesture understanding falls within this category of motion perception. Finally, actions are larger-scale events which typically include interaction with the environment and causal relationships; action understanding straddles the gray division between perception and cognition, computer vision and artificial intelligence. We illustrate these levels with examples drawn mostly from our work in unders...
Naturally Conveyed Explanations of Device Behavior
- PUI 2001, 2001
Cited by 23 (4 self)
Designers routinely explain their designs to one another using sketches and verbal descriptions of behavior, both of which can be understood long before the device has been fully specified. But current design tools fail almost completely to support this sort of interaction: they force designers to specify details of the design, and typically require that they do so by navigating a forest of menus and dialog boxes rather than by directly describing the behaviors with sketches and verbal explanations. We have created a prototype system, called ASSISTANCE, capable of interpreting multimodal explanations for simple 2-D kinematic devices. The program generates a model of the events and the causal relationships between events that have been described via hand-drawn sketches, sketched annotations, and verbal descriptions. Our goal is to make the designer's interaction with the computer more like interacting with another designer. This requires the ability not only to understand physical devices but also to understand the means by which the explanations of these devices are conveyed.
Specific-to-General Learning for Temporal Events with Application to Learning . . .
- Journal of Artificial Intelligence Research, 2002
Cited by 23 (2 self)
We develop, analyze, and evaluate a novel, supervised, specific-to-general learner for a simple temporal logic and use the resulting algorithm to learn visual event definitions from video sequences. First, we introduce a simple, propositional, temporal, event-description language called AMA that is sufficiently expressive to represent many events yet sufficiently restrictive to support learning. We then give algorithms, along with lower and upper complexity bounds, for the subsumption and generalization problems for AMA formulas. We present a positive-examples-only specific-to-general learning method based on these algorithms. We also present a polynomial-time-computable "syntactic" subsumption test that implies semantic subsumption without being equivalent to it. A generalization algorithm based on syntactic subsumption can be used in place of semantic generalization to improve the asymptotic complexity of the resulting learning algorithm. Finally
Function from visual analysis and physical interaction: a methodology for recognition of generic classes of objects
, 1997
Estimating Contact Dynamics
Cited by 6 (1 self)
Motion and interaction with the environment are fundamentally intertwined. Few people-tracking algorithms exploit such interactions, and those that do assume that surface geometry and dynamics are given. This paper concerns the converse problem, i.e., the inference of contact and environment properties from motion. For 3D human motion, with a 12-segment articulated body model, we show how one can estimate the forces acting on the body in terms of internal forces (joint torques), gravity, and the parameters of a contact model (e.g., the geometry and dynamics of a spring-based model). This is tested on motion capture data and video-based tracking data, with walking, jogging, cartwheels, and jumping.
The Evolution of Object Categorization and the Challenge of Image Abstraction
Cited by 5 (0 self)
Technical University. During my visit, a graduate student was kind enough to show me around Prague, including a visit to the Museum of Modern and Contemporary Art (Veletržní Palác). It was there that I saw the sculpture
FOCUS: A Generalized Method for Object Discovery for Robots that Observe and Interact with Humans
- In Proceedings of the 2006 Conference on Human-Robot Interaction, 2006
Cited by 3 (3 self)
The essence of the signal-to-symbol problem consists of associating a symbolic description of an object (e.g., a chair) with a signal (e.g., an image) that captures the real object. Robots that interact with humans in natural environments must be able to solve this problem correctly and robustly. However, the problem of providing complete object models a priori to a robot so that it can understand its environment from any viewpoint is extremely difficult to solve. Additionally, many objects have different uses, which in turn can cause ambiguities when a robot attempts to reason about the activities of a human and their interactions with those objects. In this paper, we build upon the fact that robots that co-exist with humans should have the ability to observe humans using the different objects and to learn the corresponding object definitions. We contribute an object recognition algorithm, FOCUS, that is robust to the variations of signals, combines structure and function of an object, and generalizes to multiple similar objects. FOCUS, which stands for Finding Object Classification through Use and Structure, combines an activity recognizer capable of capturing how an object is used with a traditional visual structure processor. FOCUS learns structural properties (visual features) of objects by first knowing the object's affordance properties and observing humans interacting with that object with known activities. The strength of the method relies on the fact that we can define multiple aspects of an object model, i.e., structure and use, that are individually robust but insufficient to define the object, yet sufficient when combined.
Reasoning About the Functionality of Tools and Physical Artifacts
Cited by 3 (0 self)
Tool use is an important characteristic of intelligent human behavior. Representing, classifying and recognizing tools by their functionality can provide us new opportunities for understanding and eventually improving an agent's interaction with the physical world. Techniques have been developed in a wide range of areas within artificial intelligence and other disciplines to represent and automatically reason about the functionality of tools. This article surveys past approaches to reasoning about functionality in the literature and attempts to give an overview of the strengths and weaknesses of previous techniques. A number of issues that need to be addressed are also reviewed.
Temporal causality for the analysis of visual events
- In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010
Cited by 2 (1 self)
We present a novel approach to the causal temporal analysis of event data from video content. Our key observation is that the sequence of visual words produced by a space-time dictionary representation of a video sequence can be interpreted as a multivariate point-process. By using a spectral version of the pairwise test for Granger causality, we can identify patterns of interactions between words and group them into independent causal sets. We demonstrate qualitatively that this produces semantically meaningful groupings, and we demonstrate quantitatively that these groupings lead to improved performance in retrieving and classifying social games from unstructured videos.

