The Statistics of Optical Flow
Computer Vision and Image Understanding, 1999
"... When processing image sequences some representation of image motion must be derived as a first stage. The most often used such representation is the optical flow field, which is a set of velocity measurements of image patterns. It is well known that it is very difficult to estimate accurate optical ..."
Abstract

Cited by 33 (6 self)
When processing image sequences some representation of image motion must be derived as a first stage. The most often used such representation is the optical flow field, which is a set of velocity measurements of image patterns. It is well known that it is very difficult to estimate accurate optical flow at locations in an image which correspond to scene discontinuities. What is less well known, however, is that even at locations corresponding to smooth scene surfaces, the optical flow field often cannot be estimated accurately. Noise in the data causes many optical flow estimation techniques to give biased flow estimates. Very often there is consistent bias: the estimate tends to be an underestimate in length and to point in a direction closer to the majority of the gradients in the patch. This paper studies all three major categories of flow estimation methods (gradient-based, energy-based, and correlation methods) and analyzes different ways of compounding one-dimensional motion ...
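The bias described here is the classic errors-in-variables attenuation of least squares. Below is a minimal Python sketch (my own illustration under assumed noise and orientation distributions, not the paper's estimator): a Lucas-Kanade-style solve on a patch with noisy derivatives underestimates the flow and pulls it toward the dominant gradient orientation.

```python
import numpy as np

rng = np.random.default_rng(0)

# True image velocity of the patch (pixels per frame).
v_true = np.array([1.0, 1.0])

# Spatial gradients: 90% cluster near the x-axis (the "majority"
# orientation), the rest are spread uniformly.
n = 500
dominant = rng.random(n) < 0.9
angles = np.where(dominant,
                  rng.normal(0.0, 0.15, n),
                  rng.uniform(0.0, np.pi, n))
mags = rng.uniform(0.5, 1.5, n)
G = mags[:, None] * np.column_stack([np.cos(angles), np.sin(angles)])

# Brightness constancy: Ix*u + Iy*v + It = 0, so It = -g . v_true.
It = -G @ v_true

# The measured derivatives are noisy (errors in variables).
sigma = 0.3
G_noisy = G + rng.normal(0.0, sigma, G.shape)
It_noisy = It + rng.normal(0.0, sigma, n)

# Ordinary least squares on the noisy system.
v_hat, *_ = np.linalg.lstsq(G_noisy, -It_noisy, rcond=None)

print("true flow:", v_true, " length:", np.linalg.norm(v_true))
print("estimate :", v_hat,  " length:", np.linalg.norm(v_hat))
# Typically the estimate is shorter than the true flow and rotated
# toward the dominant (x-axis) gradient direction.
```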
Aloimonos, Ambiguity in structure from motion: Sphere versus plane
International Journal of Computer Vision, 1998
"... Abstract. If 3D rigid motion can be correctly estimated from image sequences, the structure of the scene can be correctly derived using the equations for image formation. However, an error in the estimation of 3D motion will result in the computation of a distorted version of the scene structure. Of ..."
Abstract

Cited by 19 (6 self)
If 3D rigid motion can be correctly estimated from image sequences, the structure of the scene can be correctly derived using the equations for image formation. However, an error in the estimation of 3D motion will result in the computation of a distorted version of the scene structure. Of computational interest are those regions in space where the distortions are such that the depths become negative, because for the scene to be visible it has to lie in front of the image, and thus the corresponding depth estimates have to be positive. The stability analysis for the structure from motion problem presented in this paper investigates the optimal relationship between the errors in the estimated translational and rotational parameters of a rigid motion, namely the relationship that results in the estimation of a minimum number of negative depth values. The input used is the value of the flow along some direction, which is more general than optic flow or correspondence. For a planar retina it is shown that the optimal configuration is achieved when the projections of the translational and rotational errors on the image plane are perpendicular. Furthermore, the projections of the actual and the estimated translation lie on a line through the center. For a spherical retina, given a rotational error, the optimal translation is the correct one; given a translational error, the optimal rotational error depends in both direction and magnitude on the actual and estimated translation as well as the scene in view. The proofs, besides illuminating the confounding of translation and rotation in structure from motion, have an important application to ecological optics. The same analysis provides a computational explanation of why it is ...
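For reference, the planar-retina flow model behind this kind of analysis, in its standard Longuet-Higgins and Prazdny form (the abstract does not spell it out; this is the usual formulation, not a quotation from the paper):

```latex
% Rigid-motion image flow on a planar retina: focal length f, image point
% (x, y), depth Z, translation t = (U, V, W), rotation omega.
\[
\dot{\mathbf{x}}
  = \frac{1}{Z}\,A(\mathbf{x})\,\mathbf{t} + B(\mathbf{x})\,\boldsymbol{\omega},
\qquad
A = \begin{pmatrix} -f & 0 & x \\ 0 & -f & y \end{pmatrix},
\qquad
B = \begin{pmatrix}
      \tfrac{xy}{f} & -\bigl(f + \tfrac{x^{2}}{f}\bigr) & y \\[3pt]
      f + \tfrac{y^{2}}{f} & -\tfrac{xy}{f} & -x
    \end{pmatrix}.
\]
% Inverting the same model with estimated motion (\hat{\mathbf{t}},
% \hat{\boldsymbol{\omega}}) and the flow component measured along a
% direction \mathbf{n} gives the estimated depth
\[
\hat{Z}
  = \frac{\mathbf{n}^{\top} A\,\hat{\mathbf{t}}}
         {\mathbf{n}^{\top}\!\bigl(\tfrac{1}{Z}\,A\,\mathbf{t}
            + B\,(\boldsymbol{\omega} - \hat{\boldsymbol{\omega}})\bigr)};
\]
% the "negative depth" regions are the configurations with \hat{Z} < 0
% even though the true depth Z is positive.
```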
The Ouchi illusion as an artifact of biased flow estimation
2000
"... A pattern by Ouchi has the surprising property that small motions can cause illusory relative motion between the inset and background regions. The effect can be attained with small retinal motions or a slight jiggling of the paper and is robust over large changes in the patterns, frequencies and bou ..."
Abstract

Cited by 13 (8 self)
A pattern by Ouchi has the surprising property that small motions can cause illusory relative motion between the inset and background regions. The effect can be attained with small retinal motions or a slight jiggling of the paper and is robust over large changes in the patterns, frequencies and boundary shapes. In this paper, we explain that the cause of the illusion lies in the statistical difficulty of integrating local one-dimensional motion signals into two-dimensional image velocity measurements. The estimation of image velocity generally is biased, and for the particular spatial gradient distributions of the Ouchi pattern the bias is highly pronounced, giving rise to a large difference in the velocity estimates in the two regions. The computational model introduced to describe the statistical estimation of image velocity also accounts for the findings of psychophysical studies with variations of the Ouchi pattern and for various findings on the perception of moving plaids. The insight gained from this computational study challenges the current models used to explain biological vision systems and to construct robotic vision systems. Considering the statistical difficulties in image velocity estimation in conjunction with the problem of discontinuity detection in motion fields suggests that theoretically the process of optical flow computation should not be carried out in isolation but in conjunction with the higher-level processes of 3D motion estimation, segmentation and shape computation.
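A quick numerical illustration of the proposed mechanism, reusing the biased least-squares estimator sketched above (the orientation distributions here are hypothetical stand-ins for the two Ouchi regions):

```python
import numpy as np

rng = np.random.default_rng(1)

def biased_flow_estimate(orientation, v_true, n=2000, sigma=0.3, spread=0.1):
    """Least-squares flow estimate for a patch whose spatial gradients
    cluster around `orientation` (radians); noise in the measured
    derivatives biases the solve toward that orientation."""
    angles = rng.normal(orientation, spread, n)
    G = np.column_stack([np.cos(angles), np.sin(angles)])
    It = -G @ v_true
    G_noisy = G + rng.normal(0.0, sigma, G.shape)
    It_noisy = It + rng.normal(0.0, sigma, n)
    v_hat, *_ = np.linalg.lstsq(G_noisy, -It_noisy, rcond=None)
    return v_hat

# Both regions undergo the SAME true retinal motion ...
v_true = np.array([1.0, 0.0])

# ... but their textures have roughly orthogonal dominant gradient
# orientations, as in the Ouchi figure (angles chosen for illustration).
v_background = biased_flow_estimate(np.deg2rad(45.0), v_true)
v_inset      = biased_flow_estimate(np.deg2rad(-45.0), v_true)

print("background estimate:", v_background)
print("inset estimate     :", v_inset)
print("illusory relative motion:", v_background - v_inset)
# The two biased estimates differ, so the regions appear to move
# relative to each other although the physical motion is identical.
```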
Analyzing Action Representations
Workshop on Algebraic Frames for the Perception-Action Cycle, LNCS 1888, 2000
"... . We argue that actions represent the basic seed of intelligence underlying perception of the environment, and the representations encoding actions should be the starting point upon which further studies of cognition are built. In this paper we make a first effort in characterizing these action ..."
Abstract

Cited by 2 (1 self)
We argue that actions represent the basic seed of intelligence underlying perception of the environment, and that the representations encoding actions should be the starting point upon which further studies of cognition are built. In this paper we make a first effort at characterizing these action representations. In particular, from the study of simple actions related to 3D rigid motion interpretation, we deduce a number of principles for the possible computations responsible for the interpretation of space-time geometry. Using these principles, we then discuss possible avenues for analyzing the representations of more complex human actions. During the late eighties, with the emergence of active vision, it was realized that vision should not be studied in a vacuum but in conjunction with action. Adopting a purposive viewpoint makes visual computations easier by placing them in the context of larger processes that accomplish tasks ...
Effects of Errors in the Viewing Geometry on Shape Estimation
1996
"... A sequence of images acquired by a moving sensor contains information about the threedimensional motion of the sensor and the shape of the imaged scene. Interesting research during the past few years has attempted to characterize the errors that arise in computing 3D motion (egomotion estimation) a ..."
Abstract
 Add to MetaCart
A sequence of images acquired by a moving sensor contains information about the three-dimensional motion of the sensor and the shape of the imaged scene. Interesting research during the past few years has attempted to characterize the errors that arise in computing 3D motion (egomotion estimation) as well as the errors that result in the estimation of the scene's structure (structure from motion). Previous research is characterized by the use of optic flow or correspondence of features in the analysis, as well as by the employment of particular algorithms and models of the scene in recovering expressions for the resulting errors. This paper presents a geometric framework that characterizes the relationship between 3D motion and shape in the presence of errors. We examine how the three-dimensional space recovered by a moving monocular observer, whose 3D motion is estimated with some error, is distorted. We characterize the space of distortions by its level sets; that is, we characterize the systematic distortion via a family of iso-distortion surfaces, which describes the locus over which the depths of points in the scene in view are distorted by the same multiplicative factor. The framework introduced in this way has a number of applications: since the visible surfaces have positive depth (visibility constraint), by analyzing the geometry of the regions where the distortion factor is negative, that is, where the visibility constraint is violated, we make explicit situations which are likely to give rise to ambiguities in motion estimation, independent of the algorithm used. We provide a uniqueness analysis for 3D motion analysis from normal flow. We study the constraints on egomotion, object motion, and depth for an independently moving object to be detectable by a moving observer, and we offer a quantitative account of the precision needed in an inertial sensor for accurate estimation of 3D motion.
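In the notation of the planar-retina model given earlier, the iso-distortion idea can be summarized as follows (a sketch under the same normal-flow assumptions, not a quotation from the paper):

```latex
% Writing the estimated depth as a multiple of the true depth,
% \hat{Z} = D \, Z, the distortion factor for flow measured along n is
\[
D = \frac{\mathbf{n}^{\top} A\,\hat{\mathbf{t}}}
         {\mathbf{n}^{\top}\!\bigl(A\,\mathbf{t}
            + Z\,B\,(\boldsymbol{\omega} - \hat{\boldsymbol{\omega}})\bigr)} .
\]
% An iso-distortion surface is a level set of D: the locus of points
% (x, y, Z) whose depths are distorted by the same factor,
\[
S_{D_{0}} = \{\, (x, y, Z) \;:\; D(x, y, Z; \mathbf{n}) = D_{0} \,\}.
\]
% The visibility constraint \hat{Z} > 0 then marks the regions with D < 0
% as those where the assumed motion errors contradict a visible scene.
```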
New eyes for building models from video
"... Models of realworld objects and actions for use in graphics, virtual and augmented reality and related fields can only be obtained through the use of visual data and particularly video. This paper examines the question of recovering shape models from video information. Given video of an object or a ..."
Abstract
 Add to MetaCart
Models of real-world objects and actions for use in graphics, virtual and augmented reality, and related fields can only be obtained through the use of visual data, particularly video. This paper examines the question of recovering shape models from video information. Given video of an object or a scene captured by a moving camera, a prerequisite for model building is to recover the three-dimensional (3D) motion of the camera, which consists of a rotation and a translation at each instant. It is shown here that a spherical eye (an eye or system of eyes providing panoramic vision) is superior to a camera-type eye (an eye with a restricted field of view, such as a common video camera) as regards the competence of 3D motion estimation. This result is derived from a geometric/statistical analysis of all the possible computational models that can be used for estimating 3D motion from an image sequence. Regardless of the estimation procedure for a camera-type eye, the parameters of the 3D rigid motion (translation and rotation) contain errors satisfying specific geometric constraints. Thus, translation is always confused with rotation, resulting in inaccurate results. This confusion does not happen in the case of panoramic vision. Insights obtained from this study point to new ways of constructing powerful imaging devices that suit particular tasks in visualization and virtual reality better than conventional cameras, thus leading to a new camera technology. Such new eyes are constructed by putting together multiple existing video cameras in specific ways, thus obtaining eyes from eyes. For a new eye of this kind we describe an implementation for deriving models of scenes from video data, while avoiding the correspondence problem in the video sequence.
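The translation-rotation confusion for a restricted field of view can be checked numerically. A minimal sketch (the sampling scheme, sign conventions, and similarity measure are my own, not the paper's): over a narrow cone of viewing directions, the flow field of a sideways translation is nearly parallel to that of a rotation about the perpendicular axis, while over a full sphere the two fields are close to orthogonal.

```python
import numpy as np

rng = np.random.default_rng(2)

def flow_on_sphere(dirs, t, omega, depths):
    """Motion field seen by a spherical eye at unit viewing directions
    `dirs` (n x 3): translational part (t projected onto the tangent
    plane, scaled by inverse depth) plus rotational part. Sign
    conventions vary in the literature; they do not affect the
    similarity measure below."""
    radial = (dirs * t).sum(axis=1, keepdims=True)
    trans = (radial * dirs - t) / depths[:, None]
    rot = -np.cross(omega, dirs)
    return trans + rot

def confusion(max_angle_deg, n=5000):
    """|cosine similarity| between a pure-translation field (along x)
    and a pure-rotation field (about y), for viewing directions within
    `max_angle_deg` of the optical axis; 180 deg covers the sphere."""
    cos_max = np.cos(np.deg2rad(max_angle_deg))
    z = rng.uniform(cos_max, 1.0, n)               # uniform on the cap
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    s = np.sqrt(1.0 - z**2)
    dirs = np.column_stack([s * np.cos(phi), s * np.sin(phi), z])
    depths = rng.uniform(5.0, 10.0, n)
    f_t = flow_on_sphere(dirs, np.array([1.0, 0.0, 0.0]), np.zeros(3), depths)
    f_r = flow_on_sphere(dirs, np.zeros(3), np.array([0.0, 0.2, 0.0]), depths)
    return abs(np.vdot(f_t, f_r)) / (np.linalg.norm(f_t) * np.linalg.norm(f_r))

print("narrow field of view (20 deg):", confusion(20.0))    # near 1: confusable
print("panoramic / spherical (180 deg):", confusion(180.0)) # near 0: separable
```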