Results 1 - 10 of 174
Pictorial Structures for Object Recognition
- IJCV, 2003
Cited by 305 (13 self)
In this paper we present a statistical framework for modeling the appearance of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We use these models to address the problem of detecting an object in an image as well as the problem of learning an object model from training examples, and present efficient algorithms for both these problems. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
Algebraic Functions For Recognition
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 1994
Cited by 132 (29 self)
In the general case, a trilinear relationship between three perspective views is shown to exist. The trilinearity result is shown to be of much practical use in visual recognition by alignment --- yielding a direct reprojection method that cuts through the computations of camera transformation, scene structure and epipolar geometry. Moreover, the direct method is linear and sets a new lower theoretical bound on the minimal number of points that are required for a linear solution for the task of reprojection. The proof of the central result may be of further interest as it demonstrates certain regularities across homographies of the plane and introduces new view invariants. Experiments on simulated and real image data were conducted, including a comparative analysis with epipolar intersection and the linear combination methods, with results indicating a greater degree of robustness in practice and a higher level of performance in re-projection tasks. Keywords--- Visual Recognition, Al...
Discrete Geometric Shapes: Matching, Interpolation, and Approximation: A Survey
- Handbook of Computational Geometry, 1996
Cited by 101 (10 self)
In this survey we consider geometric techniques which have been used to measure the similarity or distance between shapes, as well as to approximate shapes, or interpolate between shapes. Shape is a modality which plays a key role in many disciplines, ranging from computer vision to molecular biology. We focus on algorithmic techniques based on computational geometry that have been developed for shape matching, simplification, and morphing.
1 Introduction
The matching and analysis of geometric patterns and shapes is of importance in various application areas, in particular in computer vision and pattern recognition, but also in other disciplines concerned with the form of objects such as cartography, molecular biology, and computer animation. The general situation is that we are given two objects A, B and want to know how much they resemble each other. Usually one of the objects may undergo certain transformations like translations, rotations or scalings in order to be matched with th...
Example Based Image Analysis and Synthesis
- 1993
Cited by 93 (26 self)
Image analysis and graphics synthesis can be achieved with learning techniques using directly image examples without physically-based, 3D models. We describe here novel techniques for the analysis and the synthesis of new grey-level (and color) images. With the first technique,
• the mapping from novel images to a vector of "pose" and "expression" parameters can be learned from a small set of example images using a function approximation technique that we call an analysis network;
• the inverse mapping from input "pose" and "expression" parameters to output grey-level images can be synthesized from a small set of example images and used to produce new images under real-time control using a similar learning network, called in this case a synthesis network.
This technique relies on (i) using a correspondence algorithm that matches corresponding pixels among pairs of grey-level images and effectively "vectorizes" them, and (ii) exploiting a class of multidimensional interpolation n...
An Automatic Registration Method for Frameless Stereotaxy, Image Guided Surgery, and Enhanced Reality Visualization
- 1996
Cited by 91 (12 self)
There is a need for frameless guidance systems to help surgeons plan the exact location for incisions, to define the margins of tumors and to precisely identify locations of neighboring critical structures. We have developed an automatic technique for registering clinical data, such as segmented MRI or CT reconstructions, with any view of the patient on the operating table, using a series of registration algorithms, which we demonstrate on the specific example of neurosurgery. The method enables a visual mix of live video of the patient with the segmented 3D MRI or CT model, supporting enhanced reality techniques for planning and guiding neurosurgical procedures and for interactively viewing extracranial or intracranial structures non-intrusively. Extensions of the method include image guided biopsies, focused therapeutic procedures and clinical studies involving change detection over time sequences of images.
1 Artificial Intelligence Laboratory, Massachusetts Institute of Tech...
On Photometric Issues in 3D Visual Recognition From A Single 2D Image
- International Journal of Computer Vision, 1997
Cited by 89 (6 self)
We describe the problem of recognition under changing illumination conditions and changing viewing positions from a computational and human vision perspective. On the computational side we focus on the mathematical problems of creating an equivalence class for images of the same 3D object undergoing certain groups of transformations --- mostly those due to changing illumination, and briefly discuss those due to changing viewing positions. The computational treatment culminates in proposing a simple scheme for recognizing, via alignment, an image of a familiar object taken from a novel viewing position and a novel illumination condition. On the human vision aspect, the paper is motivated by empirical evidence inspired by Mooney images of faces that suggest a relatively high level of visual processing is involved in compensating for photometric sources of variability, and furthermore, that certain limitations on the admissible representations of image information may exist. The psycho...
Control of Selective Perception Using Bayes Nets and Decision Theory
- 1993
Cited by 87 (1 self)
A selective vision system sequentially collects evidence to support a specified hypothesis about a scene, as long as the additional evidence is worth the effort of obtaining it. Efficiency comes from processing the scene only where necessary, to the level of detail necessary, and with only the necessary operators. Knowledge representation and sequential decision-making are central issues for selective vision, which takes advantage of prior knowledge of a domain's abstract and geometrical structure and models for the expected performance and cost of visual operators. The TEA-1 selective vision system uses Bayes nets for representation and benefit-cost analysis for control of visual and non-visual actions. It is the high-level control for an active vision system, enabling purposive behavior, the use of qualitative vision modules and a pointable multiresolution sensor. TEA-1 demonstrates that Bayes nets and decision theoretic techniques provide a general, re-usable framework for constructi...
Fast and Globally Convergent Pose Estimation From Video Images
- 1998
Cited by 76 (3 self)
Determining the rigid transformation relating 2D images to known 3D geometry is a classical problem in photogrammetry and computer vision. Heretofore, the best methods for solving the problem have relied on iterative optimization methods which cannot be proven to converge and/or which do not effectively account for the orthonormal structure of rotation matrices. We show that the pose estimation problem can be formulated as that of minimizing an error metric based on collinearity in object (as opposed to image) space. Using object space collinearity error, we derive an iterative algorithm which directly computes orthogonal rotation matrices and which is globally convergent. Experimentally, we show that the method is computationally efficient, that it is no less accurate than the best currently employed optimization methods, and that it outperforms all tested methods in robustness to outliers.
Chien-Ping Lu, Silicon Graphics Inc. cplu@engr.sgi.com
† Greg Hager, Department of Computer...
A Statistical Approach to 3D Object Detection Applied to Faces and Cars
- 2000
Cited by 75 (1 self)
In this thesis, we describe a statistical method for 3D object detection. In this method, we decompose the 3D geometry of each object into a small number of viewpoints. For each viewpoint, we construct a decision rule that determines if the object is present at that specific orientation. Each decision rule uses the statistics of both object appearance and "non-object" visual appearance. We represent each set of statistics using a product of histograms. Each histogram represents the joint statistics of a subset of wavelet coefficients and their position on the object. Our approach is to use many such histograms representing a wide variety of visual attributes. Using this method, we have developed the first algorithm that can reliably detect faces that vary from frontal view to full profile view and the first algorithm that can reliably detect cars over a wide range of viewpoints.
Structure from motion without correspondence
- In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2000
Cited by 63 (4 self)
A method is presented to recover 3D scene structure and camera motion from multiple images without the need for correspondence information. The problem is framed as finding the maximum likelihood structure and motion given only the 2D measurements, integrating over all possible assignments of 3D features to 2D measurements. This goal is achieved by means of an algorithm which iteratively refines a probability distribution over the set of all correspondence assignments. At each iteration a new structure from motion problem is solved, using as input a set of 'virtual measurements' derived from this probability distribution. The distribution needed can be efficiently obtained by Markov Chain Monte Carlo sampling. The approach is cast within the framework of Expectation-Maximization, which guarantees convergence to a local maximizer of the likelihood. The algorithm works well in practice, as will be demonstrated using results on several real image sequences.

