From Few to Many: Illumination cone models for face recognition under variable lighting and pose
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2001
Abstract

Cited by 433 (12 self)
We present a generative appearance-based method for recognizing human faces under variation in lighting and viewpoint. Our method exploits the fact that the set of images of an object in fixed pose, but under all possible illumination conditions, is a convex cone in the space of images. Using a small number of training images of each face taken with different lighting directions, the shape and albedo of the face can be reconstructed. In turn, this reconstruction serves as a generative model that can be used to render, or synthesize, images of the face under novel poses and illumination conditions. The pose space is then sampled, and for each pose the corresponding illumination cone is approximated by a low-dimensional linear subspace whose basis vectors are estimated using the generative model. Our recognition algorithm assigns to a test image the identity of the closest approximated illumination cone (based on Euclidean distance within the image space). We test our face recognition method on 4050 images from the Yale Face Database B; these images contain 405 viewing conditions (9 poses × 45 illumination conditions) for 10 individuals. The method performs almost without error, except on the most extreme lighting directions, and significantly outperforms popular recognition methods that do not use a generative model.
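The recognition rule in this abstract (assign the identity whose approximating subspace is closest in Euclidean distance) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `orthonormalize`, `distance_to_subspace`, and `classify` are hypothetical, images are plain lists of pixel values, and the sketch uses one subspace per identity, ignoring the pose sampling the abstract describes.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: turn spanning vectors into an orthonormal basis."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


def distance_to_subspace(x, basis):
    """Euclidean distance from image x to the span of an orthonormal basis."""
    residual = list(x)
    for q in basis:
        c = dot(residual, q)
        residual = [r - c * qi for r, qi in zip(residual, q)]
    return sqrt(dot(residual, residual))


def classify(x, subspaces):
    """Return the identity whose subspace lies closest to image x.

    `subspaces` maps an identity label to an orthonormal basis (hypothetical layout).
    """
    return min(subspaces, key=lambda name: distance_to_subspace(x, subspaces[name]))
```

To handle pose as described in the abstract, one would presumably keep several subspaces per identity (one per sampled pose) and take the minimum distance over all of them.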
What is the Set of Images of an Object Under All Possible Lighting Conditions
 IEEE CVPR
, 1996
Abstract

Cited by 318 (28 self)
The appearance of a particular object depends on both the viewpoint from which it is observed and the light sources by which it is illuminated. If the appearance of two objects is never identical for any pose or lighting conditions, then in theory the objects can always be distinguished or recognized. The question arises: What is the set of images of an object under all lighting conditions and pose? In this paper, we consider only the set of images of an object under variable illumination (including multiple, extended light sources and attached shadows). We prove that the set of n-pixel images of a convex object with a Lambertian reflectance function, illuminated by an arbitrary number of point light sources at infinity, forms a convex polyhedral cone in R^n and that the dimension of this illumination cone equals the number of distinct surface normals. Furthermore, we show that the cone for a particular object can be constructed from three properly chosen images. Finally, we prove that the set of n-pixel images of an object of any shape and with an arbitrary reflectance function, seen under all possible illumination conditions, still forms a convex cone in R^n. These results immediately suggest certain approaches to object recognition. Throughout this paper, we offer results demonstrating the empirical validity of the illumination cone representation.
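The cone property can be illustrated with a small sketch. It assumes the standard Lambertian model for a convex object, in which pixel i under a distant point source s has intensity max(b_i · s, 0), where b_i is the surface normal scaled by albedo; the clamp at zero models attached shadows. The function names `render` and `combine` and the toy matrix of normals are hypothetical.

```python
def render(B, sources):
    """Image of a convex Lambertian object lit by distant point sources.

    B is a list of albedo-scaled surface normals (one 3-vector per pixel);
    each source is a 3-vector whose length encodes its intensity.
    """
    image = [0.0] * len(B)
    for s in sources:
        for i, b in enumerate(B):
            shading = sum(bk * sk for bk, sk in zip(b, s))
            image[i] += max(shading, 0.0)  # negative values are attached shadows
    return image


def combine(x, y, a, b):
    """Nonnegative combination a*x + b*y of two images."""
    if a < 0 or b < 0:
        raise ValueError("cone combinations need nonnegative weights")
    return [a * xi + b * yi for xi, yi in zip(x, y)]
```

Because light adds linearly and scaling a source by a nonnegative factor scales its image, `combine(render(B, [s1]), render(B, [s2]), a, b)` equals `render(B, [a*s1, b*s2])`. Any nonnegative combination of valid images is therefore itself a valid image, which is what makes the set a convex cone.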
Automatic Camera Recovery for Closed or Open Image Sequences
 In Proc. ECCV
, 1998
Abstract

Cited by 219 (18 self)
We describe progress in completely automatically recovering 3D scene structure together with 3D camera positions from a sequence of images acquired by an unknown camera undergoing unknown movement. The main departure from previous structure-from-motion strategies is that processing is not sequential. Instead, a hierarchical approach is employed, building from image triplets and associated trifocal tensors. This is advantageous both in obtaining correspondences and in optimally distributing error over the sequence. The major step forward is that closed sequences, that is, sequences where part of a scene is revisited at a later stage, can now be dealt with easily. Such sequences contain additional constraints, compared to open sequences, from which the reconstruction can now benefit. The computed cameras and structure are the backbone of a system to build texture-mapped graphical models directly from image sequences.
Illumination cones for recognition under variable lighting: Faces
 In Proc. IEEE Conf. on Comp. Vision and
, 1998
Abstract

Cited by 97 (15 self)
Due to illumination variability, the same object can appear dramatically different even when viewed in fixed pose. To handle this variability, an object recognition system must employ a representation that is either invariant to, or models, this variability. This paper presents an appearance-based method for modeling the variability due to illumination in the images of objects. The method differs from past appearance-based methods, however, in that a small set of training images is used to generate a representation (the illumination cone) which models the complete set of images of an object with Lambertian reflectance under an arbitrary combination of point light sources at infinity. This method is both an implementation and extension (an extension in that it models cast shadows) of the illumination cone representation proposed in [3]. The method is tested on a database of 660 images of 10 faces, and the results exceed those of popular existing methods.
Classification with Non-Metric Distances: Image Retrieval and Class Representation
, 2000
Abstract

Cited by 71 (0 self)
One of the key problems in appearance-based vision is understanding how to use a set of labeled images to classify new images. Classification systems that can model human performance, or that use robust image matching methods, often make use of similarity judgments that are nonmetric; but when the triangle inequality is not obeyed, most existing pattern recognition techniques are not applicable. We note that exemplar-based (or nearest-neighbor) methods can be applied naturally when using a wide class of nonmetric similarity functions. The key issue, however, is to find methods for choosing good representatives of a class that accurately characterize it. We show that existing condensing techniques for finding class representatives are ill-suited to deal with nonmetric dataspaces. We then focus on developing techniques for solving this problem, emphasizing two points: First, we show that the distance between two images is not a good measure of how well one image can represent ...
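To make the nonmetric setting concrete, the sketch below uses a median-of-absolute-differences distance as an example of a robust comparison that violates the triangle inequality, together with a plain nearest-neighbor classifier, which needs only pairwise distances and therefore still applies. The choice of this particular distance and the names `median_distance` and `nearest_neighbor` are assumptions for illustration; the abstract does not specify which similarity functions the authors use.

```python
from statistics import median


def median_distance(a, b):
    """Robust dissimilarity: the median of per-pixel absolute differences.

    A minority of badly mismatched pixels (e.g. an occlusion) has no effect,
    but the triangle inequality can fail, so this is not a metric.
    """
    return median(abs(x - y) for x, y in zip(a, b))


def nearest_neighbor(query, exemplars, distance=median_distance):
    """Label of the closest exemplar; `exemplars` is a list of (image, label) pairs."""
    best_image, best_label = min(exemplars, key=lambda pair: distance(query, pair[0]))
    return best_label


# Triangle inequality violation: a and c are far apart, yet both are "close" to b.
a = [0, 0, 0]
b = [5, 0, 0]
c = [5, 5, 0]
violates = median_distance(a, c) > median_distance(a, b) + median_distance(b, c)
```

Here `median_distance(a, b)` and `median_distance(b, c)` are both 0 while `median_distance(a, c)` is 5, so `b` would be a poor stand-in for either neighbor despite its zero distance to each. This is the kind of effect that makes choosing class representatives harder in nonmetric spaces.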
Robust Rotation and Translation Estimation in Multiview Reconstruction
Abstract

Cited by 46 (4 self)
It is known that the problem of multiview reconstruction can be solved in two steps: first estimate camera rotations and then translations using them. This paper presents new robust techniques for both of these steps. (i) Given pairwise relative rotations, global camera rotations are estimated linearly in least squares. (ii) Camera translations are estimated using a standard technique based on Second Order Cone Programming. Robustness is achieved by using only a subset of points according to a new criterion that diminishes the risk of choosing a mismatch. It is shown that only four points chosen in a special way are sufficient to represent a pairwise reconstruction almost as well as all points. This leads to a significant speedup. In image sets with repetitive or similar structures, nonexistent epipolar geometries may be found. Due to them, some rotations and consequently translations may be estimated incorrectly. It is shown that iterative removal of pairwise reconstructions with the largest residual, followed by re-registration, removes most nonexistent epipolar geometries. The performance of the proposed method is demonstrated on difficult wide baseline image sets.
Linear Fitting with Missing Data for Structure-from-Motion
 Computer Vision and Image Understanding
, 1997
Abstract

Cited by 46 (6 self)
this paper. This method is described in detail in [15]. We can briefly describe the method as formulating the least squares problem as a bilinear optimization, and then iteratively holding one set of variables constant while the others are optimized, so that each optimization is linear. We use their method in our experiments, because it has good convergence properties and is easy to implement. For the problem they consider, Shum et al. state that a random starting point is sufficient to produce a good final solution. However, their experiments on this point cannot be used to draw conclusions for the problem of determining 3D structure from a sequence of 2D images.
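The alternating scheme described in this excerpt (hold one set of variables fixed so that the other can be solved by linear least squares, then swap) can be sketched for the simplest case. The sketch assumes a rank-1 model M[i][j] ≈ u[i] * v[j], with `None` marking missing entries; the function name `als_rank1`, the all-ones starting point, and the fixed iteration count are illustrative choices and are not taken from [15]. Structure-from-motion measurement matrices have higher rank, so each half-step there would be a small multi-variable least-squares solve instead of a scalar division.

```python
def als_rank1(M, iterations=200):
    """Fit M[i][j] ~ u[i] * v[j] over the observed entries of M.

    Missing entries are None and are simply left out of every sum.
    """
    rows, cols = len(M), len(M[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iterations):
        # Hold v fixed: each u[i] is a one-variable linear least-squares problem.
        for i in range(rows):
            num = sum(M[i][j] * v[j] for j in range(cols) if M[i][j] is not None)
            den = sum(v[j] ** 2 for j in range(cols) if M[i][j] is not None)
            if den > 0:
                u[i] = num / den
        # Hold u fixed: each v[j] is solved the same way.
        for j in range(cols):
            num = sum(M[i][j] * u[i] for i in range(rows) if M[i][j] is not None)
            den = sum(u[i] ** 2 for i in range(rows) if M[i][j] is not None)
            if den > 0:
                v[j] = num / den
    return u, v


def fill_missing(M, u, v):
    """Copy of M with each missing entry replaced by the model value u[i] * v[j]."""
    return [[M[i][j] if M[i][j] is not None else u[i] * v[j]
             for j in range(len(v))] for i in range(len(u))]
```

As the excerpt cautions, good behavior from a simple starting point on one problem does not guarantee it on another, so the initialization here should be read as a placeholder.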
Structure from Many Perspective Images with Occlusions
, 2002
Abstract

Cited by 28 (11 self)
This paper proposes a method for recovery of projective shape and motion from multiple images by factorization of a matrix containing the images of all scene points. Compared to previous methods, this method can handle perspective views and occlusions jointly. The projective depths of image points are estimated by the method of Sturm & Triggs [11] using epipolar geometry. Occlusions are handled by an extension of the method of Jacobs [8] for filling in missing data. This extension can exploit the geometry of the perspective camera so that both points with known and unknown projective depths are used. Many ways of combining the two methods exist; several of them have been examined, and the one with the best results is presented. The new method gives accurate results in practical situations, as demonstrated here with a series of experiments on laboratory and outdoor image sets. It becomes clear that the method is particularly suited for wide baseline multiple view stereo.
A multiframe structure-from-motion algorithm under perspective projection
 International Journal of Computer Vision
, 1999
Abstract

Cited by 25 (2 self)
We present a fast, robust algorithm for multiframe structure from motion from point features which works for general motion and large perspective effects. The algorithm is for point features but easily extends to a direct method based on image intensities. Experiments on synthetic and real sequences show that the algorithm gives results nearly as accurate as the maximum likelihood estimate in a couple of seconds on an IRIS 10000. The results are significantly better than those of an optimal two-image estimate. When the camera projection is close to scaled orthographic, the accuracy is comparable to that of the Tomasi/Kanade algorithm, and the algorithms are comparably fast. The algorithm incorporates a quantitative theoretical analysis of the bas-relief ambiguity and exemplifies how such an analysis can be exploited to improve reconstruction. Also, we demonstrate a structure-from-motion algorithm for partially calibrated cameras, with unknown focal length varying from image to image. Unlike the projective approach, this algorithm fully exploits the partial knowledge of the calibration. It is given by a simple modification of our algorithm for calibrated sequences and is insensitive to errors in calibrating the camera center. Theoretically, we show that unknown focal-length variations strengthen the effects of the bas-relief ambiguity. This paper includes extensive experimental studies of two-frame reconstruction and the Tomasi/Kanade approach in comparison to our algorithm. We find that two-frame algorithms are surprisingly robust and accurate, despite some problems with local minima. We demonstrate experimentally that a nearly optimal