A Theory of Networks for Approximation and Learning
 Massachusetts Institute of Technology
, 1989
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
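As a rough illustration of the radial basis function idea mentioned in this abstract, the following is a minimal sketch, not the paper's GRBF algorithm. It assumes a Gaussian basis function, one center per training point (the strict interpolation case), one-dimensional inputs, and an optional ridge term `lam` standing in for the regularization parameter. The function names (`fit_rbf`, `predict_rbf`, `solve`) and the sample data are hypothetical.

```python
import math

def gaussian(r, sigma=1.0):
    """Gaussian radial basis function of the distance r (assumed basis)."""
    return math.exp(-(r * r) / (2.0 * sigma * sigma))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_rbf(xs, ys, sigma=1.0, lam=0.0):
    """Coefficients c solving (G + lam * I) c = y, with G[i][j] = gaussian(|x_i - x_j|)."""
    n = len(xs)
    G = [[gaussian(abs(xs[i] - xs[j]), sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(G, ys)

def predict_rbf(x, centers, coeffs, sigma=1.0):
    """Evaluate f(x) = sum_i c_i * gaussian(|x - t_i|)."""
    return sum(c * gaussian(abs(x - t), sigma) for c, t in zip(coeffs, centers))

# Hypothetical data: four samples of sin(x).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.sin(x) for x in xs]
coeffs = fit_rbf(xs, ys, sigma=1.0, lam=0.0)
```

With `lam=0.0` the fitted function passes through every training point; a positive `lam` trades exact interpolation for smoothness. Using fewer centers than data points, as the generalized networks in the abstract do, would turn the square system into a least-squares problem.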
Linear Object Classes and Image Synthesis From a Single Example Image
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
The need to generate new views of a 3D object from a single real image arises in several fields, including graphics and object recognition. While the traditional approach relies on the use of 3D models, we have recently introduced [1], [2], [3] simpler techniques that are applicable under restricted conditions. The approach exploits image transformations that are specific to the relevant object class, and learnable from example views of other “prototypical” objects of the same class. In this paper, we introduce such a technique by extending the notion of linear class proposed by Poggio and Vetter. For linear object classes, it is shown that linear transformations can be learned exactly from a basis set of 2D prototypical views. We demonstrate the approach on artificial objects and then show preliminary evidence that the technique can effectively “rotate” high-resolution face images from a single 2D view. Index Terms: 3D object recognition, rotation invariance, deformable templates, image synthesis.
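The linear-class idea in this abstract can be illustrated with a toy example. This is a sketch under strong assumptions, not the paper's implementation: two hypothetical prototype shapes given as 3D point lists, a novel shape that is an exact linear combination of them, and two orthographic projections standing in for the two views. All names and numbers below are made up for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def view_a(shape):
    """Frontal view: orthographic projection keeping (x, y) of each point."""
    return [c for (x, y, z) in shape for c in (x, y)]

def view_b(shape):
    """Side view: orthographic projection keeping (z, y) of each point."""
    return [c for (x, y, z) in shape for c in (z, y)]

def coefficients_2(p1, p2, x):
    """Least-squares a1, a2 with x ~ a1*p1 + a2*p2 (2x2 normal equations)."""
    g11, g12, g22 = dot(p1, p1), dot(p1, p2), dot(p2, p2)
    b1, b2 = dot(p1, x), dot(p2, x)
    det = g11 * g22 - g12 * g12
    return (b1 * g22 - b2 * g12) / det, (g11 * b2 - g12 * b1) / det

def combine(a1, a2, q1, q2):
    return [a1 * u + a2 * v for u, v in zip(q1, q2)]

# Hypothetical prototype shapes (three 3D points each).
proto1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 1.0, 1.0)]
proto2 = [(0.0, 0.0, 1.0), (2.0, 1.0, 0.0), (1.0, 2.0, 0.5)]
# Novel object: a linear combination of the prototypes in 3D.
novel = [tuple(0.3 * a + 0.7 * b for a, b in zip(s, t))
         for s, t in zip(proto1, proto2)]

# Only view A of the novel object is observed.
a1, a2 = coefficients_2(view_a(proto1), view_a(proto2), view_a(novel))
# The same coefficients, applied to the prototypes' view B, predict the new view.
predicted_b = combine(a1, a2, view_b(proto1), view_b(proto2))
true_b = view_b(novel)
```

Because both projections are linear and the prototype views in A are linearly independent, the coefficients estimated in view A carry over to view B, so `predicted_b` matches `true_b` up to floating-point error without any 3D model of the novel object being used.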
Modal Matching for Correspondence and Recognition
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1995
Modal matching is a new method for establishing correspondences and computing canonical descriptions. The method is based on the idea of describing objects in terms of generalized symmetries, as defined by each object's eigenmodes. The resulting modal description is used for object recognition and categorization, where shape similarities are expressed as the amounts of modal deformation energy needed to align the two objects. In general, modes provide a global-to-local ordering of shape deformation and thus allow for selecting which types of deformations are used in object alignment and comparison. In contrast to previous techniques, which required correspondence to be computed with an initial or prototype shape, modal matching utilizes a new type of finite element formulation that allows for an object's eigenmodes to be computed directly from available image information. This improved formulation provides greater generality and accuracy, and is applicable to data of any dimensionality. Correspondence results with 2D contour and point feature data are shown, and recognition experiments with 2D images of hand tools and airplanes are described.
Finding Naked People
, 1996
This paper demonstrates a content-based retrieval strategy that can tell whether there are naked people present in an image. No manual intervention is required. The approach combines color and texture properties to obtain an effective mask for skin regions. The skin mask is shown to be effective for a wide range of shades and colors of skin. These skin regions are then fed to a specialized grouper, which attempts to group a human figure using geometric constraints on human structure. This approach introduces a new view of object recognition, where an object model is an organized collection of grouping hints obtained from a combination of constraints on geometric properties such as the structure of individual parts, and the relationships between parts, and constraints on color and texture. The system is demonstrated to have 60% precision and 52% recall on a test set of 138 uncontrolled images of naked people, mostly obtained from the internet, and 1401 assorted control images, drawn f...
FORMS: A Flexible Object Recognition and Modeling System
 International Journal of Computer Vision
, 1995
We describe a flexible object recognition and modeling system (FORMS) which represents and recognizes animate objects from their silhouettes. This consists of a model for generating the shapes of animate objects which gives a formalism for solving the inverse problem of object recognition. We model all objects at three levels of complexity: (i) the primitives, (ii) the mid-grained shapes, which are deformations of the primitives, and (iii) objects constructed by using a grammar to join mid-grained shapes together. The deformations of the primitives can be characterized by principal component analysis or modal analysis. When doing recognition, the representations of these objects are obtained in a bottom-up manner from their silhouettes by a novel method for skeleton extraction and part segmentation based on deformable circles. These representations are then matched to a database of prototypical objects to obtain a set of candidate interpretations. These interpretations are verified in a...
A probabilistic approach to object recognition using local photometry and global geometry
 European Conference on Computer Vision
, 1998
Many object classes, including human faces, can be modeled as a set of characteristic parts arranged in a variable spatial configuration. We introduce a simplified model of a deformable object class and derive the optimal detector for this model. However, the optimal detector is not realizable except under special circumstances (independent part positions). A cousin of the optimal detector is developed which uses “soft” part detectors with a probabilistic description of the spatial arrangement of the parts. Spatial arrangements are modeled probabilistically using shape statistics to achieve invariance to translation, rotation, and scaling. Improved recognition performance over methods based on “hard” part detectors is demonstrated for the problem of face detection in cluttered scenes.
Algebraic Functions For Recognition
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1994
In the general case, a trilinear relationship between three perspective views is shown to exist. The trilinearity result is shown to be of much practical use in visual recognition by alignment, yielding a direct reprojection method that cuts through the computations of camera transformation, scene structure and epipolar geometry. Moreover, the direct method is linear and sets a new lower theoretical bound on the minimal number of points that are required for a linear solution for the task of reprojection. The proof of the central result may be of further interest as it demonstrates certain regularities across homographies of the plane and introduces new view invariants. Experiments on simulated and real image data were conducted, including a comparative analysis with epipolar intersection and the linear combination methods, with results indicating a greater degree of robustness in practice and a higher level of performance in reprojection tasks. Keywords Visual Recognition, Al...
Nonnegative tensor factorization with applications to statistics and computer vision
 In Proceedings of the International Conference on Machine Learning (ICML)
, 2005
We derive algorithms for finding a nonnegative n-dimensional tensor factorization (n-NTF) which includes the nonnegative matrix factorization (NMF) as a particular case when n = 2. We motivate the use of n-NTF in three areas of data analysis: (i) connection to latent class models in statistics, (ii) sparse image coding in computer vision, and (iii) model selection problems. We derive a “direct” positive-preserving gradient descent algorithm and an alternating scheme based on repeated multiple rank-1 problems.
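To make the n = 2 special case concrete, here is a minimal sketch of nonnegative matrix factorization. It does not reproduce the algorithms derived in the paper. Instead it swaps in the widely used multiplicative updates of Lee and Seung for the squared-error objective, which also keep every entry nonnegative. The function names, the small constant `eps`, the iteration count, and the sample matrix are all assumptions chosen for illustration.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, rank, iters=200, seed=0, eps=1e-12):
    """Factor a nonnegative matrix V (m x n) as W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive random initialization.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return W, H

# Hypothetical rank-1 nonnegative matrix.
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
W, H = nmf(V, rank=1)
```

Because each update multiplies the current factor by a ratio of nonnegative quantities, positivity is preserved without any explicit projection step. Extending the same idea to tensors with n > 2 requires one factor per mode and is not attempted here.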
An Eigenspace Update Algorithm for Image Analysis
, 1997
However, the vision research community has largely overlooked parallel developments in signal processing and numerical linear algebra concerning efficient eigenspace updating algorithms. These new developments are significant for two reasons: Adopting them will make some of the current vision algorithms more robust and efficient. More important is the fact that incremental ... this paper makes the following contributions: We provide a comparison of some of the popular techniques existing in the vision literature for SVD/KLT computations and point out the problems associated with those techniques ...