Unsupervised learning of models for recognition
- In ECCV, 2000
Cited by 222 (19 self)
Abstract. We present a method to learn object class models from unlabeled and unsegmented cluttered scenes for the purpose of visual object recognition. We focus on a particular type of model where objects are represented as flexible constellations of rigid parts (features). The variability within a class is represented by a joint probability density function (pdf) on the shape of the constellation and the output of part detectors. In a first stage, the method automatically identifies distinctive parts in the training set by applying a clustering algorithm to patterns selected by an interest operator. It then learns the statistical shape model using expectation maximization. The method achieves very good classification results on human faces and rear views of cars.
1 Introduction and Related Work
We are interested in the problem of recognizing members of object classes, where we define an object class as a collection of objects which share characteristic features or parts that are visually similar and occur in similar spatial configurations. When building models for object classes of this type, one is faced with three problems (see Fig. 1).
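The first stage described above (clustering patterns picked out by an interest operator) can be illustrated with a small sketch. The abstract does not name the clustering algorithm, so plain k-means is an assumption here, as are the function name `kmeans`, the toy 2-D vectors standing in for image patches, and the choice to initialise the centres from the first k vectors.

```python
def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Assumption for illustration: centres start at the first k vectors,
    which keeps the sketch deterministic.
    """
    centres = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centre.
        for n, v in enumerate(vectors):
            labels[n] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centres[c])),
            )
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels


# Hypothetical "patterns": two well-separated groups of 2-D vectors.
patterns = [(0, 0), (10, 10), (0.2, 0.1), (9.8, 10.1), (0.1, -0.1), (10.2, 9.9)]
centres, labels = kmeans(patterns, k=2)
```

Each resulting centre would play the role of a candidate part; the shape model over part positions is then a separate step learned with expectation maximization, which this sketch does not cover.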
A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry
, 1998
Cited by 111 (9 self)
Many object classes, including human faces, can be modeled as a set of characteristic parts arranged in a variable spatial configuration. We introduce a simplified model of a deformable object class and derive the optimal detector for this model. However, the optimal detector is not realizable except under special circumstances (independent part positions). A cousin of the optimal detector is developed which uses "soft" part detectors with a probabilistic description of the spatial arrangement of the parts. Spatial arrangements are modeled probabilistically using shape statistics to achieve invariance to translation, rotation, and scaling. Improved recognition performance over methods based on "hard" part detectors is demonstrated for the problem of face detection in cluttered scenes.
1 Introduction
Visual recognition of objects (chairs, sneakers, faces, cups, cars) is one of the most challenging problems in computer vision and artificial intelligence. Historically, there has been a...
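One common way to obtain invariance to translation, rotation, and scaling is to express part positions relative to two reference parts. The sketch below shows that idea only; it is not taken from the paper, and the function names `shape_coords` and `similarity` as well as the example points are assumptions made for illustration.

```python
import cmath


def shape_coords(points, i=0, j=1):
    """Map reference part i to 0 and part j to 1 in the complex plane
    and return the transformed positions of the remaining parts."""
    z = [complex(x, y) for x, y in points]
    base = z[j] - z[i]
    if abs(base) < 1e-12:
        raise ValueError("reference parts coincide")
    return [(p - z[i]) / base for k, p in enumerate(z) if k not in (i, j)]


def similarity(points, scale, angle, shift):
    """Apply a scaling, a rotation (radians) and a translation to 2-D points."""
    w = cmath.rect(scale, angle)
    moved = []
    for x, y in points:
        q = w * complex(x, y) + complex(*shift)
        moved.append((q.real, q.imag))
    return moved


# Hypothetical part positions, then the same parts after a similarity transform.
parts = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -2.0)]
moved = similarity(parts, scale=1.7, angle=0.6, shift=(3.0, -1.0))
before = shape_coords(parts)
after = shape_coords(moved)
```

The two coordinate lists agree up to floating-point error, so a density defined over these coordinates is unaffected by where, how large, or at what in-plane angle the object appears.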
Towards Automatic Discovery of Object Categories
, 2000
Cited by 94 (7 self)
We propose a method to learn heterogeneous models of object classes for visual recognition. The training images contain a preponderance of clutter and learning is unsupervised. Our models represent objects as probabilistic constellations of rigid parts (features). The variability within a class is represented by a joint probability density function on the shape of the constellation and the appearance of the parts. Our method automatically identifies distinctive features in the training set. The set of model parameters is then learned using expectation maximization (see the companion paper [11] for details). When trained on different, unlabeled and unsegmented views of a class of objects, each component of the mixture model can adapt to represent a subset of the views. Similarly, different component models can also "specialize" on sub-classes of an object class. Experiments on images of human heads, leaves from different species of trees, and motor-cars demonstrate that the method works...
Locating Salient Object Features
- Proc. of British Machine Vision Conference, 1998
Cited by 29 (3 self)
We present a method for locating salient object features. Salient features are those which have a low probability of being misclassified as any other feature, and are therefore more easily found in a similar image containing an example of the object. The local image structure can be described by vectors extracted using a standard 'feature extractor' at a range of scales. We train statistical models for each feature, using vectors taken from a number of training examples. The feature models can then be used to find the probability of misclassifying a feature as any of the other features. Low probabilities indicate a salient feature. Results are presented showing that salient features can be relocated more reliably than features chosen using previous methods, including hand-picked features.
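The saliency criterion can be sketched roughly as follows. The abstract does not say which statistical model is used, so the isotropic Gaussian per feature, the empirical misclassification rate, the function names, and the toy feature names and vectors are all assumptions made for illustration.

```python
import math


def fit_gaussian(samples):
    """Fit an isotropic Gaussian: per-dimension mean, one shared variance."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    var = sum((s[d] - mean[d]) ** 2 for s in samples for d in range(dim)) / (n * dim)
    return mean, max(var, 1e-6)  # floor the variance to avoid division by zero


def log_likelihood(x, model):
    """Log density of vector x under an isotropic Gaussian model."""
    mean, var = model
    d2 = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * (d2 / var + len(x) * math.log(2 * math.pi * var))


def misclassification_rates(training):
    """training maps a feature name to its example vectors.

    Returns, per feature, the fraction of its own examples that some
    other feature's model explains better than its own model does.
    """
    models = {name: fit_gaussian(samples) for name, samples in training.items()}
    rates = {}
    for name, samples in training.items():
        wrong = 0
        for x in samples:
            best = max(models, key=lambda m: log_likelihood(x, models[m]))
            if best != name:
                wrong += 1
        rates[name] = wrong / len(samples)
    return rates


# Hypothetical training vectors for three features.
training = {
    "eye_corner": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0)],
    "cheek_a": [(5.0, 5.0), (5.2, 5.1), (4.9, 5.0), (5.1, 4.8)],
    "cheek_b": [(5.1, 5.0), (5.0, 5.2), (5.2, 4.9), (4.8, 5.1)],
}
rates = misclassification_rates(training)
most_salient = min(rates, key=rates.get)
```

In this toy data the two cheek features overlap and can be confused with each other, while the eye-corner vectors sit far from both, so the eye corner gets the lowest rate and would be ranked as the most salient feature.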
From Regular Images to Animated Heads: A Least Squares Approach
- in European Conference on Computer Vision, 1998
Cited by 23 (8 self)
We show that we can effectively fit arbitrarily complex animation models to noisy image data. Our approach is based on least-squares adjustment using a set of progressively finer control triangulations and takes advantage of three complementary sources of information: stereo data, silhouette edges and 2-D feature points.
Probabilistic Affine Invariants for Recognition
- In Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recogn, 1998
Cited by 22 (3 self)
Under a weak perspective camera model, the image plane coordinates in different views of a planar object are related by an affine transformation. Because of this property, researchers have attempted to use affine invariants for recognition. However, there are two problems with this approach: (1) objects or object classes with inherent variability cannot be adequately treated using invariants; and (2) in practice the calculated affine invariants can be quite sensitive to errors in the image plane measurements. In this paper we use probability distributions to address both of these difficulties. Under the assumption that the feature positions of a planar object can be modeled using a jointly Gaussian density, we have derived the joint density over the corresponding set of affine coordinates. Even when the assumptions of a planar object and a weak perspective camera model do not strictly hold, the results are useful because deviations from the ideal can be treated as deformability in the ...
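The property the abstract starts from, that affine coordinates of a planar point set do not change under an affine transformation, can be checked with a short sketch. It covers only the noise-free invariance, not the probabilistic treatment the paper proposes; the function names and the example points and transform are assumptions made for illustration.

```python
def affine_coords(p0, p1, p2, q):
    """Return (a, b) such that q = p0 + a*(p1 - p0) + b*(p2 - p0)."""
    ux, uy = p1[0] - p0[0], p1[1] - p0[1]
    vx, vy = p2[0] - p0[0], p2[1] - p0[1]
    wx, wy = q[0] - p0[0], q[1] - p0[1]
    det = ux * vy - uy * vx
    if abs(det) < 1e-12:
        raise ValueError("basis points are collinear")
    # Cramer's rule for the 2x2 system a*u + b*v = w.
    a = (wx * vy - wy * vx) / det
    b = (ux * wy - uy * wx) / det
    return a, b


def apply_affine(A, t, p):
    """Apply the map x -> A x + t to a 2-D point p."""
    return (
        A[0][0] * p[0] + A[0][1] * p[1] + t[0],
        A[1][0] * p[0] + A[1][1] * p[1] + t[1],
    )


# Three basis features and a fourth feature, in one view.
p0, p1, p2, q = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.25, 0.5)
# A hypothetical second view related to the first by an affine map.
A, t = [[2.0, 1.0], [-1.0, 3.0]], (5.0, -2.0)
view1 = affine_coords(p0, p1, p2, q)
view2 = affine_coords(*(apply_affine(A, t, p) for p in (p0, p1, p2, q)))
```

Both views give the same pair (a, b) up to floating-point error. With noisy measurements the division by `det` can amplify small errors when the basis points are nearly collinear, which is one way to picture the sensitivity the abstract mentions.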
Animated Heads from Ordinary Images: A Least Squares Approach
- Computer Vision and Image Understanding, 1999
Cited by 13 (2 self)
We show that we can effectively fit arbitrarily complex animation models to noisy data extracted from ordinary face images. Our approach is based on least-squares adjustment using a set of progressively finer control triangulations and takes advantage of three complementary sources of information: stereo data, silhouette edges and 2-D feature points. In this way, complete head models, including ears and hair, can be acquired with a cheap and entirely passive sensor, such as an ordinary video camera. They can then be fed to existing animation software to produce synthetic sequences.
Data Driven Refinement of Active Shape Model Search
, 1996
Cited by 10 (4 self)
Active Shape Models (ASMs) provide an efficient means of locating objects in images. By statistically modelling the shape variations in a class of objects they can rapidly and robustly fit to new examples. However, if an ASM does not represent all the shape variation exhibited by the object, the model may not be able to locate new examples accurately. This paper describes two complementary approaches to allowing additional freedom to the points which comprise the model, enabling them to fit to the image data more accurately. We present results for synthetic and real images and discuss how the methods can be used in an interactive 'bootstrap' training scheme where problems with over-constrained models are particularly important.
Correspondence Using Distinct Points Based on Image Invariants
, 1997
Cited by 4 (1 self)
We present a method, based on the idea of distinctive points, for locating point correspondences between two images of similar objects independently of scale, orientation and position. Distinctive points are those which have a low probability of being mistaken with other points, and therefore are more likely to be correctly located in a similar image. The local image structure at each image point is described by vectors of Cartesian differential invariants computed at a range of scales. Distinctive points lie in low density regions of the distribution of all vectors of invariants found in an image. The vectors of invariants of distinct points are used to locate similar points in a second image. Results of applying this technique to find correspondences between images of faces are shown.
1 Introduction
A common problem in computer vision is that of establishing correspondences between images of similar objects. For images of objects with fixed 3D geometry 5 correspondences a...
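The statement that distinctive points lie in low-density regions of the distribution of invariant vectors can be sketched as below. The abstract does not specify the density estimator, so the Gaussian kernel estimate, the `bandwidth` parameter, the function names, and the toy vectors are assumptions made for illustration.

```python
import math


def kernel_density(vectors, bandwidth=1.0):
    """Unnormalised Gaussian kernel density at each vector, leaving out
    the vector itself so that isolated points score close to zero."""
    densities = []
    for i, x in enumerate(vectors):
        total = 0.0
        for j, y in enumerate(vectors):
            if i == j:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(x, y))
            total += math.exp(-d2 / (2 * bandwidth ** 2))
        densities.append(total / (len(vectors) - 1))
    return densities


def most_distinctive(vectors, k, bandwidth=1.0):
    """Indices of the k vectors with the lowest estimated density."""
    dens = kernel_density(vectors, bandwidth)
    return sorted(range(len(vectors)), key=lambda i: dens[i])[:k]


# Hypothetical invariant vectors: four similar points and one unusual one.
invariants = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.1), (0.1, -0.2), (10.0, 10.0)]
picked = most_distinctive(invariants, k=1)
```

Here `picked` is `[4]`, the index of the isolated vector. The double loop is quadratic in the number of points, which is fine for a sketch but would need a faster density estimate for every pixel of a real image.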
From Image Synthesis to Image Analysis: Using Human Animation Models to Guide Feature Extraction
- in Fifth International Symposium on the 3-D Analysis of Human Movement, 1998
Cited by 4 (2 self)
We show that we can effectively fit complex animation models to noisy image data. Our approach is based on robust least-squares adjustment and takes advantage of three complementary sources of information: stereo data, silhouette edges and 2-D feature points. In this way, complete head models, including ears and hair, can be acquired with a cheap and entirely passive sensor, such as an ordinary video camera. The motion parameters of limbs can be similarly captured. They can then be fed to existing animation software to produce synthetic sequences.

