## One-shot learning of object categories (2006)

Venue: IEEE Transactions on Pattern Analysis and Machine Intelligence

Citations: 227 (17 self)

### BibTeX

@ARTICLE{Fei-fei06one-shotlearning,

author = {Li Fei-Fei and Rob Fergus and Pietro Perona},

title = {One-shot learning of object categories},

journal = {IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE},

year = {2006},

volume = {28},

pages = {594--611}

}

### Abstract

Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
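
The prior-to-posterior update described in the abstract can be illustrated with a deliberately tiny sketch. It is not the paper's constellation model or its Normal-Wishart prior. It assumes a one-dimensional Gaussian feature with known noise variance and a Gaussian prior on the category mean, standing in for knowledge carried over from previously learned categories. All names (`Gaussian`, `ml_mean`, `bayes_update`, `predictive`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Gaussian:
    """A 1-D Gaussian described by its mean and variance."""
    mean: float
    var: float


def ml_mean(xs: Sequence[float]) -> float:
    """Maximum-likelihood estimate of the mean: the sample average."""
    return sum(xs) / len(xs)


def bayes_update(prior: Gaussian, xs: Sequence[float], noise_var: float) -> Gaussian:
    """Conjugate update of a Gaussian prior on the mean (known noise variance).

    Precisions (inverse variances) add, and the posterior mean is a
    precision-weighted blend of the prior mean and the observations.
    """
    precision = 1.0 / prior.var + len(xs) / noise_var
    var = 1.0 / precision
    mean = var * (prior.mean / prior.var + sum(xs) / noise_var)
    return Gaussian(mean, var)


def predictive(posterior: Gaussian, noise_var: float) -> Gaussian:
    """Posterior predictive for a new observation: parameter uncertainty
    is added to the observation noise instead of being ignored."""
    return Gaussian(posterior.mean, posterior.var + noise_var)


if __name__ == "__main__":
    prior = Gaussian(mean=0.0, var=1.0)   # assumed prior from earlier categories
    one_shot = [2.0]                      # a single training example
    post = bayes_update(prior, one_shot, noise_var=1.0)
    print("ML mean:", ml_mean(one_shot))
    print("posterior:", post)
    print("predictive:", predictive(post, noise_var=1.0))
```

With a single example, the ML estimate sits exactly on that example and carries no notion of uncertainty, whereas the posterior is pulled toward the prior and the predictive variance stays wider than the noise alone. This is the toy analogue of the behaviour the abstract attributes to the Bayesian approach when training data is scarce.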

### Citations

8094 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...te indexing variable h, representing the assignment of features to parts, prevents such a solution. Instead, an iterative variational method that resembles the Expectation-Maximization (EM) algorithm [7] is used to estimate the variational posterior. Afterward, recognition is performed on a query image by repeating the process of detecting regions and then evaluating the regions using the model param...

2057 | Rapid Object Detection using a Boosted Cascade of Simple Features
- Viola, Jones
- 2001
Citation Context ... a many-fold larger number of training examples; as a consequence, learning one object category requires a batch process involving thousands or tens of thousands of training examples [13], [34], [39], [36]. L. Fei-Fei is with the University of Illinois Urbana-Champaign, 405 N. Mathews Ave., MC 251, Urbana, IL 61801. E-mail: feifeili@uiuc.edu. R. Fergus is with the University of Oxford, Parks Road, ...

1585 | Object recognition from local scale-invariant features
- Lowe
- 1999
Citation Context ...est to learn the models from training examples. Fifth, computational efficiency must be kept in mind. Work on recognition may be divided into two groups: recognition of individual objects [16], [20], [27], [31] and recognition of categories [2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient recogn...

1241 | Bayesian data analysis
- Gelman, Carlin, et al.
- 2004
Citation Context ...at are conjugate to their posterior distributions. In other words, a conjugate prior for a given probabilistic model is one for which the resulting posterior has the same functional form as the prior [17]. In the case of $p(\theta \mid X_t, A_t, O_{fg})$, we use a Normal-Wishart distribution as its conjugate prior. Given that $p(X, A \mid \theta)$ was chosen to be a product of Gaussians (in Section 3.2), the entire integral of (...

978 | An affine invariant interest point detector
- Mikolajczyk, Schmid
- 2002
Citation Context ...[2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient recognition [27], lighting-invariant [27], [28], and viewpoint-invariant [22], [31] representations and recognition. Categories are more general, requiring more complex representations and are more difficult to learn; most work has therefore focus...

951 | Neural network-based face detection
- Rowley, Baluja, et al.
- 1998
Citation Context ...efficiency must be kept in mind. Work on recognition may be divided into two groups: recognition of individual objects [16], [20], [27], [31] and recognition of categories [2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient recognition [27], lighting-invariant [27], [28], and viewpoint-invariant [2...

871 | Object class recognition by unsupervised scale-invariant learning
- Fergus, Perona, et al.
- 2003
Citation Context ...arameters requires a many-fold larger number of training examples; as a consequence, learning one object category requires a batch process involving thousands or tens of thousands of training examples [13], [34], [39], [36]. ...

733 | Recognition-by-components: A theory of human image understanding
- Biederman
- 1987
Citation Context ...touching them. We recognize both individuals (my mother, my office), as well as categories (a 1960s hairdo, a frog). By the time we are six years old, we recognize more than $10^4$ categories of objects [4], and keep learning more throughout our life. As we learn, we organize both objects and categories into useful and informative taxonomies and relate them to language. Replicating these abilities in th...

733 | Gradient-based learning applied to document recognition
- LeCun, Bottou, et al.
- 1998
Citation Context ...mputational efficiency must be kept in mind. Work on recognition may be divided into two groups: recognition of individual objects [16], [20], [27], [31] and recognition of categories [2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient recognition [27], lighting-invariant [27], [28], and viewpoint-...

582 | Example-based learning for view-based human face detection
- Sung, Poggio
- 1998
Citation Context ... kept in mind. Work on recognition may be divided into two groups: recognition of individual objects [16], [20], [27], [31] and recognition of categories [2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient recognition [27], lighting-invariant [27], [28], and viewpoint-invariant [22], [31] represent...

546 | Markov Chain Monte Carlo in practice
- Gilks, Richardson, et al.
- 1996
Citation Context ...e well peaked, in which case both ML and MAP are likely to yield poor models. 3.4.2 Other Inference Methods Sampling methods. At the other extreme, we can use numerical methods such as Gibbs Sampling [18] or Markov Chain Monte-Carlo (MCMC) [19] to give an accurate estimate of the integral in (3), but these can be computationally very expensive. In the constellation model, the dimensionality of $\theta$ is large...

524 | Pictorial structures for object recognition
- Felzenszwalb, Huttenlocher
Citation Context ...rts, it presents a major computational bottleneck. Imposing conditional independence by the use of a tree-structured model would reduce the complexity to $O(N^2 P)$ in learning and $O(NP)$ in recognition [11], [15]. However, in doing so, other issues arise, such as how the optimal graph structure should be chosen. Since these issues are in themselves complex and are outside the focus of this paper, for th...

483 | Comparing images using the Hausdorff distance
- Huttenlocher, Klanderman, et al.
- 1993
Citation Context ...2], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient recognition [27], lighting-invariant [27], [28], and viewpoint-invariant [22], [31] representations and recognition. Categories are more general, requiring more complex representations and are more difficult to learn; most work has therefore focused on modeling and learning. V...

464 | Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories
- Fei-Fei, Fergus, et al.
Citation Context ...n 2. A detailed review of the mathematical framework of our recognition system follows in Section 3. Section 4 briefly introduces our methods for learning the model. Detailed derivations are given in [9]. We then proceed to test our ideas experimentally. In Section 5, we give implementational details for each stage of the system, from feature detection (Section 5.1) to the experimental setup for lear...

380 | Detecting pedestrians using patterns of motion and appearance
- Viola, Jones, et al.
- 2003
Citation Context ...rning: How could we estimate models of categories from very few, one in the limit, training examples? Most researchers have focused on special-interest categories: human faces [34], [36], pedestrians [37], handwritten digits [24], and automobiles [34], [13]. Instead, we wish to develop techniques that apply equally well to any category that a human would readily recognize. With this objective in mind,...

292 | Shape matching and object recognition using low distortion correspondences
- Berg, Malik
- 2005
Citation Context ...ing categories, despite the simplicity of our implementation and the rudimentary prior we employ, will encourage other vision researchers to test their algorithms on larger and more diverse data sets [43]. ACKNOWLEDGMENTS The authors would like to thank Andrew Zisserman, David Mackay, Brian Ripley, and Joel Lindop. This work was supported by the Caltech CNSE, the UK EPSRC, and EC Project CogViSys. REF...

283 | Unsupervised learning of models for recognition
- Weber, Welling, et al.
- 2000
Citation Context ...quires a many-fold larger number of training examples; as a consequence, learning one object category requires a batch process involving thousands or tens of thousands of training examples [13], [34], [39], [36]. ...

281 | Derivative-free adaptive rejection sampling for Gibbs sampling
- Gilks
- 1992
Citation Context ...d MAP are likely to yield poor models. 3.4.2 Other Inference Methods Sampling methods. At the other extreme, we can use numerical methods such as Gibbs Sampling [18] or Markov Chain Monte-Carlo (MCMC) [19] to give an accurate estimate of the integral in (3), but these can be computationally very expensive. In the constellation model, the dimensionality of $\theta$ is large ($\sim 100$) for a reasonable number of part...

239 | Sharing features: efficient boosting procedures for multiclass object detection
- Torralba, Murphy, et al.
- 2004

177 | A Bayesian approach to unsupervised one-shot learning of object categories
- Fei-Fei, Fergus, et al.
- 2003
Citation Context ...our experiments currently use four part models for both the current algorithm and ML. 6 EXPERIMENTAL RESULTS 6.1 Data Sets In the first set of experiments, the same four object categories as in [13], [8] were used, namely, human faces, motorbikes, airplanes, and spotted cats. These data sets contain a fair amount of background clutter and scale variation, although each category is presented from a ...

165 | Learning methods for generic object recognition with invariance to pose and lighting
- LeCun, Huang, et al.

146 | A probabilistic approach to object recognition using local photometry and global geometry
- Burl, Weber, et al.
- 1998
Citation Context ...ages. Once this is known, we can evaluate R by integrating out over $\theta$. We now look at the particular object model used. 3.2 The Object Category Model Our chosen representation is a Constellation model [6], [39], [13]. Given a query image, I, we find a set of N interesting regions in the image. From these N regions, we obtain two variables: X, the locations of the regions, and A, the appearances of the re...

138 | A sparse object category model for efficient learning and exhaustive recognition
- Fergus, Perona, et al.
- 2005
Citation Context ...t presents a major computational bottleneck. Imposing conditional independence by the use of a tree-structured model would reduce the complexity to $O(N^2 P)$ in learning and $O(NP)$ in recognition [11], [15]. However, in doing so, other issues arise, such as how the optimal graph structure should be chosen. Since these issues are in themselves complex and are outside the focus of this paper, for the sake...

136 | Inferring parameters and structure of latent variable models by variational Bayes
- Attias
- 1999
Citation Context ... (in Section 3.2), the entire integral of (13) becomes a multivariate Student’s T distribution. Efficient learning schemes exist for estimating the hyper-parameters of the Normal-Wishart distribution [3], having the same computational complexity as standard ML methods. These are introduced in Section 4. 3.5 Recognition Using a Conjugate Density Parameter Posterior Having specified a functional form f...

107 | On the sensitivity of the Hough Transform for Object Recognition
- Grimson
- 1990
Citation Context ...t is best to learn the models from training examples. Fifth, computational efficiency must be kept in mind. Work on recognition may be divided into two groups: recognition of individual objects [16], [20], [27], [31] and recognition of categories [2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient ...

96 | A visual category filter for Google images
- Fergus, Perona, et al.
- 2004
Citation Context ...ntly confined to textured image patches. Alternative representations such as curve contours, which model the outline of the object, could also be used with little modification to the underlying model [14], [12]. This would allow the model to handle categories where the outline of the object is more important than its interior (e.g., bottles). 8. Currently, the background model is very simple: A unifor...

93 | A computational model for visual selection
- Amit, Geman
- 1998
Citation Context ...mples. Fifth, computational efficiency must be kept in mind. Work on recognition may be divided into two groups: recognition of individual objects [16], [20], [27], [31] and recognition of categories [2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient recognition [27], lighting-invariant [27], [28...

84 | A Statistical Approach to 3D Object Detection Applied to Faces and Cars
- Schneiderman
- 2000
Citation Context ...ers requires a many-fold larger number of training examples; as a consequence, learning one object category requires a batch process involving thousands or tens of thousands of training examples [13], [34], [39], [36]. ...

79 | Representation and detection of deformable shapes
- Felzenszwalb
- 2005
Citation Context ...onfined to textured image patches. Alternative representations such as curve contours, which model the outline of the object, could also be used with little modification to the underlying model [14], [12]. This would allow the model to handle categories where the outline of the object is more important than its interior (e.g., bottles). 8. Currently, the background model is very simple: A uniform shap...

76 | 3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints
- Rothganger, Lazebnik, et al.
- 2003
Citation Context ... learn the models from training examples. Fifth, computational efficiency must be kept in mind. Work on recognition may be divided into two groups: recognition of individual objects [16], [20], [27], [31] and recognition of categories [2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient recognition ...

75 | Recognition of planar object classes
- Burl, Perona
Citation Context .... Fifth, computational efficiency must be kept in mind. Work on recognition may be divided into two groups: recognition of individual objects [16], [20], [27], [31] and recognition of categories [2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient recognition [27], lighting-invariant [27], [28], an...

67 | saliency and image description
- Kadir, Brady, et al.
Citation Context ...derivation of the MAP parameter estimation in [10]. 5 IMPLEMENTATION 5.1 Feature Detection and Representation We use the same features as in [13]. They are found using the detector of Kadir and Brady [23]. This method finds regions that are salient over both location and scale. Gray-scale images are used as the input. The most salient regions are clustered over location and scale to give a reasonable ...

36 | Viewpoint-invariant learning and detection of human heads
- Weber, Einhauser, et al.
- 2000
Citation Context ...ing more complex representations and are more difficult to learn; most work has therefore focused on modeling and learning. Viewpoint and lighting have not been treated explicitly (exceptions include [38], [40]), but rather treated as an additional source of in-class variability. We are interested in the problem of learning and recognition of categories (as opposed to individual objects). While the li...

28 | Unsupervised Learning of Models for Object Recognition
- Weber
- 2000
Citation Context ... statistical models and probabilistic detection techniques developed by [5], [13], [26], [39], which will be reviewed in Section 3.2. A comprehensive treatment of these models may be found in Weber’s [41] and Fergus’ [42] PhD theses. 3 THEORETICAL APPROACH 3.1 Overall Bayesian Framework Let’s say that we are looking for a flamingo bird in a query image that is presented to us. To decide whether there ...

25 | Combining class-specific fragments for object classification
- Sali, Ullman
- 1999
Citation Context ...ency must be kept in mind. Work on recognition may be divided into two groups: recognition of individual objects [16], [20], [27], [31] and recognition of categories [2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient recognition [27], lighting-invariant [27], [28], and viewpoint-invariant [22], [3...

23 | A View of the EM Algorithm that
- Neal, Hinton
- 1998
Citation Context ...they are less attractive as compared with methods giving a distinct solution. Recursive Approximations. A variety of variational approximations exist that are recursive or incremental in nature [21], [29]. In such schemes, the data points are processed sequentially with the (approximate) marginal posterior $p(\theta \mid X_t, A_t, O)$ being updated after each new data point. We explore one of such methods in [9]. ...

12 | Variational Bayes for d-dimensional Gaussian mixture models
- Penny
- 2001
Citation Context ...stribution over the shape mean conditioned on the precision matrix is Normal: $p(\mu^X_\omega \mid \Gamma^X_\omega) = \mathcal{G}(\mu^X_\omega \mid m^X_\omega, \beta^X_\omega \Gamma^X_\omega)$. Together, the shape distribution $p(\mu^X_\omega, \Gamma^X_\omega)$ is a Normal-Wishart density [3], [30]. Note that $\{\lambda_\omega, a_\omega, B_\omega, m_\omega, \beta_\omega\}$ are hyper-parameters for defining distributions of model parameters. Identical expressions apply to the appearance component in (15). We will show an empirical way of obta...

10 | Some examples of recursive variational approximations
- Humphreys, Titterington
- 2001
Citation Context ...ence, they are less attractive as compared with methods giving a distinct solution. Recursive Approximations. A variety of variational approximations exist that are recursive or incremental in nature [21], [29]. In such schemes, the data points are processed sequentially with the (approximate) marginal posterior $p(\theta \mid X_t, A_t, O)$ being updated after each new data point. We explore one of such methods in...

10 | Visual Object Category Recognition
- Fergus
- 2005
Citation Context ...ls and probabilistic detection techniques developed by [5], [13], [26], [39], which will be reviewed in Section 3.2. A comprehensive treatment of these models may be found in Weber’s [41] and Fergus’ [42] PhD theses. 3 THEORETICAL APPROACH 3.1 Overall Bayesian Framework Let’s say that we are looking for a flamingo bird in a query image that is presented to us. To decide whether there is a flamingo bir...

5 | Finding Faces
- Leung, Burl, et al.
- 1995
Citation Context ...ional efficiency must be kept in mind. Work on recognition may be divided into two groups: recognition of individual objects [16], [20], [27], [31] and recognition of categories [2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on efficient recognition [27], lighting-invariant [27], [28], and viewpoint-invari...

1 | supplemental material
- Fei-Fei, Fergus, et al.
- 2006
Citation Context ... This is called the Maximum A Posteriori (MAP) estimation. $\theta^{MAP} = \arg\max_{\theta} \, p(X_t, A_t \mid \theta)\, p(\theta)$ (12). The form of $p(\theta)$ needs to be chosen carefully to ensure that the estimation procedure is efficient. In [10], we shall give a more detailed account of $p(\theta)$ and methods for estimating $\theta^{MAP}$. Both ML and MAP assume a well peaked $p(\theta \mid X_t, A_t, O)$ so that $\delta(\theta)$ is a suitable estimate of the entire distribution. ...

1 | Shape from Shading
- Forsyth, Zisserman
- 1990
Citation Context ...ues; it is best to learn the models from training examples. Fifth, computational efficiency must be kept in mind. Work on recognition may be divided into two groups: recognition of individual objects [16], [20], [27], [31] and recognition of categories [2], [5], [13], [24], [26], [32], [33], [34], [35], [39], [36]. Individual objects are easier to handle, therefore, more progress has been made on effi...