Training Products of Experts by Minimizing Contrastive Divergence
, 2002
Cited by 829 (79 self)
It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual “expert” models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called “contrastive divergence”, whose derivatives with respect to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
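The combination rule described in this abstract (multiply the experts' distributions, then renormalize) can be illustrated with a minimal sketch. It assumes discrete distributions over a small shared outcome set; the function name `product_of_experts` and the two toy experts are hypothetical and do not come from the paper, and the sketch does not cover contrastive divergence training.

```python
from math import prod


def product_of_experts(experts):
    """Combine discrete distributions over the same outcomes.

    Each expert is a list of probabilities, one per outcome. The combined
    distribution is the outcome-wise product, divided by its sum.
    """
    n_outcomes = len(experts[0])
    # Outcome-wise product of all expert probabilities (unnormalized).
    unnormalized = [prod(e[i] for e in experts) for i in range(n_outcomes)]
    # Renormalization term: cheap here, intractable for large models.
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


# Two toy experts (illustrative values only).
expert_a = [0.5, 0.4, 0.1]
expert_b = [0.1, 0.4, 0.5]
combined = product_of_experts([expert_a, expert_b])
```

In this toy case the product concentrates mass on the middle outcome, the only one both experts consider plausible.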
Image Parsing: Unifying Segmentation, Detection, and Recognition
, 2005
Cited by 234 (21 self)
In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation in a "parsing graph", in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and reconfigures it dynamically using a set of reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters.
Modeling the manifolds of images of handwritten digits
 IEEE Transactions on Neural Networks
, 1997
"... description length, density estimation. ..."
Detecting and reading text in natural scenes
 In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition
, 2004
Cited by 149 (2 self)
This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scenes. We first obtain a dataset of city images taken by blind and normally sighted subjects. From this dataset, we manually label and extract the text regions. Next we perform statistical analysis of the text regions to determine which image features are reliable indicators of text and have low entropy (i.e., feature response is similar for all text images). We obtain weak classifiers by using joint probabilities for feature responses on and off text. These weak classifiers are used as input to an AdaBoost machine learning algorithm to train a strong classifier. In practice, we trained a cascade with 4 strong classifiers containing 79 features. An adaptive binarization and extension algorithm is applied to those regions selected by the cascade classifier. Commercial OCR software is used to read the text or reject it as a non-text region. The overall algorithm has a success rate of over 90% (evaluated by complete detection and reading of the text) on the test set, and the unread text is typically small and distant from the viewer.
Machine-Learning Research: Four Current Directions
Cited by 144 (1 self)
Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, (c) reinforcement learning, and (d) learning complex stochastic models.
Non-rigid point set registration: Coherent Point Drift (CPD)
 In Advances in Neural Information Processing Systems 19
, 2006
Cited by 136 (0 self)
We introduce Coherent Point Drift (CPD), a novel probabilistic method for non-rigid registration of point sets. The registration is treated as a Maximum Likelihood (ML) estimation problem with a motion coherence constraint over the velocity field such that one point set moves coherently to align with the second set. We formulate the motion coherence constraint and derive a solution of regularized ML estimation through the variational approach, which leads to an elegant kernel form. We also derive the EM algorithm for the penalized ML optimization with deterministic annealing. The CPD method simultaneously finds both the non-rigid transformation and the correspondence between two point sets without making any prior assumption about the transformation model except that of motion coherence. This method can estimate complex non-linear non-rigid transformations, and is shown to be accurate on 2D and 3D examples and robust in the presence of outliers and missing points.
Learning from one example through shared densities on transforms
 In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, 2000
Cited by 115 (7 self)
We define a process called congealing in which elements of a dataset (images) are brought into correspondence with each other jointly, producing a data-defined model. It is based upon minimizing the summed component-wise (pixelwise) entropies over a continuous set of transforms on the data. One of the by-products of this minimization is a set of transforms, one associated with each original training sample. We then demonstrate a procedure for effectively bringing test data into correspondence with the data-defined model produced in the congealing process. Subsequently, we develop a probability density over the set of transforms that arose from the congealing process. We suggest that this density over transforms may be shared by many classes, and demonstrate how this density, used as “prior knowledge”, makes it possible to develop a classifier based on only a single training example for each class.
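The quantity that congealing minimizes, the summed pixelwise entropy of an image stack, can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's procedure: pixels are binary, images are 1-D lists, and the function name `summed_pixelwise_entropy` and the toy stacks are hypothetical. The search over continuous transforms is not shown.

```python
from math import log2


def summed_pixelwise_entropy(images):
    """Sum, over pixel positions, of the binary entropy (in bits) of that
    pixel's values across the stack of equally sized binary images."""
    n_images = len(images)
    total = 0.0
    for pixel_values in zip(*images):  # one tuple per pixel position
        p = sum(pixel_values) / n_images  # fraction of images with a 1 here
        for q in (p, 1.0 - p):
            if q > 0.0:  # treat 0 * log2(0) as 0
                total -= q * log2(q)
    return total


# Two copies of the same 1-D "image": perfectly aligned vs. shifted by one.
aligned = [[0, 1, 1, 0], [0, 1, 1, 0]]
shifted = [[0, 1, 1, 0], [1, 1, 0, 0]]
entropy_aligned = summed_pixelwise_entropy(aligned)
entropy_shifted = summed_pixelwise_entropy(shifted)
```

The aligned stack has zero summed entropy while the shifted stack does not, which is why lowering this sum pulls the images into correspondence.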
Data driven image models through continuous joint alignment
 PAMI
, 2006
Cited by 85 (4 self)
This paper presents a family of techniques that we call congealing for modeling image classes from data. The idea is to start with a set of images and make them appear as similar as possible by removing variability along the known axes of variation. This technique can be used to eliminate “nuisance” variables such as affine deformations from handwritten digits or unwanted bias fields from magnetic resonance images. In addition to separating and modeling the latent images—i.e., the images without the nuisance variables—we can model the nuisance variables themselves, leading to factorized generative image models. When nuisance variable distributions are shared between classes, one can share the knowledge learned in one task with another task, leading to efficient learning. We demonstrate this process by building a handwritten digit classifier from just a single example of each class. In addition to applications in handwritten character recognition, we describe in detail the application of bias removal from magnetic resonance images. Unlike previous methods, we use a separate, nonparametric model for the intensity values at each pixel. This allows us to leverage the data from the MR images of different patients to remove bias from each other. Only very weak assumptions are made about the distributions of intensity values in the images. In addition to the digit and MR applications, we discuss a number of other uses of congealing and describe experiments about the robustness and consistency of the method.
Representation and recognition of handwritten digits using deformable templates
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
Cited by 74 (2 self)
We investigate the application of deformable templates to recognition of handprinted digits. Two characters are matched by deforming the contour of one to fit the edge strengths of the other, and a dissimilarity measure is derived from the amount of deformation needed, the goodness of fit of the edges, and the interior overlap between the deformed shapes. Classification using the minimum dissimilarity results in recognition rates up to 99.25 percent on a 2,000 character subset of NIST Special Database 1. Additional experiments on independent test data were done to demonstrate the robustness of this method. Multidimensional scaling is also applied to the 2,000 × 2,000 proximity matrix, using the dissimilarity measure as a distance, to embed the patterns as points in low-dimensional spaces. A nearest neighbor classifier is applied to the resulting pattern matrices. The classification accuracies obtained in the derived feature space demonstrate that there does exist a good low-dimensional representation space. Methods to reduce the computational requirements, the primary limiting factor of this method, are discussed. Index Terms: digit recognition, deformable template, feature extraction, multidimensional scaling, clustering, nearest neighbor classification.
POP: Patchwork of parts models for object recognition
 International Journal of Computer Vision
, 2004
Cited by 57 (3 self)
We formulate a deformable template model for objects with a clearly defined mechanism for parameter estimation. A separate model is estimated for each class, and classification is likelihood based; no discrimination boundaries are learned. Nonetheless, high classification rates are achieved with small training samples. The data models are defined on binary oriented edge features that are highly robust to photometric variation and small local deformations. The deformation of an object is defined in terms of the locations of a moderate number of reference points. Each reference point is associated with a part: a probability map assigning a probability for each edge type at each pixel in a window. The likelihood of the edge data on the entire image conditional on the deformation is described as a patchwork of parts (POP) model: the edges are assumed conditionally independent, and the marginal at each pixel is obtained by a patchwork operation, averaging the marginal probabilities contributed by each part covering the pixel. Object classes are modeled as mixtures of POP models that are discovered sequentially as more class data is observed. Experiments are presented on the MNIST database, hundreds of deformed LaTeX shapes, reading zipcodes, and face detection.
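The patchwork operation mentioned in this abstract (averaging the marginal probabilities contributed by each part covering a pixel) can be sketched in one dimension. The sketch makes several assumptions that are not from the paper: a single edge type, 1-D windows, a hypothetical function name `patchwork_marginals`, and an assumed constant `background` probability for pixels that no part covers.

```python
def patchwork_marginals(length, parts, background=0.05):
    """Per-pixel edge probability on a 1-D image of the given length.

    parts: list of (start, probs) pairs, where probs[i] is the part's edge
    probability at pixel start + i. Pixels covered by several parts get the
    average of their probabilities; uncovered pixels get `background`
    (an assumed placeholder value).
    """
    totals = [0.0] * length
    counts = [0] * length
    for start, probs in parts:
        for offset, p in enumerate(probs):
            x = start + offset
            if 0 <= x < length:  # ignore window pixels outside the image
                totals[x] += p
                counts[x] += 1
    return [
        totals[x] / counts[x] if counts[x] else background
        for x in range(length)
    ]


# Two overlapping parts: both cover pixel 1 (illustrative values only).
parts = [(0, [0.8, 0.6]), (1, [0.2, 0.4])]
marginals = patchwork_marginals(4, parts)
```

Pixel 1 receives the mean of the two overlapping contributions, pixels 0 and 2 keep their single part's value, and pixel 3 falls back to the assumed background.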