## Transformation-invariant clustering using the EM algorithm (2003)

Venue: IEEE Transactions on Pattern Analysis and Machine Intelligence

Citations: 60 (12 self)

### BibTeX

@ARTICLE{Frey03transformation-invariantclustering,

author = {Brendan J. Frey and Nebojsa Jojic},

title = {Transformation-invariant clustering using the EM algorithm},

journal = {IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE},

year = {2003},

volume = {25},

number = {1},

pages = {1--17}

}

### Abstract

Clustering is a simple, effective way to derive useful representations of data, such as images and videos. Clustering explains the input as one of several prototypes, plus noise. In situations where each input has been randomly transformed (e.g., by translation, rotation, and shearing in images and videos), clustering techniques tend to extract cluster centers that account for variations in the input due to transformations, instead of more interesting and potentially useful structure. For example, if images from a video sequence of a person walking across a cluttered background are clustered, it would be more useful for the different clusters to represent different poses and expressions, instead of different positions of the person and different configurations of the background clutter. We describe a way to add transformation invariance to mixture models by approximating the nonlinear transformation manifold with a discrete set of points. We show how the expectation-maximization (EM) algorithm can be used to jointly learn clusters while inferring the transformation associated with each input. We compare this technique with other methods on three tasks: filtering noisy images obtained from a scanning electron microscope, clustering images from videos of faces into different categories of identification and pose, and removing foreground obstructions from video. We also demonstrate that the new technique is quite insensitive to initial conditions and works better than standard techniques, even when the standard techniques are provided with extra data.
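The idea of pairing each cluster with a discrete set of candidate transformations can be illustrated with a small sketch. This is not the paper's transformed mixture of Gaussians (TMG) as published: it assumes 1-D signals, cyclic shifts as the only transformation, a uniform prior over shifts, and a single fixed isotropic noise variance. The names `shift`, `e_step`, and `fit` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def shift(v, t):
    """Cyclically shift list v to the right by t positions (negative t shifts left)."""
    t %= len(v)
    return v[-t:] + v[:-t] if t else list(v)


def e_step(x, means, priors, var):
    """Posterior over (cluster c, shift t) for one input x.

    Assumes a uniform prior over shifts and isotropic Gaussian noise
    with fixed variance `var` (a simplification made for this sketch).
    """
    logp = {}
    for c, mu in enumerate(means):
        for t in range(len(x)):
            pred = shift(mu, t)
            sq = sum((a - b) ** 2 for a, b in zip(x, pred))
            logp[(c, t)] = math.log(priors[c]) - sq / (2.0 * var)
    m = max(logp.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in logp.values())
    return {k: math.exp(v - m) / z for k, v in logp.items()}


def fit(data, n_clusters, var=0.05, n_iter=20, seed=0):
    """EM for a shift-invariant mixture: learns cluster means and priors."""
    rng = random.Random(seed)
    dim = len(data[0])
    means = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_clusters)]
    priors = [1.0 / n_clusters] * n_clusters
    for _ in range(n_iter):
        num = [[0.0] * dim for _ in range(n_clusters)]
        den = [0.0] * n_clusters
        for x in data:
            # E-step: responsibilities over every (cluster, shift) pair
            for (c, t), r in e_step(x, means, priors, var).items():
                aligned = shift(x, -t)  # undo the hypothesized shift
                for d in range(dim):
                    num[c][d] += r * aligned[d]
                den[c] += r
        # M-step: responsibility-weighted average of the aligned inputs
        floored = [max(d, 1e-12) for d in den]
        means = [[num[c][d] / floored[c] for d in range(dim)]
                 for c in range(n_clusters)]
        total = sum(floored)
        priors = [f / total for f in floored]
    return means, priors


if __name__ == "__main__":
    prototype = [1.0, 0.0, 0.0, 0.0]
    data = [shift(prototype, s) for s in range(4)] * 3
    means, priors = fit(data, n_clusters=1)
    print(means, priors)
```

In this toy setting every input is a shifted copy of one spike, so a plain mixture would need one cluster per position, whereas the shift-aware model can explain all inputs with a single mean. Enumerating every (cluster, shift) pair is what makes the E-step exact here, and it is also why the cost grows with the number of candidate transformations.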

### Citations

9054 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ... learning, it is used to re-estimate the diagonal posttransformation noise covariance matrix. 3.3 Learning Using the EM Algorithm. We now present an EM algorithm [5] for the transformed mixture of Gaussians (TMG) that starts with randomly initialized parameters and then performs maximum-likelihood parameter estimation. (Here, “likelihood” refers to the parameters...

3030 | Eigenfaces for recognition
- Turk, Pentland
- 1991
Citation Context: ...ion and regression trees, multilayer perceptrons, Gaussian process regression, support vector classifiers, nearest-neighborhood methods (which may operate in linear subspaces spanned by eigenvectors [25], [22]) and adaptive-metric nearest-neighbor methods [21]. The very different approach we take here is to use unlabeled data to train a probability density model of the data (or generative model). The...

814 | Gradient-based learning applied to document recognition
- LeCun, Bottou, et al.
- 1998
Citation Context: ...ned so that the model is invariant to the set of transformations. This approach has been used in the supervised framework to design “convolutional neural networks” that are trained using labeled data [20]. We show how this approach can be used for unsupervised learning. 3 TRANSFORMATION-INVARIANT CLUSTERING. In this section, we show how to incorporate the discrete approximation described above into gen...

610 | Probabilistic Visual Learning for Object Representation
- Moghaddam, Pentland
- 1997
Citation Context: ...d regression trees, multilayer perceptrons, Gaussian process regression, support vector classifiers, nearest-neighborhood methods (which may operate in linear subspaces spanned by eigenvectors [25], [22]) and adaptive-metric nearest-neighbor methods [21]. The very different approach we take here is to use unlabeled data to train a probability density model of the data (or generative model). The goal ...

276 | Graphical models for machine learning and digital communication
- Frey
- 1998
Citation Context: ...al cause, such as translation. The most principled way to model multiple causes is to derive approximate inference and learning algorithms in complex probability models [17], [15], [4], [13], [8], [1], [6], [2], [27]. While this approach holds potential for solving complex problems, in many cases, what is needed is an efficient way to remove transformations of a known type from the data. This is what T...

232 | The wake-sleep algorithm for unsupervised neural networks
- Hinton, Dayan, et al.
- 1995
Citation Context: ...es, as opposed to a single, global cause, such as translation. The most principled way to model multiple causes is to derive approximate inference and learning algorithms in complex probability models [17], [15], [4], [13], [8], [1], [6], [2], [27]. While this approach holds potential for solving complex problems, in many cases, what is needed is an efficient way to remove transformations of a known type fro...

198 | Shape quantization and recognition with randomized trees
- Amit, Geman
- 1997
Citation Context: ..., global cause, such as translation. The most principled way to model multiple causes is to derive approximate inference and learning algorithms in complex probability models [17], [15], [4], [13], [8], [1], [6], [2], [27]. While this approach holds potential for solving complex problems, in many cases, what is needed is an efficient way to remove transformations of a known type from the data. This is w...

156 | Modeling the manifolds of images of handwritten digits
- Hinton, Dayan, et al.
- 1997
Citation Context: ...ative models (factor analysis, mixtures of factor analysis) have also been modified using linear approximations of the transformation manifold to build in some degree of invariance to transformations [16]. In general, the linear approximation is accurate for small transformations, but is inaccurate for large transformations. In some cases, a multiresolution version of the linear approximation can be u...

130 | Towards automatic discovery of object categories
- Weber, Welling, et al.
- 2000
Citation Context: ...ch as translation. The most principled way to model multiple causes is to derive approximate inference and learning algorithms in complex probability models [17], [15], [4], [13], [8], [1], [6], [2], [27]. While this approach holds potential for solving complex problems, in many cases, what is needed is an efficient way to remove transformations of a known type from the data. This is what TMG does. We...

58 | Estimating mixture models of images and inferring spatial transformations using the EM algorithm
- Frey, Jojic
- 1999
Citation Context: ...ps. We propose a general purpose statistical method that can jointly normalize for transformations that occur in the training data, and learn a maximum-likelihood density model of the normalized data [9], [10]. The technique can be applied to video sequences, but does not require that the images be temporally ordered. Improvements in performance can be achieved by introducing temporal dependencies, a...

54 | Competition and multiple cause models
- Dayan, Zemel
- 1995
Citation Context: ...opposed to a single, global cause, such as translation. The most principled way to model multiple causes is to derive approximate inference and learning algorithms in complex probability models [17], [15], [4], [13], [8], [1], [6], [2], [27]. While this approach holds potential for solving complex problems, in many cases, what is needed is an efficient way to remove transformations of a known type from the...

53 | Factorial learning and the EM algorithm
- Ghahramani
- 1995
Citation Context: ...ed to a single, global cause, such as translation. The most principled way to model multiple causes is to derive approximate inference and learning algorithms in complex probability models [17], [15], [4], [13], [8], [1], [6], [2], [27]. While this approach holds potential for solving complex problems, in many cases, what is needed is an efficient way to remove transformations of a known type from the data. ...

50 | Transformed component analysis: joint estimation of spatial transformations and image components
- Frey, Jojic
- 1999
Citation Context: ...e propose a general purpose statistical method that can jointly normalize for transformations that occur in the training data, and learn a maximum-likelihood density model of the normalized data [9], [10]. The technique can be applied to video sequences, but does not require that the images be temporally ordered. Improvements in performance can be achieved by introducing temporal dependencies, as desc...

45 | Separating style and content
- Tenenbaum, Freeman
- 1997
Citation Context: ...nlinearly. Approximate inference of continuous variables can be approached in different ways, including Monte Carlo techniques (c.f. [18]) and variational techniques (c.f. [7]). Tenenbaum and Freeman [24] examine models, called “bilinear models,” where each hidden variable is a linear function of the data, given the other hidden variables. The authors derive an inference algorithm that iterates betwee...

42 | Transformed hidden Markov models: Estimating mixture models and inferring spatial transformations in video sequences
- Jojic
- 2000
Citation Context: ...hnique can be applied to video sequences, but does not require that the images be temporally ordered. Improvements in performance can be achieved by introducing temporal dependencies, as described in [19], [12]. One approach to predicting the transformation in an input image is to provide a training set of images plus their transformations to a supervised learning algorithm, such as classification and...

31 | Contour Tracking by Stochastic
- Isard, Blake
- 1996
Citation Context: ...rm inference and learning with continuous variables that combine nonlinearly. Approximate inference of continuous variables can be approached in different ways, including Monte Carlo techniques (c.f. [18]) and variational techniques (c.f. [7]). Tenenbaum and Freeman [24] examine models, called “bilinear models,” where each hidden variable is a linear function of the data, given the other hidden variab...

26 | Multiresolution tangent distance for affine invariant classification
- Vasconcelos, Lippman
- 1997
Citation Context: ... general, the linear approximation is accurate for small transformations, but is inaccurate for large transformations. In some cases, a multiresolution version of the linear approximation can be used [26], but this approach relies on assumptions about the size of the objects in the images. For significant levels of transformation, the nonlinear manifold can be better modeled using a discrete approxima...

24 | Does the wake-sleep algorithm produce good density estimators
- Frey, Hinton, et al.
- 1995
Citation Context: ...ingle, global cause, such as translation. The most principled way to model multiple causes is to derive approximate inference and learning algorithms in complex probability models [17], [15], [4], [13], [8], [1], [6], [2], [27]. While this approach holds potential for solving complex problems, in many cases, what is needed is an efficient way to remove transformations of a known type from the data. This...

22 | Learning graphical models of images, videos and their spatial transformations
- Frey, Jojic
- 2000
Citation Context: ... can be applied to video sequences, but does not require that the images be temporally ordered. Improvements in performance can be achieved by introducing temporal dependencies, as described in [19], [12]. One approach to predicting the transformation in an input image is to provide a training set of images plus their transformations to a supervised learning algorithm, such as classification and regre...

21 | Learning and relearning
- Hinton, Sejnowski
- 1986
Citation Context: ...g causes, as opposed to a single, global cause, such as translation. The most principled way to model multiple causes is to derive approximate inference and learning algorithms in complex probability models [17], [15], [4], [13], [8], [1], [6], [2], [27]. While this approach holds potential for solving complex problems, in many cases, what is needed is an efficient way to remove transformations of a known ty...

18 | GTM: the Generative Topographic
- Svensen
- 1998
Citation Context: ...se, such as translation. The most principled way to model multiple causes is to derive approximate inference and learning algorithms in complex probability models [17], [15], [4], [13], [8], [1], [6], [2], [27]. While this approach holds potential for solving complex problems, in many cases, what is needed is an efficient way to remove transformations of a known type from the data. This is what TMG do...

18 | Tangent prop—A formalism for specifying selected invariances in an adaptive network
- Simard, Victorri, et al.
- 1992
Citation Context: ...imensional. Linear approximations of the transformation manifold have been used to significantly improve the performance of supervised classifiers such as nearest neighbors and multilayer perceptrons [23]. Linear generative models (factor analysis, mixtures of factor analysis) have also been modified using linear approximations of the transformation manifold to build in some degree of invariance to tr...

17 | Variational learning in nonlinear Gaussian belief networks
- Frey, Hinton
- 1999
Citation Context: ...us variables that combine nonlinearly. Approximate inference of continuous variables can be approached in different ways, including Monte Carlo techniques (c.f. [18]) and variational techniques (c.f. [7]). Tenenbaum and Freeman [24] examine models, called “bilinear models,” where each hidden variable is a linear function of the data, given the other hidden variables. The authors derive an inference a...

12 | Robustly estimating Changes
- Black, Fleet, et al.
- 2000
Citation Context: ...re sophisticated approximations, the resulting model is simple and inference is surprisingly fast. In contrast to methods that explain how one observed image differs from another observed image (c.f. [3]), our algorithms explain how the observed image differs from a model of the normalized image. This allows our techniques to properly warp two images to the model, even if the two images are warped ve...

11 | large-scale transformation-invariant clustering
- Frey, Jojic
Citation Context: ...e dimensionality of the transformation manifold. If there are $n_1$ transformations of the first type, $n_2$ transformations of the second type, etc., exact inference and learning takes order $\prod_i n_i$ time. In [11], we show how a variational technique can be used to decouple the inference of each type of transformation, s...

5 | Scanning electron microscope image enhancement
- Golem, Cohen
- 1998
Citation Context: ... Fig. 2a shows several 140 × 56 gray-scale images obtained from a scanning electron microscope. The electron detectors and the high-speed electrical circuits randomly translate the images and add noise [14]. Standard filtering techniques are not appropriate here, since the images are not aligned. As shown later in this paper, the images cannot be properly aligned using correlation because of the high le...

1 | Similarity Metric Learning for a Variable-Kernel Classifier
- Lowe
- 1995
Citation Context: ...n process regression, support vector classifiers, nearest-neighborhood methods (which may operate in linear subspaces spanned by eigenvectors [25], [22]) and adaptive-metric nearest-neighbor methods [21]. The very different approach we take here is to use unlabeled data to train a probability density model of the data (or generative model). The goal of generative modeling is to learn a probability mo...