Transformation-invariant clustering using the EM algorithm

by B Frey, N Jojic
Venue: TPAMI 2003
Results 1 - 10 of 32

A Graphical Model for Audiovisual Object Tracking

by Matthew J. Beal, Nebojsa Jojic, Hagai Attias - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003
Cited by 36 (0 self)
We present a new approach to modeling and processing multimedia data. This approach is based on graphical models that combine audio and video variables. We demonstrate it by developing a new algorithm for tracking a moving object in a cluttered, noisy scene using two microphones and a camera. Our model uses unobserved variables to describe the data in terms of the process that generates them. It is therefore able to capture and exploit the statistical structure of the audio and video data separately, as well as their mutual dependencies. Model parameters are learned from data via an EM algorithm, and automatic calibration is performed as part of this procedure. Tracking is done by Bayesian inference of the object location from data. We demonstrate successful performance on multimedia clips captured in real world scenarios using off-the-shelf equipment.

Robust Parameterized Component Analysis: Theory and Applications to 2D Facial Modeling

by Fernando De la Torre, Michael J. Black - Computer Vision and Image Understanding, 91:53–71, 2002
Cited by 33 (6 self)
Principal Component Analysis (PCA) has been successfully applied to construct linear models of shape, graylevel, and motion. In particular, PCA has been widely used to model the variation in the appearance of people's faces. We extend previous work on facial modeling for tracking faces in video sequences as they undergo significant changes due to facial expressions. Here we develop person-specific facial appearance models (PSFAM), which use modular PCA to model complex intra-person appearance changes. Such models require aligned visual training data; in previous work, this has involved a time-consuming and error-prone hand alignment and cropping process. Instead, we introduce parameterized component analysis to learn a subspace that is invariant to affine (or higher order) geometric transformations. The automatic learning of a PSFAM given a training image sequence is posed as a continuous optimization problem and is solved with a mixture of stochastic and deterministic techniques achieving sub-pixel accuracy.

A comparison of algorithms for inference and learning in probabilistic graphical models

by Brendan J. Frey, Nebojsa Jojic - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005
Cited by 33 (2 self)
Computer vision is currently one of the most exciting areas of artificial intelligence research, largely because it has recently become possible to record, store and process large amounts of visual data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition and face detection, it is even more exciting that researchers may be on the verge of introducing computer vision systems that perform scene analysis, decomposing image input into its constituent objects, lighting conditions, motion patterns, and so on. Two of the main challenges in computer vision are finding efficient models of the physics of visual scenes and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graph-based probability models and their associated inference and learning algorithms for computer vision and scene analysis. We review exact techniques and various approximate, computationally efficient techniques, including iterative conditional modes, the expectation maximization (EM) algorithm, the mean field method, variational techniques, structured variational techniques, Gibbs sampling, the sum-product algorithm and “loopy” belief propagation. We describe how each technique can be applied in a model of multiple, occluding objects, and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.

Describing Visual Scenes Using Transformed Objects and Parts

by E. B. Sudderth, A. Torralba, W. T. Freeman, A. S. Willsky - INT J COMPUT VIS, 2005
Cited by 24 (2 self)
We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves detection accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. The resulting transformed Dirichlet process (TDP) leads to Monte Carlo algorithms which simultaneously segment and recognize objects in street and office scenes.

Learning appearance and transparency manifolds of occluded objects in layers

by Brendan J. Frey, Nebojsa Jojic, Anitha Kannan - CVPR03, I:45–52, 2003
Cited by 19 (5 self)
Videos and software available at www.psi.toronto.edu/layers.html. By mapping a set of input images to points in a low-dimensional manifold or subspace, it is possible to efficiently account for a small number of degrees of freedom. For example, images of a person walking can be mapped to a 1-dimensional manifold that measures the phase of the person’s gait. However, when the object is moving around the frame and being occluded by other objects, standard manifold modeling techniques (e.g., principal components analysis, factor analysis, locally linear embedding) try to account for global motion and occlusion. We show how factor analysis can be incorporated into a generative model of layered, 2.5-dimensional vision, to jointly locate objects, resolve occlusion ambiguities, and learn models of the appearance manifolds of objects. We demonstrate the algorithm on a video consisting of four occluding objects, two of which are people who are walking, and occlude each other for most of the duration of the video. Whereas standard manifold modeling techniques fail to extract information about the gaits, the layered model successfully extracts a periodic representation of the gait of each person.

Advances in Algorithms for Inference and Learning in Complex Probability Models for Vision

by Brendan J. Frey, Nebojsa Jojic - IEEE Trans. PAMI, 2002
Cited by 11 (5 self)
Computer vision is currently one of the most exciting areas of artificial intelligence research, largely because it has recently become possible to record, store and process large amounts of visual data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition and face detection, it is even more exciting that researchers may be on the verge of introducing computer vision systems that perform scene analysis, decomposing a video into its constituent objects, lighting conditions, motion patterns, and so on. Two of the main challenges in computer vision are finding efficient models of the physics of visual scenes and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graph-based generative probability models and their associated inference and learning algorithms for computer vision and scene analysis. We review exact techniques and various approximate, computationally efficient techniques, including iterative conditional modes, the expectation maximization algorithm, the mean field method, variational techniques, structured variational techniques, Gibbs sampling, the sum-product algorithm and "loopy" belief propagation. We describe how each technique can be applied to an illustrative example of inference and learning in models of multiple, occluding objects, and compare the performances of the techniques.

Translation-Invariant Mixture Models for Curve Clustering

by Darya Chudova, Scott Gaffney, Eric Mjolsness, Padhraic Smyth - In Proc. Ninth ACM SIGKDD Inter. Conf. on Knowledge Discovery and Data Mining, Washington D.C., August 24–27, 2003
Cited by 10 (2 self)
In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves defined on a discrete time grid. Our approach assumes that the data are being generated from a finite mixture of curve models. Each mixture component uses (a) a mean curve based on a flexible non-parametric representation, (b) additive measurement noise, (c) randomly selected discrete-valued shifts of each curve with respect to the independent variable (i.e., typically along the time axis), and (d) random real-valued offsets of each curve with respect to the observed variable. We show that the Expectation-Maximization (EM) algorithm can be used to simultaneously recover both the curve models for each cluster, and the most likely shifts, offsets, and cluster memberships for each curve. We demonstrate how Bayesian estimation methods can improve the results for small sample sizes by enforcing smoothness in the cluster mean curves. We evaluate the methodology on two real-world data sets, time-course gene expression data and storm trajectory data. Experimental results show that models that incorporate curve alignment systematically provide improvements in predictive power on test data sets. The proposed approach provides a non-parametric, computationally efficient, and robust methodology for clustering broad classes of curve data.
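The idea in the abstract above, EM over a mixture whose hidden variables are both a cluster label and a discrete shift, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes circular shifts, one shared isotropic Gaussian noise variance, a uniform prior over shifts, and plain (unsmoothed) mean curves, and it omits the real-valued offsets. The names `shift_invariant_em`, `shifted_sq_dist`, and `make_demo_data` are hypothetical.

```python
import numpy as np


def shifted_sq_dist(X, mu, shifts):
    """Squared distance from each curve to each (cluster, shift) pair: shape (N, K, S)."""
    # M[k, j] is mean curve k circularly shifted by shifts[j]; shape (K, S, T).
    M = np.stack([np.roll(mu, s, axis=1) for s in shifts], axis=1)
    return ((X[:, None, None, :] - M[None]) ** 2).sum(axis=-1)


def shift_invariant_em(X, mu_init, shifts, n_iter=30):
    """EM for a mixture of mean curves observed under hidden circular shifts.

    Assumed model (illustrative only): x = roll(mu_k, s) + Gaussian noise,
    with a uniform prior over the candidate shifts.
    """
    X = np.asarray(X, dtype=float)
    mu = np.array(mu_init, dtype=float)
    N, T = X.shape
    K = mu.shape[0]
    pi = np.full(K, 1.0 / K)
    var = X.var() + 1e-6
    for _ in range(n_iter):
        # E-step: posterior responsibility of every (cluster, shift) pair.
        sq = shifted_sq_dist(X, mu, shifts)
        logp = np.log(pi + 1e-12)[None, :, None] - 0.5 * sq / var
        logp -= logp.max(axis=(1, 2), keepdims=True)  # numerical stability
        R = np.exp(logp)
        R /= R.sum(axis=(1, 2), keepdims=True)
        # M-step: average the curves after undoing each candidate shift.
        Nk = R.sum(axis=(0, 2)) + 1e-12
        mu = np.zeros((K, T))
        for j, s in enumerate(shifts):
            mu += R[:, :, j].T @ np.roll(X, -s, axis=1)
        mu /= Nk[:, None]
        pi = Nk / Nk.sum()
        var = (R * shifted_sq_dist(X, mu, shifts)).sum() / (N * T) + 1e-6
    return mu, pi, var, R


def make_demo_data(seed=0, n_per_cluster=20, T=20, noise=0.05):
    """Synthetic curves: a bump and a sine wave, each randomly shifted."""
    rng = np.random.default_rng(seed)
    t = np.arange(T)
    bump = 3.0 * np.exp(-0.5 * ((t - T // 2) / 1.5) ** 2)
    wave = np.sin(2 * np.pi * t / T)
    curves = []
    for template in (bump, wave):
        for _ in range(n_per_cluster):
            s = int(rng.integers(T))
            curves.append(np.roll(template, s) + noise * rng.normal(size=T))
    return np.array(curves)
```

A possible call is `shift_invariant_em(X, X[[0, -1]], range(X.shape[1]))`, which seeds the two means with one curve from each group and searches over all circular shifts; hard cluster labels then follow from `R.sum(axis=2).argmax(axis=1)`. Marginalizing over shifts in the E-step, rather than aligning first and clustering afterwards, is the design choice that lets alignment and clustering inform each other.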

Joint probabilistic curve clustering and alignment

by Scott Gaffney, Padhraic Smyth - In Advances in Neural Information Processing Systems 17, 2005
Cited by 10 (0 self)
Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous manner, either in space (across the measurements) or in time. We develop a probabilistic framework that allows for joint clustering and continuous alignment of sets of curves in curve space (as opposed to a fixed-dimensional feature-vector space). The proposed methodology integrates new probabilistic alignment models with model-based curve clustering algorithms. The probabilistic approach allows for the derivation of consistent EM learning algorithms for the joint clustering-alignment problem. Experimental results are shown for alignment of human growth data, and joint clustering and alignment of gene expression time-course data.

Transformation invariant component analysis for binary images

by Zoran Zivkovic, Jakob Verbeek - In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume I, 2006
Cited by 8 (1 self)
There are various situations where image data is binary: character recognition, result of image segmentation etc. As a first contribution, we compare Gaussian based principal component analysis (PCA), which is often used to model images, and “binary PCA” which models the binary data more naturally using Bernoulli distributions. Furthermore, we address the problem of data alignment. Image data is often perturbed by some global transformations such as shifting, rotation, scaling etc. In such cases the data needs to be transformed to some canonical aligned form. As a second contribution, we extend the binary PCA to the “transformation invariant mixture of binary PCAs” which simultaneously corrects the data for a set of global transformations and learns the binary PCA model on the aligned data.

Structure inference for Bayesian multisensory perception and tracking

by Timothy M. Hospedales, Sethu Vijayakumar - In Proc. International Joint Conference on Artificial Intelligence, 2007
Cited by 8 (1 self)
We investigate a solution to the problem of multisensor scene understanding by formulating it in the framework of Bayesian model selection and structure inference. Humans robustly associate multimodal data as appropriate, but previous modeling work has focused largely on optimal fusion, leaving segregation unaccounted for and unexploited by machine perception systems. We illustrate a unifying Bayesian solution to multisensory perception and tracking, which accounts for both integration and segregation by explicit probabilistic reasoning about data association in a temporal context. Such an explicit inference of multimodal data association is also of intrinsic interest for higher level understanding of multisensory data. We illustrate this by using a probabilistic implementation of data association in a multiparty audiovisual scenario, where unsupervised learning and structure inference is used to automatically segment, associate, and track individual subjects in audiovisual sequences. Indeed, the structure-inference-based framework introduced in this work provides the theoretical foundation needed to satisfactorily explain many confounding results in human psychophysics experiments involving multimodal cue integration and association.