Results 1 - 10 of 177
Detecting faces in images: A survey
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
"... Images containing faces are essential to intelligent visionbased human computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation, and expression recognition. However, many reported methods assume that the faces in an image or an image se ..."
Abstract

Cited by 839 (4 self)
 Add to MetaCart
(Show Context)
Images containing faces are essential to intelligent vision-based human-computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation, and expression recognition. However, many reported methods assume that the faces in an image or an image sequence have been identified and localized. To build fully automated systems that analyze the information contained in face images, robust and efficient face detection algorithms are required. Given a single image, the goal of face detection is to identify all image regions which contain a face regardless of its three-dimensional position, orientation, and the lighting conditions. Such a problem is challenging because faces are nonrigid and have a high degree of variability in size, shape, color, and texture. Numerous techniques have been developed to detect faces in a single image, and the purpose of this paper is to categorize and evaluate these algorithms. We also discuss relevant issues such as data collection, evaluation metrics, and benchmarking. After analyzing these algorithms and identifying their limitations, we conclude with several promising directions for future research.
Mixtures of Probabilistic Principal Component Analysers
1998
"... Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a com ..."
Abstract

Cited by 532 (6 self)
 Add to MetaCart
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its applicat...
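The single-model maximum-likelihood fit that the mixture builds on has a closed form; here is a minimal NumPy sketch of it (the function name ppca_ml and the eigendecomposition route are our own choices, not the paper's notation):

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum-likelihood PPCA fit for a single model.
    X: (n, d) data matrix; q: latent dimension (q < d).
    Returns the mean, the (d, q) loading matrix, and the noise variance."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)              # (d, d) sample covariance
    lam, U = np.linalg.eigh(S)               # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]           # sort descending
    sigma2 = lam[q:].mean()                  # average discarded variance
    W = U[:, :q] * np.sqrt(np.maximum(lam[:q] - sigma2, 0.0))
    return mu, W, sigma2
```

In the mixture model, these closed-form quantities become per-component updates inside EM, weighted by posterior responsibilities.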
A Unifying Review of Linear Gaussian Models
1999
"... Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observa ..."
Abstract

Cited by 351 (18 self)
 Add to MetaCart
Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
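As a concrete anchor for the unification, here is a hedged NumPy sketch of the single basic generative model the review organizes everything around; the parameter names (A, C, Q, R) follow common state-space convention and are our labeling:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lgm(A, C, Q, R, x0, T):
    """Draw one trajectory from the basic linear Gaussian generative model:
       x[t+1] = A @ x[t] + w,  w ~ N(0, Q)   (hidden state dynamics)
       y[t]   = C @ x[t] + v,  v ~ N(0, R)   (observation model)
    Factor analysis is the static special case (A = 0, Q = I, R diagonal);
    the Kalman filter model is the general dynamic case."""
    k, d = A.shape[0], C.shape[0]
    xs, ys = np.empty((T, k)), np.empty((T, d))
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        xs[t] = x
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(d), R)
        x = A @ x + rng.multivariate_normal(np.zeros(k), Q)
    return xs, ys
```

The discrete-state members of the family (mixtures of Gaussians, vector quantization, HMMs) arise when the continuous state is replaced by a winner-take-all nonlinearity, as the review describes.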
The EM Algorithm for Mixtures of Factor Analyzers
1997
"... Factor analysis, a statistical method for modeling the covariance structure of high dimensional data using a small number of latent variables, can be extended by allowing different local factor models in different regions of the input space. This results in a model which concurrently performs cluste ..."
Abstract

Cited by 278 (18 self)
 Add to MetaCart
(Show Context)
Factor analysis, a statistical method for modeling the covariance structure of high dimensional data using a small number of latent variables, can be extended by allowing different local factor models in different regions of the input space. This results in a model which concurrently performs clustering and dimensionality reduction, and can be thought of as a reduced dimension mixture of Gaussians. We present an exact Expectation-Maximization algorithm for fitting the parameters of this mixture of factor analyzers. Clustering and dimensionality reduction have long been considered two of the fundamental problems in unsupervised learning (Duda & Hart, 1973; Chapter 6). In clustering, the goal is to group data points by similarity between their features. Conversely, in dimensionality reduction, the goal is to group (or compress) features that are highly correlated. In this paper we present an EM learning algorithm for a method which combines one of the basic forms of dime...
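To make the "reduced dimension mixture of Gaussians" reading concrete, a hedged NumPy sketch of the generative model the EM algorithm fits (the name sample_mfa and the vector psi of diagonal noise variances are our conventions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mfa(pi, mus, lambdas, psi, n):
    """Sample from a mixture of factor analyzers:
       j ~ Categorical(pi);  z ~ N(0, I_q);
       x | j, z ~ N(mu_j + Lambda_j @ z, Psi),  Psi diagonal with variances psi.
    Marginally each component is N(mu_j, Lambda_j Lambda_j^T + Psi),
    i.e. a reduced-dimension Gaussian."""
    d, q = lambdas[0].shape
    X = np.empty((n, d))
    for i in range(n):
        j = rng.choice(len(pi), p=pi)            # pick a local factor model
        z = rng.standard_normal(q)               # low-dimensional latent
        X[i] = mus[j] + lambdas[j] @ z + rng.normal(0.0, np.sqrt(psi))
    return X
```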
Variational learning for switching state-space models
Neural Computation, 1998
"... We introduce a new statistical model for time series which iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time series models  hidden Ma ..."
Abstract

Cited by 173 (5 self)
 Add to MetaCart
(Show Context)
We introduce a new statistical model for time series which iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time series models, hidden Markov models and linear dynamical systems, and is closely related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network (Jacobs et al., 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact Expectation Maximization (EM) algorithm cannot be applied. However, we present a variational approximation that maximizes a lower bound on the log likelihood and makes use of both the forward-backward recursions for hidden Markov models and the Kalman filter recursions for linear dynamical systems. We tested the algorithm both on artificial data sets and on a natural data set of respiration force from a patient with sleep apnea. The results suggest that variational approximations are a viable method for inference and learning in switching state-space models.
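A hedged sketch of the generative structure described above, under the simplifying assumption of isotropic state and observation noise (all names are ours): several linear-Gaussian state chains evolve in parallel, and a discrete Markov switch selects which one drives the observation.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_switching_ssm(Pi, As, Cs, q_var, r_var, T):
    """M independent linear-Gaussian state chains evolve in parallel; a
    discrete Markov switch s[t] (transition matrix Pi) selects which chain
    produces the observation at each step."""
    M, k = len(As), As[0].shape[0]
    xs = [np.zeros(k) for _ in range(M)]
    s = int(rng.integers(M))
    ys, switches = [], []
    for t in range(T):
        xs = [A @ x + rng.normal(0.0, np.sqrt(q_var), k)
              for A, x in zip(As, xs)]                    # advance all chains
        y = Cs[s] @ xs[s] + rng.normal(0.0, np.sqrt(r_var), Cs[s].shape[0])
        ys.append(y)
        switches.append(s)
        s = int(rng.choice(M, p=Pi[s]))                   # Markov transition
    return np.array(ys), np.array(switches)
```

Exact posterior inference over (s, x) in this model is intractable, which is what motivates the variational lower bound combining the forward-backward and Kalman recursions.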
Random projections of smooth manifolds
Foundations of Computational Mathematics, 2006
"... We propose a new approach for nonadaptive dimensionality reduction of manifoldmodeled data, demonstrating that a small number of random linear projections can preserve key information about a manifoldmodeled signal. We center our analysis on the effect of a random linear projection operator Φ: R N ..."
Abstract

Cited by 144 (26 self)
 Add to MetaCart
(Show Context)
We propose a new approach for nonadaptive dimensionality reduction of manifold-modeled data, demonstrating that a small number of random linear projections can preserve key information about a manifold-modeled signal. We center our analysis on the effect of a random linear projection operator Φ: R^N → R^M, M < N, on a smooth well-conditioned K-dimensional submanifold M ⊂ R^N. As our main theoretical contribution, we establish a sufficient number M of random projections to guarantee that, with high probability, all pairwise Euclidean and geodesic distances between points on M are well-preserved under the mapping Φ. Our results bear strong resemblance to the emerging theory of Compressed Sensing (CS), in which sparse signals can be recovered from small numbers of random linear measurements. As in CS, the random measurements we propose can be used to recover the original data in R^N. Moreover, like the fundamental bound in CS, our requisite M is linear in the “information level” K and logarithmic in the ambient dimension N; we also identify a logarithmic dependence on the volume and conditioning of the manifold. In addition to recovering faithful approximations to manifold-modeled signals, however, the random projections we propose can also be used to discern key properties about the manifold. We discuss connections and contrasts with existing techniques in manifold learning, a setting where dimensionality-reducing mappings are typically nonlinear and constructed adaptively from a set of sampled training data.
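A minimal numerical illustration of the claim, not the paper's proof machinery: a K = 1 manifold (a circle) embedded in R^N, projected by a random Gaussian Φ; all constants here are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

N, M = 1000, 40                        # ambient and projected dimensions
t = rng.uniform(0.0, 2 * np.pi, 200)   # parameter of a 1-D manifold (K = 1)
plane = np.linalg.qr(rng.standard_normal((N, 2)))[0]    # random 2-plane in R^N
X = np.stack([np.cos(t), np.sin(t)], axis=1) @ plane.T  # circle embedded in R^N

Phi = rng.standard_normal((M, N)) / np.sqrt(M)          # random linear projection
Y = X @ Phi.T

ratios = pdist(Y) / pdist(X)           # pairwise Euclidean distance distortion
print(f"distances scaled by factors in [{ratios.min():.3f}, {ratios.max():.3f}]")
```

With M well above K (but far below N), the printed interval stays close to [1, 1], which is the distance-preservation behavior the theorem quantifies.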
Sparse feature learning for deep belief networks
Advances in Neural Information Processing Systems (NIPS), 2007
"... Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the ..."
Abstract

Cited by 130 (14 self)
 Add to MetaCart
(Show Context)
Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g. low dimension, sparsity, etc.). Others are based on approximating density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the input observed variables can be captured.
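A hedged sketch of the reconstruction-plus-sparsity idea, with a linear encoder/decoder and a plain L1 code penalty standing in for the paper's sparsifying nonlinearity (all names and constants are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_sparse_autoencoder(X, code_dim, alpha=0.1, lr=0.01, steps=500):
    """Gradient descent on the per-sample average of
       ||x - dec(enc(x))||^2 + alpha * ||enc(x)||_1."""
    n, d = X.shape
    We = 0.1 * rng.standard_normal((code_dim, d))    # encoder weights
    Wd = 0.1 * rng.standard_normal((d, code_dim))    # decoder weights
    for _ in range(steps):
        H = X @ We.T                                 # codes, shape (n, code_dim)
        R = H @ Wd.T - X                             # reconstruction residual
        dH = (2.0 * R @ Wd + alpha * np.sign(H)) / n # d(loss)/d(codes)
        Wd -= lr * 2.0 * R.T @ H / n                 # decoder gradient step
        We -= lr * dH.T @ X                          # encoder step via chain rule
    return We, Wd
```

Raising alpha trades reconstruction error for sparser codes, which is exactly the trade-off the paper's model-selection criterion formalizes.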
Global Coordination of Local Linear Models
Advances in Neural Information Processing Systems 14, 2002
"... High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifoldarguably an important goal of unsupervised learning. In this paper, we show how ..."
Abstract

Cited by 88 (2 self)
 Add to MetaCart
(Show Context)
High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifold, arguably an important goal of unsupervised learning. In this paper, we show how to learn a collection of local linear models that solves this more difficult problem. Our local linear models are represented by a mixture of factor analyzers, and the "global coordination" of these models is achieved by adding a regularizing term to the standard maximum likelihood objective function. The regularizer breaks a degeneracy in the mixture model's parameter space, favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly and continuously as one traverses a connected path on the manifold, even when the path crosses the domains of many different local models. The regularizer takes the form of a Kullback-Leibler divergence and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones.
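A hedged sketch of what a coordinated mixture buys you at inference time: each factor analyzer supplies a local latent estimate, a per-model affine map carries it into the shared global space, and the responsibilities blend the answers. Learning the affine maps (A_j, kappa_j) jointly via the KL regularizer is the paper's contribution; here they are taken as given.

```python
import numpy as np
from scipy.stats import multivariate_normal

def global_coords(x, pis, mus, lambdas, psi, As, kappas):
    """Blend per-analyzer global coordinates with posterior responsibilities."""
    resp, g = [], []
    for pi_j, mu, Lam, A, kap in zip(pis, mus, lambdas, As, kappas):
        C = Lam @ Lam.T + np.diag(psi)           # component covariance
        resp.append(pi_j * multivariate_normal.pdf(x, mu, C))
        z = Lam.T @ np.linalg.solve(C, x - mu)   # posterior mean of the latent
        g.append(A @ z + kap)                    # local -> global affine map
    resp = np.array(resp) / sum(resp)            # responsibilities p(j | x)
    return sum(r * gj for r, gj in zip(resp, g)) # smooth blended coordinate
```

Because the regularizer aligns the internal coordinate systems, this blended coordinate varies smoothly even where neighboring local models overlap.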
Face recognition based on image sets
IEEE Conference on Computer Vision and Pattern Recognition, 2010
"... We introduce a novel method for face recognition from image sets. In our setting each test and training example is a set of images of an individual’s face, not just a single image, so recognition decisions need to be based on comparisons of image sets. Methods for this have two main aspects: the mod ..."
Abstract

Cited by 76 (0 self)
 Add to MetaCart
(Show Context)
We introduce a novel method for face recognition from image sets. In our setting each test and training example is a set of images of an individual’s face, not just a single image, so recognition decisions need to be based on comparisons of image sets. Methods for this have two main aspects: the models used to represent the individual image sets; and the similarity metric used to compare the models. Here, we represent images as points in a linear or affine feature space and characterize each image set by a convex geometric region (the affine or convex hull) spanned by its feature points. Set dissimilarity is measured by geometric distances (distances of closest approach) between convex models. To reduce the influence of outliers we use robust methods to discard input points that are far from the fitted model. The kernel trick allows the approach to be extended to implicit feature mappings, thus handling complex and nonlinear manifolds of face images. Experiments on two public face datasets show that our proposed methods outperform a number of existing state-of-the-art ones.
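A least-squares sketch of the affine-hull variant of the set-to-set distance (the basis-selection threshold energy and the function names are ours; the paper additionally uses robust fitting and kernelization):

```python
import numpy as np

def affine_hull_distance(X, Y, energy=0.98):
    """Distance of closest approach between the affine hulls of two image
    sets, whose columns are feature vectors. `energy` picks how much PCA
    variance the hull basis retains."""
    def basis(Z):
        mu = Z.mean(axis=1, keepdims=True)
        U, s, _ = np.linalg.svd(Z - mu, full_matrices=False)
        k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy) + 1
        return mu[:, 0], U[:, :k]
    mx, Ux = basis(X)
    my, Uy = basis(Y)
    # min over a, b of ||(mx + Ux a) - (my + Uy b)|| via one least-squares solve
    A = np.hstack([Ux, -Uy])
    c, *_ = np.linalg.lstsq(A, my - mx, rcond=None)
    return np.linalg.norm(mx - my + A @ c)
```

A zero distance means the two hulls intersect, so in practice the basis truncation (and the paper's robust outlier handling) is what keeps the measure discriminative.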
Merging and Splitting Eigenspace Models
2000
"... We present new deterministic methods that given two eigenspace models, each representing a set of ndimensional observations will: (1) merge the models to yield a representation of the union of the sets; (2) split one model from another to represent the difference between the sets; as this is done, ..."
Abstract

Cited by 76 (0 self)
 Add to MetaCart
We present new deterministic methods that, given two eigenspace models, each representing a set of n-dimensional observations, will: (1) merge the models to yield a representation of the union of the sets; (2) split one model from another to represent the difference between the sets; as this is done, we accurately keep track of the mean.
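A direct d x d sketch of the merge operation, assuming each stored model holds the eigenpairs of the (1/n)-normalized covariance and that they capture essentially all the variance; the paper's method reaches the same result more efficiently by working in the reduced joint basis, but the mean is tracked exactly either way.

```python
import numpy as np

def merge_eigenmodels(n1, mu1, U1, lam1, n2, mu2, U2, lam2):
    """Merge two eigenspace models (count, mean, eigenvectors, eigenvalues)
    by rebuilding the scatter matrices, adding the mean-shift correction,
    and re-eigendecomposing."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n                  # exact combined mean
    S1 = n1 * (U1 * lam1) @ U1.T                    # scatter of set 1
    S2 = n2 * (U2 * lam2) @ U2.T                    # scatter of set 2
    dmu = (mu1 - mu2)[:, None]
    S = S1 + S2 + (n1 * n2 / n) * (dmu @ dmu.T)     # combined scatter
    lam, U = np.linalg.eigh(S / n)                  # merged covariance model
    return n, mu, U[:, ::-1], lam[::-1]             # eigenpairs, descending
```

The (n1 n2 / n) mean-shift term is what makes the merge exact rather than a naive sum of scatters; splitting reverses the same bookkeeping.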