Using Discriminant Eigenfeatures for Image Retrieval
1996
This paper describes the automatic selection of features from an image training set using the theories of multidimensional linear discriminant analysis and the associated optimal linear projection. We demonstrate the effectiveness of these Most Discriminating Features for view-based class retrieval from a large database of widely varying real-world objects presented as "well-framed" views, and compare it with that of principal component analysis.
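As a rough illustration of why a discriminant projection can differ from a principal-component one, the sketch below computes the classical two-class Fisher direction for 2-D points in pure Python. It is a minimal sketch and not the paper's method: the function names, the restriction to two classes and two dimensions, and the toy data are all assumptions made for illustration.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def scatter(points, mu):
    """2x2 scatter matrix of the points around the mean mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Unit vector w proportional to Sw^{-1} (mu_a - mu_b) (two-class Fisher LDA)."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter is the sum of the per-class scatters.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]


# Toy data (hypothetical): both classes vary mostly along x, but differ in y.
class_a = [(0, 0.0), (2, 0.1), (4, -0.1), (6, 0.0)]
class_b = [(0, 1.0), (2, 1.1), (4, 0.9), (6, 1.0)]
w = fisher_direction(class_a, class_b)
```

In this toy data the largest variance lies along x, which is the axis a principal-component projection would favour, while the Fisher direction points almost entirely along y, the axis that actually separates the classes.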
Surface Reconstruction by Voronoi Filtering
Discrete and Computational Geometry, 1998
We give a simple combinatorial algorithm that computes a piecewise-linear approximation of a smooth surface from a finite set of sample points. The algorithm uses Voronoi vertices to remove triangles from the Delaunay triangulation. We prove the algorithm correct by showing that for densely sampled surfaces, where density depends on "local feature size", the output is topologically valid and convergent (both pointwise and in surface normals) to the original surface. We describe an implementation of the algorithm and show example outputs. 1 Introduction The problem of reconstructing a surface from scattered sample points arises in many applications such as computer graphics, medical imaging, and cartography. In this paper we consider the specific reconstruction problem in which the input is a set of sample points S drawn from a smooth two-dimensional manifold F embedded in three dimensions, and the desired output is a triangular mesh with vertex set equal to S that faithfully represen...
A mixed-state Condensation tracker with automatic model-switching
1998
There is considerable interest in the computer vision community in representing and modelling motion. Motion models are used as predictors to increase the robustness and accuracy of visual trackers, and as classifiers for gesture recognition. This paper presents a significant development of random sampling methods to allow automatic switching between multiple motion models as a natural extension of the tracking process. The Bayesian mixed-state framework is described in its generality, and the example of a bouncing ball is used to demonstrate that a mixed-state model can significantly improve tracking performance in heavy clutter. The relevance of the approach to the problem of gesture recognition is then investigated using a tracker which is able to follow the natural drawing action of a hand holding a pen, and switches state according to the hand's motion. 1 Introduction There is considerable interest in the computer vision community in representing and modelling motion [1, 3, 4]. ...
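The idea of a sampled state that carries both continuous values and a discrete motion-model label can be sketched as a single predict, weight, and resample step of a generic particle filter. This is a minimal sketch under stated assumptions, not the tracker described in the abstract: the two models (`FALL`, `BOUNCE`), the switching probabilities in `SWITCH`, the 1-D dynamics, the Gaussian observation model, and all parameter values are hypothetical.

```python
import math
import random

# Hypothetical discrete motion models for a 1-D bouncing-ball toy problem.
FALL, BOUNCE = 0, 1
# Assumed model-switching probabilities: P(next model | current model).
SWITCH = {FALL: [0.9, 0.1], BOUNCE: [1.0, 0.0]}


def predict(particle, rng, dt=0.1, g=-9.8, noise=0.05):
    """Sample the discrete model first, then apply that model's dynamics."""
    height, velocity, model = particle
    model = rng.choices([FALL, BOUNCE], weights=SWITCH[model])[0]
    if model == BOUNCE:
        velocity = abs(velocity) * 0.8   # reverse direction, lose some energy
    else:
        velocity += g * dt               # free fall under gravity
    height = max(0.0, height + velocity * dt + rng.gauss(0.0, noise))
    return (height, velocity, model)


def likelihood(particle, observed_height, sigma=0.2):
    """Unnormalised Gaussian likelihood of the observed height."""
    err = particle[0] - observed_height
    return math.exp(-0.5 * (err / sigma) ** 2)


def condensation_step(particles, observed_height, rng):
    """One predict / weight / resample cycle over mixed-state particles."""
    predicted = [predict(p, rng) for p in particles]
    weights = [likelihood(p, observed_height) for p in predicted]
    if sum(weights) == 0:
        weights = [1.0] * len(predicted)  # fall back to uniform resampling
    return rng.choices(predicted, weights=weights, k=len(predicted))


rng = random.Random(0)
particles = [(1.0, 0.0, FALL)] * 200
particles = condensation_step(particles, observed_height=0.95, rng=rng)
```

After resampling, the fraction of particles carrying each model label gives a sampled estimate of which motion model is currently active, which is the quantity a gesture classifier could read off.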
Inferring 3D body pose from silhouettes using activity manifold learning
In CVPR, 2004
We aim to infer 3D body pose directly from human silhouettes. Given a visual input (silhouette), the objective is to recover the intrinsic body configuration, recover the view point, reconstruct the input and detect any spatial or temporal outliers. In order to recover intrinsic body configuration (pose) from the visual input (silhouette), we explicitly learn view-based representations of activity manifolds as well as learn mapping functions between such central representations and both the visual input space and the 3D body pose space. The body pose can be recovered in a closed form in two steps by projecting the visual input to the learned representations of the activity manifold, i.e., finding the point on the learned manifold representation corresponding to the visual input, followed by interpolating 3D pose.
Generative Modeling for Continuous Non-Linearly Embedded Visual Inference
In ICML, 2004
Many difficult visual perception problems, like 3D human motion estimation, can be formulated in terms of inference using complex generative models, defined over high-dimensional state spaces. Despite progress, optimizing such models is difficult because prior knowledge cannot be flexibly integrated in order to reshape an initially designed representation space. Nonlinearities, inherent sparsity of high-dimensional training sets, and lack of global continuity make dimensionality reduction challenging and low-dimensional search inefficient. To address these problems, we present a learning and inference algorithm that restricts visual tracking to automatically extracted, non-linearly embedded, low-dimensional spaces. This formulation produces a layered generative model with reduced state representation that can be estimated using efficient continuous optimization methods. Our prior flattening method allows a simple analytic treatment of low-dimensional intrinsic curvature constraints, and allows consistent interpolation operations.
Mixed memory Markov models: decomposing complex stochastic processes as mixtures of simpler ones
1998
We study Markov models whose state spaces arise from the Cartesian product of two or more discrete random variables. We show how to parameterize the transition matrices of these models as a convex combination, or mixture, of simpler dynamical models. The parameters in these models admit a simple probabilistic interpretation and can be fitted iteratively by an Expectation-Maximization (EM) procedure. We derive a set of generalized Baum-Welch updates for factorial hidden Markov models that make use of this parameterization. We also describe a simple iterative procedure for approximately computing the statistics of the hidden states. Throughout, we give examples where mixed memory models provide a useful representation of complex stochastic processes. Keywords: Markov models, mixture models, discrete time series 1. Introduction The modeling of time series is a fundamental problem in machine learning, with widespread applications. These include speech recognition (Rabiner, 1989), natu...
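One way to read "a convex combination of simpler dynamical models" is that the next-state distribution is a weighted sum of small transition tables, each conditioned on a single variable of the previous joint state. The sketch below evaluates such a mixture. It is an illustrative reading of the abstract rather than the paper's exact parameterization: the function name, the argument layout, and the example numbers are assumptions.

```python
def mixed_transition(prev_states, mixing_weights, component_tables):
    """Return P(next | prev_states) as a convex combination of simpler models.

    prev_states[mu]          -- previous value of the mu-th discrete variable
    mixing_weights[mu]       -- non-negative weights that sum to 1
    component_tables[mu][s]  -- distribution over the next state given that
                                variable mu previously took value s
    """
    n_next = len(component_tables[0][0])
    probs = [0.0] * n_next
    for mu, weight in enumerate(mixing_weights):
        row = component_tables[mu][prev_states[mu]]
        for nxt in range(n_next):
            probs[nxt] += weight * row[nxt]
    return probs


# Hypothetical example: two binary variables, each with its own 2x2 table.
weights = [0.75, 0.25]
tables = [
    [[1.0, 0.0], [0.0, 1.0]],      # variable 0: copy its previous value
    [[0.5, 0.5], [0.25, 0.75]],    # variable 1: a noisier influence
]
probs = mixed_transition((0, 1), weights, tables)
```

With k conditioning variables of n values each, this form needs k tables of size n by n plus k mixing weights, instead of one table with n to the power k rows, which is the saving the mixture parameterization is after.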
Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications
Proc. European Conference on Computer Vision, volume II of Lecture Notes in Computer Science, 1996
In Proc. European Conf. Computer Vision, pp. 376-387, 1996, Cambridge, UK. Developments in dynamic contour tracking permit sparse representation of the outlines of moving contours. Given the increasing computing power of general-purpose workstations it is now possible to track human faces and parts of faces in real-time without special hardware. This paper describes a real-time lip tracker that uses a Kalman-filter-based dynamic contour to track the outline of the lips. Two alternative lip trackers, one that tracks lips from a profile view and the other from a frontal view, were developed to extract visual speech recognition features from the lip contour. In both cases, visual features have been incorporated into an acoustic automatic speech recogniser. Tests on small isolated-word vocabularies using a dynamic-time-warping-based audio-visual recogniser demonstrate that real-time, contour-based lip tracking can be used to supplement acoustic-only speech recognisers enabling robust re...
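For readers unfamiliar with the dynamic time warping mentioned above, the sketch below shows the textbook dynamic-programming recurrence on two 1-D feature sequences. It is a generic illustration, not the recogniser from the paper: scalar features, the absolute-difference local cost, and the function name `dtw_distance` are assumptions.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


same_word_slower = dtw_distance([1, 2, 3], [1, 2, 2, 3])   # time-stretched copy
different_word = dtw_distance([0, 0], [1, 1])
```

An isolated-word recogniser built this way would compare an utterance against one stored template per vocabulary word and pick the template with the lowest warping cost. In an audio-visual setting the scalar features would be replaced by vectors combining acoustic and lip-contour measurements.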
Hierarchical Discriminant Analysis for Image Retrieval
IEEE Trans. PAMI, 1999
Abstract: A self-organizing framework for object recognition is described. We describe a hierarchical database structure for image retrieval. The Self-Organizing Hierarchical Optimal Subspace Learning and Inference Framework (SHOSLIF) system uses the theories of optimal linear projection for automatic optimal feature derivation and a hierarchical structure to achieve a logarithmic retrieval complexity. A Space-Tessellation Tree is automatically generated using the Most Expressive Features (MEFs) and the Most Discriminating Features (MDFs) at each level of the tree. The major characteristics of the proposed hierarchical discriminant analysis include: 1) avoiding the limitation of global linear features (hyperplanes as separators) by deriving a recursively better-fitted set of features for each of the recursively subdivided sets of training samples; 2) generating a smaller tree whose cell boundaries separate the samples along the class boundaries better than the principal component analysis, thereby giving a better generalization capability (i.e., better recognition rate in a disjoint test); 3) accelerating the retrieval using a tree structure for data pruning, utilizing a different set of discriminant features at each level of the tree. We allow for perturbations in the size and position of objects in the images through learning. We demonstrate the technique on a large image database of widely varying real-world objects taken in natural settings, and show the applicability of the approach for variability in position, size, and 3D orientation. This paper concentrates on the hierarchical partitioning of the feature spaces. Index Terms: Principal component analysis, discriminant analysis, hierarchical image database, image retrieval, tessellation, partitioning, object recognition, face recognition, complexity with large image databases.
High resolution acquisition, learning and transfer of dynamic 3D facial expressions
In Computer Graphics Forum, 2004
Synthesis and retargeting of facial expressions is central to facial animation and often involves significant manual work in order to achieve realistic expressions, due to the difficulty of capturing high-quality dynamic expression data. In this paper we address fundamental issues regarding the use of high-quality dense 3D data samples undergoing motions at video speeds, e.g. human facial expressions. In order to utilize such data for motion analysis and retargeting, correspondences must be established between data in different frames of the same faces as well as between different faces. We present a data-driven approach that consists of four parts: 1) High-speed, high-accuracy capture of moving faces without the use of markers, 2) Very precise tracking of facial motion using a multiresolution deformable mesh, 3) A unified low-dimensional mapping of dynamic facial motion that can separate expression style, and 4) Synthesis of novel expressions as a combination of expression styles. The accuracy and resolution of our method allow us to capture and track subtle expression details. The low-dimensional representation of motion data in a unified embedding for all the subjects in the database allows for learning the most discriminating characteristics of each individual’s expressions as that person’s “expression style”. Thus new expressions can be synthesized, either as dynamic morphing between individuals, or as expression transfer from a