Results 1 - 10 of 51
Using Discriminant Eigenfeatures for Image Retrieval, 1996
Cited by 329 (12 self)
This paper describes the automatic selection of features from an image training set using the theories of multi-dimensional linear discriminant analysis and the associated optimal linear projection. We demonstrate the effectiveness of these Most Discriminating Features for view-based class retrieval from a large database of widely varying real-world objects presented as "well-framed" views, and compare their effectiveness with that of principal component analysis.
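As a rough illustration of the discriminant projection idea in the abstract above, the following minimal sketch computes a two-class Fisher discriminant direction in two dimensions. It is a simplification and not the paper's method: the multi-class, image-sized setting is reduced to two classes of 2-D points, and the function names and toy data are hypothetical.

```python
def fisher_direction(class_a, class_b):
    """Return w = Sw^-1 (mean_a - mean_b) for two classes of 2-D points."""

    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        # Entries of the symmetric 2x2 scatter matrix around the class mean.
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    mean_a, mean_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mean_a), scatter(class_b, mean_b)
    # Within-class scatter Sw is the sum of the per-class scatter matrices.
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]
    # Apply the closed-form inverse of a 2x2 symmetric matrix to (dx, dy).
    return ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)


def project(point, w):
    """Scalar feature obtained by projecting a point onto direction w."""
    return point[0] * w[0] + point[1] * w[1]


# Hypothetical toy data standing in for two object classes.
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.5), (2.0, 2.0)]
class_b = [(6.0, 5.0), (7.0, 7.5), (8.0, 7.0), (7.0, 6.0)]

w = fisher_direction(class_a, class_b)
proj_a = [project(p, w) for p in class_a]
proj_b = [project(p, w) for p in class_b]
```

On this toy data the two sets of projected values do not overlap, which is the property a discriminating feature is meant to provide. A principal component direction, by contrast, is chosen without reference to class labels.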
Surface Reconstruction by Voronoi Filtering - Discrete and Computational Geometry, 1998
Cited by 273 (11 self)
We give a simple combinatorial algorithm that computes a piecewise-linear approximation of a smooth surface from a finite set of sample points. The algorithm uses Voronoi vertices to remove triangles from the Delaunay triangulation. We prove the algorithm correct by showing that for densely sampled surfaces, where density depends on "local feature size", the output is topologically valid and convergent (both pointwise and in surface normals) to the original surface. We describe an implementation of the algorithm and show example outputs.
1 Introduction
The problem of reconstructing a surface from scattered sample points arises in many applications such as computer graphics, medical imaging, and cartography. In this paper we consider the specific reconstruction problem in which the input is a set of sample points S drawn from a smooth two-dimensional manifold F embedded in three dimensions, and the desired output is a triangular mesh with vertex set equal to S that faithfully represen...
A mixed-state Condensation tracker with automatic model-switching, 1998
Cited by 135 (10 self)
There is considerable interest in the computer vision community in representing and modelling motion. Motion models are used as predictors to increase the robustness and accuracy of visual trackers, and as classifiers for gesture recognition. This paper presents a significant development of random sampling methods to allow automatic switching between multiple motion models as a natural extension of the tracking process. The Bayesian mixed-state framework is described in its generality, and the example of a bouncing ball is used to demonstrate that a mixed-state model can significantly improve tracking performance in heavy clutter. The relevance of the approach to the problem of gesture recognition is then investigated using a tracker which is able to follow the natural drawing action of a hand holding a pen, and switches state according to the hand's motion.
1 Introduction
There is considerable interest in the computer vision community in representing and modelling motion [1, 3, 4]. ...
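The mixed-state idea described in the abstract above, in which each sample carries a discrete motion-model label alongside its continuous state, can be sketched as a small particle filter. This is a minimal one-dimensional sketch under stated assumptions, not the paper's tracker: the two motion models ("static" and "drift"), the switching matrix, the noise levels, and the observation sequence are all hypothetical.

```python
import math
import random

# Hypothetical two-model setup: label 0 = "static", label 1 = "drift".
SWITCH = [[0.9, 0.1],   # SWITCH[i][j] = P(next label = j | current label = i)
          [0.1, 0.9]]
DRIFT = [0.0, 1.0]      # mean displacement per step under each label
PROCESS_STD = 0.2       # assumed process noise (standard deviation)
MEASUREMENT_STD = 0.5   # assumed measurement noise (standard deviation)


def step(particles, observation, rng):
    """One predict / weight / resample cycle over (position, label) particles."""
    predicted = []
    weights = []
    for x, label in particles:
        # 1. Sample the discrete label from its transition row.
        new_label = 1 if rng.random() < SWITCH[label][1] else 0
        # 2. Sample the continuous state from that label's dynamics.
        new_x = x + DRIFT[new_label] + rng.gauss(0.0, PROCESS_STD)
        # 3. Weight the sample by a Gaussian observation likelihood.
        d = (observation - new_x) / MEASUREMENT_STD
        predicted.append((new_x, new_label))
        weights.append(math.exp(-0.5 * d * d))
    # 4. Resample with replacement in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(particles))


def track(observations, n_particles=500, seed=0):
    """Run the filter over a sequence of scalar observations."""
    rng = random.Random(seed)
    particles = [(0.0, 0)] * n_particles
    for z in observations:
        particles = step(particles, z, rng)
    return particles


# The target sits still for three steps, then moves one unit per step.
particles = track([0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0])
estimate = sum(x for x, _ in particles) / len(particles)
drift_share = sum(label for _, label in particles) / len(particles)
```

Here `drift_share` is the fraction of particles carrying the "drift" label after the last step. Reading the active motion model off the particle set in this way is what makes the discrete label usable as a classifier as well as a predictor.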
Inferring 3D body pose from silhouettes using activity manifold learning - In CVPR, 2004
Cited by 108 (11 self)
We aim to infer 3D body pose directly from human silhouettes. Given a visual input (silhouette), the objective is to recover the intrinsic body configuration, recover the viewpoint, reconstruct the input, and detect any spatial or temporal outliers. In order to recover intrinsic body configuration (pose) from the visual input (silhouette), we explicitly learn view-based representations of activity manifolds, as well as mapping functions between such central representations and both the visual input space and the 3D body pose space. The body pose can be recovered in closed form in two steps: projecting the visual input to the learned representations of the activity manifold, i.e., finding the point on the learned manifold representation corresponding to the visual input, and then interpolating the 3D pose.
Generative Modeling for Continuous Non-Linearly Embedded Visual Inference - In ICML, 2004
Cited by 61 (11 self)
Many difficult visual perception problems, like 3D human motion estimation, can be formulated in terms of inference using complex generative models, defined over high-dimensional state spaces. Despite progress, optimizing such models is difficult because prior knowledge cannot be flexibly integrated in order to reshape an initially designed representation space. Nonlinearities, inherent sparsity of high-dimensional training sets, and lack of global continuity make dimensionality reduction challenging and low-dimensional search inefficient. To address these problems, we present a learning and inference algorithm that restricts visual tracking to automatically extracted, non-linearly embedded, low-dimensional spaces. This formulation produces a layered generative model with reduced state representation that can be estimated using efficient continuous optimization methods. Our prior flattening method allows a simple analytic treatment of low-dimensional intrinsic curvature constraints, and allows consistent interpolation operations.
Mixed memory Markov models: decomposing complex stochastic processes as mixtures of simpler ones, 1998
Cited by 52 (1 self)
We study Markov models whose state spaces arise from the Cartesian product of two or more discrete random variables. We show how to parameterize the transition matrices of these models as a convex combination (or mixture) of simpler dynamical models. The parameters in these models admit a simple probabilistic interpretation and can be fitted iteratively by an Expectation-Maximization (EM) procedure. We derive a set of generalized Baum-Welch updates for factorial hidden Markov models that make use of this parameterization. We also describe a simple iterative procedure for approximately computing the statistics of the hidden states. Throughout, we give examples where mixed memory models provide a useful representation of complex stochastic processes.
Keywords: Markov models, mixture models, discrete time series
1. Introduction
The modeling of time series is a fundamental problem in machine learning, with widespread applications. These include speech recognition (Rabiner, 1989), natu...
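The convex-combination parameterization mentioned in the abstract above can be illustrated with a short sketch. It computes the next-value distribution of one component of a product state as a weighted mixture of simpler transition matrices, each conditioned on a single component of the previous state. The function name, the weights, and the matrices are hypothetical, and the EM fitting procedure is not shown.

```python
def component_transition(weights, matrices, prev_state):
    """Distribution over one component's next value, given the previous joint state.

    weights[mu]  : mixing weight for component mu (non-negative, summing to 1)
    matrices[mu] : row-stochastic matrix with
                   matrices[mu][i][j] = P(next = j | previous value of component mu = i)
    prev_state   : tuple holding the previous value of every component
    """
    n_next = len(matrices[0][0])
    probs = [0.0] * n_next
    for mu, (weight, matrix) in enumerate(zip(weights, matrices)):
        row = matrix[prev_state[mu]]
        for j in range(n_next):
            # Convex combination of rows taken from the simpler models.
            probs[j] += weight * row[j]
    return probs


# Hypothetical example: two binary components, so four joint states.
weights = [0.7, 0.3]
matrices = [
    [[0.9, 0.1], [0.2, 0.8]],  # dependence on component 0
    [[0.5, 0.5], [0.1, 0.9]],  # dependence on component 1
]

probs = component_transition(weights, matrices, prev_state=(0, 1))
```

Because every row of every matrix sums to one and the weights sum to one, the mixture is itself a valid distribution. In this sketch the parameter count grows with the number of components times the size of each small matrix, rather than with the size of the full product state space.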
Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications - Proc. European Conference on Computer Vision, volume II of Lecture Notes in Computer Science, 1996
Cited by 43 (2 self)
In Proc. European Conf. Computer Vision, pp. 376-387, 1996, Cambridge, UK
Developments in dynamic contour tracking permit sparse representation of the outlines of moving contours. Given the increasing computing power of general-purpose workstations it is now possible to track human faces and parts of faces in real-time without special hardware. This paper describes a real-time lip tracker that uses a Kalman filter based dynamic contour to track the outline of the lips. Two alternative lip trackers, one that tracks lips from a profile view and the other from a frontal view, were developed to extract visual speech recognition features from the lip contour. In both cases, visual features have been incorporated into an acoustic automatic speech recogniser. Tests on small isolated-word vocabularies using a dynamic time warping based audio-visual recogniser demonstrate that real-time, contour-based lip tracking can be used to supplement acoustic-only speech recognisers enabling robust re...
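The Kalman filter named in the abstract above alternates a prediction step with a measurement update. The following minimal sketch shows that cycle for a single scalar quantity under a random-walk model. It is a simplification: a contour tracker would filter a vector of shape parameters, and the noise variances and measurements here are hypothetical.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model.

    Returns the list of filtered estimates and the final error variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + measurement_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates, p


# Hypothetical noisy measurements of a quantity near 5.0.
measurements = [4.8, 5.3, 5.1, 4.9, 5.2, 5.0, 4.7, 5.1]
estimates, final_var = kalman_1d(measurements, process_var=0.01, measurement_var=0.1)
```

The gain starts close to one, because the initial variance `p0` is large relative to the measurement variance, and shrinks as the estimate becomes more certain, so later measurements move the estimate less.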
Hierarchical Discriminant Analysis for Image Retrieval - IEEE Trans. PAMI, 1999
Cited by 33 (3 self)
Abstract: A self-organizing framework for object recognition is described. We describe a hierarchical database structure for image retrieval. The Self-Organizing Hierarchical Optimal Subspace Learning and Inference Framework (SHOSLIF) system uses the theories of optimal linear projection for automatic optimal feature derivation and a hierarchical structure to achieve a logarithmic retrieval complexity. A Space-Tessellation Tree is automatically generated using the Most Expressive Features (MEFs) and the Most Discriminating Features (MDFs) at each level of the tree. The major characteristics of the proposed hierarchical discriminant analysis include: 1) avoiding the limitation of global linear features (hyperplanes as separators) by deriving a recursively better-fitted set of features for each of the recursively subdivided sets of training samples; 2) generating a smaller tree whose cell boundaries separate the samples along the class boundaries better than the principal component analysis, thereby giving a better generalization capability (i.e., better recognition rate in a disjoint test); 3) accelerating the retrieval using a tree structure for data pruning, utilizing a different set of discriminant features at each level of the tree. We allow for perturbations in the size and position of objects in the images through learning. We demonstrate the technique on a large image database of widely varying real-world objects taken in natural settings, and show the applicability of the approach for variability in position, size, and 3D orientation. This paper concentrates on the hierarchical partitioning of the feature spaces.
Index Terms: Principal component analysis, discriminant analysis, hierarchical image database, image retrieval, tessellation, partitioning, object recognition, face recognition, complexity with large image databases.
Visual Speech Recognition Using Active Shape Models And Hidden Markov Models - Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 1996
Cited by 28 (7 self)
This paper describes a novel approach for visual speech recognition. The shape of the mouth is modelled by an Active Shape Model which is derived from the statistics of a training set and used to locate, track and parameterise the speaker's lip movements. The extracted parameters representing the lip shape are modelled as continuous probability distributions and their temporal dependencies are modelled by Hidden Markov Models. We present recognition tests performed on a database of a broad variety of speakers and illumination conditions. The system achieved an accuracy of 85.42% for a speaker-independent recognition task of the first four digits using lip shape information only.

