The Visual Analysis of Human Movement: A Survey
- Computer Vision and Image Understanding, 1999
Cited by 456 (7 self)
The ability to recognize humans and their activities by vision is key for a machine to interact intelligently and effortlessly with a human-inhabited environment. Because of many potentially important applications, “looking at people” is currently one of the most active application domains in computer vision. This survey identifies a number of promising applications and provides an overview of recent developments in this domain. The scope of this survey is limited to work on whole-body or hand motion; it does not include work on human faces. The emphasis is on discussing the various methodologies; they are grouped into 2-D approaches with or without explicit shape models and 3-D approaches. Where appropriate, systems are reviewed. We conclude with some thoughts about future directions. © 1999 Academic Press
Analyzing Facial Expressions for Virtual Conferencing
- IEEE Computer Graphics & Applications, 1998
On-line retrainable neural networks: improving the performance of neural networks in image analysis problems
- IEEE Trans. Neural Networks, 2000
Cited by 40 (29 self)
A novel approach is presented in this paper for improving the performance of neural-network classifiers in image recognition, segmentation, or coding applications, based on a retraining procedure at the user level. The procedure includes: 1) a training algorithm for adapting the network weights to the current condition; 2) a maximum a posteriori (MAP) estimation procedure for optimally selecting the most representative data of the current environment as retraining data; and 3) a decision mechanism for determining when network retraining should be activated. The training algorithm takes into consideration both the former and the current network knowledge in order to achieve good generalization. The MAP estimation procedure models the network output as a Markov random field (MRF) and optimally selects the set of training inputs and corresponding desired outputs. Results are presented which illustrate the theoretical developments as well as the performance of the proposed approach in real-life experiments. Index Terms: Image analysis, MPEG-4, neural-network retraining, segmentation, weight adaptation.
Pose-invariant face recognition using a 3D deformable model
- 2003
Cited by 21 (0 self)
The paper proposes a novel, pose-invariant face recognition system based on a deformable, generic 3D face model, that is a composite of: (1) an edge model, (2) a color region model and (3) a wireframe model for jointly describing the shape and important features of the face. The first two submodels are used for image analysis and the third mainly for face synthesis. In order to match the model to face images in arbitrary poses, the 3D model can be projected onto different 2D viewplanes based on rotation, translation and scale parameters, thereby generating multiple face-image templates (in different sizes and orientations). Face shape variations among people are taken into account by the deformation parameters of the model. Given an unknown face, its pose is estimated by model matching and the system synthesizes face images of known subjects in the same pose. The face is then classified as the subject whose synthesized image is most similar. The synthesized images are generated using a 3D face representation scheme which encodes the 3D shape and texture characteristics of the faces. This face representation is automatically derived from training face images of the subject. Experimental results show that the method is capable of determining pose and recognizing faces accurately over a wide range of poses and with naturally varying lighting conditions. Recognition rates of 92.3% have been achieved by the method with 10 training face images per person.
Occlusion-adaptive, content-based mesh design and forward tracking
- IEEE Transactions on Image Processing 6, 1997
Cited by 16 (2 self)
Two-dimensional (2-D) mesh-based motion compensation preserves neighboring relations (through connectivity of the mesh) as well as allowing warping transformations between pairs of frames; thus, it effectively eliminates blocking artifacts that are common in motion compensation by block matching. However, available 2-D mesh models, whether uniform or nonuniform, enforce connectivity everywhere within a frame, which is clearly not suitable across occlusion boundaries. To this effect, we propose an occlusion-adaptive forward-tracking mesh model, where connectivity of the mesh elements (patches) across covered and uncovered region boundaries is broken. This is achieved by allowing no node points within the background to be covered (BTBC) and refining the mesh structure within the model failure (MF) region(s) at each frame. The proposed content-based mesh structure enables better rendition of the motion (compared to a uniform or a hierarchical mesh), while tracking is necessary to avoid transmission of all node locations at each frame. Experimental results show successful motion compensation and tracking. Index Terms: Mesh refinement, occlusion-adaptive forward tracking, 2-D content-based mesh design.
A Robust Model-Based Approach for 3D Head Tracking in Video Sequences
- 2000
Cited by 13 (1 self)
We present a generic and robust method for model-based global 3D head pose estimation in monocular and non-calibrated video sequences. The proposed method relies on a 3D/2D matching between 2D image features estimated throughout the sequence and 3D object features of a generic head model. Specifically, it combines motion and texture features in an iterative optimization procedure based on the downhill simplex algorithm. A proper initialization of the pose parameters, based on a block matching procedure, is performed at each frame in order to take into account large amplitude motions. For the same reason, we have developed a non-linear optical flow-based interpolation algorithm for increasing the frame rate. Experiments demonstrate that this method is stable over extended sequences including large head motions, occlusions, various head postures and lighting variations. The estimation accuracy is related to the head model, as established by using an ellipsoidal model and an ad hoc synthe...
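The abstract above mentions a block matching procedure used to initialize the pose parameters at each frame, without giving its details. As a rough illustration of block matching in general, the following is a minimal sketch of exhaustive-search matching with a sum-of-absolute-differences (SAD) cost. The function names, the SAD cost, the search strategy and the list-of-lists frame layout are all assumptions made for this sketch and are not taken from the paper.

```python
def sad(prev, curr, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in curr and a shifted block in prev."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(curr[by + r][bx + c] - prev[by + dy + r][bx + dx + c])
    return total


def block_match(prev, curr, bx, by, size, radius):
    """Exhaustive search for the displacement (dx, dy) into prev that best
    matches the size x size block of curr whose top-left corner is (bx, by).

    Frames are lists of rows of numbers (hypothetical layout for this sketch).
    Returns ((dx, dy), cost); candidates that fall outside prev are skipped.
    """
    height, width = len(prev), len(prev[0])
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue  # candidate block leaves the frame
            cost = sad(prev, curr, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

A call such as block_match(prev, curr, 2, 2, 3, 2) searches a 5 x 5 window of displacements for a 3 x 3 block. How such displacements would be turned into 3D pose parameters is specific to the paper and is not sketched here.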
Closed-form connectivity-preserving solutions for motion compensation using 2-D meshes
- IEEE Trans. Image Processing, 1997
Cited by 11 (2 self)
Motion compensation using two-dimensional (2-D) mesh models requires computation of the parameters of a spatial transformation for each mesh element (patch). It is well known that the parameters of an affine (bilinear or perspective) mapping can be uniquely estimated from three (four) point correspondences (at the vertices of a triangular or quadrilateral mesh element). On the other hand, overdetermined solutions using more than the required minimum number of point correspondences provide increased robustness against correspondence-estimation errors; however, this necessitates special consideration to preserve mesh-connectivity. This paper presents closed-form, overdetermined solutions for least squares estimation of affine motion parameters for a triangular mesh, which preserve mesh-connectivity using patch-based or node-based connectivity constraints. In particular, four new algorithms are presented: patch-constrained methods using point correspondences or spatio-temporal intensity gradients, and node-constrained methods using point correspondences or spatio-temporal intensity gradients. The methods using point correspondences can be viewed as postprocessing of a dense motion field for best representation in terms of a set of irregularly spaced samples. The methods that are based on spatio-temporal intensity gradients offer closed-form solutions for direct estimation of the best node-point motion vectors (equivalently the best transformation parameters). We show that the performance of the proposed closed-form solutions is comparable to that of the alternative search-based solutions at a fraction of the computational cost. Index Terms: Closed-form least squares solution, connectivity constraints, motion compensation, texture mapping, 2-D mesh-based motion representation.
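The abstract above rests on the fact that an affine mapping is determined by three point correspondences and can be fit by least squares from more. The following minimal sketch shows only that unconstrained fit for a single patch, via the normal equations. It does not implement the paper's patch-based or node-based connectivity constraints, and the function names and parameter ordering (a1..a6) are assumptions made for illustration.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m x = v by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate (e.g. collinear) point configuration")
    solution = []
    for k in range(3):
        # Replace column k of m with v and take the determinant ratio.
        mk = [[v[i] if j == k else m[i][j] for j in range(3)] for i in range(3)]
        solution.append(det(mk) / d)
    return solution


def fit_affine(src, dst):
    """Least-squares affine fit from three or more point correspondences.

    Model (assumed parameterization):
        x' = a1*x + a2*y + a3
        y' = a4*x + a5*y + a6
    Returns [a1, a2, a3, a4, a5, a6].
    """
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least three matching point pairs")
    # Accumulate the normal equations (A^T A) p = A^T b, rows of A are (x, y, 1).
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * u
            aty[i] += row[i] * v
    return solve3(ata, atx) + solve3(ata, aty)
```

With exactly three non-collinear points the fit is exact; with more points it averages out correspondence errors, which is the robustness the abstract refers to. Fitting each patch independently like this can move a shared node to different places in neighboring patches, which is the connectivity problem the paper's constrained solutions address.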
Estimation of Eye, Eyebrow and Nose Features in Videophone Sequences
- in Proc. International Workshop on Very Low Bitrate Video Coding, 1998
Cited by 10 (0 self)
Automatic estimation of facial features is necessary for model-based coding of videophone sequences at very low bit rates. In this contribution, algorithms for the estimation of eye, eyebrow and nose features are presented. For estimation of eye features, deformable template matching with a simplified cost function is used. For the estimation of eyebrow features, a segmentation algorithm is proposed. Eyebrows covered by hair are taken into account by this algorithm. As nose features, the nostrils as well as the sides of the nose are estimated. Applied to the test sequences Akiyo and Miss America (CIF, 10 Hz), the facial features are estimated with subjectively high accuracy.
1. Introduction
The automatic estimation of facial features like eyes, mouth, eyebrows, nose and face contours is necessary for many applications [1][2]. For example, facial feature estimation is applied to face recognition [1] and model-based coding of videophone sequences at very low bit rates [3][4][5]. This ...
Lossy to Lossless Object-Based Coding of 3-D MRI Data
- IEEE Trans. on Image Processing, 2002
Cited by 9 (1 self)
We propose a fully three-dimensional (3-D) object-based coding system exploiting the diagnostic relevance of the different regions of the volumetric data for rate allocation. The data are first decorrelated via a 3-D discrete wavelet transform. The implementation via the lifting steps scheme allows integer-to-integer mapping, enabling lossless coding, and facilitates the definition of the object-based inverse transform. The coding process assigns disjoint segments of the bitstream to the different objects, which can be independently accessed and reconstructed at any up-to-lossless quality. Two fully 3-D coding strategies are considered: embedded zerotree coding (EZW-3D) and multidimensional layered zero coding (MLZC), both generalized for region of interest (ROI)-based processing. In order to avoid artifacts along region boundaries, some extra coefficients must be encoded for each object. This gives rise to a bitstream overhead with respect to the case where the volume is encoded as a whole. The amount of such extra information depends on both the filter length and the decomposition depth. The system is characterized on a set of head magnetic resonance images. Results show that MLZC and EZW-3D have competitive performances. In particular, the best MLZC mode outperforms the other state-of-the-art techniques on one of the datasets for which results are available in the literature.
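The abstract above notes that a lifting implementation maps integers to integers, which is what makes lossless coding possible. The following minimal sketch illustrates that property with a one-level, one-dimensional integer Haar transform (often called the S transform). The choice of filter is an assumption made for illustration: the abstract does not say which wavelet the authors use, and their transform is 3-D.

```python
def forward_lift(signal):
    """One level of the integer Haar (S) transform via lifting.

    Takes a list of integers of even length and returns (approx, detail),
    both lists of integers.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    even, odd = signal[0::2], signal[1::2]
    # Predict step: each odd sample is predicted by its even neighbor.
    detail = [o - e for e, o in zip(even, odd)]
    # Update step: the floor of half the detail keeps the result an integer.
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def inverse_lift(approx, detail):
    """Undo forward_lift exactly by reversing the lifting steps in order."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend((e, o))
    return signal
```

Because each lifting step adds a rounded function of values the decoder already has, the inverse subtracts exactly the same quantity, so reconstruction is exact regardless of the rounding. The approximation band is roughly the pairwise mean and can be transformed again to build further decomposition levels.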
Motion-based analysis and segmentation of image sequences using 3-D scene models
- Signal Processing, 1998
Cited by 8 (2 self)
In this paper we present an algorithm for automatic extraction and tracking of multiple objects from a video sequence. Our approach is model-based in the sense that we first use a robust structure-from-motion algorithm to identify multiple objects and to recover initial 3-D shape models. Then, these models are used to identify and track the objects over multiple frames of the video sequence. The procedure starts with recovering a dense depth map of the scene using two frames at the beginning of the sequence, and representing the scene as a 3-D wire-frame computed from the depth map. Texture extracted from the video frames is mapped onto the model. Once the initial models are available we use a linear and low complexity algorithm to recover the motion parameters and scene structure of the objects for the subsequent frames. Combining the new estimates of depth and the initially computed 3-D models into an unstructured set of 3-D points with associated color information, we obtain updates of the 3-D scene description for each additional frame. We show that the usage of a 3-D scene model is suitable to analyze complex scenes with several objects. In our experimental results, we apply the approach presented in this paper to the problem of video sequence segmentation, object tracking, and video object plane (VOP) generation. We separate the video sequences into different layers of depth and combine the information from multiple frames into a compact and complete description of these layers.

