Results 1–10 of 59
Shape-From-Silhouette Across Time Part I: Theory and Algorithms
 International Journal of Computer Vision
, 2005
Cited by 100 (3 self)
Shape-From-Silhouette (SFS) is a shape reconstruction method which constructs a 3D shape estimate of an object using silhouette images of the object. The output of an SFS algorithm is known as the Visual Hull (VH). Traditionally SFS is either performed on static objects, or separately at each time instant in the case of videos of moving objects. In this paper we develop a theory of performing SFS across time: estimating the shape of a dynamic object (with unknown motion) by combining all of the silhouette images of the object over time. We first introduce a one-dimensional element called a Bounding Edge to represent the Visual Hull. We then show that aligning two Visual Hulls using just their silhouettes is in general ambiguous and derive the geometric constraints (in terms of Bounding Edges) that govern the alignment. To break the alignment ambiguity, we combine stereo information with silhouette information and derive a Temporal SFS algorithm which consists of two steps: (1) estimate the motion of the objects over time (Visual Hull Alignment) and (2) combine the silhouette information using the estimated motion (Visual Hull Refinement). The algorithm is first developed for rigid objects and then extended to articulated objects. In Part II of this paper we apply our temporal SFS algorithm to two human-related applications: (1) the acquisition of detailed human kinematic models and (2) markerless motion tracking.
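The core SFS operation behind the Visual Hull can be illustrated by voxel carving: a voxel is kept only if it projects inside every silhouette. A minimal numpy sketch under stated assumptions (3x4 camera matrices, binary silhouette masks, and the `visual_hull` name are illustrative; the paper's Bounding-Edge representation is a different, exact formulation):

```python
import numpy as np

def visual_hull(voxels, cameras, silhouettes):
    """Carve a voxel set: keep voxels that project inside every silhouette.

    voxels:      (N, 3) array of voxel centers
    cameras:     list of 3x4 projection matrices
    silhouettes: list of 2D boolean masks (True = inside the object)
    """
    keep = np.ones(len(voxels), dtype=bool)
    for P, sil in zip(cameras, silhouettes):
        # Project voxel centers to pixel coordinates (homogeneous division).
        homog = np.hstack([voxels, np.ones((len(voxels), 1))])
        proj = homog @ P.T
        u = (proj[:, 0] / proj[:, 2]).round().astype(int)
        v = (proj[:, 1] / proj[:, 2]).round().astype(int)
        # A voxel survives this view only if it lands on a silhouette pixel.
        inside = (u >= 0) & (u < sil.shape[1]) & (v >= 0) & (v < sil.shape[0])
        hit = np.zeros(len(voxels), dtype=bool)
        hit[inside] = sil[v[inside], u[inside]]
        keep &= hit
    return voxels[keep]
```

Intersecting over more views (or, as in this paper, over aligned time instants) monotonically tightens the estimate toward the true shape.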
Motion Layer Extraction in the Presence of Occlusion Using Graph Cuts
, 2005
Cited by 95 (9 self)
Extracting layers from video is very important for video representation, analysis, compression, and synthesis. Assuming that a scene can be approximately described by multiple planar regions, this paper describes a robust and novel approach to automatically extract a set of affine or projective transformations induced by these regions, detect the occlusion pixels over multiple consecutive frames, and segment the scene into several motion layers. First, after determining a number of seed regions using correspondences in two frames, we expand the seed regions and reject the outliers employing the graph cuts method integrated with level set representation. Next, these initial regions are merged into several initial layers according to the motion similarity. Third, an occlusion order constraint on multiple frames is explored, which enforces that the occlusion area increases with the temporal order in a short period and effectively maintains segmentation consistency over multiple consecutive frames. Then, the correct layer segmentation is obtained by using a graph cuts algorithm and the occlusions between the overlapping layers are explicitly determined. Experimental results demonstrate that our approach is effective and robust.
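The seed-region step depends on fitting a planar motion model to correspondences and flagging pixels that disagree with it. A hedged sketch of the affine case (the function names and the plain least-squares formulation are illustrative assumptions, not the paper's exact estimator):

```python
import numpy as np

def fit_affine(pts1, pts2):
    """Least-squares 2x3 affine transform A mapping pts1 -> pts2.

    Stacks all correspondences into one design matrix and solves
    [x y 1] @ A.T = [x' y'] in a single lstsq call.
    """
    ones = np.ones((len(pts1), 1))
    X = np.hstack([pts1, ones])               # (N, 3) design matrix
    A, *_ = np.linalg.lstsq(X, pts2, rcond=None)
    return A.T                                 # (2, 3) affine matrix

def residuals(A, pts1, pts2):
    """Per-point transfer error; large values flag outliers/occluded pixels."""
    ones = np.ones((len(pts1), 1))
    pred = np.hstack([pts1, ones]) @ A.T
    return np.linalg.norm(pred - pts2, axis=1)
```

Thresholding `residuals` gives the inlier/outlier split that region expansion and the graph-cut labeling then refine.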
Spatio-Temporal Segmentation of Video by Hierarchical Mean Shift Analysis
 Center for Automation Research, University of Maryland, College Park
, 2002
Cited by 83 (4 self)
We describe a simple new technique for spatio-temporal segmentation of video sequences. Each pixel of a 3D space-time video stack is mapped to a 7D feature point whose coordinates include three color components, two motion angle components and two motion position components. The clustering of these feature points provides color segmentation and motion segmentation, as well as a consistent labeling of regions over time which amounts to region tracking. For this task we have adopted a hierarchical clustering method which operates by repeatedly applying mean shift analysis over increasingly large ranges, using at each pass the cluster centers of the previous pass, with weights equal to the counts of the points that contributed to the clusters. This technique has lower complexity for large mean shift radii than regular mean shift analysis because it can use binary tree structures more efficiently during range search. In addition, it provides a hierarchical segmentation of the data. Applications include video compression and compact descriptions of video sequences for video indexing and retrieval applications.
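One pass of the weighted mean shift procedure described above can be sketched as follows; the hierarchy would feed each pass's cluster centers and counts back in as the next pass's weighted points (a flat-kernel toy sketch with brute-force neighbor search, not the paper's binary-tree-accelerated implementation; all names are illustrative):

```python
import numpy as np

def mean_shift_pass(points, weights, radius, n_iter=20):
    """One pass of weighted mean shift with a flat kernel of given radius.

    Each mode moves to the weighted mean of the points within `radius`
    until it settles; modes that land together form one cluster.
    """
    modes = points.astype(float).copy()
    for _ in range(n_iter):
        for i in range(len(modes)):
            d = np.linalg.norm(points - modes[i], axis=1)
            near = d < radius
            w = weights[near]
            modes[i] = (w[:, None] * points[near]).sum(0) / w.sum()
    # Merge modes that converged to (nearly) the same location.
    labels = -np.ones(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < radius / 2:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```

In the hierarchical scheme, `centers` and the per-cluster point counts become `points` and `weights` for the next pass with a larger radius, which is what keeps the later, expensive passes cheap.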
Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming
 IEEE CONF. COMPUTER VISION AND PATTERN RECOGNITION
, 2005
Cited by 82 (0 self)
Matrix factorization has many applications in computer vision. Singular Value Decomposition (SVD) is the standard algorithm for factorization. When there are outliers and missing data, which often occur in real measurements, SVD is no longer applicable. For robustness, Iteratively Reweighted Least Squares (IRLS) is often used for factorization by assigning a weight to each element of the measurements. Because it uses the L2 norm, good initialization of IRLS is critical for success, but is nontrivial. In this paper, we formulate matrix factorization as an L1-norm minimization problem that is solved efficiently by alternative convex programming. Our formulation 1) is robust without requiring initial weighting, 2) handles missing data straightforwardly, and 3) provides a framework in which constraints and prior knowledge (if available) can be conveniently incorporated. In the experiments we apply our approach to factorization-based structure from motion. It is shown that our approach achieves better results than other approaches (including IRLS) on both synthetic and real data.
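The alternating scheme can be sketched concretely: with one factor fixed, each row or column of the other factor is an independent L1 regression, which is a linear program. A minimal sketch under stated assumptions (random initialization, no missing-data mask, and the function names are illustrative simplifications of the paper's formulation):

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression(A, b):
    """Solve min_x ||Ax - b||_1 as an LP with slack variables t >= |Ax - b|."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])   # minimize sum of slacks
    I = np.eye(m)
    A_ub = np.vstack([np.hstack([A, -I]),           #  Ax - b <= t
                      np.hstack([-A, -I])])         # -(Ax - b) <= t
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m   # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n]

def l1_factorize(M, rank, n_iter=10, seed=0):
    """Alternately solve L1 regressions for factors U (m x r) and V (r x n)."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((rank, n))
    for _ in range(n_iter):
        for j in range(n):                 # columns of V, given U
            V[:, j] = l1_regression(U, M[:, j])
        for i in range(m):                 # rows of U, given V
            U[i, :] = l1_regression(V.T, M[i, :])
    return U, V
```

Because each subproblem minimizes the L1 norm exactly, a few grossly wrong entries pull the fit far less than they would under least squares, which is the robustness property the abstract claims.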
A Unified Algebraic Approach to 2D and 3D Motion Segmentation
 IN EUROPEAN CONFERENCE ON COMPUTER VISION
, 2004
Cited by 50 (16 self)
We present an analytic solution to the problem of estimating multiple 2D and 3D motion models from two-view correspondences or optical flow. The key to our approach is to view the estimation of multiple motion models as the estimation of a single multibody motion model. This is possible thanks to two important algebraic facts. First, we show that all the image measurements, regardless of their associated motion model, can be fit with a real or complex polynomial. Second, we show …
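The "single multibody model" idea can be illustrated in its simplest setting: points drawn from two lines through the origin all satisfy one quadratic polynomial, the product of the two linear forms, whose coefficients lie in the null space of the Veronese-embedded data matrix. A toy sketch of that algebraic fact only (not the paper's full 2D/3D segmentation algorithm; the function name is illustrative):

```python
import numpy as np

def fit_two_lines(points):
    """Fit one quadratic c1*x^2 + c2*x*y + c3*y^2 = 0 to points sampled
    from two lines through the origin (the 'multibody' model).

    The coefficient vector spans the null space of the degree-2
    Veronese embedding of the data.
    """
    x, y = points[:, 0], points[:, 1]
    V = np.column_stack([x**2, x * y, y**2])  # Veronese map of degree 2
    _, _, vt = np.linalg.svd(V)
    return vt[-1]                             # right vector of smallest singular value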
Pop-up light field: An interactive image-based modeling and rendering system
 ACM Transactions on Graphics
, 2004
Cited by 36 (10 self)
In this article, we present an image-based modeling and rendering system, which we call pop-up light field, that models a sparse light field using a set of coherent layers. In our system, the user specifies how many coherent layers should be modeled or popped up according to the scene complexity. A coherent layer is defined as a collection of corresponding planar regions in the light field images. A coherent layer can be rendered free of aliasing all by itself, or against other background layers. To construct coherent layers, we introduce a Bayesian approach, coherence matting, to estimate alpha matting around segmented layer boundaries by incorporating a coherence prior in order to maintain coherence across images. We have developed an intuitive and easy-to-use user interface (UI) to facilitate pop-up light field construction. The key to our UI is the concept of human-in-the-loop, where the user specifies where aliasing occurs in the rendered image. The user input is reflected in the input light field images, where pop-up layers can be modified. The user feedback is instant through a hardware-accelerated real-time pop-up light field renderer. Experimental results demonstrate that our system is capable of rendering anti-aliased novel views from a sparse light field.
A Survey of Spatio-Temporal Grouping Techniques
, 2002
Cited by 33 (0 self)
Spatio-temporal segmentation of video sequences attempts to extract backgrounds and independent objects in the dynamic scenes captured in the sequences. It is an essential step of video analysis. It has important applications in video coding, video logging, indexing and retrieval, and more generally in scene interpretation and video understanding. We classify spatio-temporal grouping techniques into three categories: (1) segmentation with spatial priority, (2) segmentation by trajectory grouping, and (3) joint spatial and temporal segmentation. The first category is the broadest, as it inherits the legacy techniques of image segmentation and motion segmentation. The other two categories place a higher priority on the accumulation of evidence along the temporal dimension and are more recent developments made feasible by the increased availability of computing power. For each category we provide a taxonomy of the techniques used to produce meaningful pixel groupings.
A layered stereo matching algorithm using image segmentation and global visibility constraints
, 2005
Occlusion boundaries from motion: Low-level detection and mid-level reasoning
 International Journal of Computer Vision
, 2009
Cited by 27 (0 self)
The boundaries of objects in an image are often considered a nuisance to be “handled” due to the occlusion they exhibit. Since most, if not all, computer vision techniques aggregate information spatially within a scene, information spanning these boundaries, and therefore from different physical surfaces, is invariably and erroneously considered together. In addition, these boundaries convey important perceptual information about 3D scene structure and shape. Consequently, their identification can benefit many different computer vision pursuits, from low-level processing techniques to high-level reasoning tasks. While much focus in computer vision is placed on the processing of individual, static images, many applications actually offer video, or sequences of images, as input. The extra temporal dimension of the data allows the motion of the camera or the scene to be used in processing. In this paper, we focus on the exploitation of subtle relative-motion cues present at occlusion boundaries. When combined with more standard appearance information, we demonstrate these cues’ utility in detecting occlusion boundaries locally. We also present a novel, mid-level model for reasoning more globally about object boundaries and propagating such local information to extract improved, extended boundaries.
Efficient Computation of Robust Low-Rank Matrix Approximations in the Presence of Missing Data using the L1 Norm
Cited by 20 (0 self)
The calculation of a low-rank approximation of a matrix is a fundamental operation in many computer vision applications. The workhorse of this class of problems has long been the Singular Value Decomposition. However, in the presence of missing data and outliers this method is not applicable, and unfortunately, this is often the case in practice. In this paper we present a method for calculating the low-rank factorization of a matrix which minimizes the L1 norm in the presence of missing data. Our approach represents a generalization of the Wiberg algorithm, one of the more convincing methods for factorization under the L2 norm. By utilizing the differentiability of linear programs, we can extend the underlying ideas behind this approach to include this class of L1 problems as well. We show that the proposed algorithm can be efficiently implemented using existing optimization software. We also provide preliminary experiments on synthetic as well as real-world data with very convincing results.
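The missing-data bookkeeping that makes plain SVD inapplicable can be sketched with a masked alternating scheme: each factor update uses only the observed entries of the relevant row or column. This toy sketch uses ordinary L2 least squares for each subproblem (the paper's L1 Wiberg approach replaces these with L1 subproblems and a dedicated algorithm; the mask convention and names here are illustrative assumptions):

```python
import numpy as np

def masked_als(M, W, rank, n_iter=100, seed=0):
    """Low-rank factorization of M using only entries where mask W is True.

    Plain alternating least squares: with U fixed, each column of V is an
    independent regression over that column's observed rows, and vice versa.
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((rank, n))
    for _ in range(n_iter):
        for j in range(n):                      # columns of V, given U
            obs = W[:, j]
            V[:, j] = np.linalg.lstsq(U[obs], M[obs, j], rcond=None)[0]
        for i in range(m):                      # rows of U, given V
            obs = W[i, :]
            U[i] = np.linalg.lstsq(V[:, obs].T, M[i, obs], rcond=None)[0]
    return U, V
```

With noise-free low-rank data and enough observed entries, the product U @ V also fills in the unobserved entries; the L1 variant additionally tolerates outliers among the observed ones.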