Results 1 - 10 of 32
Motion layer extraction in the presence of occlusion using graph cut
- In CVPR (2
, 2004
Cited by 57 (7 self)
Extracting layers from video is very important for video representation, analysis, compression, and synthesis. Assuming that a scene can be approximately described by multiple planar regions, this paper describes a robust and novel approach to automatically extract a set of affine or projective transformations induced by these regions, detect the occlusion pixels over multiple consecutive frames, and segment the scene into several motion layers. First, after determining a number of seed regions using correspondences in two frames, we expand the seed regions and reject the outliers employing the graph cuts method integrated with level set representation. Next, these initial regions are merged into several initial layers according to the motion similarity. Third, an occlusion order constraint on multiple frames is explored, which enforces that the occlusion area increases with the temporal order in a short period and effectively maintains segmentation consistency over multiple consecutive frames. Then the correct layer segmentation is obtained by using a graph cuts algorithm, and the occlusions between the overlapping layers are explicitly determined. Several experimental results are demonstrated to show that our approach is effective and robust.
Index Terms: layer-based motion segmentation, video analysis, graph cuts, level set representation, occlusion order constraint.
Spatio-Temporal Segmentation of Video by Hierarchical Mean Shift Analysis
- Center for Automat. Res., U. of Md, College Park
, 2002
Cited by 47 (4 self)
We describe a simple new technique for spatio-temporal segmentation of video sequences. Each pixel of a 3D space-time video stack is mapped to a 7D feature point whose coordinates include three color components, two motion angle components and two motion position components. The clustering of these feature points provides color segmentation and motion segmentation, as well as a consistent labeling of regions over time, which amounts to region tracking. For this task we have adopted a hierarchical clustering method which operates by repeatedly applying mean shift analysis over increasingly large ranges, using at each pass the cluster centers of the previous pass, with weights equal to the counts of the points that contributed to the clusters. This technique has lower complexity for large mean shift radii than regular mean shift analysis because it can use binary tree structures more efficiently during range search. In addition, it provides a hierarchical segmentation of the data. Applications include video compression and compact descriptions of video sequences for video indexing and retrieval.
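The hierarchical scheme in this abstract (repeated mean shift passes over growing radii, each pass reusing the previous centers weighted by their point counts) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the flat kernel, the merge rule (modes closer than half the radius are fused), the brute-force neighbor search in place of the tree-based range search, and the names `mean_shift_pass` and `hierarchical_mean_shift` are all assumptions made for the example.

```python
import math


def mean_shift_pass(points, weights, radius, iters=50, tol=1e-6):
    """One weighted, flat-kernel mean shift pass. Returns (centers, counts)."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(iters):
            num = [0.0] * len(x)
            den = 0.0
            # Brute-force range search (the abstract mentions tree structures).
            for q, w in zip(points, weights):
                if math.dist(x, q) <= radius:
                    den += w
                    for i, qi in enumerate(q):
                        num[i] += w * qi
            if den == 0.0:
                break
            new_x = [n / den for n in num]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)

    # Assumed merge rule: fuse modes that land within radius / 2 of a center.
    centers, counts = [], []
    for m, w in zip(modes, weights):
        for k, c in enumerate(centers):
            if math.dist(m, c) <= radius / 2:
                total = counts[k] + w
                centers[k] = [(ci * counts[k] + mi * w) / total
                              for ci, mi in zip(c, m)]
                counts[k] = total
                break
        else:
            centers.append(m)
            counts.append(w)
    return centers, counts


def hierarchical_mean_shift(points, radii):
    """Run passes over increasing radii, feeding centers and counts forward."""
    centers = [list(p) for p in points]
    counts = [1.0] * len(points)
    levels = []
    for r in radii:
        centers, counts = mean_shift_pass(centers, counts, r)
        levels.append((centers, counts))
    return levels
```

Each entry of the returned list is one level of the hierarchy; later levels operate on far fewer weighted points than the raw data, which is where the cost saving for large radii would come from in this simplified setting.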
Shape-from-Silhouette Across Time - Part I: Theory and Algorithms
- International Journal of Computer Vision
, 2005
Cited by 40 (1 self)
Shape-From-Silhouette (SFS) is a shape reconstruction method which constructs a 3D shape estimate of an object using silhouette images of the object. The output of an SFS algorithm is known as the Visual Hull (VH). Traditionally SFS is either performed on static objects, or separately at each time instant in the case of videos of moving objects. In this paper we develop a theory of performing SFS across time: estimating the shape of a dynamic object (with unknown motion) by combining all of the silhouette images of the object over time. We first introduce a one-dimensional element called a Bounding Edge to represent the Visual Hull. We then show that aligning two Visual Hulls using just their silhouettes is in general ambiguous and derive the geometric constraints (in terms of Bounding Edges) that govern the alignment. To break the alignment ambiguity, we combine stereo information with silhouette information and derive a Temporal SFS algorithm which consists of two steps: (1) estimate the motion of the objects over time (Visual Hull Alignment) and (2) combine the silhouette information using the estimated motion (Visual Hull Refinement). The algorithm is first developed for rigid objects and then extended to articulated objects. In Part II of this paper we apply our temporal SFS algorithm to two human-related applications: (1) the acquisition of detailed human kinematic models and (2) marker-less motion tracking.
A layered stereo matching algorithm using image segmentation and global visibility constraints
, 2005
Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming
- IEEE CONF. COMPUTER VISION AND PATTERN RECOGNITION
, 2005
Cited by 22 (0 self)
Matrix factorization has many applications in computer vision. Singular Value Decomposition (SVD) is the standard algorithm for factorization. When there are outliers and missing data, which often happen in real measurements, SVD is no longer applicable. For robustness, Iteratively Re-weighted Least Squares (IRLS) is often used for factorization by assigning a weight to each element in the measurements. Because it uses the L2 norm, good initialization in IRLS is critical for success, but is non-trivial. In this paper, we formulate matrix factorization as an L1 norm minimization problem that is solved efficiently by alternative convex programming. Our formulation 1) is robust without requiring initial weighting, 2) handles missing data straightforwardly, and 3) provides a framework in which constraints and prior knowledge (if available) can be conveniently incorporated. In the experiments we apply our approach to factorization-based structure from motion. It is shown that our approach achieves better results than other approaches (including IRLS) on both synthetic and real data.
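To make the alternating idea concrete, here is a small sketch restricted to the rank-1 case, where each convex subproblem has a closed-form solution: with one factor fixed, the L1-optimal value of each entry of the other factor is a weighted median. This substitutes weighted medians for the convex programs used in the paper, so it illustrates the alternation and the handling of missing entries only. The function names, the use of `None` for missing data, and the all-ones initialization are assumptions for the example.

```python
def weighted_median(values, weights):
    """Lower weighted median: a minimizer of sum_k w_k * |v_k - x| over x."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def l1_rank1(M, iters=20):
    """Alternating L1 fit of M ~ u v^T; entries equal to None are skipped."""
    rows, cols = len(M), len(M[0])
    u = [1.0] * rows
    v = [1.0] * cols  # assumed initialization, not taken from the paper
    for _ in range(iters):
        # Fix v, solve for each u[i]:
        # sum_j |M_ij - u_i v_j| = sum_j |v_j| * |M_ij / v_j - u_i|
        for i in range(rows):
            obs = [(M[i][j] / v[j], abs(v[j])) for j in range(cols)
                   if M[i][j] is not None and v[j] != 0]
            if obs:
                u[i] = weighted_median([o[0] for o in obs],
                                       [o[1] for o in obs])
        # Fix u, solve for each v[j] in the same way.
        for j in range(cols):
            obs = [(M[i][j] / u[i], abs(u[i])) for i in range(rows)
                   if M[i][j] is not None and u[i] != 0]
            if obs:
                v[j] = weighted_median([o[0] for o in obs],
                                       [o[1] for o in obs])
    return u, v
```

On a rank-1 matrix with one gross outlier and one missing entry, the product `u[i] * v[j]` gives a reconstruction for every position, including the corrupted and unobserved ones. Like any alternating scheme, this can settle in a local minimum on harder inputs.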
A Survey of Spatio-Temporal Grouping Techniques
, 2002
Cited by 20 (0 self)
Spatio-temporal segmentation of video sequences attempts to extract backgrounds and independent objects in the dynamic scenes captured in the sequences. It is an essential step of video analysis. It has important applications in video coding, video logging, indexing and retrieval, and more generally in scene interpretation and video understanding. We classify spatio-temporal grouping techniques into three categories: (1) segmentation with spatial priority, (2) segmentation by trajectory grouping, and (3) joint spatial and temporal segmentation. The first category is the broadest, as it inherits the legacy techniques of image segmentation and motion segmentation. The other two categories place a higher priority on the accumulation of evidence along the temporal dimension and are more recent developments made feasible by the increased availability of computing power. For each category we provide a taxonomy of the techniques used to produce meaningful pixel groupings.
Pop-up light field: An interactive image-based modeling and rendering system
- ACM Transactions on Graphics
, 2004
Cited by 15 (4 self)
In this article, we present an image-based modeling and rendering system, which we call pop-up light field, that models a sparse light field using a set of coherent layers. In our system, the user specifies how many coherent layers should be modeled or popped up according to the scene complexity. A coherent layer is defined as a collection of corresponding planar regions in the light field images. A coherent layer can be rendered free of aliasing all by itself, or against other background layers. To construct coherent layers, we introduce a Bayesian approach, coherence matting, to estimate alpha matting around segmented layer boundaries by incorporating a coherence prior in order to maintain coherence across images. We have developed an intuitive and easy-to-use user interface (UI) to facilitate pop-up light field construction. The key to our UI is the concept of human-in-the-loop where the user specifies where aliasing occurs in the rendered image. The user input is reflected in the input light field images where pop-up layers can be modified. The user feedback is instant through a hardware-accelerated real-time pop-up light field renderer. Experimental results demonstrate that our system is capable of rendering anti-aliased novel views from a sparse light field.
Local detection of occlusion boundaries in video
- In BMVC
, 2006
Cited by 10 (4 self)
Occlusion boundaries are notoriously difficult for many patch-based computer vision algorithms, but they also provide potentially useful information about scene structure and shape. Using short video clips, we present a novel method for scoring the degree to which edges exhibit occlusion. We first utilize a spatio-temporal edge detector which estimates edge strength, orientation, and normal motion. By then extracting patches from either side of each detected (possibly moving) edgelet, we can estimate and compare motion to determine if occlusion is present. This completely local, bottom-up approach is intended to provide powerful low-level information for use by higher-level reasoning methods.
Robust Subspace Clustering by Combined Use of kNND Metric and SVD Algorithm
- IEEE Conference on Computer Vision and Pattern Recognition
, 2004
Cited by 7 (0 self)
vision, such as image/video segmentation and pattern classification. The major issue in subspace clustering is to obtain the most appropriate subspace from the given noisy data. Typical methods (e.g., SVD, PCA, and Eigendecomposition) use least squares techniques, and are sensitive to outliers. In this paper, we present the k-th Nearest Neighbor Distance (kNND) metric, which, without actually clustering the data, can exploit the intrinsic data cluster structure to detect and remove influential outliers as well as small data clusters. The remaining data provide a good initial inlier data set that resides in a linear subspace whose rank (dimension) is upper-bounded. Such linear subspace constraint can then be exploited by simple algorithms, such as iterative SVD algorithm, to (1) detect the remaining outliers that violate the correlation structure enforced by the low rank subspace, and (2) reliably compute the subspace. As an example, we apply our method to extracting layers from image sequences containing dynamically moving objects.
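As a rough illustration of the kNND idea in this abstract, the sketch below computes each point's distance to its k-th nearest neighbor and keeps points whose value is small. The abstract does not say how the cutoff is chosen, so the rule used here (a multiple of the median kNND, controlled by a hypothetical `factor` parameter) is purely an assumption, as are the function names. The subsequent iterative SVD stage is not shown.

```python
import math
from statistics import median


def knn_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def split_inliers(points, k, factor=3.0):
    """Assumed rule: a point is an outlier if its kNND > factor * median kNND."""
    knnd = knn_distance(points, k)
    cutoff = factor * median(knnd)
    inliers = [p for p, d in zip(points, knnd) if d <= cutoff]
    outliers = [p for p, d in zip(points, knnd) if d > cutoff]
    return inliers, outliers
```

Choosing `k` larger than the size of a small cluster makes every point in that cluster reach outside it for its k-th neighbor, which is one way a kNND-style score could also flag small clusters, as the abstract describes.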
Generalized Principal Component Analysis (GPCA): an Algebraic . . .
, 2003
Cited by 6 (3 self)
Generalized Principal Component Analysis (GPCA): an Algebraic Geometric Approach to Subspace Clustering and Motion Segmentation, by René Esteban Vidal. Doctor of Philosophy in Engineering -- Electrical Engineering and Computer Sciences, University of California at Berkeley. Professor Shankar Sastry, Chair.
Simultaneous data segmentation and model estimation refers to the problem of estimating a collection of models from sample data points, without knowing which points correspond to which model. This is a challenging problem in many disciplines, such as machine learning, computer vision, robotics and control, and is usually regarded as a "chicken-and-egg" problem: if the segmentation of the data were known, one could easily fit a single model to each group of points. Conversely, if the models were known, one could easily find the data points that best fit each model. Since in practice neither the models nor the segmentation of the data are known, most of the existing approaches start with an initial estimate for either the segmentation of the data or the model parameters and then iterate between data segmentation and model estimation. However, the convergence of iterative algorithms to the global optimum is in general very sensitive to initialization of both the number of models and the model parameters. Finding a good initialization remains a challenging problem.

