A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms
2006
Cited by 191 (12 self)
This paper presents a quantitative comparison of several multi-view stereo reconstruction algorithms. Until now, the lack of suitable calibrated multi-view image datasets with known ground truth (3D shape models) has prevented such direct comparisons. In this paper, we first survey multi-view stereo algorithms and compare them qualitatively using a taxonomy that differentiates their key properties. We then describe our process for acquiring and calibrating multi-view image datasets with high-accuracy ground truth and introduce our evaluation methodology. Finally, we present the results of our quantitative comparison of state-of-the-art multi-view stereo reconstruction algorithms on six benchmark datasets. The datasets, evaluation details, and instructions for submitting new models are available online at http://vision.middlebury.edu/mview.
Fusion of Multi-View Silhouette Cues Using a Space Occupancy Grid
2005
Cited by 26 (5 self)
In this paper, we investigate what can be inferred from several silhouette probability maps in multi-camera environments. To this end, we propose a new framework for multi-view silhouette cue fusion. This framework uses a space occupancy grid as a probabilistic 3D representation of scene contents. Such a representation is of great interest for various computer vision applications, for instance in perception or localization. Our main contribution is to introduce the occupancy grid concept, popular in the robotics community, for multi-camera environments. The idea is to consider each camera pixel as a statistical occupancy sensor. All pixel observations are then used jointly to infer where, and how likely, matter is present in the scene. As our results illustrate, this simple model has various advantages. Most sources of uncertainty are explicitly modeled, and no premature decisions about pixel labeling occur, thus preserving pixel knowledge. Consequently, optimal scene object localization and robust volume reconstruction can be achieved, with no constraint on camera placement and object visibility. In addition, this representation makes it possible to improve silhouette extraction in images.
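The idea of treating each camera pixel as a statistical occupancy sensor can be illustrated with the standard log-odds update used for occupancy grids in robotics. The sketch below is a simplified stand-in, not the sensor model of the paper: it assumes that each camera independently reports a probability that a given voxel is occupied, all computed under the same prior, and the names `logit`, `sigmoid`, and `fuse_occupancy` are hypothetical.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a probability, clamped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Numerically stable inverse of logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fuse_occupancy(pixel_probs, prior=0.5):
    """Fuse per-camera occupancy probabilities for a single voxel.

    pixel_probs: one value per camera, read from the silhouette probability
    map at the pixel the voxel projects to (assumed independent given the
    voxel state, which is a simplification).
    """
    l0 = logit(prior)
    total = l0 + sum(logit(p) - l0 for p in pixel_probs)
    return sigmoid(total)


if __name__ == "__main__":
    print(fuse_occupancy([0.8, 0.8]))   # agreeing views reinforce each other
    print(fuse_occupancy([0.8, 0.2]))   # conflicting views cancel out
```

Because no per-pixel threshold is applied before fusion, an uncertain pixel (probability near the prior) simply contributes little evidence instead of forcing a hard foreground or background decision.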
Scalable 3D Video of Dynamic Scenes
2005
Cited by 13 (2 self)
In this paper we present a scalable 3D video framework for capturing and rendering dynamic scenes. The acquisition system is based on multiple sparsely placed 3D video bricks, each comprising a projector, two grayscale cameras and a color camera. Relying on structured light with complementary patterns, texture images and pattern-augmented views of the scene are acquired simultaneously by time-multiplexed projections and synchronized camera exposures. Using space-time stereo on the acquired pattern images, high-quality depth maps are extracted, whose corresponding surface samples are merged into a view-independent, point-based 3D data structure. This representation allows for effective photo consistency enforcement and outlier removal, leading to a significant decrease of visual artifacts and a high resulting rendering quality using EWA volume splatting. Our framework and its view-independent representation allow for simple and straightforward editing of 3D video. In order to demonstrate its flexibility, we show compositing techniques and spatio-temporal effects.
Multi-Object Shape Estimation and Tracking from Silhouette Cues
Cited by 3 (1 self)
This paper deals with the 3D shape estimation from silhouette cues of multiple moving objects in general indoor or outdoor 3D scenes with potential static obstacles, using multiple calibrated video streams. Most shape-from-silhouette techniques use a two-class classification of space occupancy and silhouettes, based on image regions that match or disagree with a static background appearance model. Binary silhouette information becomes insufficient to unambiguously carve 3D space regions as the number and density of dynamic objects increases. In such difficult scenes, multi-view stereo methods suffer from visibility problems and rely on color calibration procedures that are tedious to carry out outdoors. We propose a new algorithm to automatically detect and reconstruct scenes with a variable number of dynamic objects. Our formulation distinguishes between m different shapes in the scene by using automatically learnt view-specific appearance models, eliminating the color calibration requirement. Bayesian reasoning is then applied to solve the m-shape occupancy problem, with m updated as objects enter or leave the scene. Results show that this method yields multiple silhouette-based estimates that drastically improve scene reconstructions over traditional two-label silhouette scene analysis. This enables the method to also efficiently deal with multi-person tracking problems.
Space Carving with a Hand-Held Camera
Proceedings of the SIBGRAPI'2004 - International Symposium on Computer Graphics, Image Processing and Vision, 2004
Cited by 2 (1 self)
This paper presents a 3D scene reconstruction method, based on space carving, that works with a hand-held camera. In our system, the intrinsic and extrinsic parameters of the camera are determined at the moment of image capture, as opposed to other systems that rely on fixed pre-calibrated camera setups. To do this, we place a special calibration pattern in the scene in such a way that it does not alter scene visibility. However, the calibration pattern may be partially occluded by the objects of interest in the scene. This has led us to adopt a calibration method based on model recognition. Scene reconstruction is obtained from the set of input images by an adaptive space-carving algorithm that uses not only photometric information but also segmentation information. The segmentation information of a given input image is determined by a robust statistical test based on an approximate model of the scene's background. This model is computed from a set of images of the scene's background that are warped so that they match the geometry of the desired camera.
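As a rough illustration of how segmentation information can drive carving, the sketch below removes any voxel that projects onto a background pixel in at least one view. It covers only the segmentation test, not the photometric test or the adaptive scheme mentioned in the abstract, and it assumes binary foreground masks and known 3x4 projection matrices; the helpers `project` and `carve` are hypothetical names introduced for this example.

```python
def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P.

    Returns (u, v) pixel coordinates, or None if the point lies behind
    the camera.
    """
    x, y, z = X
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    if h[2] <= 0:
        return None
    return h[0] / h[2], h[1] / h[2]


def carve(voxels, cameras, masks):
    """Keep the voxels that no view classifies as background.

    voxels:  list of (x, y, z) voxel centers
    cameras: list of 3x4 projection matrices (nested lists)
    masks:   list of 2D lists of booleans, True = foreground, indexed [v][u]
    """
    kept = []
    for X in voxels:
        consistent = True
        for P, mask in zip(cameras, masks):
            uv = project(P, X)
            if uv is None:
                continue  # behind the camera: this view gives no evidence
            u, v = int(round(uv[0])), int(round(uv[1]))
            inside = 0 <= v < len(mask) and 0 <= u < len(mask[0])
            if inside and not mask[v][u]:
                consistent = False  # seen as background: carve the voxel
                break
        if consistent:
            kept.append(X)
    return kept
```

Voxels that fall outside an image are treated here as unobserved rather than carved, a design choice that seems reasonable for a hand-held camera whose views do not all cover the whole scene.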
Reconstructing Non-stationary Articulated Objects in Monocular Video using Silhouette Information
Cited by 2 (1 self)
This paper presents an approach to reconstruct non-stationary, articulated objects from silhouettes obtained with a monocular video sequence. We introduce the concept of motion blurred scene occupancies, a direct analogy of motion blurred images but in a 3D object scene occupancy space resulting from the motion/deformation of the object. Our approach starts with an image-based fusion step that combines color and silhouette information from multiple views. To this end we propose to use a novel construct: the temporal occupancy point (TOP), which is the estimated 3D scene location of a silhouette pixel and contains information about the duration of time for which it is occupied. Instead of explicitly computing the TOP in 3D space, we directly obtain its imaged (projected) locations in each view. This enables us to handle monocular video and arbitrary camera motion in scenarios where complete camera calibration information may not be available. The result is a set of blurred scene occupancy images in the corresponding views, where the value at each pixel corresponds to the fraction of the total time duration that the pixel observed an occupied scene location. We then use a motion de-blurring approach to de-blur the occupancy images. The de-blurred occupancy images correspond to silhouettes of the mean/motion compensated object shape and are used to obtain a visual hull reconstruction of the object. We show promising results on challenging monocular datasets of deforming objects where traditional visual hull intersection approaches fail to reconstruct the object correctly.
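The notion of a blurred occupancy image, where each pixel stores the fraction of time it observed an occupied location, can be sketched as a temporal average of binary silhouettes. This is only an illustration under a strong assumption: the silhouettes are taken to be already registered to a single view, whereas the abstract describes obtaining the projected occupancy locations under camera motion. The function name `blurred_occupancy` is hypothetical.

```python
def blurred_occupancy(silhouettes):
    """Average binary silhouettes over time.

    silhouettes: non-empty list of equally sized 2D lists of 0/1 values,
    one per time step, all assumed to be registered to the same view.
    Returns a 2D list whose entries are the fraction of time steps in
    which each pixel was occupied.
    """
    if not silhouettes:
        raise ValueError("at least one silhouette is required")
    steps = len(silhouettes)
    height = len(silhouettes[0])
    width = len(silhouettes[0][0])
    return [
        [sum(s[r][c] for s in silhouettes) / steps for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    frames = [[[1, 0]], [[1, 1]], [[0, 0]], [[1, 0]]]
    print(blurred_occupancy(frames))
```

Pixels with values close to 1 were covered by the object for most of the sequence, while intermediate values mark regions swept by moving or deforming parts.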
A Semi-supervised Approach to Space Carving
Cited by 1 (0 self)
In this paper, we present a semi-supervised approach to space carving by casting the recovery of volumetric data from multiple views into an evidence-combining setting. The method presented here is statistical in nature and employs, as a starting point, a manually obtained contour. By making use of this user-provided information, we obtain probabilistic silhouettes of all successive images. These silhouettes provide a prior distribution that is then used to compute the probability of a voxel being carved. This evidence-combining setting allows us to make use of background pixel information. As a result, our method combines the advantages of shape-from-silhouette techniques and statistical space carving approaches. For the carving process, we propose a new voxelated space. The proposed space is a projective one that provides a color mapping for the object voxels which is consistent in terms of pixel coverage with their projection onto the image planes for the imagery under consideration. We provide quantitative results and illustrate the utility of the method on real-world imagery.
Probabilistic 3D Occupancy Flow with Latent Silhouette Cues
Cited by 1 (0 self)
In this paper we investigate shape and motion retrieval in the context of multi-camera systems. We propose a new low-level analysis based on latent silhouette cues, particularly suited for low-texture and outdoor datasets. Our analysis does not rely on explicit surface representations, instead using an EM framework to simultaneously update a set of volumetric voxel occupancy probabilities and retrieve a best estimate of the dense 3D motion field from the last consecutively observed multi-view frame set. As the framework uses only latent, probabilistic silhouette information, the method yields a promising 3D scene analysis method robust to many sources of noise and arbitrary scene objects. It can be used as input for higher level shape modeling and structural inference tasks. We validate the approach and demonstrate its practical use for shape and motion analysis experimentally.
Multi-view Approaches to Tracking, 3D Reconstruction and Object Class Detection
2008
Multi-camera systems are becoming ubiquitous and have found application in a variety of domains including surveillance, immersive visualization, sports entertainment and movie special effects, amongst others. From a computer vision perspective, the challenging task is how to most efficiently fuse information from multiple views in the absence of detailed calibration information and with a minimum of human intervention. This thesis presents a new approach to fuse foreground likelihood information from multiple views onto a reference view without explicit processing in 3D space, thereby circumventing the need for complete calibration. Our approach uses a homographic occupancy constraint (HOC), which states that if a foreground pixel has a piercing point that is occupied by a foreground object, then the pixel warps to foreground regions in every view under homographies induced by the reference plane, in effect using cameras as occupancy detectors. Using the HOC we are able to resolve occlusions and robustly determine ground plane localizations of the people in the scene. To find tracks we obtain ground localizations over a window of frames and stack them, creating a space-time volume. Regions belonging to the same person form contiguous spatio-temporal

