A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. (2001)
Venue: In IEEE Workshop on Stereo and Multi-Baseline Vision
Citations: 1546 (22 self)
Citations
5115 | Stochastic Relaxation, Gibbs Distribution and the Bayesian Restoration of Images
- Geman, Geman
- 1984
Citation Context ...oring pixels’ disparities, Esmooth(d) = ∑ (x,y) ρ(d(x, y) − d(x+1, y)) + ρ(d(x, y) − d(x, y+1)), (5) where ρ is some monotonically increasing function of disparity difference. (An alternative to smoothness functionals is to use a lower-dimensional representation such as splines [112].) In regularization-based vision [87], ρ is a quadratic function, which makes d smooth everywhere and may lead to poor results at object boundaries. Energy functions that do not have this problem are called discontinuity-preserving and are based on robust ρ functions [119, 16, 97]. Geman and Geman’s seminal paper [47] gave a Bayesian interpretation of these kinds of energy functions [110] and proposed a discontinuity-preserving energy function based on Markov Random Fields (MRFs) and additional line processes. Black and Rangarajan [16] show how line processes can often be subsumed by a robust regularization framework. The terms in Esmooth can also be made to depend on the intensity differences, e.g., ρd(d(x, y) − d(x+1, y)) · ρI(‖I(x, y) − I(x+1, y)‖), (6) where ρI is some monotonically decreasing function of intensity differences that lowers smoothness costs at high intensity gradients. This idea [44, ...
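The energy in Equation (5), and the contrast between a quadratic ρ and a discontinuity-preserving robust ρ, can be made concrete with a small sketch (the function names, the toy disparity map, and the truncated-quadratic choice of robust ρ are ours):

```python
import numpy as np

def esmooth(d, rho):
    """Smoothness term of Equation (5): rho applied to the disparity
    differences of horizontally and vertically neighboring pixels."""
    dh = d[:, 1:] - d[:, :-1]   # horizontal neighbor differences
    dv = d[1:, :] - d[:-1, :]   # vertical neighbor differences
    return rho(dh).sum() + rho(dv).sum()

# Quadratic rho (regularization-based vision): smooth everywhere, so
# disparity discontinuities at object boundaries are penalized heavily.
quadratic = lambda x: x.astype(float) ** 2
# Truncated quadratic: one robust, discontinuity-preserving choice of rho.
truncated = lambda x, tau=4.0: np.minimum(x.astype(float) ** 2, tau)

d = np.array([[0, 0, 5, 5],
              [0, 0, 5, 5]])   # a clean step edge in disparity

print(esmooth(d, quadratic))   # 50.0: the edge dominates the energy
print(esmooth(d, truncated))   # 8.0: the penalty saturates at tau
```

With the quadratic ρ, a minimizer prefers to smear the step across several pixels; the truncated ρ caps the cost of a sharp jump, which is why robust functions preserve discontinuities.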
4675 | A Computational Approach to Edge Detection
- Canny
- 1986
Citation Context ... regions are visible in Figures (c–e), e.g., the bookshelves, table and cans, and head statue, while three different disparity levels can be seen as horizontal lines in the (x, d) slice (Figure (f)). Note the dark bands in the various DSIs, which indicate regions that match at this disparity. (Smaller dark regions are often the result of textureless regions.) Other traditional matching costs include normalized cross-correlation [51, 93, 19], which behaves similarly to sum-of-squared-differences (SSD), and binary matching costs (i.e., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching co...
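The gain/bias insensitivity mentioned above is easy to demonstrate: normalized cross-correlation subtracts the patch means and divides by the norms, so an affine intensity change cancels out, while SSD does not (a 1-D sketch with illustrative values):

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two patches."""
    return ((a - b) ** 2).sum()

def ncc(a, b):
    """Normalized cross-correlation: invariant to gain and bias."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b))

patch = np.array([1.0, 2.0, 4.0, 3.0])
brighter = 2.0 * patch + 10.0     # same patch under a gain/bias change

print(ssd(patch, brighter))       # 630.0: SSD is fooled by camera gain
print(ncc(patch, brighter))       # ~1.0: NCC still matches perfectly
```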
2896 | An iterative image registration technique with an application to stereo vision
- Lucas, Kanade
- 1981
Citation Context ...tion or people tracking, these may be perfectly adequate. However for image-based rendering, such quantized maps lead to very unappealing view synthesis results (the scene appears to be made up of many thin shearing layers). To remedy this situation, many algorithms apply a sub-pixel refinement stage after the initial discrete correspondence stage. (An alternative is to simply start with more discrete disparity levels.) Sub-pixel disparity estimates can be computed in a variety of ways, including iterative gradient descent and fitting a curve to the matching costs at discrete disparity levels [93, 71, 122, 77, 60]. This provides an easy way to increase the resolution of a stereo algorithm with little additional computation. However, to work well, the intensities being matched must vary smoothly, and the regions over which these estimates are computed must be on the same (correct) surface. Recently, some questions have been raised about the advisability of fitting correlation curves to integer-sampled matching costs [105]. This situation may even be worse when sampling-insensitive dissimilarity measures are used [12]. We investigate this issue in Section 6.4 below. Besides sub-pixel computations, there ... |
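Fitting a curve to the matching costs at discrete disparity levels is most commonly done with a parabola through the costs at d−1, d, and d+1; a minimal sketch under that assumption (the closed form follows from setting the parabola's derivative to zero; the function name is ours):

```python
def subpixel_refine(c_prev, c_min, c_next):
    """Offset in (-0.5, 0.5) of the minimum of the parabola through the
    matching costs at disparities d-1, d, d+1 (c_min is the discrete
    minimum of the three)."""
    denom = c_prev - 2.0 * c_min + c_next
    if denom <= 0.0:        # flat or degenerate fit: keep the integer d
        return 0.0
    return 0.5 * (c_prev - c_next) / denom

# Costs sampled from the quadratic (d - 3.25)^2 around its discrete minimum:
costs = {2: 1.5625, 3: 0.0625, 4: 0.5625}
d_sub = 3 + subpixel_refine(costs[2], costs[3], costs[4])
print(d_sub)   # 3.25: exact for a truly quadratic cost curve
```

As the excerpt cautions, this only works when the sampled cost curve is smooth near the minimum; integer-sampled or sampling-insensitive costs can bias such fits [105, 12].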
2120 | Fast approximate energy minimization via graph cuts
- Boykov, Veksler, et al.
Citation Context ...lation [51, 19] and the rank transform [129]. (This can also be viewed as a preprocessing step; see Section 3.1.) On the other hand, global algorithms make explicit smoothness assumptions and then solve an optimization problem. Such algorithms typically do not perform an aggregation step, but rather seek a disparity assignment (step 3) that minimizes a global cost function that combines data (step 1) and smoothness terms. The main distinction between these algorithms is the minimization procedure used, e.g., simulated annealing [75, 6], probabilistic (mean-field) diffusion [97], or graph cuts [23]. In between these two broad classes are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and ab... |
1608 | Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
- Marr
- 1982
1388 | The laplacian pyramid as a compact image code
- Burt, Adelson
- 1983
Citation Context ...indow-based aggregation insensitive to window size in terms of computation time and accounts for the fast performance seen in realtime matchers [59, 64]. Figure 3: Shiftable window. The effect of trying all 3 × 3 shifted windows around the black pixel is the same as taking the minimum matching score across all centered (non-shifted) windows in the same neighborhood. (Only 3 of the neighboring shifted windows are shown here for clarity.) • Binomial filter: use a separable FIR (finite impulse response) filter. We use the coefficients 1/16{1, 4, 6, 4, 1}, the same ones used in Burt and Adelson’s [26] Laplacian pyramid. Other convolution kernels could also be added later, as could recursive (bi-directional) IIR filtering, which is a very efficient way to obtain large window sizes [35]. The width of the box or convolution kernel is controlled by aggr window size. To simulate the effect of shiftable windows [2, 18, 117], we can follow this aggregation step with a separable square min-filter. The width of this filter is controlled by the parameter aggr minfilter. The cascaded effect of a box-filter and an equal-sized min-filter is the same as evaluating a complete set of shifted windows, sinc... |
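The equivalence stated above, a box filter cascaded with an equal-sized min-filter yielding the complete set of shifted windows, can be checked in one dimension (a sketch; the function names and toy cost row are ours, and `w` plays the role of aggr window size):

```python
import numpy as np

def box_filter_1d(c, w):
    """Moving-average aggregation along one scanline (odd window w,
    edge-replicated borders)."""
    r = w // 2
    p = np.pad(c, r, mode='edge')
    return np.array([p[i:i + w].mean() for i in range(len(c))])

def min_filter_1d(c, w):
    """Separable min-filter: cascading it after an equal-sized box
    filter is equivalent to evaluating all shifted windows."""
    r = w // 2
    p = np.pad(c, r, mode='edge')
    return np.array([p[i:i + w].min() for i in range(len(c))])

cost = np.array([9., 9., 0., 0., 0., 9., 9.])  # low cost = good match
aggregated = box_filter_1d(cost, 3)
shiftable = min_filter_1d(aggregated, 3)
print(aggregated)  # [9. 6. 3. 0. 3. 6. 9.]: centered windows blur the edge
print(shiftable)   # [6. 3. 0. 0. 0. 3. 6.]: best shifted window survives
```

Near the boundary of the low-cost region, the min-filter keeps the best shifted window, so good matches are not diluted the way centered windows dilute them.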
1324 | Performance of optical flow techniques
- Barron, Fleet, et al.
- 1994
Citation Context ...ing a complete survey of existing stereo methods, even restricted to dense two-frame methods, would be a formidable task, as a large number of new methods are published every year. It is also arguable whether such a survey would be of much value to other stereo researchers, besides being an obvious catch-all reference. Simply enumerating different approaches is unlikely to yield new insights. Clearly, a comparative evaluation is necessary to assess the performance of both established and new algorithms and to gauge the progress of the field. The publication of a similar study by Barron et al. [8] has had a dramatic effect on the development of optical flow algorithms. Not only is the performance of commonly used algorithms better understood by researchers, but novel publications have to improve in some way on the performance of previously published techniques [86]. A more recent study by Mitiche and Bouthemy [78] reviews a large number of methods for image flow computation and isolates central problems, but does not provide any experimental results. In stereo correspondence, two previous comparative papers have focused on the performance of sparse feature matchers [54, 19]. Two recent... |
979 | A survey of image registration techniques
- Brown
- 1992
Citation Context ...ound truth and are making both the code and data sets available on the Web. Finally, we include a comparative evaluation of a large set of today’s best-performing stereo algorithms. 1. Introduction Stereo correspondence has traditionally been, and continues to be, one of the most heavily investigated topics in computer vision. However, it is sometimes hard to gauge progress in the field, as most researchers only report qualitative results on the performance of their algorithms. Furthermore, a survey of stereo methods is long overdue, with the last exhaustive surveys dating back about a decade [7, 37, 25]. This paper provides an update on the state of the art in the field, with particular emphasis on stereo methods that (1) operate on two frames under known camera geometry, and (2) produce a dense disparity map, i.e., a disparity estimate at each pixel. Our goals are two-fold: 1. To provide a taxonomy of existing stereo algorithms that allows the dissection and comparison of individual algorithm components' design decisions; 2. To provide a test bed for the quantitative evaluation of stereo algorithms. Towards this end, we are placing sample implementations of correspondence algorithms along wi...
662 | Hierarchical model-based motion estimation
- Bergen, Anandan, et al.
- 1992
Citation Context ...tween these algorithms is the minimization procedure used, e.g., simulated annealing [75, 6], probabilistic (mean-field) diffusion [97], or graph cuts [23]. In between these two broad classes are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and absolute intensity differences (AD) [58]. In the video processing community, these matching criteria are referred to as the mean-squared error (MSE) and mean absolute difference (MAD) measures; the term displaced frame difference is also often used [118]. More recently, robust measures, including truncated quadratics and contaminated Gaussians have been proposed [15, 16, 97]. These measures are useful because they limit the influence of mismatches during a... |
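Why robust measures "limit the influence of mismatches during aggregation" can be sketched with a truncated absolute difference (the toy scanline and threshold are ours; truncated quadratics and contaminated Gaussians behave analogously):

```python
import numpy as np

def truncated_ad(i_left, i_right, tau):
    """Absolute-difference matching cost, truncated at tau so a gross
    mismatch (e.g. an occluded pixel) cannot dominate aggregation."""
    return np.minimum(np.abs(i_left - i_right), tau)

left  = np.array([10., 12., 11., 200.])   # last pixel: occlusion/outlier
right = np.array([11., 12., 10.,  14.])

plain = np.abs(left - right)
robust = truncated_ad(left, right, tau=20.0)
print(plain.mean())    # 47.0: one outlier dominates the aggregated cost
print(robust.mean())   # 5.5: its influence is bounded by tau
```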
633 | Multiple View Geometry
- Hartley, Zisserman
- 2003
Citation Context ...moothness assumptions (often implicit) without which the correspondence problem would be underconstrained and ill-posed. Our taxonomy of stereo algorithms, presented in Section 3, examines both matching assumptions and smoothness assumptions in order to categorize existing stereo methods. Finally, most algorithms make assumptions about camera calibration and epipolar geometry. This is arguably the best-understood part of stereo vision; we therefore assume in this paper that we are given a pair of rectified images as input. Recent references on stereo camera calibration and rectification include [130, 70, 131, 52, 39]. 2.2. Representation A critical issue in understanding an algorithm is the representation used internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a “cyclopean” view in between some of the images. Other approaches, in particular multi-view stereo methods, use multi-valued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], tria...
566 | A theory of shape by space carving.
- Kutulakos, Seitz
- 1998
Citation Context ...at we are given a pair of rectified images as input. Recent references on stereo camera calibration and rectification include [130, 70, 131, 52, 39]. 2.2. Representation A critical issue in understanding an algorithm is the representation used internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a “cyclopean” view in between some of the images. Other approaches, in particular multi-view stereo methods, use multi-valued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], triangulated meshes [43], or level-set methods [38]. Since our goal is to compare a large number of methods within one common framework, we have chosen to focus on techniques that produce a univalued disparity map d(x, y) as their output. Central to such methods is the concept of a disparity space (x, y, d). The term disparity was first introduced in the human vision literature to describe the difference in location of corresponding features seen by the left and right eyes...
529 | A computational framework and an algorithm for the measurement of visual motion
- Anandan
- 1989
Citation Context ...97], or graph cuts [23]. In between these two broad classes are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and absolute intensity differences (AD) [58]. In the video processing community, these matching criteria are referred to as the mean-squared error (MSE) and mean absolute difference (MAD) measures; the term displaced frame difference is also often used [118]. More recently, robust measures, including truncated quadratics and contaminated Gaussians have been proposed [15, 16, 97]. These measures are useful because they limit the influence of mismatches during aggregation. Figure 1: Slices through a typical disparity space image (DSI): (a) original color image; (b) gro...
467 | Photorealistic scene reconstruction by voxel coloring.
- Seitz, Dyer
- 1997
Citation Context ...at we are given a pair of rectified images as input. Recent references on stereo camera calibration and rectification include [130, 70, 131, 52, 39]. 2.2. Representation A critical issue in understanding an algorithm is the representation used internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a “cyclopean” view in between some of the images. Other approaches, in particular multi-view stereo methods, use multi-valued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], triangulated meshes [43], or level-set methods [38]. Since our goal is to compare a large number of methods within one common framework, we have chosen to focus on techniques that produce a univalued disparity map d(x, y) as their output. Central to such methods is the concept of a disparity space (x, y, d). The term disparity was first introduced in the human vision literature to describe the difference in location of corresponding features seen by the left and right eyes...
455 | Layered depth images
- Shade, Gortler, et al.
- 1998
Citation Context ...uted disparity map to a different (potentially unseen) view (Figure 5), and compare it against this new image to obtain a forward prediction error. 2. Inverse warp a new view by the computed disparity map to generate a stabilized image (Figure 6), and compare it against the reference image to obtain an inverse prediction error. There are pros and cons to either approach. The forward warping algorithm has to deal with tearing problems: if a single-pixel splat is used, gaps can arise even between adjacent pixels with similar disparities. One possible solution would be to use a two-pass renderer [102]. Instead, we render each pair of neighboring pixels as an interpolated color line in the destination image (i.e., we use Gouraud shading). If neighboring pixels differ by more than a disparity of eval disp gap, the segment is replaced by single pixel splats at both ends, which results in a visible tear (light magenta regions in Figure 5). For inverse warping, the problem of gaps does not occur. Instead, we get “ghosted” regions when pixels in the reference image are not actually visible in the source. We eliminate such pixels by checking for visibility (occlusions) first, and then drawing these...
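The inverse-warping step described above amounts to resampling the new view at disparity-shifted positions, which is why gaps cannot open up (a 1-D sketch; the sign convention and border clamping are our assumptions, and the occlusion/ghosting check is omitted):

```python
import numpy as np

def inverse_warp_scanline(new_view, disparity):
    """Stabilized image: each reference pixel x samples new_view at
    x - d(x). Every output pixel gets exactly one source sample, so
    the tearing/gap problem of forward warping cannot occur."""
    x = np.arange(len(disparity))
    src = np.clip(x - disparity, 0, len(new_view) - 1)
    return new_view[src]

new_view = np.array([5., 6., 7., 8., 9.])
disp = np.array([0, 0, 1, 1, 2])   # per-pixel disparity of the reference
stab = inverse_warp_scanline(new_view, disp)
print(stab)   # [5. 6. 6. 7. 7.]: compared to the reference image, this
              # yields the inverse prediction error
```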
435 | A theory of human stereo vision
- Marr, Poggio
- 1979
Citation Context ...bstituting different matching costs? In this paper we attempt to answer such questions by providing a taxonomy of stereo algorithms. The taxonomy is designed to identify the individual components and design decisions that go into a published algorithm. We hope that the taxonomy will also serve to structure the field and to guide researchers in the development of new and better algorithms. 2.1. Computational theory Any vision algorithm, explicitly or implicitly, makes assumptions about the physical world and the image formation process. In other words, it has an underlying computational theory [74, 72]. For example, how does the algorithm measure the evidence that points in the two images match, i.e., that they are projections of the same scene point? One common assumption is that of Lambertian surfaces, i.e., surfaces whose appearance does not vary with viewpoint. Some algorithms also model specific kinds of camera noise, or differences in gain or bias. Equally important are assumptions about the world or scene geometry and the visual appearance of objects. Starting from the fact that the physical world consists of piecewise-smooth surfaces, algorithms have built-in smoothness assumptions ... |
400 | The Geometry of Multiple Images.
- Faugeras, Luong
- 2004
Citation Context ...moothness assumptions (often implicit) without which the correspondence problem would be underconstrained and ill-posed. Our taxonomy of stereo algorithms, presented in Section 3, examines both matching assumptions and smoothness assumptions in order to categorize existing stereo methods. Finally, most algorithms make assumptions about camera calibration and epipolar geometry. This is arguably the best-understood part of stereo vision; we therefore assume in this paper that we are given a pair of rectified images as input. Recent references on stereo camera calibration and rectification include [130, 70, 131, 52, 39]. 2.2. Representation A critical issue in understanding an algorithm is the representation used internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a “cyclopean” view in between some of the images. Other approaches, in particular multi-view stereo methods, use multi-valued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], tria...
365 | Computing visual correspondence with occlusions using graph cuts.
- Kolmogorov, Zabih
- 2001
Citation Context ..., 23] encourages disparity discontinuities to coincide with intensity/color edges and appears to account for some of the good performance of global optimization approaches. Once the global energy has been defined, a variety of algorithms can be used to find a (local) minimum. Traditional approaches associated with regularization and Markov Random Fields include continuation [17], simulated annealing [47, 75, 6], highest confidence first [28], and mean-field annealing [45]. More recently, max-flow and graph-cut methods have been proposed to solve a special class of global optimization problems [92, 55, 23, 123, 65]. Such methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense (intensity-based) scanline optimization pr... |
340 | A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment,
- Kanade, Okutomi
- 1991
Citation Context ...urations using a plane sweep algorithm [30, 113]. 3.2. Aggregation of cost Local and window-based methods aggregate the matching cost by summing or averaging over a support region in the DSI C(x, y, d). A support region can be either two-dimensional at a fixed disparity (favoring fronto-parallel surfaces), or three-dimensional in x-y-d space (supporting slanted surfaces). Two-dimensional evidence aggregation has been implemented using square windows or Gaussian convolution (traditional), multiple windows anchored at different points, i.e., shiftable windows [2, 18], windows with adaptive sizes [84, 60, 124, 61], and windows based on connected components of constant disparity [22]. Three-dimensional support functions that have been proposed include limited disparity difference [50], limited disparity gradient [88], and Prazdny’s coherence principle [89]. Aggregation with a fixed support region can be performed using 2D or 3D convolution, C(x, y, d) = w(x, y, d) ∗ C0(x, y, d), (2) or, in the case of rectangular windows, using efficient (moving average) box-filters. Shiftable windows can also be implemented efficiently using a separable sliding min-filter (Section 4.2). A different method of aggregation...
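For a rectangular window, the convolution of Equation (2) reduces to a moving-average box filter; a sketch for one DSI slice at fixed d using an integral image (summed-area table), so the per-pixel cost is independent of the window size (edge-replication border handling is our assumption):

```python
import numpy as np

def box_aggregate(cost, w):
    """Box-filter one DSI slice C0(x, y) at fixed d: Equation (2) with a
    constant rectangular window w(x, y), computed via an integral image.
    w is odd; borders are handled by edge replication."""
    r = w // 2
    p = np.pad(cost.astype(float), r, mode='edge')
    s = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    s[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)   # integral image
    # sum over the w x w window centered on each original pixel
    out = s[w:, w:] - s[:-w, w:] - s[w:, :-w] + s[:-w, :-w]
    return out / (w * w)

c0 = np.zeros((5, 5))
c0[2, 2] = 9.0                     # a single high-cost (mismatched) pixel
print(box_aggregate(c0, 3)[2, 2])  # 1.0: the outlier is averaged over 3x3
```

Each output value costs four lookups regardless of `w`, which is what makes window-based aggregation insensitive to window size in terms of computation time.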
339 | Cooperative computation of stereo disparity
- Marr, Poggio
- 1976
Citation Context ...blem. Such algorithms typically do not perform an aggregation step, but rather seek a disparity assignment (step 3) that minimizes a global cost function that combines data (step 1) and smoothness terms. The main distinction between these algorithms is the minimization procedure used, e.g., simulated annealing [75, 6], probabilistic (mean-field) diffusion [97], or graph cuts [23]. In between these two broad classes are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and absolute intensity differences (AD) [58]. In the video processing community, these matching criteria are referred to as the mean-squared error (MSE) and mean absolute difference (MAD) measures; the term displaced frame difference is ... |
291 | Computational vision and regularization theory
- Poggio, Torre, et al.
- 1985
Citation Context ... C is the (initial or aggregated) matching cost DSI. The smoothness term Esmooth(d) encodes the smoothness assumptions made by the algorithm. To make the optimization computationally tractable, the smoothness term is often restricted to only measuring the differences between neighboring pixels’ disparities, Esmooth(d) = ∑ (x,y) ρ(d(x, y) − d(x+1, y)) + ρ(d(x, y) − d(x, y+1)), (5) where ρ is some monotonically increasing function of disparity difference. (An alternative to smoothness functionals is to use a lower-dimensional representation such as splines [112].) In regularization-based vision [87], ρ is a quadratic function, which makes d smooth everywhere and may lead to poor results at object boundaries. Energy functions that do not have this problem are called discontinuity-preserving and are based on robust ρ functions [119, 16, 97]. Geman and Geman’s seminal paper [47] gave a Bayesian interpretation of these kinds of energy functions [110] and proposed a discontinuity-preserving energy function based on Markov Random Fields (MRFs) and additional line processes. Black and Rangarajan [16] show how line processes can often be subsumed by a robust regularization framework. The term...
270 | On the Unification of Line Processes, Outlier Rejection, and Robust Statistics With Applications in Early Vision
- Black, Rangarajan
- 1996
Citation Context ...lts from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and absolute intensity differences (AD) [58]. In the video processing community, these matching criteria are referred to as the mean-squared error (MSE) and mean absolute difference (MAD) measures; the term displaced frame difference is also often used [118]. More recently, robust measures, including truncated quadratics and contaminated Gaussians have been proposed [15, 16, 97]. These measures are useful because they limit the influence of mismatches during aggregation. Figure 1: Slices through a typical disparity space image (DSI): (a) original color image; (b) ground-truth disparities; (c–e) three (x, y) slices for d = 10, 16, 21; (f) an (x, d) slice for y = 151 (the dashed line in Figure (b)). Different dark (matching) regions are visible in Figures (c–e), e.g., the bookshelves, table and cans, and head statue, while three different disparity levels can be seen as horizontal lines in the (x, d) slice (Figure (f)). Note the dark bands in ...
262 | Stereo by intra- and interscanline search using dynamic programming
- Ohta, Kanade
- 1985
Citation Context ...ds have been proposed to solve a special class of global optimization problems [92, 55, 23, 123, 65]. Such methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense (intensity-based) scanline optimization problem [10, 9, 46, 31, 18, 13]. These approaches work by computing the minimum-cost path through the matrix of all pairwise matching costs between two corresponding scanlines. Partial occlusion is handled explicitly by assigning a group of pixels in one image to a single pixel in the other image. Figure 2 shows one such example. Problems with dynamic programming stereo include the selection of the right cost for occluded pixels and the difficulty of enforcing inter-scanline consistency, although several... |
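The minimum-cost path computation described above can be sketched as a per-scanline dynamic program over the DSI row cost[x, d] with an |Δd| smoothness penalty (a simplified sketch: unlike the cited methods it has no explicit occlusion states or occlusion costs, and λ is a free parameter we introduce):

```python
import numpy as np

def dp_scanline(cost, lam=1.0):
    """Disparities minimizing sum_x cost[x, d(x)] + lam * |d(x) - d(x-1)|
    for one scanline, via dynamic programming in O(n * ndisp^2)."""
    n, ndisp = cost.shape
    disps = np.arange(ndisp)
    D = cost[0].astype(float).copy()      # best cost ending at (0, d)
    back = np.zeros((n, ndisp), dtype=int)
    for x in range(1, n):
        # trans[dp, dc]: best cost up to x-1 at dp, plus the jump penalty
        trans = D[:, None] + lam * np.abs(disps[:, None] - disps[None, :])
        back[x] = trans.argmin(axis=0)
        D = cost[x] + trans.min(axis=0)
    d = np.empty(n, dtype=int)
    d[-1] = int(D.argmin())
    for x in range(n - 1, 0, -1):         # backtrack the optimal path
        d[x - 1] = back[x, d[x]]
    return d

dsi = np.array([[0.0, 1.0, 1.0],
                [1.0, 1.0, 0.5],   # per-pixel winner would jump to d = 2
                [0.0, 1.0, 1.0]])
print(dp_scanline(dsi))            # [0 0 0]: smoothness vetoes the jump
```

Because each scanline is optimized independently, nothing here enforces inter-scanline consistency, which is exactly the difficulty the excerpt notes.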
259 | Kalman Filter-based Algorithms for Estimating Depth from Image Sequences (Tech.
- Matthies, Szeliski, et al.
- 1988
Citation Context ...97], or graph cuts [23]. In between these two broad classes are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and absolute intensity differences (AD) [58]. In the video processing community, these matching criteria are referred to as the mean-squared error (MSE) and mean absolute difference (MAD) measures; the term displaced frame difference is also often used [118]. More recently, robust measures, including truncated quadratics and contaminated Gaussians have been proposed [15, 16, 97]. These measures are useful because they limit the influence of mismatches during aggregation. Figure 1: Slices through a typical disparity space image (DSI): (a) original color image; (b) gro...
258 | A Maximum-Flow Formulation of the N-Camera Stereo Correspondence Problem
- Roy, Cox
- 1998
Citation Context ..., 23] encourages disparity discontinuities to coincide with intensity/color edges and appears to account for some of the good performance of global optimization approaches. Once the global energy has been defined, a variety of algorithms can be used to find a (local) minimum. Traditional approaches associated with regularization and Markov Random Fields include continuation [17], simulated annealing [47, 75, 6], highest confidence first [28], and mean-field annealing [45]. More recently, max-flow and graph-cut methods have been proposed to solve a special class of global optimization problems [92, 55, 23, 123, 65]. Such methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense (intensity-based) scanline optimization pr... |
253 | Epipolar-plane image analysis: an approach to determining structure from motion,
- Bolles, Baker, et al.
- 1987
Citation Context ...ge number of methods within one common framework, we have chosen to focus on techniques that produce a univalued disparity map d(x, y) as their output. Central to such methods is the concept of a disparity space (x, y, d). The term disparity was first introduced in the human vision literature to describe the difference in location of corresponding features seen by the left and right eyes [72]. (Horizontal disparity is the most commonly studied phenomenon, but vertical disparity is possible if the eyes are verged.) In computer vision, disparity is often treated as synonymous with inverse depth [20, 85]. More recently, several researchers have defined disparity as a three-dimensional projective transformation (collineation or homography) of 3-D space (X, Y, Z). The enumeration of all possible matches in such a generalized disparity space can be easily achieved with a plane sweep algorithm [30, 113], which for every disparity d projects all images onto a common plane using a perspective projection (homography). (Note that this is different from the meaning of plane sweep in computational geometry.) In general, we favor the more generalized interpretation of disparity, since it allows the ad...
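For the rectified two-frame case, enumerating the disparity space reduces to shifting one scanline against the other (a sketch; in the full plane-sweep setting each d instead applies a homography to every image):

```python
import numpy as np

def disparity_space_image(left, right, max_d):
    """C(x, d) for one rectified scanline pair, using squared
    differences; left pixel x is compared against right pixel x - d.
    Cells with no valid match (x < d) stay at infinity."""
    n = len(left)
    C = np.full((n, max_d + 1), np.inf)
    for d in range(max_d + 1):
        C[d:, d] = (left[d:] - right[:n - d]) ** 2
    return C

left  = np.array([3., 5., 3., 5., 7.])
right = np.array([5., 3., 5., 7., 9.])   # toy data: left shifted by d = 1
C = disparity_space_image(left, right, 2)
print(C[1:, 1].sum())   # 0.0: a zero-cost band at d = 1, the analogue of
                        # the dark matching bands in the DSI slices
```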
234 | A maximum likelihood stereo algorithm.
- Cox, Hingorani, et al.
- 1996
Citation Context ... methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense (intensity-based) scanline optimization problem [10, 9, 46, 31, 18, 13]. These approaches work by computing the minimum-cost path through the matrix of all pairwise matching costs between two corresponding scanlines. Partial occlusion is handled explicitly by assigning a group of pixels in one image to a single pixel in the other image. Figure 2 shows one such example. Problems with dynamic programming stereo include the selection of the right cost for occluded pixels and the difficulty of enforcing inter-scanline consistency, although several methods propose ways of addressing the latter [83, 9, 31, 18, 13]. Another problem is that the dynamic programming approa... |
230 | Probabilistic Solution of Ill-Posed Problems in Computational Vision,
- Marroquin, Mitter, et al.
- 1987
Citation Context ...ost that is based on a support region, e.g. normalized cross-correlation [51, 19] and the rank transform [129]. (This can also be viewed as a preprocessing step; see Section 3.1.) On the other hand, global algorithms make explicit smoothness assumptions and then solve an optimization problem. Such algorithms typically do not perform an aggregation step, but rather seek a disparity assignment (step 3) that minimizes a global cost function that combines data (step 1) and smoothness terms. The main distinction between these algorithms is the minimization procedure used, e.g., simulated annealing [75, 6], probabilistic (mean-field) diffusion [97], or graph cuts [23]. In between these two broad classes are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs inc... |
225 | Variational principles, surface evolution, pde’s, level set methods and the stereo problem
- Faugeras, Keriven
- 1998
Citation Context ...nderstanding an algorithm is the representation used internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a “cyclopean” view in between some of the images. Other approaches, in particular multi-view stereo methods, use multi-valued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], triangulated meshes [43], or level-set methods [38]. Since our goal is to compare a large number of methods within one common framework, we have chosen to focus on techniques that produce a univalued disparity map d(x, y) as their output. Central to such methods is the concept of a disparity space (x, y, d). The term disparity was first introduced in the human vision literature to describe the difference in location of corresponding features seen by the left and right eyes [72]. (Horizontal disparity is the most commonly studied phenomenon, but vertical disparity is possible if the eyes are verged.) In computer vision, disparity is often treat...
221 | A space-sweep approach to true multi-image matching
- Collins
- 1996
Citation Context ...ual algorithm components design decisions; 2. To provide a test bed for the quantitative evaluation of stereo algorithms. Towards this end, we are placing sample implementations of correspondence algorithms along with test data and results on the Web at www.middlebury.edu/stereo. We emphasize calibrated two-frame methods in order to focus our analysis on the essential components of stereo correspondence. However, it would be relatively straightforward to generalize our approach to include many multi-frame methods, in particular multiple-baseline stereo [85] and its plane-sweep generalizations [30, 113]. The requirement of dense output is motivated by modern applications of stereo such as view synthesis and image-based rendering, which require disparity estimates in all image regions, even those that are occluded or without texture. Thus, sparse and feature-based stereo methods are outside the scope of this paper, unless they are followed by a surface-fitting step, e.g., using triangulation, splines, or seed-and-grow methods. We begin this paper with a review of the goals and scope of this study, which include the need for a coherent taxonomy and a well thought-out evaluation methodology. We al...
214 | Probability distributions of optical flow.
- Simoncelli, Adelson, et al.
- 1991
Citation Context ...97], or graph cuts [23]. In between these two broad classes are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and absolute intensity differences (AD) [58]. In the video processing community, these matching criteria are referred to as the mean-squared error (MSE) and mean absolute difference (MAD) measures; the term displaced frame difference is also often used [118]. More recently, robust measures, including truncated quadratics and contaminated Gaussians have been proposed [15, 16, 97]. These measures are useful because they limit the influence of mismatches during aggregation. Figure 1: Slices through a typical disparity space image (DSI): (a) original color image; (b) gro...
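The per-pixel costs described in this excerpt form the initial disparity space image C0. A minimal sketch for a single scanline pair, using a truncated absolute difference as the robust cost (the threshold `trunc` and the function name are hypothetical, not from the paper):

```python
# Build C0(x, d) for one scanline pair. Truncating the absolute
# difference limits the influence of mismatches, as the excerpt notes
# for robust measures (illustrative values only).

def initial_dsi(left, right, max_d, trunc=50):
    inf = float("inf")
    dsi = []  # dsi[d][x] = C0(x, d)
    for d in range(max_d + 1):
        row = []
        for x in range(len(left)):
            if x - d < 0:
                row.append(inf)  # no corresponding pixel at this disparity
            else:
                row.append(min(abs(left[x] - right[x - d]), trunc))
        dsi.append(row)
    return dsi
```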
207 | A pixel dissimilarity measure that is insensitive to image sampling,
- Birchfield, Tomasi
- 1998
Citation Context ... methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense (intensity-based) scanline optimization problem [10, 9, 46, 31, 18, 13]. These approaches work by computing the minimum-cost path through the matrix of all pairwise matching costs between two corresponding scanlines. Partial occlusion is handled explicitly by assigning a group of pixels in one image to a single pixel in the other image. Figure 2 shows one such example. Problems with dynamic programming stereo include the selection of the right cost for occluded pixels and the difficulty of enforcing inter-scanline consistency, although several methods propose ways of addressing the latter [83, 9, 31, 18, 13]. Another problem is that the dynamic programming approa... |
204 | Bayesian Modeling of Uncertainty in Low Level Vision.
- Szeliski
- 1989
Citation Context ...+ ρ(d(x, y) − d(x, y+1)), (5) where ρ is some monotonically increasing function of disparity difference. (An alternative to smoothness functionals is to use a lower-dimensional representation such as splines [112].) In regularization-based vision [87], ρ is a quadratic function, which makes d smooth everywhere and may lead to poor results at object boundaries. Energy functions that do not have this problem are called discontinuity-preserving and are based on robust ρ functions [119, 16, 97]. Geman and Geman’s seminal paper [47] gave a Bayesian interpretation of these kinds of energy functions [110] and proposed a discontinuity-preserving energy function based on Markov Random Fields (MRFs) and additional line processes. Black and Rangarajan [16] show how line processes can often be subsumed by a robust regularization framework. The terms in Esmooth can also be made to depend on the intensity differences, e.g., ρd(d(x, y) − d(x+1, y)) · ρI(‖I(x, y) − I(x+1, y)‖), (6) where ρI is some monotonically decreasing function of intensity differences that lowers smoothness costs at high intensity gradients. This idea [44, 42, 18, 23] encourages disparity discontinuities to coincide with intens...
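The robust smoothness terms of Equations (5) and (6) in this excerpt can be illustrated with a small sketch (the truncated quadratic is one common discontinuity-preserving choice of ρ; the two-level ρI and all constants are made-up illustrative values, not the paper's):

```python
# rho_d: truncated quadratic, a discontinuity-preserving robust function
# (large disparity jumps are not penalized without bound).
# rho_I: monotonically decreasing in the intensity difference, so the
# smoothness cost drops at strong intensity edges.

def rho_d(delta, trunc=4.0):
    return min(float(delta * delta), trunc)

def rho_I(diff, edge_thresh=20.0):
    return 1.0 if abs(diff) < edge_thresh else 0.25

def smoothness_term(d1, d2, i1, i2):
    # one term of Esmooth in the form of Equation (6)
    return rho_d(d1 - d2) * rho_I(i1 - i2)
```

The product form makes a disparity discontinuity four times cheaper when it coincides with an intensity edge, which is exactly the bias toward aligned edges described in the text.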
191 | A framework for the robust estimation of optical flow.
- BLACK, ANANDAN
- 1993
Citation Context ...lts from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and absolute intensity differences (AD) [58]. In the video processing community, these matching criteria are referred to as the mean-squared error (MSE) and mean absolute difference (MAD) measures; the term displaced frame difference is also often used [118]. More recently, robust measures, including truncated quadratics and contaminated Gaussians have been proposed [15, 16, 97]. These measures are useful because they limit the influence of mismatches during aggregation. Figure 1: Slices through a typical disparity space image (DSI): (a) original color image; (b) ground-truth disparities; (c–e) three (x, y) slices for d = 10, 16, 21; (f) an (x, d) slice for y = 151 (the dashed line in Figure (b)). Different dark (matching) regions are visible in Figures (c–e), e.g., the bookshelves, table and cans, and head statue, while three different disparity levels can be seen as horizontal lines in the (x, d) slice (Figure (f)). Note the dark bands in ...
190 | Depth Discontinuities by Pixel-to-Pixel Stereo,
- Birchfield, Tomasi
- 1999
Citation Context ...y matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching cost that is insensitive to image sampling [12]. Rather than just comparing pixel values shifted by integral amounts (which may miss a valid match), they compare each pixel in the reference image against a linearly interpolated function of the other image. The matching cost values over all pixels and all disparities form the initial disparity space image C0(x, y, d). While our study is currently restricted to two-frame methods, the initial DSI can easily incorporate information from more than two images by simply summing up the cost values for each matching image m, since the DSI is associated with a fixed reference image r (Equation (1)).... |
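The sampling-insensitive dissimilarity can be sketched in one direction (a simplified reading in the spirit of Birchfield and Tomasi [12]; the full measure takes the symmetric minimum over both images, and the helper name is hypothetical):

```python
# Compare a left-image intensity against the interval spanned by the
# linearly interpolated right image around x_r: the cost is zero when
# the value falls inside that interval, so half-pixel sampling shifts
# that would break an exact pixel-to-pixel comparison cost nothing.

def bt_half(i_left, right, x_r):
    lo = (right[x_r] + right[max(x_r - 1, 0)]) / 2.0   # half-step left
    hi = (right[x_r] + right[min(x_r + 1, len(right) - 1)]) / 2.0
    i_min = min(lo, hi, right[x_r])
    i_max = max(lo, hi, right[x_r])
    return max(0.0, i_left - i_max, i_min - i_left)
```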
180 | PMF: a stereo correspondence algorithm using a disparity gradient constraint.
- Pollard, Mayhew, et al.
- 1985
Citation Context ...n can be either two-dimensional at a fixed disparity (favoring fronto-parallel surfaces), or three-dimensional in x-y-d space (supporting slanted surfaces). Two-dimensional evidence aggregation has been implemented using square windows or Gaussian convolution (traditional), multiple windows anchored at different points, i.e., shiftable windows [2, 18], windows with adaptive sizes [84, 60, 124, 61], and windows based on connected components of constant disparity [22]. Three-dimensional support functions that have been proposed include limited disparity difference [50], limited disparity gradient [88], and Prazdny’s coherence principle [89]. Aggregation with a fixed support region can be performed using 2D or 3D convolution, C(x, y, d) = w(x, y, d) ∗ C0(x, y, d), (2) or, in the case of rectangular windows, using efficient (moving average) box-filters. Shiftable windows can also be implemented efficiently using a separable sliding min-filter (Section 4.2). A different method of aggregation is iterative diffusion, i.e., an aggregation (or averaging) operation that is implemented by repeatedly adding to each pixel’s cost the weighted values of its neighboring pixels’ costs [114, 103, 97]. 3.3...
157 | Hierarchical spline-based image registration.
- Szeliski, Coughlan
- 1994
Citation Context ...) = ∑ (x,y) C(x, y, d(x, y)), (4) where C is the (initial or aggregated) matching cost DSI. The smoothness term Esmooth(d) encodes the smoothness assumptions made by the algorithm. To make the optimization computationally tractable, the smoothness term is often restricted to only measuring the differences between neighboring pixels’ disparities, Esmooth(d) = ∑ (x,y) ρ(d(x, y) − d(x+1, y)) + ρ(d(x, y) − d(x, y+1)), (5) where ρ is some monotonically increasing function of disparity difference. (An alternative to smoothness functionals is to use a lower-dimensional representation such as splines [112].) In regularization-based vision [87], ρ is a quadratic function, which makes d smooth everywhere and may lead to poor results at object boundaries. Energy functions that do not have this problem are called discontinuity-preserving and are based on robust ρ functions [119, 16, 97]. Geman and Geman’s seminal paper [47] gave a Bayesian interpretation of these kinds of energy functions [110] and proposed a discontinuity-preserving energy function based on Markov Random Fields (MRFs) and additional line processes. Black and Rangarajan [16] show how line processes can often be subsumed by a rob...
147 | Depth from edge and intensity based stereo
- Baker, Binford
Citation Context ...ds have been proposed to solve a special class of global optimization problems [92, 55, 23, 123, 65]. Such methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense (intensity-based) scanline optimization problem [10, 9, 46, 31, 18, 13]. These approaches work by computing the minimum-cost path through the matrix of all pairwise matching costs between two corresponding scanlines. Partial occlusion is handled explicitly by assigning a group of pixels in one image to a single pixel in the other image. Figure 2 shows one such example. Problems with dynamic programming stereo include the selection of the right cost for occluded pixels and the difficulty of enforcing inter-scanline consistency, although several... |
144 | Occlusions and binocular stereo,
- Geiger, Ladendorf, et al.
- 1992
Citation Context ... methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense (intensity-based) scanline optimization problem [10, 9, 46, 31, 18, 13]. These approaches work by computing the minimum-cost path through the matrix of all pairwise matching costs between two corresponding scanlines. Partial occlusion is handled explicitly by assigning a group of pixels in one image to a single pixel in the other image. Figure 2 shows one such example. Problems with dynamic programming stereo include the selection of the right cost for occluded pixels and the difficulty of enforcing inter-scanline consistency, although several methods propose ways of addressing the latter [83, 9, 31, 18, 13]. Another problem is that the dynamic programming approa... |
140 | Handling occlusions in dense multi-view stereo.
- Kang, Szeliski, et al.
- 2001
Citation Context ...urations using a plane sweep algorithm [30, 113]. 3.2. Aggregation of cost Local and window-based methods aggregate the matching cost by summing or averaging over a support region in the DSI C(x, y, d). A support region can be either two-dimensional at a fixed disparity (favoring fronto-parallel surfaces), or three-dimensional in x-y-d space (supporting slanted surfaces). Two-dimensional evidence aggregation has been implemented using square windows or Gaussian convolution (traditional), multiple windows anchored at different points, i.e., shiftable windows [2, 18], windows with adaptive sizes [84, 60, 124, 61], and windows based on connected components of constant disparity [22]. Three-dimensional support functions that have been proposed include limited disparity difference [50], limited disparity gradient [88], and Prazdny’s coherence principle [89]. Aggregation with a fixed support region can be performed using 2D or 3D convolution, C(x, y, d) = w(x, y, d) ∗ C0(x, y, d), (2) or, in the case of rectangular windows, using efficient (moving average) box-filters. Shiftable windows can also be implemented efficiently using a separable sliding min-filter (Section 4.2). A different method of aggregation...
138 | Multiway cut for stereo and motion with slanted surfaces.
- Birchfield, Tomasi
- 1999
Citation Context ...hical refinement framework [90, 11, 8, 112]. A univalued representation of the disparity map is also not essential. Multi-valued representations, which can represent several depth values along each line of sight, have been extensively studied recently, especially for large multi-view data sets. Many of these techniques use a voxel-based representation to encode the reconstructed colors and spatial occupancies or opacities [113, 101, 67, 34, 33, 24]. Another way to represent a scene with more complexity is to use multiple layers, each of which can be represented by a plane plus residual parallax [5, 14, 117]. Finally, deformable surfaces of various kinds have also been used to perform 3D shape reconstruction from multiple images [120, 121, 43, 38]. 3.6. Summary of methods Table 1 gives a summary of some representative stereo matching algorithms and their corresponding taxonomy, i.e., the matching cost, aggregation, and optimization techniques used by each. The methods are grouped to contrast different matching costs (top), aggregation methods (middle), and optimization techniques (third section), while the last section lists some papers outside the framework. As can be seen from this table, quite...
138 | A parallel stereo algorithm that produces dense depth maps and preserves image features.
- Fua
- 1993
Citation Context ... [47] gave a Bayesian interpretation of these kinds of energy functions [110] and proposed a discontinuity-preserving energy function based on Markov Random Fields (MRFs) and additional line processes. Black and Rangarajan [16] show how line processes can often be subsumed by a robust regularization framework. The terms in Esmooth can also be made to depend on the intensity differences, e.g., ρd(d(x, y) − d(x+1, y)) · ρI(‖I(x, y) − I(x+1, y)‖), (6) where ρI is some monotonically decreasing function of intensity differences that lowers smoothness costs at high intensity gradients. This idea [44, 42, 18, 23] encourages disparity discontinuities to coincide with intensity/color edges and appears to account for some of the good performance of global optimization approaches. Once the global energy has been defined, a variety of algorithms can be used to find a (local) minimum. Traditional approaches associated with regularization and Markov Random Fields include continuation [17], simulated annealing [47, 75, 6], highest confidence first [28], and mean-field annealing [45]. More recently, max-flow and graph-cut methods have been proposed to solve a special class of global optimization problems [92, ...
136 | A computational framework for determining stereo correspondence from a set of linear spatial filters,
- Jones, Malik
- 1992
Citation Context ..., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching cost that is insensitive to image sampling [12]. Rather than just comparing pixel values shifted by integral amounts (which may miss a valid match), they compare each pixel in the reference image against a linearly interpolated function of the other image. The matching cost values over all pixels and all disparities form the initial disparity space image C0(x, y, d). While our study is currently restricted to two-frame methods, the initial DSI can easily incorporate information from more than two images by simply summing up the cost val... |
131 | Object-Centred Surface Reconstruction: Combining Multi-Image Stereo and Shading,
- Fua, Leclerc
- 1995
Citation Context ...ation A critical issue in understanding an algorithm is the representation used internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a “cyclopean” view in between some of the images. Other approaches, in particular multi-view stereo methods, use multi-valued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], triangulated meshes [43], or level-set methods [38]. Since our goal is to compare a large number of methods within one common framework, we have chosen to focus on techniques that produce a univalued disparity map d(x, y) as their output. Central to such methods is the concept of a disparity space (x, y, d). The term disparity was first introduced in the human vision literature to describe the difference in location of corresponding features seen by the left and right eyes [72]. (Horizontal disparity is the most commonly studied phenomenon, but vertical disparity is possible if the eyes are verged.) In computer visio...
126 | A stereo machine for video-rate dense depth mapping and its new applications,
- Kanade, Yoshida, et al.
- 1996
Citation Context ...either interval. We apply this criterion separately to each color channel, which is not physically plausible (the sub-pixel shift must be consistent across channels), but is easier to implement. 4.2. Aggregation The aggregation section of our test bed implements some commonly used aggregation methods (aggr fn):
• Box filter: use a separable moving average filter (add one right/bottom value, subtract one left/top). This implementation trick makes such window-based aggregation insensitive to window size in terms of computation time and accounts for the fast performance seen in real-time matchers [59, 64].
Figure 3: Shiftable window. The effect of trying all 3 × 3 shifted windows around the black pixel is the same as taking the minimum matching score across all centered (non-shifted) windows in the same neighborhood. (Only 3 of the neighboring shifted windows are shown here for clarity.)
• Binomial filter: use a separable FIR (finite impulse response) filter. We use the coefficients 1/16{1, 4, 6, 4, 1}, the same ones used in Burt and Adelson’s [26] Laplacian pyramid. Other convolution kernels could also be added later, as could recursive (bi-directional) IIR filtering, which is a very efficien...
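The moving-average trick described for the box filter can be sketched in 1-D (a hedged illustration; border windows are simply truncated here, which is one convention among several, and the function name is hypothetical):

```python
# O(1) work per output regardless of window size: as the window slides
# one position, add the value entering on the right and subtract the
# value leaving on the left.

def box_sum_1d(costs, radius):
    n = len(costs)
    out = [0] * n
    s = sum(costs[:min(radius + 1, n)])  # window around x = 0
    out[0] = s
    for x in range(1, n):
        if x + radius < n:
            s += costs[x + radius]       # value entering on the right
        if x - radius - 1 >= 0:
            s -= costs[x - radius - 1]   # value leaving on the left
        out[x] = s
    return out
```

Applying this once along rows and once along columns gives the separable 2-D box filter, which is why the cost is insensitive to window size.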
126 | Computation and analysis of image motion: A synopsis of current problems and methods.
- Mitiche, Bouthemy
- 1996
Citation Context ...imply enumerating different approaches is unlikely to yield new insights. Clearly, a comparative evaluation is necessary to assess the performance of both established and new algorithms and to gauge the progress of the field. The publication of a similar study by Barron et al. [8] has had a dramatic effect on the development of optical flow algorithms. Not only is the performance of commonly used algorithms better understood by researchers, but novel publications have to improve in some way on the performance of previously published techniques [86]. A more recent study by Mitiche and Bouthemy [78] reviews a large number of methods for image flow computation and isolates central problems, but does not provide any experimental results. In stereo correspondence, two previous comparative papers have focused on the performance of sparse feature matchers [54, 19]. Two recent papers [111, 80] have developed new criteria for evaluating the performance of dense stereo matchers for image-based rendering and tele-presence applications. Our work is a continuation of the investigations begun by Szeliski and Zabih [116], which compared the performance of several popular algorithms, but did not provi... |
126 | Stereo matching with nonlinear diffusion.
- Scharstein, Szeliski
- 1998
Citation Context ...uch as lowest cost and best (piecewise) smoothness [127]. Figure 1 shows examples of slices through a typical DSI. More figures of this kind can be found in [18]. 3. A taxonomy of stereo algorithms In order to support an informed comparison of stereo matching algorithms, we develop in this section a taxonomy and categorization scheme for such algorithms. We present a set of algorithmic “building blocks” from which a large set of existing algorithms can easily be constructed. Our taxonomy is based on the observation that stereo algorithms generally perform (subsets of) the following four steps [97, 96]: 1. matching cost computation; 2. cost (support) aggregation; 3. disparity computation / optimization; and 4. disparity refinement. The actual sequence of steps taken depends on the specific algorithm. For example, local (window-based) algorithms, where the disparity computation at a given point depends only on intensity values within a finite window, usually make implicit smoothness assumptions by aggregating support. Some of these algorithms can cleanly be broken down into steps 1, 2, 3. For example, the traditional sum-of-squared-differences (SSD) algorithm can be described as: 1. the matc... |
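The three-step decomposition of the traditional SSD algorithm described in this excerpt (cost computation, aggregation, winner-take-all) can be sketched end-to-end on a scanline pair (a toy illustration; the window radius, disparity range, and function name are hypothetical):

```python
# Steps 1-3 of the taxonomy for the traditional SSD algorithm.

def ssd_wta(left, right, max_d, radius=1):
    n = len(left)
    inf = float("inf")
    # 1. matching cost computation: squared intensity differences
    cost = [[(left[x] - right[x - d]) ** 2 if x - d >= 0 else inf
             for x in range(n)] for d in range(max_d + 1)]
    # 2. cost (support) aggregation: sum over a square window
    agg = [[sum(cost[d][max(0, x - radius):x + radius + 1])
            for x in range(n)] for d in range(max_d + 1)]
    # 3. disparity computation: winner-take-all over disparities
    return [min(range(max_d + 1), key=lambda d: agg[d][x]) for x in range(n)]
```

Step 4 (disparity refinement, e.g. sub-pixel interpolation) is omitted here for brevity.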
122 | Fast algorithms for low-level vision,
- Deriche
- 1990
Citation Context ...ffect of trying all 3 × 3 shifted windows around the black pixel is the same as taking the minimum matching score across all centered (non-shifted) windows in the same neighborhood. (Only 3 of the neighboring shifted windows are shown here for clarity.) • Binomial filter: use a separable FIR (finite impulse response) filter. We use the coefficients 1/16{1, 4, 6, 4, 1}, the same ones used in Burt and Adelson’s [26] Laplacian pyramid. Other convolution kernels could also be added later, as could recursive (bi-directional) IIR filtering, which is a very efficient way to obtain large window sizes [35]. The width of the box or convolution kernel is controlled by aggr window size. To simulate the effect of shiftable windows [2, 18, 117], we can follow this aggregation step with a separable square min-filter. The width of this filter is controlled by the parameter aggr minfilter. The cascaded effect of a box-filter and an equal-sized min-filter is the same as evaluating a complete set of shifted windows, since the value of a shifted window is the same as that of a centered window at some neighboring pixel (Figure 3). This step adds very little additional computation, since a moving 1-D min-fi... |
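The separable sliding min-filter mentioned here can be computed in amortized O(n) per row with a monotonic deque; the sketch below is one standard realization under that assumption, not necessarily the test bed's implementation:

```python
from collections import deque

# Sliding-window minimum over [x - radius, x + radius] (truncated at the
# borders). The deque holds indices of increasing values, so its front
# is always the current window's minimum.

def min_filter_1d(vals, radius):
    n = len(vals)
    out = [0] * n
    q = deque()
    for i in range(n + radius):
        if i < n:
            while q and vals[q[-1]] >= vals[i]:
                q.pop()          # dominated candidates can never win
            q.append(i)
        x = i - radius           # window for position x is now complete
        if x >= 0:
            while q[0] < x - radius:
                q.popleft()      # drop indices that left the window
            out[x] = vals[q[0]]
    return out
```

Cascading an equal-sized box filter with this min-filter evaluates the complete set of shifted windows, as the excerpt notes, since a shifted window's value equals that of a centered window at some neighboring pixel.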
121 | A Bayesian approach to binocular stereopsis
- Belhumeur
- 1996
Citation Context ... methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense (intensity-based) scanline optimization problem [10, 9, 46, 31, 18, 13]. These approaches work by computing the minimum-cost path through the matrix of all pairwise matching costs between two corresponding scanlines. Partial occlusion is handled explicitly by assigning a group of pixels in one image to a single pixel in the other image. Figure 2 shows one such example. Problems with dynamic programming stereo include the selection of the right cost for occluded pixels and the difficulty of enforcing inter-scanline consistency, although several methods propose ways of addressing the latter [83, 9, 31, 18, 13]. Another problem is that the dynamic programming approa... |
120 | Phase-based disparity measurement.
- Fleet, Jepson, et al.
- 1991
Citation Context ...f the possible algorithm design space has been explored over the years, albeit not very systematically. 4. Implementation We have developed a stand-alone, portable C++ implementation of several stereo algorithms. The implementation is closely tied to the taxonomy presented in Section 3 and currently includes window-based algorithms, diffusion algorithms, ...
Table 1 (excerpt) — Method | Matching cost | Aggregation | Optimization:
SSD (traditional) | squared difference | square window | WTA
Hannah [51] | cross-correlation | (square window) | WTA
Nishihara [82] | binarized filters | square window | WTA
Kass [63] | filter banks | -none- | WTA
Fleet et al. [40] | phase | -none- | phase-matching
Jones and Malik [57] | filter banks | -none- | WTA
Kanade [58] | absolute difference | square window | WTA
Scharstein [95] | gradient-based | Gaussian | WTA
Zabih and Woodfill [129] | rank transform | (square window) | WTA
Cox et al. [32] | histogram eq. | -none- | DP
Frohlinghaus and Buhmann [41] | wavelet phase | -none- | phase-matching
Birchfield and Tomasi [12] | shifted abs. diff | -none- | DP
Marr and Poggio [73] | binary images | iterative aggregation | WTA
Prazdny [89] | binary images | 3D aggregation | WTA
Szeliski and Hinton [114] | binary images | iterative 3D aggregation | WTA
Okutomi and Kanade [84] | squared dif...
116 | A layered approach to stereo reconstruction
- Baker, Szeliski, et al.
- 1998
Citation Context ...ages as input. Recent references on stereo camera calibration and rectification include [130, 70, 131, 52, 39]. 2.2. Representation A critical issue in understanding an algorithm is the representation used internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a “cyclopean” view in between some of the images. Other approaches, in particular multi-view stereo methods, use multi-valued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], triangulated meshes [43], or level-set methods [38]. Since our goal is to compare a large number of methods within one common framework, we have chosen to focus on techniques that produce a univalued disparity map d(x, y) as their output. Central to such methods is the concept of a disparity space (x, y, d). The term disparity was first introduced in the human vision literature to describe the difference in location of corresponding features seen by the left and right eyes [72]. (Horizontal dispar...
116 | The theory and practice of Bayesian image labeling.
- Chou, Brown
- 1990
Citation Context ...(6) where ρI is some monotonically decreasing function of intensity differences that lowers smoothness costs at high intensity gradients. This idea [44, 42, 18, 23] encourages disparity discontinuities to coincide with intensity/color edges and appears to account for some of the good performance of global optimization approaches. Once the global energy has been defined, a variety of algorithms can be used to find a (local) minimum. Traditional approaches associated with regularization and Markov Random Fields include continuation [17], simulated annealing [47, 75, 6], highest confidence first [28], and mean-field annealing [45]. More recently, max-flow and graph-cut methods have been proposed to solve a special class of global optimization problems [92, 55, 23, 123, 65]. Such methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic... |
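As a sketch of the smoothness terms discussed here (Equations (5) and (6)), the following computes E_smooth for a disparity map, weighting each disparity difference by a decreasing function of the local intensity gradient. The truncated-linear ρ_d and exponential ρ_I are illustrative assumptions; the paper surveys many ρ functions.

```python
import numpy as np

def smoothness_energy(d, img, trunc=2.0, sigma=10.0):
    """E_smooth: sum over horizontal and vertical neighbor pairs of
    rho_d(disparity difference) * rho_I(intensity difference)."""
    d = d.astype(float)
    img = img.astype(float)
    e = 0.0
    for dd, di in (
        (d[:, 1:] - d[:, :-1], img[:, 1:] - img[:, :-1]),  # horizontal pairs
        (d[1:, :] - d[:-1, :], img[1:, :] - img[:-1, :]),  # vertical pairs
    ):
        rho_d = np.minimum(np.abs(dd), trunc)      # robust (discontinuity-preserving)
        rho_i = np.exp(-np.abs(di) / sigma)        # cheaper to break smoothness at edges
        e += float(np.sum(rho_d * rho_i))
    return e
```

Because ρ_I shrinks toward zero at strong intensity gradients, disparity discontinuities are encouraged to align with image edges, which is the effect the excerpt describes.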
112 | Computational experiments with a feature based stereo algorithm.
- Grimson
- 1985
Citation Context ... regions are visible in Figures (c–e), e.g., the bookshelves, table and cans, and head statue, while three different disparity levels can be seen as horizontal lines in the (x, d) slice (Figure (f)). Note the dark bands in the various DSIs, which indicate regions that match at this disparity. (Smaller dark regions are often the result of textureless regions.) Other traditional matching costs include normalized cross-correlation [51, 93, 19], which behaves similarly to sum-of-squared-differences (SSD), and binary matching costs (i.e., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching co...
106 | PRISM: A practical real-time imaging stereo matcher.
- Nishihara
- 1984
Citation Context ...g., the bookshelves, table and cans, and head statue, while three different disparity levels can be seen as horizontal lines in the (x, d) slice (Figure (f)). Note the dark bands in the various DSIs, which indicate regions that match at this disparity. (Smaller dark regions are often the result of textureless regions.) Other traditional matching costs include normalized cross-correlation [51, 93, 19], which behaves similarly to sum-of-squared-differences (SSD), and binary matching costs (i.e., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching cost that is insensitive to image sa...
103 | Occlusions, discontinuities, and epipolar lines in stereo.
- Ishikawa, Geiger
- 1998
Citation Context ..., 23] encourages disparity discontinuities to coincide with intensity/color edges and appears to account for some of the good performance of global optimization approaches. Once the global energy has been defined, a variety of algorithms can be used to find a (local) minimum. Traditional approaches associated with regularization and Markov Random Fields include continuation [17], simulated annealing [47, 75, 6], highest confidence first [28], and mean-field annealing [45]. More recently, max-flow and graph-cut methods have been proposed to solve a special class of global optimization problems [92, 55, 23, 123, 65]. Such methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense (intensity-based) scanline optimization pr... |
97 | Stereo matching with transparency and matting.
- Szeliski, Golland
- 1999
Citation Context ...ual algorithm components design decisions; 2. To provide a test bed for the quantitative evaluation of stereo algorithms. Towards this end, we are placing sample implementations of correspondence algorithms along with test data and results on the Web at www.middlebury.edu/stereo. We emphasize calibrated two-frame methods in order to focus our analysis on the essential components of stereo correspondence. However, it would be relatively straightforward to generalize our approach to include many multi-frame methods, in particular multiple-baseline stereo [85] and its plane-sweep generalizations [30, 113]. The requirement of dense output is motivated by modern applications of stereo such as view synthesis and image-based rendering, which require disparity estimates in all image regions, even those that are occluded or without texture. Thus, sparse and feature-based stereo methods are outside the scope of this paper, unless they are followed by a surface-fitting step, e.g., using triangulation, splines, or seed-and-grow methods. We begin this paper with a review of the goals and scope of this study, which include the need for a coherent taxonomy and a well thought-out evaluation methodology. We al...
94 | Computing rectifying homographies for stereo vision.
- Loop, Zhang
- 1999
Citation Context ...moothness assumptions (often implicit) without which the correspondence problem would be underconstrained and ill-posed. Our taxonomy of stereo algorithms, presented in Section 3, examines both matching assumptions and smoothness assumptions in order to categorize existing stereo methods. Finally, most algorithms make assumptions about camera calibration and epipolar geometry. This is arguably the best-understood part of stereo vision; we therefore assume in this paper that we are given a pair of rectified images as input. Recent references on stereo camera calibration and rectification include [130, 70, 131, 52, 39]. 2.2. Representation A critical issue in understanding an algorithm is the representation used internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a “cyclopean” view in between some of the images. Other approaches, in particular multi-view stereo methods, use multi-valued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], tria...
93 | Structure from stereo-a review, - Dhond, Aggarwal - 1989 |
89 | A probabilistic framework for space carving.
- Broadhurst, Drummond, et al.
- 2001
Citation Context ...at we are given a pair of rectified images as input. Recent references on stereo camera calibration and rectification include [130, 70, 131, 52, 39]. 2.2. Representation A critical issue in understanding an algorithm is the representation used internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a “cyclopean” view in between some of the images. Other approaches, in particular multi-view stereo methods, use multi-valued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], triangulated meshes [43], or level-set methods [38]. Since our goal is to compare a large number of methods within one common framework, we have chosen to focus on techniques that produce a univalued disparity map d(x, y) as their output. Central to such methods is the concept of a disparity space (x, y, d). The term disparity was first introduced in the human vision literature to describe the difference in location of corresponding features seen by the left and right eyes...
88 | Generalized voxel coloring.
- Culbertson, Malzbender, et al.
- 1999
Citation Context ...at we are given a pair of rectified images as input. Recent references on stereo camera calibration and rectification include [130, 70, 131, 52, 39]. 2.2. Representation A critical issue in understanding an algorithm is the representation used internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a “cyclopean” view in between some of the images. Other approaches, in particular multi-view stereo methods, use multi-valued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], triangulated meshes [43], or level-set methods [38]. Since our goal is to compare a large number of methods within one common framework, we have chosen to focus on techniques that produce a univalued disparity map d(x, y) as their output. Central to such methods is the concept of a disparity space (x, y, d). The term disparity was first introduced in the human vision literature to describe the difference in location of corresponding features seen by the left and right eyes...
87 | A locally adaptive window for signal matching.
- Okutomi, Kanade
- 1992
Citation Context ...urations using a plane sweep algorithm [30, 113]. 3.2. Aggregation of cost Local and window-based methods aggregate the matching cost by summing or averaging over a support region in the DSI C(x, y, d). A support region can be either two-dimensional at a fixed disparity (favoring fronto-parallel surfaces), or three-dimensional in x-y-d space (supporting slanted surfaces). Two-dimensional evidence aggregation has been implemented using square windows or Gaussian convolution (traditional), multiple windows anchored at different points, i.e., shiftable windows [2, 18], windows with adaptive sizes [84, 60, 124, 61], and windows based on connected components of constant disparity [22]. Three-dimensional support functions that have been proposed include limited disparity difference [50], limited disparity gradient [88], and Prazdny’s coherence principle [89]. Aggregation with a fixed support region can be performed using 2D or 3D convolution, C(x, y, d) = w(x, y, d) ∗ C0(x, y, d), (2) or, in the case of rectangular windows, using efficient (moving average) box-filters. Shiftable windows can also be implemented efficiently using a separable sliding min-filter (Section 4.2). A different method of aggregation...
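A minimal sketch of Equation (2) with w chosen as a square box filter (fronto-parallel 2D aggregation over each fixed-disparity slice), followed by a winner-take-all pick. Function names and the replicate-padding at the borders are my assumptions, not the paper's implementation; a production version would use separable moving-average box filters instead of summing shifted copies.

```python
import numpy as np

def box_aggregate(C0, radius):
    """C = w * C0 with w a (2*radius+1)^2 box filter applied to each
    fixed-disparity slice of the DSI C0(y, x, d)."""
    H, W, D = C0.shape
    C = np.zeros_like(C0, dtype=float)
    for dy in range(-radius, radius + 1):        # written directly for clarity;
        for dx in range(-radius, radius + 1):    # separable in principle
            ys = np.clip(np.arange(H) + dy, 0, H - 1)
            xs = np.clip(np.arange(W) + dx, 0, W - 1)
            C += C0[np.ix_(ys, xs)]              # replicate-pad at the borders
    return C

def wta(C):
    """Winner-take-all: at each pixel, pick the disparity of minimal cost."""
    return np.argmin(C, axis=2)
```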
84 | Stochastic Stereo Matching over Scale.
- Barnard
- 1985
Citation Context ...ost that is based on a support region, e.g. normalized cross-correlation [51, 19] and the rank transform [129]. (This can also be viewed as a preprocessing step; see Section 3.1.) On the other hand, global algorithms make explicit smoothness assumptions and then solve an optimization problem. Such algorithms typically do not perform an aggregation step, but rather seek a disparity assignment (step 3) that minimizes a global cost function that combines data (step 1) and smoothness terms. The main distinction between these algorithms is the minimization procedure used, e.g., simulated annealing [75, 6], probabilistic (mean-field) diffusion [97], or graph cuts [23]. In between these two broad classes are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs inc... |
83 | A Bayesian treatment of the stereo correspondence problem using half-occluded regions
- Belhumeur, Mumford
Citation Context ... methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense (intensity-based) scanline optimization problem [10, 9, 46, 31, 18, 13]. These approaches work by computing the minimum-cost path through the matrix of all pairwise matching costs between two corresponding scanlines. Partial occlusion is handled explicitly by assigning a group of pixels in one image to a single pixel in the other image. Figure 2 shows one such example. Problems with dynamic programming stereo include the selection of the right cost for occluded pixels and the difficulty of enforcing inter-scanline consistency, although several methods propose ways of addressing the latter [83, 9, 31, 18, 13]. Another problem is that the dynamic programming approa... |
82 | Stereo reconstruction from multiperspective panoramas.
- Shum, Szeliski
- 1999
Citation Context ... can be easily achieved with a plane sweep algorithm [30, 113], which for every disparity d projects all images onto a common plane using 2 a perspective projection (homography). (Note that this is different from the meaning of plane sweep in computational geometry.) In general, we favor the more generalized interpretation of disparity, since it allows the adaptation of the search space to the geometry of the input cameras [113, 94]; we plan to use it in future extensions of this work to multiple images. (Note that plane sweeps can also be generalized to other sweep surfaces such as cylinders [106].) In this study, however, since all our images are taken on a linear path with the optical axis perpendicular to the camera displacement, the classical inverse-depth interpretation will suffice [85]. The (x, y) coordinates of the disparity space are taken to be coincident with the pixel coordinates of a reference image chosen from our input data set. The correspondence between a pixel (x, y) in reference image r and a pixel (x′, y′) in matching image m is then given by x′ = x + s d(x, y), y′ = y, (1) where s = ±1 is a sign chosen so that disparities are always positive. Note that since our im... |
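Under the correspondence of Equation (1), x′ = x + s·d(x, y), y′ = y, the initial cost volume over (x, y, d) can be populated as below. The absolute-difference cost and the sentinel value assigned to out-of-range matches are illustrative choices, not prescribed by the paper.

```python
import numpy as np

def initial_dsi(ref, mtch, dmax, s=1):
    """Initial DSI C0(y, x, d) of absolute differences between a reference
    image and a matching image, for disparities d = 0 .. dmax."""
    H, W = ref.shape
    C0 = np.full((H, W, dmax + 1), 255.0)  # sentinel where x' falls outside
    for d in range(dmax + 1):
        xp = np.arange(W) + s * d          # matching column x' for each x
        ok = (xp >= 0) & (xp < W)
        C0[:, ok, d] = np.abs(ref[:, ok].astype(float)
                              - mtch[:, xp[ok]].astype(float))
    return C0
```

With more than two images, per-image DSIs built this way can simply be summed, which is the multiple-baseline idea the surrounding excerpts mention.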
80 | Detection of binocular disparities.
- Prazdny
- 1985
Citation Context ...d disparity (favoring fronto-parallel surfaces), or three-dimensional in x-y-d space (supporting slanted surfaces). Two-dimensional evidence aggregation has been implemented using square windows or Gaussian convolution (traditional), multiple windows anchored at different points, i.e., shiftable windows [2, 18], windows with adaptive sizes [84, 60, 124, 61], and windows based on connected components of constant disparity [22]. Three-dimensional support functions that have been proposed include limited disparity difference [50], limited disparity gradient [88], and Prazdny’s coherence principle [89]. Aggregation with a fixed support region can be performed using 2D or 3D convolution, C(x, y, d) = w(x, y, d) ∗ C0(x, y, d), (2) or, in the case of rectangular windows, using efficient (moving average) box-filters. Shiftable windows can also be implemented efficiently using a separable sliding min-filter (Section 4.2). A different method of aggregation is iterative diffusion, i.e., an aggregation (or averaging) operation that is implemented by repeatedly adding to each pixel’s cost the weighted values of its neighboring pixels’ costs [114, 103, 97]. 3.3. Disparity computation and optimization...
73 | Optical flow estimation: Advances and comparisons.
- Otte, Nagel
- 1994
Citation Context ...s, besides being an obvious catch-all reference. Simply enumerating different approaches is unlikely to yield new insights. Clearly, a comparative evaluation is necessary to assess the performance of both established and new algorithms and to gauge the progress of the field. The publication of a similar study by Barron et al. [8] has had a dramatic effect on the development of optical flow algorithms. Not only is the performance of commonly used algorithms better understood by researchers, but novel publications have to improve in some way on the performance of previously published techniques [86]. A more recent study by Mitiche and Bouthemy [78] reviews a large number of methods for image flow computation and isolates central problems, but does not provide any experimental results. In stereo correspondence, two previous comparative papers have focused on the performance of sparse feature matchers [54, 19]. Two recent papers [111, 80] have developed new criteria for evaluating the performance of dense stereo matchers for image-based rendering and tele-presence applications. Our work is a continuation of the investigations begun by Szeliski and Zabih [116], which compared the performanc... |
72 | Approximate N-view stereo.
- Kutulakos
- 2000
Citation Context ...d and rectified. In computing the prediction error, we need to decide how to treat gaps. Currently, we ignore pixels flagged as gaps in computing the statistics and report the percentage of such missing pixels. We can also optionally compensate for small misregistrations [111]. To do this, we convert each pixel in the original and predicted image to an interval, by blending the pixel’s value with some fraction eval_partial_shuffle of its neighboring pixels’ min and max values. This idea is a generalization of the sampling-insensitive dissimilarity measure [12] and the shuffle transformation of [66]. The reported difference is then the (signed) distance between the two computed intervals. We plan to investigate these and other sampling-insensitive matching costs in the future [115]. 5.2. Test data To quantitatively evaluate our correspondence algorithms, we require data sets that either have a ground truth disparity map, or a set of additional views that can be used for the prediction error test (or preferably both). We have begun to collect such a database of images, building upon the methodology introduced in [116]. Each image sequence consists of 9 images, taken at regular intervals with ...
69 | Development of a video-rate stereo machine.
- Kanade
- 1994
Citation Context ...es are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and absolute intensity differences (AD) [58]. In the video processing community, these matching criteria are referred to as the mean-squared error (MSE) and mean absolute difference (MAD) measures; the term displaced frame difference is also often used [118]. More recently, robust measures, including truncated quadratics and contaminated Gaussians have been proposed [15, 16, 97]. These measures are useful because they limit the influence of mismatches during aggregation. Figure 1: Slices through a typical disparity space image (DSI): (a) original color image; (b) ground-truth disparities; (c–e) three (x, y) sli...
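The robust matching costs mentioned here can be sketched in one line: a squared difference whose influence is capped so that gross mismatches do not dominate the aggregated cost. The truncation threshold `t` is an illustrative tuning value.

```python
import numpy as np

def truncated_sd(i_ref, i_mtch, t=100.0):
    """Robust per-pixel cost: squared intensity difference clipped at t.
    Beyond the threshold, an outlier contributes no additional penalty."""
    return np.minimum((i_ref - i_mtch) ** 2, t)
```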
67 | A multiple baseline stereo, - Kanade - 1993 |
67 | Hierarchical warp stereo.
- Quam
- 1984
Citation Context ...tween these algorithms is the minimization procedure used, e.g., simulated annealing [75, 6], probabilistic (mean-field) diffusion [97], or graph cuts [23]. In between these two broad classes are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11]. 3.1. Matching cost computation The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and absolute intensity differences (AD) [58]. In the video processing community, these matching criteria are referred to as the mean-squared error (MSE) and mean absolute difference (MAD) measures; the term displaced frame difference is also often used [118]. More recently, robust measures, including truncated quadratics and contaminated Gaussians have been proposed [15, 16, 97]. These measures are useful because they limit the influence of mismatches during a... |
63 | 3-D surface description from binocular stereo.
- Cochran, Medioni
- 1992
Citation Context ...thly, and the regions over which these estimates are computed must be on the same (correct) surface. Recently, some questions have been raised about the advisability of fitting correlation curves to integer-sampled matching costs [105]. This situation may even be worse when sampling-insensitive dissimilarity measures are used [12]. We investigate this issue in Section 6.4 below. Besides sub-pixel computations, there are of course other ways of post-processing the computed disparities. Occluded areas can be detected using cross-checking (comparing left-to-right and right-to-left disparity maps) [29, 42]. A median filter can be applied to “clean up” spurious mismatches, and holes due to occlusion can be filled by surface fitting or by distributing neighboring disparity estimates [13, 96]. In our implementation we are not performing such clean-up steps since we want to measure the performance of the raw algorithm components. 3.5. Other methods Not all dense two-frame stereo correspondence algorithms can be described in terms of our basic taxonomy and representations. Here we briefly mention some additional algorithms and representations that are not covered by our framework. The algorithms des...
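The cross-checking step described above can be sketched as follows: a pixel passes if the disparity found from the matching image at its corresponding location agrees (within a tolerance) with its own. Names and the tolerance parameter are mine, not the paper's.

```python
import numpy as np

def cross_check(d_lr, d_rl, s=1, tol=0):
    """Left-right consistency check on two disparity maps.

    Pixel (y, x) passes if the disparity of the matching image at
    x' = x + s*d maps back to (approximately) the same disparity.
    Failures are typically occlusions or mismatches."""
    H, W = d_lr.shape
    ok = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            xp = x + s * int(d_lr[y, x])
            if 0 <= xp < W:
                ok[y, x] = abs(int(d_lr[y, x]) - int(d_rl[y, xp])) <= tol
    return ok
```

Pixels that fail the check are the ones the excerpt suggests filling by surface fitting or by distributing neighboring disparity estimates.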
58 | A variable window approach to early vision.
- Boykov, Veksler, et al.
- 1998
Citation Context ...d window-based methods aggregate the matching cost by summing or averaging over a support region in the DSI C(x, y, d). A support region can be either two-dimensional at a fixed disparity (favoring fronto-parallel surfaces), or three-dimensional in x-y-d space (supporting slanted surfaces). Two-dimensional evidence aggregation has been implemented using square windows or Gaussian convolution (traditional), multiple windows anchored at different points, i.e., shiftable windows [2, 18], windows with adaptive sizes [84, 60, 124, 61], and windows based on connected components of constant disparity [22]. Three-dimensional support functions that have been proposed include limited disparity difference [50], limited disparity gradient [88], and Prazdny’s coherence principle [89]. Aggregation with a fixed support region can be performed using 2D or 3D convolution, C(x, y, d) = w(x, y, d) ∗ C0(x, y, d), (2) or, in the case of rectangular windows, using efficient (moving average) box-filters. Shiftable windows can also be implemented efficiently using a separable sliding min-filter (Section 4.2). A different method of aggregation is iterative diffusion, i.e., an aggregation (or averaging) operation...
55 | The JISCT stereo evaluation.
- Bolles, Baker, et al.
- 1993
Citation Context ...by Barron et al. [8] has had a dramatic effect on the development of optical flow algorithms. Not only is the performance of commonly used algorithms better understood by researchers, but novel publications have to improve in some way on the performance of previously published techniques [86]. A more recent study by Mitiche and Bouthemy [78] reviews a large number of methods for image flow computation and isolates central problems, but does not provide any experimental results. In stereo correspondence, two previous comparative papers have focused on the performance of sparse feature matchers [54, 19]. Two recent papers [111, 80] have developed new criteria for evaluating the performance of dense stereo matchers for image-based rendering and tele-presence applications. Our work is a continuation of the investigations begun by Szeliski and Zabih [116], which compared the performance of several popular algorithms, but did not provide a detailed taxonomy or as complete a coverage of algorithms. A preliminary version of this paper appeared in the CVPR 2001 Workshop on Stereo and Multi-Baseline Vision [99]. An evaluation of competing algorithms has limited value if each method is treated as a “... |
53 | Shape reconstruction in projective grid space from large number of images.
- Saito, Kanade
- 1999
Citation Context ...hree-dimensional projective transformation (collineation or homography) of 3-D space (X, Y, Z). The enumeration of all possible matches in such a generalized disparity space can be easily achieved with a plane sweep algorithm [30, 113], which for every disparity d projects all images onto a common plane using 2 a perspective projection (homography). (Note that this is different from the meaning of plane sweep in computational geometry.) In general, we favor the more generalized interpretation of disparity, since it allows the adaptation of the search space to the geometry of the input cameras [113, 94]; we plan to use it in future extensions of this work to multiple images. (Note that plane sweeps can also be generalized to other sweep surfaces such as cylinders [106].) In this study, however, since all our images are taken on a linear path with the optical axis perpendicular to the camera displacement, the classical inverse-depth interpretation will suffice [85]. The (x, y) coordinates of the disparity space are taken to be coincident with the pixel coordinates of a reference image chosen from our input data set. The correspondence between a pixel (x, y) in reference image r and a pixel (x... |
50 | A multibaseline stereo system with active illumination and real-time image acquisition.
- Kang
- 1995
Citation Context ...unts (which may miss a valid match), they compare each pixel in the reference image against a linearly interpolated function of the other image. The matching cost values over all pixels and all disparities form the initial disparity space image C0(x, y, d). While our study is currently restricted to two-frame methods, the initial DSI can easily incorporate information from more than two images by simply summing up the cost values for each matching image m, since the DSI is associated with a fixed reference image r (Equation (1)). This is the idea behind multiple-baseline SSSD and SSAD methods [85, 62, 81]. As mentioned in Section 2.2, this idea can be generalized to arbitrary camera configurations using a plane sweep algorithm [30, 113]. 3.2. Aggregation of cost Local and window-based methods aggregate the matching cost by summing or averaging over a support region in the DSI C(x, y, d). A support region can be either two-dimensional at a fixed disparity (favoring fronto-parallel surfaces), or three-dimensional in x-y-d space (supporting slanted surfaces). Two-dimensional evidence aggregation has been implemented using square windows or Gaussian convolution (traditional), multiple windows ancho...
50 | Prediction error as a quality metric for motion and stereo.
- Szeliski
- 1999
Citation Context ... a dramatic effect on the development of optical flow algorithms. Not only is the performance of commonly used algorithms better understood by researchers, but novel publications have to improve in some way on the performance of previously published techniques [86]. A more recent study by Mitiche and Bouthemy [78] reviews a large number of methods for image flow computation and isolates central problems, but does not provide any experimental results. In stereo correspondence, two previous comparative papers have focused on the performance of sparse feature matchers [54, 19]. Two recent papers [111, 80] have developed new criteria for evaluating the performance of dense stereo matchers for image-based rendering and tele-presence applications. Our work is a continuation of the investigations begun by Szeliski and Zabih [116], which compared the performance of several popular algorithms, but did not provide a detailed taxonomy or as complete a coverage of algorithms. A preliminary version of this paper appeared in the CVPR 2001 Workshop on Stereo and Multi-Baseline Vision [99]. An evaluation of competing algorithms has limited value if each method is treated as a “black box” and only final res... |
48 |
Parallel and deterministic algorithms for MRFs: surface reconstruction
- Geiger, Girosi
- 1991
(Show Context)
Citation Context ...lly decreasing function of intensity differences that lowers smoothness costs at high intensity gradients. This idea [44, 42, 18, 23] encourages disparity discontinuities to coincide with intensity/color edges and appears to account for some of the good performance of global optimization approaches. Once the global energy has been defined, a variety of algorithms can be used to find a (local) minimum. Traditional approaches associated with regularization and Markov Random Fields include continuation [17], simulated annealing [47, 75, 6], highest confidence first [28], and mean-field annealing [45]. More recently, max-flow and graph-cut methods have been proposed to solve a special class of global optimization problems [92, 55, 23, 123, 65]. Such methods are more efficient than simulated annealing and have produced good results. Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D-optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for... |
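The scanline dynamic program mentioned at the end of this excerpt can be sketched as follows; the linear smoothness penalty λ·|d(x) − d(x−1)| and the toy cost matrix are illustrative assumptions, not the paper's formulation.

```python
# Sketch: exact minimization of sum_x C[x][d(x)] + lam*|d(x)-d(x-1)|
# along one scanline by dynamic programming. O(n * D^2) as written;
# a distance transform would reduce the inner minimization to O(D).

def dp_scanline(C, lam):
    n, D = len(C), len(C[0])
    E = [list(C[0])]          # E[x][d]: best energy of a path ending at (x, d)
    back = []                 # back-pointers for path recovery
    for x in range(1, n):
        prev = E[-1]
        row, bk = [], []
        for d in range(D):
            best = min(range(D), key=lambda p: prev[p] + lam * abs(d - p))
            row.append(C[x][d] + prev[best] + lam * abs(d - best))
            bk.append(best)
        E.append(row)
        back.append(bk)
    d = min(range(D), key=lambda dd: E[-1][dd])
    path = [d]
    for bk in reversed(back):
        d = bk[d]
        path.append(d)
    return path[::-1]
```

On the toy cost matrix [[0, 5], [5, 0], [5, 0]] with lam = 1, the minimizer is [0, 1, 1]: paying one smoothness unit for the jump beats every data-term alternative.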
46 | Dynamic histogram warping of image pairs for constant image brightness,”
- Cox, Roy, et al.
- 1995
(Show Context)
Citation Context ...imilar to sum-of-squared-differences (SSD), and binary matching costs (i.e., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching cost that is insensitive to image sampling [12]. Rather than just comparing pixel values shifted by integral amounts (which may miss a valid match), they compare each pixel in the reference image against a linearly interpolated function of the other image. The matching cost values over all pixels and all disparities form the initial disparity space image C0(x, y, d). While our study is currently restricted to two-frame methods, the initial DSI can easily i... |
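The Birchfield–Tomasi measure [12] described in this excerpt can be sketched in 1-D: each intensity is compared against the interval spanned by the half-sample linear interpolants of the other scanline, symmetrically in both directions. Border handling (clamping) and the toy data are assumptions.

```python
# Sketch of the sampling-insensitive dissimilarity of Birchfield and
# Tomasi [12] on 1-D scanlines. Clamped borders are an assumption.

def bt_half(p, other, x):
    """Distance from intensity p to the interval spanned by other[x]
    and its two half-sample linear interpolants."""
    lo = 0.5 * (other[x] + other[max(x - 1, 0)])
    hi = 0.5 * (other[x] + other[min(x + 1, len(other) - 1)])
    i_min = min(lo, hi, other[x])
    i_max = max(lo, hi, other[x])
    return max(0.0, p - i_max, i_min - p)

def bt_cost(left, right, xl, xr):
    """Symmetric cost: minimum of the two one-sided terms."""
    return min(bt_half(left[xl], right, xr),
               bt_half(right[xr], left, xl))

left  = [0, 10, 20]
right = [0, 12, 20]     # sub-pixel shifted version of left
cost = bt_cost(left, right, 1, 1)   # 0, although |10 - 12| = 2
```

A plain absolute difference would charge 2 here; the interval comparison charges nothing, which is exactly the sampling insensitivity the excerpt describes.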
39 | Occlusion detectable stereo-occlusion patterns in camera matrix - Nakamura, Matsuura, et al. - 1996 |
38 | Techniques for disparity measurement.
- Jenkin, Jepson, et al.
- 1991
(Show Context)
Citation Context ..., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching cost that is insensitive to image sampling [12]. Rather than just comparing pixel values shifted by integral amounts (which may miss a valid match), they compare each pixel in the reference image against a linearly interpolated function of the other image. The matching cost values over all pixels and all disparities form the initial disparity space image C0(x, y, d). While our study is currently restricted to two-frame methods, the initial DSI can easily incorporate information from more than two images by simply summing up the cost val... |
31 |
Computer Matching of Areas in Stereo Images.
- Hannah
- 1974
(Show Context)
Citation Context ...g support. Some of these algorithms can cleanly be broken down into steps 1, 2, 3. For example, the traditional sum-of-squared-differences (SSD) algorithm can be described as: 1. the matching cost is the squared difference of intensity values at a given disparity; 2. aggregation is done by summing matching cost over square windows with constant disparity; 3. disparities are computed by selecting the minimal (winning) aggregated value at each pixel. Some local algorithms, however, combine steps 1 and 2 and use a matching cost that is based on a support region, e.g. normalized cross-correlation [51, 19] and the rank transform [129]. (This can also be viewed as a preprocessing step; see Section 3.1.) On the other hand, global algorithms make explicit smoothness assumptions and then solve an optimization problem. Such algorithms typically do not perform an aggregation step, but rather seek a disparity assignment (step 3) that minimizes a global cost function that combines data (step 1) and smoothness terms. The main distinction between these algorithms is the minimization procedure used, e.g., simulated annealing [75, 6], probabilistic (mean-field) diffusion [97], or graph cuts [23]. In betwee... |
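The three-step decomposition just given for the traditional SSD algorithm fits in a few lines on 1-D scanlines; the window radius, the out-of-range penalty, and the data are illustrative assumptions.

```python
# Sketch of the classic local method: (1) squared-difference cost,
# (2) square-window aggregation at constant disparity, (3) winner-
# take-all selection. Toy 1-D data.

def ssd_wta(ref, match, max_d, w):
    n = len(ref)
    BIG = 10 ** 9
    # Step 1: per-pixel matching cost C0[x][d]
    C0 = [[(ref[x] - match[x - d]) ** 2 if x - d >= 0 else BIG
           for d in range(max_d + 1)] for x in range(n)]
    # Step 2: sum costs over a window of radius w at constant disparity
    def agg(x, d):
        return sum(C0[xx][d]
                   for xx in range(max(0, x - w), min(n, x + w + 1)))
    # Step 3: pick the minimal (winning) aggregated cost at each pixel
    return [min(range(max_d + 1), key=lambda d: agg(x, d))
            for x in range(n)]

ref   = [10, 20, 50, 20, 10]
match = [20, 50, 20, 10, 10]    # ref shifted by one pixel
disp = ssd_wta(ref, match, max_d=2, w=1)
```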
25 | Regularizing phase-based stereo. In:
- Frohlinghaus, Buhmann
- 1996
(Show Context)
Citation Context ...tly includes window-based algorithms, diffusion algo...

Method | Matching cost | Aggregation | Optimization
SSD (traditional) | squared difference | square window | WTA
Hannah [51] | cross-correlation | (square window) | WTA
Nishihara [82] | binarized filters | square window | WTA
Kass [63] | filter banks | -none- | WTA
Fleet et al. [40] | phase | -none- | phase-matching
Jones and Malik [57] | filter banks | -none- | WTA
Kanade [58] | absolute difference | square window | WTA
Scharstein [95] | gradient-based | Gaussian | WTA
Zabih and Woodfill [129] | rank transform | (square window) | WTA
Cox et al. [32] | histogram eq. | -none- | DP
Frohlinghaus and Buhmann [41] | wavelet phase | -none- | phase-matching
Birchfield and Tomasi [12] | shifted abs. diff | -none- | DP
Marr and Poggio [73] | binary images | iterative aggregation | WTA
Prazdny [89] | binary images | 3D aggregation | WTA
Szeliski and Hinton [114] | binary images | iterative 3D aggregation | WTA
Okutomi and Kanade [84] | squared difference | adaptive window | WTA
Yang et al. [127] | cross-correlation | non-linear filtering | hier. WTA
Shah [103] | squared difference | non-linear diffusion | regularization
Boykov et al. [22] | thresh. abs. diff. | connected-component | WTA
Scharstein and Szeliski [97] | robust sq. diff. | iterative 3D aggregation | mea... |
25 |
Visual integration and detection of discontinuities: the key role of intensity edges.
- Gamble, Poggio
- 1987
(Show Context)
Citation Context ... [47] gave a Bayesian interpretation of these kinds of energy functions [110] and proposed a discontinuity-preserving energy function based on Markov Random Fields (MRFs) and additional line processes. Black and Rangarajan [16] show how line processes can often be subsumed by a robust regularization framework. The terms in Esmooth can also be made to depend on the intensity differences, e.g., ρd(d(x, y) − d(x+1, y)) · ρI(‖I(x, y) − I(x+1, y)‖), (6) where ρI is some monotonically decreasing function of intensity differences that lowers smoothness costs at high intensity gradients. This idea [44, 42, 18, 23] encourages disparity discontinuities to coincide with intensity/color edges and appears to account for some of the good performance of global optimization approaches. Once the global energy has been defined, a variety of algorithms can be used to find a (local) minimum. Traditional approaches associated with regularization and Markov Random Fields include continuation [17], simulated annealing [47, 75, 6], highest confidence first [28], and mean-field annealing [45]. More recently, max-flow and graph-cut methods have been proposed to solve a special class of global optimization problems [92, ... |
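The intensity-gated smoothness term of Equation (6) can be sketched as a 1-D energy. The particular robust ρd (truncated linear), the two-level ρI, and all constants below are hypothetical choices for illustration, not the paper's.

```python
# Sketch of a 1-D global energy with an intensity-gated smoothness
# term in the spirit of Equation (6). All constants are hypothetical.

def energy(d, ref, match, lam, sigma_i):
    data = sum(min((ref[x] - match[x - d[x]]) ** 2, 1000)   # truncated
               for x in range(len(d)) if 0 <= x - d[x] < len(match))

    def rho_d(delta):           # robust (truncated linear) penalty
        return min(abs(delta), 2)

    def rho_i(grad):            # cheaper to break smoothness at an edge
        return 1.0 if abs(grad) < sigma_i else 0.25

    smooth = sum(rho_d(d[x] - d[x + 1]) * rho_i(ref[x] - ref[x + 1])
                 for x in range(len(d) - 1))
    return data + lam * smooth

ref = [10, 10, 10, 100, 100, 100]
d_edge = [0, 0, 0, 1, 1, 1]     # disparity jump at the intensity edge
d_flat = [0, 1, 1, 1, 1, 1]     # disparity jump inside a flat region
```

With match = ref, lam = 4 and sigma_i = 20, the edge-aligned labeling is cheaper than the one that jumps in a flat region, which is exactly the bias this term is meant to create.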
25 |
Performance evaluation of scene registration and stereo matching for cartographic feature extraction,
- Hsieh, McKeown, et al.
- 1992
(Show Context)
Citation Context ...by Barron et al. [8] has had a dramatic effect on the development of optical flow algorithms. Not only is the performance of commonly used algorithms better understood by researchers, but novel publications have to improve in some way on the performance of previously published techniques [86]. A more recent study by Mitiche and Bouthemy [78] reviews a large number of methods for image flow computation and isolates central problems, but does not provide any experimental results. In stereo correspondence, two previous comparative papers have focused on the performance of sparse feature matchers [54, 19]. Two recent papers [111, 80] have developed new criteria for evaluating the performance of dense stereo matchers for image-based rendering and tele-presence applications. Our work is a continuation of the investigations begun by Szeliski and Zabih [116], which compared the performance of several popular algorithms, but did not provide a detailed taxonomy or as complete a coverage of algorithms. A preliminary version of this paper appeared in the CVPR 2001 Workshop on Stereo and Multi-Baseline Vision [99]. An evaluation of competing algorithms has limited value if each method is treated as a “... |
23 |
Brightness-based stereo matching’,
- Gennert
- 1988
(Show Context)
Citation Context ...imilar to sum-of-squared-differences (SSD), and binary matching costs (i.e., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching cost that is insensitive to image sampling [12]. Rather than just comparing pixel values shifted by integral amounts (which may miss a valid match), they compare each pixel in the reference image against a linearly interpolated function of the other image. The matching cost values over all pixels and all disparities form the initial disparity space image C0(x, y, d). While our study is currently restricted to two-frame methods, the initial DSI can easily i... |
22 | Matching Images by Comparing their Gradient Fields.
- Scharstein
- 1994
(Show Context)
Citation Context ...various DSIs, which indicate regions that match at this disparity. (Smaller dark regions are often the result of textureless regions.) Other traditional matching costs include normalized cross-correlation [51, 93, 19], which behaves similarly to sum-of-squared-differences (SSD), and binary matching costs (i.e., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching cost that is insensitive to image sampling [12]. Rather than just comparing pixel values shifted by integral amounts (which may miss a valid match), they compare each pixel in the reference image against a linearly interpolated... |
21 |
Precise sub-pixel estimation on area-based matching.
- Shimizu, Okutomi
- 2001
(Show Context)
Citation Context ...el disparity estimates can be computed in a variety of ways, including iterative gradient descent and fitting a curve to the matching costs at discrete disparity levels [93, 71, 122, 77, 60]. This provides an easy way to increase the resolution of a stereo algorithm with little additional computation. However, to work well, the intensities being matched must vary smoothly, and the regions over which these estimates are computed must be on the same (correct) surface. Recently, some questions have been raised about the advisability of fitting correlation curves to integer-sampled matching costs [105]. This situation may even be worse when sampling-insensitive dissimilarity measures are used [12]. We investigate this issue in Section 6.4 below. Besides sub-pixel computations, there are of course other ways of post-processing the computed disparities. Occluded areas can be detected using cross-checking (comparing left-to-right and right-to-left disparity maps) [29, 42]. A median filter can be applied to “clean up” spurious mismatches, and holes due to occlusion can be filled by surface fitting or by distributing neighboring disparity estimates [13, 96]. In our implementation we are not perfo... |
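Two of the post-processing steps discussed in this excerpt can be sketched directly: fitting a parabola to the winning cost and its two neighbors for sub-pixel disparity, and left/right cross-checking for occlusion detection. The disparity sign convention (x_right = x_left − d) is an assumption.

```python
# Sketches of two post-processing steps: parabola-fit sub-pixel
# refinement and left/right cross-checking. Conventions assumed.

def subpixel_offset(c_prev, c_min, c_next):
    """Vertex of the parabola through the costs at d-1, d, d+1;
    the result lies in (-0.5, 0.5) when c_min is the true minimum."""
    denom = c_prev - 2.0 * c_min + c_next
    if denom <= 0:              # flat or degenerate curve: keep integer d
        return 0.0
    return 0.5 * (c_prev - c_next) / denom

def cross_check(d_left, d_right):
    """Invalidate (None) pixels whose left->right and right->left
    disparities disagree; assumes x_right = x_left - d."""
    out = []
    for x, d in enumerate(d_left):
        xr = x - d
        ok = 0 <= xr < len(d_right) and d_right[xr] == d
        out.append(d if ok else None)
    return out
```

For sampled costs (1.5625, 0.0625, 0.5625), i.e., the quadratic (x − 0.25)² at x = −1, 0, 1, the fit recovers the offset 0.25 exactly, as it must for any quadratic cost curve.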
19 |
Performance evaluation of stereo for tele-presence.”
- Mulligan, Isler, et al.
- 2001
(Show Context)
Citation Context ... a dramatic effect on the development of optical flow algorithms. Not only is the performance of commonly used algorithms better understood by researchers, but novel publications have to improve in some way on the performance of previously published techniques [86]. A more recent study by Mitiche and Bouthemy [78] reviews a large number of methods for image flow computation and isolates central problems, but does not provide any experimental results. In stereo correspondence, two previous comparative papers have focused on the performance of sparse feature matchers [54, 19]. Two recent papers [111, 80] have developed new criteria for evaluating the performance of dense stereo matchers for image-based rendering and tele-presence applications. Our work is a continuation of the investigations begun by Szeliski and Zabih [116], which compared the performance of several popular algorithms, but did not provide a detailed taxonomy or as complete a coverage of algorithms. A preliminary version of this paper appeared in the CVPR 2001 Workshop on Stereo and Multi-Baseline Vision [99]. An evaluation of competing algorithms has limited value if each method is treated as a “black box” and only final res... |
16 |
Automated stereo perception.
- Arnold
- 1983
(Show Context)
Citation Context ...eneralized to arbitrary camera configurations using a plane sweep algorithm [30, 113]. 3.2. Aggregation of cost Local and window-based methods aggregate the matching cost by summing or averaging over a support region in the DSI C(x, y, d). A support region can be either two-dimensional at a fixed disparity (favoring fronto-parallel surfaces), or three-dimensional in x-y-d space (supporting slanted surfaces). Two-dimensional evidence aggregation has been implemented using square windows or Gaussian convolution (traditional), multiple windows anchored at different points, i.e., shiftable windows [2, 18], windows with adaptive sizes [84, 60, 124, 61], and windows based on connected components of constant disparity [22]. Three-dimensional support functions that have been proposed include limited disparity difference [50], limited disparity gradient [88], and Prazdny’s coherence principle [89]. Aggregation with a fixed support region can be performed using 2D or 3D convolution, C(x, y, d) = w(x, y, d) ∗ C0(x, y, d), (2) or, in the case of rectangular windows, using efficient (moving average) box-filters. Shiftable windows can also be implemented efficiently using a separable sliding min-filter (... |
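The shiftable-window idea mentioned at the end of this excerpt has a compact 1-D reading: the best cost over all windows of radius w that contain a pixel equals a min-filter of radius w applied to the box-filtered costs. A toy sketch, with made-up cost values:

```python
# Sketch: shiftable windows as a box filter followed by a min-filter.
# 1-D toy costs; radius-w windows, truncated at the borders.

def box(c, w):
    """Windowed cost sums (borders truncated)."""
    n = len(c)
    return [sum(c[max(0, x - w):min(n, x + w + 1)]) for x in range(n)]

def shiftable(c, w):
    """Best aggregated cost over all windows containing x:
    a min-filter applied to the box-filter output."""
    b = box(c, w)
    n = len(c)
    return [min(b[max(0, x - w):min(n, x + w + 1)]) for x in range(n)]

c = [9, 0, 0, 0, 0]         # a single outlier cost at x = 0
centered = box(c, 1)        # the centered window at x = 1 includes it
shifted = shiftable(c, 1)   # a shifted window avoids it
```

This is why shiftable windows help near discontinuities: a pixel next to an outlier can be scored by a window that still contains it but excludes the outlier.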
15 | A nonlinear diffusion model for discontinuous disparity and half-occlusion in stereo.
- Shah
- 1993
(Show Context)
Citation Context ...arity gradient [88], and Prazdny’s coherence principle [89]. Aggregation with a fixed support region can be performed using 2D or 3D convolution, C(x, y, d) = w(x, y, d) ∗ C0(x, y, d), (2) or, in the case of rectangular windows, using efficient (moving average) box-filters. Shiftable windows can also be implemented efficiently using a separable sliding min-filter (Section 4.2). A different method of aggregation is iterative diffusion, i.e., an aggregation (or averaging) operation that is implemented by repeatedly adding to each pixel’s cost the weighted values of its neighboring pixels’ costs [114, 103, 97]. 3.3. Disparity computation and optimization Local methods. In local methods, the emphasis is on the matching cost computation and on the cost aggregation steps. Computing the final disparities is trivial: simply choose at each pixel the disparity associated with the minimum cost value. Thus, these methods perform a local “winner-take-all” (WTA) optimization at each pixel. A limitation of this approach (and many other correspondence algorithms) is that uniqueness of matches is only enforced for one image (the reference image), while points in the other image might get matched to multiple poi... |
14 | Poxels: Probabilistic voxelized volume reconstruction - De Bonet, Viola |
14 |
Solving random-dot stereograms using the heat equation.
- Szeliski, Hinton
- 1985
(Show Context)
Citation Context ...arity gradient [88], and Prazdny’s coherence principle [89]. Aggregation with a fixed support region can be performed using 2D or 3D convolution, C(x, y, d) = w(x, y, d) ∗ C0(x, y, d), (2) or, in the case of rectangular windows, using efficient (moving average) box-filters. Shiftable windows can also be implemented efficiently using a separable sliding min-filter (Section 4.2). A different method of aggregation is iterative diffusion, i.e., an aggregation (or averaging) operation that is implemented by repeatedly adding to each pixel’s cost the weighted values of its neighboring pixels’ costs [114, 103, 97]. 3.3. Disparity computation and optimization Local methods. In local methods, the emphasis is on the matching cost computation and on the cost aggregation steps. Computing the final disparities is trivial: simply choose at each pixel the disparity associated with the minimum cost value. Thus, these methods perform a local “winner-take-all” (WTA) optimization at each pixel. A limitation of this approach (and many other correspondence algorithms) is that uniqueness of matches is only enforced for one image (the reference image), while points in the other image might get matched to multiple poi... |
13 |
Using local orientation information as image primitive for robust object recognition.
- Seitz
- 1989
(Show Context)
Citation Context ...various DSIs, which indicate regions that match at this disparity. (Smaller dark regions are often the result of textureless regions.) Other traditional matching costs include normalized cross-correlation [51, 93, 19], which behaves similarly to sum-of-squared-differences (SSD), and binary matching costs (i.e., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching cost that is insensitive to image sampling [12]. Rather than just comparing pixel values shifted by integral amounts (which may miss a valid match), they compare each pixel in the reference image against a linearly interpolated... |
12 |
Linear image features in stereopsis.
- Kass
- 1988
(Show Context)
Citation Context ..., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching cost that is insensitive to image sampling [12]. Rather than just comparing pixel values shifted by integral amounts (which may miss a valid match), they compare each pixel in the reference image against a linearly interpolated function of the other image. The matching cost values over all pixels and all disparities form the initial disparity space image C0(x, y, d). While our study is currently restricted to two-frame methods, the initial DSI can easily incorporate information from more than two images by simply summing up the cost val... |
10 |
Edge based stereo correlation.
- Baker
- 1980
(Show Context)
Citation Context ... regions are visible in Figures (c–e), e.g., the bookshelves, table and cans, and head statue, while three different disparity levels can be seen as horizontal lines in the (x, d) slice (Figure (f)). Note the dark bands in the various DSIs, which indicate regions that match at this disparity. (Smaller dark regions are often the result of textureless regions.) Other traditional matching costs include normalized cross-correlation [51, 93, 19], which behaves similarly to sum-of-squared-differences (SSD), and binary matching costs (i.e., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching co... |
10 |
Prediction of correlation errors in stereo-pair images.
- Ryan, Gray, et al.
- 1980
(Show Context)
Citation Context ...; (b) ground-truth disparities; (c–e) three (x, y) slices for d = 10, 16, 21; (e) an (x, d) slice for y = 151 (the dashed line in Figure (b)). Different dark (matching) regions are visible in Figures (c–e), e.g., the bookshelves, table and cans, and head statue, while three different disparity levels can be seen as horizontal lines in the (x, d) slice (Figure (f)). Note the dark bands in the various DSIs, which indicate regions that match at this disparity. (Smaller dark regions are often the result of textureless regions.) Other traditional matching costs include normalized cross-correlation [51, 93, 19], which behaves similarly to sum-of-squared-differences (SSD), and binary matching costs (i.e., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however. Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histog... |
7 | View Synthesis Using Stereo Vision. Volume 1583 - Scharstein - 1999 |
6 |
Segmentation processes in visual perception: A cooperative neural model.
- Dev
- 1974
(Show Context)
Citation Context ...-scanline consistency, although several methods propose ways of addressing the latter [83, 9, 31, 18, 13]. Another problem is that the dynamic programming approach requires enforcing the monotonicity or ordering constraint [128]. This constraint requires that the relative ordering of pixels on a scanline remain the same between the two views, which may not be the case in scenes containing narrow foreground objects. Cooperative algorithms. Finally, cooperative algorithms, inspired by computational models of human stereo vision, were among the earliest methods proposed for disparity computation [36, 73, 76, 114]. Such algorithms iteratively perform local computations, but use nonlinear operations that result in an overall behavior similar to global optimization algorithms. In fact, for some of these algorithms, it is possible to explicitly state a global function that is being minimized [97]. Recently, a promising variant of Marr and Poggio’s original cooperative algorithm has been developed [132]. 3.4. Refinement of disparities Most stereo correspondence algorithms compute a set of disparity estimates in some discretized space, e.g., for inte... [figure: dynamic-programming grid; axes: Left scanline, Right scanline] ... |
6 |
A convolver-based real-time stereo machine (SAZAN).
- Kimura
- 1999
(Show Context)
Citation Context ...either interval. We apply this criterion separately to each color channel, which is not physically plausible (the sub-pixel shift must be consistent across channels), but is easier to implement. 4.2. Aggregation The aggregation section of our test bed implements some commonly used aggregation methods (aggr fn):

• Box filter: use a separable moving average filter (add one right/bottom value, subtract one left/top). This implementation trick makes such window-based aggregation insensitive to window size in terms of computation time and accounts for the fast performance seen in real-time matchers [59, 64].

Figure 3: Shiftable window. The effect of trying all 3 × 3 shifted windows around the black pixel is the same as taking the minimum matching score across all centered (non-shifted) windows in the same neighborhood. (Only 3 of the neighboring shifted windows are shown here for clarity.)

• Binomial filter: use a separable FIR (finite impulse response) filter. We use the coefficients 1/16{1, 4, 6, 4, 1}, the same ones used in Burt and Adelson’s [26] Laplacian pyramid. Other convolution kernels could also be added later, as could recursive (bi-directional) IIR filtering, which is a very efficien... |
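The moving-average trick described for the box filter in this excerpt (add the value entering the window, subtract the one leaving, so the cost per pixel is independent of window size) can be sketched in 1-D; the radius-w, truncated-border convention is an assumption.

```python
# Sketch of the O(1)-per-pixel moving-average (box) sum, plus the
# binomial kernel mentioned for the second filter. Borders truncated.

def box_filter_1d(c, w):
    n = len(c)
    out = [0] * n
    s = sum(c[0:min(n, w + 1)])     # window around x = 0
    out[0] = s
    for x in range(1, n):
        if x + w < n:
            s += c[x + w]           # value entering on the right
        if x - w - 1 >= 0:
            s -= c[x - w - 1]       # value leaving on the left
        out[x] = s
    return out

BINOMIAL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # Burt-Adelson kernel
```

For a 2-D DSI slice, the same pass is run along rows and then columns (the filter is separable), which is what makes window size essentially free in the real-time matchers cited.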
2 | Design of Cooperative Networks. Working Paper 253 - Marroquin - 1983 |
1 | Visual Reconstruction - Blake, Zisserman - 1987 |