Results 1 - 10 of 33
Fields of experts: A framework for learning image priors
- In CVPR, 2005
Cited by 153 (3 self)
We develop a framework for learning generic, expressive image priors that capture the statistics of natural scenes and can be used for a variety of machine vision tasks. The approach extends traditional Markov Random Field (MRF) models by learning potential functions over extended pixel neighborhoods. Field potentials are modeled using a Products-of-Experts framework that exploits nonlinear functions of many linear filter responses. In contrast to previous MRF approaches, all parameters, including the linear filters themselves, are learned from training data. We demonstrate the capabilities of this Field of Experts model with two example applications, image denoising and image inpainting, which are implemented using a simple, approximate inference scheme. While the model is trained on a generic image database and is not tuned toward a specific application, we obtain results that compete with and even outperform specialized techniques.
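As a rough illustration of the idea of potentials built from nonlinear functions of linear filter responses, the sketch below evaluates the energy (negative log of an unnormalized prior) of a small image. The Student-t expert form, the function name `foe_energy`, and the hand-picked derivative filters are assumptions made for this example; the abstract does not specify them, and the paper learns its filters and weights from training data.

```python
import math

def foe_energy(image, filters, alphas):
    """Energy of a Field-of-Experts-style prior (hypothetical sketch).

    image   : list of rows of floats
    filters : list of square filters (lists of rows), all the same size
    alphas  : one positive expert weight per filter
    """
    size = len(filters[0])
    rows, cols = len(image), len(image[0])
    energy = 0.0
    # Visit every fully contained size x size neighborhood (clique).
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            for filt, alpha in zip(filters, alphas):
                response = sum(
                    filt[i][j] * image[r + i][c + j]
                    for i in range(size)
                    for j in range(size)
                )
                # Assumed Student-t expert: phi(y) = (1 + y^2 / 2) ** (-alpha),
                # so its negative log is alpha * log(1 + y^2 / 2).
                energy += alpha * math.log(1.0 + 0.5 * response * response)
    return energy

# Hand-picked horizontal and vertical difference filters (illustration only).
filters = [[[1.0, -1.0], [0.0, 0.0]], [[1.0, 0.0], [-1.0, 0.0]]]
alphas = [1.0, 1.0]

flat = [[0.5] * 4 for _ in range(4)]
noisy = [
    [0.5, 0.9, 0.1, 0.5],
    [0.2, 0.5, 0.8, 0.4],
    [0.5, 0.1, 0.6, 0.9],
    [0.7, 0.5, 0.3, 0.5],
]
flat_energy = foe_energy(flat, filters, alphas)
noisy_energy = foe_energy(noisy, filters, alphas)
```

With these filters a constant image has zero energy and a noisy one has positive energy, which is the behavior a denoising prior needs.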
Optimizing binary MRFs via extended roof duality
- In Proc. CVPR, 2007
Cited by 47 (8 self)
Many computer vision applications rely on the efficient optimization of challenging, so-called non-submodular, binary pairwise MRFs. A promising graph cut based approach for optimizing such MRFs known as “roof duality” was recently introduced into computer vision. We study two methods which extend this approach. First, we discuss an efficient implementation of the “probing” technique introduced recently by Boros et al. [5]. It simplifies the MRF while preserving the global optimum. Our code is 400-700 times faster on some graphs than the implementation of [5]. Second, we present a new technique which takes an arbitrary input labeling and tries to improve its energy. We give theoretical characterizations of local minima of this procedure. We applied both techniques to many applications, including image segmentation, new view synthesis, superresolution, diagram recognition, parameter learning, texture restoration, and image deconvolution. For several applications we see that we are able to find the global minimum very efficiently, and considerably outperform the original roof duality approach. In comparison to existing techniques, such as graph cut, TRW, BP, ICM, and simulated annealing, we nearly always find a lower energy.
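To make the terms concrete, here is a minimal sketch of a binary pairwise MRF energy, the standard submodularity test for a pairwise term, and a greedy single-flip improvement loop. The flip loop is an ICM-style stand-in chosen for brevity; it is not roof duality, probing, or the improvement technique of the paper, and all names and numbers are hypothetical.

```python
def energy(labels, unary, pairwise):
    """E(x) = sum_p unary[p][x_p] + sum_(p,q) pairwise[(p, q)][x_p][x_q]."""
    total = sum(unary[p][labels[p]] for p in range(len(labels)))
    for (p, q), table in pairwise.items():
        total += table[labels[p]][labels[q]]
    return total

def is_submodular(table):
    """A binary pairwise term is submodular iff t00 + t11 <= t01 + t10."""
    return table[0][0] + table[1][1] <= table[0][1] + table[1][0]

def improve_by_flips(labels, unary, pairwise):
    """Greedy single-variable flips (ICM-style); NOT the paper's method."""
    labels = list(labels)
    best = energy(labels, unary, pairwise)
    changed = True
    while changed:
        changed = False
        for p in range(len(labels)):
            labels[p] ^= 1  # tentatively flip variable p
            candidate = energy(labels, unary, pairwise)
            if candidate < best:
                best, changed = candidate, True
            else:
                labels[p] ^= 1  # undo the flip
    return labels, best

# Toy three-variable chain; the second edge rewards disagreement,
# which makes it non-submodular.
unary = [[0, 2], [1, 0], [2, 0]]
pairwise = {
    (0, 1): [[0, 1], [1, 0]],
    (1, 2): [[1, 0], [0, 1]],
}
start = [0, 0, 0]
result = improve_by_flips(start, unary, pairwise)
```

On this toy chain the flip loop happens to reach the global minimum; in general such local moves only guarantee a local minimum.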
Space-Time Completion of Video
- 2007
Cited by 25 (1 self)
This paper presents a new framework for the completion of missing information based on local structures. It poses the task of completion as a global optimization problem with a well-defined objective function and derives a new algorithm to optimize it. Missing values are constrained to form coherent structures with respect to reference examples. We apply this method to space-time completion of large space-time “holes” in video sequences of complex dynamic scenes. The missing portions are filled in by sampling spatio-temporal patches from the available parts of the video, while enforcing global spatio-temporal consistency between all patches in and around the hole. The consistent completion of static scene parts simultaneously with dynamic behaviors leads to realistic looking video sequences and images. Space-time video completion is useful for a variety of tasks, including, but not limited to: 1) Sophisticated video removal (of undesired static or dynamic objects) by completing the appropriate static or dynamic background information. 2) Correction of missing/corrupted video frames in old movies. 3) Modifying a visual story by replacing unwanted elements. 4) Creation of video textures by extending smaller ones. 5) Creation of complete field-of-view stabilized video. 6) As images are one-frame videos, we apply the method to this special case as well.
Confocal Stereo
- 2009
Cited by 20 (3 self)
We present confocal stereo, a new method for computing 3D shape by controlling the focus and aperture of a lens. The method is specifically designed for reconstructing scenes with high geometric complexity or fine-scale texture. To achieve this, we introduce the confocal constancy property, which states that as the lens aperture varies, the pixel intensity of a visible in-focus scene point will vary in a scene-independent way that can be predicted by prior radiometric lens calibration. The only requirement is that incoming radiance within the cone subtended by the largest aperture is nearly constant. First, we develop a detailed lens model that factors out the distortions in high resolution SLR cameras (12MP or more) with large-aperture lenses (e.g., f/1.2). This allows us to assemble an A × F aperture-focus image (AFI) for each pixel, that collects the undistorted measurements over all A apertures and F focus settings. In the AFI representation, confocal constancy reduces to color comparisons within regions of the AFI, and leads to focus metrics that can be evaluated separately for each pixel. We propose two such metrics and present initial reconstruction results for complex scenes, as well as for a scene with known ground-truth shape.
Probabilistic fusion of stereo with color and contrast for bilayer segmentation
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 17 (1 self)
This paper describes models and algorithms for the real-time segmentation of foreground from background layers in stereo video sequences. Automatic separation of layers from color/contrast or from stereo alone is known to be error-prone. Here, color, contrast and stereo matching information are fused to infer layers accurately and efficiently. The first algorithm, Layered Dynamic Programming (LDP), solves stereo in an extended 6-state space that represents both foreground/background layers and occluded regions. The stereo-match likelihood is then fused with a contrast-sensitive color model that is learned on the fly, and stereo disparities are obtained by dynamic programming. The second algorithm, Layered Graph Cut (LGC), does not directly solve stereo. Instead the stereo-match likelihood is marginalized over disparities to evaluate foreground and background hypotheses, and then fused with a contrast-sensitive color model like the one used in LDP. Segmentation is solved efficiently by ternary graph cut. Both algorithms are evaluated with respect to ground truth data and found to have similar performance, substantially better than either stereo or color/contrast alone. However, their characteristics with respect to computational efficiency are rather different. The algorithms are demonstrated in the application of background substitution and shown to give good quality composite video output.
A sampled texture prior for image superresolution
- In NIPS 16, 2003
Cited by 16 (3 self)
Super-resolution aims to produce a high-resolution image from a set of one or more low-resolution images by recovering or inventing plausible high-frequency image content. Typical approaches try to reconstruct a high-resolution image using the sub-pixel displacements of several low-resolution images, usually regularized by a generic smoothness prior over the high-resolution image space. Other methods use training data to learn low-to-high-resolution matches, and have been highly successful even in the single-input-image case. Here we present a domain-specific image prior in the form of a p.d.f. based upon sampled images, and show that for certain types of super-resolution problems, this sample-based prior gives a significant improvement over other common multiple-image super-resolution techniques.
Real-time pattern matching using projection kernels
- IEEE Trans. Pattern Anal. Mach. Intell., 2005
Cited by 11 (2 self)
A novel approach to pattern matching is presented in which time complexity is reduced by two orders of magnitude compared to traditional approaches. The suggested approach uses an efficient projection scheme which bounds the distance between a pattern and an image window using very few operations on average. The projection framework is combined with a rejection scheme which allows rapid rejection of image windows that are distant from the pattern. Experiments show that the approach is effective even under very noisy conditions. The approach described here can also be used in classification schemes where the projection values serve as input features that are informative and fast to extract. Index Terms: Pattern matching, template matching, pattern detection, feature extraction, Walsh-Hadamard.
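The projection-and-rejection idea can be sketched on 1D signals as follows. Projecting the difference between a window and the pattern onto orthogonal Walsh-Hadamard kernels gives a running lower bound on their squared Euclidean distance, so a window can be discarded as soon as the bound exceeds a threshold. This is a simplified sketch: it computes each projection directly and uses the Sylvester kernel order, whereas the paper relies on an efficient projection scheme that is not reproduced here. The function names and the test signal are hypothetical.

```python
def walsh_hadamard_rows(n):
    """Rows of the n x n Hadamard matrix (n a power of two), entries +1/-1."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows

def find_matches(signal, pattern, threshold):
    """Return window starts whose squared distance to pattern is <= threshold,
    plus the number of projections that were actually computed."""
    n = len(pattern)
    kernels = walsh_hadamard_rows(n)
    pattern_proj = [sum(k * p for k, p in zip(kern, pattern)) for kern in kernels]
    matches, projections_used = [], 0
    for start in range(len(signal) - n + 1):
        window = signal[start:start + n]
        lower_bound = 0.0
        rejected = False
        for kern, pp in zip(kernels, pattern_proj):
            wp = sum(k * w for k, w in zip(kern, window))
            projections_used += 1
            # Kernels are orthogonal with squared norm n, so each term adds
            # to a lower bound on the true squared distance.
            lower_bound += (wp - pp) ** 2 / n
            if lower_bound > threshold:
                rejected = True
                break
        if not rejected:
            # All n projections were used, so the bound equals the distance.
            matches.append(start)
    return matches, projections_used

pattern = [1, 5, 2, 6]
signal = [3, 3, 3, 3, 1, 5, 2, 6, 3, 3, 3, 3]
matches, projections_used = find_matches(signal, pattern, threshold=0.5)
```

In this example most windows are rejected after the first (all-ones) kernel, so far fewer projections are computed than the 36 an exhaustive comparison would need.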
Space-time scene manifolds
- In International Conference on Computer Vision (ICCV’05), 2005
Cited by 9 (0 self)
The space of images is known to be a non-linear subspace that is difficult to model. This paper derives an algorithm that walks within this space. We seek a manifold through the video volume that is constrained to lie locally in this space. Every local neighborhood within the manifold resembles some image patch. We call this the Scene Manifold because the solution traces the scene outline. For a broad class of inputs the problem can be posed as finding the shortest path in a graph and can thus be solved efficiently to produce the globally optimal solution. Constraining appearance rather than geometry gives rise to numerous new capabilities. Here we demonstrate the usefulness of this approach by posing the well-studied problem of mosaicing in a new way. Instead of treating it as geometrical alignment, we pose it as an appearance optimization. Since the manifold is constrained to lie in the space of valid image patches, the resulting mosaic is guaranteed to have the least distortions possible. Any small part of it can be seen in some image even though the manifold spans the whole video. Thus it can deal seamlessly with both static and dynamic scenes, with or without 3D parallax. Essentially, the method simultaneously solves two problems that have been solved only separately until now: alignment and mosaicing.
On New View Synthesis Using Multiview Stereo
Cited by 8 (0 self)
We show that application of modern multiview stereo techniques to the new-view synthesis (NVS) problem introduces a number of non-trivial complexities. By simultaneously solving for the colour and depth of the new-view pixels we can eliminate the visual artefacts that conventional NVS-via-stereo suffers. The global occlusion reasoning which has led to considerable improvements in recent stereo algorithms can easily be included in the new algorithm, using a recently improved graph-cut-based optimizer for general multi-label conditional random fields (CRFs). However, the CRF priors that are important to success in stereo cannot be easily applied if the reconstruction is to be computed in the reference frame of the novel view. We address this problem by extending recent work on the fast optimization of texture priors in NVS to model the image edge structure, producing a synthesis of the two approaches that gives good results on difficult image sequences.
Efficient new view synthesis using pairwise dictionary priors
- In Proc. CVPR, 2007
Cited by 7 (2 self)
New-view synthesis (NVS) using texture priors (as opposed to surface-smoothness priors) can yield high quality results, but the standard formulation is in terms of large-clique Markov Random Fields (MRFs). Only local optimization methods such as iterated conditional modes, which are prone to fall into local minima close to the initial estimate, are practical for solving these problems. In this paper we replace the large-clique energies with pairwise potentials, by restricting the patch dictionary for each clique to image regions suitable for that clique. This enables for the first time the use of a global optimization method, such as tree-reweighted message passing, to solve the NVS problem with image-based priors. We employ a robust, truncated quadratic kernel to reject outliers caused by occlusions, specularities and moving objects, within our global optimization. Because the MRF optimization is thus fast, computing the unary potentials becomes the new performance bottleneck. An additional contribution of this paper is a novel, fast method for enumerating color modes of the per-pixel unary potentials, despite the non-convex nature of our robust kernel. We compare the results of our technique with other rendering methods, and discuss the relative merits and flaws of regularizing color, and of local versus global dictionaries.
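A truncated quadratic kernel caps the penalty any single sample can contribute, which is what lets it ignore outliers and also what makes the resulting cost non-convex. The sketch below shows this on scalar intensities; the threshold `tau`, the sample values, and the function names are assumptions for illustration, and the paper's fast mode-enumeration method is not reproduced.

```python
def truncated_quadratic(residual, tau):
    """Quadratic penalty for small residuals, constant tau^2 beyond tau."""
    return min(residual * residual, tau * tau)

def robust_cost(candidate, samples, tau):
    """Total truncated-quadratic cost of explaining samples by one value."""
    return sum(truncated_quadratic(candidate - s, tau) for s in samples)

# Three consistent intensity samples and one outlier (for example, a view in
# which the point is occluded). Values are made up for illustration.
samples = [0.50, 0.52, 0.48, 0.95]
tau = 0.1

mean = sum(samples) / len(samples)           # least-squares estimate
cost_at_inliers = robust_cost(0.50, samples, tau)
cost_at_mean = robust_cost(mean, samples, tau)
cost_at_outlier = robust_cost(0.95, samples, tau)
cost_between = robust_cost(0.90, samples, tau)
```

The inlier cluster gives the lowest cost, the plain mean is pulled toward the outlier and scores worse, and the outlier itself forms a second, weaker local minimum, which is why the cost can have several modes.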

