Results 1 - 10
of
46
Putting objects in perspective
- In CVPR
, 2006
"... Image understanding requires not only individually estimating elements of the visual world but also capturing the interplay among them. In this paper, we provide a framework for placing local object detection in the context of the overall 3D scene by modeling the interdependence of objects, surface ..."
Abstract
-
Cited by 106 (10 self)
- Add to MetaCart
Image understanding requires not only individually estimating elements of the visual world but also capturing the interplay among them. In this paper, we provide a framework for placing local object detection in the context of the overall 3D scene by modeling the interdependence of objects, surface orientations, and camera viewpoint. Most object detection methods consider all scales and locations in the image as equally likely. We show that with probabilistic estimates of 3D geometry, both in terms of surfaces and world coordinates, we can put objects into perspective and model the scale and location variance in the image. Our approach reflects the cyclical nature of the problem by allowing probabilistic object hypotheses to refine geometry and vice-versa. Our framework allows painless substitution of almost any object detector and is easily extended to include other aspects of image understanding. Our results confirm the benefits of our integrated approach. 1.
Modeling the World from Internet Photo Collections
- INT J COMPUT VIS
, 2007
"... There are billions of photographs on the Internet, comprising the largest and most diverse photo collection ever assembled. How can computer vision researchers exploit this imagery? This paper explores this question from the standpoint of 3D scene modeling and visualization. We present structure-fro ..."
Abstract
-
Cited by 45 (1 self)
- Add to MetaCart
There are billions of photographs on the Internet, comprising the largest and most diverse photo collection ever assembled. How can computer vision researchers exploit this imagery? This paper explores this question from the standpoint of 3D scene modeling and visualization. We present structure-from-motion and image-based rendering algorithms that operate on hundreds of images downloaded as a result of keyword-based image search queries like “Notre Dame ” or “Trevi Fountain.” This approach, which we call Photo Tourism, has enabled reconstructions of numerous well-known world sites. This paper presents these algorithms and results as a first step towards 3D modeling of the world’s well-photographed sites, cities, and landscapes from Internet imagery, and discusses key open problems and challenges for the research community.
SIFT Flow: Dense Correspondence across Different Scenes
"... While image registration has been studied in different areas of computer vision, aligning images depicting different scenes remains a challenging problem, closer to recognition than to image matching. Analogous to optical flow, where an image is aligned to its temporally adjacent frame, we propose ..."
Abstract
-
Cited by 38 (6 self)
- Add to MetaCart
While image registration has been studied in different areas of computer vision, aligning images depicting different scenes remains a challenging problem, closer to recognition than to image matching. Analogous to optical flow, where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its neighbors in a large image collection consisting of a variety of scenes. For a query image, histogram intersection on a bag-of-visual-words representation is used to find the set of nearest neighbors in the database. The SIFT flow algorithm then consists of matching densely sampled SIFT features between the two images, while preserving spatial discontinuities. The use of SIFT features allows robust matching across different scene/object appearances and the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach is able to robustly align complicated scenes with large spatial distortions. We collect a large database of videos and apply the SIFT flow algorithm to two applications: (i) motion field prediction from a single static image and (ii) motion synthesis via transfer of moving objects.
Recovering Occlusion Boundaries from a Single Image
"... Occlusion reasoning, necessary for tasks such as navigation and object search, is an important aspect of everyday life and a fundamental problem in computer vision. We believe that the amazing ability of humans to reason about occlusions from one image is based on an intrinsically 3D interpretation. ..."
Abstract
-
Cited by 28 (6 self)
- Add to MetaCart
Occlusion reasoning, necessary for tasks such as navigation and object search, is an important aspect of everyday life and a fundamental problem in computer vision. We believe that the amazing ability of humans to reason about occlusions from one image is based on an intrinsically 3D interpretation. In this paper, our goal is to recover the occlusion boundaries and depth ordering of free-standing structures in the scene. Our approach is to learn to identify and label occlusion boundaries using the traditional edge and region cues together with 3D surface and depth cues. Since some of these cues require good spatial support (i.e., a segmentation), we gradually create larger regions and use them to improve inference over the boundaries. Our experiments demonstrate the power of a scene-based approach to occlusion reasoning. 1.
LabelMe video: Building a Video Database with Human Annotations
"... Currently, video analysis algorithms suffer from lack of information regarding the objects present, their interactions, as well as from missing comprehensive annotated video databases for benchmarking. We designed an online and openly accessible video annotation system that allows anyone with a brow ..."
Abstract
-
Cited by 15 (1 self)
- Add to MetaCart
Currently, video analysis algorithms suffer from lack of information regarding the objects present, their interactions, as well as from missing comprehensive annotated video databases for benchmarking. We designed an online and openly accessible video annotation system that allows anyone with a browser and internet access to efficiently annotate
Creating and Exploring a Large Photorealistic Virtual Space
"... The supplementary video can be viewed at: ..."
Efros. What does the sky tell us about the camera
- In ECCV
, 2008
"... Abstract. As the main observed illuminant outdoors, the sky is a rich source of information about the scene. However, it is yet to be fully explored in computer vision because its appearance depends on the sun position, weather conditions, photometric and geometric parameters of the camera, and the ..."
Abstract
-
Cited by 12 (5 self)
- Add to MetaCart
Abstract. As the main observed illuminant outdoors, the sky is a rich source of information about the scene. However, it is yet to be fully explored in computer vision because its appearance depends on the sun position, weather conditions, photometric and geometric parameters of the camera, and the location of capture. In this paper, we propose the use of a physically-based sky model to analyze the information available within the visible portion of the sky, observed over time. By fitting this model to an image sequence, we show how to extract camera parameters such as the focal length, and the zenith and azimuth angles. In short, the sky serves as a geometric calibration target. Once the camera parameters are recovered, we show how to use the same model in two applications: 1) segmentation of the sky and cloud layers, and 2) data-driven sky matching across different image sequences based on a novel similarity measure defined on sky parameters. This measure, combined with a rich appearance database, allows us to model a wide range of sky conditions. 1
Building a database of 3D scenes from user annotations
"... Inthispaper, wewishtobuildahighqualitydatabaseof images depicting scenes, along with their real-world threedimensional (3D) coordinates. Such a database is useful for a variety of applications, including training systems for objectdetectionandvalidationof3Doutput. We build such adatabasefromimagesth ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
Inthispaper, wewishtobuildahighqualitydatabaseof images depicting scenes, along with their real-world threedimensional (3D) coordinates. Such a database is useful for a variety of applications, including training systems for objectdetectionandvalidationof3Doutput. We build such adatabasefromimagesthathavebeenannotatedwithonly theidentityofobjectsandtheirspatialextentinimages. Importantforthistaskistherecoveryofgeometricinformation thatisimplicitintheobjectlabels,suchasqualtitative relationships between objects (attachment, support, occlusion) and quantitative ones (inferring camera parameters). We describeamodelthatintegratescuesextracted from the object labels to infer the implicit geometric information. We show that we are able to obtain high quality 3D information by evaluating the proposed approach on a database
photo: model-based photograph enhancement and viewing
- COHEN-OR D., DEUSSEN O., UYTTENDAELE M., LISCHINSKI D.: Deep
"... Figure 1: Some of the applications of the Deep Photo system. In this paper, we introduce a novel system for browsing, enhancing, and manipulating casual outdoor photographs by combining them with already existing georeferenced digital terrain and urban models. A simple interactive registration proce ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
Figure 1: Some of the applications of the Deep Photo system. In this paper, we introduce a novel system for browsing, enhancing, and manipulating casual outdoor photographs by combining them with already existing georeferenced digital terrain and urban models. A simple interactive registration process is used to align a photograph with such a model. Once the photograph and the model have been registered, an abundance of information, such as depth, texture, and GIS data, becomes immediately available to our system. This information, in turn, enables a variety of operations, ranging from dehazing and relighting the photograph, to novel view synthesis, and overlaying with geographic information. We describe the implementation of a number of these applications and discuss possible extensions. Our results show that augmenting photographs with already available 3D models of the world supports a wide variety of new ways for us to experience and interact with our everyday snapshots.
Sketch2photo: internet image montage
- ACM SIGGRAPH Asia
, 2009
"... Figure 1: A simple freehand sketch is automatically converted into a photo-realistic picture by seamlessly composing multiple images discovered online. The input sketch plus overlaid text labels is shown in (a). A composed picture is shown in (b); (c) shows two further compositions. Discovered onlin ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
Figure 1: A simple freehand sketch is automatically converted into a photo-realistic picture by seamlessly composing multiple images discovered online. The input sketch plus overlaid text labels is shown in (a). A composed picture is shown in (b); (c) shows two further compositions. Discovered online images used during composition are shown in (d). We present a system that composes a realistic picture from a simple freehand sketch annotated with text labels. The composed picture is generated by seamlessly stitching several photographs in agreement with the sketch and text labels; these are found by searching the Internet. Although online image search generates many inappropriate results, our system is able to automatically select suitable photographs to generate a high quality composition, using a filtering scheme to exclude undesirable images. We also provide a novel image blending algorithm to allow seamless image composition. Each blending result is given a numeric score, allowing us to find an optimal combination of discovered images. Experimental results show the method is very successful; we also evaluate our system using the results from two user studies. 1

