Keypoint recognition using randomized trees
- IEEE Trans. Pattern Anal. Mach. Intell
Cited by 87 (15 self)
Abstract
In many 3-D object-detection and pose-estimation problems, run-time performance is of critical importance. However, there usually is time to train the system, which we will show to be very useful. Assuming that several registered images of the target object are available, we developed a keypoint-based approach that is effective in this context by formulating wide-baseline matching of keypoints extracted from the input images to those found in the model images as a classification problem. This shifts much of the computational burden to a training phase, without sacrificing recognition performance. As a result, the algorithm is robust, accurate, and fast enough for frame-rate performance. This reduction in run-time computational complexity is our first contribution. Our second contribution is to show that, in this context, a simple and fast keypoint detector suffices to support detection and tracking even under large perspective and scale variations. While earlier methods require a detector that can be expected to produce very repeatable results in general, which is usually very time-consuming, we simply find the most repeatable object keypoints for the specific target object during the training phase. We have incorporated these ideas into a real-time system that detects planar, non-planar, and deformable objects. It then estimates the pose of the rigid ones and the deformations of the others.
Randomized trees for real-time keypoint recognition
- In CVPR
, 2005
Cited by 75 (4 self)
Abstract
In earlier work, we proposed treating wide-baseline matching of feature points as a classification problem, in which each class corresponds to the set of all possible views of such a point. We used a K-means plus nearest-neighbor classifier to validate our approach, mostly because it was simple to implement. It has proved effective but is still too slow for real-time use. In this paper, we advocate instead the use of randomized trees as the classification technique. It is both fast enough for real-time performance and more robust. It also gives us a principled way not only to match keypoints but also to select, during a training phase, the most recognizable ones. This results in a real-time system able to detect and position in 3D planar, non-planar, and even deformable objects. It is robust to illumination changes, scale changes, and occlusions.
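A minimal sketch of keypoint recognition as classification with randomized trees, assuming the simplest form of such a tree: each internal node compares the intensities of two randomly chosen pixels of a patch, and each leaf stores the distribution of keypoint classes that reached it during training; the forest averages the leaf distributions. The patch size, tree depth, add-one smoothing, and the `synthetic_view` noise model (a hypothetical stand-in for the synthesized views of each keypoint used in training) are illustrative assumptions, not details from the paper.

```python
import random
from collections import defaultdict

PATCH_PIXELS = 8 * 8  # hypothetical: 8x8 grayscale patches, flattened to a list


class RandomizedTree:
    """Tree whose node tests compare two randomly chosen pixel intensities."""

    def __init__(self, depth, rng):
        self.depth = depth
        # One random pixel pair per internal node, stored in heap (breadth-first) order.
        self.tests = [(rng.randrange(PATCH_PIXELS), rng.randrange(PATCH_PIXELS))
                      for _ in range(2 ** depth - 1)]
        # leaf index -> class label -> number of training views that reached the leaf
        self.counts = defaultdict(lambda: defaultdict(int))

    def leaf(self, patch):
        node = 0
        for _ in range(self.depth):
            a, b = self.tests[node]
            node = 2 * node + (1 if patch[a] <= patch[b] else 2)
        return node

    def train(self, patch, label):
        self.counts[self.leaf(patch)][label] += 1

    def posterior(self, patch, n_classes):
        # Class distribution stored at the reached leaf, with add-one smoothing (assumed).
        hist = self.counts.get(self.leaf(patch), {})
        total = sum(hist.values())
        return [(hist.get(c, 0) + 1) / (total + n_classes) for c in range(n_classes)]


class KeypointForest:
    def __init__(self, n_trees, depth, n_classes, seed=0):
        rng = random.Random(seed)
        self.n_classes = n_classes
        self.trees = [RandomizedTree(depth, rng) for _ in range(n_trees)]

    def train(self, patch, label):
        for tree in self.trees:
            tree.train(patch, label)

    def classify(self, patch):
        # Average the leaf posteriors over all trees and return the most probable class.
        avg = [0.0] * self.n_classes
        for tree in self.trees:
            for c, p in enumerate(tree.posterior(patch, self.n_classes)):
                avg[c] += p / len(self.trees)
        return max(range(self.n_classes), key=lambda c: avg[c])


def synthetic_view(base, rng, noise=3):
    # Hypothetical stand-in for a synthesized view: small per-pixel intensity noise.
    return [min(255, max(0, v + rng.randint(-noise, noise))) for v in base]


rng = random.Random(42)
bases = [[rng.randrange(256) for _ in range(PATCH_PIXELS)] for _ in range(5)]
forest = KeypointForest(n_trees=10, depth=8, n_classes=5)
for label, base in enumerate(bases):
    for _ in range(50):
        forest.train(synthetic_view(base, rng), label)
predictions = [forest.classify(base) for base in bases]
```

In this sketch all costly work happens in `train`; classifying a patch costs only `n_trees * depth` pixel comparisons plus a short average, which illustrates why moving effort into training helps run-time speed.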
Surface deformation models for non-rigid 3-D shape recovery (to appear)
- IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 20 (4 self)
Abstract
Three-dimensional detection and shape recovery of a nonrigid surface from video sequences require deformation models to make effective use of potentially noisy image data. Here, we introduce an approach to creating such models for deformable 3D surfaces. We exploit the fact that the shape of an inextensible triangulated mesh can be parameterized in terms of a small subset of the angles between its facets. We use this set of angles to create a representative set of potential shapes, which we feed to a simple dimensionality reduction technique to produce low-dimensional 3D deformation models. We show that these models can be used to accurately model a wide range of deforming 3D surfaces from video sequences acquired under realistic conditions.
Index Terms: 3D shape recovery, deformation model, nonrigid surfaces.
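The pipeline above (sample shapes from a small set of angles, then reduce dimensionality) can be illustrated with a 2-D toy under loudly stated assumptions: a planar chain of unit-length segments stands in for the inextensible triangulated mesh, the bend angles between consecutive segments stand in for the angles between facets, and principal component analysis computed by power iteration is assumed as the "simple dimensionality reduction technique". The helper names and the bending `pattern` are hypothetical.

```python
import math
import random


def chain_shape(angles):
    """Vertices (x0, y0, x1, y1, ...) of a planar chain of unit-length segments.

    Each angle bends the chain between consecutive segments, so segment lengths
    (the toy analogue of inextensibility) are preserved by construction.
    """
    x, y, heading = 0.0, 0.0, 0.0
    pts = [x, y]
    for a in [0.0] + list(angles):
        heading += a
        x, y = x + math.cos(heading), y + math.sin(heading)
        pts += [x, y]
    return pts


def top_principal_component(samples, iters=200, seed=0):
    # Mean shape plus the dominant eigenvector of the sample covariance.
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    centred = [[s[i] - mean[i] for i in range(dim)] for s in samples]
    u = [random.Random(seed).random() for _ in range(dim)]
    for _ in range(iters):
        # Apply X^T X to u without forming the covariance matrix explicitly.
        proj = [sum(c[i] * u[i] for i in range(dim)) for c in centred]
        v = [sum(p * c[i] for p, c in zip(proj, centred)) for i in range(dim)]
        norm = math.sqrt(sum(t * t for t in v))
        u = [t / norm for t in v]
    return mean, u


def reconstruct(shape, mean, u):
    # Project onto the 1-D deformation model and map back to vertex coordinates.
    w = sum((s - m) * b for s, m, b in zip(shape, mean, u))
    return [m + w * b for m, b in zip(mean, u)]


rng = random.Random(1)
pattern = [0.3, -0.2, 0.4, 0.1]  # hypothetical bending pattern between 5 segments
samples = []
for _ in range(100):
    t = rng.uniform(-1.0, 1.0)  # hypothetical 1-D latent deformation parameter
    samples.append(chain_shape([t * p + rng.gauss(0.0, 0.01) for p in pattern]))
mean, u = top_principal_component(samples)
```

Because the sampled angles vary mostly along one direction, a single principal component already summarizes most of each shape; the real models keep several components for 3D meshes.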
Efficient Visual Search for Objects in Videos
, 2008
Cited by 16 (0 self)
Abstract
We describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google [9] retrieves web pages containing particular words, by specifying the query as an image of the object or scene. In our approach, each frame of the video is represented by a set of viewpoint-invariant region descriptors. These descriptors enable recognition to proceed successfully despite changes in viewpoint, illumination, and partial occlusion. Vector quantizing these region descriptors provides a visual analogy of a word, which we term a "visual word." Efficient retrieval is then achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. The final ranking also depends on the spatial layout of the regions. Object retrieval results are reported on the full-length feature films "Groundhog Day," "Charade," and "Pretty Woman," including searches from within the movie and also searches specified by external images downloaded from the Internet. We discuss three research directions for the presented video retrieval approach and review some recent work addressing them: 1) building visual vocabularies for very large-scale retrieval; 2) retrieval of 3-D objects; and 3) more thorough verification and ranking using the spatial structure of objects.
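The retrieval step described above can be sketched as follows, assuming a precomputed vocabulary of cluster centres, nearest-centre quantization, tf-idf weights of the form (n_wd / n_d) * log(N / N_w), and ranking by cosine similarity computed through an inverted file. The vocabulary, frame contents, and function names are hypothetical; descriptor extraction, stop lists, and the spatial-layout re-ranking are omitted.

```python
import math
from collections import Counter, defaultdict


def quantize(descriptor, vocabulary):
    # The "visual word" is the index of the nearest cluster centre (squared Euclidean).
    return min(range(len(vocabulary)),
               key=lambda w: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[w])))


class InvertedFile:
    def __init__(self, frames):
        # frames: frame id -> list of visual-word ids found in that frame
        n_frames = len(frames)
        self.postings = defaultdict(dict)  # word -> {frame id: term frequency}
        for fid, words in frames.items():
            for w, n in Counter(words).items():
                self.postings[w][fid] = n / len(words)
        # Inverse document frequency: log(N / number of frames containing the word).
        self.idf = {w: math.log(n_frames / len(p)) for w, p in self.postings.items()}
        # Squared norms of the tf-idf frame vectors, for cosine normalisation.
        self.sq_norms = defaultdict(float)
        for w, p in self.postings.items():
            for fid, tf in p.items():
                self.sq_norms[fid] += (tf * self.idf[w]) ** 2

    def query(self, words):
        counts = Counter(words)
        q = {w: (n / len(words)) * self.idf.get(w, 0.0) for w, n in counts.items()}
        q_norm = math.sqrt(sum(v * v for v in q.values()))
        scores = defaultdict(float)
        for w, qw in q.items():
            # Only frames listed under a query word are ever touched.
            for fid, tf in self.postings.get(w, {}).items():
                scores[fid] += qw * tf * self.idf[w]
        ranked = [(fid, s / (q_norm * math.sqrt(self.sq_norms[fid])))
                  for fid, s in scores.items() if s > 0]
        return sorted(ranked, key=lambda item: item[1], reverse=True)


vocabulary = [(0, 0), (10, 0), (0, 10), (10, 10)]  # hypothetical 2-D cluster centres
frames = {"f1": [0, 0, 1], "f2": [2, 3], "f3": [1, 3, 3]}  # hypothetical quantized frames
index = InvertedFile(frames)
query_words = [quantize(d, vocabulary) for d in [(1, 1), (9, 1)]]
results = index.query(query_words)
```

Frame `f2` shares no word with the query, so the inverted file never scores it; this is what keeps retrieval fast when the vocabulary is large and each frame contains only a few of its words.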
Tracking dynamic near-regular textures under occlusion and rapid movements
- In ECCV
, 2006
Cited by 15 (3 self)
Abstract
We present a dynamic near-regular texture (NRT) tracking algorithm nested in a lattice-based Markov random field (MRF) model of a 3D spatiotemporal space. One basic observation used in our work is that the lattice structure of a dynamic NRT remains invariant despite drastic variations in its geometry or appearance. On the other hand, dynamic NRTs pose special computational challenges to state-of-the-art tracking algorithms, including highly ambiguous correspondences, occlusions, and drastic illumination and appearance variations. Our tracking algorithm takes advantage of the topologically invariant property of dynamic NRTs by combining a global lattice structure that characterizes the topological constraint among multiple textons with an image observation model that handles local geometry and appearance variations. Without any assumptions on the types of motion, camera model, or lighting conditions, our tracking algorithm can effectively capture the varying underlying lattice structure of a dynamic NRT in different real-world examples, including moving cloth, underwater patterns, and marching crowds.
A lattice-based MRF model for dynamic near-regular texture tracking
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2007
Cited by 14 (6 self)
Abstract
A near-regular texture (NRT) is a geometric and photometric deformation of its regular origin, a congruent wallpaper pattern formed by 2D translations of a single tile. A dynamic NRT is an NRT under motion. Correspondingly, the basic unit of a dynamic NRT is a well-defined texton, a geometrically and photometrically deformed tile moving through a 3D spatiotemporal space. Although NRTs are pervasive in man-made and natural environments, effective computational algorithms for NRTs are few. Through a systematic and quantitative comparison of multiple texture synthesis algorithms, we show that faithful NRT synthesis remains a challenge for most state-of-the-art texture synthesis algorithms. Our recent work on static NRT analysis and manipulation [Liu et al., 2004] is the first algorithmic treatment aimed specifically at preserving the regularity and randomness in real-world near-regular textures. The theme of this thesis is to address computational issues in modeling, tracking
Combining cues: Shape from shading and texture
- In Proc. Conf. Computer Vision and Pattern Recognition
, 2006
Cited by 13 (1 self)
Abstract
We demonstrate a method for reconstructing the shape of a deformed surface from a single view. After decomposing an image into irradiance and albedo components, we combine normal cues from shading and texture to produce a field of unambiguous normals. Using these normals, we reconstruct the 3D geometry. Our method works in two regimes: either requiring the frontal appearance of the texture or building it automatically from a series of images of the deforming texture. We can recover geometry with errors below four percent of object size on arbitrary textures, and estimate specific geometric parameters even more accurately using a custom texture.
Keywords: Shape-from-texture, shape-from-shading, 3D reconstruction, surface tracking, deformable models
1. Overview
We demonstrate reconstructions from a single view using a combination of shading and texture cues. We show that our reconstructions are geometrically accurate by comparing them with reconstructions from multiple views. Traditionally, reconstruction techniques are limited by ambiguities in the local cues, and reconstructions are performed by breaking these ambiguities using global consistency. Instead, we break ambiguities locally and introduce long-scale consistency only in the final steps of the geometric reconstruction.
We start with an outline of the reconstruction process. First, we obtain an estimate of the frontal appearance of the texture. This can be supplied manually, or reconstructed automatically from a sequence of images (section 5, figure 5). Second, we decompose the image into an irradiance map and a texture map (section 3.1, figure 4). Third, we obtain ambiguous estimates of the surface normals using the frontal texture, the texture map, and the assumption of local orthography [2, 3]. The normals are disambiguated using a shading model applied to the irradiance map (section 3.2). Finally, the surface is reconstructed using perspective effects to break a final concave-convex ambiguity (section 4).
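The local disambiguation step (choosing between two texture-derived normals using the irradiance map) can be sketched as below. This assumes a single distant light of known direction, unit albedo, a Lambertian shading model, and the two-fold ambiguity in which texture foreshortening under local orthography cannot tell a normal (n_x, n_y, n_z) from (-n_x, -n_y, n_z). The function names are hypothetical, and the paper's actual shading model may differ.

```python
import math


def unit(v):
    s = math.sqrt(sum(c * c for c in v))
    return tuple(c / s for c in v)


def lambertian(normal, light):
    # Predicted irradiance for unit albedo and one distant light (assumed model).
    return max(0.0, sum(n * l for n, l in zip(normal, light)))


def texture_candidates(normal):
    # The two normals that texture foreshortening alone cannot distinguish (assumed).
    nx, ny, nz = normal
    return (normal, (-nx, -ny, nz))


def disambiguate(candidates, irradiance, light):
    # Per pixel, keep the candidate whose predicted shading best matches the image.
    return [min(pair, key=lambda n: abs(lambertian(n, light) - e))
            for pair, e in zip(candidates, irradiance)]


light = unit((1.0, 0.0, 1.0))  # hypothetical light direction
true_normals = [unit((0.5, 0.0, 1.0)), unit((-0.4, 0.2, 1.0))]
irradiance = [lambertian(n, light) for n in true_normals]
# The second pair is given in swapped order to show that candidate order does not matter.
candidates = [texture_candidates(true_normals[0]), texture_candidates(true_normals[1])[::-1]]
recovered = disambiguate(candidates, irradiance, light)
```

The choice only works when the light is not aligned with the viewing direction; if it were, both candidates would predict the same irradiance and shading could not break the tie.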
Efficient Visual Search of Videos Cast as Text Retrieval
Cited by 6 (0 self)
Abstract
We describe an approach to object retrieval that searches for and localizes all of the occurrences of an object in a video, given a query image of the object. The object is represented by a set of viewpoint-invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination, and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject those that are unstable. Efficient retrieval is achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. This requires a visual analogy of a word, which is provided here by vector quantizing the region descriptors. The final ranking also depends on the spatial layout of the regions. The result is that retrieval is immediate, returning a ranked list of shots in the manner of Google [6]. We report results for object retrieval on the full-length feature films “Groundhog Day,” “Casablanca,” and “Run Lola Run,” including searches from within the movie and searches specified by external images downloaded from the Internet. We investigate retrieval performance with respect to different quantizations of region descriptors and compare the performance of several ranking measures. Performance is also compared to a baseline method implementing standard frame-to-frame matching.
Index Terms: Object recognition, viewpoint and scale invariance, text retrieval.
Progressive finite Newton approach to real-time nonrigid surface detection
- In Proc. Conf. Computer Vision and Pattern Recognition
, 2007
Cited by 5 (2 self)
Abstract
Detecting nonrigid surfaces is an interesting research problem for computer vision and image analysis. One important challenge of nonrigid surface detection is how to register a nonrigid surface mesh having a large number of free deformation parameters. This is particularly significant when detecting nonrigid surfaces from noisy observations. Nonrigid surface detection is usually regarded as a robust parameter estimation problem, which is typically solved iteratively from a good initialization in order to avoid local minima. In this paper, we propose a novel progressive finite Newton optimization scheme for the nonrigid surface detection problem, which reduces it to solving a set of linear equations. The key to our approach is to formulate nonrigid surface detection as an unconstrained quadratic optimization problem that has a closed-form solution for a given set of observations. Moreover, we employ a progressive active-set selection scheme, which takes advantage of the rank information of detected correspondences. We have conducted extensive experiments to evaluate performance in various environments, and the promising results show that the proposed algorithm is more efficient and effective than existing iterative methods.
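As a rough illustration of the two ideas above (a quadratic energy whose minimizer comes from one linear solve, and an active set of correspondences grown in rank order), here is a 1-D toy under stated assumptions: scalar vertex positions on a chain replace the 3D mesh, the regularizer is a second-difference smoothness term, correspondences whose residual exceeds a hypothetical threshold `tau` are dropped, and the growth schedule (`start`, `step`) is also hypothetical. It is not the paper's formulation.

```python
def solve(M, r):
    """Gaussian elimination with partial pivoting for M x = r."""
    n = len(r)
    A = [row[:] + [r[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for j in range(col, n + 1):
                A[k][j] -= f * A[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def smoothness_matrix(n):
    # K = D^T D, where D takes second differences x[i-1] - 2 x[i] + x[i+1].
    K = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        coeffs = {i - 1: 1.0, i: -2.0, i + 1: 1.0}
        for a, ca in coeffs.items():
            for b, cb in coeffs.items():
                K[a][b] += ca * cb
    return K


def fit(n, obs, lam):
    # Minimize lam * x^T K x + sum (x_i - b_i)^2: zero gradient gives one linear system.
    M = [[lam * v for v in row] for row in smoothness_matrix(n)]
    r = [0.0] * n
    for i, b in obs:
        M[i][i] += 1.0
        r[i] += b
    return solve(M, r)


def progressive_fit(n, matches, lam=1.0, tau=0.5, start=3, step=2):
    # matches: (vertex index, observed position, matching score); best-ranked first.
    ranked = sorted(matches, key=lambda m: -m[2])
    active = [(i, b) for i, b, _ in ranked[:start]]
    for k in range(start, len(ranked) + step, step):
        x = fit(n, active, lam)
        # Admit the top-k matches that agree with the current fit.
        active = [(i, b) for i, b, _ in ranked[:k] if abs(x[i] - b) < tau]
    return fit(n, active, lam), active


true = [0.5 * i for i in range(10)]
matches = [(0, 0.0, 0.9), (3, 1.5, 0.95), (6, 3.0, 0.85), (9, 4.5, 0.8),
           (2, 1.0, 0.7), (8, 4.0, 0.6), (5, 10.0, 0.1)]  # last match is an outlier
x, active = progressive_fit(10, matches)
```

Starting from the highest-ranked matches gives a reliable first fit, so the low-ranked outlier at vertex 5 is rejected by its residual when it is finally considered.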
Augmenting deformable objects in realtime
- In International Symposium on Mixed and Augmented Reality
, 2005
Cited by 4 (2 self)
Abstract
We present a real-time system that can draw virtual patterns or images on deforming real objects by estimating both the deformations and the shading parameters. We show that this is what is required to render the virtual elements so that they blend convincingly with the surrounding real textures. The whole process of decompressing the video stream, measuring the deformations, estimating the lighting parameters, and realistically augmenting the input image takes about 100 ms on a 2.8 GHz PC. It is fully automated and does not require any manual initialization or engineering of the scene. It is also robust to large deformations, lighting changes, motion blur, specularities, and occlusions. It can therefore be demonstrated live on a simple laptop.

