Shape Matching and Object Recognition Using Shape Contexts
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001
Cited by 1790 (21 self)
We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by (1) solving for correspondences between points on the two shapes, (2) using the correspondences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descriptor, the shape context, to each point. The shape context at a reference point captures the distribution of the remaining points relative to it, thus offering a globally discriminative characterization. Corresponding points on two similar shapes will have similar shape contexts, enabling us to solve for correspondences as an optimal assignment problem. Given the point correspondences, we estimate the transformation that best aligns the two shapes; regularized thin plate splines provide a flexible class of transformation maps for this purpose. The dissimilarity between the two shapes is computed as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning transform. We treat recognition in a nearest-neighbor classification framework as the problem of finding the stored prototype shape that is maximally similar to that in the image. Results are presented for silhouettes, trademarks, handwritten digits and the COIL dataset.
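The descriptor and correspondence steps described above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the authors' code: the log-polar layout (5 radial × 12 angular bins spanning 1/8 to 2 times the mean pairwise distance), the χ² histogram cost, and the helper names `shape_contexts`, `chi2_cost`, and `match_shapes` are choices made for the sketch. The optimal assignment is found by brute force over permutations, which only works for a handful of points (a Hungarian-style solver would be used in practice), and the thin plate spline alignment step is omitted.

```python
import math
from itertools import permutations


def shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of the other points around each point (assumed bin layout)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Normalize by the mean pairwise distance so the descriptor ignores scale.
    mean_d = math.fsum(dist[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    lo, hi = math.log(r_inner), math.log(r_outer)
    edges = [math.exp(lo + k * (hi - lo) / n_r) for k in range(n_r + 1)]  # log-spaced radii
    descriptors = []
    for i, (xi, yi) in enumerate(points):
        hist = [0.0] * (n_r * n_theta)
        for j, (xj, yj) in enumerate(points):
            r = dist[i][j] / mean_d
            if i == j or r < edges[0] or r >= edges[-1]:
                continue  # skip the reference point and points outside the window
            r_bin = max(k for k in range(n_r) if edges[k] <= r)
            theta = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
            t_bin = int(theta / (2 * math.pi) * n_theta) % n_theta
            hist[r_bin * n_theta + t_bin] += 1.0
        total = sum(hist) or 1.0
        descriptors.append([h / total for h in hist])  # normalized histogram
    return descriptors


def chi2_cost(g, h):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(g, h) if a + b > 0)


def match_shapes(shape_a, shape_b):
    """Minimum-cost one-to-one correspondence, by brute force (tiny inputs only)."""
    sa, sb = shape_contexts(shape_a), shape_contexts(shape_b)
    cost = [[chi2_cost(g, h) for h in sb] for g in sa]
    n = len(cost)
    best = min(permutations(range(n)), key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return best, sum(cost[i][best[i]] for i in range(n))


if __name__ == "__main__":
    a = [(0, 0), (3, 0), (5, 1), (6, 4), (4, 6), (1, 5), (2, 2)]
    order = [3, 0, 6, 1, 5, 2, 4]
    # b is a translated, doubled copy of a, listed in a different order.
    b = [(2 * a[k][0] + 10, 2 * a[k][1] - 3) for k in order]
    print(match_shapes(a, b))
```

Because distances are normalized by the mean pairwise distance and angles are measured in fixed image coordinates, this sketch's descriptor is invariant to translation and uniform scaling but not to rotation, so the shifted, doubled copy in the usage matches with zero total cost.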
Matching words and pictures
Journal of Machine Learning Research, 2003
Cited by 659 (39 self)
We present a new approach for modeling multimodal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation) and corresponding to particular image regions (region naming). Auto-annotation might help organize and access large collections of images. Region naming is a model of object recognition as a process of translating image regions to words, much as one might translate from one language to another. Learning the relationships between image regions and semantic correlates (words) is an interesting example of multimodal data mining, particularly because it is typically hard to apply data mining techniques to collections of images. We develop a number of models for the joint distribution of image regions and words, including several which explicitly learn the correspondence between regions and words. We study multimodal and correspondence extensions to Hofmann’s hierarchical clustering/aspect model, a translation model adapted from statistical machine translation (Brown et al.), and a multimodal extension to mixture of latent Dirichlet allocation
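The translation-model idea above, treating image regions as one "language" and words as the other, can be sketched with EM in the style of the simplest statistical translation model (IBM Model 1, without a NULL token). Assumptions: regions have already been quantized into discrete tokens (the names like `sky_blob` are hypothetical), each image is a bag of such tokens plus a bag of words, and `train_translation` and `name_region` are illustrative helpers; the models in the paper may differ in structure and features.

```python
from collections import defaultdict


def train_translation(pairs, iterations=20):
    """EM estimate of t[w][b] = p(word w | region token b), IBM Model 1 style.

    `pairs` is a list of (region_tokens, words), one entry per image.
    """
    words = sorted({w for _, ws in pairs for w in ws})
    blobs = sorted({b for bs, _ in pairs for b in bs})
    t = {w: {b: 1.0 / len(words) for b in blobs} for w in words}  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected (word, region) co-occurrences
        total = defaultdict(float)  # expected count per region token
        for bs, ws in pairs:
            for w in ws:
                norm = sum(t[w][b] for b in bs)
                for b in bs:
                    share = t[w][b] / norm  # posterior that region b "produced" word w
                    count[(w, b)] += share
                    total[b] += share
        for w in words:
            for b in blobs:
                t[w][b] = count[(w, b)] / total[b] if total[b] else 0.0
    return t


def name_region(t, blob):
    """Region naming: the most probable word for a region token."""
    return max(t, key=lambda w: t[w][blob])


if __name__ == "__main__":
    # Hypothetical training images: (region tokens, caption words).
    pairs = [
        (["sky_blob", "tiger_blob"], ["sky", "tiger"]),
        (["sky_blob", "water_blob"], ["sky", "water"]),
        (["tiger_blob", "grass_blob"], ["tiger", "grass"]),
        (["water_blob", "grass_blob"], ["water", "grass"]),
    ]
    t = train_translation(pairs)
    print(name_region(t, "sky_blob"))
```

No image ever pairs a region with its word alone; the correspondence emerges only because "sky" co-occurs with `sky_blob` more consistently than with any other token.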
Learning to detect natural image boundaries using local brightness, color, and texture cues
PAMI, 2004
Cited by 625 (18 self)
The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness, color, and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, we train a classifier using human labeled images as ground truth. The output of this classifier provides the posterior probability of a boundary at each image location and orientation. We present precision-recall curves showing that the resulting detector significantly outperforms existing approaches. Our two main results are 1) that cue combination can be performed adequately with a simple linear model and 2) that a proper, explicit treatment of texture is required to detect boundaries in natural images. Index Terms: Texture, supervised learning, cue combination, natural images, ground truth segmentation data set, boundary detection, boundary localization.
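The finding that cue combination works with a simple linear model can be illustrated with logistic regression, which maps a weighted sum of cue responses to a boundary posterior. This is a sketch under assumptions: the three features stand in for brightness, color, and texture gradient responses at one pixel and orientation, the training values are made up, and plain batch gradient descent is used rather than whatever fitting procedure the authors chose.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit P(boundary | cues) = sigmoid(w . x + b) by batch gradient descent on log loss."""
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # d(log loss)/dz
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def boundary_probability(w, b, cues):
    """Posterior probability of a boundary; its log-odds are linear in the cues."""
    return sigmoid(sum(wi * ci for wi, ci in zip(w, cues)) + b)


if __name__ == "__main__":
    # Made-up [brightness, color, texture] gradient responses; 1 = human-marked boundary.
    X = [[0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.7, 0.9, 0.8],
         [0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.3, 0.2, 0.2]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logistic(X, y)
    print(w, b, boundary_probability(w, b, [0.85, 0.8, 0.8]))
```

The learned weights only reflect the toy data; the point of the design is that the combination stays linear in the log-odds, so it is cheap to train and to evaluate at every location and orientation.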
Estimating Human Body Configurations using Shape Context Matching
2002
Cited by 186 (12 self)
The problem we consider in this paper is to take a single two-dimensional image containing a human body, locate the joint positions, and use these to estimate the body configuration and pose in three-dimensional space. The basic approach is to store a number of exemplar 2D views of the human body in a variety of different configurations and viewpoints with respect to the camera. On each of these stored views, the locations of the body joints (left elbow, right knee, etc.) are manually marked and labelled for future use. The test shape is then matched to each stored view, using the technique of shape context matching in conjunction with a kinematic chain-based deformation model. Assuming that there is a stored view sufficiently similar in configuration and pose, the correspondence process will succeed. The locations of the body joints are then transferred from the exemplar view to the test shape. Given the joint locations, the 3D body configuration and pose are then estimated.
Shape Context: A new descriptor for shape matching and object recognition
In NIPS, 2000
Cited by 167 (6 self)
We introduce a new shape descriptor, the shape context, for correspondence recovery and shape-based object recognition. The shape context at a point captures the distribution over relative positions of other shape points and thus summarizes global shape in a rich, local descriptor. Shape contexts greatly simplify recovery of correspondences between points of two given shapes. Moreover, the shape context leads to a robust score for measuring shape similarity, once shapes are aligned. The shape context descriptor is tolerant to all common shape deformations. As a key advantage no special landmarks or keypoints are necessary. It is thus a generic method with applications in object recognition, image registration and point set matching. Using examples involving both handwritten digits and 3D objects, we illustrate its power for object recognition.
iSAM: Incremental Smoothing and Mapping
2008
Cited by 153 (35 self)
We present incremental smoothing and mapping (iSAM), a novel approach to the simultaneous localization and mapping problem that is based on fast incremental matrix factorization. iSAM provides an efficient and exact solution by updating a QR factorization of the naturally sparse smoothing information matrix, therefore recalculating only the matrix entries that actually change. iSAM is efficient even for robot trajectories with many loops as it avoids unnecessary fill-in in the factor matrix by periodic variable reordering. Also, to enable data association in real time, we provide efficient algorithms to access the estimation uncertainties of interest based on the factored information matrix. We systematically evaluate the different components of iSAM as well as the overall algorithm using various simulated and real-world datasets for both landmark and pose-only settings.
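The idea of updating a QR factorization instead of recomputing it can be sketched with Givens rotations: each new measurement row is rotated into the existing upper-triangular factor R, zeroing it column by column. This is a dense toy sketch with hypothetical helper names (`add_row`, `back_substitute`); iSAM itself operates on sparse matrices and adds periodic variable reordering, neither of which is shown. The usage solves a made-up one-dimensional pose chain with a prior, two odometry steps, and one loop closure.

```python
import math


def add_row(R, d, row, rhs):
    """Rotate a new measurement row (with right-hand side rhs) into R and d.

    R is n x n upper triangular and d is the correspondingly rotated right-hand
    side. Columns where the incoming row is currently zero are skipped, which
    keeps updates cheap when a measurement touches only a few variables.
    """
    n = len(R)
    R = [list(r) for r in R]
    d = list(d)
    row = [float(v) for v in row]
    for k in range(n):
        if row[k] == 0.0:
            continue
        r = math.hypot(R[k][k], row[k])
        c, s = R[k][k] / r, row[k] / r
        for j in range(k, n):  # rotate row k of R against the new row
            R[k][j], row[j] = c * R[k][j] + s * row[j], -s * R[k][j] + c * row[j]
        d[k], rhs = c * d[k] + s * rhs, -s * d[k] + c * rhs
    return R, d


def back_substitute(R, d):
    """Solve R x = d for upper-triangular R."""
    n = len(R)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (d[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x


if __name__ == "__main__":
    n = 3
    R, d = [[0.0] * n for _ in range(n)], [0.0] * n
    measurements = [
        ([1, 0, 0], 0.0),   # prior: x0 = 0
        ([-1, 1, 0], 1.0),  # odometry: x1 - x0 = 1
        ([0, -1, 1], 1.0),  # odometry: x2 - x1 = 1
        ([-1, 0, 1], 2.0),  # loop closure: x2 - x0 = 2
    ]
    for row, rhs in measurements:
        R, d = add_row(R, d, row, rhs)  # incremental update, no refactoring
    print(back_substitute(R, d))
```

Because each Givens rotation is orthogonal, R^T R always equals A^T A for the rows seen so far, so back substitution at any point gives the current least-squares estimate; the measurements here are consistent, so it recovers x ≈ [0, 1, 2].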
Auction algorithms for network flow problems: A tutorial introduction
Comput. Optim. Appl., 1992
Solving Large Quadratic Assignment Problems on Computational Grids
2000
Cited by 82 (6 self)
The quadratic assignment problem (QAP) is among the hardest combinatorial optimization problems. Some instances of size n = 30 have remained unsolved for decades. The solution of these problems requires both improvements in mathematical programming algorithms and the utilization of powerful computational platforms. In this article we describe a novel approach to solve QAPs using a state-of-the-art branch-and-bound algorithm running on a federation of geographically distributed resources known as a computational grid. Solution of QAPs of unprecedented complexity, including the nug30, kra30b, and tho30 instances, is reported.
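To make the problem statement concrete: in the Koopmans-Beckmann form of the QAP, n facilities with pairwise flows are assigned to n locations with pairwise distances, and a placement costs the sum of flow times distance over all facility pairs. The brute-force search below is only a sketch of that objective on a made-up three-facility instance; it enumerates all n! placements, which is hopeless at the n = 30 sizes mentioned above, where 30! is roughly 2.65 × 10^32. The branch-and-bound method and the grid platform are not shown.

```python
from itertools import permutations
from math import factorial


def qap_cost(flow, dist, perm):
    """Koopmans-Beckmann objective: facility i is placed at location perm[i]."""
    n = len(flow)
    return sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n))


def qap_brute_force(flow, dist):
    """Exact minimum by enumerating all n! placements (tiny n only)."""
    n = len(flow)
    return min(permutations(range(n)), key=lambda p: qap_cost(flow, dist, p))


if __name__ == "__main__":
    flow = [[0, 5, 1], [5, 0, 0], [1, 0, 0]]  # made-up flows between 3 facilities
    dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # 3 locations on a line
    best = qap_brute_force(flow, dist)
    print(best, qap_cost(flow, dist, best))
    print(factorial(30))  # number of placements for n = 30
```

On this instance the cheapest placements (cost 12) put facility 0 at the middle location, adjacent to both facilities it exchanges flow with.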
Learning Graph Matching
Cited by 81 (9 self)
As a fundamental problem in pattern recognition, graph matching has found a variety of applications in the field of computer vision. In graph matching, patterns are modeled as graphs and pattern recognition amounts to finding a correspondence between the nodes of different graphs. There are many ways in which the problem has been formulated, but most can be cast in general as a quadratic assignment problem, where a linear term in the objective function encodes node compatibility functions and a quadratic term encodes edge compatibility functions. Research on this topic has focused mainly on designing efficient algorithms that approximately solve the quadratic assignment problem, since it is NP-hard. In this paper, we turn our attention to the complementary problem: how to estimate compatibility functions such that the solution of the resulting graph matching problem best matches the expected solution that a human would manually provide. We present a method for learning graph matching: the training examples are pairs of graphs and the “labels” are matchings between pairs of graphs. We present experimental results with real image data which give evidence that learning can improve the performance of standard graph matching algorithms. In particular, it turns out that linear assignment with such a learning scheme may improve over state-of-the-art quadratic assignment relaxations. This finding suggests that for a range of problems where quadratic assignment was thought to be essential for securing good results, linear assignment, which is far more efficient, could be sufficient if learning is performed. This enables speedups of graph matching by up to 4 orders of magnitude while retaining state-of-the-art accuracy.
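The central observation, that a learned linear assignment can replace quadratic assignment, can be illustrated with a small learner. As a stand-in for the paper's own training procedure, this sketch swaps in a structured perceptron: each node pair gets a hypothetical feature vector (negative absolute attribute differences), a matching is scored by a weighted sum of its pair features, and the weights move toward the human-labeled matching whenever the predicted one differs. Only node (linear) terms are used, the helper names are illustrative, and the brute-force assignment is suitable for tiny graphs only.

```python
from itertools import permutations


def pair_features(a, b):
    """Hypothetical node-pair features: negative absolute attribute differences."""
    return [-abs(x - y) for x, y in zip(a, b)]


def matching_features(g1, g2, perm):
    """Sum of node-pair features for the matching i -> perm[i] (linear terms only)."""
    total = [0.0] * len(g1[0])
    for i, j in enumerate(perm):
        for k, f in enumerate(pair_features(g1[i], g2[j])):
            total[k] += f
    return total


def predict(w, g1, g2):
    """Highest-scoring linear assignment, by brute force (tiny graphs only)."""
    def score(perm):
        return sum(wk * fk for wk, fk in zip(w, matching_features(g1, g2, perm)))
    return max(permutations(range(len(g1))), key=score)


def train_perceptron(examples, dim, epochs=20):
    """Structured perceptron: move w toward the labeled matching on each mistake."""
    w = [0.0] * dim
    for _ in range(epochs):
        mistakes = 0
        for g1, g2, gold in examples:
            pred = predict(w, g1, g2)
            if pred != tuple(gold):
                mistakes += 1
                f_gold = matching_features(g1, g2, gold)
                f_pred = matching_features(g1, g2, pred)
                w = [wk + a - b for wk, a, b in zip(w, f_gold, f_pred)]
        if mistakes == 0:
            break
    return w


if __name__ == "__main__":
    # Made-up node attributes; the labeled matching is 0->1, 1->2, 2->0.
    g1 = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    g2 = [(2.0, 0.0), (0.0, 1.0), (1.0, 2.0)]
    gold = (1, 2, 0)
    w = train_perceptron([(g1, g2, gold)], dim=2)
    print(w, predict(w, g1, g2))
```

In the toy pair, attribute 0 agrees across the labeled correspondences while attribute 1 points to a different matching, so a single perceptron update raises the weight on attribute 0 and pushes the weight on attribute 1 below zero, after which the cheap linear assignment reproduces the labeled matching.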
Video Tooning
2004
Cited by 79 (3 self)
We describe a system for transforming an input video into a highly abstracted, spatiotemporally coherent cartoon animation with a range of styles. To achieve this, we treat video as a space-time volume of image data. We have developed an anisotropic kernel mean shift technique to segment the video data into contiguous volumes. These provide a simple cartoon style in themselves, but more importantly provide the capability to semi-automatically rotoscope semantically meaningful regions.
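The segmentation step rests on mean shift, which moves each sample uphill on a kernel density estimate until it reaches a mode; samples that reach the same mode form one segment. The sketch below is plain isotropic Gaussian mean shift on 2D points, a deliberate simplification: the anisotropic kernels and the space-time video volume used in the paper are not shown, and the bandwidth, merge radius, and helper names are assumptions.

```python
import math


def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Follow mean-shift steps from `start` to a mode of a Gaussian kernel density."""
    x = list(start)
    for _ in range(max_iter):
        weights = [math.exp(-math.dist(x, p) ** 2 / (2 * bandwidth ** 2)) for p in points]
        total = sum(weights)
        new_x = [sum(w * p[k] for w, p in zip(weights, points)) / total for k in range(len(x))]
        step = math.dist(x, new_x)
        x = new_x
        if step < tol:
            break
    return x


def segment(points, bandwidth=1.0, merge_radius=0.5):
    """Label each point by the mode its mean-shift trajectory converges to."""
    modes, labels = [], []
    for p in points:
        m = mean_shift_mode(points, p, bandwidth)
        for idx, q in enumerate(modes):
            if math.dist(m, q) < merge_radius:
                labels.append(idx)
                break
        else:  # no existing mode nearby: start a new segment
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2),
           (10.0, 10.0), (10.2, 9.9), (9.9, 10.1), (10.1, 10.2)]
    modes, labels = segment(pts)
    print(labels)
```

An anisotropic variant would replace the single bandwidth with a per-sample covariance, letting kernels stretch along the direction of motion in a video volume; that adaptation is the part this sketch leaves out.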