## Learning globally-consistent local distance functions for shape-based image retrieval and classification (2007)

Venue: ICCV

Citations: 87 (2 self)

### BibTeX

@INPROCEEDINGS{Frome07learningglobally-consistent,
  author = {Andrea Frome and Fei Sha and Yoram Singer and Jitendra Malik},
  title = {Learning globally-consistent local distance functions for shape-based image retrieval and classification},
  booktitle = {ICCV},
  year = {2007}
}

### Abstract

We address the problem of visual category recognition by learning an image-to-image distance function that attempts to satisfy the following property: the distance between images from the same category should be less than the distance between images from different categories. We use patch-based feature vectors common in object recognition work as the basis for our image-to-image distance functions. Our large-margin formulation for learning the distance functions is similar to formulations used in the machine learning literature on distance metric learning; however, we differ in that we learn local distance functions, a different parameterized function for every image of our training set, whereas typically a single global distance function is learned. This novel approach was first introduced in Frome, Singer, & Malik, NIPS 2006. In that work we learned the local distance functions independently, and the outputs of these functions could not be compared at test time without additional heuristics or training. Here we introduce a different approach with the advantage that it learns globally consistent distance functions, which can be directly compared for purposes of retrieval and classification. The output of the learning algorithm is a set of weights assigned to the image features, which is intuitively appealing in the computer vision setting: some features are more salient than others, and which are more salient depends on the category, or image, being considered. We train and test using the Caltech 101 object recognition benchmark. Using fifteen training images per category, we achieved a mean recognition rate of 63.2% and ...
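The same-category-closer property described in the abstract amounts to a hinge penalty over triplets of images. A minimal sketch (the function name and unit margin are illustrative choices, not taken from the paper):

```python
def triplet_hinge(d_same, d_diff, margin=1.0):
    """Penalty for one triplet: zero when the same-category distance
    d_same beats the different-category distance d_diff by at least
    `margin`, and linear in the violation otherwise."""
    return max(0.0, margin + d_same - d_diff)

# A well-ordered triplet incurs no loss; a violating one is penalized.
print(triplet_hinge(0.2, 2.0))  # -> 0.0
print(triplet_hinge(1.5, 1.0))  # -> 1.5
```

Summing this penalty over all triplets, subject to a regularizer on the weights, gives the kind of large-margin objective the abstract refers to.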

### Citations

8980 | Statistical Learning Theory - Vapnik - 1998
Citation Context: ...cannot simultaneously satisfy the constraints for all triplets i, j, k. We therefore need to relax Eq. 2. We propose a relaxation which adapts and generalizes the notion of large-margin classification [21] to our setting. To make the connection to classification work clearer and to simplify our derivation we need to further expand our notation. We denote by W the vector which is the concatenation of the ...

1585 | Object recognition from local scale-invariant features - Lowe - 1999
Citation Context: ...Dji ≠ Dij. To approach this problem, we parameterize the image-to-image distance functions using a weighted linear combination of distances between patch-based shape feature descriptors, such as SIFT [14] or geometric blur [2]. These features characterize image patches by fixed-length vectors, which can be compared using L1 or L2 metrics. One possible approach to computing an image-to-image distance is ...
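The parameterization this context describes, a weighted linear combination of distances between patch descriptors, can be sketched as follows. Random fixed-length vectors stand in for SIFT or geometric blur descriptors, and the function names are hypothetical:

```python
import numpy as np

def elementary_distances(patches_j, patches_i):
    """For each descriptor of image j, the L2 distance to its nearest
    descriptor in image i; returns the vector d_ji (one entry per patch of j)."""
    diff = patches_j[:, None, :] - patches_i[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def image_distance(w_j, patches_j, patches_i):
    """D(j -> i) = w_j . d_ji, a weighted linear combination of elementary
    patch distances. Note the asymmetry: D(j -> i) != D(i -> j) in general,
    since each image carries its own weights and its own patch set."""
    return float(w_j @ elementary_distances(patches_j, patches_i))

# Identical patch sets are at distance zero under any weighting.
P = np.eye(4)
print(image_distance(np.ones(4), P, P))  # -> 0.0
```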

947 | Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories - Lazebnik, Schmid, et al.
Citation Context: ...recognition across categories. Since then, there have been great improvements in recognition performance on the 2004 benchmark, with most algorithms making use of some variant of geometric blur or SIFT [1, 25, 11, 13, 9, 8, 16, 19]. Of this work, [9], [13], and [8] focused specifically on defining good image-to-image kernel functions over sets of patch-based features for use with support vector machines (SVMs). In the first two ...

504 | Distance metric learning with application to clustering with side-information - Xing, Ng, et al. - 2003
Citation Context: ...cross-validation. The objective function and constraints of our large-margin formulation are the same as those in [18], which is part of a larger recent body of work in metric learning, also including [24], [23], and [7]. In this line of work, the inputs x are points in some metric feature space, and the goal is to learn the matrix A which parameterizes a Mahalanobis distance of the form (x − x′)ᵀA(x − x′) ...
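The Mahalanobis form mentioned in this context is straightforward to write down; a short sketch (function name is hypothetical):

```python
import numpy as np

def mahalanobis_sq(x, x_prime, A):
    """The squared Mahalanobis form (x - x')^T A (x - x') used in the
    metric-learning line of work cited above. A should be positive
    semidefinite for this to define a valid (pseudo-)metric."""
    d = x - x_prime
    return float(d @ A @ d)

# With A = I it reduces to the squared Euclidean distance.
x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(mahalanobis_sq(x, y, np.eye(2)))  # -> 25.0
```

Learning A amounts to learning a global linear reweighting of the feature space, which is exactly what the paper departs from by learning per-exemplar weights instead.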

464 | Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories - Fei-Fei, Fergus, et al.
Citation Context: ...from j to i (Dji) is smaller than from k to i (Dki). Triplets like this one form the basis of our learning algorithm. 1. Introduction: Consider the triplet of images, drawn from the Caltech101 dataset [4], shown in Figure 1. We want to classify a query image i, and we have stored exemplar images j and k. Let Dji be the distance from image j to i, and Dki be the distance from image k to i, where Dji < D...

407 | Learning to detect natural image boundaries using local brightness, color, and texture cues - Martin, Fowlkes, et al. - 2004
Citation Context: ...es per channel. Figure 4: Computation of geometric blur features. A set of sparse signals is derived from the image, using filters, oriented energy, or a more sophisticated boundary detector such as Pb [15]. If we have a set of eight signals, then for a given patch in the original image, we have eight signal patches. Each signal patch is blurred geometrically, meaning that the standard deviation of the ...
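The spatially varying blur this caption describes can be sketched in a few lines of numpy. This is a toy version: the Gaussian form and the constants `alpha` and `beta` governing how the blur width grows with radius are illustrative assumptions, not the paper's exact parameters:

```python
import numpy as np

def geometric_blur_patch(signal, alpha=0.5, beta=0.5):
    """Toy geometric blur of one square signal patch: each output pixel is
    a Gaussian-weighted average of the patch, with std alpha*r + beta
    growing with the pixel's distance r from the patch center."""
    n = signal.shape[0]
    c = (n - 1) / 2.0
    ys, xs = np.mgrid[0:n, 0:n]
    out = np.empty((n, n))
    for y in range(n):
        for x in range(n):
            sigma = alpha * np.hypot(y - c, x - c) + beta
            w = np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2.0 * sigma ** 2))
            out[y, x] = (w * signal).sum() / w.sum()
    return out

# A constant signal is unchanged; structure far from the center is
# smoothed more heavily than structure at the center.
print(np.allclose(geometric_blur_patch(np.ones((5, 5))), 1.0))  # -> True
```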

326 | Distance Metric Learning for Large Margin Nearest Neighbor Classification - Weinberger, Blitzer, et al. - 2006
Citation Context: ...validation. The objective function and constraints of our large-margin formulation are the same as those in [18], which is part of a larger recent body of work in metric learning, also including [24], [23], and [7]. In this line of work, the inputs x are points in some metric feature space, and the goal is to learn the matrix A which parameterizes a Mahalanobis distance of the form (x − x′)ᵀA(x − x′). ...

292 | Shape matching and object recognition using low distortion correspondences - Berg, Malik - 2005
Citation Context: ...rel image set and they have a thin black border around the image that algorithms can exploit, making it a surprisingly easy category. ...feature vectors and the geometric arrangement of their patches (e.g. [1]). However, this is expensive, and recent approaches that use sets of features and absolute positions of patches provide good approximations that work well in practice. We work in a setting where we a...

245 | Caltech-256 object category dataset - Griffin, Holub, et al. - 2007

194 | Object recognition with features inspired by visual cortex - Serre, Wolf, et al. - 2005
Citation Context: ...recognition across categories. Since then, there have been great improvements in recognition performance on the 2004 benchmark, with most algorithms making use of some variant of geometric blur or SIFT [1, 25, 11, 13, 9, 8, 16, 19]. Of this work, [9], [13], and [8] focused specifically on defining good image-to-image kernel functions over sets of patch-based features for use with support vector machines (SVMs). In the first two ...

141 | Multiclass Object Recognition with Sparse, Localized Features - Mutch, Lowe - 2006
Citation Context: ...recognition across categories. Since then, there have been great improvements in recognition performance on the 2004 benchmark, with most algorithms making use of some variant of geometric blur or SIFT [1, 25, 11, 13, 9, 8, 16, 19]. Of this work, [9], [13], and [8] focused specifically on defining good image-to-image kernel functions over sets of patch-based features for use with support vector machines (SVMs). In the first two ...

130 | Metric learning by collapsing classes - Globerson, Roweis - 2006

123 | Learning a distance metric from relative comparisons - Schultz, Joachims - 2004
Citation Context: ...such that the distance relationships among triplets of images hold. We formalize the problem within a large-margin learning framework. At the general algorithmic level, we follow the formulation in [18], and in the context of image recognition, that in our earlier work, [6]. Both this work and [6] differ from [18] in that we both learn a parameterization (1) for every exemplar (image) and neither our ...

114 | Geometric blur for template matching - Berg, Malik - 2001
Citation Context: ...this problem, we parameterize the image-to-image distance functions using a weighted linear combination of distances between patch-based shape feature descriptors, such as SIFT [14] or geometric blur [2]. These features characterize image patches by fixed-length vectors, which can be compared using L1 or L2 metrics. One possible approach to computing an image-to-image distance is to attempt to solve t...

100 | Algorithmic stability and sanity-check bounds for leave-one-out cross-validation - Kearns, Ron - 1999
Citation Context: ...oosing C is to run the full learning procedure with multiple suggestions for C on a held-out portion of the training set, also called a validation set. This approach entertains some formal properties [12] and often yields very good results in practice. However, the approach is quite time-consuming, as it requires running the training algorithm several times for different partitions of the data. Due to ...
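The selection procedure this context describes, training once per candidate C and keeping the value that scores best on held-out data, can be sketched as follows. All names are hypothetical; `fit` and `score` stand in for the full training and evaluation steps:

```python
def select_C(candidates, fit, score, train, val):
    """Validation-based model selection: run training once per candidate
    value of C and return the one whose model scores best on `val`."""
    return max(candidates, key=lambda C: score(fit(train, C), val))

# Toy stand-ins: the "model" is C itself and the validation score
# peaks at C = 3, so that candidate is selected.
best = select_C([0.1, 1, 3, 10],
                fit=lambda train, C: C,
                score=lambda model, val: -(model - 3) ** 2,
                train=None, val=None)
print(best)  # -> 3
```

As the context notes, the cost is one full training run per candidate, which is why the paper treats this as expensive for large training sets.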

63 | Online passive-aggressive algorithms - Crammer, Dekel, et al. - 2003
Citation Context: ...d to the closest value in [0, C]. We stop iterating when all KKT conditions are met, within some precision. This technique is a generalized row-action method, closely related to online learning of SVMs [3], and is described in more detail in [5]. There are some clear alternatives to the machine learning choices we have made in this work and in [6]. In particular, using an L1 regularization that promotes ...
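The clipping step this context mentions is the core of such row-action solvers: update one dual variable at a time, then project it back onto the box [0, C]. A schematic sketch (the additive step is illustrative, not the paper's exact update rule):

```python
def clipped_dual_update(alpha, delta, C):
    """One row-action step on a single dual variable: take the
    unconstrained step alpha + delta, then clip back to [0, C],
    the feasible box imposed by the slack penalty C."""
    return min(max(alpha + delta, 0.0), C)

# Steps that would leave the box are projected onto its boundary.
print(clipped_dual_update(0.9, 5.0, C=1.0))   # -> 1.0
print(clipped_dual_update(0.1, -5.0, C=1.0))  # -> 0.0
print(clipped_dual_update(0.25, 0.25, C=1.0)) # -> 0.5
```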

60 | Image retrieval and classification using local distance functions - Frome, Singer, et al. - 2006
Citation Context: ...formalize the problem within a large-margin learning framework. At the general algorithmic level, we follow the formulation in [18], and in the context of image recognition, that in our earlier work, [6]. Both this work and [6] differ from [18] in that we both learn a parameterization (1) for every exemplar (image) and neither our input distances nor our final distances are metrics. This is a departure...

54 | Online and batch learning of pseudo-metrics - Shalev-Shwartz, Singer, et al. - 2004
Citation Context: ...mpletion in about 11 minutes. Using 15 images per category, it took 10 hours, and with 20 images, approximately 16 hours. An advantage of the dual solver is that, like online learning methods such as [20], it finds a near-optimal solution very quickly. We record the value of the dual objective after every pass over the data, and we use the rate of change of the dual as an indicator of progress; when t...
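The stopping rule described above, watching the change of the dual objective between passes over the data, can be sketched as a simple check. The relative-change form and the tolerance value are assumptions for illustration:

```python
def dual_converged(dual_history, tol=1e-3):
    """Stop when the relative change of the dual objective between
    consecutive passes over the data falls below tol."""
    if len(dual_history) < 2:
        return False
    prev, curr = dual_history[-2], dual_history[-1]
    return abs(curr - prev) <= tol * max(abs(prev), 1.0)

print(dual_converged([10.0, 12.0]))    # -> False (dual still climbing)
print(dual_converged([12.0, 12.001]))  # -> True (progress has stalled)
```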

54 | Using dependent regions for object categorization in a generative framework - Wang, Zhang, et al. - 2006
Citation Context: ...category versus mean recognition rate. Our results are the solid black line that crosses above the others between 10 and 15 images per category. Also shown are results from [10], [25], [13], [16], [9], [1], [22], [11], [19], and [4]. Note that the results just below ours at 20 images per category are computed differently; they do not include the Faces easy category in training or testing, thus eliminating a p...

45 | Combining generative models and Fisher kernels for object recognition - Holub, Welling, et al. - 2005

39 | Approximate correspondences in high dimensions - Grauman, Darrell

31 | Advances in Kernel Methods: Support Vector Learning, chapter "Fast training of SVMs using sequential minimal optimization" - Platt - 1999
Citation Context: ...solver terminates when it can make a full pass over all constraints without any updates. A given constraint may not change because either (1) it has satisfied the KKT conditions within some precision [17], or (2) the update to the dual variable falls below a threshold for a “useful” update (we use the threshold from [17]). The solver often stops before full convergence, but for large data set sizes it ...

5 | SVM-KNN: Discriminative nearest neighbor classification for visual category recognition - Zhang, Berg, et al. - 2006

3 | Learning Distance Functions for Exemplar-Based Object Recognition - Frome - 2007
Citation Context: ...iterating when all KKT conditions are met, within some precision. This technique is a generalized row-action method, closely related to online learning of SVMs [3], and is described in more detail in [5]. There are some clear alternatives to the machine learning choices we have made in this work and in [6]. In particular, using an L1 regularization that promotes sparsity is likely to increase the num...

3 | Pyramid match kernels: Discriminative classification with sets of image features (version 2) - Grauman, Darrell - 2006