Near neighbor search in large metric spaces
- In Proceedings of the 21st International Conference on Very Large Data Bases, 1995
Cited by 159 (0 self)
Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional, or more generally, is represented by a point in a large metric space and distance calculations are computationally expensive. In this paper we introduce a data structure to solve this problem called a GNAT (Geometric Near-neighbor Access Tree). It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data as opposed to a simple decomposition of the data that does not use its intrinsic geometry. In experiments, we find that GNATs outperform previous data structures in a number of applications.
The Anchors Hierarchy: Using the Triangle Inequality to Survive High Dimensional Data
- In Twelfth Conference on Uncertainty in Artificial Intelligence, 2000
Cited by 65 (9 self)
This paper is about metric data structures in high-dimensional or non-Euclidean space that permit cached sufficient statistics accelerations of learning algorithms.
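The triangle inequality named in the title is the basic pruning rule behind metric indexes of this kind. The following is a minimal sketch of that rule only, not the anchors hierarchy itself: a single pivot, a cached table of pivot distances, and a range query that skips any point whose lower bound already exceeds the radius. The function names and the choice of Euclidean distance are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Example metric; any distance obeying the triangle inequality works."""
    return math.dist(a, b)

def build_pivot_table(points, pivot, dist=euclidean):
    """Cache d(x, pivot) for every database point x (done once, offline)."""
    return [dist(x, pivot) for x in points]

def range_search(points, table, pivot, query, radius, dist=euclidean):
    """Return (matches, number of distance evaluations performed)."""
    d_qp = dist(query, pivot)
    evaluations = 1  # the query-to-pivot distance
    matches = []
    for x, d_xp in zip(points, table):
        # Triangle inequality: d(q, x) >= |d(q, p) - d(x, p)|.
        # If this lower bound exceeds the radius, x cannot be a match,
        # so the (possibly expensive) distance d(q, x) is never computed.
        if abs(d_qp - d_xp) > radius:
            continue
        evaluations += 1
        if dist(query, x) <= radius:
            matches.append(x)
    return matches, evaluations
```

With points `[(0, 0), (1, 0), (5, 5), (10, 10)]`, pivot `(0, 0)`, query `(1, 1)` and radius `1.5`, the two distant points are rejected from the cached table alone, so only three distances are evaluated.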
Fast Image Search for Learned Metrics
Cited by 39 (7 self)
We introduce a method that enables scalable image search for learned metrics. Given pairwise similarity and dissimilarity constraints between some images, we learn a Mahalanobis distance function that captures the images’ underlying relationships well. To allow sub-linear time similarity search under the learned metric, we show how to encode the learned metric parameterization into randomized locality-sensitive hash functions. We further formulate an indirect solution that enables metric learning and hashing for vector spaces whose high dimensionality makes it infeasible to learn an explicit weighting over the feature dimensions. We demonstrate the approach applied to a variety of image datasets. Our learned metrics improve accuracy relative to commonly-used metric baselines, while our hashing construction enables efficient indexing with learned distances and very large databases.
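One common way to fold a Mahalanobis metric into a hash function is to factor the metric matrix as A = GᵀG and apply random-hyperplane hashing to Gx. The sketch below illustrates that idea under stated assumptions: G is taken as already learned and given, the hyperplanes are drawn from a standard Gaussian, and all function names are hypothetical. It is not claimed to reproduce the paper's exact construction.

```python
import random

def matvec(G, x):
    """Multiply matrix G (a list of rows) by vector x."""
    return [sum(g * v for g, v in zip(row, x)) for row in G]

def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian directions in the transformed space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_code(x, G, hyperplanes):
    """One bit per hyperplane r: 1 if r . (G x) >= 0, else 0."""
    gx = matvec(G, x)
    return tuple(
        1 if sum(r_i * v for r_i, v in zip(r, gx)) >= 0 else 0
        for r in hyperplanes
    )

def hamming(a, b):
    """Number of positions where two codes disagree."""
    return sum(u != v for u, v in zip(a, b))
```

Here `dim` must equal the number of rows of G. Items whose codes differ in few bits are candidates for being close under the metric defined by A, so a search can compare short codes first and evaluate the full distance only on those candidates.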
Excluded Middle Vantage Point Forests for Nearest Neighbor Search
- In DIMACS Implementation Challenge, ALENEX'99, 1999
Cited by 37 (1 self)
The excluded middle vantage point forest is a new data structure that supports worst case sublinear time searches in a metric space for nearest neighbors within a fixed radius τ of arbitrary queries. Worst case performance depends on the dataset but is not affected by the distribution of queries. Our analysis predicts vp-forest performance in simple settings such as L_p spaces with uniform random datasets, and experiments confirm these predictions. Another contribution of the analysis is a new perspective on the curse of dimensionality in the context of our methods and kd-trees as well. In our idealized setting the dataset is organized into a forest of O(N^(1-ρ)) trees, each of depth O(log N). Here ρ may be viewed as depending on τ, the distance function, and on the dataset. The radius of interest τ is an input to the organization process and the result is a linear space data structure specialized to answer queries within this distance. Searches then require O(N^(1-ρ) log N) time, or...
Locally Lifting the Curse of Dimensionality for Nearest Neighbor Search (Extended Abstract)
- In Proc. 11th ACM-SIAM Symposium on Discrete Algorithms (SODA'00), 1999
Cited by 24 (1 self)
We consider the problem of nearest neighbor search in the Euclidean hypercube [-1, +1]^d with uniform distributions, and the additional natural assumption that the nearest neighbor is located within a constant fraction R of the maximum interpoint distance in this space, i.e. within distance 2R√d of the query. We introduce the idea of aggressive pruning and give a family of practical algorithms, an idealized analysis, and describe experiments. Our main result is that search complexity, measured in terms of d-dimensional inner product operations, is (i) strongly sublinear with respect to the data set size n for moderate R, and (ii) asymptotically, and as a practical matter, independent of dimension. Given a random data set, a random query within distance 2R√d of some database element, and a randomly constructed data structure, the search succeeds with a specified probability, which is a parameter of the search algorithm. On average a search performs...
Kernelized locality-sensitive hashing for scalable image search
- IEEE International Conference on Computer Vision (ICCV), 2009
Cited by 20 (1 self)
Fast retrieval methods are critical for large-scale and data-driven vision applications. Recent work has explored ways to embed high-dimensional features or complex distance functions into a low-dimensional Hamming space where items can be efficiently searched. However, existing methods do not apply for high-dimensional kernelized data when the underlying feature embedding for the kernel is unknown. We show how to generalize locality-sensitive hashing to accommodate arbitrary kernel functions, making it possible to preserve the algorithm’s sub-linear time similarity search guarantees for a wide class of useful similarity functions. Since a number of successful image-based kernels have unknown or incomputable embeddings, this is especially valuable for image retrieval tasks. We validate our technique on several large-scale datasets, and show that it enables accurate and fast performance for example-based object classification, feature matching, and content-based retrieval.
Learning to hash with binary reconstructive embeddings
- In Proc. NIPS, 2009
Cited by 13 (1 self)
Fast retrieval methods are increasingly critical for many large-scale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. We develop a scalable coordinate-descent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. We present results over several domains to demonstrate that our method outperforms existing state-of-the-art techniques.
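The objective described in this abstract compares original distances with Hamming distances between binary codes. The sketch below only evaluates such a reconstruction error for a given set of codes; it does not implement the coordinate-descent learner. The scaling of both distances (half the squared Euclidean distance, and Hamming distance divided by the number of bits) is an assumption chosen for illustration, as are the function names.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions where two binary codes disagree."""
    return sum(u != v for u, v in zip(a, b))

def reconstruction_error(points, codes):
    """Sum over all pairs of (original distance - code distance) squared."""
    num_bits = len(codes[0])
    total = 0.0
    for i, j in combinations(range(len(points)), 2):
        # Assumed scaling: half the squared Euclidean distance.
        d_orig = 0.5 * sum((u - v) ** 2 for u, v in zip(points[i], points[j]))
        # Assumed scaling: Hamming distance normalized by code length.
        d_code = hamming(codes[i], codes[j]) / num_bits
        total += (d_orig - d_code) ** 2
    return total
```

A learner in this spirit would search for hash functions whose codes make this quantity small; codes that collapse all points to the same bit string score poorly because every code distance is zero.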
Fast similarity search for learned metrics
- 2007
Cited by 8 (1 self)
We propose a method to efficiently index into a large database of examples according to a learned metric. Given a collection of examples, we learn a Mahalanobis distance using an information-theoretic metric learning technique that adapts prior knowledge about pairwise distances to incorporate similarity and dissimilarity constraints. To enable sub-linear time similarity search under the learned metric, we show how to encode a learned Mahalanobis parameterization into randomized locality-sensitive hash functions. We further formulate an indirect solution that enables metric learning and hashing for sparse input vector spaces whose high dimensionality makes it infeasible to learn an explicit weighting over the feature dimensions. We demonstrate the approach applied to systems and image datasets, and show that our learned metrics improve accuracy relative to commonly-used metric baselines, while our hashing construction permits efficient indexing with a learned distance and very large databases.
Scalable similarity search in metric spaces
- In: Digital Library Architectures: Peer-to-Peer, Grid, and Service-Orientation, 6th Thematic Workshop of the EU Network of Excellence DELOS, S. Margherita di, 2004
Cited by 6 (1 self)
Similarity search in metric spaces represents an important paradigm for content-based retrieval of many applications. Existing centralized search structures can speed up retrieval, but they do not scale up to large volumes of data because the response time is linearly increasing with the size of the searched file. The proposed GHT* index is a scalable and distributed structure. By exploiting parallelism in a dynamic network of computers, the GHT* achieves practically constant search time for similarity range queries in data-sets of arbitrary size. The amount of replicated routing information on each server increases logarithmically. At the same time, the potential for interquery parallelism is increasing with the growing data-sets because the relative number of servers utilized by individual queries is decreasing. All these properties are verified by experiments on a prototype system using real-life data-sets.
Spatial Lesion Indexing for Medical Image Databases Using Force Histograms
- 2001 (Int. Conf. on Computer Vision and Pattern Recognition), Hawaii, Proceedings
Cited by 2 (1 self)
It is often difficult to come up with a well-principled approach to the selection of a spatial indexing mechanism for medical image databases. Spatial information about lesions in medical images is critically important in disease diagnosis and plays an important role in image retrieval. Unfortunately, the images are rarely indexed properly for clinically useful retrieval. One example is the well-known R-tree and its variants which index image objects based on their physical locations in an "absolute" way. However, such information is not meaningful in medical content-based image retrieval systems, and the approaches above suffer from problems caused by variations in object size and shape, imprecise image centering, etc. A more appropriate approach, which does not require object registration, is to model the spatial relationships between the lesions and anatomical landmarks. To convey diagnostic information, lesions must exist in certain locations with regard to the landmarks. In this paper, we show that the histogram of forces (which represents the relative position between two objects) provides an efficient spatial indexing mechanism in the medical domain.

