Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces
, 1998
Cited by 173 (9 self)
We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high-dimensional Euclidean space, we want to construct a space-efficient data structure that would allow us to search, given a query vector, for the closest or nearly closest vector in the database. We also address this problem when distances are measured by the L1 norm, and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database, and search algorithms that run in time nearly linear or nearly quadratic in the dimension (depending on the case; the extra factors are polylogarithmic in the size of the database).
Computer Science Department, Technion-IIT, Haifa 32000, Israel. Email: eyalk@cs.technion.ac.il. Bell Communications Research, MCC-1C365B, 445 South Street, Morristown, NJ ...
Ranking in Spatial Databases
, 1995
Cited by 164 (21 self)
An algorithm for ranking spatial objects according to increasing distance from a query object is introduced and analyzed. The algorithm makes use of a hierarchical spatial data structure. The intended application area is a database environment, where the spatial data structure serves as an index. The algorithm is incremental in the sense that objects are reported one by one, so that a query processor can use the algorithm in a pipelined fashion for complex queries involving proximity. It is well suited for k-nearest-neighbor queries, and has the property that k need not be fixed in advance.
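The incremental ranking idea can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the hierarchical index is a hypothetical binary tree of bounding boxes built by median splits (`Node`, `build`, and `incremental_nn` are illustrative names), the objects are points, and distance is Euclidean. One priority queue holds both tree nodes, keyed by the minimum distance from the query to their bounding box, and objects, keyed by their actual distance; when an object reaches the head of the queue, nothing still queued can be closer, so it is reported.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    lo: Point                                              # lower corner of the bounding box
    hi: Point                                              # upper corner of the bounding box
    points: List[Point] = field(default_factory=list)      # payload (leaves only)
    children: List["Node"] = field(default_factory=list)   # subtrees (inner nodes only)


def build(points: Sequence[Point], leaf_size: int = 2, depth: int = 0) -> Node:
    """Hypothetical index: median splits cycling through the axes, one bounding box per node."""
    dims = len(points[0])
    lo = tuple(min(p[i] for p in points) for i in range(dims))
    hi = tuple(max(p[i] for p in points) for i in range(dims))
    if len(points) <= leaf_size:
        return Node(lo, hi, points=list(points))
    axis = depth % dims
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(lo, hi, children=[build(pts[:mid], leaf_size, depth + 1),
                                  build(pts[mid:], leaf_size, depth + 1)])


def point_dist(q: Point, p: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q, p)))


def box_dist(q: Point, lo: Point, hi: Point) -> float:
    """Smallest Euclidean distance from q to any point of the box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def incremental_nn(root: Node, q: Point) -> Iterator[Tuple[float, Point]]:
    """Yield (distance, point) pairs in order of increasing distance from q."""
    tie = itertools.count()  # tie-breaker so the heap never compares nodes or points
    heap: list = [(box_dist(q, root.lo, root.hi), next(tie), root, None)]
    while heap:
        d, _, node, point = heapq.heappop(heap)
        if point is not None:
            # An object at the head of the queue is no farther than anything still queued.
            yield d, point
            continue
        for p in node.points:
            heapq.heappush(heap, (point_dist(q, p), next(tie), None, p))
        for child in node.children:
            heapq.heappush(heap, (box_dist(q, child.lo, child.hi), next(tie), child, None))
```

Because `incremental_nn` is a generator, a caller can take the first k results with `itertools.islice(incremental_nn(root, q), k)` or simply keep pulling, so k need not be chosen up front. Both distance functions sum the per-axis terms the same way, so a box's bound never exceeds the distance of a point inside it, even in floating point.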
Two Algorithms for Nearest-Neighbor Search in High Dimensions
, 1997
Cited by 150 (0 self)
Representing data as points in a high-dimensional space, so as to use geometric methods for indexing, is an algorithmic technique with a wide array of uses. It is central to a number of areas such as information retrieval, pattern recognition, and statistical data analysis; many of the problems arising in these applications can involve several hundred or several thousand dimensions. We consider the nearest-neighbor problem for d-dimensional Euclidean space: we wish to preprocess a database of n points so that, given a query point, one can efficiently determine its nearest neighbors in the database. There is a large literature on algorithms for this problem, in both the exact and approximate cases. The more sophisticated algorithms typically achieve a query time that is logarithmic in n at the expense of an exponential dependence on the dimension d; indeed, even the average-case analysis of heuristics such as k-d trees reveals an exponential dependence on d in the query time. In this work ...
ANN: A Library for Approximate Nearest Neighbor Searching
, 1997
Cited by 140 (9 self)
There are no exponential factors in space, implying that the data structure is practical even for very large data sets in high-dimensional spaces, irrespective of ε. ANN is written as a testbed for a class of nearest neighbor searching algorithms, particularly those based on orthogonal decompositions of space. These include k-d trees [3, 4], balanced box-decomposition trees [2] and other related spatial data structures (see Samet [5]). The library supports a number of different methods for building search structures. It also supports two methods for searching these structures: standard tree-ordered search [1] and priority search [2]. In priority search, the cells of the data structure are visited in increasing order of distance from the query point. In addition to the library, two programs are provided for testing and evaluating the performance of various search methods. The first, called ann...
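A rough Python sketch of priority search on a k-d tree with a (1 + ε) stopping rule follows. It is not ANN's C++ interface: `KDNode`, `build_kd`, and `priority_search` are hypothetical names, the splitting rule (median along the widest coordinate) is just one simple choice, and only the single nearest neighbor is returned. Cells are visited in increasing order of a lower bound on their distance from the query, maintained incrementally from per-axis offsets, and the search stops once that bound times (1 + ε) reaches the best distance found, so the reported point is within a factor (1 + ε) of the true nearest distance.

```python
import heapq
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode:
    """Internal node (axis, cut, children) or leaf (bucket of points)."""

    def __init__(self, axis: int = -1, cut: float = 0.0,
                 left: Optional["KDNode"] = None, right: Optional["KDNode"] = None,
                 bucket: Optional[List[Point]] = None) -> None:
        self.axis, self.cut = axis, cut
        self.left, self.right = left, right
        self.bucket = bucket


def build_kd(points: Sequence[Point], leaf_size: int = 4) -> KDNode:
    if len(points) <= leaf_size:
        return KDNode(bucket=list(points))
    dims = len(points[0])
    # Split on the coordinate with the largest spread, at the median.
    axis = max(range(dims), key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    cut = pts[mid][axis]
    # Left subtree holds coordinates <= cut, right subtree coordinates >= cut.
    return KDNode(axis, cut, build_kd(pts[:mid], leaf_size), build_kd(pts[mid:], leaf_size))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def priority_search(root: KDNode, q: Point, eps: float = 0.0) -> Tuple[Point, float]:
    """Return a point within a factor (1 + eps) of the true nearest-neighbor distance."""
    best, best_d2 = None, math.inf
    tie = itertools.count()
    # Heap entries: (squared lower bound to cell, tie-breaker, node, per-axis offsets to cell).
    heap = [(0.0, next(tie), root, (0.0,) * len(q))]
    while heap:
        lb2, _, node, off = heapq.heappop(heap)
        # Stop once no unvisited cell can beat the best by more than a factor (1 + eps).
        if lb2 * (1.0 + eps) ** 2 >= best_d2:
            break
        while node.bucket is None:  # descend to the leaf on the query's side
            diff = q[node.axis] - node.cut
            near, far = (node.left, node.right) if diff <= 0 else (node.right, node.left)
            # Incremental distance: only this axis's offset changes for the far cell.
            far_off = list(off)
            far_off[node.axis] = abs(diff)
            far_lb2 = lb2 - off[node.axis] ** 2 + diff ** 2
            heapq.heappush(heap, (far_lb2, next(tie), far, tuple(far_off)))
            node = near  # the near cell keeps the parent's bound and offsets
        for p in node.bucket:
            d2 = sq_dist(q, p)
            if d2 < best_d2:
                best, best_d2 = p, d2
    return best, math.sqrt(best_d2)
```

With `eps=0.0` the search is exact; larger values let it stop earlier and visit fewer cells at the cost of a possibly farther answer.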
Object retrieval with large vocabularies and fast spatial matching
- In Proc. IEEE Conf. on Computer Vision and Pattern Recognition
, 2007
Cited by 139 (14 self)
In this paper, we present a large-scale object retrieval system. The user supplies a query object by selecting a region of a query image, and the system returns a ranked list of images that contain the same object, retrieved from a large corpus. We demonstrate the scalability and performance of our system on a dataset of over 1 million images crawled from the photo-sharing site Flickr [3], using Oxford landmarks as queries. Building an image-feature vocabulary is a major time and performance bottleneck, due to the size of our dataset. To address this problem we compare different scalable methods for building a vocabulary and introduce a novel quantization method based on randomized trees which we show outperforms the current state-of-the-art on an extensive ...
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
, 2008
Cited by 131 (1 self)
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
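The title refers to hashing algorithms. As a hedged illustration only, here is a minimal sketch of basic locality-sensitive hashing for Euclidean distance with Gaussian (2-stable) random projections; it is not the article's near-optimal construction. The class name `GaussianLSH` and the parameters `k` (hashes concatenated per table), `L` (number of tables), and `w` (bucket width) are illustrative assumptions. Nearby points are more likely to share a bucket in at least one table, so a query computes exact distances only to the points it collides with.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class GaussianLSH:
    """Hypothetical sketch: L hash tables, each keyed by k concatenated projection hashes."""

    def __init__(self, dim: int, k: int = 4, L: int = 8, w: float = 4.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = w
        # Table t uses k functions h(v) = floor((a . v + b) / w), a ~ N(0, I), b ~ U[0, w].
        self.funcs = [[([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                       for _ in range(k)] for _ in range(L)]
        self.tables: List[Dict[Tuple[int, ...], List[int]]] = [defaultdict(list) for _ in range(L)]
        self.points: List[Point] = []

    def _key(self, t: int, v: Sequence[float]) -> Tuple[int, ...]:
        return tuple(math.floor((sum(a_i * v_i for a_i, v_i in zip(a, v)) + b) / self.w)
                     for a, b in self.funcs[t])

    def insert(self, v: Sequence[float]) -> int:
        idx = len(self.points)
        self.points.append(tuple(v))
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(idx)
        return idx

    def query(self, q: Sequence[float]) -> Optional[int]:
        # Union the buckets q falls into, then rank only those candidates exactly.
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), ()))
        if not candidates:
            return None
        return min(candidates, key=lambda i: math.dist(q, self.points[i]))
```

`query` returns `None` when no bucket matches, which is the price of not scanning the whole dataset; raising `L` increases the chance of a collision with a true near neighbor at the cost of memory.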
An Efficient k-Means Clustering Algorithm: Analysis and Implementation
, 2000
Cited by 129 (3 self)
K-means clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real...
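For reference, here is a minimal Python sketch of the plain Lloyd iteration that the filtering algorithm implements. It assigns points to centers by brute force; the filtering algorithm instead uses a kd-tree over the data points to prune candidate centers, which is not shown here. The function name `lloyd` and the choice to pass the initial centers in explicitly are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def lloyd(points: Sequence[Point], centers: Sequence[Point],
          max_iters: int = 100) -> Tuple[List[Point], float]:
    """Plain Lloyd iteration from the given initial centers (brute-force assignment)."""
    centers = [tuple(c) for c in centers]
    k = len(centers)
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest center.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: move each center to its cluster's centroid (keep it if the cluster is empty).
        new_centers = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:  # converged: the next assignment would be identical
            break
        centers = new_centers
    # Objective from the abstract: mean squared distance from each point to its nearest center.
    mse = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points) / len(points)
    return centers, mse
```

Lloyd's algorithm only converges to a local minimum, so the result depends on the initial centers supplied by the caller.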
Index-driven similarity search in metric spaces
- ACM Transactions on Database Systems
, 2003
Cited by 118 (6 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied on each of the methods presented, provided a suitable search hierarchy is defined.
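As a small, hypothetical illustration of distance-based indexing (not the article's search-hierarchy framework), the sketch below precomputes distances from every object to a few pivot objects and uses the triangle inequality to discard objects during a range query without evaluating d(q, o). Edit distance on strings stands in for the metric d; `PivotTable` and `levenshtein` are illustrative names. Only the metric axioms are relied on, so any distance satisfying them could be plugged in.

```python
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")


def levenshtein(a: str, b: str) -> int:
    """Edit distance: a metric on strings, used here as the example distance d."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class PivotTable(Generic[T]):
    """Hypothetical distance-based index: stores d(o, p) for each object o and pivot p."""

    def __init__(self, objects: Sequence[T], pivots: Sequence[T],
                 dist: Callable[[T, T], float]) -> None:
        self.objects = list(objects)
        self.pivots = list(pivots)
        self.dist = dist
        self.table = [[dist(o, p) for p in self.pivots] for o in self.objects]

    def range_search(self, q: T, r: float) -> List[T]:
        """Return every object o with d(q, o) <= r."""
        dq = [self.dist(q, p) for p in self.pivots]
        hits = []
        for o, row in zip(self.objects, self.table):
            # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)| for every pivot p,
            # so o is discarded without computing d(q, o) if any bound exceeds r.
            if any(abs(a - b) > r for a, b in zip(dq, row)):
                continue
            if self.dist(q, o) <= r:
                hits.append(o)
        return hits
```

The filtering step costs one distance per pivot at query time; the saving comes from every object whose pivot bounds already exceed r.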
ImageRover: A Content-Based Image Browser for the World Wide Web
- In Proc. IEEE Workshop on Content-based Access of Image and Video Libraries
, 1997
Cited by 117 (3 self)
ImageRover is a search-by-image-content navigation tool for the World Wide Web. To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that selects the distance metrics appropriate for a particular query.
Keywords: Image databases, query by image content, content-based retrieval, world wide web search engines.
1 Introduction
For a while now there have been software "robots" roving the World Wide Web (WWW) collecting index information about th...
Real-time texture synthesis by patch-based sampling
- ACM Transactions on Graphics
, 2001
Cited by 105 (9 self)
We present a patch-based sampling algorithm for synthesizing textures from an input sample texture. The patch-based sampling algorithm is fast. Using patches of the sample texture as building blocks for texture synthesis, this algorithm makes high-quality texture synthesis a real-time process. For generating textures of the same size and comparable (or better) quality, patch-based sampling is orders of magnitude faster than existing texture synthesis algorithms. The patch-based sampling algorithm synthesizes high-quality textures for a wide variety of textures ranging from regular to stochastic. By sampling patches according to a non-parametric estimation of the local conditional MRF density, we avoid mismatching features across patch boundaries. Moreover, the patch-based sampling algorithm remains effective when pixel-based non-parametric sampling algorithms fail to produce good results. For natural textures, the results of the patch-based sampling look subjectively better.

