Results 1-10 of 61
Similarity search in high dimensions via hashing, 1999
Cited by 641 (10 self)
The nearest or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. Unfortunately, all known techniques for solving this problem fall prey to the "curse of dimensionality." That is, the data structures scale poorly with data dimensionality; in fact, if the number of dimensions exceeds 10 to 20, searching in k-d trees and related structures involves the inspection of a large fraction of the database, thereby doing no better than brute-force linear search. It has been suggested that since the selection of features and the choice of a distance metric in typical applications is rather heuristic, determining an approximate nearest neighbor should suffice for most practical purposes. In this paper, we examine a novel scheme for approximate similarity search based on hashing. The basic idea is to hash the points
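The abstract above is cut off before it describes the hashing scheme, so the following is only an illustrative sketch of hashing-based approximate search, not the paper's method. It assumes binary vectors under Hamming distance and uses bit sampling, a standard locality-sensitive hash family for that distance. The names `build_tables` and `approx_nearest` and all parameter values are hypothetical.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of coordinates in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def build_tables(points, n_tables=6, bits_per_key=4, seed=0):
    """Build several hash tables, each keyed on a random subset of coordinates.

    Points that differ in few coordinates are likely to agree on a random
    subset, so they tend to land in the same bucket in at least one table.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(n_tables):
        coords = rng.sample(range(dim), bits_per_key)
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            buckets[tuple(p[c] for c in coords)].append(idx)
        tables.append((coords, buckets))
    return tables


def approx_nearest(tables, points, q):
    """Return the index of the closest point among colliding candidates.

    Only points sharing a bucket with q are compared exactly, so the answer
    is approximate; None is returned when no table yields a candidate.
    """
    candidates = set()
    for coords, buckets in tables:
        candidates.update(buckets.get(tuple(q[c] for c in coords), ()))
    if not candidates:
        return None
    return min(candidates, key=lambda i: hamming(points[i], q))
```

More tables raise the chance that a true near neighbor collides with the query, while more bits per key shrink the buckets; the values used here are arbitrary.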
Searching in metric spaces, 2001
Cited by 436 (38 self)
The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. Many solutions have been proposed in different areas, in many cases without cross-knowledge. Because of this, the same ideas have been reconceived several times, and very different presentations have been given for the same approaches. We present some basic results that explain the intrinsic difficulty of the search problem. This includes a quantitative definition of the elusive concept of “intrinsic dimensionality.” We also present a unified
Index-driven similarity search in metric spaces
ACM Transactions on Database Systems, 2003
Cited by 192 (8 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied on each of the methods presented, provided a suitable search hierarchy is defined.
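As a small illustration of the distance-based indexing idea mentioned in this abstract, the sketch below stores each object's distance to a single pivot and uses the triangle inequality to skip objects during a range query. It is a minimal example under assumptions, not one of the surveyed methods: the function names are hypothetical, and `dist` can be any function satisfying the metric axioms.

```python
def build_pivot_index(objects, pivot, dist):
    """Precompute d(o, pivot) for every object; this is the whole 'index'."""
    return [(dist(o, pivot), o) for o in objects]


def range_query(index, pivot, dist, q, radius):
    """Return (objects within radius of q, number of distance computations).

    For a metric d, the triangle inequality gives
        d(q, o) >= |d(q, pivot) - d(o, pivot)|,
    so any object whose stored distance differs from d(q, pivot) by more
    than the radius can be discarded without computing d(q, o).
    """
    d_qp = dist(q, pivot)
    computed = 1  # the query-to-pivot distance
    results = []
    for d_op, o in index:
        if abs(d_qp - d_op) > radius:
            continue  # pruned by the lower bound, no distance computed
        computed += 1
        if dist(q, o) <= radius:
            results.append(o)
    return results, computed
```

The second return value makes the saving visible: it counts how many times `dist` was evaluated at query time, which is the cost that distance-based indexes aim to reduce.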
Indexing Large Metric Spaces for Similarity Search Queries, 1999
Cited by 93 (0 self)
In many database applications, one of the common queries is to find approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query image. Distance-based index structures are proposed for applications where the distance computations between objects of the data domain are expensive (such as high-dimensional data), and the distance function used is metric. In this paper, we consider using distance-based index structures for similarity queries on large metric spaces. We elaborate on the approach of using reference points (vantage points) to partition the data space into spherical shell-like regions in a hierarchical manner. We introduce the multi-vantage point tree structure (mvp-tree) that uses more than one vantage point to partition the space into spherical cuts at each level. In answering similarity-based queries, the mvp-tree also utilizes the precomputed (at con...
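The vantage-point partitioning described above can be sketched with a plain single-vantage-point tree. This is a simplification: the mvp-tree of the abstract uses more than one vantage point per node and reuses precomputed distances, neither of which is shown here. The dictionary layout and function names are assumptions made for illustration.

```python
def build_vp_tree(points, dist):
    """Recursively split points by a spherical cut around a vantage point."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]
    if not rest:
        return {"vp": vantage, "mu": 0.0, "inside": None, "outside": None}
    # The median distance to the vantage point is the radius of the cut.
    dists = sorted(dist(vantage, p) for p in rest)
    mu = dists[len(dists) // 2]
    inside = [p for p in rest if dist(vantage, p) < mu]
    outside = [p for p in rest if dist(vantage, p) >= mu]
    return {
        "vp": vantage,
        "mu": mu,
        "inside": build_vp_tree(inside, dist),
        "outside": build_vp_tree(outside, dist),
    }


def range_search(node, q, radius, dist, out):
    """Collect into `out` every point within `radius` of q."""
    if node is None:
        return out
    d = dist(q, node["vp"])
    if d <= radius:
        out.append(node["vp"])
    # By the triangle inequality, a match inside the cut needs d - radius < mu.
    if d - radius < node["mu"]:
        range_search(node["inside"], q, radius, dist, out)
    # Likewise, a match outside the cut needs d + radius >= mu.
    if d + radius >= node["mu"]:
        range_search(node["outside"], q, radius, dist, out)
    return out
```

A call such as `range_search(tree, q, radius, dist, [])` visits both children only when the query ball straddles the cut; otherwise one whole subtree is skipped.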
Time-Parameterized Queries in Spatio-Temporal Databases, 2002
Cited by 81 (4 self)
Time-parameterized queries (TP queries for short) retrieve (i) the actual result at the time that the query is issued, (ii) the validity period of the result given the current motion of the query and the database objects, and (iii) the change that causes the expiration of the result. Due to the highly dynamic nature of several spatio-temporal applications, TP queries are important both as standalone methods, as well as building blocks of more complex operations. However, little work has been done towards their efficient processing. In this paper, we propose a general framework that covers time-parameterized variations of the most common spatial queries, namely window queries, k-nearest neighbors and spatial joins. In particular, each of these TP queries is reduced to nearest neighbor search where the distance functions are defined according to the query type. This reduction allows the application and extension of well-known branch-and-bound techniques to the current problem. The proposed methods can be applied with mobile queries, mobile objects or both, given a suitable indexing method. Our experimental evaluation is based on R-trees and their extensions for dynamic objects.
Trading Quality for Time with Nearest-Neighbor Search
In International Conference on Extending Database Technology: Advances in Database Technology, 2000
Cited by 54 (4 self)
In many situations, users would readily accept an approximate query result if evaluation of the query becomes faster. In this article, we investigate approximate evaluation techniques based on the VA-File for Nearest-Neighbor Search (NN-Search). The VA-File contains approximations of feature points. These approximations frequently suffice to eliminate the vast majority of points in a first phase. Then, a second phase identifies the NN by computing exact distances of all remaining points. To develop approximate query-evaluation techniques, we proceed in two steps: first, we derive an analytic model for VA-File-based NN-search. This is to investigate the relationship between approximation granularity, effectiveness of the filtering step and search performance. In more detail, we develop formulae for the distribution of the error of the bounds and the duration of the different phases of query evaluation. Based on these results, we develop different approximate query evaluat...
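The two-phase search described in this abstract can be illustrated with a toy exact version: each point is approximated by the grid cell it falls in, phase one filters on distance bounds derived from the cells, and phase two computes exact distances for the survivors. The grid layout (equal-width cells over the unit cube), the function names, and the parameter choices are assumptions for illustration; the approximate variants the paper develops are not reproduced.

```python
import math


def quantize(points, bits):
    """Approximate each coordinate in [0, 1] by its cell number (2**bits cells)."""
    cells = 1 << bits
    width = 1.0 / cells
    approx = [tuple(min(int(x / width), cells - 1) for x in p) for p in points]
    return approx, width


def bounds(q, cell, width):
    """Lower and upper bounds on the distance from q to any point in `cell`."""
    lb2 = ub2 = 0.0
    for x, c in zip(q, cell):
        left, right = c * width, (c + 1) * width
        if x < left:
            near = left - x
        elif x > right:
            near = x - right
        else:
            near = 0.0
        far = max(abs(x - left), abs(x - right))
        lb2 += near * near
        ub2 += far * far
    return math.sqrt(lb2), math.sqrt(ub2)


def nearest(points, approx, width, q):
    """Exact nearest neighbor of q using filter-then-refine."""
    # Phase 1: scan approximations only. A point whose lower bound exceeds
    # the smallest upper bound seen cannot be the nearest neighbor.
    bnds = [bounds(q, cell, width) for cell in approx]
    best_ub = min(ub for _, ub in bnds)
    candidates = sorted((lb, i) for i, (lb, _) in enumerate(bnds) if lb <= best_ub)
    # Phase 2: visit survivors by increasing lower bound, computing exact
    # distances, and stop once the lower bound passes the best distance found.
    best_i, best_d = None, math.inf
    for lb, i in candidates:
        if lb > best_d:
            break
        d = math.dist(points[i], q)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```

Increasing `bits` tightens the bounds, so fewer points reach phase two, at the price of a larger approximation file.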
A Cost Model for Query Processing in High-Dimensional Data Spaces, 2000
Cited by 53 (1 self)
During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography or molecular biology. An important research issue in the field of multimedia databases is similarity search in large data sets. Most current approaches addressing similarity search use the so-called feature approach which transforms important properties of the stored objects into points of a high-dimensional space (feature vectors). Thus, the similarity search is transformed into a neighborhood search in the feature space. For the management of the feature vectors, multidimensional index structures are usually applied. The performance of query processing can be substantially improved by opti...
Deflating the dimensionality curse using multiple fractal dimensions
In Proceedings of the International Conference on Data Engineering
An Efficient Cost Model for Optimization of Nearest Neighbor Search in Low and Medium Dimensional Spaces
IEEE TKDE, 2004
Cited by 38 (3 self)
Existing models for nearest neighbor search in multidimensional spaces are not appropriate for query optimization because they either lead to erroneous estimation, or involve complex equations that are expensive to evaluate in real time. This paper proposes an alternative method that captures the performance of nearest neighbor queries using approximation. For uniform data, our model involves closed formulae that are very efficient to compute and accurate for up to 10 dimensions. Further, the proposed equations can be applied on non-uniform data with the aid of histograms. We demonstrate the effectiveness of the model by using it to solve several optimization problems related to nearest neighbor search. To appear in IEEE TKDE
Searching in Metric Spaces with User-Defined and Approximate Distances
ACM Transactions on Database Systems, 2002
Cited by 34 (6 self)
Metric access methods (MAMs), such as the M-tree, are powerful index structures for supporting similarity queries on metric spaces, which represent a common abstraction for those searching problems that arise in many modern application areas, such as multimedia, data mining, decision support, pattern recognition, and genomic databases. As compared to multidimensional (spatial) access methods (SAMs), MAMs are more general, yet they are reputed to lose in flexibility, since it is commonly deemed that they can only answer queries using the same distance function used to build the index. In this paper we show that this limitation is only apparent (thus MAMs are far more flexible than believed) and extend the M-tree so as to be able to support user-defined distance criteria, approximate distance functions to speed up query evaluation, as well as dissimilarity functions which are not metrics. The so-extended M-tree, also called QIC-M-tree, can deal with three distinct distances at a time: 1) a query (user-defined) distance, 2) an index distance (used to build the tree), and 3) a comparison (approximate) distance (used to quickly discard from the search uninteresting parts of the tree). We develop an analytical cost model that accurately characterizes the performance of QIC-M-tree and validate such model through extensive experimentation on real metric data sets. In particular, our analysis is able to predict the best evaluation strategy (i.e., which distances to use) under a variety of configurations, by properly taking into account relevant factors such as the distribution of distances, the cost of computing distances, and the actual index structure. We also prove that the overall saving in CPU search costs when using an approximate distance can be estimated by using information on the data set only, thus...