Similarity Indexing: Algorithms and Performance
 In Proceedings SPIE Storage and Retrieval for Image and Video Databases
, 1996
Cited by 121 (1 self)
Efficient indexing support is essential to allow content-based image and video databases using similarity-based retrieval to scale to large databases (tens of thousands up to millions of images). In this paper, we take an in-depth look at this problem. One of the major difficulties in solving this problem is the high dimension (6-100) of the feature vectors that are used to represent objects. We provide an overview of the work in computational geometry on this problem and highlight the results we found are most useful in practice, including the use of approximate nearest neighbor algorithms. We also present a variant of the optimized k-d tree we call the VAM k-d tree, and provide algorithms to create an optimized R-tree we call the VAMSplit R-tree. We found that the VAMSplit R-tree provided better overall performance than all competing structures we tested for main memory and secondary memory applications. We observed large improvements in performance relative to the R*-tree and SS-tree in secondary memory applications, and modest improvements relative to optimized k-d tree variants.
What is the Nearest Neighbor in High Dimensional Spaces?
, 2000
Cited by 120 (9 self)
Nearest neighbor search in high dimensional spaces is an interesting and important problem which is relevant for a wide variety of novel database applications. As recent results show, however, the problem is a very difficult one, not only with regard to the performance issue but also to the quality issue. In this paper, we discuss the quality issue and identify a new generalized notion of nearest neighbor search as the relevant problem in high dimensional space. In contrast to previous approaches, our new notion of nearest neighbor search does not treat all dimensions equally but uses a quality criterion to select relevant dimensions (projections) with respect to the given query. As an example of a useful quality criterion, we rate how well the data is clustered around the query point within the selected projection. We then propose an efficient and effective algorithm to solve the generalized nearest neighbor problem. Our experiments based on a number of real and synthetic data sets show that our new approach provides new insights into the nature of nearest neighbor search on high dimensional data.
The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces
 In Proceedings of ICDE’99
, 1999
Cited by 109 (11 self)
Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing, none of them is known to scale beyond 10-15 dimensional spaces. This paper introduces the hybrid tree, a multidimensional data structure for indexing high dimensional feature spaces. Unlike other multidimensional data structures, the hybrid tree cannot be classified as either a pure data partitioning (DP) index structure (e.g., R-tree, SS-tree, SR-tree) or a pure space partitioning (SP) one (e.g., KDB-tree, hB-tree); rather, it “combines” positive aspects of the two types of index structures into a single data structure to achieve search performance more scalable to high dimensionalities than either of the above techniques (hence, the name “hybrid”). Furthermore, unlike many data structures (e.g., distance based index structures like SS-tree, SR-tree), the hybrid tree can support queries based on arbitrary distance functions. Our experiments on “real” high dimensional large size feature databases demonstrate that the hybrid tree scales well to high dimensionality and large database sizes. It significantly outperforms both purely DP-based and SP-based index mechanisms as well as linear scan at all dimensionalities for large sized databases.
Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Objects
, 2001
Cited by 104 (8 self)
With the proliferation of wireless communications and the rapid advances in technologies for tracking the positions of continuously moving objects, algorithms for efficiently answering queries about large numbers of moving objects are increasingly needed. One such query is the reverse nearest neighbor (RNN) query that returns the objects that have a query object as their closest object. While algorithms have been proposed that compute RNN queries for non-moving objects, there have been no proposals for answering RNN queries for continuously moving objects. Another such query is the nearest neighbor (NN) query, which has been studied extensively and in many contexts. Like the RNN query, the NN query has not been explored for moving query and data points. This paper proposes an algorithm for answering RNN queries for continuously moving points in the plane. As a part of the solution to this problem and as a separate contribution, an algorithm for answering NN queries for continuously moving points is also proposed. The results of performance experiments are reported.
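To make the RNN definition in this abstract concrete, the following is a minimal brute-force sketch for static points. It is not the paper's algorithm for moving objects. The function name `reverse_nearest_neighbors`, the use of Euclidean distance, and the strict tie-breaking rule are all assumptions made for illustration.

```python
import math


def reverse_nearest_neighbors(points, query):
    """Return indices of the points whose closest object is the query.

    Hypothetical helper: brute force, O(n^2) distance computations.
    A point counts as a reverse nearest neighbor only if the query is
    strictly closer to it than every other data point (assumed tie rule).
    """
    result = []
    for i, p in enumerate(points):
        d_query = math.dist(p, query)
        # Distance from p to its closest fellow data point, if any.
        d_other = min(
            (math.dist(p, q) for j, q in enumerate(points) if j != i),
            default=math.inf,
        )
        if d_query < d_other:
            result.append(i)
    return result
```

The RNN set is not the same as the NN set. For the points (0, 0), (1, 0), (5, 0) and the query (0.4, 0), the nearest neighbor of the query is the first point alone, yet both the first and the second point have the query as their closest object.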
The A-tree: An Index Structure for High-Dimensional Spaces Using Relative Approximation
, 2000
Cited by 103 (0 self)
We propose a novel index structure, A-tree (Approximation tree), for similarity search of high-dimensional data. The basic idea of the A-tree is the introduction of Virtual Bounding Rectangles (VBRs), which contain and approximate MBRs and data objects. VBRs can be represented rather compactly, and thus affect the tree configuration both quantitatively and qualitatively. Firstly, since tree nodes can hold a large number of VBR entries, the fanout of nodes becomes large, which leads to fast search. More importantly, we have a free hand in arranging MBRs and VBRs in tree nodes. In the A-trees, nodes contain entries of an MBR and its children VBRs. Therefore, by fetching a node of an A-tree, we can obtain the information of the exact position of a parent MBR and the approximate position of its children. We have performed experiments using both synthetic and real data sets. For the real data sets, the A-tree outperforms the SR-tree and the VA-File across the whole range of dimensionality up to 64 dimensions, which is the highest dimension in our experiments. The A-tree achieves 77.3% (77.7%, resp.) savings in page accesses compared to the SR-tree (the VA-File, resp.) for 64-dimensional real data.
Dimensionality Reduction for Similarity Searching in Dynamic Databases
, 1998
Cited by 103 (5 self)
Databases are increasingly being used to store multimedia objects such as maps, images, audio and video. Storage and retrieval of these objects is accomplished using multidimensional index structures such as R*-trees and SS-trees. As dimensionality increases, query performance in these index structures degrades. This phenomenon, generally referred to as the dimensionality curse, can be circumvented by reducing the dimensionality of the data. Such a reduction is however accompanied by a loss of precision of query results. Current techniques such as QBIC use SVD transform-based dimensionality reduction to ensure high query precision. The drawback of this approach is that SVD is expensive to compute, and therefore not readily applicable to dynamic databases. In this paper, we propose novel techniques for performing SVD-based dimensionality reduction in dynamic databases. When the data distribution changes considerably so as to degrade query precision, we recompute the SVD transform a...
Independent Quantization: An Index Compression Technique for High-Dimensional Data Spaces
 IN ICDE
, 1999
Cited by 86 (21 self)
Two major approaches have been proposed to efficiently process queries in databases: Speeding up the search by using index structures, and speeding up the search by operating on a compressed database, such as a signature file. Both approaches have their limitations: Indexing techniques are inefficient in extreme configurations, such as high-dimensional spaces, where even a simple scan may be cheaper than an index-based search. Compression techniques are not very efficient in all other situations. We propose to combine both techniques to search for nearest neighbors in a high-dimensional space. For this purpose, we develop a compressed index, called the IQ-tree, with a three-level structure: The first level is a regular (flat) directory consisting of minimum bounding boxes, the second level contains data points in a compressed representation, and the third level contains the actual data. We overcome several engineering challenges in constructing an effective index structure of this type...
External Memory Data Structures
, 2001
Cited by 83 (37 self)
In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
Supporting Ranked Boolean Similarity Queries in MARS
, 1998
Cited by 76 (12 self)
To address the emerging needs of applications that require access to and retrieval of multimedia objects, we are developing the Multimedia Analysis and Retrieval System (MARS) [29]. In this paper, we concentrate on the retrieval subsystem of MARS and its support for content-based queries over image databases. Content-based retrieval techniques have been extensively studied for textual documents in the area of automatic information retrieval [40, 4]. This paper describes how these techniques can be adapted for ranked retrieval over image databases. Specifically, we discuss the ranking and retrieval algorithms developed in MARS based on the Boolean retrieval model and describe the results of our experiments that demonstrate the effectiveness of the developed model for image retrieval.
Indexing the Distance: An Efficient Method to KNN Processing
, 2001
Cited by 71 (15 self)
In this paper, we present an efficient method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional space. iDistance partitions the data and selects a reference point for each partition. The data in each cluster are transformed into a single-dimensional space based on their similarity with respect to a reference point. This allows the points to be indexed using a B+-tree structure and KNN search to be performed using one-dimensional range search. The choice of partition and reference point provides the iDistance technique with degrees of freedom most other techniques do not have. We describe how appropriate choices here can effectively adapt the index structure to the data distribution. We conducted extensive experiments to evaluate the iDistance technique, and report results demonstrating its effectiveness.
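The mapping described in this abstract can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper: a sorted Python list searched with `bisect` stands in for the B+-tree, each point is assigned to its closest reference point, the one-dimensional key is `i * c + dist(p, ref_i)` with a constant `c` assumed larger than any partition radius, and KNN is answered by growing a search radius in fixed steps. All function and parameter names are hypothetical.

```python
import bisect
import math


def build_index(points, refs, c):
    """Map every point to a 1-D key: partition number * c + distance to its reference."""
    entries, radii = [], [0.0] * len(refs)
    for p in points:
        i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
        d = math.dist(p, refs[i])
        radii[i] = max(radii[i], d)      # farthest point seen in partition i
        entries.append((i * c + d, p))   # c keeps partition key ranges disjoint
    entries.sort()
    keys = [k for k, _ in entries]       # sorted keys stand in for B+-tree leaves
    return keys, entries, radii


def range_search(index, refs, c, q, r):
    """Return all points within distance r of q using one 1-D range scan per partition."""
    keys, entries, radii = index
    hits = []
    for i, ref in enumerate(refs):
        dq = math.dist(q, ref)
        if dq - r > radii[i]:
            continue                     # query sphere cannot reach this partition
        # By the triangle inequality, any point p within r of q satisfies
        # dq - r <= dist(p, ref) <= dq + r, so only that key range is scanned.
        lo = i * c + max(dq - r, 0.0)
        hi = i * c + min(dq + r, radii[i])
        a = bisect.bisect_left(keys, lo)
        b = bisect.bisect_right(keys, hi)
        hits.extend(p for _, p in entries[a:b] if math.dist(p, q) <= r)
    return hits


def knn(index, refs, c, q, k, step=0.5):
    """Grow the radius until at least k points (or all points) are found."""
    keys = index[0]
    r = step
    while True:
        hits = range_search(index, refs, c, q, r)
        if len(hits) >= min(k, len(keys)):
            return sorted(hits, key=lambda p: math.dist(p, q))[:k]
        r += step
```

A possible call, with made-up data: `knn(build_index(pts, refs, 100.0), refs, 100.0, (10.5, 10.5), 2)`. Because every point within the final radius is returned by the range search, the k closest hits are the true k nearest neighbors under these assumptions.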