Results 1–10 of 28
The V*-Diagram: A Query-Dependent Approach to Moving KNN Queries
, 2008
"... The moving k nearest neighbor (MkNN) query finds the k nearest neighbors of a moving query point continuously. The high potential of reducing the query processing cost as well as the large spectrum of associated applications have attracted considerable attention to this query type from the database ..."
Abstract

Cited by 18 (4 self)
 Add to MetaCart
The moving k nearest neighbor (MkNN) query finds the k nearest neighbors of a moving query point continuously. The high potential of reducing the query processing cost as well as the large spectrum of associated applications have attracted considerable attention to this query type from the database community. This paper presents an incremental safe-region-based technique for answering MkNN queries, called the V*-Diagram. In general, a safe region is a set of points where the query point can move without changing the query answer. Traditional safe-region approaches compute a safe region based on the data objects but independent of the query location. Our approach exploits the current knowledge of the query point and the search space in addition to the data objects. As a result, the V*-Diagram has much smaller I/O and computation costs than existing methods. The experimental results show that the V*-Diagram outperforms the best existing technique by two orders of magnitude.
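The safe-region principle the abstract builds on can be sketched in a few lines of Python. This is the generic conservative disk-shaped safe region, not the V*-Diagram itself; the point data and helper names are illustrative:

```python
import math

def knn(points, q, k):
    """Brute-force k nearest neighbors of query point q."""
    return sorted(points, key=lambda p: math.dist(p, q))[:k]

def safe_radius(points, q, k):
    """A conservative disk-shaped safe region around q: while the query
    moves less than (d_{k+1} - d_k) / 2, where d_i is the distance to
    the i-th nearest point, the kNN answer computed at q cannot change
    (by the triangle inequality)."""
    d = sorted(math.dist(p, q) for p in points)
    return (d[k] - d[k - 1]) / 2

points = [(0, 0), (4, 0), (10, 0), (20, 0)]
q = (1, 0)
answer = knn(points, q, 2)      # -> [(0, 0), (4, 0)]
r = safe_radius(points, q, 2)   # -> 3.0
q_moved = (1 + 0.9 * r, 0)      # still inside the safe region
assert set(knn(points, q_moved, 2)) == set(answer)
```

A real MkNN processor would recompute the answer, and a new region, only when the query crosses the region boundary; the V*-Diagram's contribution is making the region query-dependent and incrementally maintainable.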
Peer-to-peer similarity search in metric spaces
In Proceedings of VLDB’07
, 2007
"... This paper addresses the efficient processing of similarity queries in metric spaces, where data is horizontally distributed across a P2P network. The proposed approach does not rely on arbitrary data movement, hence each peer joining the network autonomously stores its own data. We present SIMPEER, ..."
Abstract

Cited by 9 (5 self)
 Add to MetaCart
This paper addresses the efficient processing of similarity queries in metric spaces, where data is horizontally distributed across a P2P network. The proposed approach does not rely on arbitrary data movement; hence each peer joining the network autonomously stores its own data. We present SIMPEER, a novel framework that dynamically clusters peer data in order to build distributed routing information at the super-peer level. SIMPEER allows the evaluation of range and nearest neighbor queries in a distributed manner that reduces communication cost, network latency, bandwidth consumption and computational overhead at each individual peer. SIMPEER utilizes a set of distributed statistics and guarantees that all objects similar to the query are retrieved, without necessarily flooding the network during query processing. The statistics are employed for estimating an adequate query radius for k-nearest neighbor queries, transforming them into range queries. Our experimental evaluation employs both real-world and synthetic data collections, and our results show that SIMPEER performs efficiently, even in the case of a high degree of distribution.
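The kNN-to-range transformation mentioned above can be sketched as follows. This is a centralized toy, with the initial radius r0 standing in for SIMPEER's statistics-based estimate:

```python
import math

def range_query(data, q, r):
    """All objects within distance r of q (in SIMPEER this would be
    routed through super-peers rather than scanned locally)."""
    return [p for p in data if math.dist(p, q) <= r]

def knn_via_range(data, q, k, r0):
    """Answer a kNN query with range queries: start from an estimated
    radius r0 and double it until at least k objects are covered, then
    keep the k closest.  A good r0 (from distributed statistics) means
    few, cheap iterations and little over-fetching."""
    r = r0
    hits = range_query(data, q, r)
    while len(hits) < k:
        r *= 2
        hits = range_query(data, q, r)
    return sorted(hits, key=lambda p: math.dist(p, q))[:k]

data = [(i, 0) for i in range(10)]
print(knn_via_range(data, (0, 0), 3, r0=1.0))  # -> [(0, 0), (1, 0), (2, 0)]
```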
Similarity search on Bregman divergence: Towards non-metric indexing
 In VLDB
, 2009
"... In this paper, we examine the problem of indexing over nonmetric distance functions. In particular, we focus on a general class of distance functions, namely Bregman Divergence [6], to support nearest neighbor and range queries. Distance functions such as KLdivergence and ItakuraSaito distance, a ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
In this paper, we examine the problem of indexing over non-metric distance functions. In particular, we focus on a general class of distance functions, namely Bregman divergences [6], to support nearest neighbor and range queries. Distance functions such as the KL-divergence and the Itakura-Saito distance are special cases of Bregman divergence, with wide applications in statistics, speech recognition and time series analysis, among others. Unlike in metric spaces, key properties such as the triangle inequality and distance symmetry do not hold for such distance functions. A direct adaptation of existing indexing infrastructure developed for metric spaces is thus not possible. We devise a novel solution to handle this class of distance measures by expanding and mapping points in the original space to a new extended space. Subsequently, we show how state-of-the-art tree-based indexing methods, for low to moderate dimensional datasets, and vector approximation file (VA-file) methods, for high dimensional datasets, can be adapted to this extended space to answer such queries efficiently. Improved distance bounding techniques and distribution-based index optimization are also introduced to improve the performance of query answering and index construction respectively; both can be applied to the R-trees and the VA-files. Extensive experiments are conducted to validate our approach on a variety of datasets and a range of Bregman divergence functions.
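For concreteness, the Bregman divergence of a convex generator f is D_f(x, y) = f(x) − f(y) − ⟨∇f(y), x − y⟩. The sketch below evaluates two of the special cases named in the abstract; the generator and gradient functions are supplied by hand:

```python
import math

def bregman(f, grad_f, x, y):
    """Bregman divergence D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - sum(g * (a - b) for g, a, b in zip(grad_f(y), x, y))

# Generator f(x) = ||x||^2 gives the squared Euclidean distance.
sq = lambda x: sum(v * v for v in x)
grad_sq = lambda x: [2 * v for v in x]

# Generator f(x) = sum x_i log x_i gives the KL-divergence (for
# vectors with equal component sums, e.g. probability vectors).
negent = lambda x: sum(v * math.log(v) for v in x)
grad_negent = lambda x: [math.log(v) + 1 for v in x]

x, y = [0.5, 0.5], [0.9, 0.1]
print(bregman(sq, grad_sq, x, y))          # ~0.32 = ||x - y||^2
print(bregman(negent, grad_negent, x, y))  # ~0.511 = KL(x || y)
# Asymmetry: D(x, y) != D(y, x) in general, so no metric index applies.
```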
Generalized multidimensional data mapping and query processing
 ACM Transactions on Database Systems
, 2005
"... Multidimensional data points can be mapped to onedimensional space to exploit single dimensional indexing structures such as the B +tree. In this paper we present a Generalized structure for data Mapping and query Processing (GiMP), which supports extensible mapping methods and query processing. ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
Multidimensional data points can be mapped to one-dimensional space to exploit single-dimensional indexing structures such as the B+-tree. In this paper we present a Generalized structure for data Mapping and query Processing (GiMP), which supports extensible mapping methods and query processing. GiMP can be easily customized to behave like many competent indexing mechanisms for multidimensional indexing, such as the UB-Tree, the Pyramid technique, the iMinMax, and the iDistance. Besides being an extendible indexing structure, GiMP also serves as a framework to study the characteristics of the mapping and hence the efficiency of the indexing scheme. Specifically, we introduce a metric called mapping redundancy to characterize the efficiency of a mapping method in terms of disk page accesses, and analyze its behavior for point, range and kNN queries. We also address the fundamental problem of whether an efficient mapping exists and how to define such a mapping for a given data set.
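One of the mappings GiMP generalizes, iDistance, is easy to sketch: each point is keyed by the index of its closest reference point and its distance to it, so a one-dimensional sorted structure (a B+-tree in practice) can index it. The constant C and the reference points below are arbitrary choices for illustration:

```python
import math

def idistance_key(p, centers, C=1000.0):
    """iDistance-style mapping to one dimension: key = i * C + d, where
    i is the closest reference point and d the distance to it.  C must
    exceed any within-partition distance so partitions occupy disjoint
    intervals [i*C, (i+1)*C) on the 1-D axis."""
    i = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
    return i * C + math.dist(p, centers[i])

centers = [(0, 0), (100, 100)]
points = [(1, 1), (2, 0), (99, 99)]
keys = sorted(idistance_key(p, centers) for p in points)
# Points near center 0 get keys in [0, C); points near center 1 in [C, 2C).
assert keys[0] < 1000.0 <= keys[2]
```

A range or kNN query then becomes a small set of 1-D interval scans, one per partition the query sphere intersects.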
Batch Nearest Neighbor Search for Video Retrieval
"... To retrieve similar videos to a query clip from a large database, each video is often represented by a sequence of highdimensional feature vectors. Typically, given a query video containing m feature vectors, an independent Nearest Neighbor (NN) search for each feature vector is often first performe ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
To retrieve videos similar to a query clip from a large database, each video is often represented by a sequence of high-dimensional feature vectors. Typically, given a query video containing m feature vectors, an independent Nearest Neighbor (NN) search for each feature vector is first performed. After completing all the NN searches, an overall similarity is then computed; i.e., a single content-based video retrieval usually involves m individual NN searches. Since nearby feature vectors in a video are normally similar, a large number of expensive random disk accesses are expected to occur repeatedly, which crucially affects the overall query performance. Batch Nearest Neighbor (BNN) search is defined as a batch operation that performs a number of individual NN searches. This paper presents a novel approach towards efficient high-dimensional BNN search, called Dynamic Query Ordering (DQO), with advanced optimizations of both I/O and CPU costs. Observing that the overlapping candidates (or search space) of a previous query may help further reduce the candidate sets of subsequent queries, DQO aims at progressively finding a query order such that the common candidates among queries are fully utilized to maximally reduce the total number of candidates. Modelling the candidate-set relationships of queries by a Candidate Overlapping Graph (COG), DQO iteratively selects the next query to be executed based on its estimated pruning power over the remaining queries, using the dynamically updated COG. Extensive experiments are conducted on real video datasets and show the significance of our BNN query processing strategy.
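The ordering idea can be sketched greedily. This is a simplification: real DQO estimates pruning power on a dynamically updated COG, whereas this toy just maximizes raw candidate overlap with the still-pending queries:

```python
def order_queries(candidates):
    """Greedy query order: repeatedly run the query whose candidate set
    overlaps the pending queries' sets the most, so the candidates it
    fetches (and prunes) can be reused by the rest of the batch."""
    pending = dict(candidates)  # query id -> set of candidate ids
    order = []
    while pending:
        best = max(pending, key=lambda q: sum(
            len(pending[q] & s) for r, s in pending.items() if r != q))
        order.append(best)
        del pending[best]
    return order

cands = {"q1": {1, 2, 3}, "q2": {2, 3, 4}, "q3": {9}}
print(order_queries(cands))  # a high-overlap query first, "q3" last
```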
Surface kNN Query Processing
"... A kNN query finds the k nearestneighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidian distance, the key to efficient kNN query processing is to fetch and check the distances of a minimum number of points from the database. For many a ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
A kNN query finds the k nearest neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidean distance, the key to efficient kNN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is along a valid movement path. For this type of kNN query, the focus of efficient query processing is to minimize the cost of computing distances using the environment data (such as the road network data and the terrain data), which can be several orders of magnitude larger than that of the point data. Efficient processing of kNN queries based on the Euclidean distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface kNN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multi-resolution terrain models. Our approach eliminates the costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multi-resolution terrain models.
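The bound-based ranking described above reduces to a pruning rule: if an object's lower bound on surface distance already exceeds the k-th smallest upper bound, it can never be a kNN, so its exact shortest path need not be computed. The bounds below are given by hand; the paper derives them from multi-resolution terrain models:

```python
def survivors(bounds, k):
    """bounds: object -> (lb, ub) on its surface distance to the query.
    At least k objects have true distance <= the k-th smallest upper
    bound, so any object whose lower bound exceeds that threshold is
    pruned without a shortest-path computation."""
    kth_ub = sorted(ub for _, ub in bounds.values())[k - 1]
    return {o for o, (lb, ub) in bounds.items() if lb <= kth_ub}

bounds = {"a": (1.0, 2.0), "b": (1.5, 3.0), "c": (4.0, 9.0)}
print(survivors(bounds, 2))  # {'a', 'b'}: c's lb=4.0 > 2nd-smallest ub=3.0
```

Exact surface distances are then needed only for the survivors, typically a tiny fraction of the database.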
Reverse Furthest Neighbors in Spatial Databases
"... Given a set of points P and a query point q, the reverse furthest neighbor (RFN) query fetches the set of points p ∈ P such that q is their furthest neighbor among all points in P ∪ {q}. This is the monochromatic RFN (MRFN) query. Another interesting version of RFN query is the bichromatic reverse f ..."
Abstract

Cited by 5 (1 self)
 Add to MetaCart
Given a set of points P and a query point q, the reverse furthest neighbor (RFN) query fetches the set of points p ∈ P such that q is their furthest neighbor among all points in P ∪ {q}. This is the monochromatic RFN (MRFN) query. Another interesting version of the RFN query is the bichromatic reverse furthest neighbor (BRFN) query. Given a set of points P, a query set Q and a query point q ∈ Q, a BRFN query fetches the set of points p ∈ P such that q is the furthest neighbor of p among all points in Q. The RFN query has many interesting applications in spatial databases and beyond. For instance, given a large residential database (as P) and a set of potential sites (as Q) for building a chemical plant complex, the construction site should be selected as the one that has the maximum number of reverse furthest neighbors. This is an instance of the BRFN query. This paper presents the challenges associated with such queries and proposes efficient, R-tree based algorithms for both the monochromatic and bichromatic versions of the RFN query. We analyze properties of the RFN query that differentiate it from the widely studied reverse nearest neighbor queries and enable the design of novel algorithms. Our approach takes advantage of the furthest Voronoi diagrams as well as the convex hulls of either the data set P (in the MRFN case) or the query set Q (in the BRFN case). For BRFN queries, we also extend the analysis to the situation when Q is large in size and becomes disk-resident. Experiments on both synthetic and real data sets confirm the efficiency and scalability of the proposed algorithms over the brute-force search based approach.
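The brute-force baseline the paper's R-tree algorithms improve on is easy to state in code. The residence and site coordinates below are illustrative:

```python
import math

def brfn(P, Q, q):
    """Bichromatic reverse furthest neighbors (brute force): the points
    p in P whose furthest neighbor within Q is q."""
    return [p for p in P
            if max(math.dist(p, s) for s in Q) == math.dist(p, q)]

# Residences P and two candidate plant sites Q: prefer the site with
# more reverse furthest neighbors (residences it is furthest from).
P = [(0, 0), (1, 0), (9, 0)]
Q = [(0, 1), (10, 1)]
print(len(brfn(P, Q, (10, 1))), len(brfn(P, Q, (0, 1))))  # -> 2 1
```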
Group Enclosing Queries
IEEE Transactions on Knowledge and Data Engineering
"... Given a set of points P and a query set Q, a group enclosing query (GEQ) fetches the point p ∗ ∈ P such that the maximum distance of p ∗ to all points in Q is minimized. This problem is equivalent to the MinMax case (minimizing the maximum distance) of aggregate nearest neighbor queries for spatia ..."
Abstract

Cited by 4 (1 self)
 Add to MetaCart
Given a set of points P and a query set Q, a group enclosing query (GEQ) fetches the point p∗ ∈ P such that the maximum distance of p∗ to all points in Q is minimized. This problem is equivalent to the MinMax case (minimizing the maximum distance) of aggregate nearest neighbor queries for spatial databases [27]. This work first designs a new exact solution by exploring new geometric insights, such as the minimum enclosing ball, the convex hull and the furthest Voronoi diagram of the query group. To further reduce the query cost, especially when the dimensionality increases, we turn to approximation algorithms. Our main approximation algorithm has a worst-case √2-approximation ratio if one can find the exact nearest neighbor of a point. In practice, its approximation ratio never exceeds 1.05 for a large number of data sets of up to six dimensions. We also discuss how to extend it to higher dimensions (up to 74 in our experiments) and show that it still maintains a very good approximation quality (still close to 1) and low query cost. In fixed dimensions, we extend the √2-approximation algorithm to get a (1 + ε)-approximate solution for the GEQ problem. Both approximation algorithms have O(log N + M) query cost in any fixed dimension, where N and M are the sizes of the data set P and query group Q. Extensive experiments on both synthetic and real data sets, up to 10 million points and 74 dimensions, confirm the efficiency, effectiveness and scalability of the proposed algorithms, especially their significant improvement over the state-of-the-art method.
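The flavor of the approximation is: pick a center of the query group and return its nearest neighbor in P. The paper's √2 guarantee is proved with the minimum enclosing ball's center; the centroid below is a further simplification, so no ratio is claimed for it:

```python
import math

def geq_exact(P, Q):
    """Exact GEQ by brute force: the point of P minimising its maximum
    distance to the query group Q."""
    return min(P, key=lambda p: max(math.dist(p, s) for s in Q))

def geq_center_nn(P, Q):
    """Approximate GEQ: the nearest neighbor in P of a center of Q
    (centroid here; the paper uses the minimum enclosing ball center)."""
    c = tuple(sum(v) / len(Q) for v in zip(*Q))
    return min(P, key=lambda p: math.dist(p, c))

P = [(0, 0), (5, 5), (20, 0)]
Q = [(4, 4), (6, 6)]
assert geq_exact(P, Q) == geq_center_nn(P, Q) == (5, 5)
```

The approximation replaces a max-aggregate search with a single NN query, which is why its cost stays near O(log N + M).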
Spherical hashing
 In Proc. IEEE Conf
, 2012
"... Many binary code encoding schemes based on hashing have been actively studied recently, since they can provide efficient similarity search, especially nearest neighbor search, and compact data representations suitable for handling large scale image databases in many computer vision problems. Existin ..."
Abstract

Cited by 4 (1 self)
 Add to MetaCart
Many binary code encoding schemes based on hashing have been actively studied recently, since they can provide efficient similarity search, especially nearest neighbor search, and compact data representations suitable for handling large-scale image databases in many computer vision problems. Existing hashing techniques encode high-dimensional data points by using hyperplane-based hashing functions. In this paper we propose a novel hypersphere-based hashing function, spherical hashing, to map more spatially coherent data points into a binary code compared to hyperplane-based hashing functions. Furthermore, we propose a new binary code distance function, the spherical Hamming distance, that is tailored to our hypersphere-based binary coding scheme, and design an efficient iterative optimization process to achieve balanced partitioning of data points for each hash function and independence between hashing functions. Our extensive experiments show that our spherical hashing technique significantly outperforms six state-of-the-art hashing techniques based on hyperplanes across various image benchmarks of sizes ranging from one to 75 million GIST descriptors. The performance gains are consistent and large, up to 100% improvements. The excellent results confirm the unique merits of the proposed idea of using hyperspheres to encode proximity regions in high-dimensional spaces. Finally, our method is intuitive and easy to implement.
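The encoding and the distance are simple to state. The spheres below are fixed by hand, whereas the paper learns centers and radii by iterative optimization:

```python
import math

def spherical_code(x, spheres):
    """Hypersphere-based binary code: bit i = 1 iff x falls inside
    hypersphere i, given as a (center, radius) pair."""
    return tuple(1 if math.dist(x, c) <= r else 0 for c, r in spheres)

def spherical_hamming(a, b):
    """Spherical Hamming distance: Hamming distance divided by the
    number of common 1-bits (spheres containing both points)."""
    xor = sum(u != v for u, v in zip(a, b))
    both = sum(u & v for u, v in zip(a, b))
    return xor / both if both else float("inf")

spheres = [((0, 0), 2.0), ((3, 0), 2.0)]
a = spherical_code((1, 0), spheres)     # (1, 1): inside both spheres
b = spherical_code((2, 0), spheres)     # (1, 1)
c = spherical_code((-1.5, 0), spheres)  # (1, 0)
assert spherical_hamming(a, b) == 0.0
assert spherical_hamming(a, c) == 1.0
```

Weighting by shared "inside" bits is what ties the distance to bounded proximity regions rather than half-spaces.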
Superseding Nearest Neighbor Search on Uncertain Spatial Databases
"... This paper proposes a new problem, called superseding nearest neighbor search, on uncertain spatial databases, where each object is described by a multidimensional probability density function. Given a query point q, an object is a nearest neighbor (NN) candidate if it has a nonzero probability to ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
This paper proposes a new problem, called superseding nearest neighbor search, on uncertain spatial databases, where each object is described by a multidimensional probability density function. Given a query point q, an object is a nearest neighbor (NN) candidate if it has a non-zero probability to be the NN of q. Given two NN candidates o1 and o2, o1 supersedes o2 if o1 is more likely to be closer to q. An object is a superseding nearest neighbor (SNN) of q if it supersedes all the other NN candidates. Sometimes no object is able to supersede every other NN candidate. In this case, we return the SNN-core: the minimum set of NN candidates each of which supersedes all the NN candidates outside the SNN-core. Intuitively, the SNN-core contains the best objects, because any object outside the SNN-core is worse than all the objects in the SNN-core. We show that the SNN-core can be efficiently computed by utilizing a conventional multidimensional index, as confirmed by extensive experiments.
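The supersede relation can be sketched with Monte Carlo sampling. This is a toy: each uncertain object is represented by a sampler of its location, standing in for the paper's probability density functions and its exact computation:

```python
import math
import random

def supersedes(sample_o1, sample_o2, q, trials=5000, seed=0):
    """o1 supersedes o2 if o1 is more likely than o2 to be the closer
    of the two to q, estimated here by sampling both objects."""
    rng = random.Random(seed)
    wins = sum(
        math.dist(sample_o1(rng), q) < math.dist(sample_o2(rng), q)
        for _ in range(trials))
    return wins > trials / 2

# Two objects uniform on line segments: o1 near q, o2 farther away.
o1 = lambda rng: (rng.uniform(0, 1), 0)
o2 = lambda rng: (rng.uniform(3, 4), 0)
q = (0, 0)
assert supersedes(o1, o2, q) and not supersedes(o2, o1, q)
```

The SNN-core is then the minimal set of candidates that supersede every candidate outside the set under this relation.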