Results 1-10 of 53
Location-based Spatial Queries
In SIGMOD, 2003
Cited by 108 (12 self)
In this paper we propose an approach that enables mobile clients to determine the validity of previous queries based on their current locations. To make this possible, the server returns, in addition to the query result, a validity region around the client's location within which the result remains the same. We focus on two of the most common spatial query types, namely nearest neighbor and window queries, define the validity region in each case, and propose the corresponding query processing algorithms. In addition, we provide analytical models for estimating the expected size of the validity region. Our techniques can significantly reduce the number of queries issued to the server, while introducing minimal computational and network overhead compared to traditional spatial queries.
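As a rough illustration of the client-side check described in this abstract, the sketch below assumes the server ships the validity region as a convex polygon with vertices in counter-clockwise order. That representation and the names `inside_convex_polygon` and `needs_requery` are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def inside_convex_polygon(p: Point, polygon: List[Point]) -> bool:
    """Return True if p lies inside or on a convex polygon (counter-clockwise vertices)."""
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        # Cross product of edge a->b with a->p; negative means p is right of the edge.
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        if cross < 0:
            return False
    return True

def needs_requery(client_pos: Point, validity_region: List[Point]) -> bool:
    """The cached result stays valid while the client remains inside the region."""
    return not inside_convex_polygon(client_pos, validity_region)
```

A client would call `needs_requery` after each position update and contact the server only when it returns `True`.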
iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search
2005
Cited by 93 (10 self)
In this article, we present an efficient B+-tree based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance partitions the data based on a space- or data-partitioning strategy, and selects a reference point for each partition. The data points in each partition are transformed into a single dimensional value based on their similarity with respect to the reference point. This allows the points to be indexed using a B+-tree structure and KNN search to be performed using one-dimensional range search. The choice of partition and reference points adapts the index structure to the data distribution. We conducted extensive experiments to evaluate the iDistance technique, and report results demonstrating its effectiveness. We also present a cost model for iDistance KNN search, which can be exploited in query optimization.
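The mapping to one dimension can be sketched as follows. This is a minimal illustration under stated assumptions: each point is assigned to its nearest reference point, the key is `i * c + dist(point, refs[i])` with a constant `c` larger than any distance inside a partition, and a sorted Python list searched with `bisect` stands in for the B+-tree. The function names and the radius-doubling loop are choices made for this sketch.

```python
import bisect
import math

def build_index(points, refs, c):
    """Map every point to the 1-D key i*c + dist(point, refs[i]), i = nearest reference."""
    entries = []
    for p in points:
        i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
        entries.append((i * c + math.dist(p, refs[i]), tuple(p)))
    entries.sort()
    return entries

def range_search(entries, refs, c, q, r):
    """Return all indexed points within distance r of q using 1-D key ranges."""
    keys = [k for k, _ in entries]
    hits = []
    for i, ref in enumerate(refs):
        d = math.dist(q, ref)
        # Triangle inequality: a hit in partition i has |dist(p, ref) - d| <= r.
        start = bisect.bisect_left(keys, i * c + max(0.0, d - r))
        end = min(bisect.bisect_right(keys, i * c + d + r),
                  bisect.bisect_left(keys, (i + 1) * c))  # stay inside partition i
        for _, p in entries[start:end]:
            if math.dist(p, q) <= r:  # discard false positives
                hits.append(p)
    return hits

def knn_search(entries, refs, c, q, k, r0=1.0):
    """Grow the search radius until at least k points are found, then keep the k closest."""
    r = r0
    while True:
        hits = range_search(entries, refs, c, q, r)
        if len(hits) >= k or len(hits) == len(entries):
            return sorted(hits, key=lambda p: math.dist(p, q))[:k]
        r *= 2
```

Once a radius yields at least k hits, the k nearest points must all lie within that radius, so sorting the hits by true distance gives the exact answer.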
Aggregate Nearest Neighbor Queries in Spatial Databases
TODS, 2005
Cited by 58 (6 self)
Given two spatial datasets P (e.g., facilities) and Q (queries), an aggregate nearest neighbor (ANN) query retrieves the point(s) of P with the smallest aggregate distance(s) to points in Q. Assuming, for example, n users at locations q1, ..., qn, an ANN query outputs the facility p ∈ P that minimizes the sum of distances |pqi| for 1 ≤ i ≤ n that the users have to travel in order to meet there. Similarly, another ANN query may report the point p ∈ P that minimizes the maximum distance that any user has to travel, or the minimum distance from some user to his/her closest facility. If Q fits in memory and P is indexed by an R-tree, we develop algorithms for aggregate nearest neighbors that capture several versions of the problem, including weighted queries and incremental reporting of results. Then, we analyze their performance and propose cost models for query optimization. Finally, we extend our techniques for disk-resident queries and approximate ANN retrieval. The efficiency of the algorithms and the accuracy of the cost models are evaluated through extensive experiments with real and synthetic datasets.
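The three aggregate functions mentioned above (sum, maximum, minimum) can be made concrete with a brute-force scan. This sketch only illustrates the query definition; it is not one of the R-tree based algorithms the paper develops, and the function name `aggregate_nn` is a choice made for this example.

```python
import math

def aggregate_nn(P, Q, agg=sum):
    """Return the point of P minimizing agg over the distances to all points in Q.

    agg may be sum, max, or min, matching the three ANN variants in the abstract.
    """
    return min(P, key=lambda p: agg(math.dist(p, q) for q in Q))
```

For example, with facilities `P = [(0, 0), (5, 0), (6, 0.5)]` and users `Q = [(4, 0), (6, 0), (5, 3)]`, the sum variant picks `(5, 0)` while the max and min variants pick `(6, 0.5)`.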
A threshold-based algorithm for continuous monitoring of k nearest neighbors
IEEE TKDE, 2005
Cited by 45 (10 self)
Assume a set of moving objects and a central server that monitors their positions over time, while processing continuous nearest neighbor queries from geographically distributed clients. In order to always report up-to-date results, the server could constantly obtain the most recent position of all objects. However, this naïve solution requires the transmission of a large number of rapid data streams corresponding to location updates. Intuitively, current information is necessary only for objects that may influence some query result (i.e., they may be included in the nearest neighbor set of some client). Motivated by this observation, we present a threshold-based algorithm for the continuous monitoring of nearest neighbors that minimizes the communication overhead between the server and the data objects. The proposed method can be used with multiple, static, or moving queries, for any distance definition, and does not require additional knowledge (e.g., velocity vectors) besides object locations. Index Terms: Spatial databases, location-dependent and sensitive, query processing.
Group nearest neighbor queries
In ICDE, 2004
Cited by 44 (2 self)
Given two sets of points P and Q, a group nearest neighbor (GNN) query retrieves the point(s) of P with the smallest sum of distances to all points in Q. Consider, for instance, three users at locations q1, q2 and q3 that want to find a meeting point (e.g., a restaurant); the corresponding query returns the data point p that minimizes the sum of Euclidean distances |pqi| for 1 ≤ i ≤ 3. Assuming that Q fits in memory and P is indexed by an R-tree, we propose several algorithms for finding the group nearest neighbors efficiently. As a second step, we extend our techniques for situations where Q cannot fit in memory, covering both indexed and non-indexed query points. An experimental evaluation identifies the best alternative based on the data and query properties.
On trip planning queries in spatial databases
In SSTD, 2005
Cited by 42 (1 self)
In this paper we discuss a new type of query in Spatial Databases, called the Trip Planning Query (TPQ). Given a set of points of interest P in space, where each point belongs to a specific category, a starting point S and a destination E, TPQ retrieves the best trip that starts at S, passes through at least one point from each category, and ends at E. For example, a driver traveling from Boston to Providence might want to stop at a gas station, a bank and a post office on his way, and the goal is to provide him with the best possible route (in terms of distance, traffic, road conditions, etc.). The difficulty of this query lies in the existence of multiple choices per category. In this paper, we study fast approximation algorithms for TPQ in a metric space. We provide a number of approximation algorithms with approximation ratios that depend on either the number of categories, the maximum number of points
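To make the TPQ definition concrete, the following sketch enumerates every choice of one point per category and every visiting order, and returns the shortest Euclidean trip. It is an exhaustive baseline written for illustration only, usable on tiny inputs; it is not one of the approximation algorithms the paper studies, and the names `trip_length` and `best_trip` are assumptions of this example.

```python
import math
from itertools import permutations, product

def trip_length(route):
    """Total Euclidean length of a route given as a list of points."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))

def best_trip(start, end, categories):
    """Exhaustively find the shortest trip from start to end visiting one point per category.

    categories maps a category name to its list of candidate points.
    Returns (length, route), or None if some category has no candidate.
    """
    best = None
    for picks in product(*categories.values()):      # one point per category
        for order in permutations(picks):            # every visiting order
            route = [start, *order, end]
            length = trip_length(route)
            if best is None or length < best[0]:
                best = (length, route)
    return best
```

The number of candidate routes grows with the product of the category sizes times the factorial of the number of categories, which is the multiple-choice difficulty the abstract points to.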
Analysis of Minimum Distances in High-Dimensional Musical Spaces
Cited by 40 (6 self)
We propose an automatic method for measuring music similarity using audio features so we can enhance the current generation of taxonomy-based music search engines and recommender systems. Efficiency is important in an Internet-connected world, where users have access to millions of tracks. Brute-force algorithms for searching through this content are not practical. Many previous approaches to track similarity require pairwise processing between all audio features in a database and therefore are generally not practical for large collections. Our features are time-ordered overlapping fixed-length subsequences of equal-temperament pitch-class profiles and log-frequency cepstral coefficients; the technique is analogous to the technique of shingling used for text retrieval. We use locality sensitive hashing (LSH) to implement approximate matching for our high-dimensional audio shingles. This approach retrieves near neighbors within a specified distance of the query rather than retrieving only the nearest neighbors; the degree of approximation, ε, is a parameter. LSH achieves sublinear query time performance with respect to the number of tracks in a collection but requires an accurate threshold on retrieval distance for efficient performance. In this paper, we present a new method for estimating the optimal search radius for LSH retrieval tasks by modeling the between-shingle distance distributions for non-similar audio shingles. We derive an estimator for a minimum distance for two shingles to be considered drawn from different tracks; shingles closer than this distance are therefore considered to be drawn from similar tracks. We evaluate our proposed methods on three contrasting music similarity tasks: retrieval of misattributed recordings (Apocrypha), retrieval of the same work performed by different artists (Opus) and retrieval of edited and sampled versions of a query track by remix artists (Remixes). Our results achieve near-perfect performance in the first two tasks and 80% precision at 70% recall in the third task.
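The shingling step and the radius-based retrieval it feeds can be sketched as below. The code assumes each track is already a list of per-frame feature vectors and uses a linear scan in place of LSH, so it shows only what is retrieved, not how LSH retrieves it efficiently. The names `shingles` and `near_neighbors` and the `hop` parameter are assumptions of this example.

```python
import math

def shingles(frames, width, hop=1):
    """Concatenate `width` consecutive feature frames into one vector per window."""
    out = []
    for start in range(0, len(frames) - width + 1, hop):
        window = frames[start:start + width]
        out.append([x for frame in window for x in frame])
    return out

def near_neighbors(query, database, radius):
    """Return indices of all shingles within `radius` of the query (linear-scan stand-in for LSH)."""
    return [i for i, s in enumerate(database) if math.dist(query, s) <= radius]
```

The `radius` argument plays the role of the search radius whose estimation is the subject of the paper; here it is simply supplied by the caller.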
Spatial Queries in Dynamic Environments
2003
Cited by 37 (10 self)
In this paper we formulate two novel query types, time-parameterized and continuous queries, applicable in such environments. A time-parameterized query retrieves the actual result at the time when the query is issued, the expiry time of the result given the current motion of the query and database objects, and the change that causes the expiration. A continuous query retrieves tuples of the form <result, interval>, where each result is accompanied by a future interval, during which it is valid. We study time-parameterized and continuous versions of the most common spatial queries (i.e., window queries, nearest neighbors, spatial joins), proposing efficient processing algorithms and accurate cost models.
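The three components of a time-parameterized answer can be illustrated for a nearest neighbor query in a simplified setting. The sketch assumes a query point moving linearly from `q0` with velocity `v` over static objects (the paper also covers moving objects) and scans all objects rather than using an index. For current neighbor o and another object p, the two become equidistant from the query at t = (|q0 p|² − |q0 o|²) / (2 v·(p − o)), which is meaningful only when v·(p − o) > 0.

```python
import math

def tp_nearest_neighbor(q0, v, objects):
    """Return (result, expiry_time, change) for a moving query over static objects.

    result: current nearest neighbor; expiry_time: when it stops being nearest
    (math.inf if never); change: the object that replaces it (None if never).
    """
    nn = min(objects, key=lambda o: math.dist(q0, o))
    d_nn_sq = math.dist(q0, nn) ** 2
    expiry, change = math.inf, None
    for p in objects:
        if p == nn:
            continue
        denom = 2 * sum(vi * (pi - oi) for vi, pi, oi in zip(v, p, nn))
        if denom <= 0:
            continue  # the query never becomes closer to p than to nn
        t = (math.dist(q0, p) ** 2 - d_nn_sq) / denom
        if t < expiry:
            expiry, change = t, p
    return nn, expiry, change
```

For a query at `(0, 0)` moving with velocity `(1, 0)` over objects `(1, 0)`, `(5, 0)` and `(0, 10)`, the result is `(1, 0)`, it expires at t = 3 when the query reaches `(3, 0)`, and the change is `(5, 0)`.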
Quality and Efficiency in High Dimensional Nearest Neighbor Search
Cited by 32 (1 self)
Nearest neighbor (NN) search in high dimensional space is an important problem in many applications. Ideally, a practical solution (i) should be implementable in a relational database, and (ii) its query cost should grow sublinearly with the dataset size, regardless of the data and query distributions. Despite the bulk of NN literature, no solution fulfills both requirements, except locality sensitive hashing (LSH). The existing LSH implementations are either rigorous or ad hoc. Rigorous-LSH ensures good quality of query results, but requires expensive space and query cost. Although adhoc-LSH is more efficient, it abandons quality control, i.e., the neighbor it outputs can be arbitrarily bad. As a result, currently no method is able to ensure both quality and efficiency simultaneously in practice. Motivated by this, we propose a new access method called the locality sensitive B-tree (LSB-tree) that enables fast high-dimensional NN search with excellent quality. The combination of several LSB-trees leads to a structure called the LSB-forest that ensures the same result quality as rigorous-LSH, but reduces its space and query cost dramatically. The LSB-forest also outperforms adhoc-LSH, even though the latter has no quality guarantee. Besides its appealing theoretical properties, the LSB-tree itself also serves as an effective index that consumes linear space, and supports efficient updates. Our extensive experiments confirm that the LSB-tree is faster than (i) the state of the art of exact NN search by two orders of magnitude, and (ii) the best (linear-space) method of approximate retrieval by an order of magnitude, and at the same time, returns neighbors with much better quality.
GORDER: An Efficient Method for KNN Join Processing
In VLDB, 2004
Cited by 31 (2 self)
An important but very expensive primitive operation of high-dimensional databases is the K-Nearest Neighbor (KNN) similarity join. The operation combines each point of one dataset with its KNNs in the other dataset and it provides more meaningful query results than the range similarity join. Such an operation is useful for data mining and similarity search. In this