The ANN-tree: An index for efficient approximate nearest neighbor search
, 2001
Cited by 3 (0 self)
In this paper we explore the problem of approximate nearest neighbor searches. We propose an index structure, the ANN-tree (approximate nearest neighbor tree), to solve this problem. The ANN-tree supports high-accuracy nearest neighbor search. The actual nearest neighbor of a query point can usually be found in the first leaf page accessed. The accuracy increases to near 100% if a second page is accessed. This is not achievable via traditional indexes. Even if an exact nearest neighbor query is desired, the ANN-tree is demonstrably more efficient than existing structures like the R*-tree. This makes the ANN-tree a preferable index structure for both exact and approximate nearest neighbor searches. We present the index in detail and provide experimental results on both real and synthetic data sets.
Indexing Problems in Spatiotemporal Databases
, 2000
Cited by 2 (0 self)
By George N. Kollios. Advisor: Vassilis Tsotras. Co-advisor: Alex Delis. Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science), June 2000. Spatiotemporal databases manage spatial objects that change positions and/or extents over time. Examples include traffic surveillance data, climate and land cover data, demographic data, and multimedia applications (animated movies). Since these databases are large, it is important to design efficient indexing schemes that can access and explore them.
Exact and Approximate Reverse Nearest Neighbor Search for Multimedia Data
Cited by 2 (0 self)
Reverse nearest neighbor queries are useful in identifying objects that are of significant influence or importance. Existing methods either rely on pre-computation of nearest neighbor distances, do not scale well with high dimensionality, or do not produce exact solutions. In this work we motivate and investigate the problem of reverse nearest neighbor search on high-dimensional multimedia data. We propose exact and approximate algorithms that do not require pre-computation of nearest neighbor distances and can potentially prune most of the search space. We demonstrate the utility of reverse nearest neighbor search by showing how it can help improve classification accuracy.
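To make the query type concrete, the following is a minimal brute-force sketch of a reverse nearest neighbor query: it returns the data points that have the query as their nearest neighbor. It is an illustrative O(n^2) baseline under the assumption of Euclidean distance, not the exact or approximate algorithms proposed in the paper; the function name `reverse_nearest_neighbors` is hypothetical.

```python
import math


def reverse_nearest_neighbors(points, query):
    """Return the points whose nearest neighbor is `query` (brute force).

    Assumption: Euclidean distance; `points` is a list of coordinate tuples.
    """
    result = []
    for i, p in enumerate(points):
        d_query = math.dist(p, query)
        # Distance from p to its closest *other* data point.
        d_other = min(
            (math.dist(p, o) for j, o in enumerate(points) if j != i),
            default=math.inf,
        )
        # p is a reverse nearest neighbor if the query is strictly closer
        # to p than every other data point is.
        if d_query < d_other:
            result.append(p)
    return result


points = [(0, 0), (1, 0), (5, 0), (10, 0)]
rnn = reverse_nearest_neighbors(points, (6, 0))
```

In this example the nearest neighbor of the query is (5, 0), yet (10, 0) is also a reverse nearest neighbor, which illustrates that the relation is not symmetric.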
The SH-Tree: A Novel and Flexible Super Hybrid Index Structure for Similarity Search on Multidimensional Data
Cited by 1 (0 self)
Approaches to indexing and searching feature vectors are indispensable for supporting similarity search effectively and efficiently. Such feature vectors, extracted from real-world objects, are usually presented in the form of multidimensional data. As a result, many multidimensional data index techniques have been introduced to the research community. These index techniques are categorized into two main classes: SP (space partitioning)/KD-tree-based and DP (data partitioning)/R-tree-based. Although there are a variety of “mixed” index techniques, which try to inherit positive aspects from more than one index technique, only a few techniques are derived from these two main classes. In this paper, we introduce such a “mixed” index, the SH-tree: a novel and flexible super hybrid index structure for multidimensional data. Theoretical analyses indicate that the SH-tree is a good combination of the two index technique families with respect to both the presentation and search algorithms. It overcomes their shortcomings and makes use of their positive aspects to facilitate efficient similarity searches in multidimensional data spaces. Experimental results with both uniformly distributed and real data sets confirm our theoretical analyses.
Processing Transitive Nearest-Neighbor Queries in Multi-Channel Access Environments
Wireless broadcast is an efficient way to disseminate information due to its good scalability [10]. Existing works typically assume that mobile devices, such as cell phones and PDAs, can access only one channel at a time. In this paper, we consider a near-future scenario in which a mobile device can process queries using information received simultaneously from multiple channels. We focus on the query processing of the transitive nearest neighbor (TNN) search [19]. Two TNN algorithms developed for a single broadcast channel environment are adapted to our new broadcast environment. Based on the insights obtained, we propose two new algorithms, namely the Double-NN-Search and Hybrid-NN-Search algorithms. Further, we develop an optimization technique, called approximate-NN (ANN), to reduce the energy consumption in mobile devices. Finally, we conduct a comprehensive set of experiments to validate our proposals. The results show that our new algorithms perform better than the existing ones and that the optimization technique efficiently reduces energy consumption. Keywords: Multi-Channel access, transitive nearest neighbor, query processing, ...
The Fractal Dimension Making Similarity Queries More Efficient
This paper presents a new algorithm to answer k-nearest neighbor queries called the Fractal k-Nearest Neighbor (k-NNF()). This algorithm takes advantage of the fractal dimension of the dataset under scan to estimate a suitable radius to shrink a query that retrieves the k-nearest neighbors of a query object. k-NN() algorithms start searching for elements at any distance from the query center, progressively reducing the allowed distance used to consider elements as worth analyzing. If a proper radius can be set to start the process, a significant reduction in the number of distance calculations can be achieved. Experiments performed with real and synthetic datasets over the Slim-tree access method show that our approach reduces the total processing time by up to 50% while requiring 25% fewer distance calculations.
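The idea of starting a k-NN search from an estimated radius can be sketched as follows. The estimate below assumes that the number of objects within distance r of a point grows roughly as r^D, where D is the fractal dimension of the dataset, so that a ball expected to hold k of n objects has radius about diameter * (k / n)^(1 / D). This scaling rule, the fallback of doubling the radius, and the names `estimate_radius` and `knn_with_initial_radius` are assumptions made for illustration; they are not taken from the paper, and a linear scan stands in for the Slim-tree.

```python
import math


def estimate_radius(k, n, diameter, fractal_dim):
    """Radius expected to enclose about k of n objects (assumed power law)."""
    return diameter * (k / n) ** (1.0 / fractal_dim)


def knn_with_initial_radius(points, query, k, radius):
    """k-NN via a range query that starts at `radius` and grows if needed."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    while True:
        # Range query: keep only the candidates inside the current radius.
        candidates = sorted(
            (d, p) for d, p in ((math.dist(p, query), p) for p in points)
            if d <= radius
        )
        # Stop once k objects were found or the whole dataset is covered.
        if len(candidates) >= k or len(candidates) == len(points):
            return [p for _, p in candidates[:k]]
        radius *= 2  # estimate was too small: widen and retry


# Points on a line segment have fractal dimension 1.
points = [(i, 0) for i in range(100)]
r0 = estimate_radius(k=5, n=len(points), diameter=99, fractal_dim=1.0)
neighbors = knn_with_initial_radius(points, (50, 0), k=5, radius=r0)
```

A good initial radius means the range query touches few candidates on the first pass; a poor one costs extra passes, which is why the fallback is needed in this sketch.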
On the Generalization of Nearest Neighbor Queries
Nearest neighbor queries on R-trees use a number of pruning techniques to improve the search. We examine three common 1-nearest neighbor pruning strategies and generalize them to k-nearest neighbors. This generalization clears up a number of prior misconceptions. Specifically, we show that the generalization of one pruning technique, referred to as strategy 2, is non-trivial and requires the introduction of a new algorithm we call promise-pruning. In addition, we show that, contrary to other claims, applying this generalized strategy to k-nearest neighbor queries results in a theoretically better search. This discovery is reinforced with empirical results showing the success of promise-pruning on both random and real-world data.
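As background for what pruning means here, the sketch below shows one widely used bound, MINDIST, the smallest possible distance from a query point to anything inside a bounding rectangle, applied to a k-nearest neighbor search. It is not the paper's promise-pruning and makes no claim about which numbered strategy it corresponds to. A flat list of leaf pages stands in for an R-tree, and the names `mindist` and `knn_over_pages` are hypothetical.

```python
import heapq
import math


def mindist(q, lo, hi):
    """Smallest Euclidean distance from point q to the rectangle [lo, hi]."""
    return math.sqrt(
        sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi))
    )


def knn_over_pages(pages, q, k):
    """pages: list of (lo, hi, points). Returns (k nearest, pages read)."""
    best = []  # max-heap on distance, stored as (-distance, point)
    pages_read = 0
    # Visit pages in order of increasing MINDIST to the query.
    for lo, hi, pts in sorted(pages, key=lambda pg: mindist(q, pg[0], pg[1])):
        # Prune: no point in this page (or any later one) can beat the
        # current k-th best distance.
        if len(best) == k and mindist(q, lo, hi) > -best[0][0]:
            break
        pages_read += 1
        for p in pts:
            d = math.dist(p, q)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return [p for _, p in sorted(best, reverse=True)], pages_read


pages = [
    ((0, 0), (1, 1), [(0, 0), (1, 1)]),
    ((5, 5), (6, 6), [(5, 5), (6, 6)]),
    ((10, 10), (11, 11), [(10, 10), (11, 11)]),
]
nearest, pages_read = knn_over_pages(pages, (0.2, 0.2), k=2)
```

The pruning threshold is the distance to the current k-th candidate, so with larger k the bound tightens later and more pages are read.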
Self-Tuning Cost Modeling of User-Defined Functions in an Object-Relational DBMS
- ACM Transactions on Database Systems
, 2005
This paper proposes a new approach based on the recent trend of self-tuning DBMS, by which the cost model is maintained dynamically and incrementally as UDFs are being executed online. In the context of UDF cost modeling, our approach faces a number of challenges: it should work with limited memory, work with limited computation time, and adjust to fluctuations in the execution costs (e.g., caching effects). In this paper we first provide a set of guidelines for developing techniques that meet these challenges while achieving accurate and fast cost prediction with small overheads. Then, we present two concrete techniques developed under the guidelines. One is an instance-based technique based on the conventional k-nearest neighbor (KNN) technique, which uses a multi-dimensional index like the R*-tree. The other is a summary-based technique, which uses the quadtree to store summary values at multiple resolutions. We have performed extensive performance evaluations comparing these two techniques against existing histogram-based techniques and the KNN technique, using both real and synthetic UDFs/data sets. The results show that our techniques provide better performance in most situations considered. Categories and Subject Descriptors: H.2.4 [Database Management]: Systems - Query Processing. General Terms: cost modeling, object relational DBMS, query optimization, self-tuning. Additional Key Words and Phrases: K-nearest neighbors, quadtree, self-tuning.
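The conventional KNN idea that the instance-based technique builds on can be sketched as follows: the cost of a UDF call is predicted as the average observed cost of the k previously executed calls whose arguments are closest. This is a plain illustration under stated assumptions, not the paper's technique. It uses a linear scan instead of an R*-tree, keeps every sample rather than working within a memory limit, and the class name `KnnCostModel` and its methods are hypothetical.

```python
import math


class KnnCostModel:
    """Predict UDF cost from the k nearest previously observed executions."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []  # list of (argument vector, observed cost)

    def observe(self, args, cost):
        """Record one executed call; the model is updated incrementally."""
        self.samples.append((tuple(args), float(cost)))

    def predict(self, args):
        """Average cost of the k nearest samples, or None with no history."""
        if not self.samples:
            return None
        nearest = sorted(
            self.samples, key=lambda s: math.dist(s[0], args)
        )[: self.k]
        return sum(cost for _, cost in nearest) / len(nearest)


model = KnnCostModel(k=2)
model.observe((1.0,), 10)
model.observe((2.0,), 20)
model.observe((10.0,), 100)
estimate = model.predict((1.4,))
```

Because every call to `observe` changes later predictions, the model adapts as the UDF runs, which is the self-tuning behavior the abstract describes at a high level.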
The SH-Tree: A Novel and Flexible Super Hybrid Index Structure for Similarity Search on Multidimensional Data
, 2006
Approaches to indexing and searching feature vectors are indispensable for supporting similarity search effectively and efficiently. Such feature vectors, extracted from real-world objects, are usually presented in the form of multidimensional data. As a result, many multidimensional data index techniques have been introduced to the research community. These index techniques are categorized into two main classes: SP (space partitioning)/KD-tree-based and DP (data partitioning)/R-tree-based. Although there are a variety of “mixed” index techniques, which try to inherit positive aspects from more than one index technique, only a few techniques are derived from these two main classes. In this paper, we introduce such a “mixed” index, the SH-tree: a novel and flexible super hybrid index structure for multidimensional data. Theoretical analyses indicate that the SH-tree is a good combination of the two index technique families with respect to both the presentation and search algorithms. It overcomes their shortcomings and makes use of their positive aspects to facilitate efficient similarity searches in multidimensional data spaces. Experimental results with both uniformly distributed and real data sets confirm our theoretical analyses.
An Algorithm for Multi-way Distance Join Query
This paper presents the K-DJQ algorithm for solving the multi-way distance join query, which finds the K n-tuples from n spatial datasets that have the smallest distance values according to a query graph. An R-tree is used as the index structure for each dataset. The K-DJQ algorithm is a recursive, non-incremental approach that follows a depth-first search strategy and synchronously traverses all R-trees; it returns the K n-tuples of the result all together at the end of the algorithm without producing any intermediate results. In addition, a distance-based plane-sweep technique is used as an optimization for K-DJQ to reduce the total query processing time. Finally, the performance and accuracy of the K-DJQ algorithm are evaluated experimentally for different values of K and different numbers of datasets.
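To clarify what the query returns, here is an exhaustive baseline that enumerates every n-tuple and keeps the K with the smallest distance value. It is not the K-DJQ algorithm: it uses no R-trees, no synchronous traversal, and no plane sweep. The sketch assumes that the distance value of a tuple is the sum of Euclidean distances along the edges of the query graph; that aggregate and the name `k_distance_join` are assumptions made for illustration.

```python
import heapq
import itertools
import math


def k_distance_join(datasets, edges, k):
    """Return the k n-tuples with the smallest total edge distance.

    datasets: list of n lists of points (one list per dataset).
    edges: query graph as (i, j) pairs of dataset positions.
    """

    def tuple_distance(t):
        # Assumed aggregate: sum of distances over the query graph edges.
        return sum(math.dist(t[i], t[j]) for i, j in edges)

    # Exhaustive enumeration of the Cartesian product: exponential in n,
    # which is the cost an index-based algorithm tries to avoid.
    return heapq.nsmallest(k, itertools.product(*datasets), key=tuple_distance)


a = [(0, 0), (10, 0)]
b = [(1, 0), (9, 0)]
c = [(2, 0), (20, 0)]
chain = [(0, 1), (1, 2)]  # query graph: a - b - c
top2 = k_distance_join([a, b, c], chain, k=2)
```

Like K-DJQ as described above, this baseline is non-incremental: all K tuples are returned together at the end.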

