Results 11  20
of
649
VoronoiBased K Nearest Neighbor Search for Spatial Network Databases
 In VLDB
, 2004
"... A frequent type of query in spatial networks (e.g., road networks) is to find the K nearest neighbors (KNN) of a given query object. ..."
Abstract

Cited by 155 (16 self)
 Add to MetaCart
(Show Context)
A frequent type of query in spatial networks (e.g., road networks) is to find the K nearest neighbors (KNN) of a given query object.
Incremental Clustering for Mining in a Data Warehousing Environment
 PROC. 24TH INT. CONF. VERY LARGE DATA BASES, VLDB
, 1998
"... Data warehouses provide a great deal of opportunities for performing data mining tasks such as classification and clustering. Typically, updates are collected and applied to the data warehouse periodically in a batch mode, e.g., during the night. Then, all patterns derived from the warehouse by some ..."
Abstract

Cited by 137 (7 self)
 Add to MetaCart
Data warehouses provide a great deal of opportunities for performing data mining tasks such as classification and clustering. Typically, updates are collected and applied to the data warehouse periodically in a batch mode, e.g., during the night. Then, all patterns derived from the warehouse by some data mining algorithm have to be updated as well. Due to the very large size of the databases, it is highly desirable to perform these updates incrementally. In this paper, we present the first incremental clustering algorithm. Our algorithm is based on the clustering algorithm DBSCAN which is applicable to any database containing data from a metric space, e.g., to a spatial database or to a WWWlog database. Due to the densitybased nature of DBSCAN, the insertion or deletion of an object affects the current clustering only in the neighborhood of this object. Thus, efficient algorithms can be given for incremental insertions and deletions to an existing clustering. Based on the formal definition of clusters, it can be proven that the incremental algorithm yields the same result as DBSCAN. A performance evaluation of IncrementalDBSCAN on a spatial database as well as on a WWWlog database is presented, demonstrating the efficiency of the proposed algorithm. IncrementalDBSCAN yields significant speedup factors over DBSCAN even for large numbers of daily updates in a data warehouse.
The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces
 In Proceedings of ICDE’99
, 1999
"... Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search ..."
Abstract

Cited by 117 (13 self)
 Add to MetaCart
(Show Context)
Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing, none of them is known to scale beyond 1015 dimensional spaces. This paper introduces the hybrid tree – a multidimensional data structure for indexing high dimensional feature spaces. Unlike other multidimensional data structures, the hybrid tree cannot be classified as either a pure data partitioning (DP) index structure (e.g., Rtree, SStree, SRtree) or a pure space partitioning (SP) one (e.g., KDBtree, hBtree); rather, it “combines ” positive aspects of the two types of index structures a single data structure to achieve search performance more scalable to high dimensionalities than either of the above techniques (hence, the name “hybrid”). Furthermore, unlike many data structures (e.g., distance based index structures like SStree, SRtree), the hybrid tree can support queries based on arbitrary distance functions. Our experiments on “real” high dimensional large size feature databases demonstrate that the hybrid tree scales well to high dimensionality and large database sizes. It significantly outperforms both purely DPbased and SPbased index mechanisms as well as linear scan at all dimensionalities for large sized databases. 1.
An investigation of practical approximate nearest neighbor algorithms
, 2004
"... This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate neares ..."
Abstract

Cited by 114 (4 self)
 Add to MetaCart
(Show Context)
This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We also introduce new approximate kNN search algorithms on this structure. We show why these structures should be able to exploit the same randomprojectionbased approximations that LSH enjoys, but with a simpler algorithm and perhaps with greater efficiency. We then provide a detailed empirical evaluation on five large, high dimensional datasets which show up to 31fold accelerations over LSH. This result holds true throughout the spectrum of approximation levels.
Learning to hash with binary reconstructive embeddings
 in Proc. NIPS, 2009
"... Fast retrieval methods are increasingly critical for many largescale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitl ..."
Abstract

Cited by 110 (1 self)
 Add to MetaCart
(Show Context)
Fast retrieval methods are increasingly critical for many largescale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. We develop a scalable coordinatedescent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. We present results over several domains to demonstrate that our method outperforms existing stateoftheart techniques. 1
On the Marriage of L_pnorms and Edit Distance
 IN VLDB
, 2004
"... Existing studies on time series are based on two categories of distance functions. The first category consists of the Lpnorms. They are metric distance functions but cannot support local time shifting. The second category consists of distance functions which are capable of handling local time shift ..."
Abstract

Cited by 97 (3 self)
 Add to MetaCart
Existing studies on time series are based on two categories of distance functions. The first category consists of the Lpnorms. They are metric distance functions but cannot support local time shifting. The second category consists of distance functions which are capable of handling local time shifting but are nonmetric. The first
Indexing Large Metric Spaces for Similarity Search Queries
, 1999
"... In many database applications, one of the common queries is to find approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query image. Distance based index structures are propos ..."
Abstract

Cited by 92 (0 self)
 Add to MetaCart
(Show Context)
In many database applications, one of the common queries is to find approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query image. Distance based index structures are proposed for applications where the distance computations between objects of the data domain are expensive (such as high dimensional data), and the distance function used is metric. In this paper, we consider using distancebased index structures for similarity queries on large metric spaces. We elaborate on the approach of using reference points (vantage points) to partition the data space into spherical shelllike regions in a hierarchical manner. We introduce the multivantage point tree structure (mvptree) that uses more than one vantage points to partition the space into spherical cuts at each level. In answering similarity based queries, the mvptree also utilizes the precomputed (at con...
The Anchors Hierarchy: Using the Triangle Inequality to Survive High Dimensional Data
 In Twelfth Conference on Uncertainty in Artificial Intelligence
, 2000
"... This paper is about metric data structures in highdimensional or nonEuclidean space that permit cached sufficient statistics accelerations of learning algorithms. ..."
Abstract

Cited by 91 (11 self)
 Add to MetaCart
This paper is about metric data structures in highdimensional or nonEuclidean space that permit cached sufficient statistics accelerations of learning algorithms.
Supporting Ranked Boolean Similarity Queries in MARS
, 1998
"... To address the emerging needs of applications that require access to and retrieval of multimedia objects, we are developing the Multimedia Analysis and Retrieval System (MARS) [29]. In this paper, we concentrate on the retrieval subsystem of MARS and its support for contentbased queries over image ..."
Abstract

Cited by 88 (13 self)
 Add to MetaCart
(Show Context)
To address the emerging needs of applications that require access to and retrieval of multimedia objects, we are developing the Multimedia Analysis and Retrieval System (MARS) [29]. In this paper, we concentrate on the retrieval subsystem of MARS and its support for contentbased queries over image databases. Contentbased retrieval techniques have been extensively studied for textual documents in the area of automatic information retrieval [40, 4]. This paper describes how these techniques can be adapted for ranked retrieval over image databases. Specifically, we discuss the ranking and retrieval algorithms developed in MARS based on the Boolean retrieval model and describe the results of our experiments that demonstrate the effectiveness of the developed model for image retrieval.
Fast and Robust Earth Mover’s Distances
"... We present a new algorithm for a robust family of Earth Mover’s Distances EMDs with thresholded ground distances. The algorithm transforms the flownetwork of the EMD so that the number of edges is reduced by an order of magnitude. As a result, we compute the EMD by an order of magnitude faster tha ..."
Abstract

Cited by 88 (6 self)
 Add to MetaCart
(Show Context)
We present a new algorithm for a robust family of Earth Mover’s Distances EMDs with thresholded ground distances. The algorithm transforms the flownetwork of the EMD so that the number of edges is reduced by an order of magnitude. As a result, we compute the EMD by an order of magnitude faster than the original algorithm, which makes it possible to compute the EMD on large histograms and databases. In addition, we show that EMDs with thresholded ground distances have many desirable properties. First, they correspond to the way humans perceive distances. Second, they are robust to outlier noise and quantization effects. Third, they are metrics. Finally, experimental results on image retrieval show that thresholding the ground distance of the EMD improves both accuracy and speed. 1.