Results 1 - 10
of
96
A density-based algorithm for discovering clusters in large spatial databases with noise
, 1996
"... Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clu ..."
Abstract
-
Cited by 797 (55 self)
- Add to MetaCart
Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLAR-ANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.
Multidimensional Access Methods
, 1998
"... Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that ..."
Abstract
-
Cited by 508 (3 self)
- Add to MetaCart
Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region). More
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces
, 1998
"... For similarity search in high-dimensional vector spaces (or `HDVSs'), researchers have proposed a number of new methods (or adaptations of existing methods) based, in the main, on data-space partitioning. However, the performance of these methods generally degrades as dimensionality increases. Altho ..."
Abstract
-
Cited by 413 (12 self)
- Add to MetaCart
For similarity search in high-dimensional vector spaces (or `HDVSs'), researchers have proposed a number of new methods (or adaptations of existing methods) based, in the main, on data-space partitioning. However, the performance of these methods generally degrades as dimensionality increases. Although this phenomenon---known as the `dimensional curse'---is well known, little or no quantitative analysis of the phenomenon is available. In this paper, we provide a detailed analysis of partitioning and clustering techniques for similarity search in HDVSs. We show formally that these methods exhibit linear complexity at high dimensionality, and that existing methods are outperformed on average by a simple sequential scan if the number of dimensions exceeds around 10. Consequently, we come up with an alternative organization based on approximations to make the unavoidable sequential scan as fast as possible. We describe a simple vector approximation scheme, called VA-file, and report on an ...
Efficient Processing of Spatial Joins Using R-Trees
, 1993
"... Abstract: In this paper, we show that spatial joins are very suitable to be processed on a parallel hardware platform. The parallel system is equipped with a so-called shared virtual memory which is well-suited for the design and implementation of parallel spatial join algorithms. We start with an a ..."
Abstract
-
Cited by 287 (12 self)
- Add to MetaCart
Abstract: In this paper, we show that spatial joins are very suitable to be processed on a parallel hardware platform. The parallel system is equipped with a so-called shared virtual memory which is well-suited for the design and implementation of parallel spatial join algorithms. We start with an algorithm that consists of three phases: task creation, task assignment and parallel task execu-tion. In order to reduce CPU- and I/O-cost, the three phases are processed in a fashion that pre-serves spatial locality. Dynamic load balancing is achieved by splitting tasks into smaller ones and reassigning some of the smaller tasks to idle processors. In an experimental performance compar-ison, we identify the advantages and disadvantages of several variants of our algorithm. The most efficient one shows an almost optimal speed-up under the assumption that the number of disks is sufficiently large. Topics: spatial database systems, parallel database systems 1
Generalized Search Trees for Database Systems
- IN PROC. 21 ST INTERNATIONAL CONFERENCE ON VLDB
, 1995
"... This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on tree extensibility which only ..."
Abstract
-
Cited by 186 (19 self)
- Add to MetaCart
This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on tree extensibility which only supported the traditional set of equality and range predicates. In a single data structure, the GiST provides all the basic search tree logic required by a database system, thereby unifying disparate structures such as B+-trees and R-trees in a single piece of code, and opening the application of search trees to general extensibility. To illustrate the exibility of the GiST, we provide simple method implementations that allow it to behave like a B+-tree, an R-tree, and an RD-tree, a new index for data with set-valued attributes. We also present a preliminary performance analysis of RD-trees, which leads to discussion on the nature of tree indices and how they behave for various datasets.
Discovery of spatial association rules in geographic information databases
, 1995
"... Abstract. Spatial data mining, i.e., discovery of interesting, implicit knowledge in spatial databases, is an important task for understanding and use of spatial data- and knowledge-bases. In this paper, an e cient method for mining strong spatial association rules in geographic information database ..."
Abstract
-
Cited by 154 (14 self)
- Add to MetaCart
Abstract. Spatial data mining, i.e., discovery of interesting, implicit knowledge in spatial databases, is an important task for understanding and use of spatial data- and knowledge-bases. In this paper, an e cient method for mining strong spatial association rules in geographic information databases is proposed and studied. A spatial association rule is a rule indicating certain association relationship among a set of spatial and possibly some nonspatial predicates. A strong rule indicates that the patterns in the rule have relatively frequent occurrences in the database and strong implication relationships. Several optimization techniques are explored, including a two-step spatial computation technique (approximate computation on large sets, and re ned computations on small promising patterns), shared processing in the derivation of large predicates at multiple concept levels, etc. Our analysis shows that interesting association rules can be discovered e ciently in large spatial databases. 1
Partition Based Spatial-Merge Join
, 1996
"... This paper describes PBSM (Partition Based Spatial--Merge), a new algorithm for performing spatial join operation. This algorithm is especially effective when neither of the inputs to the join have an index on the joining attribute. Such a situation could arise if both inputs to the join are interme ..."
Abstract
-
Cited by 150 (8 self)
- Add to MetaCart
This paper describes PBSM (Partition Based Spatial--Merge), a new algorithm for performing spatial join operation. This algorithm is especially effective when neither of the inputs to the join have an index on the joining attribute. Such a situation could arise if both inputs to the join are intermediate results in a complex query, or in a parallel environment where the inputs must be dynamically redistributed. The PBSM algorithm partitions the inputs into manageable chunks, and joins them using a computational geometry based plane--sweeping technique. This paper also presents a performance study comparing the the traditional indexed nested loops join algorithm, a spatial join algorithm based on joining spatial indices, and the PBSM algorithm. These comparisons are based on complete implementations of these algorithms in Paradise, a database system for handling GIS applications. Using real data sets, the performance study examines the behavior of these spatial join algorithms in a vari...
Incremental Distance Join Algorithms for Spatial Databases
, 1998
"... Two new spatial join operations, distance join and distance semijoin, are introduced where the join output is ordered by the distance between the spatial attribute values of the joined tuples. Incremental algorithms are presented for computing these operations, which can be used in a pipelined fashi ..."
Abstract
-
Cited by 97 (9 self)
- Add to MetaCart
Two new spatial join operations, distance join and distance semijoin, are introduced where the join output is ordered by the distance between the spatial attribute values of the joined tuples. Incremental algorithms are presented for computing these operations, which can be used in a pipelined fashion, thereby obviating the need to wait for their completion when only a few tuples are needed. The algorithms can be used with a large class of hierarchical spatial data structures and arbitrary spatial data types in any dimensions. In addition, any distance metric may be employed. A performance study using Rtrees shows that the incremental algorithms outperform non-incremental approaches by an order of magnitude if only a small part of the result is needed, while the penalty, if any, for the incremental processing is modest if the entire join result is required.
Topological Relations in the World of Minimum Bounding Rectangles: A Study with R-Trees
, 1995
"... Recent developments in spatial relations have led to their use in numerous applications involving spatial databases. This paper is concerned with the retrieval of topological relations in Minimum Bounding Rectangle-based data structures. We study the topological information that Minimum Bounding Rec ..."
Abstract
-
Cited by 96 (29 self)
- Add to MetaCart
Recent developments in spatial relations have led to their use in numerous applications involving spatial databases. This paper is concerned with the retrieval of topological relations in Minimum Bounding Rectangle-based data structures. We study the topological information that Minimum Bounding Rectangles convey about the actual objects they enclose, using the concept of projections. Then we apply the results to R-trees and their variations, R-trees and R*-trees in order to minimise disk accesses for queries involving topological relations. We also investigate queries that involve complex spatial conditions in the form of disjunctions and conjunctions and we discuss possible extensions.
Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification
- Proc. 4th Int. Symp. on Large Spatial Databases
, 1995
"... . Both, the number and the size of spatial databases are rapidly growing because of the large amount of data obtained from satellite images, X-ray crystallography or other scientific equipment. Therefore, automated knowledge discovery becomes more and more important in spatial databases. So far, mos ..."
Abstract
-
Cited by 84 (9 self)
- Add to MetaCart
. Both, the number and the size of spatial databases are rapidly growing because of the large amount of data obtained from satellite images, X-ray crystallography or other scientific equipment. Therefore, automated knowledge discovery becomes more and more important in spatial databases. So far, most of the methods for knowledge discovery in databases (KDD) have been based on relational database systems. In this paper, we address the task of class identification in spatial databases using clustering techniques. We put special emphasis on the integration of the discovery methods with the DB interface, which is crucial for the efficiency of KDD on large databases. The key to this integration is the use of a well-known spatial access method, the R*-tree. The focusing component of a KDD system determines which parts of the database are relevant for the knowledge discovery task. We present several strategies for focusing: selecting representatives from a spatial database, focusing on the re...

