A density-based algorithm for discovering clusters in large spatial databases with noise
, 1996
Cited by 1207 (62 self)
Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN, which relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.
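To make the density-based notion of clusters in this abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the commonly used formulation with a neighborhood radius (`eps`) and a minimum neighbor count (`min_pts`), and it uses a brute-force neighborhood query instead of a spatial index. All function and parameter names are illustrative.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force, includes i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: grow a new cluster from it
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels
```

With two well-separated groups of four points and one isolated point, a call such as `dbscan(points, eps=1.5, min_pts=3)` would be expected to assign two cluster ids and mark the isolated point as noise. The brute-force query makes this sketch quadratic in the number of points; an efficient version would rely on a spatial index.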
Spatio-Temporal Data Types: An Approach to Modeling and Querying Moving Objects in Databases
, 1999
Cited by 151 (37 self)
Spatio-temporal databases deal with geometries changing over time. In general, geometries can change not only in discrete steps but also continuously, in which case we speak of moving objects. If only the position in space of an object is relevant, then moving point is a basic abstraction; if the extent is also of interest, then the moving region abstraction captures moving as well as growing or shrinking regions. We propose a new line of research where moving points and moving regions are viewed as three-dimensional (2D space + time) or higher-dimensional entities whose structure and behavior are captured by modeling them as abstract data types. Such types can be integrated as base (attribute) data types into relational, object-oriented, or other DBMS data models; they can be implemented as data blades, cartridges, etc. for extensible DBMSs. We expect these spatio-temporal data types to play a similarly fundamental role for spatio-temporal databases as spatial data types have played for sp...
Incremental Clustering for Mining in a Data Warehousing Environment
 PROC. 24TH INT. CONF. VERY LARGE DATA BASES, VLDB
, 1998
Cited by 116 (7 self)
Data warehouses provide a great deal of opportunities for performing data mining tasks such as classification and clustering. Typically, updates are collected and applied to the data warehouse periodically in a batch mode, e.g., during the night. Then, all patterns derived from the warehouse by some data mining algorithm have to be updated as well. Due to the very large size of the databases, it is highly desirable to perform these updates incrementally. In this paper, we present the first incremental clustering algorithm. Our algorithm is based on the clustering algorithm DBSCAN, which is applicable to any database containing data from a metric space, e.g., to a spatial database or to a WWW-log database. Due to the density-based nature of DBSCAN, the insertion or deletion of an object affects the current clustering only in the neighborhood of this object. Thus, efficient algorithms can be given for incremental insertions and deletions to an existing clustering. Based on the formal definition of clusters, it can be proven that the incremental algorithm yields the same result as DBSCAN. A performance evaluation of IncrementalDBSCAN on a spatial database as well as on a WWW-log database is presented, demonstrating the efficiency of the proposed algorithm. IncrementalDBSCAN yields significant speedup factors over DBSCAN even for large numbers of daily updates in a data warehouse.
Spatio-Temporal Data Handling with Constraints
, 1998
Cited by 45 (6 self)
Most spatial information systems are limited to a fixed dimension (generally 2) which is not extensible. On the other hand, the emerging paradigm of constraint databases allows the representation of data of arbitrary dimension, together with abstract query languages. Evaluating queries, though, might be costly if the dimension of the objects is really arbitrary. In this paper, we present a data model, based on linear constraints, dedicated to the representation and manipulation of multidimensional data. In order to preserve a low complexity for query evaluation, we restrict the orthographic dimension of an object $O$, defined as the dimension of the components $O_1, \ldots, O_n$ such that $O = O_1 \times \cdots \times O_n$. This allows queries to be processed independently on each component, therefore achieving a satisfying tradeoff between design simplicity, expressive power of the query language and efficiency of query evaluation. We illustrate these concepts in the co...
Algorithms for characterization and trend detection in spatial databases
 in Proc. 4th International Conf. on Knowledge Discovery and Data Mining
, 1998
Cited by 41 (1 self)
The number and the size of spatial databases, e.g. for geomarketing, traffic control or environmental studies, are rapidly growing, which results in an increasing need for spatial data mining. In this paper, we present new algorithms for spatial characterization and spatial trend analysis. For spatial characterization it is important that class membership of a database object is determined not only by its non-spatial attributes but also by the attributes of objects in its neighborhood. In spatial trend analysis, patterns of change of some non-spatial attributes in the neighborhood of a database object are determined. We present several algorithms for these tasks. These algorithms were implemented within a general framework for spatial data mining providing a small set of database primitives on top of a commercial spatial database management system. A performance evaluation using a real geographic database demonstrates the effectiveness of the proposed algorithms. Furthermore, we show how the algorithms can be combined to discover even more interesting spatial knowledge.
Spatial Data Mining: Database Primitives, Algorithms and Efficient DBMS Support
 In Proc. of Int. Conf. on Databases in Office, Engineering and Science
, 1999
Cited by 31 (1 self)
Spatial data mining algorithms heavily depend on the efficient processing of neighborhood relations since the neighbors of many objects have to be investigated in a single run of a typical algorithm. Therefore, providing general concepts for neighborhood relations as well as an efficient implementation of these concepts will allow a tight integration of spatial data mining algorithms with a spatial database management system. This will speed up both the development and the execution of spatial data mining algorithms. In this paper, we define neighborhood graphs and paths and a small set of database primitives for their manipulation. We show that typical spatial data mining algorithms are well supported by the proposed basic operations. For finding significant spatial patterns, only certain classes of paths "leading away" from a starting object are relevant. We discuss filters that allow only such neighborhood paths, which will significantly reduce the search space for spatial data mini...
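The idea of neighborhood graphs, neighborhood paths and a filter on paths can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not define the primitives or the filter, so the function names are hypothetical, objects are assumed to be points, and "leading away" is interpreted here as "each step strictly increases the Euclidean distance to the starting object".

```python
from math import dist


def neighborhood_graph(objects, is_neighbor):
    """Adjacency lists: node i is linked to every other j with is_neighbor(o_i, o_j)."""
    n = len(objects)
    return {
        i: [j for j in range(n) if j != i and is_neighbor(objects[i], objects[j])]
        for i in range(n)
    }


def paths_leading_away(objects, graph, start, max_length):
    """Enumerate neighborhood paths from `start` with at most `max_length` edges.

    Assumed filter: a step is kept only if it strictly increases the distance
    to the starting object, which also rules out cycles.
    """
    results = []

    def extend(path):
        if len(path) > 1:
            results.append(list(path))
        if len(path) - 1 == max_length:
            return
        last = path[-1]
        for nxt in graph[last]:
            if dist(objects[nxt], objects[start]) > dist(objects[last], objects[start]):
                extend(path + [nxt])

    extend([start])
    return results
```

Because the filter discards every step that moves back toward the start, the number of candidate paths grows far more slowly than in an unfiltered enumeration, which is the search-space reduction the abstract refers to.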
Density-Connected Sets and their Application for Trend Detection in Spatial Databases
 Proc. 3rd Int. Conf. on Knowledge Discovery and Data Mining, 10-15, Menlo Park
, 1997
Cited by 26 (3 self)
Several clustering algorithms have been proposed for class identification in spatial databases such as earth observation databases. The effectiveness of the well-known algorithms such as DBSCAN, however, is somewhat limited because they do not fully exploit the richness of the different types of data contained in a spatial database. In this paper, we introduce the concept of density-connected sets and present a significantly generalized version of DBSCAN. The major properties of this algorithm are as follows: (1) any symmetric predicate can be used to define the neighborhood of an object, allowing a natural definition in the case of spatially extended objects such as polygons, and (2) the cardinality function for a set of neighboring objects may take into account the non-spatial attributes of the objects as a means of assigning application-specific weights. Density-connected sets can be used as a basis to discover trends in a spatial database. We define trends in spatial databases and show how to apply the generalized DBSCAN algorithm for the task of discovering such knowledge. To demonstrate the practical impact of our approach, we performed experiments on a geographical information system on Bavaria which is representative of a broad class of spatial databases.
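The two generalizations named in this abstract, an arbitrary symmetric neighborhood predicate and a weighted cardinality function, can be illustrated with a small Python sketch. It is a simplified reading of the abstract and not the published algorithm: the names `is_neighbor`, `weight` and `min_weight` are hypothetical, the predicate is assumed to be reflexive as well as symmetric, and the cardinality is assumed to be a plain sum of per-object weights.

```python
NOISE = -1  # label for objects that belong to no density-connected set


def generalized_dbscan(objects, is_neighbor, weight, min_weight):
    """Cluster `objects` using a user-supplied neighborhood predicate.

    is_neighbor(a, b): symmetric (and here reflexive) predicate.
    weight(a): contribution of one object to the weighted cardinality.
    An object is a core object if its neighborhood weighs at least min_weight.
    """
    n = len(objects)

    def neighborhood(i):
        return [j for j in range(n) if is_neighbor(objects[i], objects[j])]

    def is_core(neighbors):
        return sum(weight(objects[j]) for j in neighbors) >= min_weight

    labels = [None] * n  # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = neighborhood(i)
        if not is_core(neighbors):
            labels[i] = NOISE
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border object: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighborhood(j)
            if is_core(j_neighbors):
                seeds.extend(j_neighbors)
        cluster += 1
    return labels
```

For spatially extended objects, the predicate could for example be "the two objects intersect" and the weight could be an object's area or a non-spatial attribute such as population. Setting `weight` to a constant 1 and the predicate to a distance threshold gives back the ordinary point-based behavior.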
A Fast Parallel Clustering Algorithm for Large Spatial Databases
 DATA MINING AND KNOWLEDGE DISCOVERY, 3, 263–290
, 1999
Cited by 24 (1 self)
The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the ‘shared-nothing’ architecture with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method using a number of workstations connected via Ethernet (10 Mbit). A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scaleup and sizeup behavior.
Algorithms and Applications for Spatial Data Mining
, 2001
Cited by 23 (0 self)
Introduction. Due to the computerization and the advances in scientific data collection, we are faced with a large and continuously growing amount of data, which makes it impossible to interpret all this data manually. Therefore, the development of new techniques and tools that support the human in transforming data into useful knowledge has been the focus of the relatively new and interdisciplinary research area "knowledge discovery in databases". Knowledge discovery in databases (KDD) has been defined as the nontrivial process of discovering valid, novel, potentially useful and ultimately understandable patterns from data; a pattern is an expression in some language describing a subset of the data or a model applicable to that subset (Fayyad et al., 1996). The process of KDD is interactive and iterative, involving several steps such as data selection, data reduction, data mining, and the evaluation of the data mining results. The heart of the process, however, is the data mining step
dedale, A Spatial Constraint Database
, 1997
Cited by 19 (7 self)
This paper presents a first prototype of a constraint database for spatial information, dedale. Implemented on top of the O2 DBMS, data is stored in an object-oriented framework, with spatial data represented using linear constraints over a dense domain. The query language is the standard OQL, with special functions for constraint solving and geometric operations. A simple geographical application from the French Institute for Geography, IGN, is running on dedale. The data, initially in vector mode, was loaded into the database after a translation to constraint representation. Although it is too early to speak of performance since not all operations have been optimized yet, our experience with dedale already demonstrates the advantages of the constraint approach for spatial manipulation.