## On supervised density estimation techniques and their application to clustering (2007)

Venue: | In: Procs. of the 15th ACM Intl. Symposium on Advances in Geographic Information Systems (2007) |

Citations: | 7 - 6 self |

### BibTeX

@INPROCEEDINGS{Jiang07onsupervised,

author = {Dan Jiang and Christoph F. Eick and Chun-sheng Chen},

title = {On supervised density estimation techniques and their application to clustering},

booktitle = {Procs. of the 15th ACM Intl. Symposium on Advances in Geographic Information Systems},

year = {2007},

publisher = {}

}

### Abstract

The basic idea of traditional density estimation is to model the overall point density analytically as the sum of influence functions of the data points. However, traditional density estimation techniques only consider the location of a point. Supervised density estimation techniques, on the other hand, additionally consider a variable of interest that is associated with a point. Density in supervised density estimation is measured as the product of an influence function with the variable of interest. Based on this novel idea, a supervised density-based clustering algorithm named SCDE is introduced and discussed in detail. The SCDE algorithm forms clusters by associating data points with supervised density attractors, which represent maxima and minima of a supervised density function. Results of experiments are presented that evaluate SCDE for hot spot discovery and co-location discovery in spatial datasets. Moreover, the benefits of the presented approach for generating thematic maps are briefly discussed.
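The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a Gaussian influence function, and the names `influence`, `supervised_density`, `gradient`, and `find_attractor`, the bandwidth `sigma`, and the fixed-step hill climbing are all hypothetical choices made for illustration. The supervised density at a location is the sum, over data points, of the variable of interest times the influence of that point. An attractor is then approximated by climbing uphill from a start location with positive density and downhill from one with negative density.

```python
import math


def influence(p, o, sigma):
    """Gaussian influence of data point o at location p (kernel choice is an assumption)."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, o))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def supervised_density(p, points, values, sigma=1.0):
    """Sum over data points of (variable of interest) * (influence at p)."""
    return sum(z * influence(p, o, sigma) for o, z in zip(points, values))


def gradient(p, points, values, sigma=1.0):
    """Analytic gradient of the supervised density for the Gaussian kernel above."""
    g = [0.0] * len(p)
    for o, z in zip(points, values):
        w = z * influence(p, o, sigma) / sigma ** 2
        for k in range(len(p)):
            g[k] += w * (o[k] - p[k])
    return g


def find_attractor(start, points, values, sigma=1.0, step=0.1,
                   max_iter=1000, tol=1e-6):
    """Hypothetical fixed-step hill climbing: ascend toward a maximum if the
    density at the start is non-negative, otherwise descend toward a minimum."""
    p = list(start)
    sign = 1.0 if supervised_density(p, points, values, sigma) >= 0 else -1.0
    for _ in range(max_iter):
        g = gradient(p, points, values, sigma)
        if math.sqrt(sum(c * c for c in g)) < tol:
            break  # gradient is (numerically) zero: a stationary point
        p = [c + sign * step * gc for c, gc in zip(p, g)]
    return tuple(p)


# Toy data: two points with a positive variable of interest, two with a negative one.
points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
values = [1.0, 1.0, -1.0, -1.0]
hot = find_attractor((0.5, 0.3), points, values)    # climbs to a maximum
cold = find_attractor((4.5, 5.2), points, values)   # descends to a minimum
```

In this toy setup, start locations that converge to the same attractor would be placed in the same cluster. With all values set to 1, `supervised_density` reduces to a traditional kernel density estimate that considers location only.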

### Citations

2164 |
Density Estimation for Statistics and Data Analysis
- Silverman
- 1986
Citation Context ...-location discovery in spatial datasets. Moreover, the benefits of the presented approach for generating thematic maps are briefly discussed. I. INTRODUCTION The goal of density estimation techniques =-=[1]-=- is to model the distribution of the underlying population from the sample data collected. This technique measures the density at a point according to the impact of other points observed within its ne... |

1328 |
Finding Groups in Data: An Introduction to Cluster Analysis
- Kaufman, Rousseeuw
- 1990
Citation Context ...ection. a) b) Figure 5-1 Density Map of Ice Dataset The results of using SCDE for the dataset have been compared with two other algorithms: SPAM and SCMRG. SPAM (Supervised PAM) is a variation of PAM =-=[15]-=-. SPAM uses the fitness function q(x), and not the mean square error, to determine the best cluster representatives as PAM does. SPAM starts its search with a randomly created set of representatives,... |

1109 |
Environmental Protection Agency
- S
- 1985
Citation Context ...the Texas Ground Water Database (GWDB) [19]. A well is labeled as dangerous if its arsenic concentration level is above 10µg/l, the standard for drinking water by the Environment Protection Agency =-=[20]-=-. Earthquake and Volcano are spatial datasets obtained from Geosciences Department, University of Houston. The Volcano dataset uses severity of eruptions as the class label, whereas the Earthquake dat... |

1096 | A density-based algorithm for discovering clusters in large spatial databases with noise
- Ester, Kriegel, et al.
- 1996
Citation Context ...clustering algorithms exist for spatial data mining [11]; among them, density-based algorithms [4, 6, 7, and 16] have been found to be most promising for discovering arbitrary shaped clusters. DBSCAN =-=[4]-=- uses a straightforward definition for a density function which counts the number of points within a predefined radius. Its main drawbacks are the need for parameter tuning and poor performance for da... |

207 | An efficient approach to clustering large multimedia databases with noise
- Hinneburg, Keim
Citation Context ...a density function which counts the number of points within a predefined radius. Its main drawbacks are the need for parameter tuning and poor performance for datasets having varying density. DENCLUE =-=[6]-=- uses kernel density estimation techniques and its clusters are formed by associating objects with maxima of the so-defined density function, relying on a steepest descent hill climbing procedure. DENC... |

165 |
A Spatial Scan Statistic
- Kulldorff
- 1997
Citation Context ...he past both explicitly and implicitly. Because hot spots represent clusters with respect to spatial coordinates, their detection lies at the heart of spatial data mining and has been investigated in =-=[5, 8, 9, 12, 21]-=-. More explicitly, detection of hot spots using variable resolution approach [17] was investigated in order to minimize the effects of spatial superimposition. In [13] a region growing method for hot... |

133 | Density-based clustering in spatial databases: The algorithm gdbscan and its applications. Data Mining and Knowledge Discovery - Sander, Ester, et al. - 1998 |

86 |
Geographic Data Mining and Knowledge Discovery (book chapter)
- Miller
- 2007
Citation Context ...he past both explicitly and implicitly. Because hot spots represent clusters with respect to spatial coordinates, their detection lies at the heart of spatial data mining and has been investigated in =-=[5, 8, 9, 12, 21]-=-. More explicitly, detection of hot spots using variable resolution approach [17] was investigated in order to minimize the effects of spatial superimposition. In [13] a region growing method for hot... |

54 | Density-based clustering of uncertain data - Kriegel, Pfeifle - 2005 |

52 | Spatial Data Mining: Progress and Challenges Survey Paper - Koperski, Adhikary, et al. - 1996 |

41 | CrimeStat: A Spatial Statistics Program for the Analysis of Crime Incident Locations (v - Levine - 2004 |

22 | Supervised Clustering – Algorithms and Benefits
- Eick, Zeidat, et al.
Citation Context ... contour map according to ψ O for density values 10 and -10; d) Clustering results of SCDE The clustering results of SCDE algorithm were also compared with other supervised clustering algorithms SCEC =-=[14]-=-, SCAH and SCMRG [18]. All three clustering algorithms seek to find clusters that maximize a reward-based fitness function q(x). SCEC is a K-means-style, representative-based clustering algorithm th... |

18 | Discovery of interesting regions in spatial datasets using supervised clustering
- Eick, Vaezian, et al.
- 2006
Citation Context ...definition, somewhat similar, but less specific, to what we are using in the presented paper. This definition was applied to relational databases to find important nuggets of information. Finally, in =-=[18]-=- feature-based hot spots are defined in a similar sense as in this paper, but their discovery is limited to datasets with a single categorical variable. III. SUPERVISED DENSITY ESTIMATION Throughout t... |

17 |
A model for the hydrologic and climatic behavior of water on
- Clifford
- 1993
Citation Context ...a continuous variable, and in Section B SCDE will be evaluated for a benchmark of categorical hot spot discovery problems. A. Experiments Involving Continuous Density Estimation It is widely believed =-=[2]-=- that a significant quantity of water resides in the Martian subsurface in form of ground ice. In this section, we will evaluate SCDE for a binary co-location mining problem that centers on identifyin... |

17 |
Cluster discovery techniques for exploratory spatial data analysis
- Murray, Estivill-Castro
- 1998
Citation Context ...he past both explicitly and implicitly. Because hot spots represent clusters with respect to spatial coordinates, their detection lies at the heart of spatial data mining and has been investigated in =-=[5, 8, 9, 12, 21]-=-. More explicitly, detection of hot spots using variable resolution approach [17] was investigated in order to minimize the effects of spatial superimposition. In [13] a region growing method for hot... |

16 |
Evolutionary Hot Spots Data Mining: An Architecture for Exploring for Interesting Discoveries
- Williams
- 1999
Citation Context ...h selects seed points first and then grows clusters from these seed points by adding neighbor points as long as a density threshold condition is satisfied. The definition of hot spots was extended in =-=[10]-=- to cover a set of entities that are of some particular, but crucial, importance to the domain of experts. This is a feature-based definition, somewhat similar, but less specific, to what we are using... |

13 | Clustering algorithms for spatial databases: A survey
- Kolatch
- 2001
Citation Context ... this generalization leads to novel applications for spatial data mining. One direct application of supervised density estimation techniques is the generation of thematic maps for a selected variable =-=[11]-=-. Thematic maps can serve as a visual aid to the domain experts for quickly identifying the interesting regions for further investigations or for finding relations between different features. Another ... |

10 | Denclue 2.0: Fast clustering based on kernel density estimation
- Hinneburg, Gabriel
Citation Context ...rnel density estimation techniques and its clusters are formed by associating objects with maxima of the so-defined density function, relying on a steepest descent hill climbing procedure. DENCLUE 2.0 =-=[21]-=- improves the original algorithm by using a different hill climbing technique. Methods of finding hot spots in spatial datasets have been investigated in the past both explicitly and implicitly. Becau... |

9 | Cluster Detection in Point Event Data Having Tendency Towards Spatially Repetitive
- Brimicombe
Citation Context ...patial coordinates, their detection lies at the heart of spatial data mining and has been investigated in [5, 8, 9, 12, 21]. More explicitly, detection of hot spots using variable resolution approach =-=[17]-=- was investigated in order to minimize the effects of spatial superimposition. In [13] a region growing method for hot spots discovery was described, which selects seed points first and then grows cl... |

5 | Spatial data mining: clustering of hot spots and pattern recognition
- Tay, Hsu, et al.
- 2003
Citation Context ...en investigated in [5, 8, 9, 12, 21]. More explicitly, detection of hot spots using variable resolution approach [17] was investigated in order to minimize the effects of spatial superimposition. In =-=[13]-=- a region growing method for hot spots discovery was described, which selects seed points first and then grows clusters from these seed points by adding neighbor points as long as a density threshold ... |

2 |
Geocomputation: A primer, Chapter building automated geographical analysis and explanation machines
- Openshaw
- 1998