## Finding Regional Co-location Patterns for Sets of Continuous Variables, under review

Citations: 11 (9 self)

### BibTeX

@MISC{Eick_findingregional,

author = {Christoph F. Eick and Wei Ding},

title = {Finding Regional Co-location Patterns for Sets of Continuous Variables, under review},

year = {}

}

### Abstract

This paper proposes a novel framework for mining regional co-location patterns with respect to sets of continuous variables in spatial datasets. The goal is to identify regions in which multiple continuous variables with values from the wings of their statistical distributions are co-located. The framework operates in the continuous domain without the need for discretization and views regional co-location mining as a clustering problem in which an externally given fitness function has to be maximized. Interestingness of co-location patterns is assessed using products of z-scores of the relevant continuous variables. The proposed framework is evaluated by a domain expert in a case study that analyzes chemical concentrations in Texas water wells, centering on co-location patterns involving arsenic. Our approach was able to identify known and unknown regional co-location patterns, and different sets of algorithm parameters lead to the characterization of arsenic distribution at different scales. Moreover, inconsistent co-location sets were found for regions in South Texas and West Texas that can be clearly attributed to geological differences between the two regions, emphasizing the need for regional co-location mining techniques. In addition, a novel, prototype-based region discovery algorithm named CLEVER is introduced that uses randomized hill climbing and searches a variable number of clusters and larger neighborhood sizes.

Keywords: spatial data mining, regional co-location mining, regional data mining, clustering, finding associations between continuous variables.
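The abstract states that interestingness is assessed using products of z-scores, but it does not give the exact formula. The following is a minimal sketch of one way such a measure could look, not the paper's definition. The function names, the one-sided clipping of z-scores, and the averaging over a region are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one variable over the whole dataset (population std. dev.)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def pattern_score(z_row, pattern):
    """Product of one-sided z-scores for a single object (assumed form).

    `pattern` maps a variable name to +1 (high wing) or -1 (low wing).
    A value on the wrong side of the mean contributes 0, so the product
    is positive only if every variable lies in its requested wing.
    """
    score = 1.0
    for name, direction in pattern.items():
        score *= max(direction * z_row[name], 0.0)
    return score


def region_interestingness(data, region, pattern):
    """Average pattern score over the objects (row indices) of a region."""
    z = {name: z_scores(column) for name, column in data.items()}
    scores = [
        pattern_score({name: z[name][i] for name in pattern}, pattern)
        for i in region
    ]
    return mean(scores)
```

For instance, with hypothetical columns `"As"` and `"Mo"`, calling `region_interestingness(data, region, {"As": +1, "Mo": +1})` rewards regions where both concentrations are well above their dataset-wide means. Because the z-scores are computed over the whole dataset while the score is averaged per region, the same pattern can score differently in different regions.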

### Citations

1328 | Finding Groups in Data: An Introduction to Cluster Analysis
- Kaufman, Rousseeuw
- 1990
Citation Context ...ng for a set of “optimal” representatives; clusters are then created by assigning objects in the dataset to the closest representative. Popular prototype-based clustering algorithms are K-Medoids/PAM [15] and K-means [16]. CLEVER (CLustEring using representatiVEs and Randomized hill climbing) seeks to maximize the fitness function q(X). The algorithm (see Figure 2) starts with randomly selecting k’ re...

842 | Least squares quantization in PCM
- Lloyd
- 1982
Citation Context ...optimal” representatives; clusters are then created by assigning objects in the dataset to the closest representative. Popular prototype-based clustering algorithms are K-Medoids/PAM [15] and K-means [16]. CLEVER (CLustEring using representatiVEs and Randomized hill climbing) seeks to maximize the fitness function q(X). The algorithm (see Figure 2) starts with randomly selecting k’ representatives fro...

596 | Efficient and Effective Clustering Methods for Spatial Data Mining
- Ng, Han
- 1994
Citation Context ...lgorithm (see Figure 2) starts with randomly selecting k’ representatives from O; k’ is a parameter of the algorithm. It samples p solutions in the neighborhood of the current solution; unlike CLARANS [17], which picks the first best neighbor as the next solution, CLEVER evaluates all the p neighbors and picks the best among them. Neighboring solutions of the current solution are created using three ope...
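The search loop described in this context can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation. The excerpt breaks off before naming the three operators, so the insert, delete, and replace moves used here are assumptions, as are the function names, the squared Euclidean distance, and the rule of stopping at the first iteration without improvement.

```python
import random


def assign(objects, reps):
    """Form clusters by assigning each object to its closest representative."""
    clusters = {r: [] for r in reps}
    for i, obj in enumerate(objects):
        closest = min(
            reps,
            key=lambda r: sum((a - b) ** 2 for a, b in zip(obj, objects[r])),
        )
        clusters[closest].append(i)
    return [members for members in clusters.values() if members]


def neighbor(reps, n, rng):
    """Create one neighboring solution (assumed operators: insert, delete, replace)."""
    reps = list(reps)
    non_reps = [i for i in range(n) if i not in reps]
    ops = []
    if non_reps:
        ops += ["insert", "replace"]
    if len(reps) > 1:
        ops.append("delete")
    if not ops:
        return reps
    op = rng.choice(ops)
    if op in ("delete", "replace"):
        reps.remove(rng.choice(reps))
    if op in ("insert", "replace"):
        reps.append(rng.choice(non_reps))
    return reps


def hill_climb(objects, fitness, k_prime, p, seed=0, max_iter=100):
    """Randomized hill climbing over sets of representatives.

    `fitness` is the externally given function q(X) to maximize; it
    receives a list of clusters, each a list of object indices.
    """
    rng = random.Random(seed)
    n = len(objects)
    current = rng.sample(range(n), k_prime)  # k' random initial representatives
    best_q = fitness(assign(objects, current))
    for _ in range(max_iter):
        # Sample p neighbors and evaluate all of them, rather than
        # accepting the first improving neighbor.
        candidates = [neighbor(current, n, rng) for _ in range(p)]
        scored = [(fitness(assign(objects, c)), c) for c in candidates]
        q, reps = max(scored, key=lambda pair: pair[0])
        if q <= best_q:
            break  # assumed stopping rule: no sampled neighbor improves q
        current, best_q = reps, q
    return current, best_q
```

Because the assumed insert and delete moves change the number of representatives, the number of clusters is not fixed at k’ but varies during the search, which matches the abstract's remark that CLEVER searches a variable number of clusters.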

350 | Mining quantitative association rules in large relational tables
- Srikant, R.
- 1996
Citation Context ... the proposed methodology is not suitable for large datasets and relies on extensive human interactions. Most of the approaches to mine association rules in continuous datasets use discretization. In [22], numerical attributes are discretized and then adjacent partitions are combined as necessary. This leads to information loss and can generate spurious rules. Aumann et al. [2] introduce numerical ass...

86 | A Statistical Theory for Quantitative Association Rules
- Aumann, Lindell
- 1999
Citation Context ...use discretization. In [22], numerical attributes are discretized and then adjacent partitions are combined as necessary. This leads to information loss and can generate spurious rules. Aumann et al. [2] introduce numerical association rules that support statistical predicates for continuous attributes, such as variance, and algorithms that mine such rules. In [3], rank correlation is used to mine as...

64 | Local spatial autocorrelation statistics: Distributional issues and an application. Geographical Analysis 27:286–306
- Ord, Getis
- 1995
Citation Context ... variable resolution approach was investigated in order to minimize the effects of spatial superposition. The definition of hot spots was extended in [18] using circular zones for multiple variables. [13, 21] propose a popular method to find hot spots in spatial datasets relying on the G* Statistic. The G* Statistic detects local pockets of spatial association. The value of G* depends on an a priori given...

56 | Discovering spatial co-location patterns
- Shekhar, Huang
- 2001
Citation Context ...ated on discovering co-location patterns with respect to categorical features, which identify sets of classes whose instances co-occur in geographical proximity with high frequency. A classic example [19] of such a relationship is the co-location of two types of animals, the Nile crocodile and the Egyptian plover, which is traced by domain scientists to their symbiotic relationship. Figure 1. Regional...

55 | Mining Quantitative Association Rules
- Srikant, Agrawal
- 1996
Citation Context ...o non-spatial basket datasets. Finding Associations between Continuous Attributes. Most of the approaches to mine association rules in datasets containing continuous attributes use discretization. In [26], numerical attributes are discretized and then adjacent partitions are combined as necessary. This leads to information loss and can generate spurious rules. Aumann et al. [3] introduce numerical ass...

50 | Prospective time periodic geographical disease surveillance using a scan statistic
- Kulldorff
Citation Context ...tistics. In [4] the detection of hot spots using a variable resolution approach was investigated in order to minimize the effects of spatial superposition. The definition of hot spots was extended in [18] using circular zones for multiple variables. [13, 21] propose a popular method to find hot spots in spatial datasets relying on the G* Statistic. The G* Statistic detects local pockets of spatial ass...

28 | Finding Localized Associations in Market Basket Data
- Aggarwal, Procopiuc, et al.
Citation Context ...other hand, as we will explain later in more detail, centers on discovering regions and regional co-location patterns whose scope is a subspace of the whole dataset. Localized association rule mining [2] takes a similar approach to ours, but it discovers association rules that hold in local clustered basket data. Thus, their discovery is limited to non-spatial basket datasets. Finding Associations be...

22 | Supervised Clustering – Algorithms and Benefits
- Eick, Zeidat, et al.
Citation Context ...region discovery algorithms (four representative-based, three agglomerative, one divisive, and one density-based region discovery algorithm) have already been designed and implemented by our past work [4, 7, 10]. Two of those algorithms, a novel unpublished, prototype-based clustering algorithm named CLEVER and a grid-based clustering named 1 For each pair of objects belonging to the same region there has to...

22 | A review of the source, behavior and distribution of arsenic in natural waters. Applied Geochemistry 17
- Smedley, Kinniburgh
- 2002
Citation Context ...ids (TDS) and Well Depth (WD) are added. Those particular elements were chosen among the number of chemical elements available because of similar geochemical behavior, that is, they travel together (Mo, V) [20], because those parameters could point out mobilizing mechanisms (Cl⁻, SO₄²⁻, TDS, well depth), or because they could suggest the ultimate origin of arsenic (F⁻, B, SiO₂). The created dataset ...

18 | Discovery of interesting regions in spatial datasets using supervised clustering
- Eick, Vaezian, et al.
- 2006
Citation Context ...by interestingness thus providing a domain expert with pertinent information. Related Work. Our approach employs a region discovery framework whose initial version has been developed in our past work [8,9]. Shekhar et al. [19] discuss several interesting approaches to mine co-location patterns with respect to a given set of events. Huang et al. [12] center on co-location mining involving rare events an...

17 | Local spatial statistics: An overview
- Getis, Ord
- 1996
Citation Context ...atterns, whose scope is the whole dataset. Our approach, on the other hand, as we will explain later in more detail, centers on discovering regional co-location patterns. Localized spatial statistics [11] also analyzes regional characteristics in spatial datasets. However, the proposed methodology is not suitable for large datasets and relies on extensive human interactions. Most of the approaches to ...

16 | Mining rank-correlated sets of numerical attributes
- Calders, Goethals, et al.
- 2006
Citation Context ...nerate spurious rules. Aumann et al. [2] introduce numerical association rules that support statistical predicates for continuous attributes, such as variance, and algorithms that mine such rules. In [3], rank correlation is used to mine associations between numerical attributes. Basically, continuous attributes are transformed to ordinal attributes, and a method is proposed to find sets of numerical...

16 | A joinless approach for mining spatial colocation patterns
- Yoo, Shekhar
- 2006
Citation Context ...y. Shekhar et al. discuss several interesting approaches to mine co-location patterns, which are subsets of Boolean spatial features whose instances are frequently located together in close proximity [23, 28, 29]. Huang et al. proposed co-location mining involving rare events [14]. In [15], Huang and Zhang explored the relations between clustering and co-location mining. Instead of clustering spatial objects,...

16 | Local spatial statistics: An overview. In: Spatial Analysis: Modelling in a GIS Environment
- Getis, Ord
- 1996
Citation Context ...e approach first uses a clustering algorithm to find correlation clusters, and then derives equations describing the linear space approximating each cluster’s data points. Localized spatial statistics [8] also analyzes regional characteristics in spatial datasets. However, the proposed methodology is not suitable for large datasets and relies on extensive human interactions. Finally, Klösgen and May [...

15 | Mosaic: A proximity graph approach to agglomerative clustering
- Choo, Jiamthapthaksin, et al.
- 2007
Citation Context ...region discovery algorithms (four representative-based, three agglomerative, one divisive, and one density-based region discovery algorithm) have already been designed and implemented by our past work [4, 7, 10]. Two of those algorithms, a novel unpublished, prototype-based clustering algorithm named CLEVER and a grid-based clustering named 1 For each pair of objects belonging to the same region there has to...

13 | Mining co-location patterns with rare events from spatial data sets. Geoinformatica 10(3)
- Huang, Pei, et al.
- 2006
Citation Context ... initial version has been developed in our past work [8,9]. Shekhar et al. [19] discuss several interesting approaches to mine co-location patterns with respect to a given set of events. Huang et al. [12] center on co-location mining involving rare events and introduce a novel measure of interestingness for this purpose. In [13] Huang and Zhang explore the relationships between clustering and co-locat...

12 | Discovering of interesting regions in spatial data sets using supervised cluster
- Eick, Vaezian, et al.
- 2006
Citation Context ...ion, we are interested in the development of frameworks and algorithms that find interesting regions in spatial and spatio-temporal datasets. The presented framework has originally been introduced in [10, 11], and will be generalized in this section to mine datasets that contain multiple continuous variables. A novel measure of interestingness for mining co-locations involving continuous attributes that i...

11 | A framework for regional association rule mining in spatial datasets
- Ding, Eick, et al.
- 2006
Citation Context ...region discovery algorithms (four representative-based, three agglomerative, one divisive, and one density-based region discovery algorithm) have already been designed and implemented by our past work [4, 7, 10]. Two of those algorithms, a novel unpublished, prototype-based clustering algorithm named CLEVER and a grid-based clustering named 1 For each pair of objects belonging to the same region there has to...

10 | Deriving quantitative models for correlation clusters
- Achtert, Böhm, et al.
- 1993
Citation Context ...between numerical attributes. Basically, continuous attributes are transformed to ordinal attributes, and a method is proposed to find sets of numerical attributes with high attribute values. Achtert [1] and Jaroszewicz [14] propose different methods for deriving equations describing relationships between continuous variables in datasets. Contributions. First, a novel regional co-location mining fram...

9 | Cluster Detection in Point Event Data Having Tendency Towards Spatially Repetitive
- Brimicombe
Citation Context ...e framework allows the actual clustering task to be performed by a variety of different algorithms. Related Work. The relevant research spans three areas: Hot Spot Discovery in Spatial Statistics. In [4] the detection of hot spots using a variable resolution approach was investigated in order to minimize the effects of spatial superposition. The definition of hot spots was extended in [18] using circ...

8 | A framework for discovering co-location patterns in data sets with extended spatial objects
- Xiong, Shekhar, et al.
- 2004
Citation Context ...y. Shekhar et al. discuss several interesting approaches to mine co-location patterns, which are subsets of Boolean spatial features whose instances are frequently located together in close proximity [23, 28, 29]. Huang et al. proposed co-location mining involving rare events [14]. In [15], Huang and Zhang explored the relations between clustering and co-location mining. Instead of clustering spatial objects,...

6 | On the Relationships between Clustering and Spatial Co-location Pattern Mining
- Huang, Zhang
Citation Context ... co-location patterns with respect to a given set of events. Huang et al. [12] center on co-location mining involving rare events and introduce a novel measure of interestingness for this purpose. In [13] Huang and Zhang explore the relationships between clustering and co-location mining. Instead of clustering spatial objects, their features are clustered using a proximity function that is designed to...

5 | Towards region discovery in spatial datasets
- Ding, Jiamthapthaksin, et al.
- 2008
Citation Context ...by interestingness thus providing a domain expert with pertinent information. Related Work. Our approach employs a region discovery framework whose initial version has been developed in our past work [8,9]. Shekhar et al. [19] discuss several interesting approaches to mine co-location patterns with respect to a given set of events. Huang et al. [12] center on co-location mining involving rare events an...

3 | Cancer risks from arsenic in drinking water
- Smith, Hopenhayn-Rich, et al.
- 1992
Citation Context ...used in this case study are created using the Groundwater database (GWDB) maintained by the Texas Water Development Board [23]. Long-term exposure to low level concentrations of Arsenic causes cancer [21]. Figure 3 shows various aquifers and arsenic pollution sites on the map of Texas reported by Texas Commission on Environmental Quality (TCEQ). It is important to understand factors that cause Arsenic...

2 | Evaluation of Arsenic Contamination in Texas. Technical report prepared for TCEQ, under contract no
- Scanlon, Nicot, et al.
- 2005
Citation Context ... for at what level a granularity. Arsenic water pollution is a serious problem for Texas and its causes are complex and frequently difficult to explain, particularly for wells in the Ogallala aquifer [18]. A large number of possible explanations exist for what causes high levels of arsenic concentrations to occur. Therefore, scientists face the problem to decide which hypotheses from a large set of hypoth...

2 | Cancer Risks from Arsenic
- Wood, Smith
- 1992

1 | Minimum Variance Associations— Discovering Relationships in Numerical Data
- Jaroszewicz
- 2008
Citation Context ...tributes. Basically, continuous attributes are transformed to ordinal attributes, and a method is proposed to find sets of numerical attributes with high attribute values. Achtert [1] and Jaroszewicz [14] propose different methods for deriving equations describing relationships between continuous variables in datasets. Contributions. First, a novel regional co-location mining framework is introduced t...

1 | Deriving quantitative models for correlation clusters
- Achtert, Böhm, et al.
- 2006
Citation Context ...ions between numerical attributes. Basically, continuous attributes are transformed to ordinal attributes, and a method is proposed to find sets of numerical attributes with high attribute values. In [1], an interesting method is presented for deriving equations describing clusters containing numerical data. The approach first uses a clustering algorithm to find correlation clusters, and then derives ...