## Finding Regional Co-location Patterns for Sets of Continuous Variables, under review

### Abstract

This paper proposes a novel framework for mining regional co-location patterns with respect to sets of continuous variables in spatial datasets. The goal is to identify regions in which multiple continuous variables, each taking values from the wings of its statistical distribution, are co-located. The framework operates in the continuous domain without the need for discretization and views regional co-location mining as a clustering problem in which an externally given fitness function has to be maximized. The interestingness of co-location patterns is assessed using products of z-scores of the relevant continuous variables. The framework is evaluated by a domain expert in a case study that analyzes chemical concentrations in Texas water wells, centering on co-location patterns involving arsenic. Our approach identified both known and previously unknown regional co-location patterns, and different sets of algorithm parameters led to characterizations of the arsenic distribution at different scales. Moreover, inconsistent co-location sets were found for regions in South Texas and West Texas that can be clearly attributed to geological differences between the two regions, emphasizing the need for regional co-location mining techniques. In addition, a novel prototype-based region discovery algorithm named CLEVER is introduced that uses randomized hill climbing, searches over a variable number of clusters, and uses larger neighborhood sizes.

Keywords: spatial data mining, regional co-location mining, regional data mining, clustering, finding associations between continuous variables.
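To make the idea of scoring a region by products of z-scores concrete, the following is a minimal sketch and not the paper's actual fitness function, which the abstract does not specify. The function names (`z_scores`, `region_interestingness`), the `pattern` encoding (+1 for the high wing, -1 for the low wing), the clipping of wrong-wing values to zero, and the averaging over the region's objects are all assumptions made for illustration.

```python
import numpy as np


def z_scores(values):
    """Standardize a 1-D array against the whole dataset (population std)."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return np.zeros_like(values)
    return (values - values.mean()) / std


def region_interestingness(data, region_idx, pattern):
    """Hypothetical interestingness of one region for one co-location pattern.

    data:       dict mapping variable name -> 1-D array over all objects
    region_idx: indices of the objects that belong to the region
    pattern:    dict mapping variable name -> +1 (high wing) or -1 (low wing)
    """
    product = np.ones(len(region_idx))
    for name, sign in pattern.items():
        z = z_scores(data[name])[region_idx] * sign
        # Assumption: an object contributes only if it lies in the
        # requested wing for every variable, so wrong-side values count as 0.
        product *= np.clip(z, 0.0, None)
    # Assumption: the region score is the mean per-object product.
    return float(product.mean())


# Toy data: the last two wells have elevated arsenic and fluoride.
wells = {
    "arsenic": [1, 1, 1, 1, 10, 10],
    "fluoride": [0, 0, 0, 0, 5, 5],
}
high_both = {"arsenic": +1, "fluoride": +1}
score_hot = region_interestingness(wells, [4, 5], high_both)
score_cold = region_interestingness(wells, [0, 1], high_both)
```

In this toy setting the region containing the two elevated wells receives a positive score for the pattern "high arsenic, high fluoride", while a region of background wells scores zero. A clustering algorithm in the spirit of the framework would then search for a set of regions that maximizes such a score, typically combined with some reward for region size.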

