Results 1 -
5 of
5
Finding Regional Co-location Patterns for Sets of Continuous Variables, under review
"... This paper proposes a novel framework for mining regional colocation patterns with respect to sets of continuous variables in spatial datasets. The goal is to identify regions in which multiple continuous variables with values from the wings of their statistical distribution are co-located. A co-loc ..."
Abstract
-
Cited by 6 (6 self)
- Add to MetaCart
This paper proposes a novel framework for mining regional colocation patterns with respect to sets of continuous variables in spatial datasets. The goal is to identify regions in which multiple continuous variables with values from the wings of their statistical distribution are co-located. A co-location mining framework is introduced that operates in the continuous domain without the need for discretization and which views regional co-location mining as a clustering problem in which an externally given fitness function has to be maximized. Interestingness of colocation patterns is assessed using products of z-scores of the relevant continuous variables. The proposed framework is evaluated by a domain expert in a case study that analyzes chemical concentrations in Texas water wells centering on colocation patterns involving Arsenic. Our approach was able to identify known and unknown regional co-location patterns, and different sets of algorithm parameters lead to the characterization of arsenic distribution at different scales. Moreover, inconsistent co-location sets were found for regions in South Texas and West Texas that can be clearly attributed to geological differences in the two regions, emphasizing the need for regional co-location mining techniques. Moreover, a novel, prototype-based region discovery algorithm named CLEVER is introduced that uses randomized hill climbing, and searches a variable number of clusters and larger neighborhood sizes. Keywords spatial data mining, regional co-location mining, regional data mining, clustering, finding associations between continuous variables. 1.
An Association Analysis Approach to Biclustering
"... The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis performed on real-valued data sets in various domains, such as biology. Several algorithms have been proposed to find different ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis performed on real-valued data sets in various domains, such as biology. Several algorithms have been proposed to find different types of biclusters in such data sets. However, these algorithms are unable to search the space of all possible biclusters exhaustively. Pattern mining algorithms in association analysis also essentially produce biclusters as their result, since the patterns consist of items that are supported by a subset of all the transactions. However, a major limitation of the numerous techniques developed in association analysis is that they are only able to analyze data sets with binary and/or categorical variables, and their application to real-valued data sets often involves some lossy transformation such as discretization or binarization of the attributes. In
Mining Bi-sets in Numerical Data
"... Abstract. Thanks to an important research effort the last few years, inductive queries on set patterns and complete solvers which can evaluate them on large 0/1 data sets have been proved extremely useful. However, for many application domains, the raw data is numerical (matrices of real numbers who ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Abstract. Thanks to an important research effort the last few years, inductive queries on set patterns and complete solvers which can evaluate them on large 0/1 data sets have been proved extremely useful. However, for many application domains, the raw data is numerical (matrices of real numbers whose dimensions denote objects and properties). Therefore, using efficient 0/1 mining techniques needs for tedious Boolean property encoding phases. This is, e.g., the case, when considering microarray data mining and its impact for knowledge discovery in molecular biology. We consider the possibility to mine directly numerical data to extract collections of relevant bi-sets, i.e., couples of associated sets of objects and attributes which satisfy some user-defined constraints. Not only we propose a new pattern domain but also we introduce a complete solver for computing the so-called numerical bi-sets. Preliminary experimental validation is given. 1
Abstract Approximating Representations for Large Numerical Databases
"... The paper introduces a notion of support for realvalued functions. It is shown how to approximate supports of a large class of functions based on supports of so called polynomial itemsets, which can efficiently be mined using an Apriori-style algorithm. An upper bound for the error of such an approx ..."
Abstract
- Add to MetaCart
The paper introduces a notion of support for realvalued functions. It is shown how to approximate supports of a large class of functions based on supports of so called polynomial itemsets, which can efficiently be mined using an Apriori-style algorithm. An upper bound for the error of such an approximation can be reliably computed. The concept of an approximating representation was introduced, which extends the idea of concise representations to numerical data. It has been shown that many standard statistical modelling tasks such as nonlinear regression and least squares curve fitting can efficiently be solved using only the approximating representation, without accessing the original data at all. Since many of those methods traditionally require several passes over the data, our approach makes it possible to use such methods with huge datasets and data streams where several repeated scans are very costly or outright impossible. 1 Introduction and
Approximating Representations for Large Numerical Databases
"... The paper introduces a notion of support for realvalued functions. It is shown how to approximate supports of a large class of functions based on supports of so called polynomial itemsets, which can efficiently be mined using an Apriori-style algorithm. An upper bound for the error of such an approx ..."
Abstract
- Add to MetaCart
The paper introduces a notion of support for realvalued functions. It is shown how to approximate supports of a large class of functions based on supports of so called polynomial itemsets, which can efficiently be mined using an Apriori-style algorithm. An upper bound for the error of such an approximation can be reliably computed. The concept of an approximating representation was introduced, which extends the idea of concise representations to numerical data. It has been shown that many standard statistical modelling tasks such as nonlinear regression and least squares curve fitting can efficiently be solved using only the approximating representation, without accessing the original data at all. Since many of those methods traditionally require several passes over the data, our approach makes it possible to use such methods with huge datasets and data streams where several repeated scans are very costly or outright impossible. 1 Introduction and

