An Association Analysis Approach to Biclustering
Cited by 16 (5 self)
The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis performed on real-valued data sets in various domains, such as biology. Several algorithms have been proposed to find different types of biclusters in such data sets. However, these algorithms are unable to search the space of all possible biclusters exhaustively. Pattern mining algorithms in association analysis also essentially produce biclusters as their result, since the patterns consist of items that are supported by a subset of all the transactions. However, a major limitation of the numerous techniques developed in association analysis is that they are only able to analyze data sets with binary and/or categorical variables, and their application to real-valued data sets often involves some lossy transformation such as discretization or binarization of the attributes. In ...
Finding Regional Co-location Patterns for Sets of Continuous Variables, under review
Cited by 14 (9 self)
This paper proposes a novel framework for mining regional co-location patterns with respect to sets of continuous variables in spatial datasets. The goal is to identify regions in which multiple continuous variables with values from the wings of their statistical distribution are co-located. A co-location mining framework is introduced that operates in the continuous domain without the need for discretization and that views regional co-location mining as a clustering problem in which an externally given fitness function has to be maximized. Interestingness of co-location patterns is assessed using products of z-scores of the relevant continuous variables. The proposed framework is evaluated by a domain expert in a case study that analyzes chemical concentrations in Texas water wells, centering on co-location patterns involving arsenic. Our approach was able to identify known and unknown regional co-location patterns, and different sets of algorithm parameters lead to the characterization of arsenic distribution at different scales. Moreover, inconsistent co-location sets were found for regions in South Texas and West Texas that can be clearly attributed to geological differences in the two regions, emphasizing the need for regional co-location mining techniques. In addition, a novel, prototype-based region discovery algorithm named CLEVER is introduced that uses randomized hill climbing and searches a variable number of clusters and larger neighborhood sizes. Keywords: spatial data mining, regional co-location mining, regional data mining, clustering, finding associations between continuous variables.
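The abstract above says that interestingness is assessed using products of z-scores, but it does not give the exact formula. The following is a minimal sketch of one plausible reading, not the authors' fitness function: z-scores are computed per variable over the whole dataset, multiplied across the chosen variables for each object, and averaged over the objects in a candidate region. The function names `zscores` and `region_interestingness` and the averaging step are assumptions made for illustration.

```python
import numpy as np

def zscores(data):
    """Column-wise z-scores over the whole dataset (population std).

    Assumes every column has non-zero variance.
    """
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0)

def region_interestingness(data, region_rows, variables):
    """Hypothetical score: mean over the region of the per-object
    product of z-scores of the selected variables."""
    z = zscores(data)
    # Rows = objects in the region, columns = selected variables.
    block = z[np.ix_(region_rows, variables)]
    products = np.prod(block, axis=1)
    return float(products.mean())

# Two variables that are jointly high in rows 2 and 3.
together = [[1, 1], [1, 1], [3, 3], [3, 3]]
# Two variables that move in opposite directions.
opposed = [[1, 3], [1, 3], [3, 1], [3, 1]]

score_together = region_interestingness(together, [2, 3], [0, 1])
score_opposed = region_interestingness(opposed, [2, 3], [0, 1])
```

Under this reading, a region where both variables sit in the same tail of their distributions gets a positive score, and a region where they sit in opposite tails gets a negative one.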
Discovery of error-tolerant biclusters from noisy gene expression data
Bioinformatics, 2011
Cited by 6 (0 self)
An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations, such as an inability to explicitly handle errors/noise in the data; difficulty in discovering small biclusters due to their top-down approach; and the inability of some of the approaches to find overlapping biclusters, which is crucial as many genes participate in multiple biological processes. Association pattern mining also produces biclusters as its result and can naturally address some of these limitations. However, traditional association mining only finds exact biclusters, which ...
Mining Bisets in Numerical Data
Cited by 6 (0 self)
Thanks to an important research effort over the last few years, inductive queries on set patterns and complete solvers that can evaluate them on large 0/1 data sets have proved extremely useful. However, for many application domains, the raw data is numerical (matrices of real numbers whose dimensions denote objects and properties). Therefore, using efficient 0/1 mining techniques requires tedious Boolean property encoding phases. This is the case, for example, when considering microarray data mining and its impact on knowledge discovery in molecular biology. We consider the possibility of mining numerical data directly to extract collections of relevant bisets, i.e., couples of associated sets of objects and attributes which satisfy some user-defined constraints. Not only do we propose a new pattern domain, but we also introduce a complete solver for computing the so-called numerical bisets. Preliminary experimental validation is given.
A novel error-tolerant frequent itemset model for binary and real-valued data
2009
Cited by 3 (3 self)
Frequent pattern mining has been successfully applied to a broad range of applications; however, it has two major drawbacks, which limit its applicability to several domains. First, as the traditional 'exact' model of frequent pattern mining uses a strict definition of support, it limits the recovery of frequent itemset patterns in real-life data sets where the patterns may be fragmented due to random noise/errors. Second, as traditional frequent pattern mining algorithms work with only binary or boolean attributes, they require transformation of real-valued attributes to binary attributes, which often results in loss of information. As many real-life data sets are both noisy and real-valued in nature, past approaches have tried to address these issues independently, and there is no systematic approach that addresses both of them together. In this paper, we propose a novel Error-Tolerant Frequent Itemset (ETFI) model for binary as well as real-valued data. We also propose a bottom-up pattern mining algorithm to sequentially discover all ETFIs from both types of data sets. To illustrate the efficacy of our proposed ETFI approach, we use two real-valued S. cerevisiae microarray gene-expression data sets and evaluate the patterns obtained in terms of their functional coherence, as evaluated using GO-based functional enrichment analysis. Our results clearly demonstrate the importance of directly accounting for errors/noise in the data. Finally, the statistical significance of the discovered ETFIs, as estimated using two randomization tests, reveals that the discovered ETFIs are indeed biologically meaningful and are neither obtained by random chance nor capture random structure in the data. The source code as well as the data sets used in this study are made available at the following website:
J.F.: Mining graph topological patterns: Finding covariations among vertex descriptors
IEEE Transactions on Knowledge and Data Engineering, 2013
Cited by 3 (3 self)
We propose to mine the graph topology of a large attributed graph by finding regularities among vertex descriptors. Such descriptors are of two types: 1) the vertex attributes that convey the information of the vertices themselves and 2) some topological properties used to describe the connectivity of the vertices. These descriptors are mostly of numerical or ordinal types, and their similarity can be captured by quantifying their covariation. Mining topological patterns relies on frequent pattern mining and graph topology analysis to reveal the links that exist between the relation encoded by the graph and the vertex attributes. We propose three interestingness measures of topological patterns that differ by the pairs of vertices considered while evaluating up and down covariations between vertex descriptors. An efficient algorithm that combines search and pruning strategies to look for the most relevant topological patterns is presented. Besides a classical empirical study, we report case studies on four real-life networks showing that our approach provides valuable knowledge. Index Terms: data mining, mining methods and analysis, attributed graph mining, topological patterns.
Abstract
Cited by 2 (0 self)
determination of multiprocessor schedulability for sets of
Coupled attribute analysis on numerical data
In Proceedings of IJCAI-2013, 2013
Cited by 2 (1 self)
The usual representation of quantitative data is to formalize it as an information table, which assumes the independence of attributes. In real-world data, attributes interact to a greater or lesser degree and are coupled via explicit or implicit relationships. Limited research has been conducted on analyzing such attribute interactions, and existing work only describes a local picture of attribute couplings in an implicit way. This paper proposes a framework of coupled attribute analysis to capture the global dependency of continuous attributes. Such global couplings integrate the intra-coupled interaction within an attribute (i.e., the correlations between attributes and their own powers) and the inter-coupled interaction among different attributes (i.e., the correlations between attributes and the powers of others) to form a coupled representation for numerical objects by a Taylor-like expansion. This work takes one step forward towards explicitly addressing the global interactions of continuous attributes, verified by applications in data structure analysis, data clustering, and data classification. Substantial experiments on 13 UCI data sets demonstrate that the coupled representation can effectively capture the global couplings of attributes and outperforms the traditional representation, as supported by statistical analysis.
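The abstract above describes intra-coupled and inter-coupled interactions as correlations between attributes and powers of themselves or of other attributes. As a rough illustration only, and not the paper's coupled representation, the sketch below expands each attribute into its powers and computes Pearson correlations between all expanded columns. The names `power_expand` and `power_correlations`, the column ordering, and the use of `np.corrcoef` are assumptions.

```python
import numpy as np

def power_expand(data, max_power):
    """Stack the attributes raised to powers 1..max_power side by side.

    Column layout: all attributes to the power 1, then all attributes
    to the power 2, and so on.
    """
    data = np.asarray(data, dtype=float)
    return np.hstack([data ** p for p in range(1, max_power + 1)])

def power_correlations(data, max_power):
    """Pearson correlations between every pair of expanded columns.

    Entries pairing an attribute with its own powers correspond to the
    intra-coupled case; entries pairing an attribute with powers of a
    different attribute correspond to the inter-coupled case.
    """
    return np.corrcoef(power_expand(data, max_power), rowvar=False)

# Toy data where the second attribute is the square of the first.
toy = [[1, 1], [2, 4], [3, 9], [4, 16]]
corr = power_correlations(toy, max_power=2)
# Expanded columns: 0 = x, 1 = y, 2 = x^2, 3 = y^2
inter_coupling_y_x2 = corr[1, 2]
```

In the toy data the linear correlation between the two attributes is below 1, while the correlation between the second attribute and the square of the first is exactly 1 up to rounding, which is the kind of dependency a representation based on powers can expose.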
Minimum Variance Associations— Discovering Relationships in Numerical Data
In Proc. of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2008
Cited by 1 (0 self)
The paper presents minimum variance patterns: a new class of itemsets and rules for numerical data, which capture arbitrary continuous relationships between numerical attributes without the need for discretization. The approach is based on finding polynomials over sets of attributes whose variance, in a given dataset, is close to zero. Sets of attributes for which such functions exist are considered interesting. Further, two types of rules are introduced, which help extract understandable relationships from such itemsets. Efficient algorithms for mining minimum variance patterns are presented and verified experimentally.
1 Introduction and Related Research
Mining association patterns has a long tradition in data mining. Most methods, however, are designed for binary or categorical attributes. The usual approach to numerical data is discretization [22]. Discretization, however, leads to information loss and problems such as rules being split over several intervals. Approaches ...
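The idea of a polynomial over a set of attributes whose variance is close to zero can be illustrated with a small sketch. The entry above does not describe the paper's mining algorithms, so what follows is an assumed, generic approach: build all monomials up to a given degree, then take the unit-norm coefficient vector that minimizes the variance of the resulting polynomial, which is the eigenvector of the monomial covariance matrix with the smallest eigenvalue. The function names and the unit-norm constraint are assumptions.

```python
import numpy as np
from itertools import combinations_with_replacement

def monomials(data, degree):
    """All monomials of the attributes with total degree 1..degree."""
    data = np.asarray(data, dtype=float)
    n_attrs = data.shape[1]
    cols = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(n_attrs), d):
            # Product of the chosen attributes, e.g. (0, 1) -> x * y.
            cols.append(np.prod(data[:, list(combo)], axis=1))
    return np.column_stack(cols)

def min_variance_polynomial(data, degree):
    """Smallest achievable variance and the unit-norm coefficients
    (over the monomial columns) that achieve it."""
    cov = np.cov(monomials(data, degree), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return float(eigvals[0]), eigvecs[:, 0]

# Points on the unit circle satisfy x^2 + y^2 = 1, a zero-variance polynomial.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t)])
variance, coeffs = min_variance_polynomial(circle, degree=2)
# Monomial order for two attributes and degree 2: x, y, x^2, x*y, y^2
```

For the circle data, the minimal variance is zero up to rounding and the coefficients put equal weight on x^2 and y^2, recovering the relationship without any discretization of the attributes.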