Results 1-5 of 5
Pattern discovery by residual analysis and recursive partitioning
- IEEE Transactions on Knowledge and Data Engineering, 1999
Cited by 7 (2 self)
In this paper, a novel method of pattern discovery is proposed. It is based on the theoretical formulation of a contingency table of events. Using residual analysis and recursive partitioning, statistically significant events are identified in a data set. These events constitute the important information contained in the data set and are easily interpretable as simple rules, contour plots, or parallel axes plots. In addition, an informative probabilistic description of the data is automatically furnished by the discovery process. Following the theoretical formulation, experiments with real and simulated data demonstrate the ability to discover subtle patterns amid noise, invariance to changes of scale, cluster detection, and the discovery of multidimensional patterns. It is shown that the pattern discovery method offers the advantages of easy interpretation, rapid training, and tolerance to noncentralized noise. Index Terms: Pattern discovery, residual analysis, recursive partitioning, events, contingency tables.
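One common way to make the residual-analysis step concrete is the standard adjusted residual (Haberman) for each cell of a two-way contingency table: the standardized residual (observed - expected) / sqrt(expected), divided by a variance correction. The sketch below assumes that statistic and a two-sided 5% cutoff of 1.96; the abstract does not state the paper's exact statistic, its threshold, or how recursive partitioning is applied, and the function names are hypothetical.

```python
import math


def adjusted_residuals(table):
    """Adjusted residual of every cell in a two-way contingency table.

    table: list of rows of non-negative observed counts.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_tot[i] * col_tot[j] / n
            # Variance correction applied to the standardized residual.
            variance = (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            if expected == 0 or variance == 0:
                out_row.append(0.0)  # degenerate margin: no evidence either way
                continue
            standardized = (observed - expected) / math.sqrt(expected)
            out_row.append(standardized / math.sqrt(variance))
        result.append(out_row)
    return result


def significant_events(table, z=1.96):
    """(row, column, residual) for cells whose |adjusted residual| exceeds z."""
    residuals = adjusted_residuals(table)
    return [
        (i, j, r)
        for i, row in enumerate(residuals)
        for j, r in enumerate(row)
        if abs(r) > z
    ]
```

For the table [[30, 10], [10, 30]], every cell has an adjusted residual of about ±4.47, so all four cells are flagged: the diagonal cells as over-represented events and the off-diagonal cells as under-represented ones.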
Attribute clustering for grouping, selection, and classification of gene expression
- IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2005
Cited by 6 (0 self)
This paper presents an attribute clustering method that groups genes based on their interdependence in order to mine meaningful patterns from gene expression data. It can be used for gene grouping, selection, and classification. Partitioning a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis. By clustering attributes, the search dimension of a data mining algorithm is reduced. This reduction is especially important for mining gene expression data, because such data typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are developed and optimized to scale with the number of tuples rather than the number of attributes. The situation becomes even worse when the number of attributes overwhelms the number of tuples; in that case, the likelihood of reporting patterns that are irrelevant and arise by chance becomes rather high. For these reasons, gene grouping and selection are important preprocessing steps if many data mining algorithms are to be effective on gene expression data. This paper defines the problem of attribute clustering and introduces a methodology for solving it. Our proposed method groups interdependent attributes into clusters by optimizing a criterion ...
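The abstract is cut off before the clustering criterion is named. Purely as an illustration, the sketch below assumes a normalized mutual information, I(A;B) / H(A,B), as the interdependence measure between two discrete attributes, and a single k-modes-style assignment step in which each attribute joins the cluster of the "mode" attribute it depends on most. The measure, the choice of modes, and all function names are assumptions, not the paper's method.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())


def interdependence(x, y):
    """Mutual information of two discrete columns divided by their joint entropy.

    Lies in [0, 1]: 0 when the columns are independent, 1 when each
    column fully determines the other.
    """
    h_xy = entropy(list(zip(x, y)))
    if h_xy == 0:
        return 0.0  # both columns are constant
    mutual_info = entropy(x) + entropy(y) - h_xy
    return mutual_info / h_xy


def assign_to_modes(columns, modes):
    """One k-modes-style step: put every attribute in the cluster of the
    mode attribute it is most interdependent with.

    columns: dict mapping attribute name -> list of discrete values.
    modes:   names of the attributes acting as cluster centres (hypothetical seeds).
    """
    clusters = {m: [] for m in modes}
    for name, col in columns.items():
        best = max(modes, key=lambda m: interdependence(col, columns[m]))
        clusters[best].append(name)
    return clusters
```

A full method would presumably iterate: re-pick each cluster's mode (for example, the attribute with the highest total interdependence to the rest of its cluster) and reassign until the clusters stop changing.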
Transparent Decision Support Using Statistical Evidence
- 2005
Cited by 1 (1 self)
An automatically trained, statistically based fuzzy inference system that functions as a classifier is produced. The hybrid system is designed specifically to be used as a decision support system. It has several features of direct and immediate utility in the field of decision support, including a mechanism for discovering domain knowledge in the form of explanatory rules by examining training data; the evaluation of such rules using a simple probabilistic weighting mechanism; the incorporation of input uncertainty using the vagueness abstraction of fuzzy systems; and the provision of a strong confidence measure to predict the probability of system failure. Analysis of the hybrid fuzzy system and its constituent parts allows commentary on the weighting scheme and performance of the “Pattern Discovery” system on which it is based. Comparisons against other well-known classifiers provide a benchmark of the performance of the ...
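The abstract does not spell out the thesis's "simple probabilistic weighting mechanism". As an illustration only, the sketch below uses the classic weight-of-evidence measure, log(P(rule fires | class) / P(rule fires | not class)), with add-alpha smoothing, and combines the weights of firing rules by simple addition. The smoothing, the additive combination (which ignores fuzzy membership entirely), and all names are assumptions, not the thesis's scheme.

```python
import math


def weight_of_evidence(fires, labels, target, alpha=1.0):
    """Weight of evidence that a rule firing supports class `target`:

        W = log( P(fires | class == target) / P(fires | class != target) )

    fires:  list of bools, True where the rule's antecedent holds for a training row.
    labels: class label of each training row.
    alpha:  add-alpha smoothing (an assumption) so that zero counts never give log(0).
    """
    in_class = [f for f, y in zip(fires, labels) if y == target]
    out_class = [f for f, y in zip(fires, labels) if y != target]
    p_in = (sum(in_class) + alpha) / (len(in_class) + 2 * alpha)
    p_out = (sum(out_class) + alpha) / (len(out_class) + 2 * alpha)
    return math.log(p_in / p_out)


def score(firing_rules, rule_weights):
    """Sum, per class, the weights of the rules that fire for one case.

    rule_weights: dict mapping rule id -> {class label: weight} (hypothetical layout).
    """
    totals = {}
    for rule in firing_rules:
        for cls, w in rule_weights[rule].items():
            totals[cls] = totals.get(cls, 0.0) + w
    return totals
```

A positive weight means the rule's antecedent is more frequent within the class than outside it; the class with the largest total score would be the prediction under this additive assumption.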
A discrete-valued clustering algorithm with applications to biomolecular ...
- www.elsevier.com/locate/ins, 2001
Three Related Types of Multi-Value Association Patterns
Mining patterns that involve multiple, significantly relevant values is a difficult but very important problem that crosses many disciplines. Multi-value association patterns, which generalize sequential patterns, are sets of associated values extracted from sampled outcomes of a random N-tuple. Because they are value patterns over multiple variables, they are more descriptive than the corresponding variable patterns, and hence also easier to interpret. They can normally be detected by statistical testing, which flags a pattern when the observed occurrence of its event deviates significantly from the occurrence expected under a prior model or null hypothesis. In this paper, we consider three related types of multi-value association patterns: high-order patterns (HOP), consigned patterns (CP), and nested high-order patterns (NHOP). We further evaluate the nested high-order pattern and its relationships to the other two types experimentally.
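The statistical test described above can be sketched for the simplest case, assuming the null hypothesis is full independence of the N-tuple's positions: the expected count of a value pattern is n times the product of the marginal frequencies of its values, and the pattern is scored by the standardized residual (observed - expected) / sqrt(expected). The null model, the 1.96 cutoff, and the function names are assumptions; the abstract does not give the paper's exact test for HOP, CP, or NHOP.

```python
import math


def pattern_residual(rows, pattern):
    """Standardized residual of a multi-value pattern under an independence null.

    rows:    samples of a random N-tuple, as equal-length tuples.
    pattern: dict {position: value}, e.g. {0: "A", 2: "G"}.

    The expected count is n times the product of the marginal frequencies of
    the pattern's values, i.e. the count expected if positions were independent.
    """
    n = len(rows)
    observed = sum(all(r[p] == v for p, v in pattern.items()) for r in rows)
    expected = float(n)
    for p, v in pattern.items():
        expected *= sum(r[p] == v for r in rows) / n
    if expected == 0:
        return 0.0  # some value never occurs, so the pattern cannot be tested
    return (observed - expected) / math.sqrt(expected)


def is_significant(rows, pattern, z=1.96):
    """Flag a pattern whose |residual| exceeds the (assumed) two-sided 5% cutoff."""
    return abs(pattern_residual(rows, pattern)) > z
```

With only four samples, as in a toy example, even a pattern that occurs twice as often as expected yields a residual of about 1, so it is not flagged; real use needs far larger samples.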

