Interestingness of Frequent Itemsets Using Bayesian Networks as Background Knowledge
- In Proceedings of the SIGKDD Conference on Knowledge Discovery and Data Mining
, 2004
Knowledge discovery with genetic programming for providing feedback to courseware author
- User Modeling and User-Adapted Interaction: The Journal of Personalization Research
Cited by 10 (8 self)
Abstract. We introduce a methodology to improve Adaptive Systems for Web-Based Education. This methodology uses evolutionary algorithms as a data mining method for discovering interesting relationships in students’ usage data. Such knowledge may be very useful for teachers and course authors to select the most appropriate modifications to improve the effectiveness of the course. We use Grammar-Based Genetic Programming (GBGP) with multi-objective optimization techniques to discover prediction rules. We present a specific data mining tool that can help non-experts in data mining carry out the complete rule discovery process, and demonstrate its utility by applying it to an adaptive Linux course that we developed.
Key words. adaptive system for web-based education, data mining, evolutionary algorithms, grammar-based genetic programming, prediction rules
Association Rules Mining: A Recent Overview
Cited by 6 (0 self)
Abstract. In this paper, we provide the preliminaries of basic concepts about association rule mining and survey the list of existing association rule mining techniques. Of course, a single article cannot be a complete review of all the algorithms, yet we hope that the references cited will cover the major theoretical issues, guiding the researcher in interesting research directions that have yet to be explored.
On Local Pruning of Association Rules Using Directed Hypergraphs
- ICDE’04 – Proc. of the 20th International Conference on Data Engineering, IEEE
, 2003
Cited by 3 (1 self)
In this paper we propose an adaptive local pruning method for association rules. Our method exploits the exact mapping between a certain class of association rules, namely those whose consequents are singletons, and backward directed hypergraphs. The hypergraph which represents the association rules is called an Association Rules Network (ARN). We propose two operations on this network for pruning rules, prove several properties of the ARN and apply the results of our approach to two popular data sets.
Keywords: Association Rules Pruning, Interestingness, Directed Hypergraphs
Finding Trees From Unordered 0--1 Data
- In PKDD (2006)
Cited by 1 (1 self)
Tree structures are a natural way of describing occurrence relationships between attributes in a dataset. We define a new class of tree patterns for unordered 0--1 data and consider the problem of discovering frequently occurring members of this pattern class. Intuitively, a tree T occurs in a row u of the data, if the attributes of T that occur in u form a subtree of T containing the root. We show that this definition has advantageous properties: only shallow trees have a significant probability of occurring in random data, and the definition allows a simple levelwise algorithm for mining all frequently occurring trees. We demonstrate with empirical results that the method is feasible and that it discovers interesting trees in real data.
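The occurrence definition in this abstract can be illustrated with a minimal sketch. It assumes one reading of the definition: the attributes of T present in a row must include the root, and every present non-root attribute must have its parent present too. The parent-map representation and the names `tree_occurs` and `frequency` are hypothetical choices for this illustration, not taken from the paper.

```python
from typing import Dict, Iterable, Optional, Set


def tree_occurs(parent: Dict[str, Optional[str]], row: Set[str]) -> bool:
    """Check whether a tree occurs in one row of 0-1 data.

    `parent` maps each attribute of the tree to its parent attribute
    (None for the root). `row` is the set of attributes equal to 1.
    This encoding is an assumption made for the sketch.
    """
    # Attributes of the tree that are present in the row.
    present = {a for a in parent if a in row}
    root = next(a for a, p in parent.items() if p is None)
    if root not in present:
        return False
    # Present attributes form a subtree containing the root exactly when
    # each present non-root attribute has its parent present as well.
    return all(parent[a] in present for a in present if a != root)


def frequency(parent: Dict[str, Optional[str]], rows: Iterable[Set[str]]) -> float:
    """Fraction of rows in which the tree occurs."""
    rows = list(rows)
    return sum(tree_occurs(parent, r) for r in rows) / len(rows)


if __name__ == "__main__":
    # Hypothetical tree: a is the root, b and c are children of a, d is a child of b.
    tree = {"a": None, "b": "a", "c": "a", "d": "b"}
    data = [{"a", "b"}, {"a", "d"}, {"b"}, {"a"}]
    print(frequency(tree, data))
```

Under this reading, attributes outside the tree are ignored, and a row containing `d` without `b` does not count as an occurrence.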
Information-Theoretical and Combinatorial Methods in Data-Mining
, 2003
Information-Theoretical and Combinatorial Methods in Data-Mining. December 2003. Szymon Jaroszewicz, M.Sc., Technical University of Szczecin; Ph.D., University of Massachusetts Boston. Directed by Professor Dan A. Simovici. Various applications of information theoretical and combinatorial methods in data mining are presented.
Formal and Computational Properties of the Confidence Boost in Association Rules
, 2010
Confidence is a very natural notion to prune and rank the output of an association rule mining algorithm; however, it is well-known that merely imposing absolute confidence and support thresholds leads to certain shortcomings. Many proposals have been suggested as attempts to overcome these shortcomings. Here we propose a different alternative: to complement the association rule mining process by also filtering the obtained rules according to their novelty, measured in a relative way with respect to the confidences of related rules. Our proposal, the confidence boost of a rule, encompasses two previous similar notions (confidence width and rule blocking). We analyze the properties of this notion, obtain a reasonably efficient algorithm to filter rules according to their confidence boost, compare it to some similar notions in the bibliography, and describe the results of some experimentation employing the new notion on standard benchmark datasets.
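For readers unfamiliar with the absolute thresholds this abstract refers to, the sketch below computes the standard support and confidence of a rule and applies fixed thresholds. It does not implement confidence boost, whose definition the abstract does not give. The function names, the toy transactions, and the threshold values are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Standard confidence: supp(antecedent and consequent) / supp(antecedent)."""
    ante = frozenset(antecedent)
    both = ante | frozenset(consequent)
    ante_support = support(transactions, ante)
    if ante_support == 0:
        # Convention chosen for this sketch: a rule whose antecedent
        # never occurs gets confidence 0.
        return 0.0
    return support(transactions, both) / ante_support


def passes_thresholds(transactions: List[FrozenSet[str]],
                      antecedent: Iterable[str],
                      consequent: Iterable[str],
                      min_support: float,
                      min_confidence: float) -> bool:
    """Absolute-threshold filter of the kind the abstract calls insufficient alone."""
    both = frozenset(antecedent) | frozenset(consequent)
    return (support(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)


if __name__ == "__main__":
    # Hypothetical toy data.
    data = [frozenset(t) for t in ({"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"})]
    print(support(data, {"a", "b"}), confidence(data, {"a"}, {"b"}))
```

A relative measure such as the one proposed in the paper would compare a rule's confidence with that of related rules rather than with a fixed cutoff.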
Member
, 2003
Various applications of information theoretical and combinatorial methods in data mining are presented. An axiomatization has been introduced for a family of entropies including both Shannon entropy and the Gini index as special cases. These entropies, and distances based on them, were then applied to decision tree construction. It has been shown experimentally that trees using distances based on generalized entropies as splitting criteria are smaller than those constructed using other criteria without significant loss in accuracy. One of the major problems in association rule mining is the huge number of rules produced. This work contains contributions to two principal methods of addressing the problem: sorting rules based on some interestingness measure, and rule pruning. A new measure of rule interestingness is introduced generalizing three well-known measures: chi-squared, entropy gain and Gini gain, which moreover gives a whole family of intermediate measures with interesting properties. Also, a method of pruning association rules using the Maximum Entropy Principle has
Mining Low-Support Discriminative Patterns from Dense and High-dimensional Data
, 2010
Discriminative patterns can provide valuable insights into datasets with class labels that may not be available from the individual features or the predictive models built using them. Most existing approaches work efficiently for sparse or low-dimensional datasets. However, for dense and high-dimensional datasets, they have to use high thresholds to produce the complete results within limited time, and thus may miss interesting low-support patterns. In this paper, we address the necessity of trading off the completeness of discriminative pattern discovery with the efficient discovery of low-support discriminative patterns from such datasets. We propose a family of anti-monotonic measures named SupMaxK that organize the set of discriminative patterns into nested layers of subsets, which are progressively more complete in their coverage, but require increasingly more computation. In particular, the member of SupMaxK with K = 2, named SupMaxPair, is suitable for dense and high-dimensional datasets. Experiments on both synthetic datasets and a cancer gene expression dataset demonstrate that there are low-support patterns that can be discovered using SupMaxPair but not by existing approaches. Furthermore, we show that the low-support discriminative patterns that are only discovered

