Results 1 - 10 of 10
A tree projection algorithm for generation of frequent itemsets
- Journal of Parallel and Distributed Computing, 2000
Cited by 123 (0 self)
In this paper we propose algorithms for generation of frequent itemsets by successive construction of the nodes of a lexicographic tree of itemsets. We discuss different strategies in generation and traversal of the lexicographic tree such as breadth-first search, depth-first search or a combination of the two. These techniques provide different trade-offs in terms of the I/O, memory and computational time requirements. We use the hierarchical structure of the lexicographic tree to successively project transactions at each node of the lexicographic tree, and use matrix counting on this reduced set of transactions for finding frequent itemsets. We tested our algorithm on both real and synthetic data. We provide an implementation of the tree projection method which is up to one order of magnitude faster than other recent techniques in the literature. The algorithm has a well structured data access pattern which provides data locality and reuse of data for multiple levels of the cache. We also discuss methods for parallelization of the ...
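As a rough illustration of the idea in this abstract, the sketch below walks a lexicographic tree of itemsets depth-first and projects the transactions at each node, so that every child only sees the transactions containing its parent's items. It is a minimal sketch under stated assumptions, not the paper's implementation: the name `frequent_itemsets`, the absolute `min_support` threshold, and the use of plain counting in place of the matrix counting mentioned above are all assumptions made for illustration.

```python
from collections import Counter


def frequent_itemsets(transactions, min_support):
    """Depth-first walk of a lexicographic itemset tree (illustrative only).

    `min_support` is an absolute count; this name and signature are
    hypothetical and do not come from the paper.
    """
    result = {}

    def expand(prefix, projected):
        # Count candidate extensions inside the projected transactions only.
        counts = Counter(item for t in projected for item in t)
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            node = prefix + (item,)
            result[node] = counts[item]
            # Project: keep transactions that contain `item`, restricted to
            # the items that come later in lexicographic order.
            child = [tuple(i for i in t if i > item) for t in projected if item in t]
            expand(node, [t for t in child if t])

    expand((), [tuple(sorted(set(t))) for t in transactions])
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
    for itemset, count in sorted(frequent_itemsets(data, 2).items()):
        print(itemset, count)
```

Because each recursive call receives only the projected transactions, the amount of data scanned shrinks as the walk goes deeper, which is the effect the abstract attributes to projection.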
Cyclic Association Rules
- In Proc. 1998 Int. Conf. Data Engineering (ICDE'98), 1998
Cited by 72 (1 self)
We study the problem of discovering association rules that display regular cyclic variation over time. For example, if we compute association rules over monthly sales data, we may observe seasonal variation where certain rules are true at approximately the same month each year. Similarly, association rules can also display regular hourly, daily, weekly, etc., variation that is cyclical in nature. We demonstrate that existing methods cannot be naively extended to solve this problem of cyclic association rules. We then present two new algorithms for discovering such rules. The first one, which we call the sequential algorithm, treats association rules and cycles more or less independently. By studying the interaction between association rules and time, we devise a new technique called cycle pruning, which reduces the amount of time needed to find cyclic association rules. The second algorithm, which we call the interleaved algorithm, uses cycle pruning and other optimization techniques f...
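To make the notion of a cyclic rule concrete, the sketch below checks one rule for cycles once it is already known in which time units the rule holds. It assumes a cycle is a pair (length, offset) such that the rule holds in every time unit `i` with `i % length == offset`; that definition, the function name `find_cycles`, and the `max_length` bound are assumptions for illustration. It covers only the cycle-detection step and does not implement cycle pruning or the interleaved algorithm.

```python
def find_cycles(holds, max_length):
    """Return (length, offset) pairs for which the rule holds in every
    time unit i with i % length == offset.

    `holds` is a list of booleans, one per time unit (hypothetical input
    format, assumed for this sketch).
    """
    cycles = []
    for length in range(1, max_length + 1):
        for offset in range(length):
            units = holds[offset::length]
            # Require at least one time unit so empty slices do not count.
            if units and all(units):
                cycles.append((length, offset))
    return cycles


if __name__ == "__main__":
    # The rule holds in every second time unit, starting with unit 0.
    print(find_cycles([True, False, True, False, True, False], 3))
```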
On the discovery of interesting patterns in association rules
- Proc. 24th Intl. Conf. on Very Large Data Bases (VLDB), 1998
Cited by 30 (0 self)
Many decision support systems, which utilize association rules for discovering interesting patterns, require the discovery of association rules that vary over time. Such rules describe complicated temporal patterns such as events that occur on the “first working day of every month.” In this paper, we study the problem of discovering how association rules vary over time. In particular, we introduce the idea of using a calendar algebra to describe complicated temporal phenomena of interest to the user. We then present algorithms for discovering calendric association rules, which are association rules that follow the patterns set forth in the user supplied calendar expressions. We devise various optimizations that speed up the discovery of calendric association rules. We show, through an extensive series of experiments, that these optimization techniques provide performance benefits ranging from 5% to 250% over a less sophisticated algorithm.
A Systematic Approach to the Assessment of Fuzzy Association Rules
Cited by 18 (1 self)
In order to allow for the analysis of data sets including numerical attributes, several generalizations of association rule mining based on fuzzy sets have been proposed in the literature. While the formal specification of fuzzy associations is more or less straightforward, the assessment of such rules by means of appropriate quality measures is less obvious. Particularly, it assumes an understanding of the semantic meaning of a fuzzy rule. This aspect has been ignored by most existing proposals, which must therefore be considered as ad-hoc to some extent. In this paper, we develop a systematic approach to the assessment of fuzzy association rules. To this end, we proceed from the idea of partitioning the data stored in a database into examples of a given rule, counterexamples, and irrelevant data. Evaluation measures are then derived from the cardinalities of the corresponding subsets. The problem of finding a proper partition has a rather obvious solution for standard association rules but becomes less trivial in the fuzzy case. Our results not only provide a sound justification for commonly used measures but also suggest a means for constructing meaningful alternatives.
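For the standard (non-fuzzy) case, the partition described in this abstract can be written down directly. The sketch below splits transactions into examples, counterexamples, and irrelevant data for a rule A → B and derives support and confidence from the subset cardinalities. The function name `assess_rule` and the returned dictionary layout are assumptions for illustration; the fuzzy generalization developed in the paper is not attempted here.

```python
def assess_rule(transactions, antecedent, consequent):
    """Partition transactions for the crisp rule antecedent -> consequent.

    Hypothetical helper for illustration; covers only the non-fuzzy case.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    examples = counterexamples = irrelevant = 0
    for t in transactions:
        items = set(t)
        if not antecedent <= items:
            irrelevant += 1        # the rule says nothing about this transaction
        elif consequent <= items:
            examples += 1          # antecedent and consequent both present
        else:
            counterexamples += 1   # antecedent present, consequent missing
    relevant = examples + counterexamples
    return {
        "examples": examples,
        "counterexamples": counterexamples,
        "irrelevant": irrelevant,
        "support": examples / len(transactions) if transactions else 0.0,
        "confidence": examples / relevant if relevant else 0.0,
    }


if __name__ == "__main__":
    data = [["bread", "butter"], ["bread"], ["milk"], ["bread", "butter", "milk"]]
    print(assess_rule(data, ["bread"], ["butter"]))
```

In this crisp setting every transaction falls into exactly one of the three subsets, which is why the abstract calls the partition rather obvious for standard rules.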
DATA MINING FOR INTRUSION DETECTION -- A Critical Review
Cited by 13 (0 self)
Data mining techniques have been successfully applied in many different fields including marketing, manufacturing, process control, fraud detection, and network management. Over the past five years, a growing number of research projects have applied data mining to various problems in intrusion detection. This chapter surveys a representative cross section of these research efforts. Moreover, four characteristics of contemporary research are identified and discussed in a critical manner. Conclusions are drawn and directions for future research are suggested. Note: This article is an excerpt of the original work published in D. Barbara and S. Jajodia, editors, Applications of Data Mining in Computer Security, Kluwer Academic Publisher, Boston, 2002.
Mining Large Itemsets for Association Rules
- Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 1998
Cited by 11 (0 self)
This paper provides a survey of the itemset method for association rule generation. The paper discusses past research on the topic and also studies the relevance and importance of the itemset method in generating association rules. We discuss a number of variations of the association rule problem which have been proposed in the literature and their practical applications. Some inherent weaknesses of the large itemset method for association rule generation are explored. We also discuss some other formulations of associations which can be viable alternatives to the traditional association rule generation method.
1 Introduction
Association rules find the relationships between the different items in a database of sales transactions. Such rules track the buying patterns in consumer behavior, e.g., finding how the presence of one item in the transaction affects the presence of another and so forth. The problem of association rule generation has recently gained considerable prominence in ...
Mining Generalized Term Associations: Count Propagation Algorithm
- 1997
Cited by 1 (0 self)
We present here an approach and algorithm for mining generalized term associations. The problem is to find co-occurrence frequencies of terms, given a collection of documents each with relevant terms, and a taxonomy of terms. We have developed an efficient Count Propagation Algorithm (CPA) targeted for library applications such as Medline. The basis of our approach is that sets of terms (termsets) can be put into a taxonomy. By exploring this taxonomy, CPA propagates the count of termsets to their ancestors in the taxonomy, instead of counting each termset separately. We found that CPA is more efficient than other algorithms, particularly for counting large termsets. A benchmark on data sets extracted from a Medline database showed that CPA outperforms other known algorithms by up to around 200% (half the computing time) at the cost of less than 20% of additional memory to keep the taxonomy of termsets. We have used discovered knowledge of term associations for the purpose of imp...
On Pruning Strategies for Discovery of Generalized and Quantitative Association Rules
Cited by 1 (0 self)
Mining association rules has become an important datamining task, and meanwhile many algorithms have been developed which often differ in several aspects. In this paper, we analyse and compare the pruning strategies of several algorithms that were designed for mining generalised and quantitative association rules while abstracting from other technical details. Furthermore, we sketch a novel pruning strategy Genex that exploits all information provided by a taxonomy for pruning and is applicable with a "horizontal" database layout. In the context of mining quantitative association rules, we suggest a novel representation for intervals in terms of interval boundaries.
1 Introduction
Mining association rules has become an important datamining task where much research has been conducted and, accordingly, many algorithms have been developed. When first introduced, association rules were restricted to Boolean databases. In the sequel, the original task has been extended in several directi...
Incremental Update on Sequential Patterns in Large Databases
- In Proceedings of the Tools for Artificial Intelligence Conference (TAI'98), 1998
Mining of sequential patterns in a transactional database is time-consuming due to its complexity. Maintaining the discovered patterns after a database update is also a non-trivial task, since appended data sequences may invalidate old patterns and create new ones. In contrast to re-mining, the proposed incremental update algorithm effectively utilizes previously discovered knowledge, which is the key to improving mining performance. By counting over the appended data sequences instead of the entire updated database in most cases, fast filtering of the patterns found in the last mining run and successive reductions of candidate sequences together make efficient updating of sequential patterns possible.
Discovery of Hidden Relationship in a Large Data Itemsets through Apriori Algorithm of Association Analysis with UML
- Narander Kumar, Department of Computer Science,
An association rule is a means of uncovering frequent hidden relationships in the large volumes of data held in a database. Integrating association analysis into existing database technology is useful for exploiting the indexing and query processing capabilities of the database system, for developing efficient and scalable mining algorithms, for handling user-specified or domain-specific constraints, and for post-processing the extracted patterns. In the present work, association analysis is presented as a methodology for discovering interesting relationships hidden in large datasets. The Apriori algorithm is used to generate frequent itemsets, and the resulting relations are validated through the Unified Modeling Language (UML). The authors use the lattice structure and also discuss the various association rules for the frequent itemsets found by the Apriori algorithm. The strategies for generation and traversal are breadth-first and depth-first search. These techniques provide different trade-offs in terms of input/output, memory, and computational time requirements. The entire concept is implemented through a real case study of a Vehicle Insurance Policy System (VIPS) in the Indian context.
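The level-wise (breadth-first) generation of frequent itemsets that Apriori performs can be sketched as follows. This is a minimal textbook-style sketch, not the code used in the cited work: the function name `apriori`, the absolute `min_support` count, and the toy data are assumptions, and the UML validation and the insurance case study are not modeled.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent itemset mining (illustrative sketch).

    `min_support` is an absolute count; the signature is hypothetical.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions for this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
    for itemset, count in apriori(data, 2).items():
        print(sorted(itemset), count)
```

The prune step relies on the property that every subset of a frequent itemset is itself frequent, which is what lets each level discard candidates before the counting pass.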

