Results 1  10
of
222
Beyond Market Baskets: Generalizing Association Rules To Dependence Rules
, 1998
"... One of the more wellstudied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B. Motivated partly by the goal of generalizing beyond market bask ..."
Abstract

Cited by 534 (7 self)
 Add to MetaCart
One of the more wellstudied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B. Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of association rules, we develop the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chisquared test for independence from classical statistics. This leads to a measure that is upwardclosed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence rules. We demonstrate our algorithm’s effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.
Mining Generalized Association Rules
, 1995
"... We introduce the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (isa hierarchy) on the items, we find associations between items at any level of the taxonomy. For example, given a taxonomy th ..."
Abstract

Cited by 481 (7 self)
 Add to MetaCart
We introduce the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (isa hierarchy) on the items, we find associations between items at any level of the taxonomy. For example, given a taxonomy that says that jackets isa outerwear isa clothes, we may infer a rule that "people who buy outerwear tend to buy shoes". This rule may hold even if rules that "people who buy jackets tend to buy shoes", and "people who buy clothes tend to buy shoes" do not hold. An obvious solution to the problem is to add all ancestors of each item in a transaction to the transaction, and then run any of the algorithms for mining association rules on these "extended transactions ". However, this "Basic" algorithm is not very fast; we present two algorithms, Cumulate and EstMerge, which run 2 to 5 times faster than Basic (and more than 100 times faster on one reallife dataset). We also present a new interes...
Data Mining: An Overview from Database Perspective
 IEEE Transactions on Knowledge and Data Engineering
, 1996
"... Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have sh ..."
Abstract

Cited by 434 (26 self)
 Add to MetaCart
Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information providing services, such as data warehousing and online services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase the business opportunities. In response to such a demand, this article is to provide a survey, from a database researcher's point of view, on the data mining techniques developed recently. A classification of the available data mining techniques is provided and a comparative study of such techniques is presented.
Efficiently mining long patterns from databases
, 1998
"... We present a patternmining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database irrespective of the length of the longest pattern. In comparison, previous algorithms based on Apriori scale exponentially with longest pattern length. Experiments on real data ..."
Abstract

Cited by 405 (3 self)
 Add to MetaCart
We present a patternmining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database irrespective of the length of the longest pattern. In comparison, previous algorithms based on Apriori scale exponentially with longest pattern length. Experiments on real data show that when the patterns are long, our algorithm is more efficient by an order of magnimaximal frequent itemset, MaxMiner’s output implicitly and concisely represents all frequent itemsets. MaxMiner is shown to result in two or more orders of magnitude in performance improvements over Apriori on some datasets. On other datasets where the patterns are not so long, the gains are more modest. In practice, MaxMiner is demonstrated to run in time that is roughly linear in the number of maximal frequent itemsets and the size of the database, irrespective of the size of the longest frequent itemset. tude or more. 1.
Mining Quantitative Association Rules in Large Relational Tables
, 1996
"... We introduce the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars". We deal with quantitative attributes by fi ..."
Abstract

Cited by 368 (3 self)
 Add to MetaCart
We introduce the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars". We deal with quantitative attributes by finepartitioning the values of the attribute and then combining adjacent partitions as necessary. We introduce measures of partial completeness which quantify the information lost due to partitioning. A direct application of this technique can generate too many similar rules. We tackle this problem by using a "greaterthanexpectedvalue" interest measure to identify the interesting rules in the output. We give an algorithm for mining such quantitative association rules. Finally, we describe the results of using this approach on a reallife dataset. 1 Introduction Data mining, also known as knowledge discovery in databases, has been recognized as a new area for database research. The problem of discove...
CHARM: An efficient algorithm for closed itemset mining
, 2002
"... The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets usin ..."
Abstract

Cited by 272 (14 self)
 Add to MetaCart
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemsettidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally it uses a fast hashbased approach to remove any “nonclosed” sets found during computation. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM significantly outperforms previous methods. It is also linearly scalable in the number of transactions.
Parallel Mining of Association Rules
 IEEE Transactions on Knowledge and Data Engineering
, 1996
"... We consider the problem of mining association rules on a sharednothing multiprocessor. We present three algorithms that explore a spectrum of tradeoffs between computation, communication, memory usage, synchronization, and the use of problemspecific information. The best algorithm exhibits near p ..."
Abstract

Cited by 263 (3 self)
 Add to MetaCart
We consider the problem of mining association rules on a sharednothing multiprocessor. We present three algorithms that explore a spectrum of tradeoffs between computation, communication, memory usage, synchronization, and the use of problemspecific information. The best algorithm exhibits near perfect scaleup behavior, yet requires only minimal overhead compared to the current best serial algorithm.
MAFIA: A maximal frequent itemset algorithm for transactional databases
 In ICDE
, 2001
"... We present a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very long. The search strategy of our algorithm integrates a depthfirst traversal of the itemset lattice with effective pruning ..."
Abstract

Cited by 252 (3 self)
 Add to MetaCart
We present a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very long. The search strategy of our algorithm integrates a depthfirst traversal of the itemset lattice with effective pruning mechanisms. Our implementation of the search strategy combines a vertical bitmap representation of the database with an efficient relative bitmap compression schema. In a thorough experimental analysis of our algorithm on real data, we isolate the effect of the individual components of the algorithm. Our performance numbers show that our algorithm outperforms previous work by a factor of three to five. 1
Scalable Algorithms for Association Mining
 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2000
"... Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets, and then forming conditional implication rules among them. In this paper we present efficient algorithms for the discovery ..."
Abstract

Cited by 193 (22 self)
 Add to MetaCart
Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets, and then forming conditional implication rules among them. In this paper we present efficient algorithms for the discovery of frequent itemsets, which forms the compute intensive phase of the task. The algorithms utilize the structural properties of frequent itemsets to facilitate fast discovery. The items are organized into a subset lattice search space, which is decomposed into small independent chunks or sublattices, which can be solved in memory. Efficient lattice traversal techniques are presented, which quickly identify all the long frequent itemsets, and their subsets if required. We also present the effect of using different database layout schemes combined with the proposed decomposition and traversal techniques. We experimentally compare the new algorithms against the previous approaches, obtaining ...
Efficient data mining for path traversal patterns
 IEEE Transactions on Knowledge and Data Engineering
, 1998
"... Abstract—In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed informationproviding environment where documents or objects are linked together to facilitate interactive access. Our solution procedure consists of two steps. First, we der ..."
Abstract

Cited by 167 (12 self)
 Add to MetaCart
Abstract—In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed informationproviding environment where documents or objects are linked together to facilitate interactive access. Our solution procedure consists of two steps. First, we derive an algorithm to convert the original sequence of log data into a set of maximal forward references. By doing so, we can filter out the effect of some backward references, which are mainly made for ease of traveling and concentrate on mining meaningful user access sequences. Second, we derive algorithms to determine the frequent traversal patterns¦i.e., large reference sequences¦from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences; one is based on some hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. Performance of these two methods is comparatively analyzed. It is shown that the option of selective scan is very advantageous and can lead to prominent performance improvement. Sensitivity analysis on various parameters is conducted. Index Terms—Data mining, traversal patterns, distributed information system, World Wide Web, performance analysis.