## Efficiently mining long patterns from databases (1998)

### Download Links

- [www.cs.tau.ac.il]
- [www.bayardo.org]
- [www.almaden.ibm.com]
- [cs-people.bu.edu]
- [cs.sungshin.ac.kr]
- DBLP

### Other Repositories/Bibliography

Citations: 402 (3 self)

### BibTeX

@INPROCEEDINGS{Bayardo98efficientlymining,
    author = {Roberto J. Bayardo},
    title = {Efficiently mining long patterns from databases},
    booktitle = {},
    year = {1998},
    pages = {85--93}
}

### Abstract

We present a pattern-mining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database irrespective of the length of the longest pattern. In comparison, previous algorithms based on Apriori scale exponentially with longest pattern length. Experiments on real data show that when the patterns are long, our algorithm is more efficient by an order of magnitude or more.

... maximal frequent itemset, Max-Miner’s output implicitly and concisely represents all frequent itemsets. Max-Miner is shown to result in two or more orders of magnitude in performance improvements over Apriori on some data-sets. On other data-sets where the patterns are not so long, the gains are more modest. In practice, Max-Miner is demonstrated to run in time that is roughly linear in the number of maximal frequent itemsets and the size of the database, irrespective of the size of the longest frequent itemset.
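The claim that the maximal frequent itemsets implicitly represent all frequent itemsets can be checked on a toy example. The sketch below is a brute-force illustration, not the Max-Miner algorithm: the function names (`frequent_itemsets`, `maximal_only`, `nonempty_subsets`), the five-transaction data-set, and the absolute support threshold of 2 are all assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of frequent itemsets (tiny inputs only)."""
    items = sorted(set().union(*transactions))
    frequent = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            # Support = number of transactions containing the candidate.
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                frequent.add(candidate)
    return frequent


def maximal_only(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


def nonempty_subsets(itemsets):
    """Expand a collection of itemsets into all of their non-empty subsets."""
    expanded = set()
    for s in itemsets:
        for k in range(1, len(s) + 1):
            expanded.update(frozenset(c) for c in combinations(sorted(s), k))
    return expanded


# Hypothetical toy data-set (an assumption for illustration).
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("cd"),
    frozenset("cd"),
]

frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_only(frequent)

# The two maximal itemsets stand in for all nine frequent itemsets.
recovered = nonempty_subsets(maximal)
```

Here nine frequent itemsets collapse to two maximal ones, and expanding those two into their subsets recovers the full set. The saving grows with pattern length, since a single maximal itemset of length n covers 2^n - 1 frequent itemsets.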

### Citations

2860 | Fast algorithms for mining association rules
- Agrawal, Srikant
- 1994
Citation Context ...ransversal algorithms can be used as a component of an algorithm for mining maximal frequent itemsets. 3.3 Implementation Details Max-Miner can use the same data-structures as Apriori (as detailed in [3]) for efficiently computing itemset supports. The primary data-structure used by Apriori is the hash tree to index candidate itemsets. Max-Miner uses the hash tree to index only the head of each candi...

2603 | Mining Association Rules between Sets of Items in Large Databases
- Agrawal, Imielinski, et al.
- 1993
Citation Context ... 1. Introduction Finding patterns in databases is the fundamental operation behind several common data-mining tasks including association rule [1] and sequential pattern mining [4]. For the most part, pattern mining algorithms have been developed to operate on databases where the longest patterns are relatively short. This leaves data outside t...

1239 | Mining sequential patterns
- Agrawal, Srikant
- 1995
Citation Context ... 1. Introduction Finding patterns in databases is the fundamental operation behind several common data-mining tasks including association rule [1] and sequential pattern mining [4]. For the most part, pattern mining algorithms have been developed to operate on databases where the longest patterns are relatively short. This leaves data outside this mold unexplorable using curren...

588 | Mining sequential patterns: Generalizations and performance improvements
- Srikant, Agrawal
- 1996

515 | Dynamic itemset counting and implication rules for market basket data
- Brin, Motwani, et al.
- 1997
Citation Context ...ry recently-proposed pattern-mining algorithm is a variant of Apriori [2]. Two recent papers have demonstrated that Apriori-like algorithms are inadequate on data-sets with long patterns. Brin et al. [6] applied their association-rule miner DIC to a data-set composed of PUMS census records. To reduce the difficulty of this data-set, they removed all items appearing in over 80% of the transactions yet...

504 | Fast discovery of association rules
- Agrawal, Mannila, et al.
- 1996
Citation Context ... tend to have long patterns because they contain many frequently occurring items and have a wide average record length. Almost every recently-proposed pattern-mining algorithm is a variant of Apriori [2]. Two recent papers have demonstrated that Apriori-like algorithms are inadequate on data-sets with long patterns. Brin et al. [6] applied their association-rule miner DIC to a data-set composed of PU...

393 | An efficient algorithm for mining association rules in large databases
- Savasere, Omiecinski, et al.
- 1995
Citation Context ...ing database pass l. DIC [6] is more eager and begins checking an itemset shortly after all its subsets have been determined frequent, rather than waiting until the database pass completes. Partition [11] identifies all frequent-itemsets in memory-sized partitions of the database, and then checks these against the entire database during a final pass. DIC considers the same number of candidate itemsets...

336 | New algorithms for fast discovery of association rules
- Zaki, Parthasarathy, et al.
- 1997
Citation Context ...ata-sets since each attempt at extending an itemset requires a scan over the data. It also remains to be seen how the proposed complete version of the algorithm would perform in practice. Zaki et al. [16] present the algorithms MaxEclat and MaxClique for identifying maximal frequent itemsets. These algorithms are similar to Max-Miner in that they also attempt to look ahead and identify long frequent i...

245 | Mining association rules with item constraints
- Srikant, Vu, et al.
- 1997
Citation Context ...ociation rule confidence is a constraint that Bayardo [5] uses to prune some itemsets from consideration. Other constraints that have been used during the search for patterns include item constraints [15] and information-theoretic constraints [13]. Interestingness constraints thus far applied only during postprocessing (e.g. [6]) might also be exploitable during search to improve efficiency. Max-Miner...

218 | An efficient hash based algorithm for mining association rules
- Park, Chen, et al.
Citation Context ... number of candidate itemsets as Apriori, and Partition can consider more but never fewer candidate itemsets than Apriori, potentially exacerbating problems associated with long patterns. Park et al. [9] enhance Apriori with a hashing scheme that can identify (and thereby eliminate from consideration) some candidates that will turn up infrequent if checked against the database. It also uses the hashi...

116 | Search through systematic set enumeration
- Rymon
- 1992
Citation Context ...e specified as a percentage of the transactions in the data-set instead of as an absolute number of transactions. Max-Miner can be described using Rymon’s generic set-enumeration tree search framework [10]. The idea is to expand sets over an ordered and finite item domain as illustrated in Figure 1 where four items are denoted by their position in the ordering. The particular ordering imposed on the it...

105 | A new algorithm for discovering the maximum frequent set
- Lin, Kedem
- 1998
Citation Context ...s a dynamic programming algorithm for finding maximal cliques in a graph whose largest clique is at least as large as the length of the longest frequent itemset. Concurrent to our work, Lin and Kedem [8] have proposed an algorithm called Pincer-Search for mining long maximal frequent itemsets. Like Max-Miner, Pincer-Search attempts to identify long patterns throughout the search. The difference betwe...

94 | An information theoretic approach to rule induction from databases
- Smyth
- 1992
Citation Context ...at Bayardo [5] uses to prune some itemsets from consideration. Other constraints that have been used during the search for patterns include item constraints [15] and information-theoretic constraints [13]. Interestingness constraints thus far applied only during postprocessing (e.g. [6]) might also be exploitable during search to improve efficiency. Max-Miner provides a framework in which additional c...

56 | Discovering All Most Specific Sentences by Randomized Algorithms
- Gunopulos, Mannila, et al.
- 1997
Citation Context ...se, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMOD '98 Seattle, WA, USA © 1998 ACM 0-89791-995-5/98/006...$5.00 Gunopulos et al. [7] present a randomized algorithm for identifying maximal frequent itemsets in memory-resident databases. Their algorithm works by iteratively attempting to extend a working pattern until failure. A ran...

47 | Brute-force mining of high confidence classification rules
- Bayardo
- 1997
Citation Context ... composed of PUMS census records. To reduce the difficulty of this data-set, they removed all items appearing in over 80% of the transactions yet still could only mine efficiently at high support. We [5] previously applied an Apriori-inspired algorithm to several data-sets from the Irvine Machine Learning Database Repository. In order to mine efficiently, this algorithm had to sometimes apply pruning...

12 | A new algorithm for generating prime implicants
- Slagle, Chang, et al.
- 1970
Citation Context ...his strategy tunes the frequency heuristic by having it consider only the subset of transactions relevant to the given node. Interestingly, the same item reordering heuristic is used by Slagle et al. [12] in a set-enumeration algorithm for identifying prime implicants in CNF propositional logic expressions. The fact that the same policy works well for both problems is likely due to their close relatio...

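The citation context for [10] describes expanding sets over an ordered, finite item domain, with four items denoted by their position in the ordering. A minimal sketch of that kind of set-enumeration tree follows. It illustrates only the generic enumeration idea, not Max-Miner's search or its lookahead: the function name `enumerate_sets` and the optional `keep` predicate (which cuts off a node together with its whole subtree) are assumptions introduced for this example.

```python
def enumerate_sets(items, keep=lambda node: True):
    """Yield each non-empty subset of `items` once, in depth-first
    set-enumeration order. A node rejected by `keep` is skipped along
    with every superset that would be generated beneath it."""

    def expand(head, tail):
        # `head` is the current node; `tail` holds the items that may
        # still be appended, all later in the ordering than head's items.
        for i, item in enumerate(tail):
            child = head + (item,)
            if not keep(child):
                continue  # prune the child and its subtree
            yield child
            yield from expand(child, tail[i + 1:])

    yield from expand((), tuple(items))


# Four items denoted by their position in the ordering.
nodes = list(enumerate_sets([1, 2, 3, 4]))

# Hypothetical pruning rule: drop every set containing item 3.
pruned = list(enumerate_sets([1, 2, 3, 4], keep=lambda node: 3 not in node))
```

Because each child only appends items that come later in the ordering, every subset is generated exactly once. Pruning a node removes all of its descendants, which is safe for any `keep` condition that, like itemset frequency, cannot become true again once it fails on a subset.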