We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Experiments with synthetic as well as real-life data show that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database.

1 Introduction

Database mining is motivated by the decision support problem faced by most large retail organizations [S 93]. Progress in bar-code technology has made it possible for retail organiz...
|
4701
|
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
– Pearl
- 1988
|
|
3356
|
C4.5: Programs for Machine Learning
– Quinlan
- 1993
|
|
2573
|
Classification and Regression Trees
– Breiman, Friedman, et al.
- 1984
|
|
1521
|
Mining association rules between sets of items in large databases
– Agrawal, Imielinski, et al.
- 1993
|
|
866
|
Inductive Logic Programming
– Muggleton
- 1991
|
|
726
|
A Bayesian method for the induction of probabilistic networks from data
– Cooper, Herskovits
- 1992
|
|
523
|
Knowledge Acquisition via Incremental Concept Formation
– Fisher
- 1987
|
|
395
|
Fast algorithms for mining association rules in large databases
– Agrawal, Srikant
- 1994
|
|
310
|
Efficient similarity search in sequence databases
– Agrawal, Faloutsos, et al.
- 1993
|
|
215
|
Database Mining: A Performance Perspective
– Agrawal, Imielinski, et al.
- 1993
|
|
154
|
Efficient algorithms for discovering association rules
– Mannila, Toivonen, et al.
- 1994
|
|
143
|
Classification and Regression Trees
– Breiman, Friedman, et al.
- 1984
|
|
128
|
Knowledge discovery in databases: An attribute-oriented approach
– Han, Cai, et al.
- 1992
|
|
100
|
An interval classifier for database mining applications
– Agrawal, Ghosh, et al.
- 1992
|
|
81
|
Scientific discovery: Computational explorations of the creative processes
– Langley, Simon, et al.
- 1987
|
|
76
|
Mining association rules between sets of items in large databases
– Agrawal, Imielinski, Swami
- 1993
|
|
65
|
Set-oriented mining of association rules
– Houtsma, Swami
- 1993
|
|
56
|
Megainduction: A test flight
– Catlett
- 1991
|
|
53
|
Data mining: The search for knowledge in databases
– Holsheimer, Siebes
- 1994
|
|
40
|
Efficient Algorithms for Discovering Association Rules
– Mannila, Toivonen, et al.
- 1994
|
|
37
|
Knowledge mining by imprecise querying: a classification-based approach
– Anwar, Beck, et al.
- 1992
|
|
32
|
Skicat: A machine learning system for automated cataloging of large scale sky surveys
– Fayyad, Weir, et al.
- 1993
|
|
28
|
Efficient induction of logic programs
– Muggleton, Feng
- 1990
|
|
26
|
Dependency inference
– Mannila, Raiha
- 1987
|
|
17
|
Database mining: A performance perspective
– Agrawal, Imielinski, Swami
- 1993
|
|
15
|
Data dredging
– Tsur
- 1990
|
|
13
|
Mining for knowledge in databases: The INLEN architecture, initial implementation and first results
– Michalski, Kerschberg, et al.
- 1992
|
|
10
|
Autoclass: A Bayesian classification system
– Cheeseman, et al.
- 1990
|
|
10
|
Practitioner problems in need of database research: Research directions in knowledge discovery
– Krishnamurthy, Imielinski
- 1991
|
|
7
|
The new direct marketing. Business One
– Shepard Associates
- 1990
|
|
7
|
Integrated support for data archeology
– Brachman, et al.
- 1993
|
|
7
|
Set-oriented mining of association rules
– Houtsma, Swami
- 1995
|
|
6
|
Discovery, analysis, and presentation of strong rules
– Piatetsky-Shapiro
- 1991
|
|
6
|
Discovery from databases: A review of AI and statistical techniques
– Lubinsky
- 1989
|
|
6
|
An interval classifier for database mining applications
– Agrawal, Ghosh, et al.
- 1992
|
|
6
|
Megainduction: A test flight
– Catlett
- 1991
|
|
5
|
AutoClass: A Bayesian classification system
– Cheeseman, et al.
- 1988
|
|
5
|
Efficient algorithms for discovering association rules
– Mannila, Toivonen, et al.
- 1994
|
|
4
|
Knowledge Discovery in Databases
– Piatetsky-Shapiro, editor
- 1991
|
|
3
|
Bridging the gap between database theory and practice
– Bitton
- 1992
|
|
2
|
Knowledge mining in databases: A unified approach through conceptual clustering
– Anwar, Navathe, et al.
- 1992
|
|
2
|
The new direct marketing. Business One
– Shepard Associates
- 1990
|
|
2
|
Practitioner problems in need of database research: Research directions in knowledge discovery
– Krishnamurthy, Imielinski
- 1991
|
|
2
|
Domain-Independent Function Finding
– Schaffer
- 1990
|
|
2
|
Knowledge mining by imprecise querying: A classification-based approach
– Anwar, Beck, Navathe
- 1992
|
|
2
|
The DBMS research at crossroads
– Stonebraker, et al.
- 1993
|
|
1
|
Learning logical definitions from examples
– Quinlan
- 1990
|
|
1
|
Knowledge mining in databases: A unified approach through conceptual clustering
– Anwar, Navathe, et al.
|
|
1
|
Learning logical definitions from examples
– Quinlan
- 1990
|
|
1
|
Domain-Independent Function Finding
– Schaffer
- 1990
|