## MARKET BASKET ANALYSIS FOR DATA MINING (2001)

### BibTeX

@MISC{Ulas01marketbasket,

author = {Mehmet Aydın Ulaş},

title = {MARKET BASKET ANALYSIS FOR DATA MINING},

year = {2001}

}

### Abstract

I want to thank Ethem Alpaydın for helping me all the time with ideas for my thesis and for his contribution to my undergraduate and graduate education. I want to thank Fikret Gürgen and Taner Bilgiç for their contribution to my undergraduate and graduate education and for participating in my thesis jury. I want to thank Dengiz Pınar, Nasuhi Sönmez and Ataman Kalkan of Gima Türk A.Ş. for supplying the data I used in my thesis. I want to thank my family, who always supported me and never left me alone during the preparation of this thesis.

Most established companies have accumulated masses of data from their customers for decades. With e-commerce applications growing rapidly, companies will accumulate a significant amount of data in months rather than years. Data Mining, also known as Knowledge Discovery in Databases (KDD), aims to find trends, patterns, correlations, and anomalies in these databases that can help us make accurate future decisions. Mining Association Rules is one of the main application areas of Data Mining. Given a set of customer transactions on items, the aim is to find correlations between the sales of items. We consider Association Mining in large databases of customer transactions. We give an overview of the problem and explain approaches that have been used to attack it. We then describe the Apriori algorithm and show results obtained from Gima Türk A.Ş., a large Turkish supermarket chain. We also use two statistical methods, Principal Component Analysis and k-means, to detect correlations between sets of items.

### Citations

2787 | Fast Algorithms for Mining Association Rules
- Agrawal, Srikant
- 1994
Citation Context ...d the itemsets contained in the frontier set. The frontier sets are created using the 1-extensions of the candidate itemsets in the current pass. Figure 1.1 shows the pseudocode for the algorithm. In [2], Agrawal and Srikant define the algorithms Apriori and AprioriTid, which can handle multiple items in the consequent. We will explain Apriori in detail in Section 2.1 and give real life results taken fr...

386 | Sampling Large Databases for Association Rules
- Toivonen
- 1996
Citation Context ...to obtain which are above some specified threshold are usually very small like the tip of the iceberg. The techniques are applied if the results are very small compared to the whole dataset. Toivonen [4] chooses a sample from the database which is smaller than the database itself and calculates the rules in this sample. Then these rules are tested on the whole database. The algorithm first chooses a ...
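
The sample-then-verify idea in this context can be sketched as follows. This is a minimal illustration, not Toivonen's published algorithm: it mines a random sample at a slightly lowered threshold and then recounts the surviving candidates on the whole database, but it omits the negative-border check that the full method uses to detect missed itemsets. The names `frequent_itemsets`, `sample_then_verify`, `slack`, and the brute-force helper limited to itemsets of size 2 are all assumptions made for the example.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force frequent itemsets with at most `max_size` items (toy helper)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            share = sum(1 for t in transactions if candidate <= t) / n
            if share >= min_support:
                result[candidate] = share
    return result


def sample_then_verify(transactions, min_support, sample_size, slack=0.1, seed=0):
    """Mine a random sample, then test the candidates on the whole database."""
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)
    # Assumption: lower the threshold on the sample so that fewer truly
    # frequent itemsets are missed because of sampling error.
    lowered = max(min_support - slack, 0.0)
    candidates = frequent_itemsets(sample, lowered)
    n = len(transactions)
    verified = {}
    for candidate in candidates:
        share = sum(1 for t in transactions if candidate <= t) / n
        if share >= min_support:
            verified[candidate] = share
    return verified
```

Only one full pass over the database is needed for the verification step, which is the point of sampling; the price is that an itemset absent from the sample's candidates is never counted.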

254 | Parallel Mining of Association Rules
- Agrawal, Shafer
- 1996
Citation Context ...l Mining algorithm is implemented on a shared memory parallel machine in [9] by David Cheung Kan. The algorithm is given in Figure 1.5. Parallel algorithms for Mining Association Rules are defined in [10]. In [11] Cheung et al. implement a distributed algorithm for Mining Association Rules. The algorithm is called FDM (Fast Distributed Mining of association rules). The algorithm is given in Figure 1.6...

220 | Mining association rules between sets of items
- Agrawal, Imielinski, et al.
- 1993
Citation Context ...the mined data to the user. Lately, mining association rules, also called market basket analysis, is one of the application areas of Data Mining. Mining Association Rules was first introduced in [1]. Consider a market with a huge collection of customer transactions. An association rule is X⇒Y where X is called the antecedent and Y is the consequent. X and Y are sets of items and the rule means t...
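
To make the rule notation X⇒Y concrete, the sketch below computes the two measures normally used to rank such rules, support and confidence. The excerpt is cut off before it defines them, so the definitions here are the usual textbook ones rather than a quotation from the thesis, and the toy baskets and function names are illustrative assumptions.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x: FrozenSet[str], y: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Estimate of P(Y | X): support(X ∪ Y) divided by support(X)."""
    support_x = support(x, transactions)
    if support_x == 0.0:
        return 0.0
    return support(x | y, transactions) / support_x


# Hypothetical toy baskets, not data from the thesis.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
x, y = frozenset({"bread"}), frozenset({"milk"})
rule_support = support(x | y, baskets)        # 2 of 4 baskets hold both items
rule_confidence = confidence(x, y, baskets)   # 2 of the 3 bread baskets hold milk
```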

137 | Computing iceberg queries efficiently
- Fang, Shivakumar, et al.
- 1998

119 | Parallel and Distributed Association Mining: A Survey
- Zaki
- 1999
Citation Context ...ast Distributed Mining of association rules). The algorithm is given in Figure 1.6. The algorithm is iterated distributively at each site Si until L(k) = ∅ or the set of candidate sets CG(k) = ∅. Zaki [12] surveys parallel and distributed mining of association rules. Ganti et al. [13] focus on three basic problems of Data Mining. They define and give references to various algorithms for solvi...

106 | A fast distributed algorithm for mining association rules
- Cheung, Han, et al.
- 1996
Citation Context ...algorithm is implemented on a shared memory parallel machine in [9] by David Cheung Kan. The algorithm is given in Figure 1.5. Parallel algorithms for Mining Association Rules are defined in [10]. In [11] Cheung et al. implement a distributed algorithm for Mining Association Rules. The algorithm is called FDM (Fast Distributed Mining of association rules). The algorithm is given in Figure 1.6. The alg...

75 | An Efficient Algorithm for Mining Association Rules
- Savasere, Omiecinski, et al.
- 1995
Citation Context ...δ where e(X, s) is the absolute value of the error between the frequencies of X in the real database and in sample s. The algorithm is given in Figure 1.2. The database is partitioned into n partitions in [5] by Savasere et al.; the local large itemsets are calculated and tested to see whether they are globally large. Then the association rules are generated accordingly. The algorithm is known as Partition. First, the ori...
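
The two-phase structure described here (find locally large itemsets per partition, then test them globally) can be sketched as below. It relies on the fact that an itemset that is large in the whole database must be large in at least one partition when the same relative threshold is used. The function names, the brute-force local miner, and the limit to itemsets of size 2 are assumptions for illustration, not the authors' implementation.

```python
from itertools import combinations
from math import ceil


def local_large_itemsets(partition, min_support, max_size=2):
    """Itemsets (up to `max_size` items) whose support within `partition` is large."""
    n = len(partition)
    items = sorted({item for t in partition for item in t})
    large = set()
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if sum(1 for t in partition if candidate <= t) / n >= min_support:
                large.add(candidate)
    return large


def partition_mine(transactions, min_support, n_partitions):
    """Phase 1: collect locally large itemsets. Phase 2: count them globally."""
    size = ceil(len(transactions) / n_partitions)
    partitions = [transactions[i:i + size]
                  for i in range(0, len(transactions), size)]
    candidates = set()
    for part in partitions:
        candidates |= local_large_itemsets(part, min_support)
    n = len(transactions)
    globally_large = {}
    for candidate in candidates:
        share = sum(1 for t in transactions if candidate <= t) / n
        if share >= min_support:
            globally_large[candidate] = share
    return globally_large
```

Each partition is meant to be small enough to fit in memory, so the database is read only twice: once to mine the partitions and once to count the merged candidates.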

50 | Mining Very Large Databases
- Ganti, Gehrke, et al.
- 1999
Citation Context ...gorithm is iterated distributively at each site Si until L(k) = ∅ or the set of candidate sets CG(k) = ∅. Zaki [12] surveys parallel and distributed mining of association rules. Ganti et al. [13] focus on three basic problems of Data Mining. They define and give references to various algorithms for solving problems of type market basket analysis, clustering and classification. In [14]...

23 | A new algorithm for faster mining of generalized association rules
- Hipp, Myka, et al.
Citation Context ...t structure and the set of itemsets is a list of lists. The threshold is specified not by giving a probability but giving a minimum number of items that an itemset should contain. Hipp et al. [8] use lattices and graphs for solving the problem of Association Rule Mining. Another way of mining association rules is to use distributed and parallel algorithms. Suppose DB is a database with |D| transactio...

22 | Mining association rules: Deriving a superior algorithm by analysing today’s approaches
- Hipp, Güntzer, et al.
- 2000

14 | Mining N-Most Interesting Itemsets
- Fu, Kwong, et al.
- 2000
Citation Context ... database but if we have potentially large itemsets then it is easy to count them on the whole database. The pseudocode of the algorithm can be seen in Figure 1.3. Support constraints are not used in [6] but instead top N association rules are found. It puts a restriction only on the number of rules. At each step of this algorithm, N-most Interesting k-itemsets are chosen. Top N itemsets are chosen a...

13 | Algorithms for Association Rule Mining
- Hipp, Güntzer, et al.
- 2000
Citation Context ...[13] focus on three basic problems of Data Mining. They define and give references to various algorithms for solving problems of type market basket analysis, clustering and classification. In [14] Hipp et al. consider several Association Mining algorithms and compare them. 1.2. Outline of Thesis The rest of the thesis is organized as follows. In Section 2 we define the algorithms used in the thesis...

7 | Pattern Classification and Scene Analysis. John Wiley and Sons
- Duda, Hart
- 1973
Citation Context ...data. Because in our application the xi have different units (grams, pieces, etc.), we work with the correlation matrix R rather than the covariance matrix S. 2.3. k-Means Clustering In k-means clustering [16] the idea is to cluster the data into k subsets in an unsupervised manner. In our application this corresponds to groups of customers with the same buying behaviour. First we choose k centers v1, v2, . . . , vk,...
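
The excerpt breaks off right after the centers v1, ..., vk are chosen. The sketch below fills in the remaining steps with the standard Lloyd iteration (assign each point to its nearest center, then move each center to the mean of its points), which is an assumption about what the thesis goes on to describe. Picking the initial centers at random from the data, the function names, and the pure-Python representation of customers as tuples are likewise illustrative choices.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Standard Lloyd iteration; returns (centers, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, new_labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
    return centers, labels
```

In the market basket setting each point would be one customer's purchase vector, so customers with the same label form a group with similar buying behaviour.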

6 | Calculating a New Data Mining Algorithm for Market Basket Analysis
- Hu, Chin, et al.
Citation Context ... N potential k itemset(Ck , N , k) finds the N-most interesting k-itemsets. Reduce(newsupport) reduces the value of the threshold if there are not enough potential k-itemsets. Hu, Chin and Takeichi [7] use functional languages for solving this problem. The itemsets are stored in a list structure and the set of itemsets is a list of lists. The threshold is specified not by giving a probability but g...

1 | An Adaptive Algorithm for Mining Association Rules on Shared Memory
- Kan
- 2001
Citation Context ... ≥ s × Di then X is called globally large. L denotes the globally large itemsets in DB. The task is to find L. Adaptive Parallel Mining algorithm is implemented on a shared memory parallel machine in [9] by David Cheung Kan. The algorithm is given in Figure 1.5. Parallel algorithms for Mining Association Rules are defined in [10]. In [11] Cheung et al. implement a distributed algorithm for Mining Ass...

1 | Methods of Multivariate Analysis
- Rencher
Citation Context ...e chosen as the n items sold most. By looking at the correlations between xi over all customers, we find all dependencies between items. The method we use is called Principal Component Analysis (PCA) [15]. Suppose we have a dataset which consists of items that have n attributes. We are looking for a set ... forall itemsets c ∈ Ck { forall (k − 1)-subsets s of c { if (s ∉ Lk−1) { delete c from Ck } } }...
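
The pseudocode fragment at the end of this context is the Apriori pruning step: a candidate k-itemset is dropped as soon as one of its (k − 1)-subsets is not in Lk−1. A minimal Python sketch of that step follows; the function name and the use of `frozenset` for itemsets are assumptions made for the example.

```python
from itertools import combinations


def prune_candidates(candidates, prev_large):
    """Keep only the candidates whose every (k-1)-subset is in `prev_large`.

    `candidates` plays the role of Ck and `prev_large` the role of Lk-1;
    both are collections of frozensets.
    """
    kept = set()
    for candidate in candidates:
        k = len(candidate)
        subsets = (frozenset(s) for s in combinations(candidate, k - 1))
        if all(s in prev_large for s in subsets):
            kept.add(candidate)
    return kept
```

For instance, with L2 = {ab, ac, bc, bd} the candidate abd is removed because its subset ad is not large, while abc survives.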

1 | Improved Methods for Finding Association Rules, AAAI Workshop on Knowledge Discovery
- Mannila, Toivonen, et al.
- 1994