Results 1 - 3 of 3
Privacy-preserving Distributed Mining of Association Rules on Horizontally Partitioned Data
2002
Cited by 121 (14 self)
Data mining can extract important knowledge from large data collections -- but sometimes these collections are split among various parties. Privacy concerns may prevent the parties from directly sharing the data, and some types of information about the data. This paper addresses secure mining of association rules over horizontally partitioned data. The methods incorporate cryptographic techniques to minimize the information shared, while adding little overhead to the mining task.
Privacy-preserving distributed mining of association rules on horizontally partitioned data
In The ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD’02)
Cited by 35 (10 self)
Abstract: Data mining can extract important knowledge from large data collections, but sometimes these collections are split among various parties. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. This paper addresses secure mining of association rules over horizontally partitioned data. The methods incorporate cryptographic techniques to minimize the information shared, while adding little overhead to the mining task. Index Terms: Data mining, security, privacy.
Privacy-Enhancing Distributed Higher-Order ARM
Traditional association rule mining algorithms assume that data instances are independent and identically distributed (IID) [29]. In statistical relational learning (SRL), however, relationships between instances can be leveraged to improve performance of learning algorithms [2]. Higher-order association rule mining is an example of an SRL approach that does not make the IID assumption, but instead discovers itemsets that cross record boundaries [21]. Empirical analysis shows that higher-order methods perform especially well on small datasets as they are able to capture the variability of the underlying data distribution more readily than traditional methods [11]. In a distributed environment, however, the discovery of higher-order itemsets reveals significant information about the nature of disparate data sources [21]. Preserving privacy in a setting in which data instances are treated as nodes in a graph rather than independent entities is an open problem in privacy research that has only recently received attention in the data mining community [24]. In this paper we propose a novel privacy-enhancing distributed higher-order ARM algorithm, PE-DiHO ARM. PE-DiHO ARM discovers itemsets from distributed data with a hybrid (non-horizontal, non-vertical) distribution while significantly limiting the amount of private data that is revealed. To demonstrate the validity of the approach we compare it to a non-privacy-enhancing higher-order ARM algorithm [21] in an evaluation framework based on supervised learning [23]. Experimental results confirm that privacy can be significantly enhanced during the computation of higher-order itemsets in a distributed environment without significantly impacting performance. In future work we plan to apply these techniques to data provided by our law enforcement partners.

