Distributed Data Mining: Algorithms, Systems, and Applications
2002
Cited by 63 (5 self)
This paper presents a brief overview of distributed data mining (DDM) algorithms, systems, applications, and emerging research directions. The paper is organized as follows. We first present related research on DDM and illustrate data distribution scenarios. Then DDM algorithms are reviewed. Subsequently, the architectural issues in DDM systems and future directions are discussed.
Learning Bayesian network structure from distributed data
 In Proceedings of the 3rd SIAM International Data Mining Conference
2003
Cited by 13 (0 self)
We propose a collective method to address the problem of learning the structure of a Bayesian network from distributed heterogeneous data sources. In this case, the dataset is distributed among several sites, with different features at each site. The collective method has four steps: local learning, sample selection, cross learning, and combination of the results. The parents of local nodes can be correctly identified in local learning. The main task of cross learning is to identify the links whose vertices are in different sites (cross links). This is done by transmitting a small subset of samples from each local site to a central site. The combination step involves removing extra links from local Bayesian networks that may be introduced during local learning due to the well-known hidden-variable problem. The sample selection step selects samples, based on a likelihood criterion, that are possibly evidence of cross links. The overall procedure is called collective learning. Experimental results verify that, for sparsely connected networks, the collective learning method can learn the same structure as that obtained by a centralized learning method (which simply aggregates data from all local sites into a single site).
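The likelihood-based sample selection step described above can be sketched as follows. The independent-Bernoulli local model and all function names are illustrative assumptions, not the paper's implementation; the point is only that samples the local model explains poorly are the candidates worth transmitting.

```python
import math

def local_logprob(sample, marginals):
    """Log-likelihood of a binary sample under an independent local model
    (a toy stand-in for the Bayesian network learned at a local site)."""
    lp = 0.0
    for x, p in zip(sample, marginals):
        lp += math.log(p if x == 1 else 1.0 - p)
    return lp

def select_samples(samples, marginals, fraction=0.25):
    """Pick the lowest-likelihood fraction of samples: points the local
    model explains poorly are possible evidence of cross links and are
    sent to the central site."""
    scored = sorted(samples, key=lambda s: local_logprob(s, marginals))
    k = max(1, int(len(samples) * fraction))
    return scored[:k]

samples = [(1, 1), (1, 0), (0, 1), (0, 0)]
marginals = (0.9, 0.9)  # local model says both features are usually 1
print(select_samples(samples, marginals))  # → [(0, 0)]
```

Only the selected subset crosses the network, which is what keeps the communication cost of cross learning low.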
Privacy-preserving computation of Bayesian networks on vertically partitioned data
 IEEE Transactions on Knowledge and Data Engineering
Cited by 13 (4 self)
Traditionally, many data mining techniques have been designed in the centralized model, in which all data is collected and available at one central site. However, as more and more activities are carried out using computers and computer networks, the amount of potentially sensitive data stored by businesses, governments, and other parties increases. Different parties often wish to benefit from cooperative use of their data, but privacy regulations and other privacy concerns may prevent them from sharing it. Privacy-preserving data mining provides a solution by creating distributed data mining algorithms in which the underlying data need not be revealed. In this paper, we present privacy-preserving protocols for a particular data mining task: learning a Bayesian network from a database vertically partitioned between two parties. In this setting, two parties owning confidential databases wish to learn the Bayesian network on the combination of their databases without revealing anything else about their data to each other. We present an efficient and privacy-preserving protocol to construct a Bayesian network on the parties' joint data. Index Terms—Data privacy, Bayesian networks, privacy-preserving data mining.
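The sufficient statistics needed for Bayesian network learning here are joint frequency counts, and on vertically partitioned data each such count reduces to a scalar product of the two parties' 0/1 indicator vectors. A minimal sketch of that reduction follows; the dot product is computed in the clear for illustration, whereas a privacy-preserving protocol would compute it without revealing either vector.

```python
def indicator(column, value):
    """0/1 vector marking the rows where a party's attribute equals value."""
    return [1 if v == value else 0 for v in column]

def joint_count(col_a, val_a, col_b, val_b):
    """Count of rows with A == val_a and B == val_b, expressed as the dot
    product of the two parties' indicator vectors. In the clear here; a
    secure scalar-product protocol would yield the same number privately."""
    ia = indicator(col_a, val_a)
    ib = indicator(col_b, val_b)
    return sum(x * y for x, y in zip(ia, ib))

# Party A holds one attribute, party B another, over the same 4 rows.
print(joint_count([1, 0, 1, 1], 1, [0, 0, 1, 1], 1))  # → 2
```

Every conditional probability table entry the network needs can be assembled from counts of this form.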
Efficient Peer-to-Peer Belief Propagation
Cited by 5 (1 self)
In this paper, we present an efficient approach for distributed inference. We use belief propagation's message-passing algorithm on top of a DHT storing a Bayesian network. Nodes in the DHT run a variant of the spring relaxation algorithm to redistribute the Bayesian network among them. Thereafter, correlated data is stored close together, reducing the message cost of inference. We simulated our approach in Matlab and show the message reduction and the achieved load balance for random, tree-shaped, and scale-free Bayesian networks of different sizes. As a possible application, we envision a distributed software knowledge base maintaining encountered software bugs under users' system configurations, together with possible solutions for other users having similar problems. Users would not only be able to repair their systems but also to foresee possible problems before installing software updates or new applications.
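As a rough illustration of the redistribution idea, here is a one-dimensional spring-relaxation sketch in which each Bayesian-network variable drifts toward the mean position of its graph neighbours, so connected variables end up stored near each other. The single dimension, the update rule, and all names are simplifying assumptions standing in for the paper's DHT key space, not its actual algorithm.

```python
def spring_relax(positions, edges, steps=50, k=0.5):
    """Iteratively pull each variable toward the mean position of its
    neighbours in the Bayesian network graph. positions maps variable
    name to a coordinate (toy stand-in for a DHT key)."""
    pos = dict(positions)
    neigh = {v: [] for v in pos}
    for a, b in edges:
        neigh[a].append(b)
        neigh[b].append(a)
    for _ in range(steps):
        new = {}
        for v, p in pos.items():
            if neigh[v]:
                target = sum(pos[u] for u in neigh[v]) / len(neigh[v])
                new[v] = p + k * (target - p)  # spring pull toward neighbours
            else:
                new[v] = p
        pos = new
    return pos

# Chain A-B-C: initially far apart, the three variables converge so that
# messages between adjacent variables travel a short distance.
pos = spring_relax({"A": 0.0, "B": 10.0, "C": 2.0},
                   [("A", "B"), ("B", "C")])
```

After relaxation, belief-propagation messages between adjacent variables tend to stay local, which is the source of the message reduction the abstract reports.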
Privacy-Preserving Bayesian Network Learning Using Post Randomization (in preparation)
2006
Cited by 2 (1 self)
In this paper, we propose a post randomization technique to learn a Bayesian network (BN) from distributed heterogeneous data in a privacy-sensitive fashion. In this case, two or more parties own sensitive data but want to learn a Bayesian network from the combined data. We consider both structure and parameter learning for the BN. The only information required from the data set is a set of sufficient statistics for learning both the network structure and parameters. The proposed method estimates the sufficient statistics from the randomized data. The estimated sufficient statistics are then used to learn a BN. For structure learning, we face the familiar extra-link problem, since estimation errors tend to break the conditional independence among the variables. We propose modifications of the score functions used for BN learning to solve this problem. We show both theoretically and experimentally that post randomization is an efficient, flexible, and easy-to-use method for learning a Bayesian network from privacy-sensitive data.
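For a binary variable, estimating the sufficient statistics from post-randomized data amounts to inverting the randomization's transition matrix: since E[observed] = Pᵀ · true, an unbiased estimate is inv(Pᵀ) · observed. A toy sketch under the assumption of a known 2×2 transition matrix (the function and variable names are ours, not the paper's):

```python
def estimate_counts(observed, p):
    """Unbiased estimate of the true two-category counts from
    post-randomized counts. p[i][j] = Pr(report category j | true
    category i); we invert P^T explicitly for the 2x2 case."""
    # P^T = [[p00, p10], [p01, p11]]
    a, b = p[0][0], p[1][0]
    c, d = p[0][1], p[1][1]
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of P^T
    return [inv[0][0] * observed[0] + inv[0][1] * observed[1],
            inv[1][0] * observed[0] + inv[1][1] * observed[1]]

# True counts [70, 30], each value flipped with probability 0.2, give
# expected randomized counts [62, 38]; inversion recovers the truth.
print(estimate_counts([62, 38], [[0.8, 0.2], [0.2, 0.8]]))  # → [70.0, 30.0]
```

The estimation error that motivates the paper's modified score functions comes from the sampling noise around those expected randomized counts, which the inversion amplifies when P is close to uniform.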
A Local Asynchronous Distributed Privacy-Preserving Feature Selection Algorithm for Large Peer-to-Peer Networks
 Under consideration for publication in Knowledge and Information Systems
2008
In this paper we develop a local, distributed, privacy-preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used in machine learning for data compaction and efficient learning, eliminating the curse of dimensionality. Many solutions exist for feature selection when the data is located at a central location. However, it becomes extremely challenging to perform the same task when the data is distributed across a large number of peers or machines. Centralizing the entire dataset, or portions of it, can be very costly and impractical because of the large number of data sources, the asynchronous nature of peer-to-peer networks, the dynamic nature of the data and network, and privacy concerns. The solution proposed in this paper allows us to perform feature selection in an asynchronous fashion with low communication overhead, where each peer can specify its own privacy constraints. The algorithm works based on local interactions among participating nodes. We present results on real-world datasets in order to evaluate the performance of the proposed algorithm.
A Post Randomization Framework for Privacy-Preserving Bayesian Network Parameter Learning
The post randomization technique has been used successfully in statistical disclosure limitation. This paper explores its application to privacy-preserving data mining, taking the problem of privacy-preserving Bayesian network parameter learning as a specific example. We propose to use the post randomization technique to randomize the privacy-sensitive variables when learning Bayesian network parameters from distributed heterogeneous databases. The only information required from the data set is a set of sufficient statistics for learning the Bayesian network parameters. The proposed method estimates the sufficient statistics from the randomized data. We show both theoretically and experimentally that this method learns a set of accurate parameters, even under large levels of randomization. We also illustrate the trade-off between privacy and accuracy through simulations.
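Once the sufficient statistics have been estimated from the randomized data, the parameter-learning step itself is just normalization of counts per parent configuration. A minimal sketch with hypothetical names (maximum-likelihood estimation; the counts would come from the randomization-corrected estimates, not raw data):

```python
def cpt_from_counts(joint_counts):
    """Build a conditional probability table P(X | parents) by normalizing
    (possibly randomization-corrected) joint counts over each parent
    configuration. joint_counts maps (parent_config, x_value) to a count."""
    cpt = {}
    for (parents, x), n in joint_counts.items():
        cpt.setdefault(parents, {})[x] = n
    for parents, row in cpt.items():
        total = sum(row.values())
        for x in row:
            row[x] /= total  # normalize within this parent configuration
    return cpt

# Estimated sufficient statistics for a binary X with a binary parent.
counts = {("pa=0", 0): 30.0, ("pa=0", 1): 10.0,
          ("pa=1", 0): 5.0,  ("pa=1", 1): 15.0}
print(cpt_from_counts(counts))
```

Because this step only consumes counts, any noise the randomization leaves in the estimated statistics propagates directly into the parameters, which is the accuracy side of the privacy-accuracy trade-off the abstract mentions.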