## Collective Mining of Bayesian Networks from Distributed Heterogeneous Data (2002)

### Download Links

- [www.csee.umbc.edu]
- [www.cs.umbc.edu]
- DBLP

Citations: 18 (7 self-citations)

### BibTeX

@MISC{Chen02collectivemining,

author = {R. Chen and K. Sivakumar and H. Kargupta},

title = {Collective Mining of Bayesian Networks from Distributed Heterogeneous Data},

year = {2002}

}

### Abstract

We present a collective approach to learning a Bayesian network from distributed heterogeneous data. In this approach, we first learn a local Bayesian network at each site using the local data. Then each site identifies the observations that are most likely to be evidence of coupling between local and non-local variables and transmits a subset of these observations to a central site. Another Bayesian network is learnt at the central site using the data transmitted from the local sites. The local and central Bayesian networks are combined to obtain a collective Bayesian network that models the entire dataset. Experimental results and theoretical justification that demonstrate the feasibility of our approach are presented.
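
The selection step described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. Several pieces are assumptions: the local model is a hypothetical stand-in (independent, Laplace-smoothed categorical marginals, i.e. a Bayesian network with no edges), the lowest-likelihood `fraction` is an assumed parameter, rows are assumed to be aligned across sites by a shared observation index, and keeping only the observations flagged at every site (an intersection of index sets) is one assumed rule for deciding what to transmit. All function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Row = Tuple[int, ...]


def fit_local_model(rows: Sequence[Row], alpha: float = 1.0) -> List[Dict[int, float]]:
    """Hypothetical stand-in for a local BN: independent, Laplace-smoothed
    categorical marginals for each local variable (an edgeless DAG)."""
    model = []
    for j in range(len(rows[0])):
        counts = Counter(r[j] for r in rows)
        total = len(rows) + alpha * len(counts)
        model.append({v: (c + alpha) / total for v, c in counts.items()})
    return model


def log_likelihood(row: Row, model: List[Dict[int, float]]) -> float:
    """Log-probability of one row under the local model."""
    return sum(math.log(dist[v]) for dist, v in zip(model, row))


def low_likelihood_indices(rows: Sequence[Row], fraction: float) -> Set[int]:
    """Indices of the `fraction` of rows that the local model explains worst."""
    model = fit_local_model(rows)
    ranked = sorted(range(len(rows)), key=lambda i: log_likelihood(rows[i], model))
    k = max(1, int(fraction * len(rows)))
    return set(ranked[:k])


def select_for_central(sites: Sequence[Sequence[Row]], fraction: float = 0.2) -> List[int]:
    """Assumed rule: transmit only observations flagged at every site."""
    flagged = [low_likelihood_indices(rows, fraction) for rows in sites]
    return sorted(set.intersection(*flagged))


def central_dataset(sites: Sequence[Sequence[Row]], indices: Sequence[int]) -> List[Row]:
    """Join the transmitted rows column-wise so the central site sees all variables."""
    return [tuple(v for rows in sites for v in rows[i]) for i in indices]
```

For example, with `site_a = [(1, 1)] + [(0, 0)] * 9` and `site_b = [(1,)] + [(0,)] * 9`, observation 0 is the rarest at both sites, so `select_for_central([site_a, site_b], fraction=0.1)` returns `[0]` and `central_dataset` yields `[(1, 1, 1)]`, a joint sample the central site could use to learn cross-site edges.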

### Citations

7440 |
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988
Citation Context ...odel selection in DDM [43], are also reported. We now review important literature on learning using Bayesian networks (BN). A BN is a probabilistic graphical model that represents uncertain knowledge [51, 31, 11]. Learning parameters of a Bayesian network from complete data is discussed in [58, 10]. Learning parameters from incomplete data using gradient methods is discussed in [7, 61]. Lauritzen [41] has pro...

1342 | Local computations with probabilities on graphical structures and their application to expert systems - Lauritzen, Spiegelhalter - 1988 |

1158 | Empirical analysis of predictive algorithms for collaborative filtering
- Breese, Heckerman, et al.
- 1998
Citation Context ...lications of Bayesian networks to clustering (AutoClass) and classification are discussed in [12, 19, 22, 57]. Zweig and Russell [66] use Bayesian networks for speech recognition, whereas Breese et al. [9] discuss collaborative filtering methods that use Bayesian network learning algorithms. Applications to causal learning in social sciences are discussed in [59]. In [40] the authors report a technique ...

1132 | A Bayesian method for the induction of probabilistic networks from data
- Cooper, Herskovits
- 1992
Citation Context ...scribe methods for accelerating convergence of the EM algorithm. Learning using Gibbs sampling is proposed in [63, 25]. The Bayesian score to learn the structure of a Bayesian network is discussed in [16, 10, 27]. Learning the structure of a Bayesian network based on the Minimum Description Length (MDL) principle is presented in [8, 39, 60]. Learning BN structure using greedy hill-climbing and other variants ...

969 |
An introduction to Bayesian Networks
- Jensen
- 1996
Citation Context ...odel selection in DDM [43], are also reported. We now review important literature on learning using Bayesian networks (BN). A BN is a probabilistic graphical model that represents uncertain knowledge [51, 31, 11]. Learning parameters of a Bayesian network from complete data is discussed in [58, 10]. Learning parameters from incomplete data using gradient methods is discussed in [7, 61]. Lauritzen [41] has pro...

948 | Learning bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995
Citation Context ...scribe methods for accelerating convergence of the EM algorithm. Learning using Gibbs sampling is proposed in [63, 25]. The Bayesian score to learn the structure of a Bayesian network is discussed in [16, 10, 27]. Learning the structure of a Bayesian network based on the Minimum Description Length (MDL) principle is presented in [8, 39, 60]. Learning BN structure using greedy hill-climbing and other variants ...

895 | A tutorial on learning with Bayesian Networks
- Heckerman
- 1995
Citation Context ...tributed data scenario. 3.1. Bayesian Networks: A Review. A Bayesian network (BN) is a probabilistic graphical model. It can be defined as a pair (G, p), where G = (V, E) is a directed acyclic graph (DAG) [31, 26]. Here, V is the vertex set, which represents the variables in the problem, and E is the edge set, which denotes probabilistic relationships among the variables. For a variable X ∈ V, a parent of X is a pape...
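
In a Bayesian network (G, p), the joint distribution factorizes over the DAG as $p(x_1, \ldots, x_n) = \prod_i p(x_i \mid \pi_i)$, where $\pi_i$ is the parent set of $x_i$ in G. A minimal Python sketch, assuming discrete variables, a DAG stored as a dictionary of parent lists, and conditional probability tables (CPTs) keyed by `(value, parent_values)`; these representations and the function names are illustrative choices, not taken from the paper.

```python
from typing import Dict, List, Tuple

Parents = Dict[str, List[str]]
# cpts[X][(x, parent_values)] = p(X = x | parents = parent_values)
CPTs = Dict[str, Dict[Tuple[int, Tuple[int, ...]], float]]


def is_dag(parents: Parents) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be removed
    after all of its parents have been removed. Every parent must also be a key."""
    remaining = {v: len(ps) for v, ps in parents.items()}
    children: Dict[str, List[str]] = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    ready = [v for v, n in remaining.items() if n == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for c in children[v]:
            remaining[c] -= 1
            if remaining[c] == 0:
                ready.append(c)
    return removed == len(parents)


def joint_probability(parents: Parents, cpts: CPTs, assignment: Dict[str, int]) -> float:
    """Chain rule for BNs: multiply each node's CPT entry given its parents' values."""
    prob = 1.0
    for v, ps in parents.items():
        prob *= cpts[v][(assignment[v], tuple(assignment[p] for p in ps))]
    return prob


# Example network A -> B over binary variables.
parents = {"A": [], "B": ["A"]}
cpts = {
    "A": {(0, ()): 0.7, (1, ()): 0.3},
    "B": {(0, (0,)): 0.8, (1, (0,)): 0.2, (0, (1,)): 0.1, (1, (1,)): 0.9},
}
```

Here `joint_probability(parents, cpts, {"A": 1, "B": 1})` is 0.3 × 0.9 ≈ 0.27, the four joint probabilities sum to 1, and `is_dag` rejects a two-node cycle such as `{"A": ["B"], "B": ["A"]}`.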

635 | Bayesian network classifiers
- Friedman, Geiger, et al.
- 1997
Citation Context ...3, 39] for discussion on how to sequentially update the structure of a network as more data is available. Applications of Bayesian networks to clustering (AutoClass) and classification are discussed in [12, 19, 22, 57]. Zweig and Russell [66] use Bayesian networks for speech recognition, whereas Breese et al. [9] discuss collaborative filtering methods that use Bayesian network learning algorithms. Applications to ...

626 |
Markov chain Monte Carlo in practice
- Gilks, Richardson, et al.
- 1996
Citation Context ...roposed an EM algorithm to learn Bayesian network parameters, whereas Bauer et al. [3] describe methods for accelerating convergence of the EM algorithm. Learning using Gibbs sampling is proposed in [63, 25]. The Bayesian score to learn the structure of a Bayesian network is discussed in [16, 10, 27]. Learning the structure of a Bayesian network based on the Minimum Description Length (MDL) principle is ...

509 |
Bayesian classification (AutoClass): Theory and results
- Cheeseman, Stutz
Citation Context ...sian model averaging are presented in [10, 28, 44]. Learning the structure of a Bayesian network from incomplete data is considered in [15, 12, 20, 21, 46, 56, 62]. The relationship between causality and Bayesian networks is discussed in [28, 52, 59, 29]. See [10, 23, 39] for discussion on how to sequentially update the structure of a network as more data is av...

320 | Syskill & Webert: Identifying interesting web sites
- Pazzani, Muramatsu, et al.
- 1996
Citation Context ...urate personalization is very important. This is indeed quite well appreciated by the business community, and the use of Bayesian techniques for personalizing web sites has already been reported elsewhere [49, 50, 5, 6]. The scenario described here is, however, somewhat different from the traditional web personalization applications where web-log data are...

316 | Learning and Revising User Profiles: The Identification of Interesting Web Sites
- Pazzani, Billsus
- 1997
Citation Context ...urate personalization is very important. This is indeed quite well appreciated by the business community, and the use of Bayesian techniques for personalizing web sites has already been reported elsewhere [49, 50, 5, 6]. The scenario described here is, however, somewhat different from the traditional web personalization applications where web-log data are...

292 | Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam’s Window
- Madigan, Raftery
- 1994
Citation Context ... variants is introduced in [28], whereas Chickering [14] introduced a method based on search over equivalence network classes. Methods for approximating full Bayesian model averaging are presented in [10, 28, 44]. Learning the structure of a Bayesian network from incomplete data is considered in [15, 12, 20, 21, 46, 56, 62]. The relationship bet...

254 | Bayesian networks without tears
- Charniak
- 1991
Citation Context ...odel selection in DDM [43], are also reported. We now review important literature on learning using Bayesian networks (BN). A BN is a probabilistic graphical model that represents uncertain knowledge [51, 31, 11]. Learning parameters of a Bayesian network from complete data is discussed in [58, 10]. Learning parameters from incomplete data using gradient methods is discussed in [7, 61]. Lauritzen [41] has pro...

245 | The alarm monitoring system: A case study with two probabilistic inference techniques for belief networks - Beinlich, Suermondt, et al. - 1989 |

227 |
The EM algorithm for graphical association models with missing data
- Lauritzen
- 1995
Citation Context ...[51, 31, 11]. Learning parameters of a Bayesian network from complete data is discussed in [58, 10]. Learning parameters from incomplete data using gradient methods is discussed in [7, 61]. Lauritzen [41] has proposed an EM algorithm to learn Bayesian network parameters, whereas Bauer et al. [3] describe methods for accelerating convergence of the EM algorithm. Learning using Gibbs sampling is propos...

224 | The Bayesian structural EM algorithm
- Friedman
- 1998
Citation Context ...sian model averaging are presented in [10, 28, 44]. Learning the structure of a Bayesian network from incomplete data is considered in [15, 12, 20, 21, 46, 56, 62]. The relationship between causality and Bayesian networks is discussed in [28, 52, 59, 29]. See [10, 23, 39] for discussion on how to sequentially update the structure of a network as more data is av...

206 |
Sequential updating of conditional probabilities on directed graphical structures
- Spiegelhalter, Lauritzen
- 1990
Citation Context ...ng using Bayesian networks (BN). A BN is a probabilistic graphical model that represents uncertain knowledge [51, 31, 11]. Learning parameters of a Bayesian network from complete data is discussed in [58, 10]. Learning parameters from incomplete data using gradient methods is discussed in [7, 61]. Lauritzen [41] has proposed an EM algorithm to learn Bayesian network parameters, whereas Bauer et al. [3] d...
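
With complete data and a known structure, the maximum likelihood estimate of each CPT entry is a ratio of counts, $\hat\theta_{x \mid \pi} = N(x, \pi) / N(\pi)$. A minimal Python sketch, assuming discrete data given as a list of dictionaries and a known parent structure; the function name is illustrative, and Bayesian estimates with Dirichlet priors (which add pseudo-counts to these ratios) are not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple


def mle_cpts(
    data: List[Dict[str, int]], parents: Dict[str, List[str]]
) -> Dict[str, Dict[Tuple[int, Tuple[int, ...]], float]]:
    """Maximum likelihood CPTs: N(x, pi) / N(pi) for each node.
    Value/parent combinations never seen in the data get no entry."""
    cpts = {}
    for v, ps in parents.items():
        pair_counts = Counter((row[v], tuple(row[p] for p in ps)) for row in data)
        parent_counts = Counter(tuple(row[p] for p in ps) for row in data)
        cpts[v] = {(x, pa): n / parent_counts[pa] for (x, pa), n in pair_counts.items()}
    return cpts
```

For the four records `{A: 1, B: 1}`, `{A: 1, B: 0}`, `{A: 0, B: 0}`, `{A: 0, B: 0}` with structure A → B, this gives P(A = 1) = 0.5, P(B = 1 | A = 1) = 0.5 and P(B = 0 | A = 0) = 1.0.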

199 | Learning Bayesian belief networks. An approach based on the MDL principle
- Lam, Bacchus
- 1994
Citation Context ...n score to learn the structure of a Bayesian network is discussed in [16, 10, 27]. Learning the structure of a Bayesian network based on the Minimum Description Length (MDL) principle is presented in [8, 39, 60]. Learning BN structure using greedy hill-climbing and other variants is introduced in [28], whereas Chickering [14] introduced a method based on search over equivalence network classes. Methods for a...
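
One widely used form of the MDL score for discrete networks, which coincides with the BIC score, trades data fit against model size: $\text{score}(G) = \sum_i \sum_{x_i, \pi_i} N(x_i, \pi_i) \ln \frac{N(x_i, \pi_i)}{N(\pi_i)} - \frac{\ln N}{2} \sum_i q_i (r_i - 1)$, where $r_i$ is the number of values of $x_i$ and $q_i$ the number of parent configurations. The encodings in [8, 39, 60] differ in detail, so this Python sketch only illustrates that assumed form; the function name and the `arity` argument are hypothetical. It needs Python 3.8+ for `math.prod`.

```python
import math
from collections import Counter
from typing import Dict, List


def mdl_score(
    data: List[Dict[str, int]],
    parents: Dict[str, List[str]],
    arity: Dict[str, int],
) -> float:
    """Log-likelihood under ML parameters minus (ln N / 2) times the number of
    free parameters; higher is better (the negated description length)."""
    n = len(data)
    log_lik = 0.0
    n_params = 0
    for v, ps in parents.items():
        pair_counts = Counter((row[v], tuple(row[p] for p in ps)) for row in data)
        parent_counts = Counter(tuple(row[p] for p in ps) for row in data)
        for (_, pa), count in pair_counts.items():
            log_lik += count * math.log(count / parent_counts[pa])
        q = math.prod(arity[p] for p in ps)  # parent configurations (1 if no parents)
        n_params += q * (arity[v] - 1)
    return log_lik - 0.5 * math.log(n) * n_params
```

When B always copies A, the structure A → B scores higher than the empty graph; when A and B are balanced and independent, both structures fit equally well and the penalty term makes the empty graph win. A greedy hill-climbing search would compare such scores across single-edge changes.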

198 | Theory of Refinement on Bayesian Networks
- Buntine
- 1991
Citation Context ...ng using Bayesian networks (BN). A BN is a probabilistic graphical model that represents uncertain knowledge [51, 31, 11]. Learning parameters of a Bayesian network from complete data is discussed in [58, 10]. Learning parameters from incomplete data using gradient methods is discussed in [7, 61]. Lauritzen [41] has proposed an EM algorithm to learn Bayesian network parameters, whereas Bauer et al. [3] d...

183 | Efficient approximations for the marginal likelihood of incomplete data given a Bayesian network
- Chickering, Heckerman
- 1997
Citation Context ...sian model averaging are presented in [10, 28, 44]. Learning the structure of a Bayesian network from incomplete data is considered in [15, 12, 20, 21, 46, 56, 62]. The relationship between causality and Bayesian networks is discussed in [28, 52, 59, 29]. See [10, 23, 39] for discussion on how to sequentially update the structure of a network as more data is av...

162 | Adaptive Probabilistic Networks with Hidden Variables
- Binder, Koller, et al.
- 1997
Citation Context ...ncertain knowledge [51, 31, 11]. Learning parameters of a Bayesian network from complete data is discussed in [58, 10]. Learning parameters from incomplete data using gradient methods is discussed in [7, 61]. Lauritzen [41] has proposed an EM algorithm to learn Bayesian network parameters, whereas Bauer et al. [3] describe methods for accelerating convergence of the EM algorithm. Learning using Gibbs sa...

132 | Learning equivalence classes of Bayesian-network structures
- Chickering
- 2002
Citation Context ...work based on the Minimum Description Length (MDL) principle is presented in [8, 39, 60]. Learning BN structure using greedy hill-climbing and other variants is introduced in [28], whereas Chickering [14] introduced a method based on search over equivalence network classes. Methods for approximating full Bayesian model averaging are presented in [10, 28, 44]. ...

112 | Speech Recognition with Dynamic Bayesian Networks
- Zweig
- 1998
Citation Context ...uentially update the structure of a network as more data is available. Applications of Bayesian networks to clustering (AutoClass) and classification are discussed in [12, 19, 22, 57]. Zweig and Russell [66] use Bayesian networks for speech recognition, whereas Breese et al. [9] discuss collaborative filtering methods that use Bayesian network learning algorithms. Applications to causal learning in soci...

102 | Comment: Graphical models, causality and intervention
- Pearl
- 1993
Citation Context ...Learning the structure of a Bayesian network from incomplete data is considered in [15, 12, 20, 21, 46, 56, 62]. The relationship between causality and Bayesian networks is discussed in [28, 52, 59, 29]. See [10, 23, 39] for discussion on how to sequentially update the structure of a network as more data is available. Applications of Bayesian networks to clustering (AutoClass) and classification are d...

87 | Collective Data Mining: A New Perspective Towards Distributed Data Mining
- Kargupta, Park, et al.
- 2000
Citation Context ...t uncertain knowledge. Specifically, we address the problem of learning a BN from heterogeneous distributed data. The approach uses a collective data mining (CDM) framework introduced earlier by Kargupta et al. [30, 32, 33, 35]. Section 2 provides some background and reviews existing literature in this area. Section 3 presents the collective Bayesian learning technique. Experimental results for two datasets, one simulated ...

84 | A bayesian approach to causal discovery
- Heckerman, Meek, et al.
- 1997
Citation Context ...Learning the structure of a Bayesian network from incomplete data is considered in [15, 12, 20, 21, 46, 56, 62]. The relationship between causality and Bayesian networks is discussed in [28, 52, 59, 29]. See [10, 23, 39] for discussion on how to sequentially update the structure of a network as more data is available. Applications of Bayesian networks to clustering (AutoClass) and classification are d...

72 |
BUGS: a program to perform Bayesian inference using Gibbs sampling
- Thomas, Spiegelhalter, et al.
- 1992
Citation Context ...roposed an EM algorithm to learn Bayesian network parameters, whereas Bauer et al. [3] describe methods for accelerating convergence of the EM algorithm. Learning using Gibbs sampling is proposed in [63, 25]. The Bayesian score to learn the structure of a Bayesian network is discussed in [16, 10, 27]. Learning the structure of a Bayesian network based on the Minimum Description Length (MDL) principle is ...

67 | Learning Belief Networks from Data: An Information Theory Based Approach
- Cheng, Bell, et al.
- 1997
Citation Context ...(conditional probability table values for nodes E, X, and D omitted) Local Bayesian networks were constructed using a conditional-independence-test-based algorithm [13] for learning the BN structure and a maximum-likelihood-based method for estimating the conditional probabilities. The local networks were exact as far as the edges involving only the local variables...
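
Conditional-independence-test-based structure learners such as the one cited as [13] decide whether an edge is needed by testing whether two variables are independent given a conditioning set; a common test statistic is the empirical conditional mutual information $I(X; Y \mid Z) = \sum P(x, y, z) \ln \frac{P(z)\, P(x, y, z)}{P(x, z)\, P(y, z)}$. A minimal Python sketch of that quantity on discrete data; the threshold value and the function names are hypothetical, and the full algorithm of [13] (its order of tests and edge orientation) is not reproduced.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def conditional_mutual_information(
    data: List[Dict[str, int]], x: str, y: str, z: Sequence[str]
) -> float:
    """Empirical I(X; Y | Z) in nats; an empty z gives plain mutual information."""

    def zkey(r: Dict[str, int]) -> tuple:
        return tuple(r[v] for v in z)

    n = len(data)
    xyz = Counter((r[x], r[y], zkey(r)) for r in data)
    xz = Counter((r[x], zkey(r)) for r in data)
    yz = Counter((r[y], zkey(r)) for r in data)
    zc = Counter(zkey(r) for r in data)
    total = 0.0
    for (a, b, c), count in xyz.items():
        # P(z) P(x,y,z) / (P(x,z) P(y,z)); the factors of n cancel in the ratio.
        total += (count / n) * math.log(count * zc[c] / (xz[(a, c)] * yz[(b, c)]))
    return total


def looks_independent(
    data: List[Dict[str, int]], x: str, y: str, z: Sequence[str], threshold: float = 0.01
) -> bool:
    """Hypothetical CI decision: X and Y are treated as independent given Z
    when the empirical conditional mutual information is below `threshold`."""
    return conditional_mutual_information(data, x, y, z) < threshold
```

If Y is an exact copy of a balanced binary X, the statistic equals ln 2; if X and Y are both copies of Z, conditioning on Z drives it to 0, which is the kind of evidence such learners use to drop a direct X–Y edge.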

56 | Update Rules for Parameter Estimation in Bayesian Networks
- Bauer, Koller, et al.
- 1997
Citation Context ..., 10]. Learning parameters from incomplete data using gradient methods is discussed in [7, 61]. Lauritzen [41] has proposed an EM algorithm to learn Bayesian network parameters, whereas Bauer et al. [3] describe methods for accelerating convergence of the EM algorithm. Learning using Gibbs sampling is proposed in [63, 25]. The Bayesian score to learn the structure of a Bayesian network is discussed ...

54 | Distributed Clustering Using Collective Principal Component Analysis
- Kargupta, Huang, et al.
- 2001
Citation Context ...que for distributed decision tree construction [35] and wavelet-based multivariate regression [30]. Several distributed clustering techniques based on the Collective framework are proposed elsewhere [32, 34]. They also proposed the collective PCA technique [34, 33] and its extension to a distributed clustering application. Additional work on distributed decision tree learning [4], clustering [45, 48, 55]...

53 | Distributed Data Mining: Algorithms, Systems, and Applications. In: Data Mining Handbook, Nong Ye (Ed.), IEA - Park, Kargupta - 2002 |

50 | On the sample complexity of learning Bayesian networks
- Friedman, Yakhini
- 1996
Citation Context ... The smallest value of N(ε, δ) that satisfies this requirement is called the sample complexity. This is usually referred to as the probably approximately correct (PAC) framework. Friedman and Yakhini [24] have examined the sample complexity of the minimum description length (MDL) principle based learning procedure for BNs. Dasgupta [18] gave a thorough analysis for the multinomial model with Boolean v...

46 | Sequential update of Bayesian network structure
- Friedman, Goldszmidt
- 1997
Citation Context ...g the structure of a Bayesian network from incomplete data is considered in [15, 12, 20, 21, 46, 56, 62]. The relationship between causality and Bayesian networks is discussed in [28, 52, 59, 29]. See [10, 23, 39] for discussion on how to sequentially update the structure of a network as more data is available. Applications of Bayesian networks to clustering (AutoClass) and classification are discussed in [12, 1...

46 | Collective, hierarchical clustering from distributed, heterogeneous data
- Johnson, Kargupta
- 1999
Citation Context ...t uncertain knowledge. Specifically, we address the problem of learning a BN from heterogeneous distributed data. The approach uses a collective data mining (CDM) framework introduced earlier by Kargupta et al. [30, 32, 33, 35]. Section 2 provides some background and reviews existing literature in this area. Section 3 presents the collective Bayesian learning technique. Experimental results for two datasets, one simulated ...

44 | Learning Bayesian networks: A unification for discrete and Gaussian domains
- Heckerman, Geiger
- 1995
Citation Context ...ructure of a Bayesian network based on the Minimum Description Length (MDL) principle is presented in [8, 39, 60]. Learning BN structure using greedy hill-climbing and other variants is introduced in [28], whereas Chickering [14] introduced a method based on search over equivalence network classes. Methods for approximating full Bayesian model averaging are presented in [10, 28, 44]. ...

33 |
A construction of Bayesian networks from databases based on an MDL scheme
- Suzuki
- 1993
Citation Context ...n score to learn the structure of a Bayesian network is discussed in [16, 10, 27]. Learning the structure of a Bayesian network based on the Minimum Description Length (MDL) principle is presented in [8, 39, 60]. Learning BN structure using greedy hill-climbing and other variants is introduced in [28], whereas Chickering [14] introduced a method based on search over equivalence network classes. Methods for a...

29 | Accelerated quantification of bayesian networks with incomplete data
- Thiesson
- 1995
Citation Context ...ncertain knowledge [51, 31, 11]. Learning parameters of a Bayesian network from complete data is discussed in [58, 10]. Learning parameters from incomplete data using gradient methods is discussed in [7, 61]. Lauritzen [41] has proposed an EM algorithm to learn Bayesian network parameters, whereas Bauer et al. [3] describe methods for accelerating convergence of the EM algorithm. Learning using Gibbs sa...

26 | Estimating dependency structure as a hidden variable
- Meila, Jordan, et al.
- 1997

26 | Distributed multivariate regression using wavelet-based collective data mining
- Hershberger, Kargupta
Citation Context ...t uncertain knowledge. Specifically, we address the problem of learning a BN from heterogeneous distributed data. The approach uses a collective data mining (CDM) framework introduced earlier by Kargupta et al. [30, 32, 33, 35]. Section 2 provides some background and reviews existing literature in this area. Section 3 presents the collective Bayesian learning technique. Experimental results for two datasets, one simulated ...

25 | A Comparison of Induction Algorithms for Selective and non-Selective Bayesian Classi - Singh, Provan - 1995 |

25 | Learning probabilistic user models
- Billsus, Pazzani
- 1997
Citation Context ...urate personalization is very important. This is indeed quite well appreciated by the business community, and the use of Bayesian techniques for personalizing web sites has already been reported elsewhere [49, 50, 5, 6]. The scenario described here is, however, somewhat different from the traditional web personalization applications where web-log data are...

25 |
Learning Bayesian networks in the presence of missing values and hidden variables
- Friedman
- 1997

23 | The sample complexity of learning fixed-structure bayesian networks
- Dasgupta
- 1997
Citation Context ...BN. For $i = 1, 2, \ldots, n$, let $h_i(x_i \mid \pi_i)$ and $c_i(x_i \mid \pi_i)$ be the corresponding conditional distributions at node $i$, where $x_i$ is the variable at node $i$ and $\pi_i$ is the set of parents of node $i$. Following [18], define a distance $d_{CP}(P, c_i, h_i)$ between $h_i$ and $c_i$ with respect to the true distribution $P$:

$$d_{CP}(P, c_i, h_i) = \sum_{\pi_i} P(\pi_i) \sum_{x_i} P(x_i \mid \pi_i) \ln \frac{c_i(x_i \mid \pi_i)}{h_i(x_i \mid \pi_i)}. \qquad (10)$$

It is then easy to show ...
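
Because $P(\pi_i) P(x_i \mid \pi_i) = P(x_i, \pi_i)$, the distance $d_{CP}(P, c_i, h_i) = \sum_{\pi_i} P(\pi_i) \sum_{x_i} P(x_i \mid \pi_i) \ln \frac{c_i(x_i \mid \pi_i)}{h_i(x_i \mid \pi_i)}$ can be computed as a single sum over the joint distribution of a node and its parents. A small Python sketch, assuming discrete variables and dictionaries keyed by `(x_i, pi_i)`; the names are illustrative. When $c_i$ is the true conditional $P(x_i \mid \pi_i)$, $d_{CP}$ is the expected Kullback-Leibler divergence from $P(\cdot \mid \pi_i)$ to $h_i(\cdot \mid \pi_i)$ and is therefore non-negative.

```python
import math
from typing import Dict, Hashable, Tuple

# Key: (value of x_i, configuration of the parents pi_i)
Key = Tuple[Hashable, Hashable]


def d_cp(p_joint: Dict[Key, float], c: Dict[Key, float], h: Dict[Key, float]) -> float:
    """d_CP(P, c_i, h_i) = sum over (x_i, pi_i) of P(x_i, pi_i) * ln(c_i / h_i).
    Terms with P(x_i, pi_i) = 0 contribute nothing and are skipped."""
    return sum(p * math.log(c[k] / h[k]) for k, p in p_joint.items() if p > 0)
```

With a single parent configuration `"pa"`, a true conditional of 0.5/0.5 used as `c`, and a hypothesis `h` of 0.25/0.75, the distance is 0.5 ln 2 + 0.5 ln(2/3) = 0.5 ln(4/3), and it drops to 0 when `h` equals `c`.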

23 |
Fraud/uncollectible debt detection using a bayesian network based learning system: A rare binary outcome with mixed data structures
- Ezawa, Schuermann
- 1995
Citation Context ...3, 39] for discussion on how to sequentially update the structure of a network as more data is available. Applications of Bayesian networks to clustering (AutoClass) and classification are discussed in [12, 19, 22, 57]. Zweig and Russell [66] use Bayesian networks for speech recognition, whereas Breese et al. [9] discuss collaborative filtering methods that use Bayesian network learning algorithms. Applications to ...

22 |
Inductive Policy: The
- Provost, Buchanan
- 1995
Citation Context ...eneous cases. In the following, we review only the existing literature for heterogeneous DDM. Mining from heterogeneous data constitutes an important class of DDM problems. This issue is discussed in [54] from the perspective of inductive bias. The WoRLD system [2] addressed the problem of concept learning from heterogeneous sites by developing an “activation spreading” approach that is based on first...

22 | Learning mixtures of bayesian networks
- Thiesson, Meek, et al.
- 1997

16 |
Polynomial learnability of probabilistic concepts with respect to the Kullback-Leibler divergence
- Abe, Takeuchi, et al.
- 1991
Citation Context ...thesis (candidate probability distribution for the underlying true distribution), then [1]

$$d_{KL}(p^*, h) = \sum_{i=1}^{M} p^*(s_i) \ln \frac{p^*(s_i)}{h(s_i)} = \sum_{i=1}^{M} \frac{1}{M} \ln \frac{1}{M} - \sum_{i=1}^{M} \frac{1}{M} \ln h(s_i) = \ln \frac{1}{M} - \frac{1}{M} \sum_{i=1}^{M} \ln h(s_i).$$

Therefore, minimizing the KL distance with respect to the empirically ob...
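
When the empirical distribution $p^*$ puts mass $1/M$ on each of $M$ distinct observations $s_1, \ldots, s_M$, the identity $d_{KL}(p^*, h) = \ln\frac{1}{M} - \frac{1}{M}\sum_{i=1}^{M} \ln h(s_i)$ shows that the first term does not depend on $h$, so minimizing the KL distance is the same as maximizing the average log-likelihood. A quick numerical check in Python, assuming distinct samples and two hypothetical hypotheses given as functions; all names are illustrative.

```python
import math
from typing import Callable, Hashable, Sequence


def kl_from_empirical(samples: Sequence[Hashable], h: Callable[[Hashable], float]) -> float:
    """Direct definition: sum_i p*(s_i) ln(p*(s_i) / h(s_i)), with p*(s_i) = 1/M."""
    m = len(samples)
    return sum((1 / m) * math.log((1 / m) / h(s)) for s in samples)


def kl_closed_form(samples: Sequence[Hashable], h: Callable[[Hashable], float]) -> float:
    """ln(1/M) minus the average log-likelihood of the samples under h."""
    m = len(samples)
    return math.log(1 / m) - sum(math.log(h(s)) for s in samples) / m


# Hypothetical hypotheses over the four distinct samples 0, 1, 2, 3.
samples = [0, 1, 2, 3]
skewed = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}


def h_skewed(s: int) -> float:
    return skewed[s]


def h_uniform(s: int) -> float:
    return 0.25
```

Both functions agree for `h_skewed`, while `h_uniform`, which matches the empirical distribution exactly, has KL distance 0 and also the higher average log-likelihood (ln 0.25 ≈ -1.386 versus about -1.508 for `h_skewed`).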

16 |
Learning Bayesian networks from incomplete data
- Singh
- 1997

16 | A Framework for Finding Distributed Data Mining Strategies That are Intermediate Between Centralized
- Turinsky, Grossman
- 2000
Citation Context ...ue [34, 33] and its extension to a distributed clustering application. Additional work on distributed decision tree learning [4], clustering [45, 48, 55], genetic learning [47], DDM design optimization [65], classifier pruning [53], DDM architecture [38], and problem decomposition and local model selection in DDM [43] has also been reported. We now review important literature on learning using Bayesian netw...