## Tractable learning of large Bayes net structures from sparse data (2004)


### Download Links

- [www.cs.iastate.edu]
- [www.cs.cmu.edu]
- [www-2.cs.cmu.edu]
- [www.aicml.cs.ualberta.ca]
- [www.autonlab.org]
- [kingman.cs.ualberta.ca]
- DBLP

### Other Repositories/Bibliography

Citations: 31 (4 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Goldenberg04tractablelearning,
  author    = {Anna Goldenberg and Andrew Moore},
  title     = {Tractable learning of large bayes net structures from sparse data},
  booktitle = {},
  year      = {2004},
  publisher = {ACM Press}
}
```


### Abstract

…statistics for creating the global Bayes Net. This paper addresses three questions. Is it useful to attempt to learn a Bayesian network structure with hundreds of thousands of nodes? How should such structure search proceed practically? The third question arises out of our approach to the second: how can Frequent Sets (Agrawal et al., 1993), which are extremely popular in the area of descriptive data mining, be turned into a probabilistic model? Large sparse datasets with hundreds of thousands of records and attributes appear in social networks, warehousing, supermarket transactions and web logs. The complexity of structural search has made learning factored probabilistic models on such datasets infeasible. We propose using Frequent Sets to significantly speed up the structural search. Unlike previous approaches, we not only cache n-way sufficient statistics but also exploit their local structure. We also present an empirical evaluation of our algorithm applied to several massive datasets.
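As a concrete illustration of the Frequent Sets machinery the abstract builds on, here is a minimal Apriori-style enumerator over sparse rows (a sketch, not the paper's implementation; `transactions` are sets of attribute ids and `support` is the threshold s):

```python
from itertools import combinations

def frequent_sets(transactions, support, max_size=3):
    """Apriori-style enumeration of all attribute sets whose co-occurrence
    count (frequency) is at least `support`.  Rows are stored as sets of
    the attributes that are "on", so only co-occurrences that actually
    appear in the data are ever counted."""
    result = {}
    prev_level = {frozenset(): len(transactions)}  # empty set occurs in every row
    for size in range(1, max_size + 1):
        counts = {}
        for row in transactions:
            for combo in combinations(sorted(row), size):
                s = frozenset(combo)
                # Anti-monotonicity: count s only if every (size-1)-subset
                # already passed the support threshold at the previous level.
                if all(s - {a} in prev_level for a in s):
                    counts[s] = counts.get(s, 0) + 1
        prev_level = {s: c for s, c in counts.items() if c >= support}
        if not prev_level:
            break
        result.update(prev_level)
    return result
```

Because the zero counts that dominate a sparse dataset are never touched, a support threshold of 3 or more keeps the enumeration cheap even with hundreds of thousands of attributes.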

### Citations

2920 | Fast Algorithms for Mining Association Rules
- Agrawal, Srikant
- 1994
Citation Context ...m attributes and freq(S) ≥ s. Threshold s is called support in the data mining literature. Given sparse data and a support s greater than about 3, it is surprisingly easy to compute all Frequent Sets (Agrawal & Srikant, 1994). There is an abundance of literature on Frequent Sets as their collection is an essential part of the association rules algorithms (Agrawal et al., 1993; Agrawal & Srikant, 1994; Han & Kamber, 2000)...

2661 | Mining association rules between sets of items in large databases
- Agrawal, Imieliski, et al.
- 1993
Citation Context ...yesian network structure with hundreds of thousands of nodes? How should such structure search proceed practically? The third question arises out of our approach to the second: how can Frequent Sets (Agrawal et al., 1993), which are extremely popular in the area of descriptive data mining, be turned into a probabilistic model? Large sparse datasets with hundreds of thousands of records and attributes appear in social...

2158 | Data Mining: Concepts and Techniques
- Han, Kamber
- 2001
Citation Context ...wal & Srikant, 1994). There is an abundance of literature on Frequent Sets as their collection is an essential part of the association rules algorithms (Agrawal et al., 1993; Agrawal & Srikant, 1994; Han & Kamber, 2000) widely used in commercial data mining. There are multiple references to Frequent Sets in the area of modelling sparse datasets as well (Mannila & Toivonen, 1996; Chickering & Heckerman, 1999; Pavlov...

1186 | Empirical Analysis of Predictive Algorithms for Collaborative Filtering
- Breese, Heckerman, et al.
- 1998
Citation Context ...ystems. Online systems such as Amazon provide suggestions of what might appeal to the user based on the user’s other preferences. The use of Bayesian networks in this domain has become established, e.g. (Breese et al., 1998). Often the goal of recommender systems is to predict which are the most likely items that the user would buy next. An example of answering an analogous query using Bayes Nets built by our algorithm is ...

533 | Discrete Multivariate Analysis: Theory and Practice
- Bishop, Fienberg, et al.
- 1975
Citation Context ...introduces quite a few interactions between variables that have low marginal counts. Model fitting in contingency tables in general is sensitive to very low marginal counts even if they are not zero (Bishop et al., 1977). Here we use BDeu, which is less sensitive to low counts. Despite this, it seems to be a good idea to keep support relatively large. In our case, we have tested a few support sizes on smaller datase...

245 | Inferring cellular networks using probabilistic graphical models. Science 2004;303:799–805
- Friedman
Citation Context ...tudies in the gene expression data and social networks in particular suggest that correlations of entities on the local level are very important and in fact they are what makes up the global network (Friedman, 2004; Breiger, 2003). So, along with being computationally practical, Bayesian Networks created by our algorithm have a very natural motivation stemming from those important domains. We provide results on...

128 | Cached sufficient statistics for efficient machine learning with large datasets
- Moore, Lee
- 1998
Citation Context ...atistics. The inference took less than a second. 7. Related Work Some of the earlier work in this area has concentrated on efficient representation of sparse data and caching of n-way counts (Moore & Lee, 1998). Chickering and Heckerman (1999) and Meila (1999) have noted that computations requiring one-way and pairwise counts can be sped up significantly when dealing with sparse data using caching and such...

92 | Multiple Uses of Frequent Sets and Condensed Representations
- Mannila, Toivonen
- 1996
Citation Context ...awal et al., 1993; Agrawal & Srikant, 1994; Han & Kamber, 2000) widely used in commercial data mining. There are multiple references to Frequent Sets in the area of modelling sparse datasets as well (Mannila & Toivonen, 1996; Chickering & Heckerman, 1999; Pavlov et al., 2003; Hollmen et al., 2003). This is not surprising, since sparseness implies very few co-occurrences between items. In fact, most items do not co-occur ...

71 | A Bayesian method for constructing Bayesian belief networks from databases
- Cooper, Herskovits
- 1990
Citation Context ...the previous steps and attempt adding edges directly from the current node to its grandchildren. 5.2. Hillclimbing One of the standard techniques to improve the score is hillclimbing as described in (Cooper & Herskovits, 1991). This technique improves the score by adding/removing/reversing arcs in a Bayes Net. The set of operations and edge selection procedure may differ between algorithms. Usually hillclimbing is perform...
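The add/remove/reverse arc search described in this excerpt can be sketched as a generic greedy loop over a decomposable score. In the sketch below, `score_family` is a hypothetical pluggable local score (the paper uses BDeu), and a first-improvement sweep stands in for the randomized edge selection the excerpt mentions:

```python
def is_acyclic(parents):
    """True iff the parent-set dict {node: set_of_parents} encodes a DAG (DFS)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in parents}
    def has_cycle(v):
        color[v] = GRAY
        for p in parents[v]:
            if color[p] == GRAY or (color[p] == WHITE and has_cycle(p)):
                return True
        color[v] = BLACK
        return False
    return not any(has_cycle(v) for v in parents if color[v] == WHITE)

def neighbors(parents):
    """Yield structures one arc operation (add / remove / reverse) away."""
    for a in parents:
        for b in parents:
            if a == b:
                continue
            if a in parents[b]:
                yield {**parents, b: parents[b] - {a}}                       # remove a->b
                yield {**parents, b: parents[b] - {a}, a: parents[a] | {b}}  # reverse a->b
            else:
                yield {**parents, b: parents[b] | {a}}                       # add a->b

def hillclimb(nodes, score_family):
    """Greedy hillclimbing: apply improving arc operations until none exist.
    score_family(node, parent_set) must be a decomposable local score."""
    parents = {v: frozenset() for v in nodes}
    total = lambda ps: sum(score_family(v, ps[v]) for v in ps)
    current, improved = total(parents), True
    while improved:
        improved = False
        for cand in neighbors(parents):
            if is_acyclic(cand) and total(cand) > current + 1e-12:
                parents, current, improved = cand, total(cand), True
                break
    return parents, current
```

The acyclicity check after every candidate move is what keeps the search inside the space of valid Bayes Nets; in a real implementation the score deltas would be computed incrementally from cached counts rather than re-summed.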

54 | Are randomly grown graphs really random? http://arxiv.org/abs/cond-mat/0104546 Vol 2
- Callaway, Hopcroft, et al.
- 2001
Citation Context ...o perform those operations given the same number of nodes for 300,000 edges with relatively small increase in the score. In that sense, the random graphs might not be exactly random as discussed in (Callaway et al., 2001). 6.3. Example application One of the important and growing application fields of large Bayes Nets is recommender systems. A related application is intelligence: having detected a subset of participa...

48 | Beyond independence: probabilistic models for query approximation on binary transaction data
- Pavlov, Mannila, et al.
- 2003
Citation Context ..., 2000) widely used in commercial data mining. There are multiple references to Frequent Sets in the area of modelling sparse datasets as well (Mannila & Toivonen, 1996; Chickering & Heckerman, 1999; Pavlov et al., 2003; Hollmen et al., 2003). This is not surprising, since sparseness implies very few co-occurrences between items. In fact, most items do not co-occur with each other, hence we expect the majority of th...

46 | Large data sets lead to overly complex models: an explanation and a solution
- Oates, Jensen
- 1998
Citation Context ... to the multiple hypothesis testing of hundreds of thousands of possible parents. Correction for the multiple hypothesis testing problem (similar to corrections used in other learning algorithms such as (Oates & Jensen, 1998)) will be incorporated into SBNS in the future. Table 6 (Overfitting testing; columns: dataset, train, test): citeseer hillclimb -30.6738 -31.0127; citeseer s=� -23.9227 -26.3253; citeseer s=�� -24.1959 -25.0119; imd...

30 | Mining Complex Models from Arbitrarily Large Databases in Constant Time, in
- Hulten, Domingos
- 2002
Citation Context ...yesian Network is not limited we perceive it as an improvement on the original Sparse Candidate algorithm. Sampling was proposed as one of the techniques to speed up modelling in massive datasets in (Hulten & Domingos, 2002; Pelleg & Moore, 2002). Though an interesting direction it seems to be orthogonal to our approach. The idea of augmenting Bayes Nets with high mutual information edges is based on the fact that such...

17 | An accelerated Chow and Liu algorithm: fitting tree distributions to high-dimensional sparse data
- Meilă
- 1999
Citation Context ...ity, though still very sparse), they could cause significant negative correlations that we could miss. Fortunately, such negative pairwise correlations can be detected cheaply using a technique from (Meila, 1999). Let I_AB be the mutual information between two attributes. Meila showed that the mutual information can be calculated in a very efficient manner, particularly when dealing with discrete binary data...
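For binary attributes, the quantity I_AB in this excerpt needs only the record count, the two marginal counts, and a single co-occurrence count. A small sketch of that counts-to-MI arithmetic (illustrative only; this is not Meila's accelerated algorithm, just the identity that makes cached sparse counts sufficient):

```python
from math import log

def mutual_information(n, n_a, n_b, n_ab):
    """Mutual information (in nats) between two binary attributes, from
    cached counts: n records, marginal counts n_a and n_b, and the
    co-occurrence count n_ab.  n_ab is the only statistic specific to the
    pair, so caching sparse co-occurrences suffices to score every edge."""
    p_a = (1 - n_a / n, n_a / n)   # marginal P(A=0), P(A=1)
    p_b = (1 - n_b / n, n_b / n)   # marginal P(B=0), P(B=1)
    # All four joint cells follow from the three counts plus n.
    joint = {
        (1, 1): n_ab / n,
        (1, 0): (n_a - n_ab) / n,
        (0, 1): (n_b - n_ab) / n,
        (0, 0): (n - n_a - n_b + n_ab) / n,
    }
    return sum(p * log(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)
```

Independent attributes give I_AB = 0 and perfectly correlated ones give log 2, so a threshold on this quantity can flag strongly (anti-)correlated pairs cheaply.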

12 | Statistics of social configurations
- Moreno, Jennings
- 1938
Citation Context ...ets built by our algorithm is presented in Section 6.2. The idea of representing social networks as people connected by directed arrows has been explored in the social science domain for almost 70 years (Moreno & Jennings, 1938). Initially analyzed networks were on the order of 10s of nodes. However, improvements in data collection and especially the birth of online communities made it necessary to look at much larger netwo...

11 | Mixture models and frequent sets: combining global and local methods for 0-1 data
- Hollmen, Seppanen, et al.
- 2003
Citation Context ...n commercial data mining. There are multiple references to Frequent Sets in the area of modelling sparse datasets as well (Mannila & Toivonen, 1996; Chickering & Heckerman, 1999; Pavlov et al., 2003; Hollmen et al., 2003). This is not surprising, since sparseness implies very few co-occurrences between items. In fact, most items do not co-occur with each other, hence we expect the majority of the counts in the pairwi...

3 | Emergent themes in social network analysis: Results, challenges, opportunities
- Breiger
- 2003
Citation Context ...ne expression data and social networks in particular suggest that correlations of entities on the local level are very important and in fact they are what makes up the global network (Friedman, 2004; Breiger, 2003). So, along with being computationally practical, Bayesian Networks created by our algorithm have a very natural motivation stemming from those important domains. We provide results on sparse massive...

3 | Learning Bayes network structure from massive datasets: The ”sparse candidate” algorithm
- Friedman, Nachman, et al.
- 1999

2 | Learning Bayesian Networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995
Citation Context ...d � for deletion and arc-reversal, and then pick an edge at random to see whether performing the chosen operation improves the global score. 6. Evaluation The evaluation uses the BDeu score described in (Heckerman et al., 1995) and also presented here in equation 3 to compare results between different configurations of our algorithm and to the randomized hillclimbing as described in Section 5.2. $S_{BDeu} = \sum_{i=1}^{n}\sum_{j=1}^{q_i}\left[\log\frac{\Gamma(N'/q_i)}{\Gamma(N'/q_i + N_{ij})} + \sum_{k=1}^{r_i}\log\frac{\Gamma(N'/(r_i q_i) + N_{ijk})}{\Gamma(N'/(r_i q_i))}\right]$ ...
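The BDeu score this excerpt references reduces to ratios of Gamma functions of cached counts plus Dirichlet prior mass, which makes it cheap to evaluate per family. A minimal sketch of the standard BDeu local score via `math.lgamma` (the generic textbook form, not the paper's code; `ess` stands for the equivalent sample size N'):

```python
from math import lgamma

def bdeu_family_score(counts, r, ess=1.0):
    """BDeu local score (log marginal likelihood under the uniform BDeu
    prior with equivalent sample size `ess`) for one node given a parent set.
    counts[j][k] = N_ijk: times the node took its k-th value (of r possible)
    while the parents were in their j-th joint configuration (q rows total)."""
    q = len(counts)
    a_j = ess / q          # Dirichlet mass per parent configuration
    a_jk = ess / (q * r)   # Dirichlet mass per (configuration, value) cell
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for n_jk in row:
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score
```

Because the score decomposes over families, a structure search only needs to re-evaluate the families whose parent sets an arc operation changed.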

1 | Fast learning from sparse data. UAI 15
- Chickering, Heckerman
- 1999
Citation Context ... & Srikant, 1994; Han & Kamber, 2000) widely used in commercial data mining. There are multiple references to Frequent Sets in the area of modelling sparse datasets as well (Mannila & Toivonen, 1996; Chickering & Heckerman, 1999; Pavlov et al., 2003; Hollmen et al., 2003). This is not surprising, since sparseness implies very few co-occurrences between items. In fact, most items do not co-occur with each other, hence we expe...

1 | Tractable structural learning of large Bayesian networks from sparse data
- Goldenberg, Moore
- 2004
Citation Context ...bles 9 and 10. The suggested completions are in fact people that are either part of or collaborate closely with Daphne Koller’s group, 2 More details, omitted here for space reasons, can be found in (Goldenberg & Moore, 2004). Table 8. Time (min) per task for SBNS. (s=��, mfss=��) dataset/task freq sets local struct search make �dump & DAG MI augment Ò� degree augment citeseer 65.49 4.11 .2 17.2 97.5 imdb 196.22 15.43 13....