## Learning Arithmetic Circuits


Citations: 16 (7 self)

### BibTeX

```bibtex
@MISC{Lowd_learningarithmetic,
  author = {Daniel Lowd and Pedro Domingos},
  title  = {Learning Arithmetic Circuits},
  year   = {}
}
```

### Abstract

Graphical models are usually learned without regard to the cost of doing inference with them. As a result, even if a good model is learned, it may perform poorly at prediction, because it requires approximate inference. We propose an alternative: learning models with a score function that directly penalizes the cost of inference. Specifically, we learn arithmetic circuits with a penalty on the number of edges in the circuit (in which the cost of inference is linear). Our algorithm is equivalent to learning a Bayesian network with context-specific independence by greedily splitting conditional distributions, at each step scoring the candidates by compiling the resulting network into an arithmetic circuit, and using its size as the penalty. We show how this can be done efficiently, without compiling a circuit from scratch for each candidate. Experiments on several real-world domains show that our algorithm is able to learn tractable models with very large treewidth, and yields more accurate predictions than a standard context-specific Bayesian network learner, in far less time.

### Citations

7316 | Probabilistic reasoning in intelligent systems - Pearl - 1988 |

Citation Context: ...A Bayesian network encodes the joint probability distribution of a set of n variables, {X1, . . . , Xn}, as a directed acyclic graph and a set of conditional probability distributions (CPDs) (Pearl, 1988). Each node corresponds to a variable, and the CPD associated with it gives the probability of each state of the variable given every possible combination of states of its parents. The set of parents...
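The factorization described in this excerpt, the joint probability as a product of per-node CPD entries, can be sketched on a made-up three-variable network A → B, A → C (hypothetical numbers throughout):

```python
from itertools import product

# Made-up CPDs for a binary network A -> B, A -> C.
cpd = {
    'A': lambda s: 0.6 if s['A'] == 1 else 0.4,
    'B': lambda s: (0.9 if s['B'] == 1 else 0.1) if s['A'] == 1
                   else (0.2 if s['B'] == 1 else 0.8),
    'C': lambda s: (0.5 if s['C'] == 1 else 0.5) if s['A'] == 1
                   else (0.3 if s['C'] == 1 else 0.7),
}

def joint(state):
    """Joint probability = product over nodes of P(X | parents(X))."""
    p = 1.0
    for var, table in cpd.items():
        p *= table(state)
    return p

# Sanity check: the joint sums to 1 over all 2^3 states.
total = sum(joint(dict(zip('ABC', bits))) for bits in product([0, 1], repeat=3))
assert abs(total - 1.0) < 1e-9
print(joint({'A': 1, 'B': 1, 'C': 0}))   # 0.6 * 0.9 * 0.5 = 0.27
```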

2963 | UCI repository of machine learning databases, University of California, Irvine, CA. www.ics.uci.edu/mlearn/MLRepository.html - Blake, Merz - 1998 |

Citation Context: ...a certain category. Anonymous MSWeb is visit data for 294 areas (Vroots) of the Microsoft web site, collected during one week in February 1998. It can be found in the UCI machine learning repository (Blake & Merz, 2000). EachMovie is a collaborative filtering dataset in which users rate movies they have seen. We took a 10% sample of the original dataset, focused on the 500 most-rated movies, and reduced each varia...

591 | Markov chain Monte Carlo in practice - Gilks, Richardson, et al. - 1996 |

Citation Context: ...Inference in Bayesian networks is #P-complete (Roth, 1996). Because exact inference is intractable, approximate methods are often used, of which the most popular is Gibbs sampling, a form of Markov chain Monte Carlo (Gilks et al., 1996). A Gibbs sampler proceeds by sampling each nonevidence variable in turn conditioned on its Markov blanket (parents, children and parents of children). The distribution of the query variables is then...
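The Gibbs sampler the excerpt describes, resampling each non-evidence variable conditioned on its Markov blanket, can be sketched on a small invented chain A → B → C with C = 1 observed (all CPT numbers are made up):

```python
import random

# Made-up CPTs for a binary chain A -> B -> C.
pA1 = 0.5
pB1 = {0: 0.3, 1: 0.8}     # P(B=1 | A)
pC1 = {0: 0.1, 1: 0.7}     # P(C=1 | B)

def bern(p):
    return 1 if random.random() < p else 0

def sample_A(b):
    # Markov blanket of A is {B}: P(A=1 | b) is proportional to P(A=1) P(b | A=1).
    w1 = pA1 * (pB1[1] if b == 1 else 1 - pB1[1])
    w0 = (1 - pA1) * (pB1[0] if b == 1 else 1 - pB1[0])
    return bern(w1 / (w1 + w0))

def sample_B(a, c):
    # Markov blanket of B is {A, C}: P(B=1 | a, c) prop. to P(B=1 | a) P(c | B=1).
    w1 = pB1[a] * (pC1[1] if c == 1 else 1 - pC1[1])
    w0 = (1 - pB1[a]) * (pC1[0] if c == 1 else 1 - pC1[0])
    return bern(w1 / (w1 + w0))

def gibbs(n_iter=20000, burn=1000, c=1, seed=0):
    """Estimate P(A=1 | C=c) by cycling through the non-evidence variables."""
    random.seed(seed)
    a, b = 0, 0
    hits = 0
    for t in range(n_iter):
        a = sample_A(b)
        b = sample_B(a, c)
        if t >= burn and a == 1:
            hits += 1
    return hits / (n_iter - burn)

print(gibbs())   # estimate of P(A=1 | C=1)
```

For this toy chain the exact posterior is computable by enumeration, so the sampler's estimate can be checked directly, which is the usual sanity test before trusting Gibbs output on larger networks.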

331 | Complexity of finding embeddings in a k-tree - Arnborg, Corneil, et al. - 1987 |

Citation Context: ...property). The treewidth of a junction tree is one less than the maximum clique size. The complexity of inference is exponential in the treewidth. Finding the minimum-treewidth junction tree is NP-hard (Arnborg et al., 1987). Inference in Bayesian networks is #P-complete (Roth, 1996). Because exact inference is intractable, approximate methods are often used, of which the most popular is Gibbs sampling, a form of Markov...

314 | Computer Networks as - Wellman - 2001 |

Citation Context: ..., by suitably directing the learning process. Bayesian networks can be learned using local search to maximize a likelihood or Bayesian score, with operators like edge addition, deletion and reversal (Heckerman et al., 1995). Typically, the number of parameters or edges in the network is penalized to avoid overfitting, but this is only very indirectly related to the cost of inference. Two edge additions that produce the...

240 | Learning Bayesian networks with local structure - Friedman, Goldszmidt - 1999 |

Citation Context: ...use decision trees as CPDs, taking advantage of context-specific independencies (i.e., a child variable is independent of some of its parents given some values of the others) (Boutilier et al., 1996; Friedman & Goldszmidt, 1996; Chickering et al., 1997). The algorithm we present in this paper learns arithmetic circuits that are equivalent to this type of Bayesian network. In a decision tree CPD for variable Xi, each interio...
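The decision-tree CPDs mentioned here, where each interior node tests a parent and each leaf holds a distribution, can be sketched for a hypothetical variable X with parents Y and Z (names and numbers are invented):

```python
# Decision-tree CPD sketch for P(X=1 | Y, Z): when Y = 1, X is independent
# of Z (context-specific independence), so that branch is a single leaf.

def p_x1(y, z):
    if y == 1:                       # interior node tests Y
        return 0.9                   # leaf: P(X=1 | Y=1), Z irrelevant
    return 0.7 if z == 1 else 0.2    # Y=0 branch: interior node tests Z

# Z has no effect in the context Y = 1.
assert p_x1(1, 0) == p_x1(1, 1) == 0.9
print(p_x1(0, 1))   # 0.7
```

A full conditional probability table would need four entries; the tree needs only three leaves, and the savings grow with the number of parents, which is what the local-structure learners cited here exploit.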

230 | On the hardness of approximate reasoning - Roth - 1996 |

Citation Context: ...clique size. The complexity of inference is exponential in the treewidth. Finding the minimum-treewidth junction tree is NP-hard (Arnborg et al., 1987). Inference in Bayesian networks is #P-complete (Roth, 1996). Because exact inference is intractable, approximate methods are often used, of which the most popular is Gibbs sampling, a form of Markov chain Monte Carlo (Gilks et al., 1996). A Gibbs sampler pro...

175 | A Bayesian approach to learning Bayesian networks with local structure - Chickering, Heckerman, et al. - 1997 |

Citation Context: ...taking advantage of context-specific independencies (i.e., a child variable is independent of some of its parents given some values of the others) (Boutilier et al., 1996; Friedman & Goldszmidt, 1996; Chickering et al., 1997). The algorithm we present in this paper learns arithmetic circuits that are equivalent to this type of Bayesian network. In a decision tree CPD for variable Xi, each interior node is labeled with on...

117 | Learning with mixtures of trees - Meila, Jordan |

115 | A differential approach to inference in Bayesian networks - Darwiche |

Citation Context: ...method that accomplishes this, by directly penalizing the cost of inference in the score function. Our method takes advantage of recent advances in exact inference by compilation to arithmetic circuits (Darwiche, 2003). An arithmetic circuit is a representation of a Bayesian network capable of answering arbitrary marginal and conditional queries, with the property that the cost of inference is linear in the size o...

93 | KDD-Cup 2000 organizers' report: peeling the onion - Kohavi, Brodley, et al. |

Citation Context: ...the next iteration requires a slightly lower upper bound. 5 EXPERIMENTS 5.1 DATASETS We evaluated our methods on three widely used real-world datasets. The KDD Cup 2000 clickstream prediction dataset (Kohavi et al., 2000) consists of web session data taken from an online retailer. Using the subset of Hulten and Domingos (2002), each example consists of 65 Boolean variables, corresponding to whether or not a particula...

52 | A logical approach to factoring belief networks - Darwiche - 2002 |

30 | Mining Complex Models from Arbitrarily Large Databases in Constant Time - Hulten, Domingos - 2002 |

30 | Naive Bayes models for probability estimation - Lowd, Domingos - 2005 |

28 | Efficient principled learning of thin junction trees - Chechetka, Guestrin |

11 | Maximum likelihood Markov networks: an algorithmic approach - Srebro - 2000 |

7 | Learning probabilistic decision graphs - Jaeger, Nielsen, et al. - 2006 |

2 | The WinMine toolkit (Tech - Chickering - 2002 |

Citation Context: ...0.02). We also tuned the per-parameter cost kp. For KDD Cup, the best cost was 0.0; for MSWeb and EachMovie, the best costs were 1.0 for greedy ACs and 0.5 for quick ACs. We used the WinMine Toolkit (Chickering, 2002) as a baseline. WinMine implements the algorithm for learning Bayesian networks with local structure described in Section 2 (Chickering et al., 1997), and has a number of other state-of-the-art featu...