## Interpolating conditional density trees (2002)

Venue: A. Darwiche, N. Friedman (Eds.), Uncertainty in Artificial Intelligence

Citations: 5 (0 self)

### BibTeX

@ARTICLE{Davies02interpolatingconditional,

author = {Scott Davies and Andrew Moore},

title = {Interpolating conditional density trees},

journal = {A. Darwiche, N. Friedman (Eds.), Uncertainty in Artificial Intelligence},

year = {2002},

volume = {18},

pages = {119--127}

}

### Abstract

Joint distributions over many variables are frequently modeled by decomposing them into products of simpler, lower-dimensional conditional distributions, such as in sparsely connected Bayesian networks. However, automatically learning such models can be very computationally expensive when there are many datapoints and many continuous variables with complex nonlinear relationships, particularly when no good ways of decomposing the joint distribution are known a priori. In such situations, previous research has generally focused on the use of discretization techniques in which each continuous variable has a single discretization that is used throughout the entire network. In this paper, we present and compare a wide variety of tree-based algorithms for learning and evaluating conditional density estimates over continuous variables. These trees can be thought of as discretizations that vary according to the particular interactions being modeled; however, the density within a given leaf of the tree need not be assumed constant, and we show that such nonuniform leaf densities lead to more accurate density estimation. We have developed Bayesian network structure-learning algorithms that employ these tree-based conditional density representations, and we show that they can be used to practically learn complex joint probability models over dozens of continuous variables from thousands of datapoints. We focus on finding models that are simultaneously accurate, fast to learn, and fast to evaluate once they are learned.
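As a rough illustration of the idea of leaves whose density is not constant, the following minimal sketch builds the leaves of a one-dimensional density tree (flattened into a list) and evaluates the resulting density. It is not the paper's algorithm: the names `Leaf`, `fit_leaves`, `density`, `uniform_shape`, and `ramp_shape` are hypothetical, the cut points are supplied by hand rather than learned, and the within-leaf "shape" functions are assumptions chosen only because they integrate to 1 on the unit interval.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Leaf:
    lo: float                        # left edge of the leaf interval (inclusive)
    hi: float                        # right edge of the leaf interval (exclusive)
    weight: float                    # fraction of training points inside the leaf
    shape: Callable[[float], float]  # within-leaf density on [0, 1), integrating to 1


def uniform_shape(u: float) -> float:
    """Constant within-leaf density (a plain variable-width histogram bin)."""
    return 1.0


def ramp_shape(u: float) -> float:
    """A simple nonuniform within-leaf density; integrates to 1 on [0, 1)."""
    return 2.0 * u


def fit_leaves(data: Sequence[float], cuts: Sequence[float], lo: float, hi: float,
               shape: Callable[[float], float] = uniform_shape) -> List[Leaf]:
    """Assumes every data point lies in [lo, hi); cuts are hand-picked here."""
    edges = [lo] + sorted(cuts) + [hi]
    n = len(data)
    leaves = []
    for a, b in zip(edges[:-1], edges[1:]):
        count = sum(1 for x in data if a <= x < b)
        leaves.append(Leaf(a, b, count / n, shape))
    return leaves


def density(leaves: Sequence[Leaf], x: float) -> float:
    """Leaf weight times the within-leaf shape, rescaled to the leaf's width."""
    for leaf in leaves:
        if leaf.lo <= x < leaf.hi:
            width = leaf.hi - leaf.lo
            u = (x - leaf.lo) / width
            return leaf.weight * leaf.shape(u) / width
    return 0.0


data = [0.1, 0.2, 0.3, 0.9]
flat = fit_leaves(data, cuts=[0.5], lo=0.0, hi=1.0)
ramp = fit_leaves(data, cuts=[0.5], lo=0.0, hi=1.0, shape=ramp_shape)
```

With either shape function the leaf weights are the same; only the way probability mass is spread inside each leaf changes, which is the distinction the abstract draws between constant and nonuniform leaf densities.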

### Citations

8198 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...nterpolating between $2^d$ densities associated with the corners of the leaf's bounding hyperbox. In both cases, each distribution to fit is expressed as a mixture model and then fit with the EM algorithm (Dempster et al., 1977) to maximize the log-likelihood of the training data. Because the distribution of each mixture component is fixed and only the prior probabilities of the mixture components are adjusted, EM can be perf...

234 | Learning Bayesian Networks with Local Structure
- Friedman, Goldszmidt
- 1996
Citation Context ... with continuous distributions in a final step ((Monti and Cooper, 1998b), (Monti and Cooper, 1999)); or, a simultaneous search of both network structures and discretization policies can be performed ((Friedman and Goldszmidt, 1996a), (Monti and Cooper, 1998a)). In this previous research, however, the discretization of each variable has been global - that is, the same discretization for any particular variable is employed for a...

182 | Learning Bayesian network structure from massive datasets: the “Sparse Candidate” algorithm
- Friedman, Nachman, et al.
- 1999

65 | Nonuniform dynamic discretization in hybrid networks
- Kozlov, Koller
- 1997
Citation Context ...ation algorithm for density trees with constant-density leaves has been used in previous work by Kozlov and Koller on message-passing algorithms for inference in continuous-variable graphical models (Kozlov and Koller, 1997). Once this tree is computed, we can compute the conditional distribution simply as $P(X_i \mid \vec{\Pi}_i) = P(X_i, \vec{\Pi}_i) / P(\vec{\Pi}_i)$, where computing the numerator and computing the denominator each ...

25 | Bayesian classification (AutoClass): Theory and results
- Cheeseman, Stutz
- 1996

21 | Discovering Structure in Continuous Variables Using Bayesian Networks
- Hofmann, Tresp
- 1995
Citation Context ...cently investigated the use of complex continuous distributions within Bayesian networks; for example, weighted sums of Gaussians (Driver and Morrell, 1995), Gaussian kernel-based density estimators (Hofmann and Tresp, 1995), and Gaussian processes (Friedman and Nachman, 2000) have been used to approximate conditional probability density functions. Such complex distributions over continuous variables are usually quite c...

18 | A Multivariate Discretization Method for Learning Bayesian Networks from Mixed Data
- Monti, Cooper
- 1998
Citation Context ...red equivalent. This discretization can performed once before network structure-learning, and the resulting network structure can then be reparameterized with continuous distributions in a final step ((Monti and Cooper, 1998b), (Monti and Cooper, 1999)); or, a simultaneous search of both network structures and discretization policies can be performed ((Friedman and Goldszmidt, 1996a), (Monti and Cooper, 1998a)). In this ...

11 | Learning Hybrid Bayesian Networks from Data
- Monti, Cooper
- 1998
Citation Context ...red equivalent. This discretization can performed once before network structure-learning, and the resulting network structure can then be reparameterized with continuous distributions in a final step ((Monti and Cooper, 1998b), (Monti and Cooper, 1999)); or, a simultaneous search of both network structures and discretization policies can be performed ((Friedman and Goldszmidt, 1996a), (Monti and Cooper, 1998a)). In this ...

11 | A Latent Variable Model for Multivariate Discretization
- Monti, Cooper
- 1999
Citation Context ...tization can performed once before network structure-learning, and the resulting network structure can then be reparameterized with continuous distributions in a final step ((Monti and Cooper, 1998b), (Monti and Cooper, 1999)); or, a simultaneous search of both network structures and discretization policies can be performed ((Friedman and Goldszmidt, 1996a), (Monti and Cooper, 1998a)). In this previous research, however,...

8 | Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid
- Kohavi
- 1996
Citation Context ...ree is used to estimate conditional distributions for the output variable, it is similar in form and function to a hybrid decision tree / Naive Bayesian classifier also developed in previous research (Kohavi, 1996). In the most general case when the tree has an arbitrary branch structure (and the variables are not necessarily discrete), the algorithm for computing conditional distributions essentially creates ...

7 | Mix-nets: Factored Mixtures of Gaussians in Bayesian Networks with Mixed Continuous and Discrete Variables
- Davies, Moore
- 2000
Citation Context ...cientific datasets, the task is to model the joint distribution over all the variables using a Bayesian network with a fixed structure. (These structures had been learned automatically in previous work (Davies and Moore, 2000)). The results show that stratified conditional density trees model the distributions much more accu... (table header: Algorithm; CART-like, Indep. Gauss.; CART-like, Lin. Reg.; Stratified, Indep. Gauss.; Stratified, Lin. R...)

4 | Implementation of Continuous Bayesian Networks Using Sums of Weighted Gaussians
- Driver, Morrell
- 1995
Citation Context ...ussians (e.g. (Heckerman and Geiger, 1995)). Some researchers have recently investigated the use of complex continuous distributions within Bayesian networks; for example, weighted sums of Gaussians (Driver and Morrell, 1995), Gaussian kernel-based density estimators (Hofmann and Tresp, 1995), and Gaussian processes (Friedman and Nachman, 2000) have been used to approximate conditional probability density functions. Such...

4 | Embedded Bayesian network classifiers
- Heckerman, Meek
- 1997
Citation Context ...-level density stump with a root node branching on the variable to be predicted. Such Naive Bayes classifiers have previously been used to model the conditional distributions within Bayesian networks (Heckerman and Meek, 1997). A commonly used Bayesian classifier for continuous variables is to model each class distribution with a Gaussian; this classifier is obtained simply with a density stump branching on the class variab...

3 | Fast Factored Density Estimation and Compression with Bayesian Networks
- Davies
Citation Context ...btree. A separate random holdout set of the training data is then used to prune the learned decision tree. Many variations of this learning algorithm are considered in the full version of this paper (Davies, 2002). Regression trees may be adequate for representing continuous conditional distributions in situations where they are in fact near-Gaussian, or when the problem involves guessing a point estimate and...