## Mix-nets: Factored Mixtures of Gaussians in Bayesian Networks with Mixed Continuous And Discrete Variables (2000)

Citations: 7 (2 self)

### BibTeX

```bibtex
@MISC{Davies00mix-nets:factored,
  author = {Scott Davies and Andrew Moore},
  title = {Mix-nets: Factored Mixtures of Gaussians in Bayesian Networks with Mixed Continuous And Discrete Variables},
  year = {2000}
}
```

### Abstract

Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in low-dimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kd-trees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which low-dimensional mixtures of Gaussians over different subsets of the domain’s variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modeling complex dependencies between discrete variables and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrating how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as possible applications.

### Citations

8090 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977

Citation Context: ...s in many situations, but we have yet to verify this experimentally. 2.1.1 Learning Gaussian Mixtures From Data: The EM algorithm is a popular method for learning mixture models from data (see, e.g., (Dempster et al., 1977)). The algorithm is iterative, with two steps per iteration. The Expectation or “E” step calculates the distribution over the unobserved mixture component variables, using the current est...
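The E/M iteration described in this context can be sketched for a one-dimensional Gaussian mixture as follows. This is an illustrative sketch only, not the paper's accelerated kd-tree implementation; the function and variable names are invented, and initialization is a simple deterministic spread over the data range.

```python
import math

def em_gmm_1d(data, k, iters=100):
    """Plain EM for a 1-D mixture of k Gaussians (illustrative sketch only)."""
    srt = sorted(data)
    # Deterministic init: means spread across the data range.
    means = [srt[i * (len(srt) - 1) // max(1, k - 1)] for i in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p)
            resp.append([pi / s for pi in p])
        # M step: re-estimate weights, means, variances from the soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(1e-6, sum(r[j] * (x - means[j]) ** 2
                                         for r, x in zip(resp, data)) / nj)
    return weights, means, variances
```

The variance floor (1e-6) guards against the well-known collapse of a component onto a single datapoint; the paper's accelerated variant (Moore, 1999) replaces the per-point E step with kd-tree summaries.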

7052 | Probabilistic reasoning in intelligent systems: networks of plausible inference - Pearl - 1988

3921 | Pattern Classification and Scene Analysis - Duda, Hart - 1973

Citation Context: ...ous values. What sorts of models might we want to return? One powerful type of model for representing probability density functions over small sets of variables is a Gaussian mixture model (see e.g. (Duda and Hart, 1973)). Let represent the values that the datapoint in the dataset assigns to a variable set of interest . In a Gaussian mixture model over , we assume that the data are generated independently through th...

2307 | Estimating the dimension of a model - Schwarz - 1978

1075 | A Bayesian Method for the Induction of Probabilistic Networks from Data - Cooper, Herskovits - 1992

Citation Context: ...developed and experimented with variations of one particular network structure-learning algorithm. There is a wide variety of structure-learning algorithms for discrete Bayesian networks (see, e.g., (Cooper and Herskovits, 1992; Lam and Bacchus, 1994; Heckerman et al., 1995; Friedman et al., 1999)), many of which could be employed when learning mix-nets. The quicker and dirtier of these algorithms might be applicable direct...

903 | Learning Bayesian networks: The combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995

Citation Context: ...pular heuristic approach to finding networks that model discrete data well is to hillclimb over network structures, using a scoring function such as the BIC as the criterion to maximize. (See, e.g., (Heckerman et al., 1995).) Unfortunately, hillclimbing usually requires scoring a very large number of networks. While our algorithm for learning Gaussian mixtures from data is comparatively fast for the complex task it per...
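The BIC criterion mentioned in this context has a standard closed form. The sketch below writes it down directly; the helper names are ours, and the parameter count assumes full-covariance Gaussian mixtures, which may differ from the parameterization the paper scores.

```python
import math

def bic_score(log_likelihood, num_params, num_points):
    """BIC in its score-to-maximize form: log L - (k/2) * log n."""
    return log_likelihood - 0.5 * num_params * math.log(num_points)

def gmm_param_count(k, d):
    """Free parameters in a mixture of k full-covariance Gaussians over d
    variables: k-1 mixing weights, k*d means, k symmetric d-by-d covariances."""
    return (k - 1) + k * d + k * (d * (d + 1) // 2)
```

Under this score, adding components or variables only pays off when the gain in log-likelihood exceeds the (k/2) log n penalty, which is what makes hillclimbing with BIC resist overfitting.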

637 | Approximating discrete probability distributions with dependence trees - Chow, Liu - 1968

Citation Context: ...frequently used class of algorithms involves measuring all pairwise interactions between the variables, and then constructing a network that models the strongest of these pairwise interactions (e.g. (Chow and Liu, 1968; Sahami, 1996; Friedman et al., 1999)). We use such an algorithm in this paper to automatically learn the structures of our Bayesian networks. In order to measure the pairwise interactions between th...
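For discrete data, the Chow-Liu flavor of this idea (score every pair of variables by empirical mutual information, then keep the strongest edges that form a spanning tree) can be sketched as below. This is a minimal illustration with invented names; the paper's own structure search differs in its details and scoring.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information between two discrete variable columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def chow_liu_tree(columns):
    """Maximum-weight spanning tree over variables, edges weighted by pairwise
    mutual information (the Chow-Liu idea); Kruskal's algorithm + union-find."""
    d = len(columns)
    edges = sorted(((mutual_information(columns[i], columns[j]), i, j)
                    for i in range(d) for j in range(i + 1, d)), reverse=True)
    parent = list(range(d))  # union-find forest, one root per variable
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:           # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

Chow and Liu showed that this tree maximizes likelihood among all tree-structured models of the joint, which is why pairwise scores alone suffice in that restricted class.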

589 | Bayesian network classifiers - Friedman, Geiger, et al. - 1997

Citation Context: ...s also easy to modify our greedy network-learning algorithm to learn networks for classification tasks; the resulting networks would be similar in structure to those generated by previous algorithms (Friedman et al., 1997, 1998; Sahami, 1996), but with more flexible parameterizations. While it is possible to perform exact inference in some kinds of networks modeling continuous values (e.g. (Driver and Morrell, 1995; A...

312 | Estimating continuous distributions in Bayesian classifiers - John, Langley - 1995

272 | Introduction to Data Compression - Sayood - 2000

256 | Graphical Models for Machine Learning and Digital Communication - Frey - 1998

234 | Learning Bayesian networks with local structure - Friedman, Goldszmidt - 1998

188 | Learning Bayesian Belief Networks: An Approach Based on the MDL Principle - Lam, Bacchus - 1994

Citation Context: ...th variations of one particular network structure-learning algorithm. There is a wide variety of structure-learning algorithms for discrete Bayesian networks (see, e.g., (Cooper and Herskovits, 1992; Lam and Bacchus, 1994; Heckerman et al., 1995; Friedman et al., 1999)), many of which could be employed when learning mix-nets. The quicker and dirtier of these algorithms might be applicable directly to learning mix-net ...

180 | Learning of Bayesian network structure from massive datasets: The “sparse candidate” algorithm - Friedman, Nachman, et al. - 1999

Citation Context: ...s involves measuring all pairwise interactions between the variables, and then constructing a network that models the strongest of these pairwise interactions (e.g. (Chow and Liu, 1968; Sahami, 1996; Friedman et al., 1999)). We use such an algorithm in this paper to automatically learn the structures of our Bayesian networks. In order to measure the pairwise interactions between the variables, we start with an empty B...

155 | Learning Bayesian networks is NP-complete - Chickering - 1996

Citation Context: ...best Bayesian network structure on our own, or at least find a “good” network structure? In general, finding the optimal Bayesian network structure with which to model a given dataset is NP-complete (Chickering, 1996), even when all the data is discrete and there are no missing values or hidden variables. A popular heuristic approach to finding networks that model discrete data well is to hillclimb over network s...


109 | Learning limited dependence Bayesian classifiers - Sahami - 1996

Citation Context: ...s of algorithms involves measuring all pairwise interactions between the variables, and then constructing a network that models the strongest of these pairwise interactions (e.g. (Chow and Liu, 1968; Sahami, 1996; Friedman et al., 1999)). We use such an algorithm in this paper to automatically learn the structures of our Bayesian networks. In order to measure the pairwise interactions between the variables, w...

89 | Very fast EM-based mixture model clustering using multiresolution kd-trees - Moore - 1999

Citation Context: ...nctions from data in low-dimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kd-trees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which low-dimensional mixtures of Gaussians over different subsets of the domain’s variables are combined into a coherent joint probability m...

81 | Efficient Locally Weighted Polynomial Regression - Moore, Schneider, Deng - 1997

Citation Context: ...tely, each iteration of the basic algorithm described above is slow, since it requires an entire pass through the dataset. Instead, we use an accelerated EM algorithm in which multiresolution kd-trees (Moore et al., 1997) are used to dramatically reduce the computational cost of each iteration (Moore, 1999). We refer the interested reader to this previous paper (Moore, 1999) for details. An important remaining issue ...

75 | The anchors hierarchy: Using the triangle inequality to survive high dimensional data - Moore - 2000

Citation Context: ...starts with a small number of Gaussians and stochastically tries adding or deleting Gaussians as the EM algorithm progresses. The details of this algorithm are described in a separate paper (Sand and Moore, 2000). 2.2 HANDLING DISCRETE VARIABLES: Suppose now that a set of variables we wish to model includes discrete variables as well as continuous variables. Let be the discrete variables in , and the continuo...

74 | A General Algorithm for Approximate Inference and Its Application to Hybrid Bayes Nets - Koller, Lerner, et al. - 1999

Citation Context: ...cision trees over the discrete variables rather than full lookup tables, a technique previously explored for Bayesian networks over discrete domains (Friedman and Goldszmidt, 1996). In previous work (Koller et al., 1999) employing representations closely related to those employed in this paper, a combination of these two approaches has been explored briefly in order to represent potentials in junction trees; fur...

64 | Nonuniform dynamic discretization in hybrid networks - Kozlov, Koller - 1997

Citation Context: ...only a model of its variable’s conditional distribution given its parents rather than a joint distribution. Note that there are better ways of discretizing real variables in Bayesian networks (e.g. (Kozlov and Koller, 1997; Monti and Cooper, 1998a)); the simple discretization algorithm discussed here and currently implemented for our experiments is certainly not state-of-the-art. 4.2 DATASETS AND RESULTS: We tested the ...
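As a baseline for what a "simple discretization algorithm" can look like, here is equal-width binning. The context does not specify the paper's actual scheme, so this is purely an assumed stand-in with an invented function name, included to contrast with the nonuniform methods cited above.

```python
def equal_width_bins(values, num_bins):
    """Naive equal-width discretization of a continuous variable.
    An assumed baseline, not necessarily the scheme used in the paper."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0   # guard against constant columns
    return [min(num_bins - 1, int((v - lo) / width)) for v in values]
```

Nonuniform schemes such as Kozlov and Koller (1997) instead place bin boundaries where the density changes, which is why the context calls equal-width binning far from state-of-the-art.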

52 | A view of the EM algorithm that justifies incremental, sparse, and other variants - Neal, Hinton - 1998

40 | Scaling EM (Expectation-Maximization) clustering to large databases - Bradley, Fayyad, et al. - 1998

Citation Context: ...within mix-nets as we do for the “competing” Single-Gaussian algorithm described in Section 4.1. Other methods for accelerating EM have also been developed in the past (e.g., (Neal and Hinton, 1998; Bradley et al., 1998)), some of which might be used in our Bayesian network-learning algorithm instead of or in addition to the accelerated EM algorithm employed in this paper. Our current method of handling discrete var...

28 | Bayesian Network Classification with Continuous Attributes: Getting the Best - Friedman, Goldszmidt, et al. - 1998


24 | Bayesian classification (AutoClass): Theory and results - Cheeseman, Stutz - 1995

23 | Models and selection criteria for regression and classification - Heckerman, Meek - 1997

21 | Discovering structure in continuous variables using Bayesian networks - Hofmann, Tresp - 1996

Citation Context: ...d Meek, 1997b).) Joint distributions formed by convolving a Gaussian kernel function with each of the datapoints have also been conditionalized for use in Bayesian networks over continuous variables (Hofmann and Tresp, 1995). 2.1 HANDLING CONTINUOUS VARIABLES: Suppose for the moment that contains only continuous values. What sorts of models might we want to return? One powerful type of model for representing probability ...


17 | A Multivariate Discretization Method for Learning Bayesian Networks from Mixed Data - Monti, Cooper - 1998

Citation Context: ...able’s conditional distribution given its parents rather than a joint distribution. Note that there are better ways of discretizing real variables in Bayesian networks (e.g. (Kozlov and Koller, 1997; Monti and Cooper, 1998a)); the simple discretization algorithm discussed here and currently implemented for our experiments is certainly not state-of-the-art. 4.2 DATASETS AND RESULTS: We tested the previously described alg...

11 | Learning hybrid Bayesian networks from data - Monti, Cooper - 1999

8 | Embedded Bayesian network classifiers - Heckerman, Meek - 1997

Citation Context: ...al distribution of each variable in the network given its parents can be modeled by conditionalizing another “embedded” Bayesian network that specifies the joint between the variable and its parents (Heckerman and Meek, 1997a). (Some theoretical issues concerning the interdependence of parameters in such models appear in (Heckerman and Meek, 1997a) and (Heckerman and Meek, 1997b).) Joint distributions formed by convolvin...

7 | Bayesian Networks for Lossless Dataset Compression - Davies, Moore - 1999

Citation Context: ...network structures restricted to those candidate parents is performed greedily. (We have also previously used a very similar algorithm for learning networks with which to compress discrete datasets (Davies and Moore, 1999).) 4 EXPERIMENTS, 4.1 ALGORITHMS: We compare the performance of the network-learning algorithm described above to the performance of four other algorithms, each of which is designed to be similar to ou...

5 | Inference Using Message Propagation and Topology Transformation in Vector Gaussian Continuous Networks - Alag - 1996

Citation Context: ...7, 1998; Sahami, 1996), but with more flexible parameterizations. While it is possible to perform exact inference in some kinds of networks modeling continuous values (e.g. (Driver and Morrell, 1995; Alag, 1996)), general-purpose exact inference in arbitrarily-structured mix-nets with continuous variables may not be possible. However, inference in these networks can be performed via stochastic sampling meth...

4 | Implementation of Continuous Bayesian Networks Using Sums of Weighted Gaussians - Driver, Morrell - 1995

Citation Context: ...tly investigated the use of complex continuous distributions within Bayesian networks; for example, weighted sums of Gaussians have been used to approximate conditional probability density functions (Driver and Morrell, 1995). Such complex distributions over continuous variables are usually quite computationally expensive to learn. This expense may not be too problematic if an appropriate Bayesian network structure is kn...

4 | The Generalized CEM Algorithm - Jebara, Pentland - 1999

Citation Context: ...ities, some of their representational power may be used inefficiently. The EM algorithm has recently been generalized to learn joint distributions specifically optimized for being used conditionally (Jebara and Pentland, 1999). If this modified EM algorithm can be accelerated in a manner similar to our current accelerated EM algorithm, it may result in significantly more accurate networks. Finally, further comparisons wit...


2 | Fast Structure Search for Gaussian Mixture Models (submitted to Knowledge Discovery and Data Mining 2000) - Moore - 2000
