## Dependency networks for inference, collaborative filtering, and data visualization (2000)

### Download Links

- [research.microsoft.com]
- [www.ai.mit.edu]
- [www.cs.brown.edu]
- [jmlr.csail.mit.edu]
- [www.ai.mit.edu]
- [jamesthornton.com]
- [jmlr.org]
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Machine Learning Research

Citations: 161 (10 self)

### BibTeX

```bibtex
@ARTICLE{Heckerman00dependencynetworks,
  author  = {David Heckerman and David Maxwell Chickering and Christopher Meek and Robert Rounthwaite and Carl Kadie},
  title   = {Dependency networks for inference, collaborative filtering, and data visualization},
  journal = {Journal of Machine Learning Research},
  year    = {2000},
  volume  = {1},
  pages   = {49--75}
}
```

### Abstract

We describe a graphical model for probabilistic relationships, called a dependency network, that is an alternative to the Bayesian network. The graph of a dependency network, unlike that of a Bayesian network, is potentially cyclic. The probability component of a dependency network, like that of a Bayesian network, is a set of conditional distributions, one for each node given its parents. We identify several basic properties of this representation and describe a computationally efficient procedure for learning the graph and probability components from data. We describe the application of this representation to probabilistic inference, collaborative filtering (the task of predicting preferences), and the visualization of acausal predictive relationships.
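The structure the abstract describes — a potentially cyclic graph plus one conditional distribution per node given its parents — can be sketched concretely. The following is a minimal illustration with toy probabilities invented for the example (nothing here comes from the paper):

```python
import random

# Toy dependency network over two binary variables whose graph is cyclic:
# X1's parent is X2 AND X2's parent is X1, a cycle a Bayesian network forbids.
# The probability component is one conditional distribution per node.
network = {
    "X1": {"parents": ["X2"], "p1_given": {(0,): 0.2, (1,): 0.7}},  # p(X1=1 | X2)
    "X2": {"parents": ["X1"], "p1_given": {(0,): 0.4, (1,): 0.9}},  # p(X2=1 | X1)
}

def resample(state, name, rng):
    """Draw a fresh value for `name` from its local conditional given the rest."""
    node = network[name]
    key = tuple(state[p] for p in node["parents"])
    return 1 if rng.random() < node["p1_given"][key] else 0

rng = random.Random(0)
state = {"X1": 0, "X2": 1}
state["X1"] = resample(state, "X1", rng)   # draws from p(X1 = 1 | X2 = 1) = 0.7
```

Each local model here is a lookup table, but nothing in the representation requires that; any model that outputs a conditional distribution over the node given its parents would fit the same interface.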

### Citations

7440 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988 |

5290 | Neural networks for pattern recognition - Bishop - 1995 |

Citation Context: ...ere to consider continuous variables) such as methods using a probabilistic decision tree (e.g., Buntine, 1991), a generalized linear model (e.g., McCullagh and Nelder, 1989), a neural network (e.g., Bishop, 1995), a probabilistic support-vector machine (e.g., Platt, 1999), or an embedded regression/classification model (Heckerman and Meek, 1997). This observation suggests a simple, heuristic approach for le...
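The context above sketches the paper's idea of plugging any probabilistic classifier into each local distribution. As a hedged illustration — with synthetic data, made-up coefficients, and plain logistic regression standing in for the classifiers named in the excerpt — fitting one local distribution might look like:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic data: X3 depends on candidate parents X1, X2 through a logistic
# model with invented coefficients (w1, w2, bias) -- purely illustrative.
rng = random.Random(1)
truth = (1.5, -1.0, 0.2)
data = []
for _ in range(4000):
    x1, x2 = rng.randint(0, 1), rng.randint(0, 1)
    p = sigmoid(truth[0] * x1 + truth[1] * x2 + truth[2])
    data.append((x1, x2, 1 if rng.random() < p else 0))

# Fit the local distribution p(X3 = 1 | x1, x2) by full-batch gradient ascent
# on the average log-likelihood (logistic regression is just one choice).
w1 = w2 = b = 0.0
lr = 0.5
for _ in range(500):
    g1 = g2 = gb = 0.0
    for x1, x2, x3 in data:
        err = x3 - sigmoid(w1 * x1 + w2 * x2 + b)
        g1 += err * x1
        g2 += err * x2
        gb += err
    n = len(data)
    w1 += lr * g1 / n
    w2 += lr * g2 / n
    b += lr * gb / n

def p_hat(x1, x2):
    """Learned local distribution p(X3 = 1 | x1, x2)."""
    return sigmoid(w1 * x1 + w2 * x2 + b)
```

Repeating this fit once per variable, with the other variables as candidate inputs, yields the full set of local distributions; the parents of each node are whichever inputs the fitted model actually uses.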

4018 | Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images - Geman, Geman - 1984 |

Citation Context: ...for probabilistic inference in the latter representation---for example, the junction tree algorithm of Jensen, Lauritzen, and Olesen (1990). Alternatively, we can use Gibbs sampling (e.g., Geman and Geman, 1984; Neal, 1993; Besag, Green, Higdon, and Mengersen, 1995; Gilks, Richardson, and Spiegelhalter, 1996), which we examine in some detail. First, let us consider the use of Gibbs sampling for recovering t...

2106 | An Introduction to Probability Theory and Its Applications - Feller - 1968 |

Citation Context: ...bbs sampler. The sequence x^1, x^2, ... can be viewed as samples drawn from a homogeneous Markov chain with transition matrix M having elements M_{j|i} = p(x^{t+1} = j | x^t = i). (We use the terminology of Feller, 1957.) It is not difficult to see that M is the product M_1 · · · M_n, where M_k is the "local" transition matrix describing the resampling of X_k according to the local distribution p(x_k | pa_k). The positivit...
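The decomposition M = M_1 · · · M_n quoted above can be checked numerically on a toy two-variable dependency network (probabilities invented for the example): build each local transition matrix, multiply them to get one ordered Gibbs sweep, and read off the stationary distribution by power iteration.

```python
# States (x1, x2) indexed as 2*x1 + x2; M[i][j] = p(next state j | current state i).
P_X1 = {0: 0.2, 1: 0.7}   # p(X1 = 1 | X2 = x2) -- toy numbers, not from the paper
P_X2 = {0: 0.4, 1: 0.9}   # p(X2 = 1 | X1 = x1)

def idx(x1, x2):
    return 2 * x1 + x2

def local_matrix(which):
    """Transition matrix M_k for resampling one variable from its local conditional."""
    M = [[0.0] * 4 for _ in range(4)]
    for x1 in (0, 1):
        for x2 in (0, 1):
            i = idx(x1, x2)
            if which == "X1":
                p1 = P_X1[x2]
                M[i][idx(1, x2)] += p1        # new X1 = 1, X2 unchanged
                M[i][idx(0, x2)] += 1 - p1
            else:
                p1 = P_X2[x1]
                M[i][idx(x1, 1)] += p1        # new X2 = 1, X1 unchanged
                M[i][idx(x1, 0)] += 1 - p1
    return M

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

M = matmul(local_matrix("X1"), local_matrix("X2"))  # one ordered Gibbs sweep

pi = [0.25] * 4
for _ in range(200):                      # power iteration: pi <- pi M
    pi = [sum(pi[i] * M[i][j] for i in range(4)) for j in range(4)]
```

Each M_k is row-stochastic, so their product M is too, and the power iteration converges to the stationary distribution the ordered Gibbs sampler draws from.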

1239 | Spatial Interaction and the Statistical Analysis of Lattice Systems (with Discussion) - Besag - 1974 |

Citation Context: ...lities. We have found the latter to be significantly easier to interpret. The proof of Theorem 1 appears in the Appendix, but it is essentially a restatement of the Hammersley-Clifford theorem (e.g., Besag, 1974). This correspondence is no coincidence. As is discussed in Besag (1974), several researchers who developed the Markov-network representation did so by initially investigating a graphical representat...

1237 | GroupLens: An open architecture for collaborative filtering of netnews - Resnick, Iacovou, et al. - 1994 |

1158 | Empirical analysis of predictive algorithms for collaborative filtering - Breese, Heckerman, et al. - 1998 |

1155 | Graphical Models - Lauritzen - 1996 |

Citation Context: ...functions, one for each of the c maximal cliques in U, such that the joint distribution has the form p(x) = ∏_{i=1}^{c} φ_i(x_i) (2), where X_i are the variables in clique i, i = 1, ..., c (e.g., see Lauritzen, 1996). The following theorem shows that consistent dependency networks and Markov networks have the same representational power. Theorem 1: The set of positive distributions that can be encoded by a consi...
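The clique factorization p(x) = ∏ φ_i(x_i) in the excerpt can be made concrete on a three-node chain Markov network X1 — X2 — X3, whose maximal cliques are {X1, X2} and {X2, X3} (potential values invented for the example):

```python
from itertools import product

# Toy Markov network X1 -- X2 -- X3. The joint is the normalized product of
# one positive potential per maximal clique:  p(x) = (1/Z) phi1(x1,x2) phi2(x2,x3).
phi1 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi2 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 1.0}

unnorm = {x: phi1[x[:2]] * phi2[x[1:]] for x in product((0, 1), repeat=3)}
Z = sum(unnorm.values())
p = {x: v / Z for x, v in unnorm.items()}   # keys are (x1, x2, x3)

def cond(x1, x3, x2):
    """p(x1, x3 | x2), computed directly from the joint."""
    num = p[(x1, x2, x3)]
    den = sum(p[(a, x2, b)] for a in (0, 1) for b in (0, 1))
    return num / den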

1103 | Fast training of support vector machines using sequential minimal optimization - Platt - 1999 |

Citation Context: ...probabilistic decision tree (e.g., Buntine, 1991), a generalized linear model (e.g., McCullagh and Nelder, 1989), a neural network (e.g., Bishop, 1995), a probabilistic support-vector machine (e.g., Platt, 1999), or an embedded regression/classification model (Heckerman and Meek, 1997). This observation suggests a simple, heuristic approach for learning the structure and probabilities of a dependency netwo...

626 | Markov chain Monte Carlo in practice - Gilks, Richardson, et al. - 1996 |

590 | Probabilistic inference using Markov chain Monte Carlo methods - Neal - 1993 |

Citation Context: ...istic inference in the latter representation---for example, the junction tree algorithm of Jensen, Lauritzen, and Olesen (1990). Alternatively, we can use Gibbs sampling (e.g., Geman and Geman, 1984; Neal, 1993; Besag, Green, Higdon, and Mengersen, 1995; Gilks, Richardson, and Spiegelhalter, 1996), which we examine in some detail. First, let us consider the use of Gibbs sampling for recovering the joint dis...

285 | Statistical analysis of non-lattice data - Besag - 1975 |

244 | Learning Bayesian networks with local structure - Friedman, Goldszmidt - 1996 |

239 | Generalized Linear Models, second edition - McCullagh, Nelder - 1989 |

Citation Context: ...cation techniques (or regression techniques, if we were to consider continuous variables) such as methods using a probabilistic decision tree (e.g., Buntine, 1991), a generalized linear model (e.g., McCullagh and Nelder, 1989), a neural network (e.g., Bishop, 1995), a probabilistic support-vector machine (e.g., Platt, 1999), or an embedded regression/classification model (Heckerman and Meek, 1997). This observation suggest...

198 | Theory of Refinement on Bayesian Networks - Buntine - 1991 |

Citation Context: ...e estimated by any number of probabilistic classification techniques (or regression techniques, if we were to consider continuous variables) such as methods using a probabilistic decision tree (e.g., Buntine, 1991), a generalized linear model (e.g., McCullagh and Nelder, 1989), a neural network (e.g., Bishop, 1995), a probabilistic support-vector machine (e.g., Platt, 1999), or an embedded regression/classifi...

180 | A Bayesian approach to learning Bayesian networks with local structure - Chickering, Heckerman, et al. - 1997 |

153 | Bayesian updating in recursive graphical models by local computation - Jensen, Lauritzen, et al. - 1990 |

144 | Independence properties of directed Markov fields - Lauritzen, Dawid, et al. - 1990 |

123 | Bayesian computation and stochastic systems - Besag, Green, et al. - 1995 |

54 | An Introduction to Stochastic Processes - Bartlett - 1966 |

26 | Nonlinear Markov networks for continuous variables - Hofmann, Tresp - 1997 |

24 | Does the wake-sleep algorithm produce good density estimators - Frey, Hinton, et al. - 1995 |

24 | Models and selection criteria for regression and classification - Heckerman, Meek - 1997 |

20 | On the distinction between the conditional probability and the joint probability approaches in the specification of nearest-neighbor systems - Brook - 1964 |

19 | Independence properties of directed Markov fields - Lauritzen, Dawid, et al. - 1990 |

Citation Context: ...independent of X \ X_i given Pa_i, i = 1, ..., n. Because p is positive and these independencies comprise the global Markov property of a Markov network with A_i = Pa_i, i = 1, ..., n, the Hammersley-Clifford theorem (e.g., Lauritzen, Dawid, Larsen, and Leimer, 1990) implies that p can be represented by this Markov network. □ Theorem 4: A minimal consistent dependency network for a positive distribution p(x) must be bi-directional. Proof: Suppose the theorem is...

18 | Social class, parental encouragement, and educational aspirations - Sewell, Shah - 1968 |

14 | Statistics - McClave, Sincich - 2000 |

12 | Evaluating logistic models for large contingency tables - Fowlkes, Freeny, et al. - 1988 |

5 | Chaînes doubles de Markoff et fonctions aléatoires de deux variables - Lévy - 1948 |

2 | Inference in Markov blanket networks - Hofmann - 2000 |

Citation Context: ...X_n, in this order, and resample each X_i according to p(x_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n) = p(x_i | pa_i). We call this procedure an ordered Gibbs sampler. As described by the following theorem (also proved in Hofmann, 2000), this ordered Gibbs sampler defines a joint distribution for X. Theorem 1: An ordered Gibbs sampler applied to a dependency network for X, where each X_i is discrete and each local distribution p(x_i | p...
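The ordered Gibbs sampler described in this excerpt — sweep through the variables in a fixed order, resampling each from its local distribution — can be sketched on a two-variable toy network (probabilities invented for the example); the empirical joint it produces converges to the sampler's stationary distribution:

```python
import random

# Toy local distributions (invented numbers): p(X1 = 1 | X2) and p(X2 = 1 | X1).
P_X1 = {0: 0.2, 1: 0.7}
P_X2 = {0: 0.4, 1: 0.9}

rng = random.Random(0)
x1, x2 = 0, 0
counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
sweeps = 20000
for _ in range(sweeps):
    # One ordered Gibbs sweep: resample X1, then X2, each from its local model.
    x1 = 1 if rng.random() < P_X1[x2] else 0
    x2 = 1 if rng.random() < P_X2[x1] else 0
    counts[(x1, x2)] += 1

joint = {s: c / sweeps for s, c in counts.items()}   # empirical joint over (X1, X2)
```

For these numbers the stationary joint puts roughly 0.48 of its mass on (1, 1), so even a short run of the sampler recovers a usable estimate of the joint distribution the dependency network defines.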

1 | Markov chain sensitivity by mean first passage times - Cho, Meyer - 1999 |

Citation Context: ...own in Equation 6 are particularly useful. Here, we cite a potentially useful bound of this form given by Cho and Meyer (1999). These authors provide references to many other bounds as well. Theorem (Cho and Meyer, 1999): Let P and P̃ = P + E be transition matrices for two homogeneous, irreducible k-state Markov chains with respective stationary distributions π and π̃. Let ‖E‖_∞ denote the infinity-norm of E, the maximum ...
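The theorem statement above is truncated before the bound itself, so rather than guess at it, here is only a numeric illustration of the phenomenon it quantifies — a small perturbation E of a transition matrix produces a correspondingly small change in the stationary distribution (matrices invented for the example):

```python
# Perturb a 3-state transition matrix P by E and compare stationary distributions.
P = [[0.9, 0.1, 0.0],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]
E = [[-0.01, 0.01, 0.0],
     [0.0, -0.01, 0.01],
     [0.01, 0.0, -0.01]]          # rows sum to 0, so P + E is still stochastic
P_tilde = [[P[i][j] + E[i][j] for j in range(3)] for i in range(3)]

def stationary(M, iters=2000):
    """Stationary distribution by power iteration: pi <- pi M."""
    pi = [1 / 3] * 3
    for _ in range(iters):
        pi = [sum(pi[i] * M[i][j] for i in range(3)) for j in range(3)]
    return pi

pi, pi_t = stationary(P), stationary(P_tilde)
E_inf = max(sum(abs(e) for e in row) for row in E)        # infinity-norm of E
diff_inf = max(abs(a - b) for a, b in zip(pi, pi_t))      # change in stationary dist.
```

Here ‖E‖_∞ = 0.02 and the stationary distributions differ by about 0.02 in each coordinate; bounds of the Cho-Meyer type relate these two quantities through properties of the chain such as mean first passage times.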
