## Dependency Networks for Inference, Collaborative Filtering, and Data Visualization (2000)

### Download Links

- [research.microsoft.com]
- [www.ai.mit.edu]
- [www.cs.brown.edu]
- [jmlr.csail.mit.edu]
- [jamesthornton.com]
- [jmlr.org]
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Machine Learning Research

Citations: 157 (10 self)

### BibTeX

```bibtex
@article{Heckerman00dependencynetworks,
  author  = {David Heckerman and David Maxwell Chickering and Christopher Meek and Robert Rounthwaite and Carl Kadie},
  title   = {Dependency Networks for Inference, Collaborative Filtering, and Data Visualization},
  journal = {Journal of Machine Learning Research},
  year    = {2000},
  volume  = {1},
  pages   = {49--75}
}
```

### Abstract

We describe a graphical representation of probabilistic relationships---an alternative to the Bayesian network---called a dependency network. Like a Bayesian network, a dependency network has a graph and a probability component. The graph component is a (cyclic) directed graph such that a node's parents render that node independent of all other nodes in the network. The probability component consists of the probability of a node given its parents for each node (as in a Bayesian network). We identify several basic properties of this representation, and describe its use in probabilistic inference, collaborative filtering (the task of predicting preferences), and the visualization of predictive relationships.

Keywords: dependency networks, graphical models, probabilistic inference, density estimation, data visualization, exploratory data analysis, collaborative filtering, Gibbs sampling

1 Introduction. The Bayesian network has proven to be a valuable tool for encoding, lear...
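The ordered Gibbs sampler central to the abstract (resample each node from its local distribution given its parents, in a fixed order) can be sketched on a toy two-variable dependency network. The variable names and probability tables below are illustrative assumptions, not from the paper.

```python
import random

# Toy dependency network over binary X1, X2 with the cyclic graph X1 <-> X2.
# Each local distribution p(xi | parents) is an illustrative table giving
# p(xi = 1 | parent value); these numbers are not from the paper.
P_X1_GIVEN_X2 = {0: 0.2, 1: 0.8}
P_X2_GIVEN_X1 = {0: 0.3, 1: 0.9}

def ordered_gibbs(n_samples, burn_in=1_000, seed=0):
    """Ordered Gibbs sampler: resample X1 then X2 in a fixed order each sweep,
    and return the empirical joint distribution over (X1, X2)."""
    rng = random.Random(seed)
    x1 = x2 = 0
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for t in range(burn_in + n_samples):
        x1 = 1 if rng.random() < P_X1_GIVEN_X2[x2] else 0
        x2 = 1 if rng.random() < P_X2_GIVEN_X1[x1] else 0
        if t >= burn_in:
            counts[(x1, x2)] += 1
    return {state: n / n_samples for state, n in counts.items()}

joint = ordered_gibbs(200_000)
```

Even though the graph is cyclic, this procedure defines a joint distribution for X (Theorem 1 in the citation contexts below); the empirical joint approximates the stationary distribution of the sweep's Markov chain.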

### Citations

7108 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988 |

4885 | Neural networks for pattern recognition - Bishop - 1995 |
Citation Context: ...ere to consider continuous variables) such as methods using a probabilistic decision tree (e.g., Buntine, 1991), a generalized linear model (e.g., McCullagh and Nelder, 1989), a neural network (e.g., Bishop, 1995), a probabilistic support-vector machine (e.g., Platt, 1999), or an embedded regression/classification model (Heckerman and Meek, 1997). This observation suggests a simple, heuristic approach for le... |

3767 | Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images - Geman, Geman - 1984 |
Citation Context: ... for probabilistic inference in the latter representation---for example, the junction tree algorithm of Jensen, Lauritzen, and Olesen (1990). Alternatively, we can use Gibbs sampling (e.g., Geman and Geman, 1984; Neal, 1993; Besag, Green, Higdon, and Mengersen, 1995; Gilks, Richardson, and Spiegelhalter, 1996), which we examine in some detail. First, let us consider the use of Gibbs sampling for recovering t... |

1892 | An Introduction to Probability Theory and Its Applications - Feller - 1971 |
Citation Context: ...bbs sampler. The sequence x^1, x^2, … can be viewed as samples drawn from a homogeneous Markov chain with transition matrix M having elements M_{j|i} = p(x^{t+1} = j | x^t = i). (We use the terminology of Feller, 1957.) It is not difficult to see that M is the product M_1 ⋯ M_n, where M_k is the "local" transition matrix describing the resampling of X_k according to the local distribution p(x_k | pa_k). The positivit... |
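The view quoted above (the full Gibbs sweep's transition matrix M is the product of "local" matrices M_k) can be checked numerically on a toy two-variable network. All tables below are illustrative assumptions, and the state encoding is hypothetical.

```python
# Two binary variables (x1, x2); states enumerated in a fixed order.
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]
P_X1 = {0: 0.2, 1: 0.8}   # illustrative p(x1 = 1 | x2)
P_X2 = {0: 0.3, 1: 0.9}   # illustrative p(x2 = 1 | x1)

def local_matrix(var):
    """M_k: resample variable `var` from its local distribution, keep the other."""
    M = [[0.0] * 4 for _ in range(4)]
    for i, (x1, x2) in enumerate(STATES):
        for j, (y1, y2) in enumerate(STATES):
            if var == 1 and y2 == x2:
                M[i][j] = P_X1[x2] if y1 == 1 else 1.0 - P_X1[x2]
            elif var == 2 and y1 == x1:
                M[i][j] = P_X2[x1] if y2 == 1 else 1.0 - P_X2[x1]
    return M

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# One full ordered-Gibbs sweep: M = M_1 * M_2, as in the quoted context.
M = matmul(local_matrix(1), local_matrix(2))

# Stationary distribution by power iteration: pi = pi M.
pi = [0.25] * 4
for _ in range(200):
    pi = [sum(pi[i] * M[i][j] for i in range(4)) for j in range(4)]
```

Because every entry of M is positive here, the chain is irreducible and the power iteration converges to the unique stationary distribution.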

1152 | Spatial interaction and the statistical analysis of lattice systems - Besag - 1974 |
Citation Context: ...lities. We have found the latter to be significantly easier to interpret. The proof of Theorem 1 appears in the Appendix, but it is essentially a restatement of the Hammersley-Clifford theorem (e.g., Besag, 1974). This correspondence is no coincidence. As is discussed in Besag (1974), several researchers who developed the Markov-network representation did so by initially investigating a graphical representat... |

1136 | GroupLens: An Open Architecture for Collaborative Filtering for Netnews - Resnick, Suchak, et al. - 1994 |

1105 | Graphical Models - Lauritzen - 1996 |
Citation Context: ... functions, one for each of the c maximal cliques in U, such that the joint distribution has the form p(x) = ∏_{i=1}^{c} φ_i(x_i) (Equation 2), where X_i are the variables in clique i, i = 1, …, c (e.g., see Lauritzen, 1996). The following theorem shows that consistent dependency networks and Markov networks have the same representational power. Theorem 1: The set of positive distributions that can be encoded by a consi... |
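The clique factorization quoted above, p(x) = ∏ φ_i(x_i) over maximal cliques, can be illustrated with a toy three-node Markov network A - B - C. The potential tables are arbitrary positive numbers chosen for illustration, not from the paper.

```python
from itertools import product

# Chain-structured Markov network over binary A - B - C; maximal cliques
# are {A, B} and {B, C}. Potential values are illustrative.
PHI_AB = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
PHI_BC = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 1.0}

# Unnormalized joint: the product of clique potentials, as in Equation 2.
unnorm = {(a, b, c): PHI_AB[(a, b)] * PHI_BC[(b, c)]
          for a, b, c in product((0, 1), repeat=3)}
Z = sum(unnorm.values())                      # partition function
joint = {x: v / Z for x, v in unnorm.items()}

def marg(keep):
    """Marginal distribution over the variables named in `keep` (subset of 'abc')."""
    out = {}
    for (a, b, c), p in joint.items():
        key = tuple(v for v, k in zip((a, b, c), "abc") if k in keep)
        out[key] = out.get(key, 0.0) + p
    return out
```

A quick consequence of the factorization: A and C are separated by B in the graph, so p(a, c | b) factors as p(a | b) p(c | b), which can be verified directly from the marginals.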

1042 | Empirical Analysis of Predictive Algorithms for Collaborative Filtering - Breese, Heckerman, et al. - 1998 |

1019 | Fast training of support vector machines using sequential minimal optimization - Platt - 1999 |
Citation Context: ... probabilistic decision tree (e.g., Buntine, 1991), a generalized linear model (e.g., McCullagh and Nelder, 1989), a neural network (e.g., Bishop, 1995), a probabilistic support-vector machine (e.g., Platt, 1999), or an embedded regression/classification model (Heckerman and Meek, 1997). This observation suggests a simple, heuristic approach for learning the structure and probabilities of a dependency netwo... |

569 | Probabilistic inference using Markov chain Monte Carlo methods - Neal - 1993 |
Citation Context: ...istic inference in the latter representation---for example, the junction tree algorithm of Jensen, Lauritzen, and Olesen (1990). Alternatively, we can use Gibbs sampling (e.g., Geman and Geman, 1984; Neal, 1993; Besag, Green, Higdon, and Mengersen, 1995; Gilks, Richardson, and Spiegelhalter, 1996), which we examine in some detail. First, let us consider the use of Gibbs sampling for recovering the joint dis... |

558 | Markov chain Monte Carlo in practice - Gilks, Richardson, et al. - 1996 |

273 | Statistical analysis of non-lattice data - Besag - 1975 |

234 | Learning Bayesian Networks with Local Structure - Friedman, Goldszmidt - 1996 |

208 | Generalized Linear Models, Second Edition - McCullagh, Nelder - 1989 |
Citation Context: ...cation techniques (or regression techniques, if we were to consider continuous variables) such as methods using a probabilistic decision tree (e.g., Buntine, 1991), a generalized linear model (e.g., McCullagh and Nelder, 1989), a neural network (e.g., Bishop, 1995), a probabilistic support-vector machine (e.g., Platt, 1999), or an embedded regression/classification model (Heckerman and Meek, 1997). This observation suggest... |

183 | Theory refinement on Bayesian networks - Buntine - 1991 |
Citation Context: ...e estimated by any number of probabilistic classification techniques (or regression techniques, if we were to consider continuous variables) such as methods using a probabilistic decision tree (e.g., Buntine, 1991), a generalized linear model (e.g., McCullagh and Nelder, 1989), a neural network (e.g., Bishop, 1995), a probabilistic support-vector machine (e.g., Platt, 1999), or an embedded regression/classifi... |

167 | A Bayesian Approach to Learning Bayesian Networks with Local Structure - Chickering, Heckerman, et al. - 1997 |

145 | Bayesian updating in recursive graphical models by local computations - Jensen, Lauritzen, et al. - 1990 |

141 | Independence properties of directed Markov fields. Networks - Lauritzen, Dawid, et al. - 1990 |

116 | Bayesian computation and stochastic systems - Besag, Green, et al. - 1995 |

44 | An Introduction to Stochastic Processes - Bartlett - 1978 |

26 | Nonlinear markov networks for continuous variables - Hofmann, Tresp - 1997 |

23 | Does the wake-sleep algorithm produce good density estimators - Frey, Hinton, et al. - 1996 |

23 | Models and selection criteria for regression and classification - Heckerman, Meek - 1997 |

19 | Independence properties of directed Markov fields - Lauritzen, Dawid, et al. - 1990 |
Citation Context: ...nt of X \ X_i given Pa_i, i = 1, …, n. Because p is positive and these independencies comprise the global Markov property of a Markov network with A_i = Pa_i, i = 1, …, n, the Hammersley-Clifford theorem (e.g., Lauritzen, Dawid, Larsen, and Leimer, 1990) implies that p can be represented by this Markov network. □ Theorem 4: A minimal consistent dependency network for a positive distribution p(x) must be bi-directional. Proof: Suppose the theorem is... |

17 | On the distinction between the conditional probability and the joint probability approaches in the specification of nearest-neighbor systems - Brook - 1964 |

16 | Social class, parental encouragement, and educational aspirations - Sewell, Shah - 1978 |

11 | Evaluating Logistic Models for Large Contingency Tables - Fowlkes, Freeny, et al. - 1989 |

11 | Statistics - McClave, Dietrich, et al. - 1997 |

5 | Chaînes doubles de Markoff et fonctions aléatoires de deux variables - Lévy - 1948 |

2 | Inference in Markov blanket networks - Hofmann - 2000 |
Citation Context: ...X_n, in this order, and resample each X_i according to p(x_i | x_1, …, x_{i-1}, x_{i+1}, …, x_n) = p(x_i | pa_i). We call this procedure an ordered Gibbs sampler. As described by the following theorem (also proved in Hofmann, 2000), this ordered Gibbs sampler defines a joint distribution for X. Theorem 1: An ordered Gibbs sampler applied to a dependency network for X, where each X_i is discrete and each local distribution p(x_i | p... |

1 | Markov chain sensitivity by mean first passage times - Cho, Meyer - 1999 |
Citation Context: ...own in Equation 6 are particularly useful. Here, we cite a potentially useful bound of this form given by Cho and Meyer (1999). These authors provide references to many other bounds as well. Theorem (Cho and Meyer, 1999): Let P and P̃ = P + E be transition matrices for two homogeneous, irreducible k-state Markov chains with respective stationary distributions π and π̃. Let ‖E‖_∞ denote the infinity-norm of E, the maximum ... |
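The objects in the Cho and Meyer statement quoted above (a transition matrix P, a perturbation E, and the stationary distributions π and π̃ of P and P̃ = P + E) can be computed for a small chain. The matrices below are illustrative assumptions; the bound itself is not restated here, only the quantities it relates.

```python
# A 3-state irreducible chain P and a perturbation E whose rows sum to zero,
# so that P_tilde = P + E is again a stochastic matrix. Values are illustrative.
K = 3
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
E = [[0.02, -0.01, -0.01],
     [-0.02, 0.01, 0.01],
     [0.0, 0.0, 0.0]]
P_tilde = [[P[i][j] + E[i][j] for j in range(K)] for i in range(K)]

def stationary(M, iters=500):
    """Power iteration for the stationary distribution pi = pi M."""
    pi = [1.0 / K] * K
    for _ in range(iters):
        pi = [sum(pi[i] * M[i][j] for i in range(K)) for j in range(K)]
    return pi

pi, pi_t = stationary(P), stationary(P_tilde)
norm_E = max(sum(abs(e) for e in row) for row in E)   # ||E||_inf
diff = max(abs(a - b) for a, b in zip(pi, pi_t))      # ||pi - pi~||_inf
```

For this well-mixed chain the shift in the stationary distribution is much smaller than ‖E‖_∞, which is the kind of sensitivity relationship such bounds quantify.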
