## Variational probabilistic inference and the QMR-DT database (1999)

Venue: | Journal of Artificial Intelligence Research |

Citations: | 16 - 3 self |

### BibTeX

@ARTICLE{Jaakkola99variationalprobabilistic,

author = {Tommi Jaakkola and Michael I. Jordan},

title = {Variational probabilistic inference and the QMR-DT database},

journal = {Journal of Artificial Intelligence Research},

year = {1999},

volume = {10},

pages = {291--322}

}

### Abstract

We describe a variational approximation method for efficient inference in large-scale probabilistic models. Variational methods are deterministic procedures that provide approximations to marginal and conditional probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We describe a variational approach to the problem of diagnostic inference in the "Quick Medical Reference" (QMR) database. The QMR database is a large-scale probabilistic graphical model built on statistical and expert knowledge. Exact probabilistic inference is infeasible in this model for all but a small set of cases. We evaluate our variational inference algorithm on a large set of diagnostic test cases, comparing the algorithm to a state-of-the-art stochastic sampling method.

1 Introduction Probabilistic models have become increasingly prevalent in AI in recent years. Beyond the significant representational advantages of probability theory, inclu...

### Citations

8526 | Maximum Likelihood from Incomplete Data via the EM Algorithm - Dempster, Laird, et al. - 1977 |

7314 |
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988
Citation Context ...alent in AI in recent years. Beyond the significant representational advantages of probability theory, including guarantees of consistency and a naturalness at combining diverse sources of knowledge (Pearl, 1988), the discovery of general exact inference algorithms has been principally responsible for the rapid growth in probabilistic AI (see, e.g., Lauritzen & Spiegelhalter, 1988; Pearl, 1988; Shenoy, 1992)... |

3405 | Convex Analysis - Rockafellar - 1970 |

1322 |
Local computations with probabilities on graphical structures and their applications to expert systems
- Lauritzen, Spiegelhalter
- 1988
Citation Context ...ness at combining diverse sources of knowledge (Pearl, 1988), the discovery of general exact inference algorithms has been principally responsible for the rapid growth in probabilistic AI (see, e.g., Lauritzen & Spiegelhalter, 1988; Pearl, 1988; Shenoy, 1992). These exact inference methods greatly expand the range of models that can be treated within the probabilistic framework and provide a unifying perspective on the general ... |

949 | An introduction to Bayesian networks - Jensen - 1995 |

856 | An introduction to variational methods for graphical models
- Jordan, Ghahramani, et al.
- 1999
Citation Context ...ation, such an exponential explosion is inevitable for any calculus that explicitly performs summations over sets of nodes. That is, there are models of interest in which "local" is overly large (see Jordan, et al., 1998). From this point of view, it is perhaps not surprising that exact inference is NP-hard (Cooper, 1990). In this paper we discuss the inference problem for a particular large-scale graphical model, th... |

800 |
Sampling-based approaches to calculating marginal densities
- Gelfand, Smith
- 1990
Citation Context ...olid line) with 12 positive findings treated exactly. An important attraction of sampling methods is the mathematical guarantee of accurate estimates in the limit of a sufficiently large sample size (Gelfand & Smith, 1990). Thus sampling methods have the promise of providing a general methodology for approximate inference, with two caveats: (1) the number of samples that is needed can be difficult to diagnose, and (2... |

602 |
The computational complexity of probabilistic inference using Bayesian belief networks
- Cooper
- 1990
Citation Context ...ets of nodes. That is, there are models of interest in which "local" is overly large (see Jordan, et al., 1998). From this point of view, it is perhaps not surprising that exact inference is NP-hard (Cooper, 1990). In this paper we discuss the inference problem for a particular large-scale graphical model, the Quick Medical Reference (QMR) model. 1 The QMR model consists of a combination of statistical and ex... |

292 | Bucket elimination: A unifying framework for probabilistic inference
- Dechter
- 1996
Citation Context ...order stochastic dependencies. The computational complexity of treating these dependencies exactly can be characterized in terms of the size of the maximal clique of the "moralized" graph (see, e.g., Dechter, 1998; Lauritzen & Spiegelhalter, 1988). In particular, the running time is exponential in this measure of size. For the QMR-DT, considering the standardized "clinicopathologic conference" (CPC) cases that... |

162 |
Simulation approaches to general probabilistic inference on belief networks
- Shachter, Peot
- 1989
Citation Context ...90). Likelihood-weighted sampling is basically a simple forward sampling method that weights samples by their likelihoods. It can be enhanced and improved by utilizing "self-importance sampling" (see Shachter & Peot, 1990), a version of importance sampling in which the importance sampling distribution is continually updated to reflect the current estimated posterior distribution. Middleton et al. (1990) utilized likel... |

152 | Introduction to Monte Carlo Methods
- MacKay
- 1998
Citation Context ...imate inference are the stochastic sampling methods. Stochastic sampling is a large family, including techniques such as rejection sampling, importance sampling, and Markov chain Monte Carlo methods (MacKay, 1998). Many of these methods have been applied to the problem of approximate probabilistic inference for graphical models and analytic results are available (Dagum & Horvitz, 1993). In particular, Shwe an... |

114 |
Valuation-based systems for Bayesian decision analysis
- Shenoy
- 1992
Citation Context ... (Pearl, 1988), the discovery of general exact inference algorithms has been principally responsible for the rapid growth in probabilistic AI (see, e.g., Lauritzen & Spiegelhalter, 1988; Pearl, 1988; Shenoy, 1992). These exact inference methods greatly expand the range of models that can be treated within the probabilistic framework and provide a unifying perspective on the general problem of probabilistic co... |

104 | Weighting and integrating evidence for stochastic simulation in Bayesian networks - Fung, Chang - 1990 |

94 |
Bounded conditioning: Flexible inference for decisions under scarce resources
- Horvitz, Suermondt, et al.
- 1989
Citation Context ...ed bounds might be used to aid the variational approximation. Similar comments can be made with respect to localized partial evaluation methods and bounded conditioning methods (Draper & Hanks, 1994; Horvitz, et al., 1989). Also, we have seen that variational bounds can be used for assessing whether estimates from Monte Carlo sampling algorithms have converged. A further interesting hybrid would be a scheme in which v... |

50 | A tractable inference algorithm for diagnosing multiple diseases
- Heckerman
- 1989
Citation Context ...he $d_j$. The positive findings, on the other hand, are more problematic. In the worst case the exact calculation of posterior probabilities is exponentially costly in the number of positive findings (Heckerman, 1989; D'Ambrosio, 1994). Moreover, in practical diagnostic situations the number of positive findings often exceeds the feasible limit for exact calculations. Let us consider the inference calculations in... |

47 |
A Probabilistic Causal Model for Diagnostic Problem
- Peng, Reggia
- 1987
Citation Context ... 106:5 cutset instantiations. Another approach to approximate inference is provided by "search-based" methods, which consider node instantiations across the entire graph (Cooper, 1984; Henrion, 1991; Peng & Reggia, 1987). The general hope in these methods is that a relatively small fraction of the (exponentially many) node instantiations contains a majority of the probability mass, and that by exploring the high pro... |

45 | Incremental probabilistic inference - D’Ambrosio - 1993 |

44 | Localized Partial Evaluation of Belief Networks
- Draper
- 1995
Citation Context ...$, d_j) = \sum_{d \setminus d_j} P(f^+ \mid d)\,P(d) \ge \sum_{d \setminus d_j} P(f^+ \mid d, q)\,P(d) \equiv P(f^+, d_j \mid q)$ (32) Combining these bounds we can obtain interval bounds on the posterior marginal probabilities for the diseases (cf. Draper & Hanks 1994): $\frac{P(f^+, d_j \mid q)}{P(f^+, \bar{d}_j \mid \xi) + P(f^+, d_j \mid q)} \le P(d_j \mid f^+) \le \frac{P(f^+, d_j \mid \xi)}{P(f^+, d_j \mid \xi) + P(f^+, \bar{d}_j \mid q)}$, (33) where $\bar{d}_j$ is the binary complement of $d_j$. 5 Experimental... |

43 | Mini-buckets: A general scheme for generating approximations in automated reasoning - Dechter - 1997 |

40 |
Search-based methods to bound diagnostic probabilities in very large belief nets. Uncertainty and
- Henrion
- 1991
Citation Context ...rge number of $2^{106.5}$ cutset instantiations. Another approach to approximate inference is provided by "search-based" methods, which consider node instantiations across the entire graph (Cooper, 1984; Henrion, 1991; Peng & Reggia, 1987). The general hope in these methods is that a relatively small fraction of the (exponentially many) node instantiations contains a majority of the probability mass, and that by e... |

37 | Probabilistic partial evaluation: Exploiting rule structure in probabilistic inference - Poole - 1997 |

29 | Recursive algorithms for approximating probabilities in graphical models
- Jaakkola, Jordan
- 1997
Citation Context ...nsiderably more general. Specifically, the methods that we have described here are not limited to the bi-partite graphical structure of the QMR-DT model, nor is it necessary to employ noisy-OR nodes (Jaakkola & Jordan, 1996). It is also the case that the type of transformations that we have exploited in the QMR-DT setting extend to a larger class of dependence relations based on generalized linear models (Jaakkola, 1997... |

29 | An empirical analysis of likelihood-weighting simulation on a large, multiply connected medical belief network - Shwe, Cooper - 1991 |

23 |
NESTOR: A Computer-Based Medical Diagnostic Aid That Integrates Causal and Probabilistic Knowledge
- Cooper
- 1984
Citation Context ...nmanageably large number of $2^{106.5}$ cutset instantiations. Another approach to approximate inference is provided by "search-based" methods, which consider node instantiations across the entire graph (Cooper, 1984; Henrion, 1991; Peng & Reggia, 1987). The general hope in these methods is that a relatively small fraction of the (exponentially many) node instantiations contains a majority of the probability mass... |

20 |
Approximate probabilistic reasoning in Bayesian belief networks is NP-hard
- Dagum, Luby
- 1993
Citation Context ... We return to this point in the discussion section, where we consider various promising hybrids of approximate and exact inference algorithms. The general problem of approximate inference is NP-hard (Dagum & Luby, 1993) and this provides additional reason to doubt the existence of a single champion approximate inference technique. We think it important to stress, however, that this hardness result, together with Co... |

20 | Blocking-Gibbs sampling in very large probabilistic expert systems - Jensen, Kong, et al. - 1995 |

17 | A Bayesian analysis of simulation algorithms for inference in belief networks
- Dagum, Horvitz
- 1993
Citation Context ...Markov chain Monte Carlo methods (MacKay, 1998). Many of these methods have been applied to the problem of approximate probabilistic inference for graphical models and analytic results are available (Dagum & Horvitz, 1993). In particular, Shwe and Cooper (1991) proposed a stochastic sampling method known as "likelihood-weighted sampling" for the QMR-DT model. Their results are the most promising results to date for in... |

14 |
Quick Medical Reference (QMR) for Diagnostic Assistance
- Miller, Masarie, et al.
- 1986
Citation Context ...T" that we use in this paper refers to the "decision-theoretic" reformulation of the QMR by Shwe, et al. (1991). Shwe, et al. replaced the heuristic representation employed in the original QMR model (Miller, Fasarie, & Myers, 1986) by a probabilistic representation. factorizations might be found that take advantage of the particular choice made by the QMR-DT. Such a factorization was in fact found by Heckerman (1989); his "Qui... |

5 |
Variational methods for inference and learning in graphical models
- Jaakkola
- 1997
Citation Context ...ncrease in false positives as a function of true positives (Figure 11), such guarantees may suffice. Finally, for diseases in which the bounds are loose there are also perturbation methods available (Jaakkola, 1997) that can help to validate the approximations for these diseases. 6 Discussion Let us summarize the variational inference method and evaluate the results that we have obtained. The variational method... |

4 |
Symbolic probabilistic inference in large BN20 networks
- D'Ambrosio
- 1994
Citation Context ...tive findings, on the other hand, are more problematic. In the worst case the exact calculation of posterior probabilities is exponentially costly in the number of positive findings (Heckerman, 1989; D'Ambrosio, 1994). Moreover, in practical diagnostic situations the number of positive findings often exceeds the feasible limit for exact calculations. Let us consider the inference calculations in more detail. To f... |

4 |
Reformulating inference problems through selective conditioning
- Dagum, Horvitz
- 1992
Citation Context ...ther graphical model architectures, see Jordan, et al. (1998). A promising direction for future research appears to be in the integration of various kinds of approximate and exact methods (see, e.g., Dagum & Horvitz, 1992; Jensen, Kong, & Kjaerulff, 1995). In particular, search-based methods (Cooper, 1984; Peng & Reggia, 1987, Henrion, 1991) and variational methods both yield bounds on probabilities, and, as we have i... |

1 |
Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base II. Evaluation of diagnostic performance. Section on Medical Informatics
- Middleton, Shwe, et al.
- 1990
Citation Context ...bs sampling (Pearl, 1988). The results from Gibbs sampling were not as good as the results from likelihood-weighted sampling, and we report only the latter results in the remainder of the paper. also Middleton et al., 1990). Consider a set of the N highest ranking disease hypotheses, where the ranking is based on the correct posterior marginals. Corresponding to this set of diseases we can find the smallest set of N 0 ... |
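
As an illustration of the interval bounds quoted in the citation context for Draper (Eq. 33), the following is a minimal Python sketch. It assumes that lower and upper bounds on the joint probabilities P(f+, d_j) and P(f+, not d_j) have already been obtained by some variational procedure; the function name `posterior_interval` and its parameters are hypothetical and do not come from the paper.

```python
def posterior_interval(lower_dj, upper_dj, lower_not_dj, upper_not_dj):
    """Interval on P(d_j | f+) from bounds on the two joint probabilities.

    Assumed inputs (hypothetical names, not taken from the paper):
      lower_dj     <= P(f+, d_j)     <= upper_dj
      lower_not_dj <= P(f+, not d_j) <= upper_not_dj
    """
    for lo, hi in ((lower_dj, upper_dj), (lower_not_dj, upper_not_dj)):
        if not 0.0 <= lo <= hi:
            raise ValueError("each bound pair must satisfy 0 <= lower <= upper")

    # x / (x + y) increases in x and decreases in y, so the smallest value
    # pairs the lower bound on d_j with the upper bound on its complement.
    low_denominator = upper_not_dj + lower_dj
    lower = lower_dj / low_denominator if low_denominator > 0 else 0.0

    # The largest value pairs the upper bound on d_j with the lower bound
    # on its complement.
    high_denominator = upper_dj + lower_not_dj
    upper = upper_dj / high_denominator if high_denominator > 0 else 1.0

    return lower, upper


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(posterior_interval(0.02, 0.03, 0.05, 0.06))
```

When the lower and upper bounds on each joint probability coincide, the interval collapses to the exact posterior marginal; looser joint bounds widen it.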