## AIS-BN: An Adaptive Importance Sampling Algorithm for Evidential Reasoning in Large Bayesian Networks (2000)

### Download Links

- [www.cs.washington.edu]
- [www.cs.cmu.edu]
- [www.kddresearch.org]
- [www.ics.uci.edu]
- [jair.org]
- [www.jair.org]
- [www.pitt.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Artificial Intelligence Research

Citations: 70 (4 self)

### BibTeX

@ARTICLE{Cheng00ais-bn:an,
  author  = {Jian Cheng and Marek J. Druzdzel},
  title   = {AIS-BN: An Adaptive Importance Sampling Algorithm for Evidential Reasoning in Large Bayesian Networks},
  journal = {Journal of Artificial Intelligence Research},
  year    = {2000},
  volume  = {13},
  pages   = {155--188}
}

### Abstract

Stochastic sampling algorithms, while an attractive alternative to exact algorithms in very large Bayesian network models, have been observed to perform poorly in evidential reasoning with extremely unlikely evidence. To address this problem, we propose an adaptive importance sampling algorithm, AIS-BN, that shows promising convergence rates even under extreme conditions and seems to outperform the existing sampling algorithms consistently. Three sources of this performance improvement are (1) two heuristics for initialization of the importance function that are based on the theoretical properties of importance sampling in finite-dimensional integrals and the structural advantages of Bayesian networks, (2) a smooth learning method for the importance function, and (3) a dynamic weighting function for combining samples from different stages of the algorithm. We tested the performance of the AIS-BN algorithm along with two state-of-the-art general purpose sampling algorithms, lik...
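
The likelihood-weighting baseline that AIS-BN is compared against can be sketched on a toy two-node network. Everything below (the network, the CPT numbers, the function name) is illustrative, not taken from the paper: sample the non-evidence node from its prior, then weight each sample by the likelihood of the observed evidence.

```python
import random

# Toy two-node network A -> B; the CPT numbers are hypothetical,
# chosen only to make the evidence B=true fairly unlikely.
P_A = 0.01                                # Pr(A=true)
P_B_GIVEN_A = {True: 0.8, False: 0.05}    # Pr(B=true | A)

def likelihood_weighting(evidence_b=True, n_samples=100_000, seed=0):
    """Estimate Pr(A=true | B=evidence_b): sample the non-evidence
    node A from its prior and weight each sample by the likelihood
    of the observed evidence under the sampled value of A."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_samples):
        a = rng.random() < P_A                  # A ~ Pr(A)
        p_b = P_B_GIVEN_A[a]                    # Pr(B=true | A=a)
        w = p_b if evidence_b else 1.0 - p_b    # sample weight
        num += w * a
        den += w
    return num / den

estimate = likelihood_weighting()   # exact posterior is 0.008/0.0575 ≈ 0.139
```

With very unlikely evidence almost all samples get near-zero weight, which is exactly the failure mode the abstract describes and the motivation for adapting the importance function instead.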

### Citations

7052 |
Probabilistic reasoning in intelligent systems: networks of plausible inference
- Pearl
- 1988
Citation Context ...unable to report numerical results here, as the other algorithms almost never achieved the precision reached even by the first few iterations of the AIS-BN algorithm. 1. Introduction Bayesian networks (Pearl, 1988) are increasingly popular tools for modeling uncertainty in intelligent systems. With practical models reaching the size of several hundreds of variables (e.g., Pradhan et al., 1994; Conati et al., 1...

3719 |
Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images
- Geman, Geman
- 1984
Citation Context ...group of stochastic sampling methods is formed by so-called Markov Chain Monte Carlo (MCMC) methods that are divided into Gibbs sampling, Metropolis sampling, and Hybrid Monte Carlo sampling (Geman & Geman, 1984; Gilks, Richardson, & Spiegelhalter, 1996; MacKay, 1998). Roughly speaking, these methods draw random samples from an unknown target distribution f(X) by biasing the search for this distribution towa...
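
The Markov-blanket resampling this excerpt describes can be illustrated on a toy three-node chain; the structure, names, and CPT numbers below are hypothetical, chosen only to show the Gibbs-sampling mechanics.

```python
import random

# Toy chain A -> B -> C with C observed; all numbers are illustrative.
P_A = 0.3                     # Pr(A=1)
P_B = {1: 0.9, 0: 0.2}        # Pr(B=1 | A=a)
P_C = {1: 0.7, 0: 0.1}        # Pr(C=1 | B=b)

def gibbs_posterior_a(c_obs=1, n=50_000, burn=1_000, seed=0):
    """Estimate Pr(A=1 | C=c_obs) by Gibbs sampling: repeatedly
    resample each unobserved node from its distribution given its
    Markov blanket, then average A over the retained samples."""
    rng = random.Random(seed)
    a, b = 1, 1                            # arbitrary initial state
    hits = 0
    for t in range(n + burn):
        # Resample A: Pr(A=a' | B=b) ∝ Pr(A=a') Pr(B=b | A=a')
        w1 = P_A * (P_B[1] if b else 1 - P_B[1])
        w0 = (1 - P_A) * (P_B[0] if b else 1 - P_B[0])
        a = 1 if rng.random() < w1 / (w1 + w0) else 0
        # Resample B: Pr(B=b' | A=a, C=c_obs) ∝ Pr(B=b' | a) Pr(c_obs | b')
        w1 = P_B[a] * (P_C[1] if c_obs else 1 - P_C[1])
        w0 = (1 - P_B[a]) * (P_C[0] if c_obs else 1 - P_C[0])
        b = 1 if rng.random() < w1 / (w1 + w0) else 0
        if t >= burn:
            hits += a
    return hits / n

posterior_a = gibbs_posterior_a()   # exact Pr(A=1 | C=1) here is ≈ 0.555
```

Each conditional is built only from the node's Markov blanket, which is what makes the update local even in a large network.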

1284 | Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems (with Discussion - Lauritzen, Spiegelhalter - 1988 |

580 |
The computational complexity of probabilistic inference using bayesian belief networks
- Cooper
- 1990
Citation Context ...bility of probabilistic inference. Even though several ingenious exact algorithms have been proposed, in very large models they all stumble on the theoretically demonstrated NP-hardness of inference (Cooper, 1990). The significance of this result can be observed in practice: exact algorithms applied to large, densely connected practical networks require either a prohibitive amount of memory or a prohibitive a...

546 | Markov Chain Monte Carlo in practice - Gilks, Richardson, et al. - 1996 |

413 |
Simulation and Monte Carlo Method
- Rubinstein
- 1981
Citation Context ...(Section 2.1), any technique that can reduce the variance σ²_Pr(e) will reduce the variance of the estimator P̂r(e) and correspondingly improve the sampling performance. Since the variance of stratified sampling (Rubinstein, 1981) is never much worse than that of random sampling, and can be much better, it can improve the convergence rate. We expect some other variance reduction methods in statistics, such as: (i) the expecte...
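
The variance-reduction claim for stratified sampling can be checked with a minimal sketch that compares the empirical variance of ordinary and stratified Monte Carlo estimates of E[f(U)]; the test function, sample sizes, and repetition count are all illustrative choices.

```python
import random
import statistics

def plain_mc(f, n, rng):
    """Ordinary Monte Carlo estimate of E[f(U)], U ~ Uniform(0, 1)."""
    return sum(f(rng.random()) for _ in range(n)) / n

def stratified_mc(f, n, rng):
    """Stratified estimate: one uniform draw inside each of n
    equal-width strata of [0, 1], so every region is covered."""
    return sum(f((i + rng.random()) / n) for i in range(n)) / n

rng = random.Random(1)
f = lambda x: x * x                       # E[f(U)] = 1/3
plain = [plain_mc(f, 100, rng) for _ in range(200)]
strat = [stratified_mc(f, 100, rng) for _ in range(200)]
var_plain = statistics.pvariance(plain)
var_strat = statistics.pvariance(strat)   # far smaller for smooth f
```

For a smooth integrand the stratified estimator's variance drops far below the plain-sampling variance at the same sample count, which is the "never much worse, can be much better" behavior the excerpt cites.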

373 | Fusion, Propagation and Structuring in Belief Networks
- Pearl
- 1986
Citation Context ...timator, this process introduces bias in the estimator. Heuristic importance first removes edges from the network until it becomes a polytree, and then uses a modified version of the polytree algorithm (Pearl, 1986) to compute the likelihood functions for each of the unobserved nodes. Pr′(X\E) is a combination of these likelihood functions with Pr(X\E, e). In Step 7 heuristic importance does not update Pr_k(...

339 |
Increased rates of convergence through learning rate adaptation
- Jacobs
- 1988
Citation Context ...ng, will also improve the sampling performance. Our current learning algorithm uses a simple approach. Some heuristic learning methods, such as adjusting learning rates according to changes of the error (Jacobs, 1988), should also be applicable to our algorithm. There are several tunable parameters in the AIS-BN algorithm. Finding the optimal values of these parameters for any given network is another interesting...

324 |
Monte Carlo: Concepts, Algorithms and Applications
- FISHMAN
- 1995
Citation Context ...(a|e) obtained by means of Equation 7 is not an unbiased estimator. However, as the number of samples increases, the bias decreases and can be ignored altogether when the sample size is large enough (Fishman, 1995). Figure 1 presents a generic stochastic sampling algorithm that captures most of the existing sampling algorithms. Without the loss of generality, we restrict ourselves in our description to so-call...

254 | Approximating Probabilistic Inference in Bayesian Belief Networks - Dagum - 1993 |

156 | Simulation approaches to general probabilistic inference on belief networks - Shachter, Peot - 1990 |

148 | Introduction to Monte Carlo methods
- MacKay
- 1998
Citation Context ...lled Markov Chain Monte Carlo (MCMC) methods that are divided into Gibbs sampling, Metropolis sampling, and Hybrid Monte Carlo sampling (Geman & Geman, 1984; Gilks, Richardson, & Spiegelhalter, 1996; MacKay, 1998). Roughly speaking, these methods draw random samples from an unknown target distribution f(X) by biasing the search for this distribution towards higher probability regions. When applied to Bayesian...

117 | On-line student modeling for Coached problem solving using Bayesian Networks
- Conati, Gertner, et al.
- 1997
Citation Context ...rks (Pearl, 1988) are increasingly popular tools for modeling uncertainty in intelligent systems. With practical models reaching the size of several hundreds of variables (e.g., Pradhan et al., 1994; Conati et al., 1997), it becomes increasingly important to address the problem of feasibility of probabilistic inference. Even though several ingenious exact algorithms have been proposed, in very large models they all ...

115 | Probabilistic diagnosis using a reformulation of the internist-1/qmr knowledge base
- Middleton, Shwe, et al.
- 1991
Citation Context ...d weighting. 2.4 Practical Performance of the Existing Sampling Algorithms The largest network that has been tested using sampling algorithms is QMR-DT (Quick Medical Reference, Decision Theoretic) (Shwe et al., 1991; Shwe & Cooper, 1991), which contains 534 adult diseases and 4,040 findings, with 40,740 arcs depicting disease-to-finding dependencies. The QMR-DT network belongs to a class of special bipartite networ...

108 |
Some practical issues in constructing belief networks
- Henrion
- 1987
Citation Context ...riables, Pr(X). We propose an improvement on this initialization. We know that the effect of evidence nodes on a node will be attenuated as the path length of that node to evidence nodes is increased (Henrion, 1989) and the most affected nodes are the direct ancestors of the evidence nodes. Initializing the ICPT tables of the parents of the evidence nodes to uniform distributions in our experience improves the c...

100 | Weighing and integrating evidence for stochastic simulation in bayesian networks - Fung, Chang - 1990 |

99 |
Knowledge engineering for large belief networks
- Pradhan, Provan, et al.
- 1994
Citation Context ...989; Shachter & Peot, 1989) and self-importance sampling (Shachter & Peot, 1989). We used in our tests three large real Bayesian network models available to the scientific community: the CPCS network (Pradhan et al., 1994), the PathFinder network (Heckerman, Horvitz, & Nathwani, 1990), and the ANDES network (Conati, Gertner, VanLehn, & Druzdzel, 1997), with evidence as unlikely as 10⁻⁴¹. While the AIS-BN algorithm al...

93 |
Evidential reasoning using stochastic simulation of causal models
- Pearl
- 1987
Citation Context ...speaking, these methods draw random samples from an unknown target distribution f(X) by biasing the search for this distribution towards higher probability regions. When applied to Bayesian networks (Pearl, 1987; Chavez & Cooper, 1990) this approach determines the sampling distribution of a variable from its previous sample given its Markov blanket (Pearl, 1988). This corresponds to updating Pr_k(X\E) when ...

71 |
A generalization of noisy-or model
- Srinivas
- 1993
Citation Context ...ombinations of the parent nodes. It is possible that conditional probability distributions with causal independence properties, such as Noisy-OR distributions (Pearl, 1988; Henrion, 1989; Diez, 1993; Srinivas, 1993; Heckerman & Breese, 1994), common in very large practical networks, can be treated differently and lead to considerable savings in the learning time. One direction of testing approximate algorithms, ...

70 | A new look at causal independence
- HECKERMAN, BREESE
- 1994
Citation Context ...des. It is possible that conditional probability distributions with causal independence properties, such as Noisy-OR distributions (Pearl, 1988; Henrion, 1989; Diez, 1993; Srinivas, 1993; Heckerman & Breese, 1994), common in very large practical networks, can be treated differently and lead to considerable savings in the learning time. One direction of testing approximate algorithms, suggested to us by a revie...

69 | Parameter adjustment in Bayes networks: The generalized noisy orgate
- Diez
- 1993
Citation Context ...under all combinations of the parent nodes. It is possible that conditional probability distributions with causal independence properties, such as Noisy-OR distributions (Pearl, 1988; Henrion, 1989; Diez, 1993; Srinivas, 1993; Heckerman & Breese, 1994), common in very large practical networks, can be treated differently and lead to considerable savings in the learning time. One direction of testing approxim...

48 | An optimal approximation algorithm for Bayesian inference - Dagum, Luby - 1997 |

46 | Backward simulation in Bayesian networks - Fung, Favero - 1994 |

43 |
Neuronale Netze
- Ritter, Martinetz, et al.
- 1991
Citation Context ...al. (1991) η(k) = a (b/a)^(k/k_max), (10) where a is the initial learning rate and b is the learning rate in the last step. This function has been reported to perform well in neural network learning (Ritter et al., 1991). 3.3 Heuristic Initialization in AIS-BN The dimensionality of the problem of Bayesian network inference is equal to the number of variables in a network, which in the networks considered in this pap...
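
Assuming the schedule in this excerpt is η(k) = a·(b/a)^(k/k_max), it can be written directly as a one-line function; the default values of a and b below are illustrative, not prescribed by the excerpt.

```python
def learning_rate(k, k_max, a=0.4, b=0.14):
    """Geometric schedule eta(k) = a * (b/a)**(k / k_max): equals the
    initial rate a at k = 0 and decays smoothly to the final rate b at
    k = k_max.  The defaults a = 0.4, b = 0.14 are illustrative only."""
    return a * (b / a) ** (k / k_max)

rates = [learning_rate(k, k_max=10) for k in range(11)]
```

Because b < a, the schedule decreases monotonically from a to b, interpolating geometrically rather than linearly between the two endpoints.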

39 |
Search-based methods to bound diagnostic probabilities in very large belief nets
- Henrion
- 1991
Citation Context ...seases and 4,040 findings, with 40,740 arcs depicting disease-to-finding dependencies. The QMR-DT network belongs to a class of special bipartite networks and its structure is often referred to as BN2O (Henrion, 1991), because of its two-layer composition: disease nodes in the top layer and finding nodes in the bottom layer. Shwe and colleagues used an algorithm combining self-importance and heuristic importance an...

28 | An Empirical Analysis of Likelihood-Weighting Simulation on a Large, Multiply-Connected Belief Network - Shwe, Cooper - 1990 |

26 | A randomized approximation algorithm for probabilistic inference on Bayesian belief networks - Chavez, Cooper - 1990 |

23 | A Monte Carlo algorithm for probabilistic propagation in belief networks based on importance sampling and stratified simulation techniques - Hernández, Moral, et al. - 1998 |

21 | Adaptive importance sampling for estimation in structured domains - Ortiz, Kaelbling - 2000 |

16 | A tutorial introduction to stochastic simulation algorithms for belief networks - Cousins, Chen, et al. - 1993 |

15 |
Toward Normative Expert Systems: The Pathfinder Project
- Heckerman, Horvitz, et al.
- 1992
Citation Context ...sampling (Shachter & Peot, 1989). We used in our tests three large real Bayesian network models available to the scientific community: the CPCS network (Pradhan et al., 1994), the PathFinder network (Heckerman, Horvitz, & Nathwani, 1990), and the ANDES network (Conati, Gertner, VanLehn, & Druzdzel, 1997), with evidence as unlikely as 10⁻⁴¹. While the AIS-BN algorithm always performed better than the other two algorithms, in the ma...

12 | Computational investigations of low-discrepancy sequences in simulation algorithms for Bayesian networks - Cheng, Druzdzel - 2000 |

11 | Importance sampling algorithms for the propagation of probabilities in belief networks - Cano, Hernandez, et al. - 1996 |

11 |
An algorithm directly finding the k most probable configurations in bayesian networks
- Seroussi, Golmard
- 1994
Citation Context ...urately enough, we can use this new network to solve other approximate tasks, such as the problem of computing the Maximum A-Posteriori assignment (MAP) (Pearl, 1988), finding the k most likely scenarios (Seroussi & Golmard, 1994), etc. A large advantage of this approach is that we can solve each of these problems as if the network had no evidence nodes. We know that Markov blanket scoring can improve convergence rates in som...

7 | Latin hypercube sampling in Bayesian networks - Cheng, Druzdzel - 2000 |

5 |
An optimal algorithm for Monte Carlo estimation (extended abstract
- Dagum, Karp, et al.
- 1995
Citation Context ...not used in practice as often as the simple likelihood weighting scheme. There are also some other simulation algorithms, such as the bounded variance algorithm (Dagum & Luby, 1997) and the AA algorithm (Dagum et al., 1995), which are essentially based on the LW algorithm and the Stopping-Rule Theorem (Dagum et al., 1995). Cano et al. (1996) proposed another importance sampling algorithm that performed somewhat better ...

2 | Toward normative expert systems I: The path project - Heckerman, Horvitz, et al. - 1991 |

2 | Optimal Monte Carlo inference - Pradhan - 1996 |

1 | Adaptive Importance Sampling in Bayesian Networks Henrion, M - York - 1988 |

1 |
An algorithm directly finding the K most probable configurations in Bayesian networks
- Golmard
- 1994
Citation Context ...nough, we can use this new network to solve other approximate tasks, such as the problem of computing the Maximum A-Posteriori assignment (MAP) (Pearl, 1988), finding the k most likely scenarios (Seroussi & Golmard, 1994), etc. A large advantage of this approach is that we can solve each of these problems as if the network had no evidence nodes. We know that Markov blanket scoring can improve convergence rates in som...