## Selecting Strategies for Infinite-Horizon Dynamic LIMIDs

Citations: 2 (1 self)

### BibTeX

@MISC{Gerven_selectingstrategies,

author = {Marcel A. J. Van Gerven and Francisco J. Díez},

title = {Selecting Strategies for Infinite-Horizon Dynamic LIMIDs},

year = {}

}

### Abstract

In previous work we introduced dynamic limited-memory influence diagrams (DLIMIDs) as an extension of LIMIDs aimed at representing infinite-horizon decision processes. If a DLIMID respects the first-order Markov assumption, then it can be represented by 2TLIMIDs. Given that the treatment selection algorithm for LIMIDs, called single policy updating (SPU), can be infeasible even for small finite-horizon models, we propose two alternative algorithms for treatment selection with 2TLIMIDs. First, single rule updating (SRU) is a hill-climbing method inspired by SPU that need not iterate exhaustively over all possible policies at each decision node. Second, a simulated annealing algorithm can be used to avoid the local-maximum policies found by SPU and SRU.
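The contrast between the two search schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the policy encoding (one action per parent configuration of a single decision node), the `REWARD` table, and the toy `expected_utility` function are all hypothetical stand-ins for the expected utility that the paper computes by inference on a 2TLIMID. The hill-climber changes one rule at a time, which is one plausible reading of "single rule updating"; the annealer occasionally accepts worse policies so that it can leave a local maximum.

```python
import math
import random

# Hypothetical toy setting: one decision node with 4 parent configurations
# and 3 possible actions. A deterministic policy is a list of 4 actions.
ACTIONS = (0, 1, 2)
N_CONFIGS = 4
REWARD = {0: 0.0, 1: 1.0, 2: 0.5}  # assumed per-rule payoff, for illustration


def expected_utility(policy):
    """Toy stand-in for the expected utility of a strategy (not the paper's)."""
    base = sum(REWARD[a] for a in policy)
    # Bonus that only pays off when every rule picks action 2; this makes
    # the all-1 policy a local maximum under single-rule changes.
    bonus = 3.0 if all(a == 2 for a in policy) else 0.0
    return base + bonus


def single_rule_hill_climb(policy):
    """Change one rule at a time, keeping any strict improvement."""
    policy = list(policy)
    best = expected_utility(policy)
    improved = True
    while improved:
        improved = False
        for c in range(len(policy)):
            for a in ACTIONS:
                candidate = policy[:c] + [a] + policy[c + 1:]
                eu = expected_utility(candidate)
                if eu > best:
                    policy, best, improved = candidate, eu, True
    return policy, best


def simulated_annealing(policy, steps=2000, t0=2.0, cooling=0.995, seed=0):
    """Random single-rule proposals with a geometric cooling schedule."""
    rng = random.Random(seed)
    current = list(policy)
    current_eu = expected_utility(current)
    best, best_eu = list(current), current_eu
    t = t0
    for _ in range(steps):
        candidate = list(current)
        candidate[rng.randrange(len(candidate))] = rng.choice(ACTIONS)
        eu = expected_utility(candidate)
        delta = eu - current_eu
        # Always accept improvements; accept worse moves with prob. exp(delta/t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_eu = candidate, eu
            if current_eu > best_eu:
                best, best_eu = list(current), current_eu
        t *= cooling
    return best, best_eu


start = [0] * N_CONFIGS
hc_policy, hc_eu = single_rule_hill_climb(start)
sa_policy, sa_eu = simulated_annealing(start)
```

On this toy objective the hill-climber started from the all-zero policy stops at `[1, 1, 1, 1]` with utility 4.0, although the all-2 policy scores 5.0. Whether the annealer reaches the better policy depends on the seed and the cooling schedule, which are arbitrary choices here.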

### Citations

7493 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988

Citation Context ...he optimal strategy for infinite-horizon dynamic LIMIDs. We demonstrate the performance of these algorithms on a non-trivial decision problem. 2 Preliminaries 2.1 Bayesian Networks Bayesian networks (Pearl, 1988) provide for a compact factorization of a joint probability distribution over a set of random variables by exploiting the notion of conditional independence. One way to represent conditional indepe...

670 | Probabilistic Networks and Expert Systems
- Cowell, Dawid, et al.
- 1999

Citation Context ...ction $U$ is additively decomposable such that $U = \sum_{U \in \mathbf{U}} U$. $P$ specifies for each $\mathbf{d} \in \Omega_{\mathbf{D}}$ a distribution $P(\mathbf{C} : \mathbf{d}) = \prod_{C \in \mathbf{C}} P(C \mid \pi(C))$ that represents the distribution over $\mathbf{C}$ when we externally set $\mathbf{D} = \mathbf{d}$ (Cowell et al., 1999). Hence, $\mathbf{C}$ is not conditioned on $\mathbf{D}$, but rather parameterized by $\mathbf{D}$, and if $\mathbf{D}$ is unbound then we write $P(\mathbf{C} : \mathbf{D})$. A stochastic policy for decisions $D \in \mathbf{D}$ is defined as a distribution $P_D(D \mid \pi(D))$ that m...

485 | A model for reasoning about persistence and causation
- Dean, Kanazawa
- 1989

Citation Context ...f the observational history (Meuleau et al., 1999). We proceed by converting $(L_0, L_t)$ into $(B_0, B_t)$ with $B_0 = B(L_0, \Delta^0)$ and $B_t = B(L_t, \Delta^t)$, where $(B_0, B_t)$ is known as a two-stage temporal Bayes net (Dean and Kanazawa, 1989). We use inference algorithms that operate on $(B_0, B_t)$ in order to compute an approximation of the expected utility. In our work, we have used the interface algorithm (Murphy, 2002), for which it hol...

376 | Influence diagrams
- Howard, Matheson
- 1984

Citation Context ... maximize life-expectancy. Limited-memory influence diagrams (LIMIDs) are a formalism for decision-making under uncertainty (Lauritzen and Nilsson, 2001). They generalize standard influence diagrams (Howard and Matheson, 1984) by relaxing the assumption that the whole observed history is taken into account when making a decision, and by dropping the requirement that a complete order is defined over decisions. This increas...

283 | Stochastic Dynamic Programming
- Ross
- 1983

Citation Context ...or until euMax = euMaxOld return ∆ Figure 3: Single policy updating for 2TLIMIDs. optimal strategy is deterministic and stationary for infinite-horizon and fully observable Markov decision processes (Ross, 1983). In the partially observable case, we can only expect to find approximations to the optimal strategy by using memory variables that represent part of the observational history (Meuleau et al., 1999)...

76 | A method for using belief networks as influence diagrams
- Cooper
- 1988

Citation Context ...random variables $X \in \mathbf{X}$ with parents $\pi_G(D)$ such that $P(X \mid \pi_{G'}(X)) = P_D(D \mid \pi_G(D))$. Additionally, utility functions $U \in \mathbf{U}$ may be converted into random variables $X$ by means of Cooper's transformation (Cooper, 1988), which allows us to compute $E_\Delta(U)$. We use $B(L, \Delta)$ to denote this conversion of a LIMID into a Bayesian network. Single policy updating cannot be applied directly to an infinite-horizon DLIMID since c...

58 | Solving POMDPs by searching the space of finite policies
- Meuleau, Kim, et al.
- 1999

Citation Context ...processes (Ross, 1983). In the partially observable case, we can only expect to find approximations to the optimal strategy by using memory variables that represent part of the observational history (Meuleau et al., 1999). We proceed by converting $(L_0, L_t)$ into $(B_0, B_t)$ with $B_0 = B(L_0, \Delta^0)$ and $B_t = B(L_t, \Delta^t)$, where $(B_0, B_t)$ is known as a two-stage temporal Bayes net (Dean and Kanazawa, 1989). We use inference algori...

48 | Representing and solving decision problems with limited information
- Lauritzen, Nilsson

Citation Context ...-support system that chooses treatments based on patient status in order to maximize life-expectancy. Limited-memory influence diagrams (LIMIDs) are a formalism for decision-making under uncertainty (Lauritzen and Nilsson, 2001). They generalize standard influence diagrams (Howard and Matheson, 1984) by relaxing the assumption that the whole observed history is taken into account when making a decision, and by dropping the ...

32 | Dynamic Bayesian Networks
- Murphy
- 2002

Citation Context ...es net (Dean and Kanazawa, 1989). We use inference algorithms that operate on $(B_0, B_t)$ in order to compute an approximation of the expected utility. In our work, we have used the interface algorithm (Murphy, 2002), for which it holds that the space and time taken to compute each $P(X^t \mid X^{t-1})$ does not depend on the number of time-slices. The approximation $E^{\epsilon}_{\Delta}(U)$ is made using a finite number of time-slice...

1 | Prognosis of high-grade carcinoid tumor patients using dynamic limited-memory influence diagrams
- Gerven, Díez, et al.
- 2006