## Feature dynamic Bayesian networks (2009)


### Download Links

- [www.hutter1.net]
- [eprints.pascal-network.org]
- [arxiv.org]
- [agi-conf.org]
- DBLP

### Other Repositories/Bibliography

Venue: AGI

Citations: 10 (7 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Hutter09featuredynamic,
  author    = {Marcus Hutter},
  title     = {Feature dynamic Bayesian networks},
  booktitle = {AGI},
  year      = {2009},
  pages     = {67--73}
}
```


### Abstract

Feature Markov Decision Processes (ΦMDPs) [Hut09] are well-suited for learning agents in general environments. Nevertheless, unstructured (Φ)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend ΦMDP to ΦDBN. The primary contribution is to derive a cost criterion that allows one to automatically extract the most relevant features from the environment, leading to the “best” DBN representation. I discuss all building blocks required for a complete general learning algorithm.

### Citations

3773 | Reinforcement Learning: An Introduction
- Sutton, Barto
- 1998
Citation Context: ...l function is known [RPPCd08], classification and regression is for conditionally independent observations [Bis06], Markov Decision Processes (MDPs) assume that o_t and r_t only depend on a_{t-1} and o_{t-1} [SB98], POMDPs deal with Partially Observable MDPs [KLC98], and Dynamic Bayesian Networks (DBNs) with structured MDPs [BDH99]. Feature MDPs [Hut09]. Concrete real-world problems can often be modeled as MDPs...

825 | A.R.: Planning and acting in partially observable stochastic domains
- Kaelbling, Littman, et al.
- 1998
Citation Context: ...egression is for conditionally independent observations [Bis06], Markov Decision Processes (MDPs) assume that o_t and r_t only depend on a_{t-1} and o_{t-1} [SB98], POMDPs deal with Partially Observable MDPs [KLC98], and Dynamic Bayesian Networks (DBNs) with structured MDPs [BDH99]. Feature MDPs [Hut09]. Concrete real-world problems can often be modeled as MDPs. For this purpose, a designer extracts relevant fea...

741 | Nonlinear Programming. Athena Scientific
- Bertsekas
- 1999

Citation Context: ...ed squared difference, where ρ is the stationary distribution. Even for a fixed policy, value iteration does not converge to the best approximation, but usually converges to a fixed point close to it [BT96]. Value iteration requires ρ explicitly. Since ρ is also too large to store, one has to approximate ρ as well. Another problem, as pointed out in [KP00], is that policy iteration may not converge, sin...
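The exact tabular case this snippet contrasts against can be sketched in a few lines. This is a minimal illustration with invented numbers (a hypothetical 2-state MDP under a fixed policy), not anything from the paper: the backup V ← r + γPV is a γ-contraction, so it converges to a unique fixed point; the problems the snippet describes only appear once V or ρ must be represented compactly.

```python
# Fixed-policy value iteration on a toy 2-state MDP (numbers invented).
gamma = 0.9
P = [[0.8, 0.2],          # hypothetical transition matrix under the policy
     [0.3, 0.7]]
r = [1.0, 0.0]            # hypothetical per-state rewards

def backup(V):
    """One Bellman backup for a fixed policy: V <- r + gamma * P V."""
    return [r[i] + gamma * sum(P[i][j] * V[j] for j in range(2))
            for i in range(2)]

V = [0.0, 0.0]
for _ in range(2000):     # gamma-contraction: converges geometrically
    V = backup(V)

# At the fixed point, V = backup(V) up to floating-point error.
residual = max(abs(V[i] - backup(V)[i]) for i in range(2))
```

With function approximation, the iterate instead settles merely *near* the best representable V, which is the point the snippet makes.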

637 | Approximating discrete probability distributions with dependence trees
- Chow, Liu
- 1968
Citation Context: ...the class label and u^i are the selected features. The improved Tree-Augmented naive Bayes (TAN) classifier [FGG97] could be used to model synchronous feature dependencies (i.e. within a time slice). The Chow-Liu [CL68] minimum spanning tree algorithm allows determining G in time O(m^3). A tree becomes a forest if we employ a lower threshold for the mutual information. Φ search is even harder than structure search...
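The Chow-Liu step mentioned here can be sketched as a maximum-weight spanning tree over pairwise empirical mutual information. A minimal illustration only: the function names and the Kruskal/union-find implementation are my own, and for fixed sample size the O(m²) pairwise MI table, not the tree construction, dominates.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between features i and j."""
    n = len(samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    pij = Counter((s[i], s[j]) for s in samples)
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu_edges(samples, m):
    """Maximum-weight spanning tree over m features, edge weights being
    pairwise mutual information (the Chow-Liu construction)."""
    weights = sorted(((mutual_information(samples, i, j), i, j)
                      for i, j in combinations(range(m), 2)), reverse=True)
    parent = list(range(m))        # union-find forest for Kruskal
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    edges = []
    for _, i, j in weights:        # greedily add heaviest non-cycle edges
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

On samples where feature 1 copies feature 0 and feature 2 is independent, the resulting tree keeps the (0, 1) edge.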

589 | Bayesian network classifiers
- Friedman, Geiger, et al.
- 1997
Citation Context: ...lly as in naive Bayes classification [Lew98] with feature selection, where x'_i represents the class label and u^i are the selected features. The improved Tree-Augmented naive Bayes (TAN) classifier [FGG97] could be used to model synchronous feature dependencies (i.e. within a time slice). The Chow-Liu [CL68] minimum spanning tree algorithm allows determining G in time O(m^3). A tree becomes a forest if...

457 | A model for reasoning about persistence and causation
- Dean, Kanazawa
- 1989

Citation Context: ...ur Φ-optimal ones, is clearly limited to relatively simple tasks. Real-world problems are structured and can often be represented by dynamic Bayesian networks (DBNs) with a reasonable number of nodes [DK89]. Bayesian networks in general and DBNs in particular are powerful tools for modeling and solving complex real-world problems. Advances in theory and increase in computation power constantly broaden t...

421 | Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
- Boutilier, Dean, et al.
- 1999
Citation Context: ...rkov Decision Processes (MDPs) assume that o_t and r_t only depend on a_{t-1} and o_{t-1} [SB98], POMDPs deal with Partially Observable MDPs [KLC98], and Dynamic Bayesian Networks (DBNs) with structured MDPs [BDH99]. Feature MDPs [Hut09]. Concrete real-world problems can often be modeled as MDPs. For this purpose, a designer extracts relevant features from the history (e.g. position and velocity of all objects), ...

347 | Naive (Bayes) at forty: The independence assumption in information retrieval
- Lewis
- 1998
Citation Context: ...nspired threshold for binary random variables is (1/(2n)) log n. Since the mutual information treats parents independently, T̂ has to be estimated accordingly, essentially as in naive Bayes classification [Lew98] with feature selection, where x'_i represents the class label and u^i are the selected features. The improved Tree-Augmented naive Bayes (TAN) classifier [FGG97] could be used to model synchronous f...

276 | Reinforcement learning with selective perception and hidden state
- McCallum
- 1995

173 | Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability
- Hutter

142 | The Minimum Description Length Principle
- Grünwald
- 2007

Citation Context: ...st” Φ. At any time n, the best Φ is the one that minimizes the Markov code length of s_1...s_n and r_1...r_n. This is reminiscent of, but actually quite different from, MDL, which minimizes model+data code length [Grü07]. Dynamic Bayesian networks. The use of “unstructured” MDPs [Hut09], even our Φ-optimal ones, is clearly limited to relatively simple tasks. Real-world problems are structured and can often be represe...
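The kind of criterion this snippet describes, coding a sequence by its empirical first-order Markov statistics plus a parameter penalty, can be sketched as follows. This is a hedged illustration of the general idea only, not the paper's actual Cost() definition; the function name and the BIC-style k/2·log₂n penalty are my own choices.

```python
import math
from collections import Counter

def markov_code_length(seq, alphabet_size):
    """Two-part code length (in bits) of a sequence under a first-order
    Markov model: negative log-likelihood under empirical transition
    frequencies, plus half a bit-budget of log2(n) per free parameter."""
    n = len(seq)
    trans = Counter(zip(seq, seq[1:]))      # transition counts n_{a->b}
    origin = Counter(seq[:-1])              # origin-state counts n_a
    neg_loglik = sum(c * math.log2(origin[a] / c)   # c * -log2 P(b|a)
                     for (a, b), c in trans.items())
    k = alphabet_size * (alphabet_size - 1)  # free transition parameters
    return neg_loglik + 0.5 * k * math.log2(n)
```

A perfectly predictable alternating sequence codes much shorter than a less regular one of the same length and alphabet, which is the sense in which the shortest code selects the "best" model.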

129 | Efficient Solution Algorithms for Factored MDPs
- Guestrin, Koller, et al.
- 2003
Citation Context: ...policy iteration may not converge, since different policies have different (misleading) stationary distributions. Koller and Parr [KP00] devised algorithms for general factored ρ, and Guestrin et al. [GKPV03] for max-norm, alleviating this problem. Finally, general policies cannot be stored exactly, and another restriction or approximation is necessary. Model-free learning. Given the difficulties above, I ...

94 | Computing Factored Value Functions for Policies in Structured MDPs
- Koller, Parr
- 1999
Citation Context: ...since one can always consider a DBN with the union of transition and reward dependencies. Usually it is assumed that the “global” reward is a sum of “local” rewards R^i_a(u^i, x'_i), one for each feature i [KP99]. For simplicity of exposition I assume that the local reward R^i only depends on the feature value x'_i and not on u^i and a. Even this is not restrictive and actually may be advantageous as discussed ...

72 | Policy Iteration for Factored MDPs
- Koller, Parr
- 2000
Citation Context: ...usually converges to a fixed point close to it [BT96]. Value iteration requires ρ explicitly. Since ρ is also too large to store, one has to approximate ρ as well. Another problem, as pointed out in [KP00], is that policy iteration may not converge, since different policies have different (misleading) stationary distributions. Koller and Parr [KP00] devised algorithms for general factored ρ, and Guestr...

69 | Online Planning Algorithms for POMDPs
- Ross, Pineau, et al.
- 2008
Citation Context: ...prediction is concerned with environments that do not react to the agent's actions (e.g. a weather-forecasting “action”) [Hut03], planning deals with the case where the environmental function is known [RPPCd08], classification and regression is for conditionally independent observations [Bis06], Markov Decision Processes (MDPs) assume that o_t and r_t only depend on a_{t-1} and o_{t-1} [SB98], POMDPs deal with Part...

68 | Efficient Reinforcement Learning in Factored MDPs
- Kearns, Koller
- 1999
Citation Context: ...omially optimal algorithms (Rmax, E3, OIM) for the exploration-exploitation dilemma. For model-based learning, extending E3 to DBNs is straightforward, but E3 needs an oracle for planning in a given DBN [KK99]. Recently, Strehl et al. [SDL07] accomplished the same for Rmax. They even learn the DBN structure, albeit in a very simplistic way. Algorithm OIM [SL08], which I described in [Hut09] for MDPs, can a...

47 | Efficient structure learning in factored-state MDPs. AAAI (pp. 645–650)
- Strehl, Diuk, et al.
- 2007
Citation Context: ...ch leads to a search space that is pseudo-polynomial in m. Heuristic structure search. We could also replace the well-founded criterion (3) by some heuristic. One such heuristic has been developed in [SDL07]. The mutual information is another popular criterion for determining the dependency of two random variables, so we could add j as a parent of feature i if the mutual information of x_j and x'_i is abo...
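The heuristic described here, adopting j as a parent of feature i when the empirical mutual information clears an MDL-inspired threshold, might be sketched like this. The function names are my own, and placing the (1/(2n)) log n threshold from the adjacent [Lew98] snippet here is an assumption for illustration.

```python
import math
from collections import Counter

def empirical_mi(pairs):
    """Empirical mutual information (nats) from (x_j, x'_i) sample pairs."""
    n = len(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    pxy = Counter(pairs)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def select_parents(data, i, m):
    """Make j a parent of feature i whenever the MI between x_j at time t
    and x_i at time t+1 exceeds an MDL-inspired threshold log(n)/(2n).
    `data` is a list of feature vectors over time (hypothetical layout)."""
    n = len(data) - 1
    threshold = math.log(n) / (2 * n)
    parents = []
    for j in range(m):
        pairs = [(data[t][j], data[t + 1][i]) for t in range(n)]
        if empirical_mi(pairs) > threshold:
            parents.append(j)
    return parents
```

Because each candidate parent is tested independently, this treats parents as the mutual information does, i.e. it ignores joint effects, which is exactly the limitation the surrounding snippets note.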

42 | Universal Intelligence: A Definition of Machine Intelligence
- Legg, Hutter
- 2007
Citation Context: ...states of an MDP and are assumed to be (approximately) Markov. Artificial General Intelligence (AGI) [GP07] is concerned with designing agents that perform well in a very large range of environments [LH07], including all of those mentioned above and more. In this general situation, it is not a priori clear what the useful features are. Indeed, any observation in the (far) past may be relevant in the...

24 | Optimality of universal Bayesian prediction for general loss and alphabet
- Hutter
- 2001

Citation Context: ...gent's objective is to maximize his reward. Environments. For example, sequence prediction is concerned with environments that do not react to the agent's actions (e.g. a weather-forecasting “action”) [Hut03], planning deals with the case where the environmental function is known [RPPCd08], classification and regression is for conditionally independent observations [Bis06], Markov Decision Processes (MDPs...

17 | The many faces of optimism: a unifying approach
- Szita, Lorincz
- 2008
Citation Context: ...3 needs an oracle for planning in a given DBN [KK99]. Recently, Strehl et al. [SDL07] accomplished the same for Rmax. They even learn the DBN structure, albeit in a very simplistic way. Algorithm OIM [SL08], which I described in [Hut09] for MDPs, can also likely be generalized to DBNs, and I can imagine a model-free version. 7 Incremental Updates. As discussed in Section 5, most search algorithms are loca...

6 | Feature Markov decision processes
- Hutter
- 2009
Citation Context: ...namic Bayesian Networks. Marcus Hutter, RSISE @ ANU and SML @ NICTA, Canberra, ACT, 0200, Australia. marcus@hutter1.net, www.hutter1.net. 24 December 2008. Abstract: Feature Markov Decision Processes (ΦMDPs) [Hut09] are well-suited for learning agents in general environments. Nevertheless, unstructured (Φ)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) ar...

4 | Universal search
- Gaglio

Citation Context: ...erating such expressions or programs with an appropriate bias towards simple ones is a universal feature generator that eventually finds the optimal feature map. The idea is known as Universal Search [Gag07]. 6 Value & Policy Learning in ΦDBN. Given an estimate Φ̂ of Φ_best, the next step is to determine a good action for our agent. I mainly concentrate on the difficulties one faces in adapting MDP algo...

2 | eds.: Artificial General Intelligence
- Goertzel, Pennachin
- 2007