
## Coarticulation: An approach for generating concurrent plans in Markov decision processes (2005)


### Download Links

- [imls.engr.oregonstate.edu]
- [www.machinelearning.org]
- DBLP


Venue: In Proceedings of the 22nd International Conference on Machine Learning (ICML 2005)

Citations: 6 (1 self)

### Citations

1235 | Reinforcement learning: An introduction.
- Sutton, Barto
- 1998
Citation Context ...assume that ∀a_i ∈ a, |Dom(a_i)| = d. We further assume that the optimal state-action value function Q* associated with the controller C is approximated using linear function approximation techniques (Sutton & Barto, 1998) and admits the following linear additive form:

Q*(s, {a_i}_{i=1}^n) ≈ Q̄*(s, {a_i}_{i=1}^n) = Σ_{i=1}^m Q_i(s, u_i)   (3)

where each Q_i(s, u_i) is a local function defined over states and a subset of action ... |
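The linear additive form in the context above can be made concrete with a small sketch. All names and the toy local Q functions below are illustrative assumptions, not the paper's code:

```python
# Sketch: a joint state-action value approximated as a sum of local terms,
# Q-bar(s, a) = sum_i Q_i(s, u_i), where u_i is the subset of action
# variables that local function i depends on. Everything here is illustrative.

def q_bar(s, action, local_qs, scopes):
    """Sum local Q functions; local_qs[i] takes (s, projected action tuple)."""
    total = 0.0
    for q_i, scope in zip(local_qs, scopes):
        u_i = tuple(action[v] for v in scope)  # project joint action onto scope i
        total += q_i(s, u_i)
    return total

# Toy example: three action variables, two overlapping local functions.
scopes = [("a0", "a1"), ("a1", "a2")]
local_qs = [
    lambda s, u: 1.0 if u == (1, 0) else 0.0,
    lambda s, u: 2.0 if u == (0, 1) else 0.0,
]
action = {"a0": 1, "a1": 0, "a2": 1}
print(q_bar("s0", action, local_qs, scopes))  # 3.0
```

The key point the decomposition buys is that each local term only inspects a small projection of the joint action, which is what makes the elimination-style action selection discussed in later contexts tractable.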

294 | Bucket elimination: A unifying framework for probabilistic inference.
- Dechter
- 1996
Citation Context ...ulation: An Approach for Generating Concurrent Plans O(n d^{|w|} (k h log(h) + h·d)) = O(n k d^{|w|} h log(h)) for Algorithm 1. This complexity is logarithmic in h, and exponential in the network width (Dechter, 1999) induced by the structure of the approximate state-action value function. [Figure 2, garbled in extraction: variable-elimination tables T_i over the d values of Dom(a_i), with the top h values extracted at each step.] ... |

129 | Multi-criteria reinforcement learning.
- Gábor, Kalmár, et al.
- 1998
Citation Context ...policy that simultaneously commits to the largest possible number of subgoals according to their degree of significance. We can think of this problem as a multi-criterion reinforcement learning problem (Gábor et al., 1998), in which the reward signal is a vector whose elements are the rewards associated with the controllers, and a lexicographical ordering of such reward vectors is defined according to the priority or... |
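The lexicographic ordering of reward vectors described in this context can be sketched as follows; the comparison function and the example vectors are illustrative assumptions:

```python
# Sketch: lexicographic comparison of controller reward vectors. Element i
# holds the reward of controller C_i, with a lower index meaning higher
# priority. Illustrative only.

def lex_better(r1, r2):
    """True if r1 strictly dominates r2 in lexicographic order."""
    for a, b in zip(r1, r2):
        if a != b:
            return a > b
    return False  # equal vectors: neither dominates

# A large reward for a low-priority controller cannot outweigh even a small
# advantage for a high-priority one.
print(lex_better([1.0, 0.0], [0.5, 9.9]))  # True
```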

113 | Coordinated reinforcement learning.
- Guestrin, Lagoudakis, et al.
- 2002
Citation Context ...thm in spirit similar to the variable elimination algorithm in Bayesian networks and efficiently compute Γ^h_a Q̄*(s, a). Our approach is inspired by the action selection algorithm introduced in (Guestrin et al., 2002) that actually solves the special case for h = 1 (i.e., Γ^1_a), which is the max_a operator. It is also closely related to the problem of finding the h most probable configurations in probabilistic ex... |
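The h = 1 special case (the max_a operator over a sum of local Q functions) can be sketched by brute-force enumeration. A real implementation would eliminate action variables one at a time, as in the variable-elimination algorithm this context references, but enumeration returns the same maximum on small factored action spaces. All names and toy functions are illustrative:

```python
from itertools import product

def argmax_joint_action(s, local_qs, scopes, domains, variables):
    """Return (best value, best joint action) for sum_i Q_i(s, u_i)."""
    best_val, best_a = float("-inf"), None
    for values in product(*(domains[v] for v in variables)):
        a = dict(zip(variables, values))  # one candidate joint action
        val = sum(q(s, tuple(a[v] for v in scope))
                  for q, scope in zip(local_qs, scopes))
        if val > best_val:
            best_val, best_a = val, a
    return best_val, best_a

# Two binary action variables and two overlapping local functions.
domains = {"a0": [0, 1], "a1": [0, 1]}
local_qs = [lambda s, u: float(u == (1,)), lambda s, u: 2.0 * (u == (1, 1))]
scopes = [("a0",), ("a0", "a1")]
val, act = argmax_joint_action("s0", local_qs, scopes, domains, ["a0", "a1"])
print(val, act)  # 3.0 {'a0': 1, 'a1': 1}
```

Enumeration costs d^n joint actions; exploiting the local scopes via variable elimination is what reduces this to the width-dependent bound quoted in the Dechter context above.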

77 | An efficient algorithm for finding the M most probable configurations in probabilistic expert systems.
- Nilsson
- 1998
Citation Context ...y solves the special case for h = 1 (i.e., Γ^1_a), which is the max_a operator. It is also closely related to the problem of finding the h most probable configurations in probabilistic expert systems (Nilsson, 1998). The general idea is, rather than summing all local functions and then performing the Γ^h_a operator, we perform it over variables one at a time, using only summands that involve the eliminated varia... |
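Generalizing from max_a to the Γ^h_a operator (the h best joint actions rather than the single best) can likewise be sketched by enumeration plus a heap; the names and toy functions below are illustrative assumptions:

```python
import heapq
from itertools import product

def top_h_actions(s, local_qs, scopes, domains, variables, h):
    """Return the h highest-scoring joint actions under sum_i Q_i(s, u_i)."""
    scored = []
    for values in product(*(domains[v] for v in variables)):
        a = dict(zip(variables, values))
        val = sum(q(s, tuple(a[v] for v in scope))
                  for q, scope in zip(local_qs, scopes))
        scored.append((val, a))
    return heapq.nlargest(h, scored, key=lambda t: t[0])

domains = {"a0": [0, 1], "a1": [0, 1]}
local_qs = [lambda s, u: float(u[0]), lambda s, u: 0.5 * u[0]]
scopes = [("a0",), ("a1",)]
best_two = top_h_actions("s0", local_qs, scopes, domains, ["a0", "a1"], 2)
print([v for v, _ in best_two])  # [1.5, 1.0]
```

As with the max_a sketch, an efficient version performs the top-h selection variable by variable during elimination, in the spirit of Nilsson's M-most-probable-configurations algorithm, instead of enumerating all d^n joint actions.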

76 | How to dynamically merge Markov decision processes.
- Singh, Cohn
- 1998
Citation Context ...roblem is required. Most of the related work in the context of Markov decision processes assumes that the subprocesses modeling the activities are additive utility independent (Boutilier et al., 1997; Singh & Cohn, 1998; Guestrin & Gordon, 2002) and does not address concurrent planning with a set of learned activities modeled as temporally extended actions. In contrast, we focus on problems where the overall utility fu... |

41 | Prioritized goal decomposition of Markov decision processes: Toward a synthesis of classical and decision-theoretic planning.
- Boutilier, Brafman, et al.
- 1997
Citation Context ...r the action selection problem is required. Most of the related work in the context of Markov decision processes assumes that the subprocesses modeling the activities are additive utility independent (Boutilier et al., 1997; Singh & Cohn, 1998; Guestrin & Gordon, 2002) and does not address concurrent planning with a set of learned activities modeled as temporally extended actions. In contrast, we focus on problems where th... |

38 | A hybrid architecture for adaptive robot control.
- Huber
- 2000
Citation Context ...ority order of the controllers. For specifying the order of priority relation among the controllers we use the expression Cj ⊳ Ci, where the relation “⊳” expresses the subject-to relation (following (Huber, 2000)). This expression should read: controller Cj is subject to subtask Ci. A priority ranking system is then specified by a set of relations {Cj ⊳ Ci}. Without loss of generality we assume that the controlle... |
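Deriving a total priority ranking from a set of pairwise subject-to relations {Cj ⊳ Ci} can be sketched with a topological sort; the controller names and edges below are illustrative assumptions:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each pair (cj, ci) encodes Cj ⊳ Ci: controller Cj is subject to Ci,
# so Ci ranks higher and must appear earlier in the ordering.
relations = [("C2", "C1"), ("C3", "C1"), ("C3", "C2")]

ts = TopologicalSorter()
for cj, ci in relations:
    ts.add(cj, ci)  # ci precedes cj in the ranking
ranking = list(ts.static_order())
print(ranking)  # ['C1', 'C2', 'C3'] (highest priority first)
```

A topological sort also surfaces inconsistent priority specifications: a cycle in the ⊳ relations raises an error instead of producing a ranking.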

33 | Distributed planning in hierarchical factored MDPs.
- Guestrin, Gordon
- 2002
Citation Context ...Most of the related work in the context of Markov decision processes assumes that the subprocesses modeling the activities are additive utility independent (Boutilier et al., 1997; Singh & Cohn, 1998; Guestrin & Gordon, 2002) and does not address concurrent planning with a set of learned activities modeled as temporally extended actions. In contrast, we focus on problems where the overall utility function may be expressed a... |

9 | Temporal abstraction in reinforcement learning. Doctoral dissertation.
- Precup
- 2000
Citation Context ...y a set of minimum cost-to-goal ɛ-redundant controllers ζ = {C_i}_{i=1}^n. Each controller is designed to achieve a subgoal ω_i from a set of subgoals Ω = {ω_i}_{i=1}^n, and is modeled as a subgoal option (Precup, 2000) defined over an MDP M = 〈S, A, R, P〉. A controller is ɛ-redundant if it admits multiple optimal or ɛ-ascending policies. A policy π is ɛ-ascending if it satisfies the following conditions: 1. Ascend... |

5 | Coarticulation in Markov decision processes.
- Rohanimanesh, Platt, et al.
- 2004
Citation Context ...ignals (e.g., primitive actions in MDPs) for controlling the DOF in the system. In this paper we study an approach for generating concurrent plans based on the coarticulation framework introduced in (Rohanimanesh et al., 2004). We demonstrate how this approach can cope with the curse of dimensionality incurred in systems with excess degrees of freedom, and that it can be viewed as one natural way for generating concurrent... |