Constraint Relaxation in Approximate Linear Programs
Abstract

Cited by 13 (4 self)
Approximate Linear Programming (ALP) is a reinforcement learning technique with nice theoretical properties, but it often performs poorly in practice. We identify some reasons for the poor quality of ALP solutions in problems where the approximation induces virtual loops. We then introduce two methods for improving solution quality. One method rolls out selected constraints of the ALP, guided by the dual information. The second method is a relaxation of the ALP, based on external penalty methods. The latter method is applicable in domains in which rolling out constraints is impractical. Both approaches show promising empirical results for simple benchmark problems as well as for a realistic blood inventory management problem.
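The external-penalty idea the abstract refers to can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the paper's ALP relaxation: the objective, the constraint, the penalty weight `rho`, and the grid-search solver are all illustrative. The key move is the same, though: a hard constraint is replaced by a penalty on its violation, so the relaxed problem is unconstrained and feasibility is recovered as the penalty weight grows.

```python
# Sketch of an exterior (external) penalty relaxation on a toy problem:
# minimize x^2 subject to x >= 1, relaxed to an unconstrained problem.
# The objective, constraint, and solver are illustrative assumptions.

def penalized(x, rho):
    objective = x * x               # minimize x^2 ...
    violation = max(0.0, 1.0 - x)   # ... subject to x >= 1 (violation if x < 1)
    return objective + rho * violation ** 2

def solve(rho, lo=0.0, hi=2.0, steps=2000):
    # Crude grid search; a real implementation would exploit the
    # linear-programming structure instead.
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(xs, key=lambda x: penalized(x, rho))
```

For this toy problem the penalized minimizer is rho / (1 + rho), so it approaches the constrained optimum x = 1 from the infeasible side as rho grows, which is the characteristic behavior of exterior penalty methods.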
Finding acceptable solutions faster using inadmissible information
, 2010
Abstract

Cited by 9 (3 self)
Bounded suboptimal search algorithms attempt to find a solution quickly while guaranteeing that its cost does not exceed the optimal cost by more than a desired factor. These algorithms generally use a single admissible heuristic both for guidance and for guaranteeing solution quality. We present a new approach to bounded suboptimal search that separates these roles, consulting multiple sources of potentially inadmissible information to determine search order and using admissible information to guarantee quality. An empirical evaluation across six benchmark domains shows the new approach has better overall performance.

Explicit Estimation Search (EES). The objective of bounded suboptimal search, finding a solution within the bound as quickly as possible, suggests the following search order: among all nodes that appear to be on a path to a solution within the bound, expand the node that seems closest to a goal. EES follows this principle as directly as possible while strictly guaranteeing the bound. To accomplish this, EES uses ĥ, an unbiased estimate of the cost-to-go, as opposed to h, a lower bound, as well as d̂, an estimate of the number of actions to go. EES relies on two node evaluation functions, f and f̂. f(n) = g(n) + h(n) is the traditional cost function of A* (Hart, Nilsson, and Raphael 1968) and provides a lower bound on the cost of an optimal solution through n. f̂(n) = g(n) + ĥ(n) is an unbiased estimate of the cost of the best solution through n. ĥ and d̂ can be supplied by the user, they may be constructed by correcting h and d during the search (Thayer, Ruml, and Bitton 2008), or they may be constructed using offline techniques before search begins (Samadi, Felner, and Schaeffer 2008). EES expands one of the following nodes:

f_min = argmin_{n ∈ open} f(n)
best_f̂ = argmin_{n ∈ open} f̂(n)
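The node-selection rule described above can be sketched in code. This is a hedged illustration based on the description, not the paper's implementation: the `Node` fields, the suboptimality weight `w`, the focal-list definition, and the fallback order among the candidate nodes are assumptions filled in for the sketch.

```python
# Sketch of EES-style node selection among f_min, best_f̂, and a
# distance-nearest node, preserving a w-suboptimality bound.
# Fields and selection order are illustrative assumptions.

class Node:
    def __init__(self, g, h, h_hat, d_hat):
        self.g = g          # cost incurred so far
        self.h = h          # admissible lower bound on cost-to-go
        self.h_hat = h_hat  # unbiased estimate of cost-to-go
        self.d_hat = d_hat  # estimated number of actions to go

    @property
    def f(self):            # lower bound on solution cost through n
        return self.g + self.h

    @property
    def f_hat(self):        # unbiased estimate of solution cost through n
        return self.g + self.h_hat

def select_node(open_list, w):
    """Pick the next node to expand while respecting the bound w."""
    f_min  = min(open_list, key=lambda n: n.f)      # best lower bound
    best_f = min(open_list, key=lambda n: n.f_hat)  # best estimated cost
    # Nodes whose estimated cost is within the bound of the best estimate;
    # among these, prefer the one that seems closest to a goal.
    focal  = [n for n in open_list if n.f_hat <= w * best_f.f_hat]
    best_d = min(focal, key=lambda n: n.d_hat)
    # Prefer the apparently nearest in-bound goal, falling back so the
    # proof of the w-suboptimality bound still goes through.
    if best_d.f_hat <= w * f_min.f:
        return best_d
    if best_f.f_hat <= w * f_min.f:
        return best_f
    return f_min
```

With a loose bound the rule chases the node that looks closest to a goal; as the bound tightens it degrades toward expanding f_min, i.e., A*-like behavior.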
Representation Discovery in Sequential Decision Making
Abstract

Cited by 4 (0 self)
Automatically constructing novel representations of tasks from analysis of state spaces is a longstanding fundamental challenge in AI. I review recent progress on this problem for sequential decision making tasks modeled as Markov decision processes. Specifically, I discuss three classes of representation discovery problems: finding functional, state, and temporal abstractions. I describe solution techniques varying along several dimensions: diagonalization or dilation methods using approximate or exact transition models; reward-specific vs. reward-invariant methods; global vs. local representation construction methods; multiscale vs. flat discovery methods; and finally, orthogonal vs. redundant representation discovery methods. I conclude by describing a number of open problems for future work.