An analytic solution to discrete Bayesian reinforcement learning (2006)

by Pascal Poupart, Nikos Vlassis, Jesse Hoey, Kevin Regan
Venue: ICML
Citations: 139 (8 self)
BibTeX

@INPROCEEDINGS{Poupart06ananalytic,
    author = {Pascal Poupart and Nikos Vlassis and Jesse Hoey and Kevin Regan},
    title = {An analytic solution to discrete Bayesian reinforcement learning},
    booktitle = {ICML},
    year = {2006}
}


Abstract

Reinforcement learning (RL) was originally proposed as a framework that allows agents to learn online as they interact with their environment. Existing RL algorithms fall short of this goal because the amount of exploration they require is often too costly and/or too time-consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process. Our two main contributions are the analytical derivation that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parameterization.
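The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' BEETLE implementation; the Dirichlet-count belief, the random stand-in "alpha functions", and all names below are illustrative assumptions. In the Bayesian model-based view, uncertainty over the unknown discrete transition model is tracked with Dirichlet counts per state-action pair, and the value at a belief is the upper envelope (pointwise max) of a set of functions of the belief parameters (linear here, in place of the paper's multivariate polynomials):

```python
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# Belief over the unknown transition model: one Dirichlet per (s, a),
# represented by its counts. A uniform prior is all-ones counts.
counts = np.ones((n_states, n_actions, n_states))

def update_belief(counts, s, a, s_next):
    """Bayesian belief update: observing s --a--> s_next adds one count."""
    counts = counts.copy()
    counts[s, a, s_next] += 1
    return counts

def mean_transition(counts):
    """Posterior mean of the transition probabilities under the belief."""
    return counts / counts.sum(axis=2, keepdims=True)

# Stand-in "alpha functions": random linear functionals of the flattened
# posterior-mean model (the paper uses multivariate polynomials over the
# belief parameters instead).
alphas = rng.standard_normal((4, n_states * n_actions * n_states))

def value_at_belief(counts, alphas):
    """Upper envelope: the value is the max over alpha functions."""
    phi = mean_transition(counts).ravel()
    return np.max(alphas @ phi)

counts = update_belief(counts, s=0, a=1, s_next=2)
v = value_at_belief(counts, alphas)
```

Point-based value iteration, in this framing, would back up such alpha functions only at a finite set of sampled belief points rather than over the whole belief simplex, which is what keeps the approach computationally tractable.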
