Results 1 - 10 of 1,014

Average reward timed games

by Bo Adler, Luca De Alfaro, Marco Faella - In FORMATS: Formal Modeling and Analysis of Timed Systems, Lecture Notes in Computer Science 3829, 2005
"... We consider real-time games where the goal consists, for each player, in maximizing the average amount of reward he or she receives per time unit. We consider zero-sum rewards, so that a reward of +r to one player corresponds to a reward of −r to the other player. The games are played on ..."
Cited by 12 (2 self)

Hierarchical Average Reward Reinforcement Learning

by Mohammad Ghavamzadeh, Sridhar Mahadevan, Michael Littman
"... Hierarchical reinforcement learning (HRL) is a general framework for scaling reinforcement learning (RL) to problems with large state and action spaces by using the task (or action) structure to restrict the space of policies. Prior work in HRL including HAMs, options, MAXQ, and PHAMs has been limit ..."
Abstract - Cited by 4 (0 self) - Add to MetaCart
limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. The average reward optimality criterion has been recognized to be more appropriate for a wide class of continuing tasks than the discounted framework. Although average reward RL has been studied for decades, prior

General discounting versus average reward

by Marcus Hutter - In Proc. 17th International Conf. on Algorithmic Learning Theory (ALT’06), volume 4264 of LNAI, 2006
"... Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (discounted value). We consider essentially arbitrary ..."
Cited by 11 (7 self)
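
For reference, one natural way to write the two quantities named in this snippet is given below. The snippet does not show the paper's exact definitions, so the notation and the normalizations (dividing by m, and by the remaining discount mass) are assumptions made purely for illustration.

\[
U_{1:m} \;=\; \frac{1}{m}\sum_{i=1}^{m} r_i ,
\qquad
V_{k}^{\gamma} \;=\; \frac{1}{\Gamma_k}\sum_{i=k}^{\infty} \gamma_i\, r_i ,
\qquad
\Gamma_k \;=\; \sum_{i=k}^{\infty} \gamma_i ,
\]

where r_i is the reward received in cycle i and (γ_i) is a summable discount sequence; geometric discounting corresponds to γ_i = γ^i.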

Average reward optimization objective in partially . . .

by n.n.
"... Another example of a controlled system (see Sec. 4.2) Figure 3. The first two plots describe the behavior of the system which rotates the state by a → 10 ◦ and b → −10 ◦ , with the reset states aligned in a way that taking the opposite action brings the system to its topmost point. The x and y in th ..."
Abstract - Add to MetaCart
in the plots represent possible actions such that x = y. The bottom plot demonstrates how the average reward changes as a function of α and β, where the policies having 2 hidden states (S ∈ {1, 2}) are parametrized as:

Optimizing Average Reward Using Discounted Rewards

by unknown authors
"... 1 Introduction Sequential decision making problems are usually formulated as dynamic programming problems in which the agent must maximize some measure of future reward. In many domains, it is appropriate to optimize the average reward. Often, discounted formulations with a discount factor fl close ..."
Abstract - Add to MetaCart
1 Introduction Sequential decision making problems are usually formulated as dynamic programming problems in which the agent must maximize some measure of future reward. In many domains, it is appropriate to optimize the average reward. Often, discounted formulations with a discount factor fl close
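
The point of this snippet, that a discounted formulation with a discount factor γ near 1 can stand in for the average-reward criterion, can be illustrated numerically. The sketch below evaluates a fixed policy on a small Markov reward process (the transition matrix and rewards are invented for illustration) and shows that (1 − γ)·V_γ approaches the same long-run average reward from every start state as γ → 1.

import numpy as np

# Hypothetical 2-state Markov reward process under a fixed policy (illustration only).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition probabilities
r = np.array([1.0, 5.0])     # per-step reward in each state

# Discounted value V_gamma = (I - gamma*P)^{-1} r.  As gamma -> 1, the scaled value
# (1 - gamma) * V_gamma(s) approaches the long-run average reward, independently of s.
for gamma in (0.9, 0.99, 0.999, 0.9999):
    V = np.linalg.solve(np.eye(len(r)) - gamma * P, r)
    print(f"gamma={gamma}:  (1 - gamma) * V = {(1 - gamma) * V}")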

Average Reward Optimization Objective In Partially Observable Domains

by Yuri Grinberg, Doina Precup
"... We consider the problem of average reward optimization in domains with partial observability, within the modeling framework of linear predictive state representations (PSRs) (Littman et al., 2001). The key to average-reward computation is to have a welldefined stationary behavior of a system, so the ..."
Abstract - Add to MetaCart
We consider the problem of average reward optimization in domains with partial observability, within the modeling framework of linear predictive state representations (PSRs) (Littman et al., 2001). The key to average-reward computation is to have a welldefined stationary behavior of a system, so
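
The role of a well-defined stationary behavior is easiest to see in the fully observable case (an ordinary Markov chain rather than the paper's PSR setting): when the chain has a unique stationary distribution π, the average reward is simply the π-weighted reward. A minimal sketch, with an invented chain and rewards:

import numpy as np

# Illustrative Markov chain (not from the paper).  The long-run average reward is
# well defined because the chain has a unique stationary distribution pi = pi @ P.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])
r = np.array([0.0, 1.0, 4.0])   # per-step reward in each state (assumption)

# Solve pi @ P = pi together with sum(pi) = 1 as one linear least-squares system.
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
b = np.concatenate([np.zeros(n), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

rho = pi @ r   # average reward per step
print("stationary distribution:", pi, " average reward:", rho)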

Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results

by Sridhar Mahadevan, 1996
"... This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous dynamic programming ..."
Cited by 130 (13 self)
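
As a concrete example of the kind of synchronous dynamic-programming method this survey covers, the sketch below runs relative value iteration on a tiny invented MDP. It illustrates the average-reward (gain) criterion in general and is not a reproduction of any specific algorithm from the paper.

import numpy as np

# Tiny invented MDP: 2 states, 2 actions.  P[a, s, s'] and R[a, s] are assumptions.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],    # transitions under action 0
              [[0.1, 0.9], [0.6, 0.4]]])   # transitions under action 1
R = np.array([[ 5.0, -1.0],                # rewards under action 0
              [10.0,  1.0]])               # rewards under action 1

# Relative value iteration: h <- max_a [R + P h], re-centered at a reference state.
# The value subtracted at the reference state converges to the optimal gain rho.
h = np.zeros(2)
ref = 0                                     # reference state
for _ in range(500):
    Q = R + np.einsum('asx,x->as', P, h)    # Q[a, s] = R[a, s] + sum_x P[a, s, x] h[x]
    new_h = Q.max(axis=0)
    rho = new_h[ref]
    h = new_h - rho

print("estimated average reward (gain):", rho)
print("greedy policy (action per state):", Q.argmax(axis=0))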

Auto-exploratory Average Reward Reinforcement Learning

by Dokyeong Ok, Prasad Tadepalli - Artificial Intelligence, 1996
"... We introduce a model-based average reward Reinforcement Learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning, which automatically explores the unexplored ..."
Cited by 36 (10 self)

Hierarchically Optimal Average Reward Reinforcement Learning

by Mohammad Ghavamzadeh, Sridhar Mahadevan - In Proceedings of the Nineteenth International Conference on Machine Learning, 2002
"... Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, or the optimal policy in the space defined by a task hierarchy, and a weaker local model called recursive optimality. In this paper, we introduce two new average-reward HRL algorithms for finding hierarchically optimal policies. ..."
Cited by 11 (4 self)

Complexity of probabilistic planning under average rewards

by Jussi Rintanen - In Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001
"... A general and expressive model of sequential decision making under uncertainty is provided by the Markov decision processes (MDPs) framework. Complex applications with very large state spaces are best modelled implicitly (instead of explicitly by enumerating the state space), for example as precondition-effect operators, the representation used in AI planning. This kind of representation is very powerful, and it makes the construction of policies/plans computationally very complex. In many applications, average reward over unit time is the relevant rationality criterion, as opposed ..."
Cited by 4 (0 self)