Results 1 - 10 of 1,014
Average reward timed games
- In FORMATS: Formal Modeling and Analysis of Timed Systems, Lecture Notes in Computer Science 3829, 2005
"... Abstract. We consider real-time games where the goal consists, for each player, in maximizing the average amount of reward he or she receives per time unit. We consider zero-sum rewards, so that a reward of +r to one player corresponds to a reward of −r to the other player. The games are played on d ..."
Cited by 12 (2 self)
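One common way to write the objective this abstract describes (a generic formalization for orientation, not necessarily the paper's exact definition over timed runs): if R(t) denotes the total reward player 1 has accumulated by time t, then player 1 maximizes and player 2 minimizes the long-run average reward per time unit

    \liminf_{t \to \infty} \frac{R(t)}{t},

with the zero-sum convention that player 2's accumulated reward is -R(t).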
Hierarchical Average Reward Reinforcement Learning
"... Hierarchical reinforcement learning (HRL) is a general framework for scaling reinforcement learning (RL) to problems with large state and action spaces by using the task (or action) structure to restrict the space of policies. Prior work in HRL including HAMs, options, MAXQ, and PHAMs has been limit ..."
Cited by 4 (0 self)
... limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. The average reward optimality criterion has been recognized to be more appropriate for a wide class of continuing tasks than the discounted framework. Although average reward RL has been studied for decades, prior ...
General discounting versus average reward
- In Proc. 17th International Conf. on Algorithmic Learning Theory (ALT’06), volume 4264 of LNAI, 2006
"... Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (discounted value). We consider essentially arbit ..."
Cited by 11 (7 self)
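For orientation, with a reward sequence r_1, r_2, ... the two quantities compared in this abstract can be written as follows (geometric discounting shown for concreteness; the paper itself considers more general discount sequences, and the normalization is one common convention, not necessarily the paper's):

    U_{1m} = \frac{1}{m} \sum_{i=1}^{m} r_i,
    \qquad
    V_{k\gamma} = (1-\gamma) \sum_{i=k}^{\infty} \gamma^{\,i-k}\, r_i .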
Average reward optimization objective in partially . . .
"... Another example of a controlled system (see Sec. 4.2) Figure 3. The first two plots describe the behavior of the system which rotates the state by a → 10 ◦ and b → −10 ◦ , with the reset states aligned in a way that taking the opposite action brings the system to its topmost point. The x and y in th ..."
... in the plots represent possible actions such that x = y. The bottom plot demonstrates how the average reward changes as a function of α and β, where the policies having 2 hidden states (S ∈ {1, 2}) are parametrized as: ...
Optimizing Average Reward Using Discounted Rewards
"... 1 Introduction Sequential decision making problems are usually formulated as dynamic programming problems in which the agent must maximize some measure of future reward. In many domains, it is appropriate to optimize the average reward. Often, discounted formulations with a discount factor fl close ..."
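The standard fact that motivates this approach (a textbook Laurent-expansion result, stated here for orientation rather than as this paper's own derivation): for a unichain MDP with gain ρ and bias h, the discounted value satisfies

    V_\gamma(s) = \frac{\rho}{1-\gamma} + h(s) + o(1) \quad (\gamma \to 1),

so (1-\gamma)\, V_\gamma(s) \to \rho, which is why discounted methods with γ close to 1 can serve as surrogates for average-reward optimization.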
Average Reward Optimization Objective In Partially Observable Domains
"... We consider the problem of average reward optimization in domains with partial observability, within the modeling framework of linear predictive state representations (PSRs) (Littman et al., 2001). The key to average-reward computation is to have a welldefined stationary behavior of a system, so the ..."
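The PSR machinery discussed in these two entries is not reproduced here; as a minimal sketch of the underlying "well-defined stationary behavior → average reward" computation, the snippet below estimates the stationary distribution of a fixed policy's Markov chain and takes the expected reward under it. The transition matrix, rewards, and function names are illustrative, not taken from the paper.

import numpy as np

def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    # Power iteration for the fixed point pi = pi @ P of a row-stochastic matrix P
    # (assumes an ergodic chain so the fixed point exists and is unique).
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        if np.max(np.abs(nxt - pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Illustrative 3-state chain induced by some fixed policy, with per-state expected rewards.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.3, 0.0, 0.7]])
r = np.array([1.0, 0.0, 2.0])

pi = stationary_distribution(P)
print("stationary distribution:", pi)
print("average reward rho = pi . r =", float(pi @ r))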
Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results, 1996
"... This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous dyna ..."
Cited by 130 (13 self)
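The survey covers a wide spectrum of average-reward algorithms; as a single concrete illustration of the family, below is a minimal tabular update in the R-learning style (relative action values plus a running estimate of the average reward per step). The step sizes, the greedy-only rule for updating rho, and the interface are illustrative assumptions, not a faithful reproduction of any one algorithm from the paper.

import numpy as np

def r_learning_step(R, rho, s, a, reward, s_next, alpha=0.1, beta=0.01, greedy=True):
    # Relative action-value update: the target uses (reward - rho) instead of discounting.
    target = reward - rho + R[s_next].max()
    R[s, a] += alpha * (target - R[s, a])
    # The average-reward estimate is typically updated only on greedy (non-exploratory) steps.
    if greedy:
        rho += beta * (reward + R[s_next].max() - R[s].max() - rho)
    return R, rho

# Example usage with a toy 2-state, 2-action table.
R = np.zeros((2, 2))
rho = 0.0
R, rho = r_learning_step(R, rho, s=0, a=1, reward=1.0, s_next=1)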
Auto-exploratory Average Reward Reinforcement Learning
- Artificial Intelligence, 1996
"... We introduce a model-based average reward Reinforcement Learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning, which automatically explores the unexp ..."
Cited by 36 (10 self)
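H-learning itself (which learns the model online) is not reproduced here; as a rough sketch of the model-based average-reward setting it operates in, the snippet below runs relative value iteration on a known model to recover the gain rho and relative values h. The array layout, reference state, and iteration count are illustrative assumptions.

import numpy as np

def relative_value_iteration(P, r, n_iter=500, ref_state=0):
    # P[a] is the transition matrix under action a; r[a, s] is the expected reward for (s, a).
    h = np.zeros(P.shape[1])
    rho = 0.0
    for _ in range(n_iter):
        q = r + P @ h            # q[a, s] = r(s, a) + sum_s' P(s' | s, a) h(s')
        v = q.max(axis=0)        # greedy backup over actions
        rho = v[ref_state]       # gain estimate anchored at the reference state
        h = v - rho              # keep values relative to the reference state
    return rho, h

# Toy 2-state, 2-action model: action 0 stays (reward 0), action 1 moves to the other state (reward 1).
P = np.array([[[1.0, 0.0], [0.0, 1.0]],     # action 0
              [[0.0, 1.0], [1.0, 0.0]]])    # action 1
r = np.array([[0.0, 0.0],
              [1.0, 1.0]])
print(relative_value_iteration(P, r))   # gain 1.0: always moving earns 1 per step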
Hierarchically Optimal Average Reward Reinforcement Learning
- In Proceedings of the Nineteenth International Conference on Machine Learning, 2002
"... Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, or the optimal policy in the space de ned by a task hierarchy, and a weaker local model called recursive optimality. In this paper, we introduce two new average ..."
Cited by 11 (4 self)
... average-reward HRL algorithms for finding hierarchically optimal policies.
Complexity of probabilistic planning under average rewards
- In Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001
"... A general and expressive model of sequential decision making under uncertainty is provided by the Markov decision processes (MDPs) framework. Complex applications with very large state spaces are best modelled implicitly (instead of explicitly by enumerating the state space), for example as precondi ..."
Cited by 4 (0 self)
... as precondition-effect operators, the representation used in AI planning. This kind of representation is very powerful, and it makes the construction of policies/plans computationally very complex. In many applications, the average reward over unit time is the relevant rationality criterion, as opposed ...