Results 1–10 of 16
A Robust Geometric Approach to Multi-Criterion Reinforcement Learning
Journal of Machine Learning Research, 2004
Cited by 15 (1 self)

Abstract:
We consider the problem of reinforcement learning in a dynamic environment, where the learning objective is defined in terms of multiple reward functions of the average-reward type. The environment is initially unknown, and furthermore may be affected by the actions of other agents, which are observed but cannot be predicted in advance. We model this situation through a stochastic (Markov) game model, between the learning agent and an arbitrary player, with vector-valued rewards. State recurrence conditions are imposed throughout. The objective of the learning agent is to have its long-term average reward vector belong to a desired target set. Starting with a given target set, we devise learning algorithms to achieve this task. These algorithms rely on learning algorithms for appropriately defined scalar rewards, together with the geometric insight of the theory of approachability for stochastic games. We then address the more general problem where the target set itself may depend on the model parameters, and hence is not known in advance to the learning agent. A particular case which falls into this framework is that of stochastic games with average reward constraints. Further specialization provides a reinforcement learning algorithm for constrained Markov decision processes. Some basic examples are provided to illustrate these results.
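The geometric idea behind such approachability-based algorithms, steering the running average reward toward its projection on the target set, can be sketched in a stateless (bandit) caricature. Everything below, the three actions, their mean reward vectors, the noise level, and the box-shaped target set, is an invented illustration, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three actions, each yielding a noisy 2-D reward with these means.
MEANS = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]])
TARGET_LO, TARGET_HI = np.array([0.4, 0.4]), np.array([0.8, 0.8])

def project_to_box(x):
    """Euclidean projection onto the (convex) box target set."""
    return np.clip(x, TARGET_LO, TARGET_HI)

avg = np.zeros(2)
for t in range(1, 5001):
    # Steering direction: from the current average toward its projection.
    direction = project_to_box(avg) - avg
    if np.allclose(direction, 0.0):
        a = rng.integers(3)                    # already inside: play anything
    else:
        a = int(np.argmax(MEANS @ direction))  # best response to the scalarized reward
    r = MEANS[a] + 0.1 * rng.standard_normal(2)
    avg += (r - avg) / t                       # running average of the reward vector

inside = np.all((avg >= TARGET_LO - 0.05) & (avg <= TARGET_HI + 0.05))
```

The scalarization step is where the paper's "learning algorithms for appropriately defined scalar rewards" would plug in for a genuine multi-state stochastic game.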
Learning All Optimal Policies with Multiple Criteria
Cited by 11 (0 self)

Abstract:
We describe an algorithm for learning in the presence of multiple criteria. Our technique generalizes previous approaches in that it can learn optimal policies for all linear preference assignments over the multiple reward criteria at once. The algorithm can be viewed as an extension to standard reinforcement learning for MDPs where, instead of repeatedly backing up maximal expected rewards, we back up the set of expected rewards that are maximal for some set of linear preferences (given by a weight vector w). We present the algorithm along with a proof of correctness showing that our solution gives the optimal policy for any linear preference function. The solution reduces to the standard value iteration algorithm for a specific weight vector w.
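The set-valued backup described above can be sketched for a tiny two-state, two-criterion MDP. The MDP itself and the weight-grid pruning (keeping only vectors that maximize w·v for some sampled w) are invented simplifications of the convex-hull operation such algorithms use:

```python
import numpy as np

GAMMA = 0.9
# Invented deterministic 2-state MDP: transitions[s][a] -> next state,
# rewards[s][a] -> 2-D reward vector.
transitions = {0: [1, 0], 1: [0, 1]}
rewards = {0: [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
           1: [np.array([0.5, 0.5]), np.array([0.0, 0.0])]}

def prune(vectors, n_weights=101):
    """Keep only vectors that maximize w . v for some w = (t, 1 - t)."""
    keep = set()
    for t in np.linspace(0.0, 1.0, n_weights):
        w = np.array([t, 1.0 - t])
        keep.add(int(np.argmax([w @ v for v in vectors])))
    return [vectors[i] for i in sorted(keep)]

# Value iteration over SETS of expected-reward vectors, one set per state.
V = {s: [np.zeros(2)] for s in transitions}
for _ in range(50):
    V = {s: prune([rewards[s][a] + GAMMA * v
                   for a in (0, 1) for v in V[transitions[s][a]]])
         for s in transitions}

# Any fixed preference then recovers its optimal scalarized value from the set.
w = np.array([0.5, 0.5])
best = max(w @ v for v in V[0])
```

With w = (0.5, 0.5) every reward in this toy MDP scalarizes to at most 0.5 per step, so `best` approaches 0.5/(1 − γ) = 5, as the single-weight value iteration mentioned in the abstract would compute directly.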
Risk-sensitive reinforcement learning applied to control under constraints
Journal of Artificial Intelligence Research, 2005
Cited by 10 (0 self)

Abstract:
In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are states that are undesirable or dangerous to enter. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that has good performance with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.
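The weighting-and-adaptation idea can be caricatured with a two-armed example in which the true values and risks are known. The arms, the threshold ω, and the step size below are invented; the point is only the loop structure: score each action as value − ξ·risk and raise ξ until the greedy choice is feasible:

```python
# Invented two-armed example: one high-value but risky action, one safe one.
ARMS = {"risky": {"value": 1.0, "risk": 0.2},   # enters an error state 20% of the time
        "safe":  {"value": 0.5, "risk": 0.0}}
OMEGA = 0.1   # user-specified risk threshold
xi = 0.0      # weight on the risk criterion, adapted online

for _ in range(100):
    # Greedy action under the weighted criterion value - xi * risk.
    greedy = max(ARMS, key=lambda a: ARMS[a]["value"] - xi * ARMS[a]["risk"])
    if ARMS[greedy]["risk"] <= OMEGA:
        break          # feasible for the constrained problem: stop adapting
    xi += 0.1          # greedy policy too risky: penalize risk more strongly
```

Here ξ grows until the safe arm overtakes the risky one (around ξ ≈ 2.5), at which point the greedy policy satisfies the risk constraint.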
Risk-Aware Decision Making and Dynamic Programming
Cited by 8 (0 self)

Abstract:
This paper considers sequential decision-making problems under uncertainty, the trade-off between the expected return and the risk of high loss, and methods that use dynamic programming to find optimal policies. It is argued that the use of the Bellman principle determines how risk considerations on the return can be incorporated. The discussion centers around returns generated by Markov Decision Processes, and the conclusions concern a large class of methods in Reinforcement Learning.
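A standard concrete instance of this argument: the certainty equivalent under exponential utility is one risk measure that does decompose recursively, because a deterministic immediate reward can be pulled out of the expectation. The two-step lottery below is an invented example, not taken from the paper:

```python
import math

LAM = -0.5   # negative exponent => risk-averse certainty equivalent

# Invented lotteries reached after an immediate reward of 1.0: (prob, reward).
lotteries = {"risky": [(0.5, 0.0), (0.5, 4.0)],   # mean 2.0, high spread
             "safe":  [(1.0, 1.8)]}               # sure 1.8

def cert_equiv(lottery):
    """Certainty equivalent under exponential utility u(x) = exp(LAM * x)."""
    m = sum(p * math.exp(LAM * r) for p, r in lottery)
    return math.log(m) / LAM

# Bellman-style evaluation: immediate reward + certainty equivalent of the rest.
recursive = {a: 1.0 + cert_equiv(lot) for a, lot in lotteries.items()}
# Direct evaluation of the whole two-step return ...
direct = {a: cert_equiv([(p, 1.0 + r) for p, r in lot])
          for a, lot in lotteries.items()}
# ... agrees, which is exactly the recursive structure the Bellman principle
# demands; the risk-averse agent prefers "safe" despite its lower mean.
```

A risk measure without such a decomposition (e.g., variance of the total return) cannot be backed up stage by stage in the same way, which is the restriction the abstract alludes to.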
Efficient QoS provisioning for adaptive multimedia in mobile communication networks by reinforcement learning
Mobile Netw. Appl., 2006
Cited by 4 (2 self)

Abstract:
The scarcity and large fluctuations of link bandwidth in wireless networks have motivated the development of adaptive multimedia services in mobile communication networks, where it is possible to increase or decrease the bandwidth of individual ongoing flows. This paper studies the issues of quality of service (QoS) provisioning in such systems. In particular, call admission control and bandwidth adaptation are formulated as a constrained Markov decision problem. The rapid growth in the number of states and the difficulty in estimating state transition probabilities in practical systems make it very difficult to employ classical methods to find the optimal policy. We present a novel approach that uses a form of discounted-reward reinforcement learning known as Q-learning to solve QoS provisioning for wireless adaptive multimedia. Q-learning does not require the explicit state transition model to solve the Markov decision problem; therefore, more general and realistic assumptions can be applied to the underlying system model for this approach than in previous schemes. Moreover, the proposed scheme can efficiently handle the large state space and action set of the wireless adaptive multimedia QoS provisioning problem. Handoff dropping probability and average allocated bandwidth are considered as QoS constraints in our model and can be guaranteed simultaneously. Simulation results demonstrate the effectiveness of the proposed scheme in adaptive multimedia mobile communication networks.
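The model-free Q-learning core that such schemes build on can be sketched on a toy admission-control chain. The capacity, departure probability, and revenue below are invented stand-ins for the paper's QoS model, and the constraint side (handoff dropping, bandwidth guarantees) is omitted; note the update never touches transition probabilities:

```python
import random

random.seed(0)
GAMMA, ALPHA, EPS = 0.8, 0.05, 0.2
N = 3                                    # invented link capacity (max flows)
Q = {(s, a): 0.0 for s in range(N + 1) for a in (0, 1)}  # 0 = reject, 1 = accept

def step(s, a):
    """Invented dynamics: accepting earns revenue 1; flows depart w.p. 0.3."""
    if a == 1 and s < N:
        s2, r = s + 1, 1.0
    else:
        s2, r = s, 0.0
    if s2 > 0 and random.random() < 0.3:
        s2 -= 1
    return s2, r

s = 0
for _ in range(50000):
    if random.random() < EPS:
        a = random.choice((0, 1))                      # explore
    else:
        a = max((0, 1), key=lambda x: Q[(s, x)])       # exploit
    s2, r = step(s, a)
    target = r + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])   # sampled, model-free backup
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    s = s2

policy = {st: max((0, 1), key=lambda x: Q[(st, x)]) for st in range(N + 1)}
```

With no constraint pressure in this toy model, accepting should dominate at states with spare capacity, which the learned greedy policy reflects.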
Reinforcement Learning for Call Admission Control and Routing under Quality of Service Constraints in Multimedia Networks
2000
Cited by 2 (0 self)

Abstract:
In this paper, we solve the call admission control and routing problem in multimedia networks via reinforcement learning (RL). The problem requires that network revenue be maximized while simultaneously meeting quality of service constraints that forbid entry into certain states and use of certain actions. The problem can be formulated as a constrained semi-Markov decision process. We show that RL provides a solution to this problem and is able to earn significantly higher revenues than alternative heuristics.
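One simple way to honor constraints that "forbid entry into certain states and use of certain actions" during learning is to mask them out of the action set before any greedy choice is made. The tiny routing graph and the forbidden sets below are invented placeholders:

```python
FORBIDDEN_STATES = {3}                  # states that must never be entered
FORBIDDEN_ACTIONS = {(2, "route_B")}    # (state, action) pairs that are banned

# Invented toy routing graph: (state, action) -> next state.
TRANSITIONS = {(0, "route_A"): 1, (0, "route_B"): 2,
               (1, "route_A"): 0, (1, "route_B"): 3,
               (2, "route_A"): 0, (2, "route_B"): 3}

def allowed(state):
    """Actions that are neither banned nor lead into a forbidden state."""
    return [a for a in ("route_A", "route_B")
            if (state, a) not in FORBIDDEN_ACTIONS
            and TRANSITIONS[(state, a)] not in FORBIDDEN_STATES]

acts = {s: allowed(s) for s in (0, 1, 2)}
```

An RL agent restricted to `allowed(state)` at every step can then maximize revenue over the feasible policies only, which is the shape of the constrained semi-Markov formulation the abstract describes.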
Linear fitted-Q iteration with multiple reward functions
Journal of Machine Learning Research
Cited by 2 (0 self)

Abstract:
We present a general and detailed development of an algorithm for finite-horizon fitted-Q iteration with an arbitrary number of reward signals and linear value function approximation using an arbitrary number of state features. This includes a detailed treatment of the 3-reward-function case using triangulation primitives from computational geometry and a method for identifying globally dominated actions. We also present an example of how our methods can be used to construct a real-world decision aid by considering symptom reduction, weight gain, and quality of life in sequential treatments for schizophrenia. Finally, we discuss future directions in which to take this work that will further enable our methods to make a positive impact on the field of evidence-based clinical decision support.
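The notion of a globally dominated action can be illustrated for two reward signals, where each action's Q-value is a line t·q₁ + (1 − t)·q₂ in the preference weight t, so an action survives only if its line touches the upper envelope somewhere on [0, 1]. The Q-vectors and the brute-force weight sweep below are an invented sketch, not the paper's triangulation method:

```python
import numpy as np

# Invented per-action Q-vectors (q1, q2) at one state for two reward signals.
Qvecs = {"a1": (3.0, 0.0),   # best when t is near 1
         "a2": (0.0, 3.0),   # best when t is near 0
         "a3": (2.0, 2.0),   # best for intermediate t
         "a4": (1.0, 1.0)}   # below the envelope for every t: dominated

undominated = set()
for t in np.linspace(0.0, 1.0, 1001):
    vals = {a: t * q1 + (1.0 - t) * q2 for a, (q1, q2) in Qvecs.items()}
    best = max(vals.values())
    # Every action achieving the envelope at this weight survives.
    undominated |= {a for a, v in vals.items() if np.isclose(v, best)}

dominated = set(Qvecs) - undominated
```

Actions like `a4` can be pruned from every backup, which is what makes identifying global domination worthwhile inside fitted-Q iteration.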
Transfer in variable-reward hierarchical reinforcement learning
DOI 10.1007/s10994-008-5061-y
Abstract:
Transfer learning seeks to leverage previously learned tasks to achieve faster learning in a new task. In this paper, we consider transfer learning in the context of related but distinct Reinforcement Learning (RL) problems. In particular, our RL problems are derived from Semi-Markov Decision Processes (SMDPs) that share the same transition dynamics but have different reward functions that are linear in a set of reward features. We formally define the transfer learning problem in the context of RL as learning an efficient algorithm to solve any SMDP drawn from a fixed distribution after experiencing a finite number of them. Furthermore, we introduce an online algorithm to solve this problem, Variable-Reward Reinforcement Learning (VRRL), that compactly stores the optimal value functions for several SMDPs and uses them to optimally initialize the value function for a new SMDP. We generalize our method to a hierarchical RL setting where the different SMDPs share the same task hierarchy. Our experimental results in a simplified real-time strategy domain show that significant transfer learning occurs in both flat and hierarchical settings. Transfer is especially effective in the hierarchical setting, where the overall value functions are decomposed into subtask value functions which are more widely amenable to transfer across different SMDPs.

Keywords: Hierarchical reinforcement learning · Transfer learning · Average-reward learning · Multi-criteria learning
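The transfer mechanism rests on a simple observation: if rewards are linear in features (r = w·φ), then a fixed policy's value function is linear in w too, so a stored per-policy table of expected discounted feature vectors can price any new weight vector immediately. The two states, two features, two stored policies, and all numbers below are invented for illustration:

```python
import numpy as np

# Invented stored tables: Phi[pi][s] = expected discounted reward-feature
# vector obtained by running stored policy pi from state s
# (2 states x 2 reward features, 2 previously solved tasks).
Phi = {"pi_1": np.array([[4.0, 1.0],
                         [3.0, 0.5]]),
       "pi_2": np.array([[1.0, 4.0],
                         [0.5, 3.0]])}

w_new = np.array([0.2, 0.8])   # weight vector defining the new task's rewards

# Initialize the new task's value function with the best stored policy's
# value at each state; learning then only has to improve on this.
V_init = np.max(np.stack([Phi[p] @ w_new for p in Phi]), axis=0)
```

Because the second reward feature dominates `w_new`, the initialization here comes from `pi_2` at both states; a genuinely new task would then refine these values online.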
Multi-agent Q-Learning: Preliminary Study on Dominance between the Nash and Stackelberg Equilibriums
Abstract:
Some game-theoretic approaches to multi-agent reinforcement learning in self-play, i.e. when the agents use the same algorithm for choosing actions, employ equilibria, such as the Nash equilibrium, to compute the policies of the agents. These approaches have so far been applied only to simple examples. In this paper, we present an extended version of Nash Q-Learning that uses the Stackelberg equilibrium to address a wider range of games than Nash Q-Learning alone. We show that mixing the Nash and Stackelberg equilibria can lead to better rewards not only in static games but also in stochastic games. Moreover, we apply the algorithm to a real-world example, the automated vehicle coordination problem.
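One building block of such algorithms is computing the Stackelberg solution of a stage game: the leader commits to an action while anticipating the follower's best response. The bimatrix game below is invented, chosen so that commitment strictly beats the game's unique pure Nash equilibrium, mirroring the dominance the abstract studies:

```python
import numpy as np

A = np.array([[2, 4],
              [1, 3]])   # leader (row player) payoffs, invented
B = np.array([[1, 0],
              [0, 2]])   # follower (column player) payoffs, invented

def stackelberg(A, B):
    """Pure-strategy Stackelberg: leader picks the row whose payoff is best
    once the follower best-responds to that row."""
    i = max(range(A.shape[0]), key=lambda r: A[r, int(np.argmax(B[r]))])
    return i, int(np.argmax(B[i]))

i, j = stackelberg(A, B)
# In this game the unique pure Nash equilibrium is (row 0, column 0), giving
# the leader only A[0, 0] = 2, while committing to row 1 induces column 1
# and earns A[1, 1] = 3: the commitment advantage.
```

A Stackelberg-flavored Q-learning variant would apply this selection to the learned stage-game Q-matrices at each state instead of the Nash operator.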