Results 1–10 of 13
Multi-criteria Reinforcement Learning
, 1998
Abstract

Cited by 19 (0 self)
We consider multi-criteria sequential decision making problems where the vector-valued evaluations are compared by a given, fixed total ordering. Conditions for the optimality of stationary policies and the Bellman optimality equation are given. The analysis requires special care as the topology introduced by pointwise convergence and the order topology introduced by the preference order are in general incompatible. Reinforcement learning algorithms are proposed and analyzed. Preliminary computer experiments confirm the validity of the derived algorithms. It is observed that in the medium term multi-criteria RL often converges to better solutions (measured by the first criterion) than its single-criterion counterparts. These types of multi-criteria problems are most useful when there are several optimal solutions to a problem and one wants to choose the one among these which is optimal according to another fixed criterion. Example applications include alternating games, when in addition...
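The fixed total ordering described above can be illustrated with a lexicographic comparison of value vectors: the first criterion decides, and later criteria break ties. The following is a minimal illustrative sketch, not the paper's algorithm; the function names are assumptions.

```python
# Illustrative sketch: comparing vector-valued evaluations under a fixed
# total ordering (here: lexicographic), where ties on the first criterion
# are broken by the second, and so on.

def lex_better(u, v):
    """Return True if vector u strictly precedes v lexicographically."""
    for a, b in zip(u, v):
        if a != b:
            return a > b  # higher value preferred on the first differing criterion
    return False  # equal vectors: neither is strictly better

def lex_max(vectors):
    """Pick the lexicographically greatest value vector (e.g. for greedy action choice)."""
    best = vectors[0]
    for v in vectors[1:]:
        if lex_better(v, best):
            best = v
    return best
```

For example, `lex_max([(1.0, 5.0), (1.0, 7.0), (0.9, 9.0)])` selects `(1.0, 7.0)`: the third vector loses on the first criterion despite its high second component, and the tie between the first two is broken by the second criterion.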
Constrained Discounted Dynamic Programming
 Mathematics of Operations Research
, 1996
Abstract

Cited by 18 (8 self)
This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semicontinuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, under several inequality constraints on similar criteria with other reward functions.
Multi-Objective Model Checking of Markov Decision Processes
Abstract

Cited by 18 (11 self)
We study and provide efficient algorithms for multi-objective model checking problems for Markov Decision Processes (MDPs). Given an MDP, M, and given multiple linear-time (ω-regular or LTL) properties ϕi, and probabilities ri ∈ [0, 1], i = 1, ..., k, we ask whether there exists a strategy σ for the controller such that, for all i, the probability that a trajectory of M controlled by σ satisfies ϕi is at least ri. We provide an algorithm that decides whether there exists such a strategy and if so produces it, and which runs in time polynomial in the size of the MDP. Such a strategy may require the use of both randomization and memory. We also consider more general multi-objective ω-regular queries, which we motivate with an application to assume-guarantee compositional reasoning for probabilistic systems. Note that there can be trade-offs between different properties: satisfying property ϕ1 with high probability may necessitate satisfying ϕ2 with low probability. Viewing this as a multi-objective optimization problem, we want information about the “trade-off curve” or Pareto curve for maximizing the probabilities of different properties. We show that one can compute an approximate Pareto curve with respect to a set of ω-regular properties in time polynomial in the size of the MDP. Our quantitative upper bounds use LP methods. We also study qualitative multi-objective model checking problems, and we show that these can be analysed by purely graph-theoretic methods, even though the strategies may still require both randomization and memory.
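The Pareto curve mentioned above consists of the nondominated achievable probability vectors, one coordinate per property. As a small illustration of the dominance relation involved (not the paper's LP-based algorithm), a finite set of candidate probability vectors can be thinned to its nondominated subset; the function names are assumptions.

```python
# Illustrative sketch: Pareto-dominance filtering over probability vectors,
# one coordinate per property. This thins a finite candidate set to the
# nondominated points that would lie on a (discretized) Pareto curve.

def dominates(p, q):
    """p dominates q if p is at least as good in every coordinate and strictly better in one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def pareto_filter(points):
    """Return the nondominated subset of a finite set of probability vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

For instance, among `[(0.9, 0.2), (0.5, 0.5), (0.4, 0.4)]` the last point is dominated by `(0.5, 0.5)`, while the first two embody the trade-off the abstract describes: raising the probability of one property lowers the other.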
A Robust Geometric Approach to Multi-Criterion Reinforcement Learning
 Journal of Machine Learning Research
, 2004
Abstract

Cited by 15 (1 self)
We consider the problem of reinforcement learning in a dynamic environment, where the learning objective is defined in terms of multiple reward functions of the average reward type. The environment is initially unknown, and furthermore may be affected by the actions of other agents, which are observed but cannot be predicted in advance. We model this situation through a stochastic (Markov) game model, between the learning agent and an arbitrary player, with vector-valued rewards. State recurrence conditions are imposed throughout. The objective of the learning agent is to have its long-term average reward vector belong to a desired target set. Starting with a given target set, we devise learning algorithms to achieve this task. These algorithms rely on learning algorithms for appropriately defined scalar rewards, together with the geometric insight of the theory of approachability for stochastic games. We then address the more general problem where the target set itself may depend on the model parameters, and hence is not known in advance to the learning agent. A particular case which falls into this framework is that of stochastic games with average reward constraints. Further specialization provides a reinforcement learning algorithm for constrained Markov decision processes. Some basic examples are provided to illustrate these results.
Fast approximation schemes for multicriteria combinatorial optimization
, 1994
Abstract

Cited by 8 (0 self)
The solution to an instance of the standard Shortest Path problem is a single shortest route in a directed graph. Suppose, however, that each arc has both a distance and a cost, and that one would like to find a route that is both short and inexpensive. In general, no single route will be both shortest and cheapest; rather, the solution to an instance of this multicriteria problem will be a set of efficient or Pareto optimal routes. The (distance, cost) pairs associated with the efficient routes define an efficient frontier or trade-off curve. An efficient set for a multicriteria problem can be exponentially large, even when the underlying single-criterion problem is in P. This work therefore considers approximate solutions to multicriteria discrete optimization problems and investigates when they can be found quickly. This requires generalizing the notion of a fully polynomial time approximation scheme to multicriteria problems. In this paper, necessary and sufficient conditions are developed for the existence of such a fast approximation scheme for a problem. Although the focus is multicriteria problems, the conditions are of interest even in the single criterion case. In addition, an appropriate form of problem reduction is introduced to facilitate the application of these conditions to a variety of problems. A companion paper uses the results of this paper to study the existence of fast approximation schemes for several interesting network flow, knapsack, and...
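A standard way to keep an approximate efficient frontier polynomially small, in the spirit of the fully polynomial approximation schemes discussed above, is to snap each (distance, cost) pair onto a geometric grid of ratio (1 + ε) and keep one representative per grid cell. The sketch below is an illustration of that rounding idea under stated assumptions (all values ≥ 1), not the paper's construction; the function name is an assumption.

```python
import math

# Illustrative sketch: thinning a set of (distance, cost) labels to an
# ε-approximate Pareto set by rounding each coordinate onto a geometric
# grid of ratio (1 + eps) and keeping one representative per cell.

def eps_pareto(labels, eps):
    """Keep one label per (1 + eps)-grid cell; all coordinates assumed >= 1."""
    cells = {}
    for d, c in labels:
        cell = (math.floor(math.log(d, 1 + eps)),
                math.floor(math.log(c, 1 + eps)))
        best = cells.get(cell)
        if best is None or (d, c) < best:  # keep the lexicographically smallest representative
            cells[cell] = (d, c)
    return sorted(cells.values())
```

Because each coordinate ranges over polynomially many grid indices (logarithmic in the largest value, divided by log(1 + ε)), the retained set is of polynomial size while every discarded label has a kept representative within a (1 + ε) factor in each criterion.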
Dynamic programming approaches to the multiple criteria knapsack problem
 Naval Research Logistics
, 2000
Abstract

Cited by 6 (1 self)
In this paper we study the integer multiple criteria knapsack problem and propose dynamic-programming-based approaches to finding all the nondominated solutions. Different and more complex models are discussed, including the binary multiple criteria knapsack problem, problems with more than one constraint, and multi-period as well as time-dependent models. The single criterion knapsack problem is a well-known combinatorial optimization problem with a wide range of applications (for an overview see e.g. [20, 21]).
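The dynamic-programming idea for the bi-criteria binary case can be sketched as follows: extend the classical knapsack recursion so that each capacity level stores a set of achievable profit vectors, then filter out the dominated ones. This is an illustrative sketch for two criteria, not the paper's exact formulation; the names are assumptions.

```python
# Illustrative sketch: dynamic programming for a 0/1 knapsack with two
# profit criteria, collecting the nondominated (profit1, profit2) pairs.

def bicriteria_knapsack(items, capacity):
    """items: list of (weight, p1, p2). Returns the sorted nondominated profit pairs."""
    # states[w] = set of profit vectors achievable with total weight exactly w
    states = [set() for _ in range(capacity + 1)]
    states[0].add((0, 0))
    for w_i, p1, p2 in items:
        for w in range(capacity - w_i, -1, -1):  # descending: each item used at most once
            for (a, b) in list(states[w]):
                states[w + w_i].add((a + p1, b + p2))
    candidates = {v for s in states for v in s}
    # keep only vectors not weakly improved upon in both criteria by another vector
    return sorted(v for v in candidates
                  if not any(u != v and u[0] >= v[0] and u[1] >= v[1]
                             for u in candidates))
```

With items `[(2, 3, 1), (2, 1, 3)]` and capacity 2, only one item fits, so the nondominated set is `[(1, 3), (3, 1)]`: neither profit pair dominates the other.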
Model-lite planning: Diverse multi-option plans & dynamic objective functions
 In ICAPS 2007 Workshop on Planning and Plan Execution for Real World Systems
, 2007
Abstract

Cited by 3 (1 self)
Knowledge acquisition is one major bottleneck in using planning systems. Model-lite planning reduces this burden by placing responsibility on the planning system to cope with partially specified models. For example, eliciting the planning objective can be difficult in applications where it is necessary to reason about multiple plan metrics, such as cost, time, risk, and human life. Traditional approaches often require a (sometimes subjective) combination of these objectives into a single optimization metric. For example, decision-theoretic planners combine plan cost and probability of goal satisfaction into a single reward metric. However, users may not know how to combine their metrics into a single objective without first exploring several diverse plan options. To avoid premature objective function commitments at plan synthesis time (and even plan execution time), we develop the notion of multi-option plans. Much like conditional plans that branch to deal with execution-time observations, multi-option plans branch to deal with execution-time assessments of plan objectives. That is, a multi-option plan is a compact representation of the diverse Pareto set of plans, where at each step the user can execute one of several nondominated options. We formulate multi-option planning within the context of conditional probabilistic planning, where plans satisfy the goal with different probabilities and costs. Our approach is based on multi-objective dynamic programming in state space, where each plan node maintains a set of nondominated subplan options, each of which is a conditional plan.
Probabilistic planning is multi-objective
, 2007
Abstract

Cited by 3 (1 self)
Probabilistic planning is an inherently multi-objective problem where plans must trade off probability of goal satisfaction with expected plan cost. To date, probabilistic plan synthesis algorithms have focused on single-objective formulations that bound one of the objectives by making some unnatural assumptions. We show that a multi-objective formulation is not only needed, but also enables us to (i) generate Pareto sets of plans, (ii) use recent advances in probabilistic planning reachability heuristics, and (iii) elegantly solve limited contingency planning problems. We extend LAO ∗ to its multi-objective counterpart MOLAO ∗, and discuss a number of speedup techniques that form the basis for a state-of-the-art conditional probabilistic planner.
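The multi-objective backup underlying an MOLAO∗-style planner can be illustrated in miniature: each state keeps a set of nondominated (goal probability, expected cost) options, and a backup combines an action's cost with its successor's options, then discards dominated results. The sketch below is a deliberately simplified illustration with deterministic actions, not the paper's algorithm; all names are assumptions.

```python
# Illustrative sketch: a multi-objective backup where each state holds a
# set of nondominated (goal_prob, exp_cost) points, one per surviving
# subplan option. Higher probability is better; lower cost is better.

def nondominated(points):
    """Remove points weakly beaten on both criteria by some other point."""
    return [p for p in points
            if not any(q != p and q[0] >= p[0] and q[1] <= p[1]
                       for q in points)]

def backup(actions, values):
    """One backup for a state.

    actions: list of (cost, successor) edges (deterministic, for the sketch)
    values:  dict successor -> list of (goal_prob, exp_cost) options
    """
    options = []
    for cost, succ in actions:
        for goal_p, exp_c in values[succ]:
            options.append((goal_p, cost + exp_c))
    return nondominated(options)
```

The Pareto set per state replaces the single scalar value of ordinary LAO∗: a cheap-but-risky option and an expensive-but-safe option both survive the backup, deferring the trade-off decision to the user.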
Module-Based Reinforcement Learning: Experiments with a Real Robot
Abstract
The behavior of reinforcement learning (RL) algorithms is best understood in com ...