Results 1–10 of 18
Decision-Theoretic Military Operations Planning
, 2004
Abstract

Cited by 42 (7 self)
Military operations planning involves concurrent actions, resource assignment, and conflicting costs. Individual tasks sometimes fail with a known probability, promoting a decision-theoretic approach. The planner must choose between multiple tasks that achieve similar outcomes but have different costs. The military domain is particularly suited to automated methods because hundreds of tasks, specified by many planning staff, need to be quickly and robustly coordinated. The authors …
Importance sampling for reinforcement learning with multiple objectives
, 2001
Risk-sensitive reinforcement learning applied to control under constraints
 Journal of Artificial Intelligence Research
, 2005
Abstract

Cited by 19 (0 self)
In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are those states entering which is undesirable or dangerous. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We will show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that has a good performance with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.
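The weighted value/risk heuristic described in this abstract can be illustrated with a toy one-step decision (a sketch only, not the paper's model-free algorithm; the action names, rewards, and risk numbers below are invented):

```python
# Toy illustration: "risky" yields reward 10 but enters the error state
# with probability 0.3; "safe" yields reward 4 with zero risk.
ACTIONS = {
    "risky": {"reward": 10.0, "risk": 0.3},
    "safe":  {"reward": 4.0,  "risk": 0.0},
}

def best_action(xi):
    """Maximize the weighted criterion: reward - xi * risk."""
    return max(ACTIONS, key=lambda a: ACTIONS[a]["reward"] - xi * ACTIONS[a]["risk"])

def adapt_weight(omega, xi=0.0, step=1.0, max_iter=100):
    """Increase the risk weight xi until the greedy choice is feasible
    (risk <= omega), mimicking the adaptive-weighting idea."""
    for _ in range(max_iter):
        a = best_action(xi)
        if ACTIONS[a]["risk"] <= omega:
            return a, xi
        xi += step
    return a, xi

action, xi = adapt_weight(omega=0.1)
```

With a risk threshold of 0.1, the weight grows until the risky action's weighted score falls below the safe one's, so the feasible "safe" action is returned.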
A Survey of Multi-Objective Sequential Decision-Making
Abstract

Cited by 16 (5 self)
Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.
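The two central notions in this abstract, a scalarization function and a Pareto front, can be sketched in a few lines (a minimal illustration over made-up vector-valued policy returns, not any specific surveyed algorithm):

```python
# Hypothetical (value, safety) returns for five candidate policies.
values = {"p1": (3.0, 9.0), "p2": (5.0, 7.0), "p3": (4.0, 6.0),
          "p4": (8.0, 2.0), "p5": (6.0, 6.5)}

def dominates(u, v):
    """u Pareto-dominates v: >= in every objective, > in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(vals):
    """Policies not dominated by any other candidate."""
    return {k for k, u in vals.items()
            if not any(dominates(v, u) for j, v in vals.items() if j != k)}

def linear_scalarization(vals, w):
    """Project vector values to scalars with weights w; any maximizer of a
    linear scalarization lies on the convex hull of the Pareto front."""
    return max(vals, key=lambda k: sum(wi * vi for wi, vi in zip(w, vals[k])))
```

Here "p3" is dominated by "p2" and drops off the front, while different weight vectors pick out different Pareto-efficient policies.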
Safe Exploration for Reinforcement Learning
Abstract

Cited by 8 (0 self)
In this paper we define and address the problem of safe exploration in the context of reinforcement learning. Our notion of safety is concerned with states or transitions that can lead to damage and thus must be avoided. We introduce the concept of a safety function for determining a state’s safety degree and that of a backup policy that is able to lead the system under control from a critical state back to a safe one. Moreover, we present a level-based exploration scheme that is able to generate a comprehensive base of observations while adhering to safety constraints. We evaluate our approach on a simplified simulation of a gas turbine.
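The safety-function/backup-policy pairing can be sketched on a hypothetical 1-D system (the dynamics, safety measure, and threshold below are all invented for illustration):

```python
# Toy 1-D system: the state is a position; positions beyond |x| > 8
# cause damage.
def safety(x):
    """Hypothetical safety function: degree in [0, 1], 0 at the damage
    boundary, 1 well inside the safe region."""
    return max(0.0, 1.0 - abs(x) / 8.0)

def backup_policy(x):
    """Backup controller: drive the state back toward the safe interior."""
    return -1.0 if x > 0 else 1.0

def explore_step(x, proposed_action, threshold=0.25):
    """Take the exploratory action only while the state is safe enough;
    otherwise hand control to the backup policy."""
    action = proposed_action if safety(x) >= threshold else backup_policy(x)
    return x + action
```

At x = 7 the safety degree (0.125) is below the threshold, so the backup policy overrides an outward exploratory action and moves the system back toward safety.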
The Robustness-Performance Tradeoff in Markov Decision Processes
Abstract

Cited by 6 (5 self)
Computation of a satisfactory control policy for a Markov decision process when the parameters of the model are not exactly known is a problem encountered in many practical applications. The traditional robust approach is based on a worst-case analysis and may lead to an overly conservative policy. In this paper we consider the tradeoff between nominal performance and the worst-case performance over all possible models. Based on parametric linear programming, we propose a method that computes the whole set of Pareto-efficient policies in the performance-robustness plane when only the reward parameters are subject to uncertainty. In the more general case when the transition probabilities are also subject to error, we show that the strategy with the “optimal” tradeoff might be non-Markovian and hence is in general not tractable.
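The performance-robustness plane can be illustrated with a brute-force weight sweep over made-up candidate policies (the paper's method uses parametric linear programming to obtain the exact efficient set; this is only a sketch of the tradeoff it explores):

```python
# Hypothetical candidates: (nominal performance, worst-case performance).
policies = {"a": (10.0, 1.0), "b": (7.0, 5.0), "c": (6.0, 5.5), "d": (4.0, 6.0)}

def efficient_by_weight_sweep(pols, steps=101):
    """Brute-force stand-in for parametric LP: sweep the tradeoff weight
    and collect every maximizer of
        lam * nominal + (1 - lam) * worst_case."""
    found = set()
    for i in range(steps):
        lam = i / (steps - 1)
        found.add(max(pols, key=lambda k: lam * pols[k][0] + (1 - lam) * pols[k][1]))
    return found
```

Each weight picks out one point on the convex hull of efficient policies; sweeping the weight traces out the whole hull, which is exactly what the parametric approach computes without discretization.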
Risk-sensitive reinforcement learning applied to chance-constrained control
 JAIR
, 2005
Abstract

Cited by 3 (1 self)
In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are those states entering which is undesirable or dangerous. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We will show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that has a good performance with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.
Safe Exploration of State and Action Spaces in Reinforcement Learning
Abstract

Cited by 2 (0 self)
In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial-and-error process may lead to the selection of actions whose execution in some states may result in damage to the learning system (or any other system). Consequently, when an agent begins an interaction with a dangerous and high-dimensional state-action space, an important question arises: how to avoid (or at least minimize) damage caused by the exploration of the state-action space. We introduce the PI-SRL algorithm, which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.
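The core idea of safely improving a suboptimal but robust baseline can be sketched as bounded exploration around a baseline controller (an invented 1-D example, not the actual PI-SRL algorithm):

```python
import random

def baseline(x):
    """Hypothetical suboptimal but safe controller for a 1-D task."""
    return -0.5 * x

def exploratory_action(x, noise=0.1, rng=None):
    """Perturb the baseline action within a small bound, rather than
    sampling actions freely, so behavior stays near the safe baseline
    while experience is gathered for improvement."""
    rng = rng or random.Random(0)
    return baseline(x) + rng.uniform(-noise, noise)
```

Because every exploratory action lies within a small band around the known-safe behavior, the trial-and-error process cannot stray arbitrarily far into dangerous regions of the action space.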
On Robustness/Performance Tradeoffs in Linear Programming and Markov Decision Processes
Abstract

Cited by 1 (0 self)
Computation of a satisfactory policy for a decision problem when the parameters of the model are uncertain is a problem encountered in many applications. The traditional robust approach is based on a worst-case analysis and may lead to overly conservative solutions. In this paper we directly quantify the robustness to uncertainty and consider the tradeoff between the nominal performance and robustness measures. Optimization in both linear programming and Markov decision processes is discussed. For linear programming we consider the tradeoff between the nominal cost of a solution and a robustness measure that quantifies the magnitude of constraint violation under the most adversarial parameters. We propose an algorithm that computes the whole set of Pareto-efficient solutions based on parametric linear programming. For Markov decision processes, we consider the tradeoff between the performance under nominal parameters and the performance under adversarial parameters. For the special case where only the rewards are uncertain, we propose an algorithm that computes the whole set of Pareto-efficient policies in a single pass. Subject classifications: dynamic programming: Markov, finite state; programming: linear, multiple criteria; uncertainty; robustness.
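The robustness measure for linear programming, the magnitude of constraint violation under the most adversarial parameters, is easy to evaluate in the special case of interval (box) uncertainty on a single constraint row; a small sketch with invented data:

```python
# Toy LP row: constraint a . x <= b where each coefficient a_i is known
# only to lie in an interval [lo_i, hi_i].
def worst_case_violation(x, intervals, b):
    """Maximum violation of a . x <= b over the coefficient box. For a
    fixed x, the adversary picks hi_i where x_i > 0 and lo_i where
    x_i <= 0, which maximizes the left-hand side term by term."""
    worst = sum((hi if xi > 0 else lo) * xi for xi, (lo, hi) in zip(x, intervals))
    return max(0.0, worst - b)
```

For example, with x = (1, 2), coefficient intervals [0.5, 1.5] and [1.0, 2.0], and b = 4, the adversary chooses (1.5, 2.0), giving a left-hand side of 5.5 and a violation of 1.5. A zero value means the solution is robustly feasible for that row.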
Safe Stochastic Planning: Planning to Avoid Fatal States
Abstract

Cited by 1 (1 self)
Markov decision processes (MDPs) are applied as a standard model in Artificial Intelligence planning. MDPs are used to construct optimal or near-optimal policies or plans. One area that is often missing from discussions of planning in stochastic environments is how MDPs handle safety constraints expressed as the probability of reaching threat states. We introduce a method for finding a value-optimal policy satisfying the safety constraint, and report on the validity and effectiveness of our method through a set of experiments.
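The quantity constrained here, the probability of reaching a threat state under a fixed policy, can be evaluated on the Markov chain the policy induces; a small sketch with an invented four-state chain (not the paper's policy-construction method):

```python
# Toy Markov chain induced by a fixed policy: state 3 is the absorbing
# threat state, state 2 is the absorbing goal state.
P = {0: {1: 0.8, 3: 0.2},
     1: {2: 0.9, 3: 0.1},
     2: {2: 1.0},
     3: {3: 1.0}}

def reach_probability(P, target, absorbing, iters=200):
    """Probability of eventually reaching `target` from each state, by
    iterating r(s) = sum_s' P(s'|s) * r(s') with r fixed at absorbing
    states (1 at the target, 0 at other absorbing states)."""
    r = {s: (1.0 if s == target else 0.0) for s in P}
    for _ in range(iters):
        for s in P:
            if s not in absorbing:
                r[s] = sum(p * r[t] for t, p in P[s].items())
    return r

risk = reach_probability(P, target=3, absorbing={2, 3})
```

From state 0 the threat is reached directly with probability 0.2 or via state 1 with probability 0.8 × 0.1, so the policy's risk from the start state is 0.28; a safety constraint would compare this value against the allowed threshold.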