Reinforcement Learning with Bounded Risk
- In Proceedings of the Eighteenth International Conference on Machine Learning
, 2001
Cited by 11 (1 self)
In this paper, we consider finite MDPs with fatal states. We define the risk under a policy as the probability of entering a fatal state, which differs from the notion of risk normally used in DP and RL (most often the variance of the return). We consider the problem of finding optimal policies with bounded risk, i.e., where the risk is smaller than some user-specified threshold ω, and formalize it as a constrained MDP with two infinite-horizon criteria: a discounted one for the value of a state and an undiscounted one for the risk. We define a heuristic, model-free reinforcement learning algorithm that finds good deterministic policies for the constrained problem. The algorithm is based on an abstract ordering of the multi-dimensional return space. It uses a weighted formulation of the problem, in which the internal weight parameter is adjusted by a heuristic optimization algorithm.
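The weighted formulation mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a small, fixed set of hypothetical candidate policies whose value and risk are already known, an assumed score of the form `xi * value - risk`, and a plain sweep over a hypothetical grid of weights `xi` in place of the heuristic weight adaptation. The risk threshold is called `omega` here.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical record for illustration: (name, value, risk).
Policy = Tuple[str, float, float]


def best_for_weight(policies: Sequence[Policy], xi: float) -> Policy:
    """Pick the policy maximizing the assumed weighted score xi * value - risk."""
    return max(policies, key=lambda p: xi * p[1] - p[2])


def search_weight(
    policies: Sequence[Policy], omega: float, weights: Sequence[float]
) -> Optional[Policy]:
    """Sweep the weight and keep the highest-value policy with risk <= omega."""
    best: Optional[Policy] = None
    for xi in weights:
        candidate = best_for_weight(policies, xi)
        feasible = candidate[2] <= omega
        if feasible and (best is None or candidate[1] > best[1]):
            best = candidate
    return best


# Made-up numbers, chosen only to show the trade-off.
policies: List[Policy] = [
    ("cautious", 1.0, 0.01),
    ("balanced", 3.0, 0.08),
    ("reckless", 5.0, 0.40),
]
weights = [0.0, 0.02, 0.05, 0.1, 0.2, 0.5]
chosen = search_weight(policies, omega=0.1, weights=weights)
```

With a weight of zero the score reduces to risk minimization. Larger weights favor value until the selected policy violates the risk bound, at which point it is no longer accepted.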
Multi-criteria Reinforcement Learning
, 1998
Cited by 10 (0 self)
We consider multi-criteria sequential decision making problems where the vector-valued evaluations are compared by a given, fixed total ordering. Conditions for the optimality of stationary policies and the Bellman optimality equation are given. The analysis requires special care as the topology introduced by pointwise convergence and the order-topology introduced by the preference order are in general incompatible. Reinforcement learning algorithms are proposed and analyzed. Preliminary computer experiments confirm the validity of the derived algorithms. It is observed that in the medium term, multi-criteria RL often converges to better solutions (measured by the first criterion) than its single-criterion counterparts. These types of multi-criteria problems are most useful when there are several optimal solutions to a problem and one wants to choose the one among these which is optimal according to another fixed criterion. Example applications include alternating games, when in addition...
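As an illustration of comparing vector-valued evaluations under a fixed total ordering, the sketch below uses lexicographic order, which is only one possible ordering and is assumed here rather than taken from the paper. The candidate names and numbers are hypothetical. Python compares tuples lexicographically, so the second criterion only matters when the first is exactly tied.

```python
from typing import List, Tuple

# Hypothetical record for illustration: (name, (first criterion, second criterion)).
Candidate = Tuple[str, Tuple[float, float]]


def lexicographic_best(candidates: List[Candidate]) -> Candidate:
    """Return the candidate whose return vector is largest in lexicographic order."""
    return max(candidates, key=lambda item: item[1])


candidates: List[Candidate] = [
    ("a", (5.0, 1.0)),
    ("b", (5.0, 3.0)),  # ties with "a" on the first criterion, wins on the second
    ("c", (4.0, 9.0)),  # best second criterion, but worse on the first
]
winner = lexicographic_best(candidates)
```

This mirrors the use case described in the abstract: among several solutions that are optimal for the first criterion, pick the one that is best according to another fixed criterion.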
On the response of EMT-based control to interacting targets and models
- In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-06)
, 2006
Cited by 8 (5 self)
A novel control mechanism was recently introduced based on Extended Markov Tracking (EMT) [9, 10]. In this paper, we present a study of its response to multiple interacting control goals. We show a simple extension that can be integrated into EMT-based control, and which provides it with the ability to handle several behavioral targets. Experimental support for the validity of this extension is provided. We also describe an experiment with a simulated robot, where EMT-based controllers interact and interfere indirectly via the environment. Experiments support the resilience of multiagent EMT-based team control to potential conflicts that may appear within a team.
A Robust Geometric Approach to Multi-Criterion Reinforcement Learning
- Journal of Machine Learning Research
, 2004
Cited by 8 (1 self)
We consider the problem of reinforcement learning in a dynamic environment, where the learning objective is defined in terms of multiple reward functions of the average reward type. The environment is initially unknown, and furthermore may be affected by the actions of other agents, which are observed but cannot be predicted in advance. We model this situation through a stochastic (Markov) game model, between the learning agent and an arbitrary player, with vector-valued rewards. State recurrence conditions are imposed throughout. The objective of the learning agent is to have its long-term average reward vector belong to a desired target set. Starting with a given target set, we devise learning algorithms to achieve this task. These algorithms rely on learning algorithms for appropriately defined scalar rewards, together with the geometric insight of the theory of approachability for stochastic games. We then address the more general problem where the target set itself may depend on the model parameters, and hence is not known in advance to the learning agent. A particular case which falls into this framework is that of stochastic games with average reward constraints. Further specialization provides a reinforcement learning algorithm for constrained Markov decision processes. Some basic examples are provided to illustrate these results.
Dynamic Preferences in Multi-Criteria Reinforcement Learning
- In Proceedings of ICML-05
, 2005
Cited by 5 (3 self)
The current framework of reinforcement learning is based on maximizing the expected returns based on scalar rewards. But in many real-world situations, tradeoffs must be made among multiple objectives. Moreover, the agent's preferences between different objectives may vary with time. In this paper, we consider the problem of learning in the presence of time-varying preferences among multiple objectives, using numeric weights to represent their importance. We propose a method that allows us to store a finite number of policies, choose an appropriate policy for any weight vector, and improve upon it. The idea is that although there are infinitely many weight vectors, they may be well covered by a small number of optimal policies. We show this empirically in two domains: a version of the Buridan's ass problem and network routing.
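The step of choosing a stored policy for a given weight vector can be sketched as follows. This is an illustrative assumption, not the paper's implementation: each stored policy is summarized by a hypothetical vector of per-objective values, and the selected policy is the one with the largest weighted sum under the current weights.

```python
from typing import Dict, Sequence


def weighted_value(values: Sequence[float], weights: Sequence[float]) -> float:
    """Scalarize a per-objective value vector with the given weights."""
    return sum(v * w for v, w in zip(values, weights))


def select_policy(
    stored: Dict[str, Sequence[float]], weights: Sequence[float]
) -> str:
    """Return the name of the stored policy with the highest weighted value."""
    return max(stored, key=lambda name: weighted_value(stored[name], weights))


# Hypothetical stored policies with made-up per-objective values.
stored = {
    "fast": (10.0, 2.0),
    "safe": (4.0, 9.0),
}
prefers_first = select_policy(stored, (0.9, 0.1))
prefers_second = select_policy(stored, (0.2, 0.8))
```

The selected policy would then serve as the starting point for further improvement under the new weights. The sketch covers only the lookup.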
Coarticulation: An approach for generating concurrent plans in Markov decision processes
- In Proceedings of the 22nd International Conference on Machine Learning (ICML-2005)
, 2005
Cited by 4 (1 self)
We study an approach for performing concurrent activities in Markov decision processes (MDPs) based on the coarticulation framework. We assume that the agent has multiple degrees of freedom (DOF) in the action space, which enables it to perform activities simultaneously. We demonstrate that one natural way of generating concurrency in the system is by coarticulating among the set of learned activities available to the agent. Because of the multiple DOF in the system, there often exists a redundant set of admissible sub-optimal policies associated with each learned activity. Such flexibility enables the agent to concurrently commit to several subgoals according to their priority levels, given a new task defined in terms of a set of prioritized subgoals. We present efficient approximate algorithms for computing such policies and for generating concurrent plans. We also evaluate our approach in a simulated domain.
Switch Packet Arbitration via Queue-Learning
- In Proc. NIPS-14
, 2001
Cited by 3 (1 self)
In packet switches, packets queue at switch inputs and contend for outputs.
Risk-sensitive reinforcement learning applied to chance constrained control
- JAIR
, 2005
Cited by 2 (1 self)
In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are states that are undesirable or dangerous to enter. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We will show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that has a good performance with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.
Pareto-Based Multiobjective Machine Learning: An Overview and Case Studies
Cited by 2 (1 self)
Machine learning is inherently a multiobjective task. Traditionally, however, either only one of the objectives is adopted as the cost function or multiple objectives are aggregated to a scalar cost function. This can be mainly attributed to the fact that most conventional learning algorithms can only deal with a scalar cost function. Over the last decade, efforts on solving machine learning problems using the Pareto-based multiobjective optimization methodology have gained increasing impetus, particularly due to the great success of multiobjective optimization using evolutionary algorithms and other population-based stochastic search methods. It has been shown that Pareto-based multiobjective learning approaches are more powerful compared to learning algorithms with a scalar cost function in addressing various topics of machine learning, such as clustering, feature selection, improvement of generalization ability, knowledge extraction, and ensemble generation. One common benefit of the different multiobjective learning approaches is that a deeper insight into the learning problem can be gained by analyzing the Pareto front composed of multiple Pareto-optimal solutions. This paper provides an overview of the existing research on multiobjective machine learning, focusing on supervised learning. In addition, a number of case studies are provided to illustrate the major benefits of the Pareto-based approach to machine learning, e.g., how to identify interpretable models and models that can generalize on unseen data from the obtained Pareto-optimal solutions. Three approaches to Pareto-based multiobjective ensemble generation are compared and discussed in detail. Finally, potentially interesting topics in multiobjective machine learning are suggested.
Index Terms: Ensemble, evolutionary multiobjective optimization, generalization, machine learning, multiobjective learning, multiobjective optimization, neural networks, Pareto optimization.

