Multicriteria Reinforcement Learning
, 1998
Cited by 19 (0 self)
We consider multicriteria sequential decision making problems where the vector-valued evaluations are compared by a given, fixed total ordering. Conditions for the optimality of stationary policies and the Bellman optimality equation are given. The analysis requires special care as the topology introduced by pointwise convergence and the order topology introduced by the preference order are in general incompatible. Reinforcement learning algorithms are proposed and analyzed. Preliminary computer experiments confirm the validity of the derived algorithms. It is observed that in the medium term multicriteria RL often converges to better solutions (measured by the first criterion) than its single-criterion counterparts. These types of multicriteria problems are most useful when there are several optimal solutions to a problem and one wants to choose the one among these which is optimal according to another fixed criterion. Example applications include alternating games, when in addition...
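To make the idea of comparing vector-valued evaluations by a fixed total ordering concrete, here is a minimal sketch. It assumes the ordering is lexicographic (first criterion decides, later criteria break ties); the abstract does not say which ordering the paper uses. The function name `lexicographic_best` and the sample values are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Vector = Tuple[float, ...]


def lexicographic_best(q_values: Dict[Hashable, Vector]) -> Hashable:
    """Return the action whose value vector is greatest in lexicographic order.

    Assumption: the "fixed total ordering" is lexicographic. Python compares
    tuples element by element, which gives exactly that order for
    equal-length vectors.
    """
    return max(q_values, key=lambda action: q_values[action])


# Hypothetical vector-valued action evaluations: (first criterion, second criterion).
q = {
    "left": (1.0, 0.2),
    "right": (1.0, 0.7),  # ties with "left" on the first criterion, wins on the second
    "stay": (0.5, 9.0),   # loses on the first criterion, so the second is never consulted
}

best = lexicographic_best(q)
print(best)
```

The second criterion only matters among actions that are tied on the first one, which matches the use case described above: selecting, among several optimal solutions, the one that is best by another fixed criterion.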
Recursive Utility for Stochastic Trees
 Operations Research
, 1996
Cited by 4 (3 self)
Stochastic trees are semi-Markov processes represented using tree diagrams. Such trees have been found useful for prescriptive modeling of temporal medical treatment choice. We consider utility functions over stochastic trees which permit recursive evaluation in a graphically intuitive manner analogous to decision tree rollback. Such rollback is computationally intractable unless a low-dimensional preference summary exists. We present the most general classes of utility functions having specific tractable preference summaries. We examine three preference summaries (memoryless, Markovian, and semi-Markovian) which promise both computational feasibility and convenience in assessment. Their use is illustrated by application to a previous medical decision analysis of whether to perform carotid endarterectomy. A stochastic tree is a graphical modeling approach which combines useful features from semi-Markov process transition diagrams and decision trees. This paper concerns itself wit...
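For readers unfamiliar with the rollback procedure the abstract refers to, the following is a minimal sketch of ordinary decision tree rollback under expected utility. It is not the stochastic-tree recursion of the paper; the node classes `Leaf`, `Chance`, and `Decision` and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    utility: float  # terminal utility


@dataclass
class Chance:
    branches: List[Tuple[float, "Node"]]  # (probability, subtree) pairs


@dataclass
class Decision:
    options: List["Node"]  # subtrees the decision maker can choose among


Node = Union[Leaf, Chance, Decision]


def rollback(node: Node) -> float:
    """Evaluate a tree from the leaves back to the root."""
    if isinstance(node, Leaf):
        return node.utility
    if isinstance(node, Chance):
        # Chance node: probability-weighted average of the subtree values.
        return sum(p * rollback(child) for p, child in node.branches)
    # Decision node: pick the best available subtree.
    return max(rollback(child) for child in node.options)


# Hypothetical example: a gamble worth 5.0 in expectation versus a sure 4.0.
tree = Decision([
    Chance([(0.5, Leaf(10.0)), (0.5, Leaf(0.0))]),
    Leaf(4.0),
])
value = rollback(tree)
print(value)
```

Each node is summarized by a single number here, which is what makes the recursion cheap; the abstract's point is that richer utility functions need a correspondingly low-dimensional summary for rollback to stay tractable.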
Controlled Markov processes with arbitrary numerical criteria
 Theory Probab. Appl.
, 1982
Cited by 1 (1 self)
In the theory of controlled Markov processes with discrete time we study, as a rule, controlled processes either with the total reward criterion or with criteria for mean reward per unit time.
Axioms and Examples Related to Ordinal Dynamic Programming
, 1981
We continue the work of Sobel on axioms for preferences in discrete Markov processes. Sufficient conditions for optimality are presented, and the logical interrelation with previous axiomatizations is discussed. Axioms and Examples Related to Ordinal Dynamic Programming, by Charles E. Blair. We consider a deterministic sequential Markov process. Let $X$ be a set of states. For each $x \in X$, $M(x) \subseteq X$ is the set of states that can be reached in one step from $x$. Define $A$ to be the set of mappings $\delta : X \to X$ such that $\delta(x) \in M(x)$ for every $x \in X$. A policy is an infinite sequence $\delta_1, \delta_2, \ldots$ where $\delta_i \in A$. A stationary policy has all $\delta_i$ equal. For each policy $\pi = (\delta_1, \delta_2, \ldots)$ and each $x \in X$ there is a unique sequence $x_0, x_1, x_2, \ldots$ such that $x_0 = x$ and $x_n = \delta_n(x_{n-1})$, $n = 1, 2, \ldots$ We will denote this sequence by $P(\pi, x)$. For $x \in X$, $\Phi_x$ is defined to be the set of sequences $P(\pi, x)$ that arise as $\pi$ varies over all possible policies.
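The construction of the state sequence generated by a policy can be sketched as follows: start at the given state and apply the policy's decision rules one after another, checking that each step stays inside the one-step reachable set. The names `trajectory`, `reachable`, and the two-state example are hypothetical, chosen only for illustration.

```python
from itertools import islice, repeat
from typing import Callable, Dict, Hashable, Iterable, Iterator, Set

State = Hashable
Rule = Callable[[State], State]


def trajectory(
    policy: Iterable[Rule],
    start: State,
    reachable: Dict[State, Set[State]],
) -> Iterator[State]:
    """Yield the states visited from `start` when the rules in `policy` are applied in order."""
    current = start
    yield current
    for rule in policy:
        nxt = rule(current)
        # A decision rule must map each state into its one-step reachable set.
        if nxt not in reachable[current]:
            raise ValueError(f"{nxt!r} is not reachable from {current!r}")
        current = nxt
        yield current


# Hypothetical two-state process: from "a" one may stay or move to "b"; from "b" only to "a".
reachable = {"a": {"a", "b"}, "b": {"a"}}
swap = {"a": "b", "b": "a"}.__getitem__

# A stationary policy repeats the same decision rule forever.
stationary = repeat(swap)
first_five = list(islice(trajectory(stationary, "a", reachable), 5))
print(first_five)
```

Because the policy is an infinite sequence, the trajectory is produced lazily and truncated with `islice` for display.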
Discounting axioms imply risk neutrality
under Risk and Uncertainty
For choice with deterministic consequences, the standard rationality hypothesis is ordinality, i.e., maximization of a weak preference ordering. For choice under risk (resp. uncertainty), preferences are assumed to be represented by the objectively (resp. subjectively) expected value of a von Neumann–Morgenstern utility function. For choice under risk, this implies a key independence axiom; under uncertainty, it implies some version of Savage’s sure thing principle. This chapter investigates the extent to which ordinality, independence, and the sure thing principle can be derived from more fundamental axioms concerning behavior in decision trees. Following Cubitt (1996), these principles include dynamic consistency, separability, and reduction of sequential choice, which can be derived in turn from one consequentialist hypothesis applied to continuation subtrees as well as entire decision trees. Examples of behavior violating these principles are also reviewed, as are possible explanations of why such violations are often observed in experiments.