Results 1 - 3 of 3
Multicriteria Reinforcement Learning
, 1998
Abstract

Cited by 19 (0 self)
We consider multicriteria sequential decision making problems where the vector-valued evaluations are compared by a given, fixed total ordering. Conditions for the optimality of stationary policies and the Bellman optimality equation are given. The analysis requires special care as the topology introduced by pointwise convergence and the order topology introduced by the preference order are in general incompatible. Reinforcement learning algorithms are proposed and analyzed. Preliminary computer experiments confirm the validity of the derived algorithms. It is observed that in the medium term multicriteria RL often converges to better solutions (measured by the first criterion) than its single-criterion counterparts. These types of multicriteria problems are most useful when there are several optimal solutions to a problem and one wants to choose the one among these which is optimal according to another fixed criterion. Example applications include alternating games, when in addition...
Multicriteria Reinforcement Learning
, 1998
Abstract
We consider multicriteria sequential decision making problems where the vector-valued evaluations are compared by a given, fixed total ordering. Conditions for the optimality of stationary policies and the Bellman optimality equation are given. The analysis requires special care as the topology introduced by pointwise convergence and the order topology introduced by the preference order are in general incompatible. Reinforcement learning algorithms are proposed and analyzed. Preliminary computer experiments confirm the validity of the derived algorithms. It is observed that in the medium term multicriteria RL often converges to better solutions (measured by the first criterion) than its single-criterion counterparts. These types of multicriteria problems are most useful when there are several optimal solutions to a problem and one wants to choose the one among these which is optimal according to another fixed criterion. Example applications include alternating games, when in addition...
Associative Computing Ltd.
Abstract
We consider multicriteria sequential decision making problems where the vector-valued evaluations are compared by a given, fixed total ordering. Conditions for the optimality of stationary policies and the Bellman optimality equation are given for a special, but important class of problems when the evaluation of policies can be computed for the criteria independently of each other. The analysis requires special care as the topology introduced by pointwise convergence and the order topology introduced by the preference order are in general incompatible. Reinforcement learning algorithms are proposed and analyzed. Preliminary computer experiments confirm the validity of the derived algorithms. These types of multicriteria problems are most useful when there are several optimal solutions to a problem and one wants to choose the one among these which is optimal according to another fixed criterion. Possible applications in robotics and repeated games are outlined.
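
As an illustration of comparing vector-valued evaluations by a fixed total ordering, the sketch below assumes a lexicographic order: criteria are ranked by priority, and a later criterion only breaks ties among actions that are equally good on all earlier ones. The abstracts do not say which total ordering the paper uses, so the lexicographic choice, the function name `lexicographic_best`, the `tol` parameter, and the sample values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

Vector = Tuple[float, ...]


def lexicographic_best(values: Dict[str, Vector], tol: float = 0.0) -> str:
    """Return the action whose value vector is greatest in lexicographic order.

    `values` maps each action name to its vector of per-criterion values.
    `tol` (an assumption for illustration) treats values within `tol` of the
    best as tied; the default 0.0 gives the exact lexicographic order.
    """
    candidates = list(values)
    n_criteria = len(next(iter(values.values())))
    for i in range(n_criteria):
        # Keep only the actions that are optimal on criterion i.
        best = max(values[a][i] for a in candidates)
        candidates = [a for a in candidates if values[a][i] >= best - tol]
        if len(candidates) == 1:
            break
    # Any remaining ties are broken by insertion order.
    return candidates[0]


# Hypothetical action values: (first criterion, second criterion).
q = {"left": (1.0, 0.2), "right": (1.0, 0.7), "stay": (0.5, 0.9)}
print(lexicographic_best(q))
```

Here "left" and "right" are both optimal on the first criterion, so the second criterion decides between them; "stay" is eliminated first despite scoring highest on the second criterion.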