Results 11-20 of 112
On the Complexity of Partially Observed Markov Decision Processes
Theoretical Computer Science, 1996
Cited by 24 (2 self)
In the paper we consider the complexity of constructing optimal policies (strategies) for some type of partially observed Markov decision processes. This particular case of the classical problem deals with finite stationary processes, and can be represented as constructing optimal strategies to reach target vertices from a starting vertex in a graph with colored vertices and probabilistic deviations from an edge chosen to follow. The colors of the visited vertices are the only information available to a strategy. The complexity of Markov decision in the case of perfect information (bijective coloring of vertices) is known and briefly surveyed at the beginning of the paper. For the unobservable case (all the colors are equal) we give an improvement of the result of Papadimitriou and Tsitsiklis, namely we show that the problem of constructing even a very weak approximation to an optimal strategy is NP-hard. Our main results concern the case of a fixed bound on the multiplicity of coloring,...
Exploration and Inference in Learning from Reinforcement
1997
Cited by 22 (2 self)
Recently there has been a good deal of interest in using techniques developed for learning from reinforcement to guide learning in robots. Motivated by the desire to find better robot learning methods, this thesis presents a number of novel extensions to existing techniques for controlling exploration and inference in reinforcement learning. First I distinguish between the well-known exploration-exploitation tradeoff and what I term exploration for future exploitation. It is argued that there are many tasks where it is more appropriate to maximise this latter measure. In particular it is appropriate when we want to employ learning algorithms as part of the process of designing a controller. Informed by this insight I develop a number of novel measures of the agent's task knowledge. The first of these is a measure of the probability of a particular course of action being the optimal course of action. Estimators are developed for this measure for boolean and non-boolean processes. These...
Risk-Sensitive Optimal Control Of Hidden Markov Models: Structural Results
Cited by 20 (11 self)
... this paper results of an investigation on the nature and structure of risk-sensitive controllers for HMM. We pose the following question: How does risk-sensitivity manifest itself in the structure of a controller?
Risk Sensitive Control of Finite State Machines on an Infinite Horizon I
Cited by 20 (5 self)
In this paper we consider robust and risk sensitive control of discrete time finite state systems on an infinite horizon. The solution of the state feedback robust control problem is characterized in terms of the value of an average cost dynamic game. The risk sensitive stochastic optimal control problem is solved using the policy iteration algorithm, and the optimal rate is expressed in terms of the value of a stochastic dynamic game with average cost per unit time criterion. By taking a small noise limit a deterministic dynamic game is obtained, which is closely related to the robust control problem. 1 Introduction. There are various approaches to treating disturbances in control systems. In stochastic control, disturbances are modelled as stochastic processes (random noise). On the other hand, in H∞/robust control theory disturbances are modelled deterministically. The theory of risk sensitive optimal control provides a link between stochastic and deterministic approaches. The l...
Large-Scale Dynamic Optimization Using Teams of Reinforcement Learning Agents
1996
Cited by 18 (0 self)
Recent algorithmic and theoretical advances in reinforcement learning (RL) are attracting widespread interest. RL algorithms have appeared that approximate dynamic programming (DP) on an incremental basis. Unlike traditional DP algorithms, these algorithms do not require knowledge of the state transition probabilities or reward structure of a system. This allows them to be trained using real or simulated experiences, focusing their computations on the areas of state space that are actually visited during control, making them computationally tractable on very large problems. RL algorithms can be used as components of multiagent algorithms. If each member of a team of agents employs one of these algorithms, a new collective learning algor...
A Linear Programming Approach to Solving Stochastic Dynamic Programs
1993
Cited by 18 (1 self)
Recent advances in algorithms for solving large linear programs, specifically constraint generation, motivate new algorithms for solving discrete stochastic dynamic programs. We use a standard optimal growth problem to demonstrate the performance benefits of these new algorithms for solving discrete problems and for accurately approximating solutions to continuous problems through discretization. Computational speed over value iteration is substantial. Furthermore, computational speed does not depend on the parameter settings (in particular the degree of discounting). An added benefit of a linear programming solution is the byproduct of shadow prices which we use to generate a discrete grid adaptively. That is, for a fixed number of grid points, our algorithm determines how they should be distributed over the state space to obtain greater accuracy without increasing dimensionality. This work was inspired by conversations with Victor Rios. We thank Chris Telmer and Tony Smith for help...
Optimal Search on a Technology Landscape
1998
Cited by 16 (0 self)
Technological change at the firm-level has commonly been modeled as random sampling from a fixed distribution of possibilities. Such models, however, typically ignore empirically important aspects of the firm's search process, notably the observation that the present state of the firm guides future innovation. In this paper we explicitly treat this aspect of the firm's search for technological improvements by introducing a "technology landscape" into an otherwise standard dynamic programming setting where the optimal strategy is to assign a reservation price to each possible technology. Search is modeled as movement, constrained by the cost of innovation, over the technology landscape. Simulations are presented on a stylized technology landscape while analytic results are derived using landscapes that are similar to Markov random fields. We find that early in the search for technological improvements, if the initial position is poor or average, it is optimal to search far away on the technology l...
Analysis of an on-off jamming situation as a dynamic game
IEEE Trans. Commun., 2000
Cited by 13 (0 self)
The process of communication jamming can be modeled as a two-person zero-sum noncooperative dynamic game played between a communicator (a transmitter-receiver pair) and a jammer. We consider a one-way time-slotted packet radio communication link in the presence of a jammer, where the data rate is fixed and 1) in each slot, the communicator and jammer choose their respective power levels in a random fashion from a zero and a positive value; 2) both players are subject to temporal energy constraints which account for protection of the communicating and jamming transmitters from overheating. The payoff function is the time average of the mean payoff per slot. The game is solved for certain ranges of the players' transmitter parameters. Structures of steady-state solutions to the game are also investigated. The general behavior of the players' strategies and payoff increment is found to depend on a parameter related to the payoff matrix, which we call the payoff parameter, and the transmitters' parameters. When the payoff parameter is lower than a threshold, the optimal steady-state strategies are mixed and the payoff increment constant over time, whereas when it is greater than the threshold, the strategies are pure, and the payoff increment exhibits oscillatory behavior. Index Terms: Communication jamming, grid solution, noncooperative dynamic game, optimal strategies, temporal energy constraints.
Optimal taxation in an RBC model: A linear-quadratic approach
Journal of Economic Dynamics and ..., 2006
Cited by 13 (1 self)
We reconsider the optimal taxation of income from labor and capital in the stochastic growth model analyzed by Chari et al. (1994, 1995), but using a linear-quadratic (LQ) approximation to derive a log-linear approximation to the optimal policy rules. The example illustrates how inaccurate a “naive” LQ approximation (in which the quadratic objective is obtained from a simple Taylor expansion of the utility function of the representative household) can be, but also shows how a correct LQ approximation can be obtained, which will provide a correct local approximation to the optimal policy rules in the case of small enough shocks. We also consider the numerical accuracy of the LQ approximation in the case of shocks of the size assumed in the calibration of Chari et al. We find that the correct LQ approximation yields results that are quite accurate, and similar in most respects to the results obtained by Chari et al. using a more computationally intensive numerical method.
Optimality of (s, S) Policies in Inventory Models with Markovian Demand
Operations Research, 1995
Cited by 11 (2 self)
This paper is concerned with a generalization of classical inventory models (with fixed ordering costs) that exhibit (s, S) policies. In our model, the distribution of demands in successive periods is dependent on a Markov chain. The model includes the case of cyclic or seasonal demand. The model is further extended to incorporate some other realistic features such as no ordering periods and storage and service level constraints. Both finite and infinite horizon nonstationary problems are considered. We show that (s, S) policies are also optimal for the generalized model as well as its extensions. To appear in Operations Research. (DYNAMIC INVENTORY MODEL, MARKOV CHAIN, DYNAMIC PROGRAMMING, FINITE HORIZON, NONSTATIONARY INFINITE HORIZON, CYCLIC DEMAND, (s, S) POLICY) This research was supported in part by NSERC grant A4619 and Canadian Centre for Marketing Information Technologies. The authors would like to thank Vinay Kanetkar, Dmitry Krass, Ernst Presman, Dirk Beyer, four an...