Results 1–10 of 11
MinMax Approximate Dynamic Programming
Cited by 3 (3 self)
Abstract — In this paper we describe an approximate dynamic programming policy for a discrete-time dynamical system perturbed by noise. The approximate value function is the pointwise supremum of a family of lower bounds on the value function of the stochastic control problem; evaluating the control policy involves the solution of a min-max or saddle-point problem. For a quadratically constrained linear quadratic control problem, evaluating the policy amounts to solving a semidefinite program at each time step. By evaluating the policy, we obtain a lower bound on the value function, which can be used to evaluate performance: when the lower bound and the achieved performance of the policy are close, we can conclude that the policy is nearly optimal. We describe several numerical examples where this is indeed the case.
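As a rough illustration of the construction this abstract describes, the following sketch forms a value function as the pointwise supremum of a few quadratic lower bounds and evaluates a greedy policy for a made-up scalar system. A grid search over inputs stands in for the semidefinite program solved in the paper, and all coefficients here are invented:

```python
import numpy as np

# Hypothetical family of quadratic lower bounds (p, q, r) on the true value
# function; V_hat(x) = max_i (p_i * x**2 + q_i * x + r_i).
lower_bounds = [(1.0, 0.0, 0.0), (0.5, 1.0, 0.2), (2.0, -1.0, -0.1)]

def v_hat(x):
    # Pointwise supremum of the family of lower bounds.
    return max(p * x**2 + q * x + r for (p, q, r) in lower_bounds)

def policy(x, a=0.9, b=1.0, stage_cost=lambda x, u: x**2 + u**2,
           noise=np.linspace(-0.5, 0.5, 11)):
    # Greedy policy: minimize stage cost plus the expected supremum of the
    # lower bounds at the next state x+ = a*x + b*u + w. A coarse grid search
    # replaces the min-max / saddle-point solve described in the paper.
    candidates = np.linspace(-2.0, 2.0, 201)
    def q_value(u):
        return stage_cost(x, u) + np.mean([v_hat(a * x + b * u + w) for w in noise])
    return min(candidates, key=q_value)

u = policy(1.0)
```

Here `v_hat` is only as good as the family of lower bounds supplied; the paper's point is that the achieved cost of such a policy can be compared directly against the bound itself to certify near-optimality.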
Approximate dynamic programming via sum of squares programming
In Proc. of the European Control Conference, 2013
Cited by 3 (1 self)
Abstract — We describe an approximate dynamic programming method for stochastic control problems on infinite state and input spaces. The optimal value function is approximated by a linear combination of basis functions with coefficients as decision variables. By relaxing the Bellman equation to an inequality, one obtains a linear program in the basis coefficients with an infinite set of constraints. We show that a recently introduced method, which obtains convex quadratic value function approximations, can be extended to higher-order polynomial approximations via sum of squares programming techniques. An approximate value function can then be computed offline by solving a semidefinite program, without having to sample the infinite constraint set. The policy is evaluated online by solving a polynomial optimization problem, which also turns out to be convex in some cases. We experimentally validate the method on an autonomous helicopter testbed using a 10-dimensional helicopter model.
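The relaxation of the Bellman equation to an inequality has a well-known finite-dimensional counterpart: for a finite MDP, maximizing the sum of the value estimates subject to the Bellman inequality at every state-action pair is a linear program whose solution is the exact value function. A small sketch with made-up costs and transition probabilities, using `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

# Toy 2-state, 2-action discounted-cost MDP; all numbers are invented.
# LP: maximize sum(V) subject to V(s) <= c(s,a) + gamma * sum_s' P(s,a,s') V(s').
gamma = 0.9
c = np.array([[1.0, 2.0], [0.5, 0.3]])           # c[s, a]
P = np.array([[[0.8, 0.2], [0.1, 0.9]],          # P[s, a, s']
              [[0.3, 0.7], [0.9, 0.1]]])

# One inequality row per (s, a): V(s) - gamma * P(s,a,:) @ V <= c(s, a).
A_ub, b_ub = [], []
for s in range(2):
    for a in range(2):
        row = -gamma * P[s, a]
        row[s] += 1.0
        A_ub.append(row)
        b_ub.append(c[s, a])

# linprog minimizes, so maximize sum(V) via an objective of -1 per state.
res = linprog(c=[-1.0, -1.0], A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
V = res.x
```

The paper's setting replaces this finite constraint set with an infinite one over continuous states and inputs, which the sum of squares machinery handles without sampling.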
Iterated Approximate Value Functions
Cited by 2 (1 self)
Abstract — In this paper we introduce a control policy which we refer to as the iterated approximate value function policy. Generating this policy requires two stages, the first carried out offline and the second online. In the offline stage we simultaneously compute a trajectory of moments of the state and action and a sequence of approximate value functions optimized to that trajectory. In the online stage we perform control using the generated sequence of approximate value functions. This yields a time-varying policy, even when the optimal policy is time-invariant. We restrict our attention to the case with linear dynamics and a quadratically representable stage cost function. In this case the precomputation stage requires the solution of a semidefinite program (SDP). Finding the control action at each time period requires solving a small convex optimization problem, which can be carried out quickly. We conclude with some examples.
Approximate Dynamic Programming for Commodity and Energy Merchant Operations
, 2014
Cited by 1 (0 self)
We study the merchant operations of commodity and energy conversion assets. Examples of such assets include natural gas pipeline systems, commodity swing options, and power plants. Merchant operations involve managing these assets as real options on commodity and energy prices, with the objective of maximizing the market value of these assets. The economic relevance of natural gas conversion assets has increased considerably since the oil and gas shale boom; for example, the Energy Information Administration expects natural gas to be the source of 30% of the world's electricity production by 2040, and the McKinsey Global Institute projects United States spending on energy infrastructure to be about $100 billion by 2020. Managing commodity and energy conversion assets can be formulated as intractable Markov decision problems (MDPs), especially when using the high-dimensional price models commonly employed in practice. We develop approximate dynamic programming (ADP) methods for computing near-optimal policies and lower and upper bounds on the market value of these assets. We focus on overcoming issues with the standard math programming
Information Relaxation Bounds for Infinite Horizon Markov Decision Processes
Cited by 1 (1 self)
We consider infinite horizon stochastic dynamic programs with discounted costs and study how to use information relaxations to calculate lower bounds on the performance of an optimal policy. We develop a general framework that allows for reformulations of the underlying state transition function. These reformulations can simplify the information relaxation calculations, both by leading to finite horizon subproblems and by reducing the number of states in these subproblems. We study as important special cases both “weak formulations,” in which states are independent of actions, and “strong formulations,” which retain the original dependence on actions. Our reformulations incorporate penalties for information in a direct way via control variate terms. We show that the approach improves on the lower bounds from “Bellman feasible” approximate value functions when the control variates are built from these approximate value functions. We apply the approach to the problem of dynamic service allocation in a multi-class queue; such models are well-studied but challenging to solve. In our examples, we find the information relaxation lower bounds are relatively easy to calculate and are very close to the upper bounds obtained from simple heuristics. Finally, we discuss extensions of the approach to stochastic shortest path and average cost problems.
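The basic mechanics of an information relaxation lower bound can be sketched as follows: reveal the entire noise path to the controller, solve each resulting deterministic inner problem exactly, and average over sampled paths. Without the penalty (control variate) terms the paper builds in, this "perfect information" bound is valid for a minimization problem but loose. The scalar model, horizon, and action grid below are all made up:

```python
import itertools
import numpy as np

# Perfect-information relaxation (penalty omitted): the controller sees the
# whole noise path w, so each inner problem is deterministic and can be
# brute-forced; averaging the inner optima lower-bounds the optimal cost.
rng = np.random.default_rng(0)
T, actions = 3, np.linspace(-1.0, 1.0, 5)

def inner_cost(x0, w):
    # Deterministic inner problem for one sampled noise path: brute force
    # over all action sequences on the small grid.
    best = np.inf
    for us in itertools.product(actions, repeat=T):
        x, cost = x0, 0.0
        for t in range(T):
            cost += x**2 + us[t]**2
            x = x + us[t] + w[t]   # toy dynamics x+ = x + u + w
        cost += x**2               # terminal cost
        best = min(best, cost)
    return best

paths = rng.normal(scale=0.3, size=(200, T))
lower_bound = np.mean([inner_cost(1.0, w) for w in paths])
```

The paper's reformulations and control variate penalties tighten exactly this kind of bound while keeping the inner problems tractable, even in the infinite horizon setting.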
Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems
Adaptive Read Thresholds for NAND Flash
Abstract — A primary source of increased read time on NAND flash comes from the fact that, in the presence of noise, the flash medium must be read several times using different read threshold voltages for the decoder to succeed. This paper proposes an algorithm that uses a limited number of re-reads to characterize the noise distribution and recover the stored information. Both hard and soft decoding are considered. For hard decoding, the paper attempts to find a read threshold minimizing bit-error-rate (BER) and derives an expression for the resulting codeword-error-rate. For soft decoding, it shows that minimizing BER and minimizing codeword-error-rate are competing objectives when the number of allowed re-reads is limited, and proposes a trade-off between the two. The proposed method does not require any prior knowledge of the noise distribution, but can take advantage of such information when it is available. Each read threshold is chosen based on the results of previous reads, following an optimal policy derived through a dynamic programming backward recursion. The method and results are studied from the perspective of an SLC flash memory with Gaussian noise, but the paper explains how the method could be extended to other scenarios. Index Terms — Flash memory, multilevel memory, voltage threshold, adaptive read, soft information, symmetric capacity.
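For the hard-decoding case, the threshold choice the abstract mentions can be illustrated with a toy SLC model: two Gaussian voltage distributions with made-up means and a shared standard deviation, and a grid search for the read threshold minimizing bit-error-rate. With equal priors and equal variances the minimizer lands at the midpoint of the two means:

```python
import numpy as np
from math import erf, sqrt

# Toy SLC model (all parameters invented): cell voltage is Gaussian around
# mu0 (one stored bit value) or mu1 (the other); a hard read compares the
# voltage to a threshold t.
mu0, mu1, sigma = 1.0, 3.0, 0.5

def phi(z):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def ber(t):
    # Equal priors: an error occurs if a mu0-cell reads above t or a
    # mu1-cell reads below t.
    return 0.5 * ((1.0 - phi((t - mu0) / sigma)) + phi((t - mu1) / sigma))

grid = np.linspace(0.0, 4.0, 4001)
t_star = grid[np.argmin([ber(t) for t in grid])]
```

The paper goes well beyond this one-shot picture: each successive threshold is chosen adaptively from earlier read outcomes via a backward DP recursion, and the noise parameters themselves are estimated rather than assumed known.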
Quadratic approximate ... input-affine systems
, 2012
We consider the use of quadratic approximate value functions for stochastic control problems with input-affine dynamics and convex stage cost and constraints. Evaluating the approximate dynamic programming policy in such cases requires the solution of an explicit convex optimization problem, such as a quadratic program, which can be carried out efficiently. We describe a simple and general method for approximate value iteration that also relies on our ability to solve convex optimization problems, in this case typically a semidefinite program. Although we have no theoretical guarantee on the performance attained using our method, we observe that very good performance can be obtained in practice.
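A stripped-down version of approximate value iteration with a quadratic value function can be sketched for a scalar system. The paper fits each iterate by solving a semidefinite program; here a plain least-squares fit over sampled Bellman backups stands in, and the dynamics, cost, and noise model are all invented:

```python
import numpy as np

# Quadratic value function V(x) = p * x**2 for a toy scalar input-affine
# system x+ = 0.9*x + u + w with stage cost x**2 + u**2 (made up for
# illustration). Each sweep applies a sampled Bellman backup at every
# sample state, then refits the single coefficient p by least squares.
rng = np.random.default_rng(1)
gamma, p = 0.95, 0.0
xs = np.linspace(-2.0, 2.0, 21)    # sample states
us = np.linspace(-2.0, 2.0, 41)    # input grid
ws = rng.normal(scale=0.1, size=20)  # noise samples

for _ in range(30):
    targets = np.array([
        min(x**2 + u**2 + gamma * np.mean(p * (0.9 * x + u + ws)**2) for u in us)
        for x in xs
    ])
    # Least-squares fit of targets to p * x**2.
    p = float(np.sum(targets * xs**2) / np.sum(xs**4))
```

For this toy linear-quadratic instance the fitted coefficient can be checked against the scalar discounted Riccati recursion; the paper's SDP-based fit is what makes the same idea work with constraints and general input-affine dynamics.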