Reinforcement learning algorithms for MDPs
, 2009
Cited by 11 (0 self)
This article presents a survey of reinforcement learning algorithms for Markov Decision Processes (MDPs). In the first half of the article, the problem of value estimation is considered. Here we start by describing the idea of bootstrapping and temporal difference learning. Next, we compare incremental and batch algorithmic variants and discuss the impact of the choice of the function approximation method on the success of learning. In the second half, we describe methods that target the problem of learning to control an MDP. Here online and active learning are discussed first, followed by a description of direct and actor-critic methods.
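As an illustration of the bootstrapping idea named in this abstract, here is a minimal tabular TD(0) sketch. It is not taken from the survey: the five-state random-walk environment, the step size, and the function name `td0_random_walk` are assumptions chosen only to keep the example small.

```python
import random


def td0_random_walk(n_states=5, episodes=10000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values of a symmetric random walk with tabular TD(0).

    Hypothetical toy environment (an assumption, not from the survey):
    states 1..n_states are non-terminal, 0 and n_states + 1 are terminal,
    and the only reward is +1 on entering the right terminal state.
    """
    rng = random.Random(seed)
    # Terminal values stay at 0; non-terminal values start at 0.5.
    v = [0.0] + [0.5] * n_states + [0.0]
    for _ in range(episodes):
        s = (n_states + 1) // 2  # start each episode in the middle state
        while 0 < s < n_states + 1:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n_states + 1 else 0.0
            # Bootstrapped target: reward plus current estimate of the next state.
            v[s] += alpha * (r + gamma * v[s_next] - v[s])
            s = s_next
    return v[1:-1]
```

For this toy chain the true value of state i is i/6, so the estimates can be compared against a known answer. The update is incremental: each transition adjusts one table entry, in contrast to batch variants that fit all observed transitions at once.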
SMART: A Stochastic Multiscale Model for the Analysis of Energy Resources, Technology and Policy
, 2009
Cited by 9 (4 self)
We address the problem of modeling long-term energy policy and investment decisions while retaining the important ability to capture fine-grained variations in intermittent energy and demand, as well as storage. In addition, we wish to capture sources of uncertainty such as future energy policies, climate, and technological advances, in addition to the variability as well as uncertainty in wind energy, demands, prices and rainfall. Accurately modeling the value of all investments such as wind and solar requires handling fine-grained temporal variability and uncertainty in wind and solar, as well as the use of storage. We propose a modeling and algorithmic strategy based on the framework of approximate dynamic programming (ADP) that can model these problems at hourly time increments over a multidecade horizon, while still capturing different types of uncertainty. This paper describes initial proof-of-concept experiments for an ADP-based model, called SMART, by describing the modeling and algorithmic strategy, and providing comparisons against a deterministic benchmark as
Bayesian nonparametric multivariate convex regression
, 2011
Cited by 6 (1 self)
In many applications, such as economics, operations research and reinforcement learning, one often needs to estimate a multivariate regression function f subject to a convexity constraint. For example, in sequential decision processes the value of a state under optimal subsequent decisions may be known to be convex or concave. We propose a new Bayesian nonparametric multivariate approach based on characterizing the unknown regression function as the max of a random collection of unknown hyperplanes. This specification induces a prior with large support in a Kullback-Leibler sense on the space of convex functions, while also leading to strong posterior consistency. Although we assume that f is defined over $\mathbb{R}^p$, we show that this model has a convergence rate of $\log(n)^{-1} n^{-1/(d+2)}$ under the empirical $L_2$ norm when f actually maps a d-dimensional linear subspace to $\mathbb{R}$. We design an efficient reversible jump MCMC algorithm for posterior computation and demonstrate the methods through application to value function approximation.
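The representation this abstract relies on, a convex function written as the maximum of a collection of hyperplanes, can be sketched in a few lines. This shows only the function class, not the Bayesian prior or the reversible jump MCMC sampler. The names `max_affine`, `tangent_planes`, and `PLANES` are hypothetical, and the hyperplanes are fixed by hand (tangent planes of the squared norm) rather than random.

```python
def max_affine(x, planes):
    """Evaluate f(x) = max_k (a_k + b_k . x) for planes given as (a_k, b_k)."""
    return max(a + sum(bi * xi for bi, xi in zip(b, x)) for a, b in planes)


def tangent_planes(points):
    """Tangent planes of g(x) = ||x||^2 at the given points.

    The tangent at t is g(t) + 2 t . (x - t) = -||t||^2 + 2 t . x,
    so the intercept is -||t||^2 and the slope vector is 2 t.
    """
    return [(-sum(t * t for t in p), tuple(2.0 * t for t in p)) for p in points]


# Illustrative choice of tangent points in two dimensions (an assumption).
PLANES = tangent_planes(
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
)
```

Any maximum of affine functions is convex, so `max_affine` is convex whatever planes are supplied. With tangent planes it also lies on or below the function it approximates and touches it at the tangent points.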
Using stochastic approximation methods to compute optimal base-stock levels in inventory control problems
 Operations Research
, 2008
Cited by 5 (0 self)
In this paper, we consider numerous inventory control problems for which the base-stock policies are known to be optimal and we propose stochastic approximation methods to compute the optimal base-stock levels. The existing stochastic approximation methods in the literature guarantee that their iterates converge, but not necessarily to the optimal base-stock levels. In contrast, we prove that the iterates of our methods converge to the optimal base-stock levels. Moreover, our methods continue to enjoy the well-known advantages of the existing stochastic approximation methods. In particular, they only require the ability to obtain samples of the demand random variables, rather than to compute expectations explicitly, and they are applicable even when the demand information is censored by the amount of available inventory.
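To make the idea of sample-based stochastic approximation concrete, the following sketch applies a classic Robbins-Monro update to a single-period newsvendor problem. It is not the paper's algorithm, which addresses multi-period problems. The uniform demand distribution, the cost parameters, the step-size constant `c`, and the function name `base_stock_sa` are all assumptions for illustration.

```python
import random


def base_stock_sa(holding=1.0, backlog=3.0, steps=50000, c=50.0, seed=0):
    """Robbins-Monro search for a single-period base-stock level.

    Assumed cost: holding * max(s - D, 0) + backlog * max(D - s, 0),
    with demand D drawn uniformly from [0, 100] (an illustrative choice).
    """
    rng = random.Random(seed)
    s = 50.0
    for n in range(1, steps + 1):
        demand = rng.uniform(0.0, 100.0)
        # Only censored sales are used: we learn whether stock ran out,
        # not the full demand.
        sales = min(demand, s)
        stocked_out = sales >= s
        # Sample gradient of the cost with respect to s.
        grad = -backlog if stocked_out else holding
        # Diminishing step size c / n, keeping the level non-negative.
        s = max(0.0, s - (c / n) * grad)
    return s
```

Under these assumptions the optimal level is the backlog / (backlog + holding) quantile of demand, which is 75 for the defaults, so the result can be checked against a closed form. The update needs only the stock-out indicator, which is observable from censored sales.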
The role of price spreads and reoptimization in the real option management of commodity storage assets. Working paper
Cited by 3 (1 self)
The real option management of commodity storage assets is an important practical problem. Practitioners approach the resulting stochastic optimization model using heuristic policies that rely on sequential reoptimization of linear programs. Used in conjunction with Monte Carlo simulation, these policies typically yield near-optimal lower bound estimates on the value of storage. This paper reveals that a simple one-stage lookahead policy is optimal for a fast storage asset without frictions. Thus, in this (not entirely realistic) case the problem is easy and the reoptimization policies are unnecessary, albeit optimal. In contrast, this paper provides numerical and structural justification for the use of these policies in the general case. Further, the use of price spreads simplifies the estimation of near-tight dual upper bounds on the value of storage. This approach relies on using the fast and frictionless asset optimal value function to estimate dual upper bounds in the general case. Monte Carlo simulation and linear programming Storable commodity industries include storage assets embedded in physical markets for the commodity, and financial markets for commodity derivatives. These markets can be fairly competitive, as exemplified by the natural gas industry in North America and parts of Europe
An opportunity cost view of base-stock optimality for the warehouse problem
 Handbook of Integrated Risk Management in Global Supply Chains
Cited by 1 (1 self)
This work considers the so-called warehouse problem, which is a prototypical problem of the trading activity of a merchant in a commodity market. It is known that the merchant’s optimal trading policy for this problem has a base-stock structure. The existing proofs of this result hinge on marginal analysis, and may not be easily accessible to managers. This work provides an elementary derivation of the optimality of this structure relying almost exclusively on geometric arguments based on the notion of opportunity cost of a trade, a concept familiar to commodity merchants. Some aspects of managerial relevance associated with this structure are also discussed. It is hoped that the material presented in this work would be of interest to managers involved in the merchant management of commodity storage.
Least squares policy iteration with instrumental variables vs. direct policy search: Comparison against optimal benchmarks using energy storage. Working paper
, 2014
Cited by 1 (0 self)
Approximate Dynamic Programming for Commodity and Energy Merchant Operations
, 2014
Cited by 1 (0 self)
We study the merchant operations of commodity and energy conversion assets. Examples of such assets include natural gas pipeline systems, commodity swing options, and power plants. Merchant operations involves managing these assets as real options on commodity and energy prices with the objective of maximizing the market value of these assets. The economic relevance of natural gas conversion assets has increased considerably since the occurrence of the oil and gas shale boom; for example, the Energy Information Agency expects natural gas to be the source of 30% of the world’s electricity production by 2040 and the McKinsey Global Institute projects United States spending on energy infrastructure to be about 100 billion dollars by 2020. Managing commodity and energy conversion assets can be formulated as intractable Markov decision problems (MDPs), especially when using high-dimensional price models commonly employed in practice. We develop approximate dynamic programming (ADP) methods for computing near-optimal policies and lower and upper bounds on the market value of these assets. We focus on overcoming issues with the standard math programming
An Approximate Dynamic Programming Algorithm for Monotone Value Functions
Cited by 1 (1 self)