Probability: Theory and examples
Cambridge University Press, 2011
Sometimes the lights are shining on me. Other times I can barely see. Lately it occurs to me what a long strange trip it's been. (Grateful Dead)
In 1989, when the first edition of the book was completed, my sons David and Greg were 3 and 1, and the cover picture showed the Dow Jones at 2650. The last twenty years have brought many changes but the song remains the same. The title of the book indicates that as we develop the theory, we will focus our attention on examples. Hoping that the book would be a useful reference for people who apply probability in their work, we have tried to emphasize the results that are important for applications, and illustrated their use with roughly 200 examples. Probability is not a spectator sport, so the book contains almost 450 exercises to challenge the reader and to deepen their understanding. The fourth edition has two major changes (in addition to a new publisher): (i) The book has been converted from TeX to LaTeX. The systematic use of labels should eventually eliminate problems with references to other points in the text. In
Least Squares Policy Evaluation Algorithms With Linear Function Approximation
Theory and Applications, 2002
We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the λ-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(λ) algorithm recently proposed by Boyan, which for λ = 0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(λ), with probability 1, for every λ ∈ [0, 1].
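As a rough illustration of the second method, the sketch below computes an LSTD(λ) weight vector from a single simulated trajectory. It assumes a discounted-cost chain with discount factor `alpha`, one-stage costs g(x_t, x_{t+1}), and a user-supplied feature matrix. The function name `lstd_lambda` and its argument layout are hypothetical. This is not the paper's code, and it does not cover the gradient-like first method.

```python
import numpy as np

def lstd_lambda(features, costs, alpha, lam):
    """Estimate r such that J(x) ~= phi(x) @ r with LSTD(lambda).

    features: (T+1, k) array; row t is the feature vector phi(x_t) of the
              t-th state visited along one simulated trajectory.
    costs:    length-T sequence; entry t is the cost g(x_t, x_{t+1}).
    alpha:    discount factor in (0, 1).
    lam:      trace parameter lambda in [0, 1].
    """
    features = np.asarray(features, dtype=float)
    costs = np.asarray(costs, dtype=float)
    k = features.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    z = np.zeros(k)  # eligibility trace
    for t in range(len(costs)):
        phi, phi_next = features[t], features[t + 1]
        z = alpha * lam * z + phi                 # discounted sum of past features
        A += np.outer(z, phi - alpha * phi_next)  # accumulates z_t (phi_t - alpha phi_{t+1})^T
        b += z * costs[t]                         # accumulates z_t g_t
    return np.linalg.solve(A, b)                  # solves A r = b

# Deterministic two-state cycle 0 -> 1 -> 0 -> ..., cost 1 when leaving state 0.
alpha = 0.9
states = [t % 2 for t in range(201)]
features = np.eye(2)[states]                      # one-hot (tabular) features
costs = [1.0 if s == 0 else 0.0 for s in states[:-1]]
r = lstd_lambda(features, costs, alpha, lam=0.5)
# r is approximately [5.263, 4.737], i.e. [1/(1 - alpha**2), alpha/(1 - alpha**2)]
```

With tabular features on a deterministic chain, every temporal difference vanishes at the true cost-to-go. The estimate is therefore exact for any λ, which makes this a convenient sanity check rather than a test of convergence.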
The o.d.e. method for convergence of stochastic approximation and reinforcement learning
SIAM J. Control Optim., 2000
It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. It is found that the results provide (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) a proof for the first time that a class of asynchronous stochastic approximation algorithms are convergent without using any a priori assumption of stability; (iii) a proof for the first time that asynchronous adaptive critic and Q-learning algorithms are convergent for the average cost optimal control problem.
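The ODE method can be illustrated with a plain Robbins-Monro iteration x_{n+1} = x_n + a_n (h(x_n) + M_{n+1}). In the sketch below, the following are all illustrative assumptions and are not taken from the paper:

- the affine mean field h(x) = b - Ax,
- Gaussian noise standing in for the martingale-difference term M_{n+1},
- step sizes a_n = 1/(n+1).

For this h, the scaled limit h_∞(x) = lim_{r→∞} h(rx)/r = -Ax has a globally asymptotically stable origin whenever every eigenvalue of A has positive real part. That is roughly the kind of condition the stability result relies on. The iterates should then settle near the equilibrium A⁻¹b of dx/dt = h(x).

```python
import numpy as np

def robbins_monro(h, x0, n_iter, noise_std=1.0, seed=0):
    """Run x_{n+1} = x_n + a_n (h(x_n) + M_{n+1}) with a_n = 1/(n+1)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for n in range(n_iter):
        a_n = 1.0 / (n + 1)                               # sum a_n = inf, sum a_n^2 < inf
        noise = noise_std * rng.standard_normal(x.shape)  # i.i.d. zero-mean noise
        x = x + a_n * (h(x) + noise)
    return x

A = np.array([[2.0, 1.0], [-1.0, 2.0]])  # eigenvalues 2 +/- 1j
b = np.array([1.0, 0.0])

def mean_field(y):
    return b - A @ y

x = robbins_monro(mean_field, x0=[10.0, -10.0], n_iter=20000)
x_star = np.linalg.solve(A, b)  # equilibrium of dx/dt = b - A x, here [0.4, 0.2]
```

The starting point is deliberately far from the equilibrium. The decreasing step sizes let the iteration first follow the ODE trajectory and then average out the noise.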
A simultaneous perturbation stochastic approximation-based Actor-Critic . . .
2004
A two-timescale simulation-based actor-critic algorithm for solution of infinite-horizon Markov decision processes with finite state and compact action spaces under the discounted cost criterion is proposed. The algorithm does gradient search on the slower timescale in the space of deterministic policies and uses simultaneous perturbation stochastic approximation-based estimates. On the faster scale, the value function corresponding to a given stationary policy is updated and averaged over a fixed number of epochs (for enhanced performance). The proof of convergence to a locally optimal policy is presented. Finally, numerical experiments using the proposed algorithm on flow control in a bottleneck link using a continuous-time queueing model are shown.
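The simultaneous perturbation idea used on the slower timescale can be sketched in isolation. The code below estimates a gradient from just two cost evaluations per step by perturbing every coordinate at once with random ±1 signs. It then takes a gradient-descent step.

The deterministic quadratic cost, the constant step size `a`, and the perturbation size `delta` are assumptions for illustration. The paper's actor-critic uses simulated costs, diminishing step sizes, and a critic running on the faster timescale, and none of these is modeled here.

```python
import numpy as np

def spsa_minimize(f, theta0, n_iter=500, a=0.05, delta=0.1, seed=0):
    """Minimize f with SPSA gradient estimates (two evaluations of f per step)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_iter):
        # Perturb all coordinates simultaneously with independent +/-1 signs.
        Delta = rng.choice([-1.0, 1.0], size=theta.shape)
        diff = f(theta + delta * Delta) - f(theta - delta * Delta)
        grad_est = diff / (2.0 * delta * Delta)   # componentwise division
        theta = theta - a * grad_est              # gradient-descent step
    return theta

c = np.array([1.0, -2.0, 0.5])

def quadratic_cost(t):
    return float(np.sum((t - c) ** 2))

theta = spsa_minimize(quadratic_cost, np.zeros(3))
```

For this quadratic cost the two-sided difference is exact, so the i-th component of the estimate equals 2 Δ_i Δ·(θ - c). Since E[Δ_i Δ_j] = 0 for i ≠ j, this is an unbiased estimate of the true gradient 2(θ - c), however many coordinates θ has.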
Asynchronous Stochastic Approximations
SIAM J. Control Optim., 1998
The asymptotic behavior of a distributed, asynchronous stochastic approximation scheme is analyzed in terms of a limiting nonautonomous differential equation. The relation between the latter and the relative values of suitably rescaled relative frequencies of updates of different components is underscored.
Key words: distributed algorithms, asynchronous algorithms, communication delays, stochastic approximation, ODE limit
AMS subject classifications: 62L20, 93E25
PII: S0363012995282784
1. Introduction. There has been a resurgence of interest in stochastic approximation algorithms, particularly as mechanisms for learning systems. They can, for example, serve as a learning algorithm for neural networks [13] or as a model of learning by boundedly rational agents in a macroeconomic system [20], in addition to their traditional applications in adaptive engineering systems [2]. These applications call for a distributed, asynchronous implementation of stochastic approximation schemes. In engineerin...
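To make the setting concrete, here is a small simulation of asynchronous stochastic approximation for a fixed point x = F(x). At each step only one randomly chosen component is updated, and components are chosen with unequal frequencies. Each component uses a step size driven by its own update count, a "local clock".

The contraction F, the selection probabilities, the step-size exponent 0.7, and the Gaussian noise are all assumptions made for illustration. Communication delays are not modeled.

```python
import numpy as np

def async_sa(F, x0, probs, n_steps, noise_std=0.1, seed=0):
    """Asynchronous stochastic approximation for the fixed point x = F(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    nu = np.zeros(len(x), dtype=int)        # per-component update counts (local clocks)
    for _ in range(n_steps):
        i = rng.choice(len(x), p=probs)     # only component i is updated at this step
        step = 1.0 / (nu[i] + 1) ** 0.7     # step size indexed by component i's own clock
        noisy_target = F(x)[i] + noise_std * rng.standard_normal()
        x[i] += step * (noisy_target - x[i])
        nu[i] += 1
    return x

W = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
c = np.array([1.0, 2.0, 3.0])

def F(y):
    return 0.5 * W @ y + c                  # sup-norm contraction with modulus 0.5

x = async_sa(F, np.zeros(3), probs=[0.5, 0.3, 0.2], n_steps=30000)
x_star = np.linalg.solve(np.eye(3) - 0.5 * W, c)  # exact fixed point of F
```

The unequal selection probabilities only rescale time separately for each component in the limiting ODE. With a contraction F, the iterates still approach x_star.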
Optimal Structured Feedback Policies for ABR Flow Control Using Two-Timescale SPSA
Proceedings of the Summer Computer Simulation Conference, Society for Computer Simulation, 1994
Abstract—Optimal structured feedback control policies for rate-based flow control of available bit rate (ABR) service in asynchronous transfer mode (ATM) networks are obtained in the presence of information and propagation delays, using a numerically efficient two-timescale simultaneous perturbation stochastic approximation (SPSA) algorithm. Models comprising both a single bottleneck node and a network with multiple bottleneck nodes are considered. A convergence analysis of the algorithm is presented. Numerical experiments demonstrate fast convergence even in the presence of significant delays. We also compare performance with the well-known Explicit Rate Indication for Congestion Avoidance (ERICA) algorithm and describe another algorithm, based on ERICA, that, unlike ERICA, does not require estimating the available bandwidth.
Index Terms—Network of nodes, optimal structured feedback policies, rate-based ABR flow control, single bottleneck node, two-timescale SPSA.
A Compactness Principle For Bounded Sequences Of Martingales With Applications
Proceedings of the Seminar of Stochastic Analysis, Random Fields and Applications, Progress in Probability, 1996
For H¹-bounded sequences, we introduce a technique, related to the Kadec-Pełczyński decomposition for L¹ sequences, that allows us to prove compactness theorems. Roughly speaking, a bounded sequence in H¹ can be split into two sequences, one of which is weakly compact, while the other forms the singular part. If the martingales are continuous, then the singular part tends to zero in the semimartingale topology. In the general case the singular parts give rise to a process of bounded variation. The technique also yields a new proof of the optional decomposition theorem in Mathematical Finance.
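For reference, the L¹ result the technique is modeled on is usually stated as a subsequence splitting lemma. What follows is our paraphrase on a probability space; the paper's H¹ version is not reproduced here. By the Dunford-Pettis theorem, the uniformly integrable part is relatively weakly compact in L¹, and the remaining part plays the role of the singular part.

```latex
\sup_{n}\|f_n\|_{L^1} < \infty
\;\Longrightarrow\;
\exists\, n_1 < n_2 < \cdots,\ \exists\, A_1, A_2, \ldots \text{ pairwise disjoint such that }
\bigl(f_{n_k}\mathbf{1}_{A_k^{c}}\bigr)_{k \ge 1} \text{ is uniformly integrable},
\qquad
f_{n_k} = f_{n_k}\mathbf{1}_{A_k^{c}} + f_{n_k}\mathbf{1}_{A_k}.
```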
Adaptive estimation of HMM transition probabilities
IEEE Trans. on Signal Processing
Abstract—This paper presents new schemes for recursive estimation of the state transition probabilities of hidden Markov models (HMMs) via extended least squares (ELS) and recursive state prediction error (RSPE) methods. Local convergence of the proposed RSPE algorithm is analyzed using the ordinary differential equation (ODE) approach developed for the more familiar recursive output prediction error (RPE) methods. The presented scheme converges and is relatively well conditioned compared with the previously proposed RPE scheme for estimating transition probabilities, which performs poorly in low noise. The ELS algorithm presented in this paper is computationally of order N², which is less than the computational effort of order N⁴ required to implement the RSPE (and previous RPE) scheme, where N is the number of Markov states. Building on earlier work, an algorithm for simultaneous estimation of the state output mappings and the state transition probabilities that requires less computational effort than earlier schemes is also presented and discussed. Implementation aspects of the proposed algorithms are discussed, and simulation studies are presented to illustrate convergence and convergence rates.
Index Terms—Hidden Markov models, parameter estimation, recursive estimation.
SHAPE: A Stochastic Hybrid Approximation Procedure, with an Application to Dynamic Networks
1994
We consider the problem of approximating the expected recourse function for two-stage stochastic programs. Our problem is motivated by applications that have special structure, such as an underlying network that allows reasonable approximations to the expected recourse function to be developed. In this paper, we show how these approximations can be improved by combining them with sample gradient information from the true recourse function. For the case of strictly convex nonlinear approximations, we prove convergence for this hybrid approximation. The method is attractive for practical reasons because it retains the structure of the approximation.
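Here is a toy one-dimensional version of the hybrid idea. It starts from a strictly convex quadratic approximation of the expected recourse. After each first-stage solve, it adds a linear correction that moves the approximation's gradient toward a sampled subgradient of the true recourse.

The following are our assumptions:

- the newsvendor-style recourse q·max(ω - x, 0) + h·max(x - ω, 0),
- uniform demand on [0, 10],
- the initial approximation (β/2)x²,
- step sizes 1/(k+1).

The correction rule v ← v + α_k(ξ_k - ∇Q̄_k(x_k)) is written from our reading of SHAPE and is a sketch, not the authors' implementation. The function name `shape_toy` is hypothetical.

```python
import numpy as np

def shape_toy(c, q, h, beta=0.5, n_iter=20000, seed=0):
    """Minimize c*x + E[Q(x, w)], Q(x, w) = q*max(w - x, 0) + h*max(x - w, 0),
    w ~ Uniform(0, 10), via approximations Qbar_k(x) = beta/2 * x**2 + v_k * x."""
    rng = np.random.default_rng(seed)
    v = 0.0                                # accumulated linear correction
    for k in range(n_iter):
        x = -(c + v) / beta                # argmin of c*x + Qbar_k(x)
        w = rng.uniform(0.0, 10.0)         # sampled second-stage outcome
        xi = -q if w > x else h            # subgradient of Q(., w) at x
        grad_approx = beta * x + v         # gradient of Qbar_k at x
        v += (xi - grad_approx) / (k + 1)  # hybrid update: add a linear term
    return -(c + v) / beta

x_final = shape_toy(c=1.0, q=4.0, h=1.0)
# The true optimum solves P(w <= x) = (q - c) / (q + h) = 0.6, i.e. x* = 6.
```

The quadratic shape of the approximation is never changed. Only its linear term is corrected, which mirrors the abstract's point that the method retains the structure of the approximation.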