Results 1-10 of 14
Convergence of Stochastic Iterative Dynamic Programming Algorithms
 Neural Computation
, 1994
Cited by 209 (8 self)
Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of the behavior of these methods has been missing. In this paper we relate DP-based learning algorithms to powerful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD(λ) and Q-learning belong.
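As a rough illustration of the link the abstract draws between DP-based learning and stochastic approximation, the following is a minimal sketch of tabular Q-learning with decaying step sizes. It is not taken from the paper: the two-state problem, the uniform sampling of state-action pairs, the step-size exponent, and all names (`q_learning`, `TRANSITIONS`, `REWARDS`) are assumptions chosen for the example.

```python
import random

def q_learning(transitions, rewards, gamma=0.5, n_steps=20000,
               step_exponent=0.6, seed=0):
    """Tabular Q-learning on a small deterministic MDP (illustrative only).

    transitions[s][a] is the next state and rewards[s][a] the reward.
    """
    rng = random.Random(seed)
    n_states = len(transitions)
    n_actions = len(transitions[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(n_steps):
        # Sample state-action pairs uniformly so every pair is visited often.
        s = rng.randrange(n_states)
        a = rng.randrange(n_actions)
        visits[s][a] += 1
        # Step sizes n^(-0.6): their sum diverges, the sum of squares converges.
        alpha = visits[s][a] ** -step_exponent
        # Move the estimate toward the sampled one-step DP backup.
        target = rewards[s][a] + gamma * max(q[transitions[s][a]])
        q[s][a] += alpha * (target - q[s][a])
    return q

# Hypothetical toy problem: action 0 stays, action 1 switches state;
# staying in state 1 pays a reward of 1.
TRANSITIONS = [[0, 1], [1, 0]]
REWARDS = [[0.0, 0.0], [1.0, 0.0]]
q_table = q_learning(TRANSITIONS, REWARDS)
```

For this toy problem with gamma = 0.5, the Bellman optimality equation has the fixed point Q* = [[0.5, 1.0], [2.0, 0.5]], which the iterates should approach.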
Update rules for parameter estimation in Bayesian networks
, 1997
Cited by 54 (2 self)
This paper reexamines the problem of parameter estimation in Bayesian networks with missing values and hidden variables from the perspective of recent work in online learning [12]. We provide a unified framework for parameter estimation that encompasses both online learning, where the model is continuously adapted to new data cases as they arrive, and the more traditional batch learning, where a pre-accumulated set of samples is used in a one-time model selection process. In the batch case, our framework encompasses both the gradient projection algorithm [2, 3] and the EM algorithm [14] for Bayesian networks. The framework also leads to new online and batch parameter update schemes, including a parameterized version of EM. We provide both empirical and theoretical results indicating that parameterized EM allows faster convergence to the maximum likelihood parameters than does standard EM.
Reinforcement Learning And Its Application To Control
, 1992
Cited by 51 (2 self)
Learning control involves modifying a controller's behavior to improve its performance as measured by some predefined index of performance (IP). If control actions that improve performance as measured by the IP are known, supervised learning methods, or methods for learning from examples, can be used to train the controller. But when such control actions are not known a priori, appropriate control behavior has to be inferred from observations of the IP. One can distinguish between two classes of methods for training controllers under such circumstances. Indirect methods involve constructing a model of the problem's IP and using the model to obtain training information for the controller. On the other hand, direct, or model-free,...
Learning to Solve Markovian Decision Processes
, 1994
Cited by 48 (3 self)
This dissertation is about building learning control architectures for agents embedded in finite, stationary, and Markovian environments. Such architectures give embedded agents the ability to improve autonomously the efficiency with which they can achieve goals. Machine learning researchers have developed reinforcement learning (RL) algorithms based on dynamic programming (DP) that use the agent's experience in its environment to improve its decision policy incrementally. This is achieved by adapting an evaluation function in such a way that the decision policy that is "greedy" with respect to it improves with experience. This dissertation focuses on finite, stationary and Markovian environments for two reasons: it allows the develop...
Stochastic Programming in Transportation and Logistics
, 2003
Cited by 10 (4 self)
Freight transportation is characterized by highly dynamic information processes: customers call in orders over time to move freight; the movement of freight over long distances is subject to random delays; equipment failures require last-minute changes; and decisions are not always executed in the field according to plan. The high dimensionality of the decisions involved has made transportation a natural application for the techniques of mathematical programming, but the challenge of modeling dynamic information processes has limited their success. In this chapter, we explore the use of concepts from stochastic programming in the context of resource allocation problems that arise in freight transportation. Since transportation problems are often quite large, we focus on the degree to which some techniques exploit the natural structure of these problems. Experimental work in the context of these applications is quite limited, so we highlight the techniques that appear to be the most promising.
A New Parameter Estimation Method for Gaussian Mixtures
 in Advances in Neural Information Processing Systems
, 1998
Cited by 5 (0 self)
We describe a new iterative method for parameter estimation of Gaussian mixtures. The new method is based on a framework developed by Kivinen and Warmuth for supervised online learning. In contrast to gradient descent and EM, which estimate the mixture's covariance matrices, the proposed method estimates the inverses of the covariance matrices. Furthermore, the new parameter estimation procedure can be applied in both online and batch settings. We show experimentally that it is typically faster than EM, and usually requires about half as many iterations as EM. We also describe experiments with digit recognition that demonstrate the merits of the online version when the source generating the data is nonstationary.
Keywords: Mixture of Gaussians, Online learning, EM, Convergence rate, Digit recognition
Parameter Estimation: Known Vector Signals in Unknown Gaussian Noise
 Submitted for publication. Computer Science, University of Texas at Dallas
, 1998
Cited by 2 (2 self)
This paper develops recursive, convergent estimators for the parameters of finite Gaussian mixtures with a common covariance matrix. The mean vectors (signals) of the component densities are assumed to be known. The motivation for the study stems from digital communication. The basic approach is first illustrated for the case of an independent identically distributed sequence of samples from a univariate mixture of M classes (symbols). This is accomplished through the development of a convergent stochastic approximation form of estimator for the common variance value. The asymptotic variance of the estimated variance is derived. A batch processing alternative that possesses a sufficient statistic is developed for the case of a fixed size sample set. Three generalizations are studied. The first extends from the case of the univariate data to multivariate data. The second generalization allows for the statistical dependence of successive vector signals. Finally, the case of dependent successive vector signals along with dependent successive additive noise vectors is treated. In each case, convergent estimators for all unknown parameters are developed. Many cases are illustrated with simulation experiments. Results presented are applicable to communication engineering, pattern recognition, and some special image processing problems.
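To make the idea of a recursive, stochastic-approximation variance estimator concrete, here is a minimal sketch for the univariate i.i.d. case. It is not the paper's estimator: it is a simple moment-based recursion that additionally assumes the class priors are known, and the names `recursive_variance` and `simulate` are hypothetical. It relies on the identity E[x^2] = sum_i prior_i * mean_i^2 + sigma^2, which holds when every component shares the variance sigma^2.

```python
import random

def recursive_variance(samples, means, priors):
    """Robbins-Monro style recursion for the common variance (illustrative).

    Assumes known component means and known priors.
    """
    second_moment_of_means = sum(p * m * m for p, m in zip(priors, means))
    estimate = 0.0
    for n, x in enumerate(samples, start=1):
        gain = 1.0 / n  # decaying gain; the recursion becomes a running mean
        estimate += gain * (x * x - second_moment_of_means - estimate)
    return estimate  # may be slightly negative for very short sequences

def simulate(n, means, priors, sigma, seed=0):
    """Draw n i.i.d. samples: a known symbol plus Gaussian noise."""
    rng = random.Random(seed)
    return [rng.choices(means, weights=priors)[0] + rng.gauss(0.0, sigma)
            for _ in range(n)]

# Hypothetical binary signalling example with symbols -1 and +1.
MEANS = [-1.0, 1.0]
PRIORS = [0.5, 0.5]
variance_estimate = recursive_variance(
    simulate(20000, MEANS, PRIORS, sigma=0.5), MEANS, PRIORS)
```

With sigma = 0.5 the true common variance is 0.25, so the estimate should settle near that value as the number of samples grows.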
Relative Reward Strength Algorithms for Learning Automata
 IEEE Trans. Syst., Man, Cybern
, 1989
Cited by 1 (0 self)
We examine a new class of action probability update algorithms for learning automata that use the relative reward strengths of responses from the environment. Specifically, we study update algorithms for S-model automata in which "recent" environmental responses for each of the actions are retained and used. We prove a convergence result and study the behavior of these automata through simulation. A major result of the paper is that the performance of these algorithms is superior, in several respects, to that of the well-known SLR-I update algorithm. Additional results are presented on the variability of performance, the cost of learning and, in the case of static environments, modifications that result in improved convergence. This work is supported in part by the Office of Naval Research grant N00014-87-K-0304 and NSF equipment grant CER DCR
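For orientation, the baseline that the abstract compares against, the S-model linear reward-inaction scheme, can be sketched as follows. This is a minimal sketch of that baseline only, not of the paper's relative-reward-strength algorithms. The convention that the environment response is a reward strength in [0, 1] with 1 most favorable, the simulated environment, and the names `s_lri_update` and `run_automaton` are assumptions made for the example.

```python
import random

def s_lri_update(probs, chosen, reward, rate=0.05):
    """One S-model linear reward-inaction step.

    reward is a strength in [0, 1]; a reward of 0 leaves probs unchanged.
    """
    step = rate * reward
    return [p + step * (1.0 - p) if i == chosen else p - step * p
            for i, p in enumerate(probs)]

def run_automaton(mean_rewards, n_steps=5000, rate=0.05, seed=0):
    """Run the automaton in a hypothetical stationary environment."""
    rng = random.Random(seed)
    n_actions = len(mean_rewards)
    probs = [1.0 / n_actions] * n_actions
    for _ in range(n_steps):
        chosen = rng.choices(range(n_actions), weights=probs)[0]
        # Assumed environment: noisy reward strength, clipped to [0, 1].
        noisy = mean_rewards[chosen] + rng.uniform(-0.1, 0.1)
        reward = min(1.0, max(0.0, noisy))
        probs = s_lri_update(probs, chosen, reward, rate)
    return probs

final_probs = run_automaton([0.8, 0.2])
```

Each step moves probability mass toward the chosen action in proportion to the reward strength, and the action probabilities continue to sum to one.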
Associative Reinforcement Learning of Real-valued Functions
 Proceedings of the IEEE Conference on Systems, Man, and Cybernetics
, 1991
Associative reinforcement learning tasks defined by Barto and Anandan [4] combine elements of problems involving optimization under uncertainty, studied by learning automata theorists, and supervised learning pattern classification. In our previous work, we presented the SRV algorithm [15], which had been designed for extended versions of associative reinforcement learning tasks wherein the learning system's outputs could take on real values. In this paper, we state and prove a strong convergence theorem that implies a form of optimal performance (under certain conditions) of the SRV algorithm on these tasks. Simulation results are presented to illustrate the convergence behavior of the algorithm under the conditions of the theorem. The robustness of the algorithm is also demonstrated by simulations in which some of the conditions of the theorem are violated. This material is based upon work supported by the Air Force Office of Scientific Research, Bolling AFB, under Grant AFOSR-89-0526...
Learning to Control Dynamic Systems Via Associative Reinforcement Learning
... this paper. The internal critic network has 8 input units, a hidden layer of 10 backpropagation units, and a single temporal difference output unit. The controller has 4 input units and a single action unit. In simulations of the supervised learning method, a "noisy" linear unit was used as the action unit, while in simulations of the reinforcement learning method, a stochastic real-valued (SRV) unit [25] was used.