Sastry, "Varieties of Learning Automata: An Overview"
IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics, 2002
Cited by 74 (0 self)
Abstract—Automata models of learning systems introduced in the 1960s were popularized as learning automata (LA) in a survey paper in 1974 [1]. Since then, there have been many fundamental advances in the theory as well as applications of these learning models. In the past few years, the structure of LA has been modified in several directions to suit different applications. Concepts such as parameterized learning automata (PLA), generalized learning automata (GLA), and continuous action-set learning automata (CALA) have been proposed, analyzed, and applied to solve many significant learning problems. Furthermore, groups of LA forming teams and feedforward networks have been shown to converge to desired solutions under appropriate learning algorithms. Modules of LA have been used for parallel operation with a consequent increase in speed of convergence. All of these concepts and results are relatively new and are scattered in the technical literature. An attempt has been made in this paper to bring together the main ideas involved in a unified framework and provide pointers to relevant references.
Index Terms—Continuous action-set learning automata (CALA), generalized learning automata (GLA), modules of learning automata, parameterized learning automata (PLA), teams and networks of learning automata.
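As background for the variants surveyed in this abstract, the classical finite-action LA update fits in a few lines. The following linear reward-inaction (L_R-I) scheme is only an illustrative sketch; the function name and parameters are our own, not taken from the paper:

```python
import random

def lri_automaton(actions, reward_fn, rate=0.1, steps=5000, seed=0):
    """Linear reward-inaction (L_R-I) learning automaton (illustrative sketch).

    reward_fn(action) must return 1 (reward) or 0 (penalty). On reward,
    probability mass shifts toward the chosen action; on penalty nothing
    changes, which is what makes the scheme "reward-inaction".
    """
    rng = random.Random(seed)
    p = [1.0 / len(actions)] * len(actions)  # start from uniform action probabilities
    for _ in range(steps):
        i = rng.choices(range(len(actions)), weights=p)[0]
        if reward_fn(actions[i]) == 1:
            # reward: decay all probabilities, then boost the chosen action
            p = [(1 - rate) * pj for pj in p]
            p[i] += rate
    return p

# Two-armed stochastic environment: action 0 is rewarded 80% of the time,
# action 1 only 40%, so the automaton should drift toward action 0.
env_rng = random.Random(1)
probs = lri_automaton([0, 1], lambda a: int(env_rng.random() < (0.8 if a == 0 else 0.4)))
```

The more elaborate structures the survey covers (PLA, GLA, CALA, teams, networks) generalize exactly this probability-updating loop.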
Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement
Machine Learning, 1997
Cited by 72 (32 self)
We study task sequences that allow for speeding up the learner's average reward intake through appropriate shifts of inductive bias (changes of the learner's policy). To evaluate long-term effects of bias shifts setting the stage for later bias shifts, we use the "success-story algorithm" (SSA). SSA is occasionally called at times that may depend on the policy itself. It uses backtracking to undo those bias shifts that have not been empirically observed to trigger long-term reward accelerations (measured up until the current SSA call). Bias shifts that survive SSA represent a lifelong success history. Until the next SSA call, they are considered useful and build the basis for additional bias shifts. SSA allows for plugging in a wide variety of learning algorithms. We plug in (1) a novel, adaptive extension of Levin search and (2) a method for embedding the learner's policy modification strategy within the policy itself (incremental self-improvement). Our inductive transfer case studies...
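The backtracking step that undoes unsuccessful bias shifts can be sketched as follows. The data layout and names here are our own illustrative assumptions, not the paper's implementation:

```python
def ssa_filter(checkpoints, t_now, r_now):
    """Success-story backtracking (illustrative sketch of the SSA idea).

    checkpoints is a list of (t_i, r_i): the time of each surviving bias
    shift and the cumulative reward at that time. We keep popping the most
    recent checkpoint while the reward rate measured since it does not
    improve on the rate measured since the previous checkpoint; the shifts
    that survive form a "success story" of strictly increasing reward rates.
    """
    while len(checkpoints) >= 2:
        t1, r1 = checkpoints[-2]
        t2, r2 = checkpoints[-1]
        rate_prev = (r_now - r1) / (t_now - t1)
        rate_last = (r_now - r2) / (t_now - t2)
        if rate_last > rate_prev:
            break           # success story holds; stop undoing
        checkpoints.pop()   # undo the most recent bias shift
    return checkpoints

# The shift at t=20 briefly raised reward, but the rate since then (0.2)
# fell below the rate since t=10 (0.35), so it gets undone.
history = [(0, 0.0), (10, 1.0), (20, 6.0)]
print(ssa_filter(history, t_now=30, r_now=8.0))  # -> [(0, 0.0), (10, 1.0)]
```

In the paper this filter is combined with an actual policy restore: popping a checkpoint also reverts the policy changes made by that bias shift.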
Learning in Network Contexts: Experimental Results from Simulations
2000
Cited by 19 (6 self)
This paper describes the results of simulation experiments performed on a suite of learning algorithms. We focus on games in network contexts. These are contexts in which (1) agents have very limited information about the game; (2) play can be extremely asynchronous. There are many proposed learning algorithms in the literature. We choose a small sampling of such algorithms and use numerical simulation to explore the nature of asymptotic play. In particular, we explore the extent to which the asymptotic play depends on three factors, namely: limited information, asynchronous play, and the degree of responsiveness of the learning algorithm.
A Comparison of Continuous and Discretized Pursuit Learning Schemes
1999
Cited by 17 (7 self)
A Learning Automaton is an automaton that interacts with a random environment, having as its goal the task of learning the optimal action based on its acquired experience. Many learning automata have been proposed, with the class of Estimator Algorithms being among the fastest ones. Thathachar and Sastry [23], through the Pursuit Algorithm, introduced the concept of learning algorithms that pursue the current optimal action, following a Reward-Penalty learning philosophy. Later, Oommen and Lanctôt [16] extended the Pursuit Algorithm into the discretized world by presenting the Discretized Pursuit Algorithm, based on a Reward-Inaction learning philosophy. In this paper, we argue that the Reward-Penalty and Reward-Inaction learning paradigms in conjunction with the continuous and discrete models of computation lead to four versions of Pursuit Learning Automata. We contend that a scheme that merges the Pursuit concept with the most recent response of the Environment permits the algorithm to...
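The continuous pursuit idea described in this abstract (maintain reward estimates and let the probability vector "pursue" the best-estimated action) can be sketched as follows. Names and parameters are illustrative; the paper's four variants differ in the reward-penalty vs. reward-inaction updates and in discretization of the probability space:

```python
import random

def pursuit(n_actions, reward_fn, rate=0.01, steps=20000, seed=0):
    """Illustrative sketch of a continuous pursuit learning automaton.

    Keeps a running reward estimate for each action and, at every step,
    moves the action-probability vector a small step toward the unit
    vector of the currently best-estimated action.
    """
    rng = random.Random(seed)
    p = [1.0 / n_actions] * n_actions
    est = [0.0] * n_actions        # reward estimates, one per action
    counts = [0] * n_actions
    for _ in range(steps):
        i = rng.choices(range(n_actions), weights=p)[0]
        r = reward_fn(i)
        counts[i] += 1
        est[i] += (r - est[i]) / counts[i]     # running-average estimate
        best = max(range(n_actions), key=lambda j: est[j])
        # pursue the current best-estimate action
        p = [(1 - rate) * pj + (rate if j == best else 0.0)
             for j, pj in enumerate(p)]
    return p, est
```

The discretized variants replace the continuous step toward the best action with jumps in multiples of a fixed resolution, which is what speeds up convergence in the schemes compared here.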
Discretized Estimator Learning Automata
IEEE Transactions on Systems, Man, and Cybernetics, 1992
Synchronous and Asynchronous Learning by Responsive Learning Automata
"Learning and Implementation on the Internet," manuscript, 1996
Cited by 12 (4 self)
We consider the ability of economic agents to learn in a decentralized environment in which agents do not know the (stochastic) payoff matrix and cannot observe their opponents' actions; they merely know, at each stage of the game, their own action and the resulting payoff. We discuss the requirements for learning in such an environment, and show that a simple probabilistic learning algorithm satisfies two important optimizing properties: (i) when placed in an unknown but eventually stationary random environment, these learners converge in bounded time, in a sense we make precise, to strategies that maximize average payoff; (ii) they satisfy a monotonicity property (related to the "law of effect") in which increasing the payoffs for a given strategy increases the probability of that strategy being played in the future. We then study how groups of such learners interact in a general game. We show that synchronous groups of these learners converge to the serially undominated set. ...
A Bayesian Rule for Adaptive Control Based on Causal Interventions
In Proceedings of the Third Conference on Artificial General Intelligence, 2010
Cited by 12 (7 self)
Explaining adaptive behavior is a central problem in artificial intelligence research. Here we formalize adaptive agents as mixture distributions over sequences of inputs and outputs (I/O). Each distribution of the mixture constitutes a 'possible world', but the agent does not know which of the possible worlds it is actually facing. The problem is to adapt the I/O stream in a way that is compatible with the true world. A natural measure of adaptation can be obtained by the Kullback-Leibler (KL) divergence between the I/O distribution of the true world and the I/O distribution expected by the agent that is uncertain about possible worlds. In the case of pure input streams, the Bayesian mixture provides a well-known solution for this problem. We show, however, that in the case of I/O streams this solution breaks down, because outputs are issued by the agent itself and require a different probabilistic syntax as provided by intervention calculus. Based on this calculus, we obtain a Bayesian control rule that allows modeling adaptive behavior with mixture distributions over I/O streams. This rule might allow for a novel approach to adaptive control based on a minimum KL-principle.
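The resulting control rule can be sketched as follows. The interfaces (experts as policy/likelihood pairs, env as a callable) are our own illustrative assumptions; the key point from the abstract is that the agent's own actions are treated as interventions, so only observations update the posterior:

```python
import random

def bayesian_control_rule(experts, env, steps=100, seed=0):
    """Illustrative sketch of the Bayesian control rule with interventions.

    experts is a list of (policy, likelihood) pairs, one per possible world:
    policy(history) -> action, likelihood(obs, action) -> P(obs | action).
    Actions are sampled from the posterior mixture of expert policies, but
    the posterior is updated only with the observation likelihoods; the
    agent's own actions are set (intervened on), never conditioned on.
    """
    rng = random.Random(seed)
    w = [1.0 / len(experts)] * len(experts)   # posterior over possible worlds
    history = []
    for _ in range(steps):
        # sample a world from the posterior and act with its expert policy
        k = rng.choices(range(len(experts)), weights=w)[0]
        a = experts[k][0](history)
        obs = env(a)
        # intervention: weight by P(obs | a, world) only, not by the action
        w = [wi * experts[i][1](obs, a) for i, wi in enumerate(w)]
        total = sum(w)
        w = [wi / total for wi in w]
        history.append((a, obs))
    return w

# Two possible worlds: world k believes pulling arm k pays off with prob 0.9.
def lik(k):
    return lambda obs, a: (0.9 if a == k else 0.1) if obs == 1 else (0.1 if a == k else 0.9)

experts = [(lambda h: 0, lik(0)), (lambda h: 1, lik(1))]
# true environment: arm 0 always pays, so the posterior should settle on world 0
posterior = bayesian_control_rule(experts, lambda a: 1 if a == 0 else 0)
```

Sampling the acting expert from the posterior, rather than always acting with the maximum-posterior expert, is what lets the rule keep exploring worlds it has not yet ruled out.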
A Minimum Relative Entropy Principle for Learning and Acting
Journal of Artificial Intelligence Research, 2010
Cited by 11 (8 self)
This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than normal probability conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert.
Two Novel On-Policy Reinforcement Learning Algorithms Based on TD(λ) Methods
Cited by 10 (6 self)
Abstract—This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor-critic learning automaton (ACLA). Both algorithms learn a state value function using TD(λ) methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning-automaton-like update rule to update the actor. We describe several possible advantages of these methods compared to other value-function-based reinforcement learning algorithms such as Q-learning, Sarsa, and conventional actor-critic methods. Experiments are performed on (1) small, (2) large, (3) partially observable, and (4) dynamic maze problems with tabular and neural network value-function representations, and on the mountain car problem. The overall results show that the two novel algorithms can outperform previously known reinforcement learning algorithms.
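The distinguishing update (both tables driven by a V-based TD target rather than a max over Q-values) can be sketched for the λ = 0 case. The environment interface and hyperparameters below are illustrative assumptions, not the paper's experimental setup:

```python
import random
from collections import defaultdict

def qv_learning(env_step, start, n_actions, episodes=500,
                alpha=0.2, gamma=0.95, eps=0.1, seed=0):
    """Illustrative sketch of tabular QV-learning (the lambda = 0 case).

    The TD target for both tables is built from the learned state value
    function V: V(s) moves toward r + gamma*V(s'), and Q(s, a) moves toward
    that same target, instead of toward max_a' Q(s', a') as in Q-learning.
    env_step(s, a, rng) -> (next_state, reward, done) is an assumed interface.
    """
    rng = random.Random(seed)
    V = defaultdict(float)
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = start, False
        while not done:
            if rng.random() < eps:                     # epsilon-greedy exploration
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda b: Q[(s, b)])
            s2, r, done = env_step(s, a, rng)
            target = r + (0.0 if done else gamma * V[s2])
            Q[(s, a)] += alpha * (target - Q[(s, a)])  # Q uses the V-based target
            V[s] += alpha * (target - V[s])
            s = s2
    return Q, V
```

ACLA keeps the same V update but replaces the Q table with an actor whose preferences are nudged up or down depending only on the sign of the TD error, in the spirit of a learning automaton.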