Results 21–30 of 294
On Partially Controlled Multi-Agent Systems
Journal of Artificial Intelligence Research, 1996
Abstract

Cited by 31 (1 self)
Motivated by the control-theoretic distinction between controllable and uncontrollable events, we distinguish between two types of agents within a multi-agent system: controllable agents, which are directly controlled by the system's designer, and uncontrollable agents, which are not under the designer's direct control. We refer to such systems as partially controlled multi-agent systems, and we investigate how one might influence the behavior of the uncontrolled agents through appropriate design of the controlled agents. In particular, we wish to understand which problems are naturally described in these terms, what methods can be applied to influence the uncontrollable agents, how effective these methods are, and whether similar methods work across different domains. Using a game-theoretic framework, this paper studies the design of partially controlled multi-agent systems in two contexts: in one context, the uncontrollable agents are expected-utility maximizers, while in the other th...
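A toy sketch of the design problem this abstract describes: the controllable agent commits to a reward/punish rule, and an uncontrollable expected-utility maximizer best-responds to it. The payoff table and names below are illustrative assumptions, not taken from the paper.

```python
# Stage-game payoffs to the UNCONTROLLABLE agent, indexed by
# (its action, the controlled agent's action). Numbers are made up.
PAYOFF = {("coop", "reward"): 3, ("coop", "punish"): 0,
          ("defect", "reward"): 5, ("defect", "punish"): -1}

def controlled_policy(opponent_action):
    """Designer's committed rule: reward cooperation, punish defection."""
    return "reward" if opponent_action == "coop" else "punish"

def best_response():
    """The uncontrollable agent maximizes expected utility given our rule."""
    return max(["coop", "defect"],
               key=lambda a: PAYOFF[(a, controlled_policy(a))])

print(best_response())  # prints "coop"
```

With these payoffs, committing to punishment makes cooperation the uncontrollable agent's best response (3 beats -1), which is the flavor of influence the paper studies.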
On the Computational Economics of Reinforcement Learning
1990
Abstract

Cited by 27 (6 self)
Following terminology used in adaptive control, we distinguish between indirect learning methods, which learn explicit models of the dynamic structure of the system to be controlled, and direct learning methods, which do not. We compare an existing indirect method, which uses a conventional dynamic programming algorithm, with a closely related direct reinforcement learning method by applying both methods to an infinite-horizon Markov decision problem with unknown state-transition probabilities. The simulations show that although the direct method requires much less space and dramatically less computation per control action, its learning ability in this task is superior to, or compares favorably with, that of the more complex indirect method. Although these results do not address how the methods' performances compare as problems become more difficult, they suggest that given a fixed amount of computational power available per control action, it may be better to use a direct reinforcemen...
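The direct/indirect distinction can be made concrete with a minimal tabular Q-learning sketch, a direct method in the abstract's sense: it never estimates transition probabilities. The two-state dynamics and all constants are illustrative assumptions.

```python
import random

random.seed(0)
N_STATES, N_ACTIONS = 2, 2
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def step(state, action):
    """Hypothetical dynamics, unknown to the learner: action 1 usually
    leads to state 1, which pays reward 1."""
    next_state = 1 if (action == 1 and random.random() < 0.8) else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    if random.random() < EPS:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # "direct" update: no transition model is estimated anywhere
    Q[state][action] += ALPHA * (
        reward + GAMMA * max(Q[next_state]) - Q[state][action])
    state = next_state
```

An indirect method would instead count transitions to fit a model of `step` and run dynamic programming on it, at the cost of the extra space and per-action computation the abstract mentions.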
Alopex: a correlation-based learning algorithm for feedforward and recurrent neural networks
Neural Computation, 1994
Abstract

Cited by 25 (1 self)
We present a learning algorithm for neural networks, called Alopex. Instead of error gradient, Alopex uses local correlations between changes in individual weights and changes in the global error measure. The algorithm does not make any assumptions about transfer functions of individual neurons, and does not explicitly depend on the functional form of the error measure. Hence, it can be used in networks with arbitrary transfer functions and for minimizing a large class of error measures. The learning algorithm is the same for feedforward and recurrent networks. All the weights in a network are updated simultaneously, using only local computations. This allows complete parallelization of the algorithm. The algorithm is stochastic and it uses a ‘temperature’ parameter in a manner similar to that in simulated annealing. A heuristic ‘annealing schedule’ is presented which is effective in finding global minima of error surfaces. In this paper, we report extensive simulation studies illustrating these advantages and show that learning times are comparable to those for standard gradient descent methods. Feedforward networks trained with Alopex are used to solve the MONK’s problems and symmetry problems. Recurrent networks trained with the same algorithm are used for solving temporal XOR problems. Scaling properties of the algorithm are demonstrated using encoder problems of different sizes and advantages of appropriate error measures are illustrated using a variety of problems.
Exploration and Inference in Learning from Reinforcement
1997
Abstract

Cited by 25 (2 self)
Recently there has been a good deal of interest in using techniques developed for learning from reinforcement to guide learning in robots. Motivated by the desire to find better robot learning methods, this thesis presents a number of novel extensions to existing techniques for controlling exploration and inference in reinforcement learning. First I distinguish between the well-known exploration-exploitation trade-off and what I term exploration for future exploitation. It is argued that there are many tasks where it is more appropriate to maximise this latter measure. In particular, it is appropriate when we want to employ learning algorithms as part of the process of designing a controller. Informed by this insight, I develop a number of novel measures of the agent's task knowledge. The first of these is a measure of the probability of a particular course of action being the optimal course of action. Estimators are developed for this measure for boolean and non-boolean processes. These...
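One generic way to estimate "the probability that a course of action is optimal" for a boolean (Bernoulli-reward) process is Monte Carlo sampling from Beta posteriors; this sketch uses assumed success/failure counts and is not necessarily the thesis's own estimator.

```python
import random

random.seed(0)
# hypothetical observed (successes, failures) for two boolean actions
counts = {"a": (8, 2), "b": (5, 5)}

def prob_optimal(target, counts, n_samples=20000):
    """Monte Carlo estimate of P(target action is optimal) under
    independent Beta(successes + 1, failures + 1) posteriors."""
    wins = 0
    for _ in range(n_samples):
        draws = {k: random.betavariate(s + 1, f + 1)
                 for k, (s, f) in counts.items()}
        wins += max(draws, key=draws.get) == target
    return wins / n_samples

p = prob_optimal("a", counts)
```

With these counts, action "a" (8 successes in 10) is judged optimal with probability roughly 0.9, while action "b" retains the remainder, so the measure quantifies remaining uncertainty rather than just ranking the actions.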
A cellular learning automata based clustering algorithm for wireless sensor networks
Sensor Letters, 2008
Abstract

Cited by 24 (13 self)
In the first part of this paper, we propose a generalization of cellular learning automata (CLA), called irregular cellular learning automata (ICLA), which removes the restriction of the rectangular grid structure in traditional CLA. In the second part of the paper, based on the proposed model, a new clustering algorithm for sensor networks is designed. The proposed clustering algorithm is fully distributed, and the nodes in the network do not need to be fully synchronized with each other. The proposed clustering algorithm consists of two phases: initial clustering and reclustering. Unlike existing methods, in which the reclustering phase is performed periodically on the entire network, the reclustering phase in the proposed method is performed locally whenever it is needed. This results in a reduction in the energy consumed by the reclustering phase and also allows reclustering to be performed as the network operates. The proposed clustering method, in comparison to existing methods, produces a clustering in which each cluster has a higher number of nodes and the cluster head has higher residual energy. Local reclustering, higher residual energy in cluster heads, and a higher number of nodes in each cluster result in a network with a longer lifetime. To evaluate the performance of the proposed algorithm, several experiments have been conducted. The results show that the proposed clustering algorithm outperforms existing clustering methods in terms of the quality of clustering, measured by the total number of clusters, the number of sparse clusters, and the remaining energy level of the cluster heads. Experiments have also shown that the proposed clustering algorithm, in comparison to other existing methods, prolongs the network lifetime.
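The building block of any CLA is a single learning automaton. A minimal sketch of the classic linear reward-inaction (L_R-I) update, against an assumed two-action stationary environment, looks like this:

```python
import random

random.seed(0)
A = 0.01                       # learning (reward) parameter
p = [0.5, 0.5]                 # action-probability vector of the automaton
REWARD_PROB = [0.2, 0.8]       # hypothetical environment: action 1 is better

for _ in range(5000):
    action = 0 if random.random() < p[0] else 1
    if random.random() < REWARD_PROB[action]:
        # reward: shift probability mass toward the rewarded action;
        # on a penalty the vector is left unchanged (the "inaction" part)
        for j in range(2):
            p[j] = p[j] + A * (1.0 - p[j]) if j == action else p[j] * (1.0 - A)
```

In a CLA or ICLA, one such automaton sits in each cell (here, each sensor node) and its reward signal is computed from the actions of its neighbors rather than from a fixed environment.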
Intention recognition promotes the emergence of cooperation
Adaptive Behavior, 2011
Abstract

Cited by 21 (16 self)
Few problems have attracted the combined interest of so many unrelated areas as the evolution of cooperation. As a result, several mechanisms have been identified that work as catalysts of cooperative behavior. Yet these studies, mostly grounded in evolutionary dynamics and game theory, have neglected the important role played by intention recognition in behavioral evolution. Here we address this issue explicitly, characterizing the dynamics emerging from a population of intention recognizers. We derive a Bayesian network model for intention recognition in the context of repeated social dilemmas and evolutionary game theory, by assessing the internal dynamics of trust between intention recognizers and their opponents. Intention recognizers are then able to predict the next move of their opponents based on past direct interactions, which, in turn, enables them to prevail over the most famous strategies of repeated dilemmas of cooperation, even in the presence of noise. Overall, our framework offers new insights into the complexity and beauty of behavioral evolution driven by elementary forms of cognition.
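At the core of such an intention recognizer is a Bayesian belief update over the opponent's intention, driven by observed moves. The likelihoods and history below are assumptions, and this is far simpler than the paper's full Bayesian network, but it shows the mechanism.

```python
# Assumed observation model (made-up numbers):
P_C_GIVEN_COOP_INTENT = 0.9    # P(observe cooperation | intends to cooperate)
P_C_GIVEN_DEFECT_INTENT = 0.2  # P(observe cooperation | intends to defect)

def update(belief, observed_cooperation):
    """One Bayes step on P(intent = cooperate) after seeing one move."""
    if observed_cooperation:
        like_coop, like_defect = P_C_GIVEN_COOP_INTENT, P_C_GIVEN_DEFECT_INTENT
    else:
        like_coop = 1 - P_C_GIVEN_COOP_INTENT
        like_defect = 1 - P_C_GIVEN_DEFECT_INTENT
    num = like_coop * belief
    return num / (num + like_defect * (1 - belief))

belief = 0.5                              # uninformative prior
for move in [True, True, False, True]:    # assumed history (True = cooperated)
    belief = update(belief, move)
```

A single defection dents the belief but three cooperative moves dominate, so the recognizer would predict cooperation on the next round, which is how past direct interactions feed the trust dynamics described above.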
Reinforcement Learning for Long-Run Average Cost
2004
Abstract

Cited by 20 (5 self)
A large class of sequential decision-making problems under uncertainty can be modeled as Markov and semi-Markov decision problems (SMDPs), when their underlying probability structure has a Markov chain. They may be solved by using classical dynamic programming (DP) methods. However, DP methods suffer from the curse of dimensionality and break down rapidly in the face of large state spaces. In addition, ...
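For the long-run average-cost/reward setting this abstract addresses, a standard tabular method is R-learning, sketched here on an assumed toy two-state problem (the dynamics and constants are illustrative, not the paper's algorithm):

```python
import random

random.seed(0)
ALPHA, BETA, EPS = 0.1, 0.05, 0.1   # value step, average-reward step, exploration
rho = 0.0                           # estimate of the long-run average reward
R = [[0.0, 0.0], [0.0, 0.0]]        # relative action values R(s, a)

def step(state, action):
    # hypothetical dynamics: action 1 pays 1.0, action 0 pays only 0.2
    return random.randrange(2), (1.0 if action == 1 else 0.2)

state = 0
for _ in range(10000):
    greedy = max(range(2), key=lambda a: R[state][a])
    explore = random.random() < EPS
    action = random.randrange(2) if explore else greedy
    nxt, reward = step(state, action)
    # relative-value update: rewards are measured against the average rho,
    # not discounted, which is what distinguishes the average-reward setting
    R[state][action] += ALPHA * (reward - rho + max(R[nxt]) - R[state][action])
    if not explore:
        # rho is adjusted only on greedy (non-exploratory) steps
        rho += BETA * (reward + max(R[nxt]) - max(R[state]) - rho)
    state = nxt
```

Here `rho` converges toward the optimal average reward (1.0 for the always-take-action-1 policy), replacing the discount factor used in standard Q-learning.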
Ant colony optimization and stochastic gradient descent
Artificial Life, 2002
Abstract

Cited by 19 (6 self)
In this paper, we study the relationship between the two techniques known as ant colony optimization (aco) and stochastic gradient descent. More precisely, we show that some empirical aco algorithms approximate stochastic gradient descent in the space of pheromones, and we propose an implementation of stochastic gradient descent that belongs to the family of aco algorithms. We then use this insight to explore the mutual contributions of the two techniques.
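The pheromone update at the heart of ACO can be sketched on a toy binary problem; the iteration-best reinforcement scheme, the problem, and all constants below are assumptions, not the specific algorithms the paper analyzes.

```python
import random

random.seed(0)
RHO, N_ANTS = 0.1, 10                 # evaporation rate, ants per iteration
TARGET = [1, 0, 1, 1]                 # hidden optimum the colony should find
tau = [[1.0, 1.0] for _ in TARGET]    # pheromone per (position, choice)

def sample_solution():
    # each component is drawn with probability proportional to its pheromone
    return [0 if random.random() < t0 / (t0 + t1) else 1 for t0, t1 in tau]

def quality(sol):
    return sum(int(s == t) for s, t in zip(sol, TARGET)) / len(TARGET)

for _ in range(300):
    best = max((sample_solution() for _ in range(N_ANTS)), key=quality)
    for i, (t0, t1) in enumerate(tau):
        tau[i] = [t0 * (1 - RHO), t1 * (1 - RHO)]    # evaporation
        tau[i][best[i]] += RHO * quality(best)       # reinforce the best ant
```

The evaporate-then-deposit step moves the pheromone vector a small distance toward good solutions each iteration, which is the structural resemblance to stochastic gradient descent that the paper makes precise.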
Large-Scale Dynamic Optimization Using Teams of Reinforcement Learning Agents
1996
Abstract

Cited by 18 (1 self)
Recent algorithmic and theoretical advances in reinforcement learning (RL) are attracting widespread interest. RL algorithms have appeared that approximate dynamic programming (DP) on an incremental basis. Unlike traditional DP algorithms, these algorithms do not require knowledge of the state transition probabilities or reward structure of a system. This allows them to be trained using real or simulated experiences, focusing their computations on the areas of state space that are actually visited during control, making them computationally tractable on very large problems. RL algorithms can be used as components of multiagent algorithms. If each member of a team of agents employs one of these algorithms, a new collective learning algor...