On Partially Controlled Multi-Agent Systems
- Journal of Artificial Intelligence Research, 1996
Cited by 22 (1 self)
Motivated by the control-theoretic distinction between controllable and uncontrollable events, we distinguish between two types of agents within a multi-agent system: controllable agents, which are directly controlled by the system's designer, and uncontrollable agents, which are not under the designer's direct control. We refer to such systems as partially controlled multi-agent systems, and we investigate how one might influence the behavior of the uncontrolled agents through appropriate design of the controlled agents. In particular, we wish to understand which problems are naturally described in these terms, what methods can be applied to influence the uncontrollable agents, how effective these methods are, and whether similar methods work across different domains. Using a game-theoretic framework, this paper studies the design of partially controlled multi-agent systems in two contexts: in one context, the uncontrollable agents are expected utility maximizers, while in the other th...
Exploration and Inference in Learning from Reinforcement
- 1997
Cited by 19 (2 self)
Recently there has been a good deal of interest in using techniques developed for learning from reinforcement to guide learning in robots. Motivated by the desire to find better robot learning methods, this thesis presents a number of novel extensions to existing techniques for controlling exploration and inference in reinforcement learning. First, I distinguish between the well-known exploration-exploitation trade-off and what I term exploration for future exploitation. It is argued that there are many tasks where it is more appropriate to maximise this latter measure. In particular, it is appropriate when we want to employ learning algorithms as part of the process of designing a controller. Informed by this insight, I develop a number of novel measures of the agent's task knowledge. The first of these is a measure of the probability of a particular course of action being the optimal course of action. Estimators are developed for this measure for boolean and non-boolean processes. These...
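To make the "probability of a course of action being optimal" measure concrete, here is a minimal sketch for the boolean case. It is not the thesis's estimator: it assumes a Bayesian treatment with a uniform Beta(1, 1) prior on each action's success rate and estimates the probability by Monte Carlo sampling. The function name `prob_optimal` and its parameters are hypothetical.

```python
import random


def prob_optimal(successes, failures, draws=10000, seed=0):
    """Estimate, for each action, the probability that it has the highest
    success rate, given observed boolean outcomes.

    Assumption (not from the abstract): each action's success rate has a
    Beta(1 + successes, 1 + failures) posterior, and we sample from it.
    """
    rng = random.Random(seed)
    n_actions = len(successes)
    wins = [0] * n_actions
    for _ in range(draws):
        # Draw one plausible success rate per action from its posterior.
        rates = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_actions)
        ]
        # Credit the action that is best under this draw.
        best = max(range(n_actions), key=lambda a: rates[a])
        wins[best] += 1
    return [w / draws for w in wins]


if __name__ == "__main__":
    # Action 0 succeeded 9 times out of 10, action 1 only once out of 10.
    print(prob_optimal(successes=[9, 1], failures=[1, 9]))
```

An exploration rule could then spend further trials on actions whose probability of being optimal is still far from 0 or 1, which is one way to read "exploration for future exploitation".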
Large-Scale Dynamic Optimization Using Teams of Reinforcement Learning Agents
- 1996
Cited by 18 (0 self)
Recent algorithmic and theoretical advances in reinforcement learning (RL) are attracting widespread interest. RL algorithms have appeared that approximate dynamic programming (DP) on an incremental basis. Unlike traditional DP algorithms, these algorithms do not require knowledge of the state transition probabilities or reward structure of a system. This allows them to be trained using real or simulated experiences, focusing their computations on the areas of state space that are actually visited during control, making them computationally tractable on very large problems. RL algorithms can be used as components of multi-agent algorithms. If each member of a team of agents employs one of these algorithms, a new collective learning algor...
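As an illustration of an RL algorithm that approximates dynamic programming incrementally without a model, the following sketch runs tabular Q-learning on a small chain. The choice of Q-learning, the chain environment, and all parameter values are assumptions made for the example; the abstract does not say which algorithm the agents use.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical chain: states 0..n_states-1,
    actions 0 (left) and 1 (right), reward 1 on reaching the last state.

    The learner only sees sampled transitions; it never reads transition
    probabilities or the reward function directly.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    def step(state, action):
        # Environment dynamics, hidden from the update rule below.
        nxt = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if nxt == goal else 0.0
        return nxt, reward

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            action = rng.randrange(2)  # random behaviour; Q-learning is off-policy
            nxt, reward = step(state, action)
            # Incremental backup towards the one-step lookahead target.
            target = reward if nxt == goal else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if state == goal:
                break
    return q


if __name__ == "__main__":
    for s, values in enumerate(q_learning_chain()):
        print(s, values)
```

Because updates happen only for visited state-action pairs, the computation concentrates on the part of the state space that experience actually reaches.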
Ant colony optimization and stochastic gradient descent
- Artificial Life, 2002
Cited by 16 (5 self)
In this paper, we study the relationship between the two techniques known as ant colony optimization (ACO) and stochastic gradient descent. More precisely, we show that some empirical ACO algorithms approximate stochastic gradient descent in the space of pheromones, and we propose an implementation of stochastic gradient descent that belongs to the family of ACO algorithms. We then use this insight to explore the mutual contributions of the two techniques.
Coordination of Multiple Mobile Robots via Communication
- Proc. SPIE'98, Mobile Robots XIII Conference, 1998
Cited by 15 (8 self)
Research on the co-ordination of multiple mobile robots has to address three main problems: (i) how to appropriately divide the functionality of the system among multiple robots, (ii) how to manage the dynamic configuration of the system, and (iii) how to realise co-operative behaviour. This paper will concentrate on the third aspect. More specifically, the aim of our research is to develop a team of mobile robots that coordinate via effective communication for real-world applications. We will describe the methodology to achieve co-operative behaviour, the experimental mobile robots developed, and potential application areas. The developed system is demonstrated by two examples: flocking and shared experience learning.
Reinforcement learning with immediate rewards and linear hypotheses
- Algorithmica, 2003
Cited by 12 (2 self)
We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, when (i) the consequence of a given action is felt immediately, and (ii) a linear function, which is unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases, one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which the probability of receiving the larger of binary-valued rewards is obtained. For these cases we provide bounds on the per-trial regret for our algorithms that go to zero as the number of trials approaches infinity. We also provide lower bounds that show that the rate of convergence is nearly optimal.
Effects of Delayed Communication in Dynamic Group Formation
- IEEE Trans. Syst., Man, Cybern, 1993
Cited by 11 (6 self)
We investigate how delayed communication affects the dynamic formation of groups in distributed systems, where all decision-making agents join the same group because each expects to improve its own performance. For example, distributed job schedulers may form a group to utilize the idle resources of other members within the group. Forming a group is a search problem and we examine agents which use the feedback mechanism of stochastic learning automata to carry out this search. Although a group formation may have the potential for synergy, the agents must successfully coordinate their actions within the group relevant to the application. For example, job schedulers who form a group must still balance the load among the shared resources; that is, the collective actions of the schedulers need to be coordinated and greedy schedulers who all pick the same processor may not be successful. Agents may find that working alone is more desirable since their actions need not be coordinated and the r...
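The feedback mechanism of a stochastic learning automaton can be sketched as follows. This uses the classical linear reward-inaction scheme as an assumed stand-in, since the abstract does not say which update rule the agents apply; the environment (fixed success probabilities per action) and the function name are hypothetical.

```python
import random


def linear_reward_inaction(success_prob, steps=5000, rate=0.05, seed=0):
    """Run a linear reward-inaction automaton and return its final
    action probabilities.

    success_prob[a] is the (hypothetical) chance that action a is rewarded.
    On a reward, probability mass shifts towards the chosen action; on a
    failure, the probabilities are left unchanged (the "inaction" part).
    """
    rng = random.Random(seed)
    n = len(success_prob)
    p = [1.0 / n] * n  # start with no preference
    for _ in range(steps):
        action = rng.choices(range(n), weights=p)[0]
        rewarded = rng.random() < success_prob[action]
        if rewarded:
            p = [
                pi + rate * (1.0 - pi) if i == action else pi * (1.0 - rate)
                for i, pi in enumerate(p)
            ]
    return p


if __name__ == "__main__":
    # Two candidate groups; joining the second pays off more often.
    print(linear_reward_inaction([0.2, 0.8]))
```

In a group-formation setting, each action would correspond to joining a particular group (or working alone), and delayed communication would mean the reward signal reflects an outdated view of the other agents' choices.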
Statistical Machine Learning and Combinatorial Optimization
- Theoretical Aspects of Evolutionary Computing, 2000
Cited by 11 (1 self)
In this work we apply statistical learning methods in the context of combinatorial optimization, which is understood as finding a binary string minimizing a given cost function. We first consider probability densities over binary strings and we define two different statistical criteria. Then we recast the initial problem as the problem of finding a density minimizing one of the two criteria. We restrict ourselves to densities described by a small number of parameters and solve the new problem by means of gradient techniques. This results in stochastic algorithms which iteratively update density parameters. We apply these algorithms to two families of densities, the Bernoulli model and the Gaussian model. The algorithms have been implemented and some experiments are reported.
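The idea of recasting the search for a binary string as gradient descent over density parameters can be sketched for the Bernoulli model. The abstract does not state the two criteria, so this example assumes the criterion is the expected cost under the density, uses a sigmoid parameterization of the bit probabilities, and estimates the gradient from samples. All names and parameter values are hypothetical.

```python
import math
import random


def minimize_with_bernoulli_model(cost, n_bits, iterations=500, batch=50,
                                  lr=0.5, seed=0):
    """Stochastic gradient descent on the expected cost E_p[cost(x)], where
    x is a binary string with independent bits and P(x_i = 1) = sigmoid(theta_i).

    Uses the identity d/d theta_i log P(x) = x_i - p_i, with the batch mean
    cost as a baseline to reduce the variance of the gradient estimate.
    """
    rng = random.Random(seed)
    theta = [0.0] * n_bits  # all bit probabilities start at 0.5
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-t)) for t in theta]
        samples = [[1 if rng.random() < pi else 0 for pi in p]
                   for _ in range(batch)]
        costs = [cost(x) for x in samples]
        baseline = sum(costs) / batch
        for i in range(n_bits):
            grad = sum((c - baseline) * (x[i] - p[i])
                       for x, c in zip(samples, costs)) / batch
            theta[i] -= lr * grad  # descend: make low-cost strings more likely
    return [1.0 / (1.0 + math.exp(-t)) for t in theta]


if __name__ == "__main__":
    target = [1, 0, 1, 1, 0, 0, 1, 0]
    hamming = lambda x: sum(a != b for a, b in zip(x, target))
    print([round(pi, 3) for pi in minimize_with_bernoulli_model(hamming, len(target))])
```

With a separable cost such as the Hamming distance used here, each bit probability drifts towards the value that lowers the cost; costs with strong interactions between bits are where an independent-bit density may struggle.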
PALO: A Probabilistic Hill-Climbing Algorithm
- Artificial Intelligence, 1995
Cited by 11 (2 self)
Many learning systems search through a space of possible performance elements, seeking an element whose expected utility, over the distribution of problems, is high. As the task of finding the globally optimal element is often intractable, many practical learning systems instead hill-climb to a local optimum. Unfortunately, even this is problematic, as the learner typically does not know the underlying distribution of problems, which it needs to determine an element's expected utility. This paper addresses the task of approximating this hill-climbing search when the utility function can only be estimated by sampling. We present a general algorithm, PALO, that returns an element that is, with provably high probability, essentially a local optimum. We then demonstrate the generality of this algorithm by presenting three distinct applications that, respectively, find an element whose efficiency, accuracy or completeness is nearly optimal. These results suggest approaches to solving the util...
On Planning And Exploration In Non-Discrete Environments
- Gesellschaft für Mathematik und Datenverarbeitung, D-5205 St, 1991
Cited by 10 (5 self)
The application of reinforcement learning to control problems has received considerable attention in the last few years [And86, Bar89, Sut84]. In general there are two principles for solving reinforcement learning problems: direct and indirect techniques, each with its advantages and disadvantages. We present a system that combines both methods [TML91, TML90]. Through interaction with an unknown environment, a world model is progressively constructed using the backpropagation algorithm. To optimize actions with respect to future reinforcement, planning is applied in two steps: an experience network proposes a plan, which is subsequently optimized by gradient descent with a chain of model networks. While the system operates in a goal-oriented manner due to the planning process, the experience network is trained. Its accumulating experience is fed back into the planning process in the form of initial plans, such that planning can be gradually reduced. In order to ensure complete system identif...

