The ant colony optimization metaheuristic
 in New Ideas in Optimization
, 1999
Cited by 292 (23 self)
Ant algorithms are multiagent systems in which the behavior of each single agent, called artificial ant or ant for short in the following, is inspired by the behavior of real ants. Ant algorithms are one of the most successful examples of swarm intelligent systems [3], and have been applied to many types of problems, ranging from the classical traveling salesman
Bayesian Learning in Negotiation
, 1996
Cited by 82 (8 self)
Recent growing interest in autonomous interacting software agents and their potential application in areas such as electronic commerce [Sandholm & Lesser 1995] has given increased importance to automated negotiation. Much DAI and game theoretic research [Rosenschein & Zlotkin 1994; Osborne & Rubinstein 1994] deals with coordination and negotiation issues by giving precomputed solutions to specific problems. There has been much research reported on developing theoretical models in which learning plays an eminent role, especially in the area of adaptive dynamics of games (e.g., [Jordan 1992; Kalai & Lehrer 1993]). However, building autonomous agents that improve their negotiation competence by learning from their interactions with other agents is still an emerging area. We are interested in developing autonomous agents capable of reasoning based on experience and improving their negotiation behavior incrementally. Learning in negotiation is closely coupled with...
A Bayesian framework for reinforcement learning
 In Proceedings of the Seventeenth International Conference on Machine Learning
, 2000
Cited by 74 (1 self)
The reinforcement learning problem can be decomposed into two parallel types of inference: (i) estimating the parameters of a model for the underlying process; (ii) determining behavior which maximizes return under the estimated model. Following Dearden, Friedman and Andre (1999), it is proposed that the learning process estimates online the full posterior distribution over models. To determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to the hypothesis is obtained by dynamic programming. By using a different hypothesis for each trial, appropriate exploratory and exploitative behavior is obtained. This Bayesian method always converges to the optimal policy for a stationary process with discrete states.
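The sample-then-plan step described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Dirichlet posterior over transition probabilities, the known reward table, and the function names `sample_transition_model` and `greedy_policy` are all assumptions made for the example.

```python
import numpy as np

def sample_transition_model(counts, rng):
    """Draw one hypothesis P[s, a, :] from independent Dirichlet posteriors.

    counts[s, a, s2] holds prior pseudo-counts plus observed transitions
    (the Dirichlet form is an assumption made for this sketch).
    """
    n_states, n_actions, _ = counts.shape
    model = np.empty_like(counts, dtype=float)
    for s in range(n_states):
        for a in range(n_actions):
            model[s, a] = rng.dirichlet(counts[s, a])
    return model

def greedy_policy(model, rewards, gamma=0.95, n_iters=500):
    """Value iteration on the sampled model; returns the greedy action per state.

    rewards[s, a] is assumed known here to keep the example short.
    """
    values = np.zeros(model.shape[0])
    for _ in range(n_iters):
        q = rewards + gamma * model @ values  # shape (n_states, n_actions)
        values = q.max(axis=1)
    return q.argmax(axis=1)
```

In a full learning loop one would draw a fresh model at the start of each trial, act greedily with respect to it, and add the observed transitions to `counts`.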
How to Dynamically Merge Markov Decision Processes
, 1997
Cited by 63 (1 self)
We are frequently called upon to perform multiple tasks that compete for our attention and resource. Often we know the optimal solution to each task in isolation; in this paper, we describe how this knowledge can be exploited to efficiently find good solutions for doing the tasks in parallel. We formulate this problem as that of dynamically merging multiple Markov decision processes (MDPs) into a composite MDP, and present a new theoretically sound dynamic programming algorithm for finding an optimal policy for the composite MDP. We analyze various aspects of our algorithm and illustrate its use on a simple merging problem. Every day, we are faced with the problem of doing multiple tasks in parallel, each of which competes for our attention and resource. If we are running a job shop, we must decide which machines to allocate to which jobs, and in what order, so that no jobs miss their deadlines. If we are a mail delivery robot, we must find the intended recipients of the mail while simul...
Stochastic Linear Control over a Communication Channel
, 2003
Cited by 52 (8 self)
We examine linear stochastic control systems when there is a communication channel connecting the sensor to the controller. The problem consists of designing the channel encoder and decoder as well as the controller to satisfy some given control objectives. In particular we examine the role communication has on the classical LQG problem. We give conditions under which the classical separation property between estimation and control holds and the certainty equivalent control law is optimal. We then present the sequential rate distortion framework. We present bounds on the achievable performance and show the inherent tradeoffs between control and communication costs. In particular we show that optimal quadratic cost decomposes into two terms: a full knowledge cost and a sequential rate distortion cost.
Distributed Value Functions
 In Proceedings of the Sixteenth International Conference on Machine Learning
, 1999
Cited by 51 (1 self)
Many interesting problems, such as power grids, network switches, and traffic flow, that are candidates for solving with reinforcement learning (RL), also have properties that make distributed solutions desirable. We propose an algorithm for distributed reinforcement learning based on distributing the representation of the value function across nodes. Each node in the system only has the ability to sense state locally, choose actions locally, and receive reward locally (the goal of the system is to maximize the sum of the rewards over all nodes and over all time). However, each node is allowed to give its neighbors the current estimate of its value function for the states it passes through. We present a value function learning rule, using that information, that allows each node to learn a value function that is an estimate of a weighted sum of future rewards for all the nodes in the network. With this representation, each node can choose actions to improve the performance of the overall...
Concurrent Reachability Games
, 2008
Cited by 49 (20 self)
We consider concurrent two-player games with reachability objectives. In such games, at each round, player 1 and player 2 independently and simultaneously choose moves, and the two choices determine the next state of the game. The objective of player 1 is to reach a set of target states; the objective of player 2 is to prevent this. These are zero-sum games, and the reachability objective is one of the most basic objectives: determining the set of states from which player 1 can win the game is a fundamental problem in control theory and system verification. There are three types of winning states, according to the degree of certainty with which player 1 can reach the target. From type-1 states, player 1 has a deterministic strategy to always reach the target. From type-2 states, player 1 has a randomized strategy to reach the target with probability 1. From type-3 states, player 1 has for every real ε > 0 a randomized strategy to reach the target with probability greater than 1 − ε. We show that for finite state spaces, all three sets of winning states can be computed in polynomial time: type-1 states in linear time, and type-2 and type-3 states in quadratic time. The algorithms to compute the three sets of winning states also enable the construction of the winning and spoiling strategies.
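The type-1 (sure-winning) set described above can be illustrated with a simple fixed-point iteration: a state is added once player 1 has some move that leads into the current winning set whatever player 2 does. This is a hedged sketch, not the paper's linear-time algorithm; the function name and the `moves1`, `moves2`, and `successors` callables are hypothetical, and every state is assumed to offer each player at least one move.

```python
def sure_winning_states(states, target, moves1, moves2, successors):
    """Least fixed point of a 'player 1 can force the next state' operator.

    successors(s, a, b) returns the set of states that can follow s when
    player 1 plays a and player 2 plays b (all names are illustrative).
    """
    winning = set(target)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in winning:
                continue
            # Player 1 needs one move that is safe against every reply.
            if any(
                all(successors(s, a, b) <= winning for b in moves2(s))
                for a in moves1(s)
            ):
                winning.add(s)
                changed = True
    return winning
```

This naive loop rescans all states on each pass, so it is polynomial but not linear; it only shows which states qualify as type-1.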
Fast Model Predictive Control Using Online Optimization
, 2008
Cited by 48 (18 self)
A widely recognized shortcoming of model predictive control (MPC) is that it can usually only be used in applications with slow dynamics, where the sample time is measured in seconds or minutes. A well known technique for implementing fast MPC is to compute the entire control law offline, in which case the online controller can be implemented as a lookup table. This method works well for systems with small state and input dimensions (say, no more than 5), and short time horizons. In this paper we describe a collection of methods for improving the speed of MPC, using online optimization. These custom methods, which exploit the particular structure of the MPC problem, can compute the control action on the order of 100 times faster than a method that uses a generic optimizer. As an example, our method computes the control actions for a problem with 12 states, 3 controls, and horizon of 30 time steps (which entails solving a quadratic program with 450 variables and 1260 constraints) in around 5 msec, allowing MPC to be carried out at 200 Hz.
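The problem size quoted in this abstract can be checked with a few lines of arithmetic. The variable count follows if the state and control at every time step are both decision variables. The split of the 1260 constraints into dynamics equalities and box inequalities is only one breakdown consistent with the stated total, and is an assumption of this sketch rather than something the abstract states.

```python
n_states, n_controls, horizon = 12, 3, 30

# One state vector and one control vector per time step.
n_variables = horizon * (n_states + n_controls)

# Assumed breakdown: one dynamics equality per state per step, plus
# lower and upper bounds on every state and control at every step.
n_equalities = horizon * n_states
n_inequalities = horizon * 2 * (n_states + n_controls)
n_constraints = n_equalities + n_inequalities

# A solve time of about 5 msec corresponds to a rate of about 200 Hz.
solve_time_s = 5e-3
rate_hz = 1.0 / solve_time_s

print(n_variables, n_constraints, round(rate_hz))
```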
Multiple model-based reinforcement learning
 Neural Computation
, 2002
Cited by 47 (2 self)
We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state prediction model and a reinforcement learning controller. The “responsibility signal,” which is given by the softmax function of the prediction errors, is used to weight the outputs of multiple modules as well as to gate the learning of the prediction models and the reinforcement learning controllers. We formulate MMRL for both the discrete-time, finite-state case and the continuous-time, continuous-state case. The performance of MMRL was demonstrated for the discrete case in a nonstationary hunting task in a grid world and for the continuous case in a nonlinear, nonstationary control task of swinging up a pendulum with variable physical parameters.
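The responsibility signal mentioned above can be sketched as a softmax over negative squared prediction errors, so that modules that predict the current dynamics well receive more weight. The Gaussian-style scaling, the width parameter `sigma`, and the function names are assumptions made for illustration; the abstract only states that the signal is a softmax of the prediction errors.

```python
import numpy as np

def responsibility_signal(pred_errors, sigma=1.0):
    """Softmax over negative squared errors (scaling by sigma is assumed)."""
    scores = -np.asarray(pred_errors, dtype=float) ** 2 / (2.0 * sigma ** 2)
    scores -= scores.max()  # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

def blend_outputs(module_outputs, responsibilities):
    """Responsibility-weighted sum of the module outputs."""
    outputs = np.asarray(module_outputs, dtype=float)
    return np.tensordot(responsibilities, outputs, axes=1)
```

The same weights could also scale each module's learning rate, which is the gating role the abstract describes.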
Action understanding as inverse planning
 Cognition
, 2009
Cited by 46 (5 self)
Humans are adept at inferring the mental states underlying other agents’ actions, such as goals, beliefs, desires, emotions and other thoughts. We propose a computational framework based on Bayesian inverse planning for modeling human action understanding. The framework represents an intuitive theory of intentional agents’ behavior based on the principle of rationality: the expectation that agents will plan approximately rationally to achieve their goals, given their beliefs about the world. The mental states that caused an agent’s behavior are inferred by inverting this model of rational planning using Bayesian inference, integrating the likelihood of the observed actions with the prior over mental states. This approach formalizes in precise probabilistic terms the essence of previous qualitative approaches to action understanding based on an “intentional stance” (Dennett, 1987) or a “teleological stance” (Gergely et al., 1995). In three psychophysical experiments using animated stimuli of agents moving in simple mazes, we assess how well different inverse planning models based on different goal priors can predict human goal inferences. The results provide quantitative evidence for an approximately rational inference mechanism in human goal inference within our simplified stimulus paradigm, and for the flexible nature of goal representations that human observers can adopt. We discuss the implications of our experimental results for human action understanding in real-world contexts, and suggest how our framework might be extended to capture other kinds of mental state inferences, such as inferences about beliefs, or inferring whether an entity is an intentional agent.