## Distributed Value Functions (1999)


Venue: Proceedings of the Sixteenth International Conference on Machine Learning

Citations: 52 (1 self)

### BibTeX

@INPROCEEDINGS{Schneider99distributedvalue,

author = {Jeff Schneider and Weng-Keen Wong and Andrew Moore and Martin Riedmiller},

title = {Distributed Value Functions},

booktitle = {Proceedings of the Sixteenth International Conference on Machine Learning},

year = {1999},

pages = {371--378},

publisher = {Morgan Kaufmann}

}

### Abstract

Many interesting problems, such as power grids, network switches, and traffic flow, that are candidates for solving with reinforcement learning (RL), also have properties that make distributed solutions desirable. We propose an algorithm for distributed reinforcement learning based on distributing the representation of the value function across nodes. Each node in the system can only sense state locally, choose actions locally, and receive reward locally (the goal of the system is to maximize the sum of the rewards over all nodes and over all time). However, each node is allowed to give its neighbors the current estimate of its value function for the states it passes through. We present a value function learning rule, using that information, that allows each node to learn a value function that is an estimate of a weighted sum of future rewards for all the nodes in the network. With this representation, each node can choose actions to improve the performance of the overall...
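As a rough illustration of the idea in the abstract, the sketch below shows one way a node could fold the value estimates it receives from its neighbors into a local Q-learning style update. It is not the paper's exact rule, which the abstract does not state. The tabular `DistributedNode` class, the learning rate `alpha`, the discount `gamma`, and the per-node weights passed in `weighted_values` are all assumptions made for illustration.

```python
from collections import defaultdict


class DistributedNode:
    """Hypothetical tabular learner for one node; every name here is illustrative."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = defaultdict(float)  # maps (local_state, action) -> estimate

    def value(self, state):
        """Value estimate for a local state: the max over local actions."""
        return max(self.q[(state, a)] for a in self.actions)

    def greedy_action(self, state):
        """Action with the highest current estimate in a local state."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, weighted_values):
        """Apply one update from a locally observed transition.

        weighted_values holds (weight, value) pairs, one per node whose
        estimate is used (this node and its neighbors). Each value is that
        node's own estimate for the local state it has just entered.
        """
        target = reward + self.gamma * sum(w * v for w, v in weighted_values)
        old = self.q[(state, action)]
        self.q[(state, action)] = (1 - self.alpha) * old + self.alpha * target
        return self.q[(state, action)]


# Two neighboring nodes; node b already values its state "s1" at 2.0.
a = DistributedNode(actions=[0, 1], alpha=0.5, gamma=0.9)
b = DistributedNode(actions=[0, 1], alpha=0.5, gamma=0.9)
b.q[("s1", 0)] = 2.0

# Node a took action 1 in "s0", received local reward 1.0, and mixes its own
# next-state value with the value reported by b (equal weights, assumed).
new_q = a.update(
    "s0", 1, reward=1.0,
    weighted_values=[(0.5, a.value("s0_next")), (0.5, b.value("s1"))],
)
```

Only scalar value estimates cross the link between nodes in this sketch: node `a` never sees the states or actions of node `b`, which matches the local sensing and local reward constraints described in the abstract.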

### Citations

3773 | Reinforcement Learning: An Introduction
- Sutton, Barto
- 1998
Citation Context: ...ntuitive sense of the kinds of applications we envision for our algorithms. 1.1 Related Work Reinforcement learning and dynamic programming-based methods for optimal control are fairly well understood [14, 4]. By contrast distributed reinforcement learning is a much less mature concept because it is harder to formulate and analyze theoretically. One approach is to assume restricted interaction between the...

1321 | Learning from Delayed Rewards
- Watkins
- 1989
Citation Context: ...ill you choose?" because neither of them has any representation of the other's states and actions. Despite this problem, reward and value functions are a universal language. Constructing a Q-learning [16] algorithm to solve the problem only requires nodes to communicate in these terms. A Q-learning rule that each node can implement to learn the value function in eq. 9 is: Q_i(x_i, a_i) ← (1 − ...

498 | Markov games as a framework for multi-agent reinforcement learning
- Littman
- 1994
Citation Context: ...7]. That work has shared, global state and cost information, but the agents act independently. Other work provides a similar framework, but in the context of competing rather than cooperating agents [9]. Homogeneous agents that seek to learn similar value functions can improve learning speed and performance by exchanging learned policies and sensing [15]. Weiss proposes a "bucket brigade" scheme fo...

457 | Dynamic Programming and Optimal Control, Athena Scientific, 3rd edition
- Bertsekas
- 2007

318 | Dynamic programming and optimal control
- Bertsekas
Citation Context: ...ntuitive sense of the kinds of applications we envision for our algorithms. 1.1 Related Work Reinforcement learning and dynamic programming-based methods for optimal control are fairly well understood [14, 4]. By contrast distributed reinforcement learning is a much less mature concept because it is harder to formulate and analyze theoretically. One approach is to assume restricted interaction between the...

279 | Improving elevator performance using reinforcement learning
- Crites, Barto
- 1996
Citation Context: ...le to another [12]. The resulting algorithm requires some global information, but does provide a provably optimal solution. Distributed elevator control has been addressed with reinforcement learning [7]. That work has shared, global state and cost information, but the agents act independently. Other work provides a similar framework, but in the context of competing rather than cooperating agents [9...

248 | Multi-agent reinforcement learning: Independent vs. cooperative agents
- Tan
- 1993
Citation Context: ...xt of competing rather than cooperating agents [9]. Homogeneous agents that seek to learn similar value functions can improve learning speed and performance by exchanging learned policies and sensing [15]. Weiss proposes a "bucket brigade" scheme for credit assignment among cooperating reinforcement learning agents [17]. Rather than each agent acting locally and independently, they communicate globall...

237 | Residual Algorithms: Reinforcement Learning with Function Approximation
- Baird
- 1995
Citation Context: ...r form, and to speed up learning through generalization in large state spaces. In some cases, the convergence proofs for reinforcement learning have been extended to the use of function approximation [8, 2]. In the case of local state for distributed RL, we can think of it exactly as value function approximation where each node has chosen differing sets of features (their own locally observable state) t...

207 | Stable Function Approximation in Dynamic Programming
- Gordon
- 1995
Citation Context: ...r form, and to speed up learning through generalization in large state spaces. In some cases, the convergence proofs for reinforcement learning have been extended to the use of function approximation [8, 2]. In the case of local state for distributed RL, we can think of it exactly as value function approximation where each node has chosen differing sets of features (their own locally observable state) t...

182 | Packet routing in dynamically changing networks: a reinforcement learning approach
- Boyan, Littman
- 1994
Citation Context: ...t will accomplish all goals [1]. Similarly, economic models can form the basis of credit assignment methods [3]. Packet routing is a domain for which completely distributed approaches have been taken [5, 6, 13]. In this problem, each node must make a decision about which neighboring node to route each packet to. The global state space is the list of all packets in the system, their current locations, and the...

74 | Ants and reinforcement learning: a case study in routing in dynamic networks
- Subramanian, Druschel, et al.
- 1997
Citation Context: ...t will accomplish all goals [1]. Similarly, economic models can form the basis of credit assignment methods [3]. Packet routing is a domain for which completely distributed approaches have been taken [5, 6, 13]. In this problem, each node must make a decision about which neighboring node to route each packet to. The global state space is the list of all packets in the system, their current locations, and the...

62 | How to dynamically merge Markov decision processes
- Singh, Cohn
- 1998
Citation Context: ...t all interesting problems. A completely segregated approach can be taken with the additional allowance that the choice of controls by one node may restrict the space of controls available to another [12]. The resulting algorithm requires some global information, but does provide a provably optimal solution. Distributed elevator control has been addressed with reinforcement learning [7]. That work ha...

26 | Coordination of Multiple Behaviors Acquired by a Vision-Based Reinforcement
- Asada, Uchibe, et al.
- 1994
Citation Context: ...obal controls" in each state. Similarly, there are approaches where different behaviors are learned for a system and they are combined with the idea of choosing actions that will accomplish all goals [1]. Similarly, economic models can form the basis of credit assignment methods [3]. Packet routing is a domain for which completely distributed approaches have been taken [5, 6, 13]. In this problem, ea...

26 | Predictive Q-routing: a memory-based reinforcement learning approach to adaptive traffic control
- Choi, Yeung
- 1996
Citation Context: ...t will accomplish all goals [1]. Similarly, economic models can form the basis of credit assignment methods [3]. Packet routing is a domain for which completely distributed approaches have been taken [5, 6, 13]. In this problem, each node must make a decision about which neighboring node to route each packet to. The global state space is the list of all packets in the system, their current locations, and the...

18 | Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning
- Schneider
- 1997
Citation Context: ...en strictly in terms of the value function. At first glance, it might seem that this will lead to the use of a kind of value iteration coupled with the learning of system models to solve the problem ([10, 11], for example). This can't be done easily, though, because the use of local state means neighbors can't use state as a common language with which to communicate. It isn't possible for one node to ask ...

17 | Value function based production scheduling
- Schneider, Moore
- 1998
Citation Context: ...en strictly in terms of the value function. At first glance, it might seem that this will lead to the use of a kind of value iteration coupled with the learning of system models to solve the problem ([10, 11], for example). This can't be done easily, though, because the use of local state means neighbors can't use state as a common language with which to communicate. It isn't possible for one node to ask ...

7 | Distributed reinforcement learning
- Weiss
- 1995
Citation Context: ... improve learning speed and performance by exchanging learned policies and sensing [15]. Weiss proposes a "bucket brigade" scheme for credit assignment among cooperating reinforcement learning agents [17]. Rather than each agent acting locally and independently, they communicate globally in an auction to arbitrate which subset of agents will be allowed to "take over the global controls" in each state....

1 | Manifesto for an Evolutionary
- Baum
- 1998
Citation Context: ...haviors are learned for a system and they are combined with the idea of choosing actions that will accomplish all goals [1]. Similarly, economic models can form the basis of credit assignment methods [3]. Packet routing is a domain for which completely distributed approaches have been taken [5, 6, 13]. In this problem, each node must make a decision about which neighboring node to route each packet to...