The dynamics of reinforcement learning in cooperative multiagent systems
- In Proceedings of National Conference on Artificial Intelligence (AAAI-98)
, 1998
Cited by 249 (1 self)
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. We study (a simple form of) Q-learning in cooperative multiagent systems under these two perspectives, focusing on the influence of game structure and exploration strategies on convergence to (optimal and suboptimal) Nash equilibria. We then propose alternative optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium.
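As a rough illustration of the "unaware of other agents" perspective described above, the following minimal sketch runs two independent, stateless Q-learners on a repeated cooperative game. Everything here is an assumption made for illustration: the 2x2 payoff matrix, the step size, and the use of epsilon-greedy exploration (swapped in for simplicity; it is not claimed to be the exploration scheme studied in the paper). The helper `is_nash` shows why such a game can have both an optimal and a suboptimal equilibrium.

```python
import random

# Hypothetical 2x2 cooperative game: both agents receive PAYOFF[a0][a1].
PAYOFF = [[10.0, 0.0],
          [0.0, 5.0]]


def is_nash(payoff, a0, a1):
    """True if neither agent can raise the shared payoff by deviating alone."""
    best_if_agent0_deviates = max(payoff[a][a1] for a in range(len(payoff)))
    best_if_agent1_deviates = max(payoff[a0][b] for b in range(len(payoff[0])))
    current = payoff[a0][a1]
    return current >= best_if_agent0_deviates and current >= best_if_agent1_deviates


def run_independent_learners(payoff, steps=2000, alpha=0.1, epsilon=0.2, seed=0):
    """Each agent keeps Q-values over its OWN actions only and ignores the other agent."""
    rng = random.Random(seed)
    q = [[0.0] * len(payoff), [0.0] * len(payoff[0])]

    def choose(values):
        # Epsilon-greedy: explore uniformly with probability epsilon, else act greedily.
        if rng.random() < epsilon:
            return rng.randrange(len(values))
        return max(range(len(values)), key=values.__getitem__)

    for _ in range(steps):
        a0, a1 = choose(q[0]), choose(q[1])
        reward = payoff[a0][a1]  # shared reward for the joint action
        q[0][a0] += alpha * (reward - q[0][a0])
        q[1][a1] += alpha * (reward - q[1][a1])
    return q
```

In this assumed game, (0, 0) is the optimal equilibrium and (1, 1) is a suboptimal one; which of them the learners settle on depends on the exploration schedule and the random seed.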
Reaching Agreements Through Argumentation: A Logical Model and Implementation
- Artificial Intelligence
, 1998
Cited by 189 (9 self)
In a multi-agent environment, where self-motivated agents try to pursue their own goals, cooperation cannot be taken for granted. Cooperation must be planned for and achieved through communication and negotiation. We present a logical model of the mental states of the agents based on a representation of their beliefs, desires, intentions, and goals. We present argumentation as an iterative process emerging from exchanges among agents to persuade each other and bring about a change in intentions. We look at argumentation as a mechanism for achieving cooperation and agreements. Using categories identified from human multi-agent negotiation, we demonstrate how the logic can be used to specify argument formulation and evaluation. We also illustrate how the developed logic can be used to describe different types of agents. Furthermore, we present a general Automated Negotiation Agent which we implemented, based on the logical model. Using this system, a user can analyze and explore differe...
A Survey of Models of Network Formation: Stability and Efficiency
, 2003
Cited by 133 (11 self)
I survey the recent literature on the formation of networks. I provide definitions of network games, a number of examples of models from the literature, and discuss some of what is known about the (in)compatibility of overall societal welfare with individual incentives to form and sever links.
The Independent Choice Logic for modelling multiple agents under uncertainty
- Artificial Intelligence
, 1997
Cited by 119 (6 self)
Inspired by game theory representations, Bayesian networks, influence diagrams, structured Markov decision process models, logic programming, and work in dynamical systems, the independent choice logic (ICL) is a semantic framework that allows for independent choices (made by various agents, including nature) and a logic program that gives the consequence of choices. This representation can be used as a specification for agents that act in a world, make observations of that world and have memory, as well as a modelling tool for dynamic environments with uncertainty. The rules specify the consequences of an action, what can be sensed and the utility of outcomes. This paper presents a possible-worlds semantics for ICL, and shows how to embed influence diagrams, structured Markov decision processes, and both the strategic (normal) form and extensive (game-tree) form of games within the ...
Architecting Noncooperative Networks
- IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS
, 1995
Cited by 111 (16 self)
In noncooperative networks users make control decisions that optimize their own performance measure. Focusing on routing, we devise two methodologies for architecting noncooperative networks that improve the overall network performance. These methodologies are motivated by problem settings arising in the provisioning and the run-time phases of the network. For either phase, Nash equilibria characterize the operating point of the network. The goal of the provisioning phase is to allocate link capacities that lead to systemwide efficient Nash equilibria. In general, the solution of such design problems is counterintuitive, since adding link capacity might lead to a degradation of user performance. We show that, for systems of parallel links, such paradoxes cannot occur and the optimal solution coincides with the solution in the single-user case. We derive some extensions to general network topologies. During the run-time phase, a manager controls the routing of part of the network flow. The manager is aware of the noncooperative behavior of the users and makes its routing decisions based on this information while aiming at improving the overall system performance. We obtain necessary and sufficient conditions for enforcing an equilibrium that coincides with the global systemwide optimum, and indicate that these conditions are met in many cases of interest.
Achieving Network Optima Using Stackelberg Routing Strategies
, 1997
Cited by 83 (13 self)
In noncooperative networks users make control decisions that optimize their individual performance objectives. Nash equilibria characterize the operating points of such networks. Nash equilibria are generically inefficient and exhibit suboptimal network performance. Focusing on routing, a methodology is devised for overcoming this deficiency, through the intervention of the network manager. The manager controls part of the network flow, is aware of the noncooperative behavior of the users and performs its routing aiming at improving the overall system performance. The existence of maximally efficient strategies for the manager, i.e., strategies that drive the system into the global network optimum, is investigated. A maximally efficient strategy of the manager not only optimizes the overall performance of the network, but also induces an operating point that is efficient with respect to the performance of the individual users (Pareto efficiency). Necessary and sufficient conditions for...
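The remark that equilibria can be inefficient can be illustrated with the textbook Pigou two-link example. This is a swapped-in model with infinitesimal users (a Wardrop-style equilibrium), not the routing model of the paper itself; the latency functions, the unit demand, and the grid resolution are all assumptions chosen to keep the sketch short.

```python
def link_costs(x):
    """Per-unit latency on each link when a fraction x of the flow uses link 2."""
    return 1.0, x  # link 1: constant latency 1; link 2: latency equal to its load


def total_cost(x):
    """System-wide cost for one unit of demand split as (1 - x, x)."""
    c1, c2 = link_costs(x)
    return (1.0 - x) * c1 + x * c2


def is_equilibrium(x):
    """No used link may be strictly more costly than the other link."""
    c1, c2 = link_costs(x)
    if x > 0.0 and c2 > c1:  # flow on link 2 would prefer link 1
        return False
    if x < 1.0 and c1 > c2:  # flow on link 1 would prefer link 2
        return False
    return True


GRID = [i / 1000 for i in range(1001)]
equilibria = [x for x in GRID if is_equilibrium(x)]
optimum = min(GRID, key=total_cost)
```

Under these assumptions the only equilibrium sends all flow over link 2 at total cost 1, while splitting the flow evenly gives total cost 0.75. A manager controlling part of the flow, as described in the abstract, aims to close this kind of gap.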
Sequential auctions for the allocation of resources with complementarities
, 1999
Cited by 74 (2 self)
Market-based mechanisms such as auctions are being studied as an appropriate means for resource allocation in distributed and multiagent decision problems. When agents value resources in combination rather than in isolation, one generally relies on combinatorial auctions where agents bid for resource bundles, or simultaneous auctions for all resources. We develop a different model, where agents bid for required resources sequentially. This model has the advantage that it can be applied in settings where combinatorial and simultaneous models are infeasible (e.g., when resources are made available at different points in time by different parties), as well as certain benefits in settings where combinatorial models are applicable. We develop a dynamic programming model for agents to compute bidding policies based on estimated distributions over prices. We also describe how these distributions are updated to provide a learning model for bidding behavior.
Planning, learning and coordination in multiagent decision processes
- In Proceedings of the Sixth Conference on Theoretical Aspects of Rationality and Knowledge (TARK96)
, 1996
Cited by 72 (1 self)
There has been a growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of different techniques from decision-theoretic planning and reinforcement learning and describe a number of interesting issues that arise with regard to coordinating the policies of individual agents. To this end, we describe multiagent Markov decision processes as a general model in which to frame this discussion. These are special n-person cooperative games in which agents share the same utility function. We discuss coordination mechanisms based on imposed conventions (or social laws) as well as learning methods for coordination. Our focus is on the decomposition of sequential decision processes so that coordination can be learned (or imposed) locally, at the level of individual states. We also discuss the use of structured problem representations and their role in the generalization of learned conventions and in approximation.
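One way to picture an imposed convention at the level of a single state is a fixed ordering over joint actions that every agent applies independently. The sketch below is a hypothetical illustration, not taken from the paper: the payoff table, the function names, and the choice of a lexicographic ordering are all assumptions.

```python
from itertools import product


def optimal_joint_actions(payoff, n_actions):
    """All joint actions that attain the maximum shared payoff in this state."""
    joint = list(product(*(range(n) for n in n_actions)))
    best = max(payoff[j] for j in joint)
    return [j for j in joint if payoff[j] == best]


def convention_choice(agent, payoff, n_actions):
    """Agent's own component of the lexicographically smallest optimal joint action."""
    return min(optimal_joint_actions(payoff, n_actions))[agent]


# Hypothetical state with two equally good, but incompatible, joint actions.
payoff = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 10}
choices = tuple(convention_choice(i, payoff, (2, 2)) for i in range(2))
```

Because both agents apply the same ordering to the same shared utility, they select matching components without communicating; without the convention, one agent could aim for (0, 0) and the other for (1, 1).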
On the existence of equilibria in noncooperative optimal flow control
- Journal of the ACM
, 1995
Cited by 67 (10 self)
The existence of Nash equilibria in noncooperative flow control in a general product-form network shared by K users is investigated. The performance objective of each user is to maximize its average throughput subject to an upper bound on its average time-delay. Previous attempts to study existence of equilibria for this flow control model were not successful, partly because the time-delay constraints couple the strategy spaces of the individual users in a way that does not allow the application of standard equilibrium existence theorems from the game theory literature. To overcome this difficulty, a more general approach to study the existence of Nash equilibria for decentralized control schemes is introduced. This approach is based on directly proving the existence of a fixed point of the best reply correspondence of the underlying game. For the investigated flow control model, the best reply correspondence is shown to be a function, implicitly defined by means of K interdependent linear programs. Employing an appropriate definition for continuity of the set of optimal solutions of parametrized linear programs, it is shown that, under appropriate conditions, the best reply function is continuous. Brouwer’s theorem implies, then, that the best reply function has a fixed point.
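The fixed-point argument summarized above can be written compactly. The notation below is assumed for illustration and is not taken from the paper: S denotes the joint strategy set of the K users and B the single-valued best reply function.

```latex
% Assumed notation: S is the joint strategy set, B : S \to S the best reply function.
\[
  x^{*} \in S \ \text{is a Nash equilibrium}
  \quad\Longleftrightarrow\quad
  x^{*} = B(x^{*}).
\]
% Brouwer's fixed-point theorem: if S \subset \mathbb{R}^{n} is nonempty,
% compact, and convex, and B : S \to S is continuous, then
\[
  \exists\, x^{*} \in S \ \text{such that} \ B(x^{*}) = x^{*}.
\]
```

Read this way, establishing continuity of the best reply function is the key step, since existence of an equilibrium then follows from the theorem.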
Using Similarity Criteria to Make Issue Trade-Offs in Automated Negotiations
- Artificial Intelligence
, 2002
Cited by 66 (7 self)
Automated negotiation is a key form of interaction in systems that are composed of multiple autonomous agents. The aim of such interactions is to reach agreements through an iterative process of making offers. The content of such proposals is, however, a function of the strategy of the agents. Here we present a strategy called the trade-off strategy where multiple negotiation decision variables are traded off against one another (e.g., paying a higher price in order to obtain an earlier delivery date or waiting longer in order to obtain a higher quality service). Such a strategy is commonly known to increase the social welfare of agents. Yet, to date, most computational work in this area has ignored the issue of trade-offs, instead aiming to increase social welfare through mechanism design. The aim of this paper is to develop a heuristic computational model of the trade-off strategy and show that it can lead to an increased social welfare of the system. A novel linear algorithm is presented that enables software agents to make trade-offs for multi-dimensional goods for the problem of distributed resource allocation.

