Results

**11 - 18** of **18**

### Competitive Collaborative Learning

Intuitively, it is clear that trust or shared taste enables a community of users to make better decisions over time, by learning cooperatively and avoiding one another’s mistakes. However, it is also clear that the presence of malicious, dishonest users in the community threatens the usefulness of such collaborative learning processes. We investigate this issue by developing algorithms for a multi-user online learning problem in which each user makes a sequence of decisions about selecting products or resources. Our model, which generalizes the adversarial multi-armed bandit problem, is characterized by two key features: (1) The quality of the products or resources may vary over time. (2) Some of the users in the system may be dishonest, Byzantine agents. Decision problems with these features underlie applications such as reputation and recommendation systems in e-commerce, and resource location systems in peer-to-peer networks. Assuming the number of honest users is at least a constant fraction of the number of resources, and that the honest users can be partitioned into groups such that individuals in a group make identical assessments of resources, we present an algorithm whose expected regret per user is linear in the number of groups and only logarithmic in the number of resources. This bound compares favorably with the naïve approach in which each user ignores feedback from peers and chooses resources using a multi-armed bandit algorithm; in this case the expected regret per user would be polynomial in the number of resources.
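The naïve baseline mentioned in the abstract, where a single user ignores peers and runs a bandit algorithm alone, can be illustrated with a minimal sketch. Exp3 is used here as one standard adversarial bandit algorithm; the abstract does not say which algorithm the authors have in mind, so that choice, the function name `exp3`, and the parameter `gamma` are assumptions made for illustration only.

```python
import math
import random


def exp3(num_arms, rounds, reward_fn, gamma=0.1, seed=0):
    """Single-user Exp3 sketch (assumed baseline, not the paper's algorithm).

    reward_fn(t, arm) must return a reward in [0, 1].
    Returns the total reward collected and the last sampling distribution.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    probs = [1.0 / num_arms] * num_arms
    total = 0.0
    for t in range(rounds):
        weight_sum = sum(weights)
        # Mix the weight-based distribution with uniform exploration.
        probs = [(1 - gamma) * w / weight_sum + gamma / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total += reward
        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the weights stay in a safe floating-point range.
        largest = max(weights)
        weights = [w / largest for w in weights]
    return total, probs
```

Calling `exp3(5, 2000, lambda t, arm: 1.0 if arm == 2 else 0.0)` runs one user against five resources of which only one is useful. Because this user learns nothing from peers, it has to explore every resource on its own, which is the cost the collaborative algorithm is designed to reduce.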

### Dynamic Selfish Routing and Traffic Optimisation

2008

This work surveys results from [18,19,20,21,22,23]. Recently, the Wardrop model has attracted a lot of attention as a model of selfish behaviour in routing scenarios. In this model, an infinite number of users each control an infinitesimal amount of flow. The overall flow induces latencies on the edges, and agents strive to minimise their sustained latency selfishly. Most studies of this model focus on static properties of equilibria, such as bounds on the degradation of performance due to the selfishness of the agents and the absence of central coordination, and ways to reduce this degradation. In order to motivate the study of equilibria, one typically uses strong assumptions on the knowledge and rationality of the agents. In this work, we take a different approach by modelling the agents as a dynamic population whose members change their routing paths from time to time in order to improve their sustained latency. Our motivation is twofold. First, our results show that it is possible for a population of agents to attain an equilibrium without relying on the above-mentioned assumptions, merely by following some simple selfish improvement rules. Second, we use these results to obtain fast distributed algorithms for computing approximate Wardrop equilibria. Our analyses are mainly theoretical, but we also present simulations of dynamic traffic engineering protocols based on our population dynamics.
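To make the idea of a population reaching equilibrium through simple improvement rules concrete, here is a minimal sketch on a network of parallel links. The specific rule (agents sample another path in proportion to its flow and migrate with probability proportional to the relative latency gain), the function name `simulate_rerouting`, and the `rate` parameter are assumptions for illustration; the surveyed papers may define their dynamics differently.

```python
def simulate_rerouting(latencies, flow, steps=2000, rate=0.1):
    """Replicator-style rerouting on parallel links (illustrative assumption).

    latencies: one function per link, mapping link flow to latency.
    flow: initial flow per link, summing to 1.
    Returns the flow vector after the given number of steps.
    """
    flow = list(flow)
    n = len(flow)
    for _ in range(steps):
        lat = [latencies[i](flow[i]) for i in range(n)]
        max_lat = max(lat) or 1.0  # avoid division by zero
        delta = [0.0] * n
        for i in range(n):
            for j in range(n):
                if lat[i] > lat[j]:
                    # Agents on link i sample link j with probability flow[j]
                    # and migrate in proportion to the relative latency gain.
                    moved = rate * flow[i] * flow[j] * (lat[i] - lat[j]) / max_lat
                    delta[i] -= moved
                    delta[j] += moved
        flow = [f + d for f, d in zip(flow, delta)]
    return flow
```

With two links whose latencies are `lambda x: x` and `lambda x: 2 * x`, starting from an even split, the flow drifts towards the point where both latencies are equal, which is the Wardrop equilibrium for that instance. No agent needs global knowledge of the network; each only compares its own latency with that of a sampled alternative.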

### A Novel Protocol for Communicating Reputation in P2P Networks

Many reputation systems mainly focus on avoiding untrustworthy agents by communicating reputation. The problem then arises that when an agent is not ignorant of another, there is no way to notice ambiguity. This paper presents a new protocol in which an agent can measure ambiguity using the notion of statistics, and illustrates the method of designing agents' algorithms as well as existing reputation systems.

### Multi-Armed Bandits with Betting

We study an extension to the stochastic multi-armed bandit problem where the learner has a budget of K “coins” it can use in each round. The learner can use the coins to play multiple arms in each round, having the option to “bet” multiple coins on an arm. At the end of the round, the arms generate a reward that is proportional to the number of coins invested in them.
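A single round of this betting model can be sketched as follows. The abstract only says the reward is proportional to the coins invested, so the Bernoulli outcome per arm, the proportionality constant of 1, and the names `play_round`, `bets`, and `means` are all assumptions made for illustration.

```python
import random


def play_round(bets, means, budget, rng):
    """Simulate one betting round (illustrative assumptions throughout).

    bets[i] is the number of coins placed on arm i; means[i] is the
    assumed success probability of arm i. Each arm draws one Bernoulli
    outcome, and the payout is coins times that outcome.
    """
    if sum(bets) > budget:
        raise ValueError("bets exceed the per-round budget of K coins")
    total = 0.0
    for coins, mean in zip(bets, means):
        outcome = 1.0 if rng.random() < mean else 0.0
        total += coins * outcome
    return total
```

For example, `play_round([3, 0, 2], [1.0, 0.5, 0.0], budget=5, rng=random.Random(0))` places three coins on the first arm and two on the third. Spreading coins over several arms lets the learner observe more arms per round, while concentrating them raises the payoff from an arm it already believes to be good.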

### General Terms

An attacker can draw attention to items that don’t deserve that attention by manipulating recommender systems. We describe an influence-limiting algorithm that can turn existing recommender systems into manipulation-resistant systems. Honest reporting is the optimal strategy for raters who wish to maximize their influence. If an attacker can create only a bounded number of shills, the attacker can mislead only a small amount. However, the system eventually makes full use of information from honest, informative raters. We describe both the influence limits and the information loss incurred due to those limits in terms of information-theoretic concepts of loss functions and entropies.

### Gossip-based distributed stochastic bandit algorithms

The multi-armed bandit problem has attracted remarkable attention in the machine learning community and many efficient algorithms have been proposed to handle the so-called exploitation-exploration dilemma in various bandit setups. At the same time, significantly less effort has been devoted to adapting bandit algorithms to particular architectures, such as sensor networks, multi-core machines, or peer-to-peer (P2P) environments, which could potentially speed up their convergence. Our goal is to adapt stochastic bandit algorithms to P2P networks. In our setup, the same set of arms is available in each peer. In every iteration each peer can pull one arm independently of the other peers, and then some limited communication is possible with a few random other peers. As our main result, we show that our adaptation achieves a linear speedup in terms of the number of peers participating in the network. More precisely, we show that the probability of playing a suboptimal arm at a peer in iteration t = Ω(log N) is proportional to 1/(Nt) where N denotes the number of peers. The theoretical results are supported by simulation experiments showing that our algorithm scales gracefully with the size of the network. Proceedings of the 30th
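The setup described above (each peer pulls one arm per iteration, then talks to a few random peers) can be sketched roughly as follows. This is an illustrative simplification and not the paper's algorithm: it assumes an ε-greedy policy, Bernoulli rewards, and pairwise averaging of per-arm reward sums and pull counts with one random peer per iteration. The name `gossip_bandit` and all parameters are hypothetical.

```python
import random


def gossip_bandit(means, num_peers=20, rounds=300, epsilon=0.1, seed=0):
    """Epsilon-greedy peers that gossip their arm statistics (assumed scheme).

    means[a] is the assumed Bernoulli success probability of arm a.
    Returns the arm each peer currently believes to be best.
    """
    rng = random.Random(seed)
    k = len(means)
    sums = [[0.0] * k for _ in range(num_peers)]
    counts = [[0.0] * k for _ in range(num_peers)]

    def best_arm(p):
        # Untried arms are treated optimistically so that each gets pulled.
        return max(
            range(k),
            key=lambda a: sums[p][a] / counts[p][a] if counts[p][a] > 0 else float("inf"),
        )

    for _ in range(rounds):
        # Each peer pulls one arm independently of the others.
        for p in range(num_peers):
            arm = rng.randrange(k) if rng.random() < epsilon else best_arm(p)
            reward = 1.0 if rng.random() < means[arm] else 0.0
            sums[p][arm] += reward
            counts[p][arm] += 1.0
        # Limited communication: average statistics with one random peer.
        for p in range(num_peers):
            q = rng.randrange(num_peers)
            if q == p:
                continue
            for a in range(k):
                s = (sums[p][a] + sums[q][a]) / 2
                c = (counts[p][a] + counts[q][a]) / 2
                sums[p][a] = sums[q][a] = s
                counts[p][a] = counts[q][a] = c
    return [best_arm(p) for p in range(num_peers)]
```

Averaging sums and counts together keeps each peer's ratio a pooled estimate of the arm's mean, so every peer benefits from pulls made elsewhere in the network. That pooling is the intuition behind the speedup in the number of peers, although this sketch makes no claim to reproduce the stated 1/(Nt) bound.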