DSybil: Optimal Sybil-Resistance for Recommendation Systems
, 2009
Cited by 22 (4 self)
Recommendation systems can be attacked in various ways, and the ultimate form of attack is a sybil attack, in which the attacker creates a potentially unlimited number of sybil identities to vote. Defending against sybil attacks is often quite challenging, and the nature of recommendation systems makes it even harder. This paper presents DSybil, a novel defense for diminishing the influence of sybil identities in recommendation systems. DSybil provides strong provable guarantees that hold even under the worst-case attack and are optimal. DSybil can defend against an unlimited number of sybil identities over time. DSybil achieves its strong guarantees by (i) exploiting the heavy-tail distribution of the typical voting behavior of the honest identities, and (ii) carefully identifying whether the system is already getting “enough help” from the (weighted) voters already taken into account or whether more “help” is needed. Our evaluation shows that DSybil would continue to provide high-quality recommendations even when a million-node botnet uses an optimal strategy to launch a sybil attack.
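The abstract describes relying on weighted voters and asking whether they already provide "enough help". As a rough illustration of trust-weighted voting in general, and not DSybil's actual algorithm (whose trust rules and guarantees are given in the paper), the Python sketch below scores each object by the total trust weight of the voters endorsing it and multiplicatively boosts voters who endorsed an object later found to be good. The function names, the growth factor `alpha`, and the zero default weight for unknown identities are hypothetical choices.

```python
from collections import defaultdict


def recommend(votes, weights):
    """Return the object with the largest total trust weight behind it.

    votes: dict mapping voter id -> object id that voter endorsed.
    weights: dict mapping voter id -> current trust weight (hypothetical).
    Voters missing from `weights` get zero influence.
    """
    totals = defaultdict(float)
    for voter, obj in votes.items():
        totals[obj] += weights.get(voter, 0.0)
    # Iterate objects in sorted order so ties resolve to the smallest id.
    return max(sorted(totals), key=lambda obj: totals[obj])


def update_weights(votes, weights, good_objects, alpha=2.0):
    """Multiply by `alpha` the weight of each voter who endorsed a good object."""
    return {
        voter: w * alpha if votes.get(voter) in good_objects else w
        for voter, w in weights.items()
    }
```

In this toy model, two trusted voters endorsing `"x"` outweigh any number of fresh, unweighted identities endorsing `"y"`, and a round of positive feedback on `"x"` doubles the weight of those two voters.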
The K-armed Dueling Bandits Problem
Cited by 15 (6 self)
We study a partial-information online-learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). In contrast to conventional approaches that require the absolute reward of the chosen strategy to be quantifiable and observable, our setting assumes only that (noisy) binary feedback about the relative reward of two chosen strategies is available. This type of relative feedback is particularly appropriate in applications where absolute rewards have no natural scale or are difficult to measure (e.g., user-perceived quality of a set of retrieval results, taste of food, product attractiveness), but where pairwise comparisons are easy to make. We propose a novel regret formulation in this setting and present an algorithm that achieves (almost) information-theoretically optimal regret bounds (up to a constant factor).
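To make the relative-feedback setting concrete, here is a small Python sketch built on assumptions not stated in the abstract: a preference matrix `P` in which `P[i][j]` is the probability that arm `i` beats arm `j`, and a per-duel regret of ε(b*, b) + ε(b*, b′), where P(b beats b′) = 1/2 + ε(b, b′). This is one common way to formalize comparison-based regret; the sketch covers only the feedback model and the regret bookkeeping, not the paper's algorithm.

```python
import random


def duel(P, i, j, rng):
    """Noisy binary feedback: True if arm i wins one comparison against arm j."""
    return rng.random() < P[i][j]


def duel_regret(P, best, i, j):
    """Regret of comparing arms i and j, measured against arm `best`."""

    def eps(a, b):
        # Advantage of arm a over arm b: P(a beats b) - 1/2.
        return P[a][b] - 0.5

    return eps(best, i) + eps(best, j)
```

Comparing the best arm against itself costs nothing, while each duel that involves weaker arms adds their disadvantages against the best arm; only the binary outcome of `duel` is ever observed by the learner.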
Contextual Bandits with Similarity Information
 24th Annual Conference on Learning Theory
, 2011
Cited by 15 (3 self)
In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature has considered information on similarity between arms. We consider similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem where, before each round, an algorithm is given the context – a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on web pages, one of the crucial problems in sponsored search. A particularly simple way to represent similarity information in the contextual bandit setting is via a similarity distance between the context-arm pairs which bounds from above the difference between the respective expected payoffs. Prior work
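The similarity-distance condition can be checked directly. The sketch below assumes expected payoffs are stored in a dictionary keyed by hypothetical `(context, arm)` pairs and that `dist` is the similarity distance D; it verifies |μ(p) − μ(q)| ≤ D(p, q) for every pair, and also computes the upper bound on an unobserved pair's payoff that the condition implies, namely min over known p of μ(p) + D(p, q).

```python
from itertools import combinations


def satisfies_similarity_bound(mu, dist):
    """Check |mu(p) - mu(q)| <= dist(p, q) for all pairs of (context, arm) keys."""
    return all(abs(mu[p] - mu[q]) <= dist(p, q) for p, q in combinations(mu, 2))


def implied_upper_bound(known, dist, q):
    """Tightest upper bound on mu(q) implied by known payoffs and the distance bound."""
    return min(known[p] + dist(p, q) for p in known)
```

For example, with contexts and arms on the real line and D taken as the L1 distance between pairs, known payoffs 0.5 at (0, 0) and 0.2 at (1, 0) imply that the payoff at (0.5, 0) is at most 0.7.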
Regret bounds for Gaussian process bandit problems
 In AISTATS
, 2010
Cited by 14 (1 self)
Bandit algorithms are concerned with trading off exploration against exploitation when a number of options are available but their quality can only be learned by experimenting with them. We consider the scenario in which the reward distribution for arms is modelled by a Gaussian process and there is no noise in the observed reward. Our main result is a bound on the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout, under benign assumptions about the covariance function defining the Gaussian process. We complement these upper bounds with corresponding lower bounds for particular covariance functions, demonstrating that in general there is at most a logarithmic looseness in our upper bounds.
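A minimal Python sketch of the noise-free setting, under assumptions that are not from the paper: one-dimensional arms, a squared-exponential covariance with length scale 0.5, a tiny diagonal jitter added only for numerical stability, and a generic upper-confidence rule (posterior mean plus `beta` times posterior standard deviation). The algorithms whose regret the paper bounds may differ; this only shows how a noiseless GP posterior is formed and used to choose an arm.

```python
import math


def rbf(x, y, length=0.5):
    """Squared-exponential covariance (an assumed choice of kernel)."""
    return math.exp(-((x - y) ** 2) / (2 * length ** 2))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def posterior(xs, ys, x, jitter=1e-10):
    """Noise-free GP posterior mean and variance at x given observations (xs, ys)."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    # Clip tiny negative values caused by rounding.
    var = max(rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, var


def ucb_choice(xs, ys, arms, beta=2.0):
    """Pick the arm with the largest posterior mean + beta * posterior std."""
    def score(a):
        m, s2 = posterior(xs, ys, a)
        return m + beta * math.sqrt(s2)
    return max(arms, key=score)
```

Because observations are noise-free, the posterior variance at an arm that has already been played collapses to (numerically) zero, so the rule is drawn toward untried arms whose uncertainty is still high.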
An asymptotically optimal bandit algorithm for bounded support models
 In Proceedings of the Twenty-third Conference on Learning Theory (COLT 2010)
, 2010
Cited by 10 (1 self)
The multi-armed bandit problem is a typical example of the dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playing a slot machine with multiple arms. We study the stochastic bandit problem where each arm has a reward distribution supported in a known bounded interval, e.g. [0, 1]. In this model, Auer et al. (2002) proposed practical policies called UCB and derived finite-time regret bounds for the UCB policies. However, policies achieving the asymptotic bound given by Burnetas and Katehakis (1996) were previously unknown for this model. We propose the Deterministic Minimum Empirical Divergence (DMED) policy and prove that DMED achieves the asymptotic bound. Furthermore, the index used in DMED for choosing an arm can be computed easily by a convex optimization technique. Although we do not derive a finite-time regret bound, we confirm by simulations that DMED achieves a regret close to the asymptotic bound in finite time.
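A DMED-style selection rule can be sketched for the Bernoulli special case, which is a simplification assumed here: for rewards in {0, 1} the sketch takes the minimum empirical divergence to be the Bernoulli KL divergence between an arm's empirical mean and the current best empirical mean, instead of solving the general convex program mentioned in the abstract. The criterion, keeping arm i for exploration when N_i · K(μ̂_i, μ̂*) ≤ log n − log N_i, is stated as an assumption and should be checked against the paper.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def dmed_candidates(means, counts):
    """Indices of arms kept for exploration under a DMED-style rule (Bernoulli case).

    means: empirical mean reward of each arm.
    counts: number of pulls of each arm (each at least 1); n is their sum.
    """
    n = sum(counts)
    best = max(means)
    keep = []
    for i, (m, c) in enumerate(zip(means, counts)):
        # An arm whose empirical mean equals the best has zero divergence.
        divergence = bernoulli_kl(m, best) if m < best else 0.0
        if c * divergence <= math.log(n) - math.log(c):
            keep.append(i)
    return keep
```

The empirically best arm always qualifies, while a weaker arm re-enters the candidate list only when it has been pulled so rarely that its divergence, scaled by its pull count, falls below the logarithmic threshold.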
Improved algorithms for linear stochastic bandits
 In Advances in Neural Information Processing Systems, 2011. Version 1, 13 Jan 2012. Supplementary Material: Bandit
Cited by 9 (0 self)
We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem. In particular, we show that a simple modification of Auer’s UCB algorithm (Auer, 2002) achieves constant regret with high probability. More importantly, we modify and, consequently, improve the analysis of the algorithm for the linear stochastic bandit problem studied by Auer (2002), Dani et al. (2008), Rusmevichientong and Tsitsiklis (2010), and Li et al. (2010). Our modification improves the regret bound by a logarithmic factor, though experiments show a vast improvement. In both cases, the improvement stems from the construction of smaller confidence sets. For their construction we use a novel tail inequality for vector-valued martingales.
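Confidence sets for linear bandits are typically used by an optimistic rule: pick the arm that maximizes the estimated payoff plus a confidence width. The sketch below is a generic version of that rule, not the paper's algorithm. It fits a ridge-regression estimate θ̂ = V⁻¹b with V = λI + Σ x xᵀ and b = Σ r x, then adds `beta` times the V⁻¹-norm of each arm's feature vector, where `beta` is a fixed hypothetical radius rather than the martingale-based radius derived in the paper.

```python
import math


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def optimistic_arm(arms, history, lam=1.0, beta=1.0):
    """Return argmax over arms x of theta_hat . x + beta * ||x|| in the V^-1 norm.

    arms: list of feature vectors (tuples of floats).
    history: list of (feature vector, observed reward) pairs.
    lam: ridge regularizer; beta: fixed confidence radius (hypothetical).
    """
    d = len(arms[0])
    V = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, r in history:
        for i in range(d):
            b[i] += r * x[i]
            for j in range(d):
                V[i][j] += x[i] * x[j]
    theta = solve(V, b)  # ridge-regression estimate V^-1 b

    def index(x):
        width = math.sqrt(sum(xi * zi for xi, zi in zip(x, solve(V, list(x)))))
        return sum(t * xi for t, xi in zip(theta, x)) + beta * width

    return max(arms, key=index)
```

Smaller confidence sets correspond to a smaller `beta` here: the width term shrinks, so the rule stops exploring poorly performing directions sooner, which is the intuition behind the reported empirical gains.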
Mortal Multi-Armed Bandits
Cited by 7 (0 self)
We formulate and study a new variant of the k-armed bandit problem, motivated by e-commerce applications. In our model, arms have a (stochastic) lifetime after which they expire. In this setting an algorithm needs to continuously explore new arms, in contrast to the standard k-armed bandit model in which arms are available indefinitely and exploration is reduced once an optimal arm is identified with near-certainty. The main motivation for our setting is online advertising, where ads have limited lifetimes due to, for example, the nature of their content and their campaign budgets. An algorithm needs to choose among a large collection of ads, more than can be fully explored within the typical ad lifetime. We present an optimal algorithm for the state-aware (deterministic reward function) case, and build on this technique to obtain an algorithm for the state-oblivious (stochastic reward function) case. Empirical studies on various reward distributions, including one derived from a real-world ad serving application, show that the proposed algorithms significantly outperform standard multi-armed bandit approaches applied to these settings.
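To illustrate why expiring arms force continued exploration, the sketch below simulates a simple keep-or-replace policy in the state-aware (deterministic reward) case. The threshold `tau`, the `(reward, lifetime)` arm representation, and the policy itself are hypothetical; this is not claimed to be the optimal algorithm presented in the paper.

```python
def threshold_policy(arms, horizon, tau):
    """Simulate a keep-or-replace policy for arms that expire.

    arms: iterator over fresh arms, each a (reward, lifetime) pair with a
          deterministic per-pull reward and a lifetime of at least 1 round.
    Each fresh arm is pulled once; if its reward is at least tau the policy
    keeps pulling it until it expires, otherwise it is discarded.
    Returns the total reward collected over `horizon` rounds.
    """
    total = 0.0
    current = None  # (reward, rounds_left) of the arm being exploited
    for _ in range(horizon):
        if current is None:
            current = next(arms)  # explore: draw a fresh arm
        reward, left = current
        total += reward
        left -= 1
        # Keep exploiting only a good arm that has not yet expired.
        current = (reward, left) if reward >= tau and left > 0 else None
    return total
```

For instance, with arms (0.2, 5), (0.9, 3), (0.1, 10), (0.8, 100) arriving in that order, a horizon of 6 rounds, and `tau = 0.5`, the policy discards the first arm, exploits the second until it expires, then keeps drawing fresh arms, collecting a total of about 3.8.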
Using More Data to Speed-up Training Time
Cited by 3 (0 self)
In many recent applications, data is plentiful. By now, we have a rather clear understanding of how more data can be used to improve the accuracy of learning algorithms. Recently, there has been a growing interest in understanding how more data can be leveraged to reduce the required training runtime. In this paper, we study the runtime of learning as a function of the number of available training examples, and underscore the main high-level techniques. We provide the first formal positive result showing that, even in the unrealizable case, the runtime can decrease exponentially while requiring only a polynomial growth in the number of examples. Our construction corresponds to a synthetic learning problem, and an interesting open question is whether the tradeoff can be shown for more natural learning problems. We spell out several interesting candidates of natural learning problems for which we conjecture that there is a tradeoff between computational and sample complexity.
Learning hurdles for sleeping experts
 In Innovations in Theoretical Computer Science
, 2012
Cited by 3 (0 self)
We study the online decision problem where the set of available actions varies over time, also called the sleeping experts problem. We consider the setting where the performance comparison is made with respect to the best ordering of actions in hindsight. In this paper, both the payoff function and the availability of actions are adversarial. Kleinberg et al. (2008) gave a computationally efficient no-regret algorithm in the setting where payoffs are stochastic. Kanade et al. (2009) gave an efficient no-regret algorithm in the setting where action availability is stochastic. However, whether there exists a computationally efficient no-regret algorithm in the adversarial setting was posed as an open problem by Kleinberg et al. (2008). We show that such an algorithm would imply an algorithm for PAC learning DNF, a long-standing and important open problem. We also consider the setting where the number of available actions is restricted, and study its relation to agnostically learning monotone disjunctions over examples with bounded Hamming weight.
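The comparator in this setting, the best ordering of actions in hindsight, can be computed by brute force on small instances. In this sketch (function names and data layout are hypothetical), an ordering plays its highest-ranked available action in each round; the search over all permutations is exponential in the number of actions and only serves to define the benchmark, not to suggest an efficient algorithm.

```python
from itertools import permutations


def ordering_payoff(order, rounds):
    """Total payoff of always playing the highest-ranked available action.

    order: sequence of action ids, best-ranked first.
    rounds: list of (available_actions, payoffs) pairs, where payoffs maps
            each available action to its payoff; every round must offer at
            least one action that appears in `order`.
    """
    total = 0.0
    for available, payoffs in rounds:
        choice = next(a for a in order if a in available)
        total += payoffs[choice]
    return total


def best_ordering(actions, rounds):
    """Brute-force the best ordering in hindsight (exponential in len(actions))."""
    return max(permutations(actions), key=lambda o: ordering_payoff(o, rounds))
```

For instance, if the available sets are {a, b}, {b, c}, and {a, c}, and the payoff is 1 for a, b, and c respectively in those rounds (0 otherwise), no ordering earns more than 2, since winning all three rounds would require ranking a above b, b above c, and c above a.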
Can we learn to gamble efficiently?
 In COLT, 2010. Open Problem
Cited by 3 (0 self)
Betting is an important problem faced by millions of sports fans each day. Presented with an upcoming matchup between team A and team B, and given the opportunity to place a 50/50 wager on either, where should a gambler put her money? This decision is not, of course, made in isolation: both teams will have played a number of decided matches with other teams throughout the season.