The K-armed Dueling Bandits Problem
Cited by 15 (6 self)
We study a partial-information online learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). In contrast to conventional approaches that require the absolute reward of the chosen strategy to be quantifiable and observable, our setting assumes only that (noisy) binary feedback about the relative reward of two chosen strategies is available. This type of relative feedback is particularly appropriate in applications where absolute rewards have no natural scale or are difficult to measure (e.g., user-perceived quality of a set of retrieval results, taste of food, product attractiveness), but where pairwise comparisons are easy to make. We propose a novel regret formulation in this setting and present an algorithm that achieves (almost) information-theoretically optimal regret bounds (up to a constant factor).
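The feedback model described in this abstract can be illustrated with a small simulation. The sketch below is not the algorithm from the paper: it uses naive uniform exploration over random pairs, and the preference matrix `P_WIN`, the function names, and the win-rate scoring rule are all assumptions made for illustration. The only point it shows is that the learner never observes a reward, only which of two arms won a noisy comparison.

```python
import random

# Hypothetical preference matrix: P_WIN[i][j] is the probability that
# arm i beats arm j in a single noisy comparison (arm 0 is the best arm).
P_WIN = [
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]

def duel(p_win, i, j, rng):
    """Return True if arm i beats arm j in one noisy comparison."""
    return rng.random() < p_win[i][j]

def uniform_exploration(p_win, rounds, seed=0):
    """Compare uniformly random pairs and score arms by empirical win rate."""
    rng = random.Random(seed)
    k = len(p_win)
    wins = [0] * k
    plays = [0] * k
    for _ in range(rounds):
        i, j = rng.sample(range(k), 2)  # two distinct arms
        winner = i if duel(p_win, i, j, rng) else j
        wins[winner] += 1
        plays[i] += 1
        plays[j] += 1
    rates = [wins[a] / plays[a] if plays[a] else 0.0 for a in range(k)]
    best = max(range(k), key=lambda a: rates[a])
    return best, rates

if __name__ == "__main__":
    best, rates = uniform_exploration(P_WIN, rounds=3000)
    print("estimated best arm:", best)
    print("empirical win rates:", [round(r, 3) for r in rates])
```

Uniform exploration keeps comparing clearly inferior arms, which is exactly the cost that a regret-minimizing dueling algorithm is designed to avoid.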
Characterizing truthful multi-armed bandit mechanisms
 In ACM-EC
, 2009
Cited by 13 (0 self)
We consider a multi-round auction setting motivated by pay-per-click auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. Initially, neither the auctioneer nor the advertisers have any information about the likelihood of clicks on the advertisements. The auctioneer’s goal is to design a (dominant strategies) truthful mechanism that (approximately) maximizes the social welfare. If the advertisers bid their true private values, our problem is equivalent to the multi-armed bandit problem, and thus can be viewed as a strategic version of the latter. In particular, for both problems the quality of an algorithm can be characterized by regret, the difference in social welfare between the algorithm and the benchmark which always selects the same “best” advertisement. We investigate how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful. We find that truthful mechanisms have certain strong structural properties – essentially, they must separate exploration from exploitation – and they incur much higher regret than the optimal multi-armed bandit algorithms. Moreover, we provide a truthful mechanism which (essentially) matches our lower bound on regret.
Sorting and Selection with Imprecise Comparisons
Cited by 6 (1 self)
In experimental psychology, the method of paired comparisons was proposed as a means for ranking preferences amongst n elements of a human subject. The method requires performing all $\binom{n}{2}$ comparisons, then sorting elements according to the number of wins. The large number of comparisons is performed to counter the potentially faulty decision-making of the human subject, who acts as an imprecise comparator. We consider a simple model of the imprecise comparisons: there exists some δ > 0 such that when a subject is given two elements to compare, if the values of those elements (as perceived by the subject) differ by at least δ, then the comparison will be made correctly; when the two elements have values that are within δ, the outcome of the comparison is unpredictable. This δ corresponds to the just noticeable difference unit (JND) or difference threshold in the psychophysics literature, but does not require the statistical assumptions used to define this value. In this model, the standard method of paired comparisons minimizes the errors introduced by the imprecise comparisons at the cost of $\binom{n}{2}$ comparisons. We show that the same optimal guarantees can be achieved using $4n^{3/2}$ comparisons, and we prove the optimality of our method. We then explore the general tradeoff between the guarantees on the error that can be made and number of comparisons for the problems of sorting, max-finding, and selection. Our results provide close-to-optimal solutions for each of these problems.
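The baseline method of paired comparisons described in this abstract can be sketched as follows. This shows only the standard all-pairs method, not the paper's improved procedure. The function names are hypothetical, and modelling the "unpredictable" outcome as a fair coin flip is an assumption: the abstract leaves that outcome unspecified, so it could equally be adversarial.

```python
import random
from itertools import combinations

def imprecise_less(a, b, delta, rng):
    """Judge whether a < b. Correct when |a - b| >= delta; otherwise the
    outcome is modelled as a fair coin flip (an assumption of this sketch)."""
    if abs(a - b) >= delta:
        return a < b
    return rng.random() < 0.5

def paired_comparison_rank(values, delta, seed=0):
    """Compare every pair once, then order indices by number of wins."""
    rng = random.Random(seed)
    n = len(values)
    wins = [0] * n
    comparisons = 0
    for i, j in combinations(range(n), 2):
        comparisons += 1
        if imprecise_less(values[i], values[j], delta, rng):
            wins[j] += 1  # element j was judged larger
        else:
            wins[i] += 1  # element i was judged larger
    order = sorted(range(n), key=lambda idx: wins[idx])  # ascending by wins
    return order, comparisons

if __name__ == "__main__":
    order, used = paired_comparison_rank([3.0, 0.0, 2.0, 1.0], delta=0.5)
    print("indices from smallest to largest:", order)
    print("comparisons used:", used)
```

When all values differ by at least δ, every comparison is correct and the win counts recover the true order; elements closer than δ may be swapped.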
An Optimal Policy for Target Localization with Application to Electron Microscopy
Cited by 1 (1 self)
This paper considers the task of finding a target location by making a limited number of sequential observations. Each observation results from evaluating an imperfect classifier of a chosen cost and accuracy on an interval of chosen length and position. Within a Bayesian framework, we study the problem of minimizing an objective that combines the entropy of the posterior distribution with the cost of the questions asked. In this problem, we show that the one-step lookahead policy is Bayes-optimal for any time horizon. Moreover, this one-step lookahead policy is easy to compute and implement. We then use this policy in the context of localizing mitochondria in electron microscope images, and experimentally show that significant speedups in acquisition can be gained, while maintaining near-equal image quality at target locations, when compared to current policies.
Proceedings of the 30th
NEW LEARNING FRAMEWORKS FOR INFORMATION RETRIEVAL
, 2011
Cited by 1 (0 self)
Recent advances in machine learning have enabled the training of increasingly complex information retrieval models. This dissertation proposes principled approaches to formalize the learning problems for information retrieval, with an eye towards developing a unified learning framework. This will conceptually simplify the overall development process, making it easier to reason about higher-level goals and properties of the retrieval system. This dissertation advocates two complementary approaches, structured prediction and interactive learning, to learn feature-rich retrieval models that can perform well in practice.
The communication complexity of addition
, 2011
Suppose each of $k \le n^{o(1)}$ players holds an $n$-bit number $x_i$ in its hand. The players wish to determine if $\sum_{i \le k} x_i = s$. We give a public-coin protocol with error 1% and communication $O(k \lg k)$. The communication bound is independent of $n$, and for $k \ge 3$ improves on the $O(k \lg n)$ bound by Nisan (Bolyai Soc. Math. Studies; 1993). Our protocol also applies to addition modulo $m$. In this case we give a matching (public-coin) $\Omega(k \lg k)$ lower bound for various $m$. We also obtain some lower bounds over the integers, including $\Omega(k \lg \lg k)$ for protocols that are one-way, like ours. We give a protocol to determine if $\sum x_i > s$ with error 1% and communication $O(k \lg k) \lg n$. For $k \ge 3$ this improves on Nisan’s $O(k \lg^2 n)$ bound. A similar improvement holds for computing degree-$(k-1)$ polynomial-threshold functions in the number-on-forehead model. We give a (public-coin, 2-player, tight) $\Omega(\lg n)$ lower bound to determine if $x_1 > x_2$. This improves on the $\Omega(\sqrt{\lg n})$ bound by Smirnov (1988).
TWENTY QUESTIONS WITH NOISE: BAYES OPTIMAL POLICIES FOR ENTROPY LOSS
 Applied Probability Trust (25 February 2011)
We consider the problem of 20 questions with noisy answers, in which we seek to find a target by repeatedly choosing a set, asking an oracle whether the target lies in this set, and obtaining an answer corrupted by noise. Starting with a prior distribution on the target’s location, we seek to minimize the expected entropy of the posterior distribution. We formulate this problem as a dynamic program and show that any policy optimizing the one-step expected reduction in entropy is also optimal over the full horizon. Two such Bayes-optimal policies are presented: one generalizes the probabilistic bisection policy due to Horstein and the other asks a deterministic set of questions. We study the structural properties of the latter, and illustrate its use in a computer vision application.
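The one-step expected reduction in entropy mentioned in this abstract can be computed in closed form in a simplified special case. The sketch below assumes a symmetric noise model in which the oracle's yes/no answer is flipped with a fixed probability `eps`; the abstract does not fix the noise model, so this choice and the function names are illustrative assumptions. Under that assumption the expected reduction equals the mutual information between the answer and the event "target lies in the queried set", and depends only on the prior mass of the set.

```python
import math

def binary_entropy(q):
    """Entropy in bits of a Bernoulli(q) random variable."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def expected_entropy_reduction(mass, eps):
    """Expected one-step drop in posterior entropy (bits) for a query set
    with prior probability `mass`, when answers are flipped w.p. `eps`."""
    p_yes = mass * (1.0 - eps) + (1.0 - mass) * eps  # P(oracle says "yes")
    return binary_entropy(p_yes) - binary_entropy(eps)

def best_query_mass(eps, steps=100):
    """Grid search for the prior mass that maximizes the expected reduction."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda m: expected_entropy_reduction(m, eps))

if __name__ == "__main__":
    for mass in (0.1, 0.3, 0.5):
        gain = expected_entropy_reduction(mass, eps=0.1)
        print(f"mass={mass:.1f}  expected reduction={gain:.4f} bits")
    print("best mass:", best_query_mass(eps=0.1))
```

In this symmetric setting the maximizer is a set holding half of the prior mass, which is the choice made by bisection-style policies.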
A BAYESIAN APPROACH TO STOCHASTIC ROOT FINDING
A stylized model of one-dimensional stochastic root-finding involves repeatedly querying an oracle as to whether the root lies to the left or right of a given point x. The oracle answers this question, but the received answer is incorrect with probability 1 − p(x). A Bayesian-style algorithm for this problem that assumes knowledge of p(·) repeatedly updates a density giving, in some sense, one’s belief about the location of the root. We demonstrate how the algorithm works, and provide some results that shed light on its performance, both when p(·) is constant and when p(·) varies with x.
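A belief-updating scheme of the kind described in this abstract can be sketched on a discretized interval. The sketch below covers only the constant-p case and is not taken from the paper: the grid over [0, 1], the rule of querying at the posterior median, and every function name are assumptions made for illustration. Here `p` is the probability that the oracle's answer is correct, as in the abstract.

```python
import random

def posterior_median_cell(mass):
    """Index of the first cell at which cumulative mass reaches 1/2."""
    total = 0.0
    for idx, m in enumerate(mass):
        total += m
        if total >= 0.5:
            return idx
    return len(mass) - 1

def probabilistic_bisection(root, p, steps, grid=1000, seed=0):
    """Estimate `root` in [0, 1] from noisy left/right answers that are
    correct with constant probability p, using a grid posterior."""
    rng = random.Random(seed)
    width = 1.0 / grid
    mass = [width] * grid  # uniform prior over grid cells
    for _ in range(steps):
        m = posterior_median_cell(mass)
        x = (m + 1) * width  # query point: right edge of the median cell
        truth_right = root > x
        correct = rng.random() < p
        says_right = truth_right if correct else not truth_right
        # Bayes update: cells on the side the oracle pointed to get weight p,
        # cells on the other side get weight 1 - p.
        for idx in range(grid):
            cell_is_right = idx > m
            mass[idx] *= p if cell_is_right == says_right else 1.0 - p
        norm = sum(mass)
        mass = [v / norm for v in mass]
    return (posterior_median_cell(mass) + 0.5) * width

if __name__ == "__main__":
    print("noise-free estimate:", probabilistic_bisection(0.3137, p=1.0, steps=20))
    print("noisy estimate:", probabilistic_bisection(0.3137, p=0.8, steps=200))
```

With p = 1 the update discards the rejected side entirely and the procedure reduces to ordinary bisection on the grid; with p < 1 wrong answers only down-weight a side, so later answers can undo them.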
Noisy Search with Comparative Feedback
We present theoretical results in terms of lower and upper bounds on the query complexity of noisy search with comparative feedback. In this search model, the noise in the feedback depends on the distance between query points and the search target. Consequently, the error probability in the feedback is not fixed but varies for the queries posed by the search algorithm. Our results show that a target out of n items can be found in O(log n) queries. We also show the surprising result that for k possible answers per query, the speedup is not log k (as for k-ary search) but only log log k in some cases.