Results 1  10
of
612,315
Efficient Optimal Learning for Contextual Bandits
"... We address the problem of learning in an online setting where the learner repeatedly observes features x, selects among K actions, and receives reward r for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses an oracle which returns an optimal policy ..."
Abstract

Cited by 26 (2 self)
 Add to MetaCart
We address the problem of learning in an online setting where the learner repeatedly observes features x, selects among K actions, and receives reward r for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses an oracle which returns an optimal
Finitetime analysis of the multiarmed bandit problem
 Machine Learning
, 2002
"... Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy’s success in addressing ..."
Abstract

Cited by 804 (15 self)
 Add to MetaCart
this dilemma is the regret, that is the loss due to the fact that the globally optimal policy is not followed all the times. One of the simplest examples of the exploration/exploitation dilemma is the multiarmed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has
Bandit based MonteCarlo Planning
 In: ECML06. Number 4212 in LNCS
, 2006
"... Abstract. For large statespace Markovian Decision Problems MonteCarlo planning is one of the few viable approaches to find nearoptimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide MonteCarlo planning. In finitehorizon or discounted MDPs the algo ..."
Abstract

Cited by 433 (7 self)
 Add to MetaCart
Abstract. For large statespace Markovian Decision Problems MonteCarlo planning is one of the few viable approaches to find nearoptimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide MonteCarlo planning. In finitehorizon or discounted MDPs
Learnability in Optimality Theory
, 1995
"... In this article we show how Optimality Theory yields a highly general Constraint Demotion principle for grammar learning. The resulting learning procedure specifically exploits the grammatical structure of Optimality Theory, independent of the content of substantive constraints defining any given gr ..."
Abstract

Cited by 528 (34 self)
 Add to MetaCart
efficient convergence to a correct grammar. We discuss implications for learning from overt data only, as well as other learning issues. We argue that Optimality Theory promotes confluence of the demands of more effective learnability and deeper linguistic explanation.
Active Learning with Statistical Models
, 1995
"... For manytypes of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992# Cohn, 1994]. We then showhow the same principles may be used to select data for two alternative, statist ..."
Abstract

Cited by 677 (12 self)
 Add to MetaCart
, statisticallybased learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.
On the impossibility of informationally efficient markets
 AMERICAN ECONOMIC REVIEW
, 1980
"... ..."
Learning probabilistic relational models
 In IJCAI
, 1999
"... A large portion of realworld data is stored in commercial relational database systems. In contrast, most statistical learning methods work only with "flat " data representations. Thus, to apply these methods, we are forced to convert our data into a flat form, thereby losing much ..."
Abstract

Cited by 619 (31 self)
 Add to MetaCart
of the dependency structure in a model. Moreover, we show how the learning procedure can exploit standard database retrieval techniques for efficient learning from large datasets. We present experimental results on both real and synthetic relational databases. 1
An Efficient Boosting Algorithm for Combining Preferences
, 1999
"... The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting ..."
Abstract

Cited by 707 (18 self)
 Add to MetaCart
The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new
A learning algorithm for Boltzmann machines
 Cognitive Science
, 1985
"... The computotionol power of massively parallel networks of simple processing elements resides in the communication bandwidth provided by the hardware connections between elements. These connections con allow a significant fraction of the knowledge of the system to be applied to an instance of a probl ..."
Abstract

Cited by 586 (13 self)
 Add to MetaCart
to a general learning rule for modifying the connection strengths so as to incorporate knowledge obout o task domain in on efficient way. We describe some simple examples in which the learning algorithm creates internal representations thot ore demonstrobly the most efficient way of using
Results 1  10
of
612,315