Results 1 - 10 of 1,740

Autonomous Learning of Reward Distribution in Not100 Game

by Tsutomu Masaki, Masanori Sugisaka
"... In this paper, autonomous learning of reward distribution in multi-agent reinforcement learning was applied to the 4 player game named “not100”. In this game, more shrewd tactics to cooperate with the other agents is required for each agent than the other tasks that the learning was applied previous ..."
Abstract

Finite-time analysis of the multiarmed bandit problem

by Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer - Machine Learning, 2002
"... Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy’s success in addressing ..."
Abstract - Cited by 817 (15 self)
, and for all reward distributions with bounded support. Keywords: bandit problems, adaptive allocation rules, finite horizon regret
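
For context, the UCB1 policy analyzed in this paper plays the arm with the largest mean-plus-confidence index and attains the logarithmic regret the snippet refers to. A minimal sketch, assuming rewards in [0, 1]; the pull callback and all names are illustrative.

    import math
    import random

    def ucb1(pull, n_arms, horizon):
        # Play each arm once, then repeatedly pick the arm maximizing
        # mean + sqrt(2 ln t / n_i): the UCB1 index.
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for arm in range(n_arms):
            sums[arm] += pull(arm)
            counts[arm] += 1
        for t in range(n_arms + 1, horizon + 1):
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
            sums[arm] += pull(arm)
            counts[arm] += 1
        return counts

    # Two Bernoulli arms with means 0.3 and 0.7; UCB1 concentrates on arm 1.
    print(ucb1(lambda i: float(random.random() < (0.3, 0.7)[i]), 2, 10000))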

Autonomous Learning of Reward Distribution for Each Agent in Multi-Agent Reinforcement Learning

by Katsunari Shibata, Koji Ito - Intelligent Autonomous Systems, 2000
"... Abstract. A novel approach for the reward distribution in multi-agent reinforcement learning is proposed. The agent who gets a reward gives a part of it to the other agents. If an agent gives a part of its own reward to the other ones, they may help the agent to get more reward. There may be some ca ..."
Abstract - Cited by 1 (0 self)
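
To make the idea concrete, here is a toy sketch of the reward-sharing scheme the abstract describes, assuming each agent gives away a fixed fraction of its reward, split evenly among the others; in the authors' setting that fraction would itself be learned, and all names here are illustrative.

    def distribute_rewards(raw_rewards, give_fraction):
        # Agent i keeps (1 - give_fraction[i]) of its own reward and
        # splits the rest evenly among the other agents.
        n = len(raw_rewards)
        received = [r * (1.0 - g) for r, g in zip(raw_rewards, give_fraction)]
        for i in range(n):
            gift = raw_rewards[i] * give_fraction[i] / (n - 1)
            for j in range(n):
                if j != i:
                    received[j] += gift
        return received

    # Agent 0 earns 1.0 and gives 30% away to agents 1 and 2.
    print(distribute_rewards([1.0, 0.0, 0.0], [0.3, 0.3, 0.3]))  # [0.7, 0.15, 0.15]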

Multi-Armed Bandit Problems with Heavy Tail Reward Distributions

by Keqin Liu, Qing Zhao
"... Abstract — In the Multi-Armed Bandit (MAB) problem, there are a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. The essence of the problem is the tradeoff between exploration and ex ..."
Abstract - Cited by 9 (7 self)
moment-generating functions of the arm reward distributions are properly bounded, the optimal logarithmic order of the regret can be achieved by DSEE. The condition on the reward distributions can be gradually relaxed at a cost of a higher (nevertheless sublinear) regret order: for any positive integer p, O(T^(1/p)) regret
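
DSEE (Deterministic Sequencing of Exploration and Exploitation) separates when to explore from what the samples say. A rough sketch, assuming a logarithmic exploration schedule appropriate for light-tailed rewards; the constant c and all names are illustrative, and heavier tails would call for a denser schedule.

    import math

    def dsee(pull, n_arms, horizon, c=3.0):
        # Time t is an exploration slot while the number of exploration
        # slots so far is below c * log(t); exploration cycles the arms
        # round-robin, exploitation plays the best empirical mean.
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        explored = 0
        for t in range(1, horizon + 1):
            if explored < c * math.log(t + 1):
                arm = explored % n_arms          # exploration epoch
                explored += 1
            else:                                # exploitation epoch
                arm = max(range(n_arms),         # unsampled arms first
                          key=lambda i: sums[i] / counts[i]
                          if counts[i] else float("inf"))
            sums[arm] += pull(arm)
            counts[arm] += 1
        return counts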

Policy gradient methods for reinforcement learning with function approximation

by Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour - In NIPS, 1999
"... Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Abstract - Cited by 439 (20 self)
). With function approximation, two ways of formulating the agent's objective are useful. One is the average reward formulation, in which policies are ranked according to their long-term expected reward per step, ρ(π) = lim_{n→∞} (1/n) E[r_1 + ⋯ + r_n | π], where d^π(s) = lim_{t→∞} Pr{s_t = s | s_0, π} is the stationary distribution of states
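
The paper's central result expresses the gradient of ρ in terms of d^π and ∇π, which is exactly what likelihood-ratio samplers estimate. A compact episodic REINFORCE-style sketch, assuming a tabular softmax policy; this illustrates the estimator family the result underwrites, not the paper's specific algorithm.

    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def reinforce_update(theta, episode, alpha=0.01, gamma=0.99):
        # theta[s] holds softmax action preferences for state s; episode
        # is a list of (state, action, reward) tuples from one rollout.
        G = 0.0
        returns = []
        for _, _, r in reversed(episode):   # discounted return from each step
            G = r + gamma * G
            returns.append(G)
        returns.reverse()
        for (s, a, _), G in zip(episode, returns):
            probs = softmax(theta[s])
            grad_log = -probs               # gradient of log pi(a|s) with
            grad_log[a] += 1.0              # respect to the preferences
            theta[s] += alpha * G * grad_log
        return theta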

Monte-Carlo Tree Search in Poker using Expected Reward Distributions

by Guy Van Den Broeck, Kurt Driessens, Jan Ramon
"... Abstract. We investigate the use of Monte-Carlo Tree Search (MCTS) within the field of computer Poker, more specifically No-Limit Texas Hold’em. The hidden information in Poker results in so called miximax game trees where opponent decision nodes have to be modeled as chance nodes. The probability d ..."
Abstract - Cited by 18 (1 self)
distribution in these nodes is modeled by an opponent model that predicts the actions of the opponents. We propose a modification of the standard MCTS selection and backpropagation strategies that explicitly model and exploit the uncertainty of sampled expected values. The new strategies are evaluated as a
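
One way to see what modeling the uncertainty of sampled expected values can buy: have each tree node keep a running mean and variance of backed-up returns, and let selection sample each child's value from a normal approximation (a Thompson-sampling-style rule). The sketch below is an illustration under those assumptions, not the paper's exact selection and backpropagation strategies.

    import math
    import random

    class Node:
        def __init__(self):
            self.n = 0        # samples backed up through this node
            self.mean = 0.0   # running mean of sampled returns
            self.m2 = 0.0     # running sum of squared deviations (Welford)
            self.children = {}

        def backpropagate(self, value):
            # Fold one sampled return into the node's statistics.
            self.n += 1
            d = value - self.mean
            self.mean += d / self.n
            self.m2 += d * (value - self.mean)

        def std_of_mean(self):
            # Standard error of the estimated expected value.
            if self.n < 2:
                return 1.0    # wide prior for barely-visited nodes
            return math.sqrt(self.m2 / (self.n - 1)) / math.sqrt(self.n)

    def select(node):
        # Sample a plausible value per child and descend into the best:
        # uncertain children win often enough to keep being explored.
        return max(node.children,
                   key=lambda a: random.gauss(node.children[a].mean,
                                              node.children[a].std_of_mean()))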

A framework for mesencephalic dopamine systems based on predictive Hebbian learning

by P. Read Montague, Peter Dayan, Terrence J. Sejnowski - J. Neurosci, 1996
"... We develop a theoretical framework that shows how mesencephalic dopamine systems could distribute to their targets a signal that represents information about future expectations. In particular, we show how activity in the cerebral cortex can make predictions about future receipt of reward and how fl ..."
Abstract - Cited by 385 (33 self)
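
The predictive framework in this line of work is essentially temporal-difference learning, with phasic dopamine activity read as the prediction error δ. A minimal TD(0) sketch; the learning rate and discount are arbitrary illustrative choices.

    def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
        # delta is the reward-prediction error the dopamine signal is
        # proposed to carry: received minus predicted future reward.
        delta = r + gamma * V[s_next] - V[s]
        V[s] += alpha * delta
        return delta

    # A cue (state 0) that reliably precedes reward (state 1) comes to
    # predict it, and delta at the cue shrinks toward zero over trials.
    V = [0.0, 0.0]
    for _ in range(100):
        td0_update(V, 0, 0.0, 1)   # cue, no reward yet
        td0_update(V, 1, 1.0, 0)   # reward delivered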

Performance-Based Reward Distribution Methods for Anonymous Decision-Making Groups

by B. Gavish, J. Kalvenes
"... Research has shown that both the support of anonymity and the use of appropriate incentives can lead to improved group performance. Anonymity enables a more open discussion resulting in a more critical analysis of a problem. Rewards can motivate individuals to cooperate, giving them the incentive to ..."
Abstract
and performance-based rewards. Mechanisms based on public key encryption technologies are presented which make it possible to distribute individual rewards to anonymous contributors, guarantee that only the contributor can claim a reward for her contribution, verify that a reward has been distributed, and be able
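
A bare-bones illustration of the kind of public-key mechanism described: a contributor attaches a public key to an anonymous contribution and later proves ownership of the reward claim by signing a challenge. It uses the third-party cryptography package; the challenge value and the flow are assumptions for illustration, not the paper's protocol.

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                      salt_length=padding.PSS.MAX_LENGTH)

    # Contributor: generate a keypair and submit the contribution
    # anonymously together with the public key.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    # To claim the reward, sign a fresh challenge from the distributor;
    # only the holder of the private key can produce this signature.
    challenge = b"reward-claim-nonce"  # hypothetical nonce
    signature = private_key.sign(challenge, pss, hashes.SHA256())

    # Distributor: verify the claim against the stored public key;
    # verify() raises InvalidSignature on a forged claim.
    public_key.verify(signature, challenge, pss, hashes.SHA256())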

Behavioral responses to inequity in reward distribution and working effort in crows and ravens

by Claudia A. F. Wascher, Thomas Bugnyar - PLoS ONE, 2013
"... Sensitivity to inequity is considered to be a crucial cognitive tool in the evolution of human cooperation. The ability has recently been shown also in primates and dogs, raising the question of an evolutionary basis of inequity aversion. We present first evidence that two bird species are sensitive ..."
Abstract - Cited by 1 (0 self)
are sensitive to other individuals’ efforts and payoffs. In a token exchange task we tested both behavioral responses to inequity in the quality of reward (preferred versus non-preferred food) and to the absence of reward in the presence of a rewarded partner, in 5 pairs of corvids (6 crows, 4 ravens). Birds

Monte-Carlo Tree Search in Poker using Expected Reward Distributions

by Guy Van den Broeck, Kurt Driessens, Jan Ramon
"... Poker playing computer bots can be divided into two categories. There are the game-theoretic bots, that play according to a strategy that gives rise to a Nash equilibrium. These bots are impossible to beat, but are also not able to exploit non-optimalities in their opponents. The other type of bot i ..."
Abstract