Results 1 - 10 of 1,740
Autonomous Learning of Reward Distribution in Not100 Game
"... In this paper, autonomous learning of reward distribution in multi-agent reinforcement learning was applied to the four-player game named “not100”. In this game, each agent requires shrewder tactics for cooperating with the other agents than in the other tasks to which the learning was previously applied ..."
Finite-time analysis of the multiarmed bandit problem
- Machine Learning, 2002
"... Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy’s success in addressing ..."
Cited by 817 (15 self)
... and for all reward distributions with bounded support. Keywords: bandit problems, adaptive allocation rules, finite horizon regret
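The UCB1 index policy analyzed in this paper can be sketched in a few lines; the Bernoulli arms, horizon, and seed below are illustrative choices, not from the paper:

```python
import math
import random

def ucb1(reward_fns, horizon, seed=0):
    """UCB1: play each arm once, then always pull the arm maximizing the
    empirical mean plus the exploration bonus sqrt(2 ln t / n_i).
    `reward_fns` is a list of callables returning rewards in [0, 1]."""
    random.seed(seed)
    k = len(reward_fns)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: one pull per arm
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        sums[arm] += reward_fns[arm]()
        counts[arm] += 1
    return counts

# Two Bernoulli arms with means 0.3 and 0.7; UCB1 concentrates on the better arm.
counts = ucb1([lambda: float(random.random() < 0.3),
               lambda: float(random.random() < 0.7)], horizon=2000)
```

Because the exploration bonus shrinks as an arm is sampled, the suboptimal arm's pull count grows only logarithmically in the horizon, which is the finite-time regret guarantee the entry refers to.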
Autonomous Learning of Reward Distribution for Each Agent in Multi-Agent Reinforcement Learning
- Intelligent Autonomous Systems, 2000
"... A novel approach for reward distribution in multi-agent reinforcement learning is proposed. The agent that gets a reward gives a part of it to the other agents. If an agent gives a part of its own reward to the others, they may help that agent get more reward. There may be some ca ..."
Cited by 1 (0 self)
Multi-Armed Bandit Problems with Heavy Tail Reward Distributions
"... Abstract — In the Multi-Armed Bandit (MAB) problem, there are a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. The essence of the problem is the tradeoff between exploration and ex ..."
Cited by 9 (7 self)
... -generating functions of the arm reward distributions are properly bounded, the optimal logarithmic order of the regret can be achieved by DSEE. The condition on the reward distributions can be gradually relaxed at the cost of a higher (nevertheless sublinear) regret order: for any positive integer p, O(T^(1/p)) regret
Policy gradient methods for reinforcement learning with function approximation.
- In NIPS, 1999
"... Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Cited by 439 (20 self)
... With function approximation, two ways of formulating the agent's objective are useful. One is the average-reward formulation, in which policies are ranked according to their long-term expected reward per step, ρ(π), where d^π(s) = lim_{t→∞} Pr{s_t = s | s_0, π} is the stationary distribution of states
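As a quick illustration of the average-reward formulation quoted above, d^π and ρ(π) can be computed for the Markov chain induced by a fixed policy; the two-state chain and rewards here are invented for the example:

```python
def stationary_distribution(P, iters=200):
    """Approximate d^pi(s) = lim_{t->inf} Pr{s_t = s | s_0, pi} by power
    iteration on the row-stochastic transition matrix P (list of lists)."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d

def average_reward(P, r):
    """rho(pi) = sum_s d^pi(s) r(s): the long-term expected reward per step
    under the chain P induced by a fixed policy pi."""
    d = stationary_distribution(P)
    return sum(ds * rs for ds, rs in zip(d, r))

# Two-state chain whose stationary distribution is (1/3, 2/3);
# with reward 1 in state 0 and 0 in state 1, rho(pi) = 1/3.
P = [[0.50, 0.50],
     [0.25, 0.75]]
rho = average_reward(P, [1.0, 0.0])
```

Ranking policies by ρ(π) in this way is what makes the average-reward objective independent of the start state once the chain has mixed.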
Monte-Carlo Tree Search in Poker using Expected Reward Distributions
"... Abstract. We investigate the use of Monte-Carlo Tree Search (MCTS) within the field of computer Poker, more specifically No-Limit Texas Hold’em. The hidden information in Poker results in so called miximax game trees where opponent decision nodes have to be modeled as chance nodes. The probability d ..."
Cited by 18 (1 self)
... distribution in these nodes is modeled by an opponent model that predicts the actions of the opponents. We propose a modification of the standard MCTS selection and backpropagation strategies that explicitly models and exploits the uncertainty of sampled expected values. The new strategies are evaluated as a ...
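For context, the standard UCT selection rule that such MCTS modifications build on can be sketched as follows (the node layout and exploration constant are generic illustrations, not the paper's distribution-aware variant):

```python
import math

class Node:
    """Minimal MCTS node: visit count, summed backed-up value, children by action."""
    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}   # action -> Node

def uct_select(node, c=1.4):
    """Plain UCT selection: pick the child maximizing mean backed-up value
    plus an exploration bonus; unvisited children are tried first.
    Distribution-aware variants replace the scalar mean with a score
    computed from the full distribution of sampled expected values."""
    def score(child):
        if child.visits == 0:
            return float("inf")
        mean = child.value_sum / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children, key=lambda a: score(node.children[a]))
```

In a miximax tree the opponent decision nodes would additionally weight child statistics by the opponent model's predicted action probabilities.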
A framework for mesencephalic dopamine systems based on predictive Hebbian learning
- J. Neurosci, 1996
"... We develop a theoretical framework that shows how mesencephalic dopamine systems could distribute to their targets a signal that represents information about future expectations. In particular, we show how activity in the cerebral cortex can make predictions about future receipt of reward and how fl ..."
Cited by 385 (33 self)
Performance-Based Reward Distribution Methods for Anonymous Decision-Making Groups
"... Research has shown that both the support of anonymity and the use of appropriate incentives can lead to improved group performance. Anonymity enables a more open discussion resulting in a more critical analysis of a problem. Rewards can motivate individuals to cooperate, giving them the incentive to ..."
... and performance-based rewards. Mechanisms based on public-key encryption technologies are presented which make it possible to distribute individual rewards to anonymous contributors, guarantee that only the contributor can claim a reward for her contribution, verify that a reward has been distributed, and be able ...
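The "only the contributor can claim" property in the snippet above can be illustrated with a much simpler hash-commitment stand-in (the paper's actual mechanisms rest on public-key encryption; every function and name below is hypothetical):

```python
import hashlib
import secrets

def submit_contribution(text):
    """The anonymous contributor attaches a hash commitment to a fresh
    secret token; the record can be published without revealing identity."""
    token = secrets.token_hex(16)
    commitment = hashlib.sha256(token.encode()).hexdigest()
    return token, {"text": text, "commitment": commitment}

def claim_reward(record, token):
    """A claim succeeds only if the revealed token matches the commitment,
    so only whoever submitted the contribution can collect its reward."""
    return hashlib.sha256(token.encode()).hexdigest() == record["commitment"]

token, record = submit_contribution("an anonymously submitted idea")
```

A real system would use signatures rather than revealing the secret, since revealing it consumes the commitment; this sketch only shows the binding between contribution and claimant.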
Behavioral responses to inequity in reward distribution and working effort in crows and ravens
- PloS One, 2013
"... Sensitivity to inequity is considered to be a crucial cognitive tool in the evolution of human cooperation. The ability has recently been shown also in primates and dogs, raising the question of an evolutionary basis of inequity aversion. We present first evidence that two bird species are sensitive ..."
Cited by 1 (0 self)
... are sensitive to other individuals' efforts and payoffs. In a token exchange task we tested both behavioral responses to inequity in the quality of reward (preferred versus non-preferred food) and to the absence of reward in the presence of a rewarded partner, in 5 pairs of corvids (6 crows, 4 ravens). Birds ...
Monte-Carlo Tree Search in Poker using Expected Reward Distributions ∗
"... Poker-playing computer bots can be divided into two categories. There are the game-theoretic bots that play according to a strategy that gives rise to a Nash equilibrium. These bots are impossible to beat, but are also not able to exploit non-optimalities in their opponents. The other type is the exploiting bot, which employs game tree search and opponent modeling techniques to discover and exploit weaknesses of ..."