Results 1 - 10 of 10,504
The Exploration-Exploitation Dilemma for Adaptive Agents
- Proceedings of the Fifth European Workshop on Adaptive Agents and Multi-Agent Systems, 2005
Cited by 2 (0 self)
Learning agents have to deal with the exploration-exploitation dilemma.
The two facets of the exploration-exploitation dilemma
- In IAT’06, 2006
Cited by 3 (0 self)
This paper proposes an algorithm to better solve the exploration-exploitation dilemma faced by model-less reinforcement learning agents. The main contribution is twofold: (1) The two facets of the exploration-exploitation dilemma are distinguished: in some cases, the agent faces a non-stationary env ...
The exploration–exploitation dilemma: a multidisciplinary framework
- PLoS ONE 9, 2014
Cited by 2 (0 self)
The trade-off between the need to obtain new knowledge and the need to use that knowledge to improve performance is one of the most basic trade-offs in nature, and optimal performance usually requires some balance between exploratory and exploitative behaviors. Researchers in many disciplines have been searching for the optimal solution to this dilemma. Here we present a novel model in which the exploration strategy itself is dynamic and varies with time in order to optimize a definite goal, such as the acquisition of energy, money, or prestige. Our model produced four very distinct phases
Computational, Neuroscientific, and Lifespan Perspectives on the Exploration-Exploitation Dilemma
... neuroscience, information search
Consider the following real-life decisions that we make: deciding which route to take home to minimize time spent traveling, choosing amongst a set of known restaurants or a new restaurant when dining out, deciding between reading a new book by a consistently good author versus an author whose books vary widely in quality. All of these decisions involve balancing the conflicting demands of exploiting previous knowledge in order to maximize payoffs versus exploring less-known options in order to gain information about the currently optimal course of action. Indeed
Human Collective Intelligence under Dual Exploration-Exploitation Dilemmas
Cited by 1 (0 self)
The exploration-exploitation dilemma is a recurrent adaptive problem for humans as well as non-human animals. Given a fixed time/energy budget, every individual faces a fundamental trade-off between exploring for better resources and exploiting known resources to optimize overall performance under u ...
Learning and choosing in an uncertain world: An investigation of the explore-exploit dilemma in static and dynamic environments
Supplementary Material: Humans use directed and random exploration to solve the exploration-exploitation dilemma
Full instructions for the task
Before taking part in the experiment, participants read a set of illustrated onscreen instructions describing the task and its mechanics. Here we provide the full text for these instructions. Each bullet point corresponds to a single screen in the instructions. In the interests of space we have not included the illustrations themselves.
• Welcome! Thank you for volunteering for this experiment.
• In this experiment we would like you to choose between two one-armed bandits of the sort you might find in a casino.
• The one-armed bandits will be represented like this
• Every time you choose to play a particular bandit, the lever will be pulled like this...
• ... and the payoff will be shown like this. For example, in this case, the left bandit has been played and is paying out 77 points.
• Each bandit tends to pay out about the same amount of reward on average, but there is variability in the reward on any given play.
• For example, the average reward for the bandit on the right might be 50 points, but on the
Finite-time analysis of the multiarmed bandit problem
- Machine Learning, 2002
Cited by 817 (15 self)
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy’s success in addressing
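As an illustration of how a bandit policy can balance the two pressures named in this abstract, the sketch below implements an upper-confidence-bound index rule in the style of UCB1. The abstract is truncated here, so this should not be read as the paper's exact algorithm. The function names, the Bernoulli test bandit, the arm probabilities, and the horizon are all assumptions made for the example.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run a UCB1-style index policy; pull(arm) should return a reward in [0, 1]."""
    counts = [0] * n_arms    # number of times each arm has been played
    means = [0.0] * n_arms   # empirical mean reward of each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # play every arm once to initialise the estimates
        else:
            plays_so_far = t - 1
            # index = empirical mean (exploitation) + confidence bonus (exploration)
            arm = max(
                range(n_arms),
                key=lambda a: means[a]
                + math.sqrt(2.0 * math.log(plays_so_far) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


def make_bernoulli_bandit(probs, seed=0):
    """Hypothetical test environment: arm i pays 1 with probability probs[i]."""
    rng = random.Random(seed)
    return lambda arm: 1.0 if rng.random() < probs[arm] else 0.0


counts, means = ucb1(make_bernoulli_bandit([0.2, 0.8]), n_arms=2, horizon=2000)
print(counts, means)
```

The confidence bonus shrinks as an arm is played more often, so rarely tried arms keep getting sampled occasionally while the empirically best arm receives most of the plays.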
Reinforcement learning: a survey
- Journal of Artificial Intelligence Research, 1996
Cited by 1714 (25 self)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem ...
... of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state
TABU SEARCH
Cited by 822 (48 self)
Tabu Search is a metaheuristic that guides a local heuristic search procedure to explore the solution space beyond local optimality. One of the main components of tabu search is its use of adaptive memory, which creates a more flexible search behavior. Memory based strategies are therefore the hallm ...
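The idea of a memory-guided local search described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the formulation from the cited work: the bit-flip neighbourhood, the fixed tabu tenure, the simple aspiration rule, and the toy cost function with its weights and target are all hypothetical choices made for the example.

```python
from collections import deque


def tabu_search(cost, start, n_iters=50, tenure=3):
    """Minimise cost over bit tuples using single-bit flips and a short-term tabu list."""
    current = tuple(start)
    best, best_cost = current, cost(current)
    tabu = deque(maxlen=tenure)  # recently flipped positions: the short-term memory
    for _ in range(n_iters):
        candidates = []
        for i in range(len(current)):
            neighbour = current[:i] + (1 - current[i],) + current[i + 1:]
            c = cost(neighbour)
            # a tabu move is admitted only if it beats the best cost seen so far
            if i not in tabu or c < best_cost:
                candidates.append((c, i, neighbour))
        if not candidates:
            break
        # take the best admissible move even if it is worse than the current point
        c, i, current = min(candidates)
        tabu.append(i)
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost


# Toy problem (assumed for illustration): pick weights whose sum hits a target.
weights = [5, 7, 11, 13]
target = 18


def subset_cost(bits):
    return (sum(w * b for w, b in zip(weights, bits)) - target) ** 2


best, best_cost = tabu_search(subset_cost, start=(0, 0, 0, 0))
print(best, best_cost)
```

Because the search always moves to the best admissible neighbour, it can walk away from a local optimum, and the tabu list keeps it from immediately undoing that move.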