Results 1-10 of 65
Relational Reinforcement Learning
, 2001
Cited by 138 (9 self)
Relational reinforcement learning is presented, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Due to the use of a more expressive representation language to represent states, actions and Q-functions, relational reinforcement learning can be potentially applied to a new range of learning tasks. One such task that we investigate is planning in the blocks world, where it is assumed that the effects of the actions are unknown to the agent and the agent has to learn a policy. Within this simple domain we show that relational reinforcement learning solves some existing problems with reinforcement learning. In particular, relational reinforcement learning allows us to employ structural representations, to abstract from specific goals pursued and to exploit the results of previous learning phases when addressing new (more complex) situations.
Coastal Navigation with Mobile Robots
, 2000
Cited by 92 (19 self)
The problem that we address in this paper is how a mobile robot can plan in order to arrive at its goal with minimum uncertainty. Traditional motion planning algorithms often assume that a mobile robot can track its position reliably; however, in real-world situations, reliable localization may not always be feasible. Partially Observable Markov Decision Processes (POMDPs) provide one way to maximize the certainty of reaching the goal state, but at the cost of computational intractability for large state spaces. The method we propose explicitly models the uncertainty of the robot’s position as a state variable, and generates trajectories through the augmented pose-uncertainty space. By minimizing the positional uncertainty at the goal, the robot reduces the likelihood that it becomes lost. We demonstrate experimentally that coastal navigation reduces the uncertainty at the goal, especially with degraded localization.
Finding Approximate POMDP Solutions Through Belief Compression
, 2003
Cited by 85 (3 self)
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional manifold embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this manifold can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta, & Schapire, 2002) to represent sparse, high-dimensional belief spaces using low-dimensional sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
Autonomous shaping: knowledge transfer in reinforcement learning
 In Int. Conference on Machine Learning
, 2006
Cited by 66 (5 self)
Agent-centered search
 Artificial Intelligence Magazine
Cited by 54 (4 self)
In this article, we describe agent-centered search (sometimes also called real-time search or local search) and illustrate this planning paradigm with examples. Agent-centered search methods interleave planning and plan execution and restrict planning to the part of the domain around the current state of the agent, for example, the current location of a mobile robot or the current board position of a game. They can execute actions in the presence of time constraints and often have a small sum of planning and execution cost, both because they trade off planning and execution cost and because they allow agents to gather information early in nondeterministic domains, which reduces the amount of planning they have to perform for unencountered situations. These advantages become important as more intelligent systems are interfaced
Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty
, 1998
Cited by 52 (1 self)
This paper presents an action selection technique for reinforcement learning in stationary Markovian environments. This technique may be used in direct algorithms such as Q-learning, or in indirect algorithms such as adaptive dynamic programming. It is based on two principles. The first is to define a local measure of the uncertainty using the theory of bandit problems. We show that such a measure suffers from several drawbacks. In particular, a direct application of it leads to algorithms of low quality that can be easily misled by particular configurations of the environment. The second basic principle was introduced to eliminate this drawback. It consists of assimilating the local measures of uncertainty to rewards, and back-propagating them with the dynamic programming or temporal difference mechanisms. This allows reproducing global-scale reasoning about the uncertainty, using only local measures of it. Numerical simulations clearly show the efficiency of these propositions. Keywords: ...
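The second principle in the abstract above, treating a local uncertainty measure as a reward and letting the usual temporal-difference backup propagate it, can be sketched roughly as follows. This is only an illustration under stated assumptions: the count-based bonus `beta / sqrt(visits)` is a simple stand-in rather than the bandit-derived measure the paper defines, and the names `make_tables` and `q_update_with_bonus` and all parameter values are hypothetical.

```python
import math


def make_tables(n_states, n_actions):
    """Return zero-initialised Q-values and visit counts (hypothetical helper)."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    counts = [[0] * n_actions for _ in range(n_states)]
    return q, counts


def q_update_with_bonus(q, counts, s, a, reward, s_next,
                        alpha=0.5, gamma=0.9, beta=1.0):
    """One Q-learning step with a local uncertainty bonus added to the reward.

    ASSUMPTION: beta / sqrt(visits) is a count-based stand-in for the
    local uncertainty measure; it is not the measure used in the paper.
    """
    counts[s][a] += 1
    bonus = beta / math.sqrt(counts[s][a])
    # Bootstrapping on max over Q(s_next, .) carries earlier bonuses backwards.
    target = reward + bonus + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


# Two states, two actions, and no true reward anywhere.
q, counts = make_tables(n_states=2, n_actions=2)
first = q_update_with_bonus(q, counts, s=0, a=0, reward=0.0, s_next=1)
second = q_update_with_bonus(q, counts, s=1, a=0, reward=0.0, s_next=0)
```

In this toy run the second update ends up larger than the first even though both pairs have been visited once: its target includes the discounted value of state 0, which at that point consists only of the uncertainty bonus. That is the sense in which a purely local measure can inform choices made elsewhere in the state space.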
Reinforcement Learning in Finite MDPs: PAC Analysis
Cited by 45 (5 self)
We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These “PAC-MDP” algorithms include the well-known E³ and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state-of-the-art by presenting bounds for the problem in a unified theoretical framework. We also present a more refined analysis that yields insight into the differences between the model-free Delayed Q-learning and the model-based R-MAX. Finally, we conclude with open problems.
Auto-exploratory Average Reward Reinforcement Learning
 Artificial Intelligence
, 1996
Cited by 36 (10 self)
We introduce a model-based average reward Reinforcement Learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning, which automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-learning" performs better than the original H-learning under previously studied exploration methods such as random, recency-based, or counter-based exploration. Introduction: Reinforcement Learning (RL) is the study of learning agents that improve their performance at some task by receiving rewards and punishments from the environment. Most approaches to reinforcement learning, including Q-learning (Watkins and Dayan 92) and Adaptive Real-Time Dynamic Programming (ARTDP) (Barto, Bradtke, & Singh 95), optimize the total discounted reward the ...
Reinforcement Learning by Policy Search
, 2000
Cited by 31 (2 self)
One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations could be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. Reinforcement learning means learning a policy, a mapping of observations into actions, based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies being searched is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate various architectures for controllers with memory, including controllers with external memory, finite state controllers and distributed controllers for multi-agent systems. For these various controllers we work out the details of the algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment design theory, a policy evaluation algorithm is developed for the case of experience reuse. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.
Potential-based shaping and Q-value initialization are equivalent
 Journal of Artificial Intelligence Research