Algorithms for Sequential Decision Making
, 1996
Cited by 177 (8 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular,
Automatic basis function construction for approximate dynamic programming and reinforcement learning
 In Cohen and Moore (2006
, 2006
Cited by 60 (2 self)
We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and Castañon (1989), who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, to map a high-dimensional state space to a low-dimensional space, based on the Bellman error or on the temporal difference (TD) error. We then place basis functions in the lower-dimensional space. These are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem.
Explanation-based learning and reinforcement learning: A unified view
 In Proceedings Twelfth International Conference on Machine Learning
, 1995
Cited by 47 (2 self)
In speedup-learning problems, where full descriptions of operators are known, both explanation-based learning (EBL) and reinforcement learning (RL) methods can be applied. This paper shows that both methods involve fundamentally the same process of propagating information backward from the goal toward the starting state. Most RL methods perform this propagation on a state-by-state basis, while EBL methods compute the weakest preconditions of operators and hence perform this propagation on a region-by-region basis. Barto, Bradtke, and Singh (1995) have observed that many algorithms for reinforcement learning can be viewed as asynchronous dynamic programming. Based on this observation, this paper shows how to develop dynamic programming versions of EBL, which we call region-based dynamic programming or Explanation-Based Reinforcement Learning (EBRL). The paper compares batch and online versions of EBRL to batch and online versions of point-based dynamic programming and to standard EBL. The results show that region-based dynamic programming combines the strengths of EBL (fast learning and the ability to scale to large state spaces) with the strengths of reinforcement learning algorithms (learning of optimal policies). Results are shown in chess endgames and in synthetic maze tasks.
Towards a Unified Theory of State Abstraction for MDPs
 In Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics
, 2006
Cited by 39 (5 self)
extensively studied in the fields of artificial intelligence and operations research. Instead of working in the ground state space, the decision maker usually finds solutions in the abstract state space much faster by treating groups of states as a unit by ignoring irrelevant state information. A number of abstractions have been proposed and studied in the reinforcement-learning and planning literatures, and positive and negative results are known. We provide a unified treatment of state abstraction for Markov decision processes. We study five particular abstraction schemes, some of which have been proposed in the past in different forms, and analyze their usability for planning and learning.
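To make the idea of solving in an abstract state space concrete, here is a minimal sketch. It is not taken from the paper and does not implement any of its five schemes specifically; it assumes the simplest case, where ground states in the same group share rewards and group-level transition probabilities, and it averages group members uniformly. The names `value_iteration`, `aggregate`, `phi`, and the toy MDP are hypothetical.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """P[s][a] is a dict {next_state: prob}; R[s][a] is the immediate reward."""
    v = [0.0] * len(P)
    while True:
        new_v = [
            max(R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a].items())
                for a in range(len(P[s])))
            for s in range(len(P))
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def aggregate(P, R, phi, n_abstract):
    """Build an abstract MDP by uniformly averaging over each group (assumption)."""
    n_actions = len(P[0])
    P_abs, R_abs = [], []
    for k in range(n_abstract):
        group = [s for s in range(len(P)) if phi[s] == k]
        w = 1.0 / len(group)
        p_rows, r_row = [], []
        for a in range(n_actions):
            dist = {}
            for s in group:
                for t, p in P[s][a].items():
                    dist[phi[t]] = dist.get(phi[t], 0.0) + w * p
            p_rows.append(dist)
            r_row.append(sum(w * R[s][a] for s in group))
        P_abs.append(p_rows)
        R_abs.append(r_row)
    return P_abs, R_abs


# Toy 4-state, 2-action MDP (hypothetical); states 0 and 1 behave identically.
P = [
    [{1: 1.0}, {2: 1.0}],
    [{0: 1.0}, {2: 1.0}],
    [{3: 1.0}, {2: 1.0}],
    [{3: 1.0}, {0: 0.5, 1: 0.5}],
]
R = [[0.0, 1.0], [0.0, 1.0], [0.0, 0.5], [2.0, 0.0]]
phi = [0, 0, 1, 2]  # ground state -> abstract state

v_ground = value_iteration(P, R)
P_abs, R_abs = aggregate(P, R, phi, n_abstract=3)
v_abstract = value_iteration(P_abs, R_abs)
```

In this toy case the abstract MDP has three states instead of four, and the abstract values agree with the ground values of the states they represent. With a coarser grouping that merges states with different dynamics, that agreement is generally lost.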
A unified analysis of value-function-based reinforcement-learning algorithms. Neural Computation
, 1997
Cited by 32 (7 self)
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proven by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multi-state updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
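For readers unfamiliar with the asynchronous updates this abstract refers to, the following is a minimal sketch of standard tabular Q-learning, which updates one state-action estimate at a time. It is not the paper's analysis or code. The chain environment `chain_step`, the start state, and all parameter values are illustrative assumptions.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning; `step(s, a)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0  # every episode starts in state 0 (assumption)
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore
            else:
                best = max(q[s])  # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s2, r, done = step(s, a)
            # Asynchronous update: only the visited (s, a) entry changes.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def chain_step(s, a):
    """Hypothetical 4-state chain: action 1 moves right, action 0 moves left.
    Reaching state 3 pays reward 1 and ends the episode."""
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    done = s2 == 3
    return s2, (1.0 if done else 0.0), done


q = q_learning(n_states=4, n_actions=2, step=chain_step)
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(3)]
```

On this chain the learned greedy policy moves right in every non-terminal state, which is the optimal behavior for the assumed rewards.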
A unifying framework for computational reinforcement learning theory
, 2009
Cited by 18 (6 self)
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms, such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize long-term utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploration, which may reduce short-term utility. The need for active exploration is common in many problems in daily life, engineering, and the sciences. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts, in the hope of discovering a better game-playing strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach near-optimal strategies
The Dynamic Assignment Problem
, 2004
Cited by 17 (5 self)
There has been considerable recent interest in the dynamic vehicle routing problem, but the complexities of this problem class have generally restricted research to myopic models. In this paper, we address the simpler dynamic assignment problem, where a resource (container, vehicle, or driver) can serve only one task at a time. We propose a very general class of dynamic assignment models, and propose an adaptive, nonmyopic algorithm that involves iteratively solving sequences of assignment problems no larger than what would be required of a myopic model. We consider problems where the attribute space of future resources and tasks is small enough to be enumerated, and propose a hierarchical aggregation strategy for problems where the attribute spaces are too large to be enumerated. Finally, we use the formulation to also test the value of advance information, which offers a more realistic estimate than studies that use purely myopic models.
Spatial Aggregation: Modeling and controlling physical fields
 In Proceedings of Qualitative Reasoning Workshop
, 1997
Cited by 9 (5 self)
Many important physical phenomena, such as temperature distribution, air flow, and acoustic waves, are described as continuous, distributed parameter fields. Controlling and optimizing these physical processes and systems are common design tasks in many scientific and engineering domains. However, the challenges are multifold: distributed fields are conceptually harder to reason about than lumped parameter models; computational methods are prohibitively expensive for complex spatial domains; the underlying physics imposes severe constraints on observability and controllability. This paper develops an ontological abstraction and an aggregation-disaggregation mechanism, in a framework collectively known as spatial aggregation (SA), for reasoning about and synthesizing distributed control schemes for physical fields. The ontological abstraction models physical fields as networks of spatial objects. The aggregation-disaggregation mechanism employs a set of data types and generic...
General Dynamic Programming Algorithms Applied To Polling Systems
 COMMUNICATIONS IN STATISTICS: STOCHASTIC MODELS
, 1998
Cited by 4 (3 self)
We formulate the problem of scheduling a single server in a multiclass queueing system as a Markov decision process under the discounted cost and the average cost criteria. We develop a new implementation of the modified policy iteration (MPI) dynamic programming algorithm to efficiently solve problems with large state spaces and small action spaces. This implementation has an enhanced policy evaluation (PE) step and an adaptive termination test. To numerically evaluate various solution approaches, we implemented value iteration and forms of modified policy iteration, and we further developed and implemented aggregation-disaggregation-based (replacement process decomposition and group-scaling) algorithms appropriate to controlled queueing system models. Tests provide evidence that MPI outperforms the other algorithms for both the discounted cost and the average cost optimal polling problems. In light of the complexity of implementation for the aggregation-disaggregation-based algorithm...
Hierarchical Knowledge Gradient for Sequential Sampling
Cited by 3 (3 self)
We propose a sequential sampling policy for noisy discrete global optimization and ranking and selection, in which we aim to efficiently explore a finite set of alternatives before selecting an alternative as best when exploration stops. Each alternative may be characterized by a multidimensional vector of categorical and numerical attributes and has independent normal rewards. We use a Bayesian probability model for the unknown reward of each alternative and follow a fully sequential sampling policy called the knowledge-gradient policy. This policy myopically optimizes the expected increment in the value of sampling information in each time period. We propose a hierarchical aggregation technique that uses the common features shared by alternatives to learn about many alternatives from even a single measurement. This approach greatly reduces the measurement effort required, but it requires some prior knowledge on the smoothness of the function in the form of an aggregation function, and computational issues limit the number of alternatives that can be easily considered to the thousands. We prove that our policy is consistent, finding a globally optimal alternative when given enough measurements, and show through simulations that it performs competitively with or significantly better than other policies.
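As a point of reference for the knowledge-gradient policy mentioned above, here is a minimal sketch of the commonly used knowledge-gradient factor for independent normal beliefs with known measurement noise. It deliberately omits the hierarchical aggregation that is this paper's contribution. The function name `kg_factors` and the example numbers are assumptions for illustration.

```python
import math


def kg_factors(mu, var, noise_var):
    """Knowledge-gradient factor per alternative, independent normal beliefs.

    mu[x], var[x]: posterior mean and variance of alternative x (var[x] > 0).
    noise_var: known variance of a single measurement (assumption).
    Requires at least two alternatives.
    """
    def pdf(z):
        return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    factors = []
    for x in range(len(mu)):
        # Standard deviation of the change in the posterior mean of x
        # caused by one more measurement of x.
        sigma_tilde = var[x] / math.sqrt(var[x] + noise_var)
        best_other = max(m for i, m in enumerate(mu) if i != x)
        z = -abs(mu[x] - best_other) / sigma_tilde
        factors.append(sigma_tilde * (z * cdf(z) + pdf(z)))
    return factors


# Hypothetical beliefs: equal means, alternative 1 is the most uncertain.
factors = kg_factors(mu=[0.0, 0.0, 0.0], var=[1.0, 4.0, 1.0], noise_var=1.0)
next_to_measure = max(range(len(factors)), key=lambda x: factors[x])
```

The policy measures the alternative with the largest factor. With equal means, that is the alternative whose belief is most uncertain, since a measurement there is most likely to change which alternative looks best.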