Learning to predict by the methods of temporal differences
Machine Learning, 1988
Cited by 1246 (46 self)
This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
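The credit-assignment idea in this abstract can be illustrated with a minimal sketch, assuming a five-state random walk whose left terminal yields outcome 0 and whose right terminal yields outcome 1. Each prediction is moved toward the next state's prediction rather than toward the final outcome. The function name `td0_random_walk`, the step size `alpha`, and the episode count are illustrative assumptions, not taken from the article, and this is only the simplest one-step variant rather than the full family of methods the article analyzes.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.02, n_states=5, seed=0):
    """One-step temporal-difference prediction on a random walk (sketch).

    predictions[i] estimates the probability that a walk currently in
    state i ends at the right terminal (outcome 1) rather than the left
    terminal (outcome 0).
    """
    rng = random.Random(seed)
    predictions = [0.5] * n_states
    for _ in range(episodes):
        state = n_states // 2  # every episode starts in the middle state
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                target = 0.0  # walked off the left end: actual outcome
            elif next_state >= n_states:
                target = 1.0  # walked off the right end: actual outcome
            else:
                # Non-terminal step: the target is the *next prediction*,
                # not the eventual outcome of the episode.
                target = predictions[next_state]
            # Move the current prediction toward the target.
            predictions[state] += alpha * (target - predictions[state])
            if next_state < 0 or next_state >= n_states:
                break
            state = next_state
    return predictions


if __name__ == "__main__":
    print([round(p, 2) for p in td0_random_walk()])
```

Because each update uses only the current and next prediction, nothing needs to be stored until the episode ends, which is one way to read the abstract's claim about lower memory and peak computation.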
Algorithms for Sequential Decision Making
1996
Cited by 179 (8 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular,
On the complexity of solving Markov decision problems
In Proc. of the Eleventh International Conference on Uncertainty in Artificial Intelligence, 1995
Cited by 131 (11 self)
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms. We argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practical algorithms for solving large problems quickly. To encourage future research, we sketch some alternative methods of analysis that rely on the structure of MDPs.
Contingent Planning Under Uncertainty via Stochastic Satisfiability
Artificial Intelligence, 1999
Cited by 59 (8 self)
We describe two new probabilistic planning techniques, cmaxplan and zander, that generate contingent plans in probabilistic propositional domains. Both operate by transforming the planning problem into a stochastic satisfiability problem and solving that problem instead. cmaxplan encodes the problem as an E-Majsat instance, while zander encodes the problem as an S-Sat instance. Although S-Sat problems are in a higher complexity class than E-Majsat problems, the problem encodings produced by zander are substantially more compact and appear to be easier to solve than the corresponding E-Majsat encodings. Preliminary results for zander indicate that it is competitive with existing planners on a variety of problems.
Introduction
When planning under uncertainty, any information about the state of the world is precious. A contingent plan is one that can make action choices contingent on such information. In this paper, we present an implemented framework for contingent pl...
Error-resistant Implementation of DNA Computations
In Second Annual Meeting on DNA Based Computers
Cited by 30 (5 self)
This paper introduces a new model of computation that employs the tools of molecular biology whose in vitro implementation is far more error-resistant than extant proposals. We describe an abstraction of the model which lends itself to natural algorithmic description, particularly for problems in the complexity class NP. In addition we describe a number of linear-time algorithms within our model, particularly for NP-complete problems. We describe an in vitro realisation of the model and conclude with a discussion of future work.
1 Introduction
The idea that living cells and molecular complexes can be viewed as potential machinic components dates back to the late 1950s, when Richard Feynman delivered his famous paper [4] describing "submicroscopic" computers. More recently, several papers [1, 10, 16] (also see [7, 13]) have advocated the realisation of massively parallel computation using the techniques and chemistry of molecular biology. Adleman describes how a computational...
Optimal Processor Assignment for a Class of Pipelined Computations
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 17 (0 self)
The availability of large-scale multitasked parallel architectures introduces the following processor assignment problem: we are given a long sequence of data sets, each of which is to undergo processing by a collection of tasks whose intertask data dependencies form a series-parallel partial order. Each individual task is potentially parallelizable, with a known experimentally determined execution signature. Recognizing that data sets can be pipelined through the task structure, the problem is to find a "good" assignment of processors to tasks. Two objectives interest us: minimal response time per data set given a throughput requirement, and maximal throughput given a response time requirement. Our approach is to decompose a series-parallel task system into its essential "serial" and "parallel" components; our problem admits the independent solution and recomposition of each such component. We provide algorithms for the series analysis, and use an algorithm due to Krishnamurti and Ma...
Determining all optimal and near-optimal solutions when solving shortest path problems by dynamic programming
Operations Research, 1984
Cited by 16 (1 self)
This paper presents a new algorithm for finding all solutions with objective function values in the neighborhood of the optimum for certain dynamic programming models, including shortest path problems. The new algorithm combines the depth-first search with stacking techniques of theoretical computer science and principles from dynamic programming to modify the usual backtracking routine and list all near-optimal policies. The resulting procedure is the first practical algorithm for a variety of large problems that are of interest.
The algorithm presented in this paper was motivated by a study of the evolutionary distance problem in molecular biology. In this context, dynamic programming methods are used to investigate evolutionary relationships between two DNA sequences (Smith et al. [1981]). The specific sequences studied implied a network of approximately 2,200 nodes and 110,000 arcs so that analysis by Kth shortest path methods was not practical. Details of this study have been published elsewhere (Waterman [1983]). Consider a directed acyclic network or, more generally, a network with no cycle whose length is nonpositive. A simple method is presented for finding all paths from node 1 to node N whose lengths are within a prescribed distance e (e ≥ 0) of the length of the shortest path(s) from node 1 to node N. The algorithm uses a pushdown (last-in, first-out) stack and has modest memory requirements. This new method is easy to understand and to code, which, with the memory requirements, accords it a special advantage over Kth shortest path calculations. See Dreyfus [1969] for a review of shortest path algorithms. To describe the new method let t(x, y) denote the length of arc (x, y) in the network. With f(N) = 0, let f(x) denote the length of the shortest ...
Subject classification: 111 near-optimal policies.
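The excerpt above breaks off before the method is fully stated, so the following Python is only a sketch under stated assumptions, not the paper's exact procedure. It assumes that f(x) is the shortest length from node x to node N, that the network is a directed acyclic graph, and that a partial path is extended along arc (x, y) only when its length so far plus t(x, y) plus f(y) stays within f(1) + e. The function name `near_optimal_paths` and the adjacency-dict input format are hypothetical.

```python
from functools import lru_cache


def near_optimal_paths(arcs, source, target, e):
    """List all source-to-target paths within e of the shortest length.

    arcs maps each node to a list of (successor, arc_length) pairs.
    Assumes a directed acyclic network (an assumption of this sketch).
    Returns a sorted list of (length, path) pairs.
    """
    inf = float("inf")

    @lru_cache(maxsize=None)
    def f(x):
        # Assumed meaning of f: shortest length from x to target, f(target) = 0.
        if x == target:
            return 0.0
        return min((t + f(y) for y, t in arcs.get(x, [])), default=inf)

    bound = f(source) + e
    found = []
    # Pushdown (last-in, first-out) stack of partial paths.
    stack = [(source, 0.0, (source,))]
    while stack:
        x, dist, path = stack.pop()
        if x == target:
            found.append((dist, list(path)))
            continue
        for y, t in arcs.get(x, []):
            # Keep the arc only if the best possible completion through y
            # is still within the prescribed distance of the optimum.
            if dist + t + f(y) <= bound:
                stack.append((y, dist + t, path + (y,)))
    return sorted(found)


if __name__ == "__main__":
    network = {1: [(2, 1), (3, 2)], 2: [(4, 2), (3, 1)], 3: [(4, 2)]}
    for length, path in near_optimal_paths(network, 1, 4, 1):
        print(length, path)
```

With e = 0 the sketch returns only the shortest path(s); larger e admits progressively longer paths. The stack holds only partial paths that can still finish within the bound, which is in keeping with the excerpt's remark about modest memory requirements.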
Optimal Processor Assignment for Pipeline Computations
IEEE Trans. on Parallel and Distributed Systems, 1994
Cited by 16 (7 self)
The availability of large-scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual response times for different processor sizes, find an assignment of processors to tasks. Two objectives interest us: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, we assume that a large number of processors are to be assigned to a relatively small number of tasks. In this paper we develop efficient assignment algorithms for different classes of task structures. For a p processor system and a series-parallel precedence graph with n constituent tasks, we provide an O(np^2) algorithm that finds the optimal assignment for the response time optimization problem; we find the assignment optimizing the constrained throughput in O(np^2 log p) time. Special cases of linear, independent, and tree graphs are also considered. In addition, we examine more efficient algorithms when certain restrictions are placed on the problem parameters. Our techniques are applied to a task system in computer vision.
Complexity results for Infinite-Horizon Markov Decision Processes
2000
Cited by 15 (3 self)
Markov decision processes (MDPs) are models of dynamic decision making under uncertainty. These models arise in diverse applications and have been developed extensively in fields such as operations research, control engineering, and the decision sciences in general. Recent research, especially in artificial intelligence, has highlighted the significance of studying the computational properties of MDP problems. We address
A Generic Program for Sequential Decision Processes
Programming Languages: Implementations, Logics, and Programs, 1995
Cited by 13 (2 self)
This paper is an attempt to persuade you of my viewpoint by presenting a novel generic program for a certain class of optimisation problems, named sequential decision processes. This class was originally identified by Richard Bellman in his pioneering work on dynamic programming [4]. It is a perfect example of a class of problems which are very much alike, but which has until now escaped solution by a single program. Those readers who have followed some of the work that Richard Bird and I have been doing over the last five years [6, 7] will recognise many individual examples: all of these have now been unified. The point of this observation is that even when you are on the lookout for generic programs, it can take a rather long time to discover them. The presentation below will follow that earlier work, by referring to the calculus of relations and the relational theory of data types. I shall however attempt to be light on the formalism, as I do not regard it as essential to the main thesis of this paper. Undoubtedly there are other (perhaps more convenient) notations in which the same ideas could be developed. This paper does assume some degree of familiarity with a lazy functional programming language such as Haskell, Hope, Miranda