Symbolic Dynamic Programming for First-Order MDPs
In IJCAI, 2001
Cited by 134 (4 self)
Abstract:
We present a dynamic programming approach for the solution of first-order Markov decision processes. This technique uses an MDP whose dynamics is represented in a variant of the situation calculus allowing for stochastic actions. It produces a logical description of the optimal value function and policy by constructing a set of first-order formulae that minimally partition state space according to distinctions made by the value function and policy. This is achieved through the use of an operation known as decision-theoretic regression. In effect, our algorithm performs value iteration without explicit enumeration of either the state or action spaces of the MDP. This allows problems involving relational fluents and quantification to be solved without requiring explicit state space enumeration or conversion to propositional form.
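The paper's point is to perform value iteration symbolically, without enumerating states. For contrast, here is a minimal enumerated value-iteration sketch on an invented two-state, two-action MDP (all states, actions, and numbers are illustrative, not from the paper):

```python
# Minimal value iteration on an invented two-state, two-action MDP.
# The cited paper does this symbolically, without enumerating states;
# this enumerated version is shown only to fix the underlying recurrence.

GAMMA = 0.9

# P[s][a] = list of (probability, next_state, reward) outcomes
P = {
    0: {'a': [(1.0, 1, 5.0)], 'b': [(1.0, 0, 1.0)]},
    1: {'a': [(1.0, 0, 0.0)], 'b': [(1.0, 1, 2.0)]},
}

def q_value(V, s, a, gamma):
    # expected one-step return of action a in state s
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(P, gamma=GAMMA, eps=1e-6):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a, gamma)) for s in P}
    return V, policy

V, pi = value_iteration(P)
```

On this toy MDP the greedy policy takes action 'a' in both states; the symbolic algorithm of the paper would instead produce a first-order partition of state space with one value per partition block.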
Programmable reinforcement learning agents
2001
Cited by 102 (1 self)
Abstract:
We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process. The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning that which isn't specified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills.
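The unspecified-choice idea can be illustrated with a toy sketch: a fixed agent program with a single learned choice point, trained here by an epsilon-greedy incremental-mean update. The environment, rewards, and names are all invented; this is a stand-in for, not a reproduction of, the paper's provably convergent algorithms:

```python
import random

# Toy stand-in for a partial agent program with one unspecified choice
# point: everything is fixed except which branch 'choose' takes, and
# that branch is learned from reward feedback.

random.seed(0)

Q = {'left': 0.0, 'right': 0.0}   # value estimate per choice
N = {'left': 0, 'right': 0}       # visit counts

def env_reward(action):
    # invented stochastic reward: 'right' is better on average
    return random.gauss(1.0 if action == 'right' else 0.2, 0.1)

def choose(epsilon=0.1):
    if random.random() < epsilon:       # explore
        return random.choice(list(Q))
    return max(Q, key=Q.get)            # exploit current estimate

for _ in range(2000):
    a = choose()
    r = env_reward(a)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]   # incremental mean update

best = max(Q, key=Q.get)
```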
State Abstraction for Programmable Reinforcement Learning Agents
In Proceedings of the Eighteenth National Conference on Artificial Intelligence, 2002
Cited by 83 (3 self)
Abstract:
Safe state abstraction in reinforcement learning allows an agent to ignore aspects of its current state that are irrelevant to its current decision, and therefore speeds up dynamic programming and learning. This paper explores safe state abstraction in hierarchical reinforcement learning, where learned behaviors must conform to a given partial, hierarchical program. Unlike previous approaches to this problem, our methods yield significant state abstraction while maintaining hierarchical optimality, i.e., optimality among all policies consistent with the partial program. We show how to achieve this for a partial programming language that is essentially Lisp augmented with nondeterministic constructs. We demonstrate our methods on two variants of Dietterich's taxi domain, showing how state abstraction and hierarchical optimality result in faster learning of better policies and enable the transfer of learned skills from one problem to another.
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
2005
Cited by 63 (6 self)
Abstract:
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand,
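The core operation behind any POMDP solver is the Bayes-filter belief update, b'(s') proportional to O(o | s') * sum_s T(s' | s, a) * b(s). A minimal sketch on an invented two-state model (all probabilities illustrative, not from the thesis):

```python
# Bayes-filter belief update for an invented two-state POMDP:
#   b'(s') ∝ O(o | s') * sum_s T(s' | s, a) * b(s)

STATES = (0, 1)

# T[a][s][s2]: probability of moving from s to s2 under action a
T = {'stay': [[0.9, 0.1],
              [0.1, 0.9]]}

# O[s2][o]: probability of observing o in state s2
O = [[0.8, 0.2],
     [0.3, 0.7]]

def belief_update(b, a, o):
    unnorm = [O[s2][o] * sum(T[a][s][s2] * b[s] for s in STATES)
              for s2 in STATES]
    z = sum(unnorm)               # probability of the observation
    return [x / z for x in unnorm]

b1 = belief_update([0.5, 0.5], 'stay', o=1)
```

The "large state spaces" problem mentioned above shows up here directly: the update is quadratic in the number of states, which is why structured representations matter.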
IBAL: A Probabilistic Rational Programming Language
In Proc. 17th IJCAI, 2001
Cited by 58 (0 self)
Abstract:
In a rational programming language, a program specifies a situation faced by an agent; evaluating the program amounts to computing what a rational agent would believe or do in the situation. This paper presents IBAL, a rational programming language for probabilistic and decision-theoretic agents. IBAL provides a rich declarative language for describing probabilistic models. The expression language allows the description of arbitrarily complex generative models. In addition, IBAL's observation language makes it possible to express and compose rejective models that result from conditioning on the observations. IBAL also integrates Bayesian parameter estimation and decision-theoretic utility maximization thoroughly into the framework. All these are packaged together into a programming language that has a rich type system and built-in extensibility. This paper presents a detailed account of the syntax and semantics of IBAL, as well as an overview of the implementation.
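The abstract's "models that result from conditioning on the observations" can be illustrated, in plain Python rather than IBAL syntax, by the simplest conditioning mechanism: rejection sampling against a generative model. The sprinkler model and its probabilities are invented for illustration:

```python
import random

# Conditioning a generative model on an observation by rejection
# sampling: draw from the prior, keep only samples consistent with
# the observed evidence (wet == True).

random.seed(1)

def model():
    rain = random.random() < 0.3
    sprinkler = random.random() < 0.5
    wet = rain or sprinkler
    return rain, wet

def posterior_rain_given_wet(n=100_000):
    accepted = rainy = 0
    while accepted < n:
        rain, wet = model()
        if wet:                 # reject samples inconsistent with the observation
            accepted += 1
            rainy += rain
    return rainy / accepted

p = posterior_rain_given_wet()   # exact posterior is 0.3 / 0.65 ≈ 0.4615
```

A real probabilistic language would also offer far more efficient inference than rejection; this only shows the semantics of conditioning.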
Hybrid BDI-POMDP Framework for Multiagent Teaming
In JAIR
Cited by 26 (8 self)
Abstract:
Many current large-scale multiagent team implementations can be characterized as following the "belief-desire-intention" (BDI) paradigm, with explicit representation of team plans. Despite their promise, current BDI team approaches lack tools for quantitative performance analysis under uncertainty. Distributed partially observable Markov decision problems (POMDPs) are well suited for such analysis, but finding optimal policies in such models is highly intractable. The key contribution of this article is a hybrid BDI-POMDP approach, where BDI team plans are exploited to improve POMDP tractability and POMDP analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams: which agents to allocate to the different roles in the team. The article provides three key contributions. First, we describe a role allocation technique that takes into account future uncertainties in the domain; prior work in multiagent role allocation has failed to address such uncertainties. To that end, we introduce RMTDP (Role-based Markov Team Decision Problem), a new distributed POMDP model for analysis of role allocations.
Observations and the Probabilistic Situation Calculus
2002
Cited by 25 (7 self)
Abstract:
In this article we propose a Probabilistic Situation Calculus logical language to represent and reason with knowledge about dynamical worlds in which actions have uncertain effects. Two essential tasks are addressed when reasoning about change in such worlds: probabilistic temporal projection and probabilistic belief update. Uncertain effects are modeled by dividing an action into two subparts: a deterministic input (agent produced) and a probabilistic reaction (nature produced). The probability distributions of the reactions are assumed to be known. Our logical language is an extension of the Situation Calculus in the style proposed by Raymond Reiter. There are three aspects to this work. First, we extend the language to accommodate terms dealing with belief and probability. Second, we provide an operational semantics based on Randomly Timed Automata. Third, we develop Monte Carlo algorithms to efficiently interpret the probability and belief terms. Within the proposed framework, we discuss how to develop a reasoning system in Mathematica capable of performing temporal projection and belief update in the Probabilistic Situation Calculus. Finally, we present a sound basis for setting rewards and observation planning.
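Monte Carlo temporal projection can be sketched by sampling nature's probabilistic reaction to each deterministic input action and counting the runs in which a goal fluent holds afterwards. The actions, fluents, and probabilities below are invented for illustration and are not the article's formalism:

```python
import random

# Monte Carlo temporal projection sketch: each deterministic input
# action triggers a probabilistic "reaction" by nature; repeated
# simulation estimates the probability that a goal fluent holds
# after executing a plan.

random.seed(2)

P_GRASP_OK = 0.8      # nature's reaction: a grasp succeeds with prob 0.8

def simulate(plan):
    holding = False
    for action in plan:
        if action == 'grasp':
            holding = random.random() < P_GRASP_OK
        elif action == 'release':
            holding = False
    return holding            # goal fluent: holding the object

def project(plan, trials=50_000):
    return sum(simulate(plan) for _ in range(trials)) / trials

p = project(['grasp'])
```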
Online Decision-Theoretic Golog for Unpredictable Domains
Cited by 21 (9 self)
Abstract:
DTGolog was proposed by Boutilier et al. as an integration of decision-theoretic (DT) planning and the programming language Golog. Advantages include the ability to handle large state spaces and to limit the search space during planning with explicit programming. Soutchanski developed a version of DTGolog, where a program is executed online and DT planning can be applied to parts of a program only. One of the limitations is that DT planning generally cannot be applied to programs containing sensing actions. In order to deal with robotic scenarios in unpredictable domains, where certain kinds of sensing like measuring one's own position are ubiquitous, we propose a strategy where sensing during deliberation is replaced by suitable models like computed trajectories so that DT planning remains applicable. In the paper we discuss the necessary changes to DTGolog entailed by this strategy and an application of our approach in the RoboCup domain.
Turning High-Level Plans into Robot Programs in Uncertain Domains
In ECAI'2000, 2000
Cited by 20 (10 self)
Abstract:
The actions of a robot, like lifting an object, are often best thought of as low-level processes with uncertain outcome. A high-level robot plan can be seen as a description of a task which combines these processes in an appropriate way and which may involve nondeterminism in order to increase a plan's generality. In a given situation, a robot needs to turn a given plan into an executable program for which it can establish, through some form of projection, that it satisfies a given goal with some probability. In this paper we show how this can be achieved in a logical framework. In particular, low-level processes are modelled as programs in pGOLOG, a probabilistic variant of the action language GOLOG. High-level plans are like ordinary GOLOG programs except that during projection the names of low-level processes are replaced by their pGOLOG definitions.
Practical Solution Techniques for First-Order MDPs
In Artificial Intelligence
Cited by 18 (1 self)
Abstract:
Many traditional solution approaches to relationally specified decision-theoretic planning problems (e.g., those stated in the probabilistic planning domain description language, or PPDDL) ground the specification with respect to a specific instantiation of domain objects and apply a solution approach directly to the resulting ground Markov decision process (MDP). Unfortunately, the space and time complexity of these grounded solution approaches are polynomial in the number of domain objects and exponential in the predicate arity and the number of nested quantifiers in the relational problem specification. An alternative to grounding a relational planning problem is to tackle the problem directly at the relational level. In this article, we propose one such approach that translates an expressive subset of the PPDDL representation to a first-order MDP (FOMDP) specification and then derives a domain-independent policy without grounding at any intermediate step. However, such generality does not come without its own set of challenges: the purpose of this article is to explore practical solution techniques for solving FOMDPs. To demonstrate the applicability of our techniques, we present proof-of-concept results of our first-order approximate linear programming (FOALP) planner on problems from the probabilistic track