Results 1–10 of 127
Symbolic Dynamic Programming for First-order MDPs
In IJCAI, 2001
Abstract

Cited by 152 (4 self)
We present a dynamic programming approach for the solution of first-order Markov decision processes. This technique uses an MDP whose dynamics are represented in a variant of the situation calculus allowing for stochastic actions. It produces a logical description of the optimal value function and policy by constructing a set of first-order formulae that minimally partition the state space according to distinctions made by the value function and policy. This is achieved through the use of an operation known as decision-theoretic regression. In effect, our algorithm performs value iteration without explicit enumeration of either the state or action spaces of the MDP. This allows problems involving relational fluents and quantification to be solved without requiring explicit state space enumeration or conversion to propositional form.
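For contrast with the symbolic approach described above, the sketch below shows ordinary tabular value iteration on an invented two-state, two-action MDP; this explicit state-by-state enumeration is exactly what symbolic dynamic programming avoids. All numbers are illustrative, not from the paper.

```python
# Tabular value iteration on a toy MDP: the explicit state enumeration
# that symbolic dynamic programming replaces with first-order reasoning.

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s][a] = list of (prob, next_state); R[s][a] = reward."""
    n = len(P)
    V = [0.0] * n
    while True:
        V_new = [
            max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s])))
            for s in range(n)
        ]
        if max(abs(V_new[s] - V[s]) for s in range(n)) < tol:
            return V_new
        V = V_new

# Two states, two actions: action 1 always moves to the rewarding state 1.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 0)], [(1.0, 1)]]]
R = [[0.0, 0.0], [1.0, 1.0]]
V = value_iteration(P, R)   # converges to V = [9.0, 10.0]
```

With discount 0.9, the fixed point is V(1) = 1/(1-0.9) = 10 and V(0) = 0.9 · 10 = 9.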
Programmable reinforcement learning agents
2001
Abstract

Cited by 115 (1 self)
We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process. The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning that which isn’t specified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills.
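The core idea of such partial programs can be sketched in a few lines: everything is fully specified except one choice point, which is resolved by learned values. This is a minimal illustration in plain Python (not the paper's language), with an invented environment where one choice is rewarded:

```python
import random

# Partial agent program sketch: only `choose` is left unspecified;
# its decision is learned from reward via a simple one-step Q update.

Q = {"left": 0.0, "right": 0.0}   # learned values for the choice point
ALPHA = 0.5

def choose(epsilon=0.1):
    """Unspecified choice: epsilon-greedy over learned Q-values."""
    if random.random() < epsilon:
        return random.choice(list(Q))
    return max(Q, key=Q.get)

def run_episode():
    """The rest of the program is fixed; only the choice is learned."""
    c = choose()
    reward = 1.0 if c == "right" else 0.0   # hypothetical environment
    Q[c] += ALPHA * (reward - Q[c])          # one-step value update
    return reward

random.seed(0)
for _ in range(200):
    run_episode()
```

After a few hundred episodes the agent has learned to prefer the rewarded choice while the surrounding program structure never changes.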
State Abstraction for Programmable Reinforcement Learning Agents
In Proceedings of the Eighteenth National Conference on Artificial Intelligence, 2002
Abstract

Cited by 99 (3 self)
Safe state abstraction in reinforcement learning allows an agent to ignore aspects of its current state that are irrelevant to its current decision, and therefore speeds up dynamic programming and learning. This paper explores safe state abstraction in hierarchical reinforcement learning, where learned behaviors must conform to a given partial, hierarchical program. Unlike previous approaches to this problem, our methods yield significant state abstraction while maintaining hierarchical optimality, i.e., optimality among all policies consistent with the partial program. We show how to achieve this for a partial programming language that is essentially Lisp augmented with nondeterministic constructs. We demonstrate our methods on two variants of Dietterich's taxi domain, showing how state abstraction and hierarchical optimality result in faster learning of better policies and enable the transfer of learned skills from one problem to another.
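A toy illustration of why safe state abstraction pays off, using an invented taxi-style state layout (not the paper's actual domain encoding): while navigating to the pickup location, the passenger's destination is irrelevant, so the abstracted value table can ignore it.

```python
# Safe state abstraction sketch: project out a state variable that is
# irrelevant to the current (navigation) decision, shrinking the Q-table.

def abstract(state):
    """Drop the variable irrelevant to navigation."""
    taxi_pos, passenger_dest = state
    return taxi_pos          # destination does not affect navigation value

# 25 grid positions x 4 possible destinations, 4 navigation actions.
full_states = [(pos, dest) for pos in range(25) for dest in range(4)]
q_full = {(s, a): 0.0 for s in full_states for a in range(4)}
q_abs = {(abstract(s), a): 0.0 for s in full_states for a in range(4)}
```

The flat table has 25 × 4 × 4 = 400 entries; the abstracted one has only 25 × 4 = 100, and learning over it is safe precisely because the dropped variable cannot change the value of any navigation choice.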
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
2005
Abstract

Cited by 91 (6 self)
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces.
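The computational core that every POMDP solver builds on is the Bayesian belief-state update, b'(s') ∝ O(o|s',a) Σ_s T(s'|s,a) b(s). Here is a minimal sketch on a tiny two-state problem with illustrative numbers (not from the thesis):

```python
# Discrete POMDP belief update: fold one action and one observation
# into the agent's probability distribution over hidden states.

def belief_update(b, T, O, a, o):
    """b: belief over states; T[a][s][s2] transition; O[a][s2][o] observation."""
    unnorm = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(len(b)))
              for s2 in range(len(b))]
    z = sum(unnorm)
    return [x / z for x in unnorm]

# One "listen" action (a=0) that leaves the state fixed; the observation
# matches the true state with probability 0.85.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
b = belief_update([0.5, 0.5], T, O, a=0, o=0)   # -> [0.85, 0.15]
```

Since the belief lives in a continuous simplex over all states, the cost of representing and updating it is exactly where the state-space intractability mentioned above bites.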
IBAL: A Probabilistic Rational Programming Language
In Proc. 17th IJCAI, 2001
Abstract

Cited by 86 (2 self)
In a rational programming language, a program specifies a situation faced by an agent; evaluating the program amounts to computing what a rational agent would believe or do in the situation. This paper presents IBAL, a rational programming language for probabilistic and decision-theoretic agents. IBAL provides a rich declarative language for describing probabilistic models. The expression language allows the description of arbitrarily complex generative models. In addition, IBAL's observation language makes it possible to express and compose rejective models that result from conditioning on the observations. IBAL also integrates Bayesian parameter estimation and decision-theoretic utility maximization thoroughly into the framework. All these are packaged together into a programming language that has a rich type system and built-in extensibility. This paper presents a detailed account of the syntax and semantics of IBAL, as well as an overview of the implementation.
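The notion of a generative model conditioned on observations can be illustrated in plain Python rather than IBAL: rejection sampling, in which runs inconsistent with the evidence are discarded, is one simple semantics for conditioning a generative program. The rain/sprinkler model below is a standard textbook example, not from the paper.

```python
import random

# A tiny generative model plus conditioning-by-rejection: estimate
# P(rain | grass is wet) by keeping only samples where the evidence holds.

def model():
    rain = random.random() < 0.3
    sprinkler = random.random() < 0.5
    wet = rain or sprinkler
    return rain, wet

def posterior_rain_given_wet(n=20000):
    accepted = [rain for rain, wet in (model() for _ in range(n)) if wet]
    return sum(accepted) / len(accepted)

random.seed(1)
p = posterior_rain_given_wet()   # close to the exact value 0.3/0.65 = 0.4615
```

A production system like IBAL uses far more efficient inference than rejection, but the declarative reading is the same: the program defines a distribution, and observations restrict it.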
Efficient utility functions for ceteris paribus preferences
In Proceedings of the Eighteenth National Conference on Artificial Intelligence, 2002
Abstract

Cited by 52 (4 self)
Although ceteris paribus preference statements concisely represent one natural class of preferences over outcomes or goals, many applications of such preferences require numeric utility function representations to achieve computational efficiency. We provide algorithms, complete for finite universes of binary features, for converting a set of qualitative ceteris paribus preferences into quantitative utility functions.
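For intuition, here is a minimal sketch of one easy special case (not the paper's general algorithms): when each ceteris paribus statement independently prefers value 1 of some binary feature over value 0, a simple counting utility over the preferred features agrees with every statement. The feature names are invented.

```python
# Qualitative ceteris-paribus preferences -> numeric utility, for the
# special case of independent single-feature statements "f=1 over f=0".

def utility(outcome, preferred):
    """outcome: dict feature -> 0/1; preferred: features whose value 1
    is ceteris-paribus preferred to 0. Additive unit weights suffice."""
    return sum(outcome[f] for f in preferred)

preferred = {"warm", "cheap"}
a = {"warm": 1, "cheap": 1, "red": 0}
b = {"warm": 1, "cheap": 0, "red": 0}   # differs from a only in "cheap"
# Since a and b agree everywhere except the preferred feature "cheap",
# every consistent utility must rank a above b; this one does.
```

The hard part the paper addresses is the general case, where statements interact and the weights must be chosen carefully rather than all set to 1.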
Learning to Parse Natural Language Commands to a Robot Control System
Abstract

Cited by 37 (9 self)
As robots become more ubiquitous and capable of performing complex tasks, the importance of enabling untrained users to interact with them has increased. In response, unconstrained natural-language interaction with robots has emerged as a significant research area. We discuss the problem of parsing natural language commands to actions and control structures that can be readily implemented in a robot execution system. Our approach learns a parser based on example pairs of English commands and corresponding control language expressions. We evaluate this approach in the context of following route instructions through an indoor environment, and demonstrate that our system can learn to translate English commands into sequences of desired actions, while correctly capturing the semantic intent of statements involving complex control structures. The procedural nature of our formal representation allows a robot to interpret route instructions online while moving through a previously unknown environment.
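To make the target representation concrete, here is a hand-written toy translator (emphatically not the learned parser from the paper): English route commands are mapped to a small invented procedural control language with sequencing and a guarded loop.

```python
# Toy English-to-control-language translator. The rule table stands in
# for what the paper's system learns from command/expression pairs.

def translate(command):
    rules = {
        "go forward": "move()",
        "turn left": "turn(left)",
        "turn right": "turn(right)",
        "go forward until the door": "while not at(door): move()",
    }
    # Longest match first, so "go forward until ..." wins over "go forward".
    for phrase in sorted(rules, key=len, reverse=True):
        if command.startswith(phrase):
            return rules[phrase]
    raise ValueError(f"unparsed command: {command}")

program = [translate(c) for c in
           ["go forward until the door", "turn left", "go forward"]]
```

The procedural output is what lets a robot execute the loop's condition online, checking `at(door)` as it moves, rather than needing the full environment in advance.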
Hybrid BDI-POMDP framework for multiagent teaming
In JAIR
Abstract

Cited by 37 (9 self)
Many current large-scale multiagent team implementations can be characterized as following the “belief-desire-intention” (BDI) paradigm, with explicit representation of team plans. Despite their promise, current BDI team approaches lack tools for quantitative performance analysis under uncertainty. Distributed partially observable Markov decision problems (POMDPs) are well suited for such analysis, but the complexity of finding optimal policies in such models is highly intractable. The key contribution of this article is a hybrid BDI-POMDP approach, where BDI team plans are exploited to improve POMDP tractability and POMDP analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams: which agents to allocate to the different roles in the team. The article provides three key contributions. First, we describe a role allocation technique that takes into account future uncertainties in the domain; prior work in multiagent role allocation has failed to address such uncertainties. To that end, we introduce RMTDP (Role-based Markov Team Decision Problem), a new distributed POMDP model for analysis of role allocations.
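The role allocation problem can be sketched in miniature (this is a toy expected-value search, not RMTDP itself): choose which agent fills which role to maximize expected reward when each agent's success in each role is stochastic. All agents, roles, and probabilities below are invented.

```python
import itertools

# Uncertainty-aware role allocation sketch: enumerate assignments of
# agents to roles and pick the one with the highest expected reward.

success = {  # hypothetical P(agent succeeds in role)
    ("a1", "scout"): 0.9, ("a1", "transport"): 0.4,
    ("a2", "scout"): 0.5, ("a2", "transport"): 0.8,
}
reward = {"scout": 10.0, "transport": 20.0}

def expected_value(assignment):
    return sum(success[(agent, role)] * reward[role]
               for agent, role in assignment)

agents, roles = ["a1", "a2"], ["scout", "transport"]
best = max((tuple(zip(perm, roles))
            for perm in itertools.permutations(agents)),
           key=expected_value)
```

The article's contribution is precisely what this sketch omits: the exponential blowup of such enumeration in realistic teams, and how BDI plan structure prunes it.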
Observations and the Probabilistic Situation Calculus
2002
Abstract

Cited by 28 (7 self)
In this article we propose a Probabilistic Situation Calculus logical language to represent and reason with knowledge about dynamical worlds in which actions have uncertain effects. Two essential tasks are addressed when reasoning about change in worlds: Probabilistic Temporal Projection and Probabilistic Belief Update. Uncertain effects are modeled by dividing an action into two subparts: a deterministic input (agent produced) and a probabilistic reaction (nature produced). The probability distributions of the reactions are assumed to be known. Our logical language is an extension of the Situation Calculus in the style proposed by Raymond Reiter. There are three aspects to this work. First, we extend the language to accommodate terms dealing with belief and probability. Second, we provide an operational semantics based on Randomly Timed Automata. Third, we develop Monte Carlo algorithms to efficiently interpret the probability and belief terms. With the framework proposed we discuss how to develop a reasoning system in Mathematica capable of performing temporal projection and belief update in the Probabilistic Situation Calculus. Finally, we present a sound basis to set rewards and observation planning. (1) Center for Logic and Computation, Departamento de Matematica, IST, Av. Rovisco Pais, 1049-001 Lisboa, Portugal. Email: pmat@math.ist.utl.pt. Supported by FCT SFRH/BPD/5625/2001 and the FibLog initiative. (2) Applied Mathematics Center, Departamento de Matematica, IST, Av. Rovisco Pais, 1049-001 Lisboa, Portugal. Email: apacheco@math.ist.utl.pt. (3) Unfortunately, J. Pinto passed away in an accident while this paper was being prepared. Formerly, he was at Bell Labs, Database Systems Research Dept., 600 Mountain Ave., New Jersey 07974, U.S.A.
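The input/reaction split and Monte Carlo temporal projection can be sketched as follows: each action is a deterministic command plus a sampled reaction from nature, and the probability that a goal holds after a plan is estimated by sampling whole trajectories. The pickup action and its success probability are illustrative, not from the paper.

```python
import random

# Monte Carlo temporal projection: estimate P(goal holds after plan)
# by sampling nature's probabilistic reactions to deterministic inputs.

def do_pickup(state, rng):
    """Deterministic input 'pickup' plus nature's reaction: grip
    succeeds with (assumed known) probability 0.8."""
    new = dict(state)
    new["holding"] = rng.random() < 0.8
    return new

def project(plan, state, goal, n=10000, seed=2):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        s = dict(state)
        for action in plan:
            s = action(s, rng)
        hits += goal(s)
    return hits / n

p = project([do_pickup], {"holding": False}, lambda s: s["holding"])
```

For a one-action plan the estimate simply recovers the reaction probability; the sampling approach earns its keep on longer plans where the exact distribution over situations is expensive to compute symbolically.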
Online Decision-Theoretic Golog for Unpredictable Domains
Abstract

Cited by 27 (11 self)
DTGolog was proposed by Boutilier et al. as an integration of decision-theoretic (DT) planning and the programming language Golog. Advantages include the ability to handle large state spaces and to limit the search space during planning with explicit programming. Soutchanski developed a version of DTGolog, where a program is executed online and DT planning can be applied to parts of a program only. One of the limitations is that DT planning generally cannot be applied to programs containing sensing actions. In order to deal with robotic scenarios in unpredictable domains, where certain kinds of sensing like measuring one's own position are ubiquitous, we propose a strategy where sensing during deliberation is replaced by suitable models like computed trajectories so that DT planning remains applicable. In the paper we discuss the necessary changes to DTGolog entailed by this strategy and an application of our approach in the RoboCup domain.
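The replace-sensing-with-a-model strategy can be sketched in a few lines: online, the program would query the real localization system; during offline deliberation, a computed trajectory predicts the same quantity so projection never needs to actually sense. Names and the 1-D dynamics below are illustrative.

```python
# Sketch: during deliberation, a pose-sensing action is replaced by a
# trajectory model, keeping decision-theoretic projection applicable.

def sense_position(world):
    """Online execution: would query the real localization system."""
    return world["pose"]

def model_position(pose, velocity, dt):
    """Offline deliberation: predict the pose from the commanded motion."""
    return pose + velocity * dt

def project_pose(start, velocity, steps, dt=0.1):
    pose = start
    for _ in range(steps):
        pose = model_position(pose, velocity, dt)  # model replaces sensing
    return pose

predicted = project_pose(start=0.0, velocity=1.0, steps=10)
```

The design point is that both functions answer the same question ("where is the robot?"), so the planner can deliberate over the model and the executor can substitute real sensing at run time.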