Plan stability: Replanning versus plan repair
In Proc. ICAPS, 2006
Cited by 76 (4 self)
The ultimate objective in planning is to construct plans for execution. However, when a plan is executed in a real environment it can encounter differences between the expected and actual context of execution. These differences can manifest as divergences between the expected and observed states of the world, or as a change in the goals to be achieved by the plan. In both cases, the old plan must be replaced with a new one. In replacing the plan an important consideration is plan stability. We compare two alternative strategies for achieving the stable repair of a plan: one is simply to replan from scratch and the other is to adapt the existing plan to the new context. We present arguments to support the claim that plan stability is a valuable property. We then propose an implementation, based on LPG, of a plan repair strategy that adapts a plan to its new context. We demonstrate empirically that our plan repair strategy achieves more stability than replanning and can produce repaired plans more efficiently than replanning.
Contingent Planning Under Uncertainty via Stochastic Satisfiability
Artificial Intelligence, 1999
Cited by 70 (11 self)
We describe two new probabilistic planning techniques, c-maxplan and zander, that generate contingent plans in probabilistic propositional domains. Both operate by transforming the planning problem into a stochastic satisfiability problem and solving that problem instead. c-maxplan encodes the problem as an E-Majsat instance, while zander encodes the problem as an SSat instance. Although SSat problems are in a higher complexity class than E-Majsat problems, the problem encodings produced by zander are substantially more compact and appear to be easier to solve than the corresponding E-Majsat encodings. Preliminary results for zander indicate that it is competitive with existing planners on a variety of problems. Introduction: When planning under uncertainty, any information about the state of the world is precious. A contingent plan is one that can make action choices contingent on such information. In this paper, we present an implemented framework for contingent pl...
Knows What It Knows: A Framework For Self-Aware Learning
Cited by 67 (20 self)
We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK (knows what it knows) framework was designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to, as is true in reinforcement-learning and active-learning problems. We catalog several KWIK-learnable classes and open problems.
Dynamic Programming for POMDPs using a Factored State Representation
In Proceedings of the Fifth International Conference on AI Planning Systems, 2000
Cited by 64 (4 self)
Contingent planning (constructing a plan in which action selection is contingent on imperfect information received during plan execution) can be formalized as the problem of solving a partially observable Markov decision process (POMDP). Traditional dynamic programming algorithms for POMDPs use a flat state representation that enumerates all possible states and state transitions. By contrast, AI planning algorithms use a factored state representation that supports state abstraction and allows problems with large state spaces to be represented and solved more efficiently. Boutilier and Poole (1996) have recently described how a factored state representation can be exploited by a dynamic programming algorithm for POMDPs. We extend their framework, describe an implementation and test its performance, and assess how much this approach improves the computational efficiency of dynamic programming for POMDPs. Introduction: Many AI planning researchers have adopted Markov...
Temporal Abstraction in Reinforcement Learning
2000
Cited by 64 (2 self)
Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes high-level decisions regarding what means of transportation to use, but also chooses low-level actions, such as the movements for getting into a car. The problem of picking an appropriate time scale for reasoning and learning has been explored in artificial intelligence, control theory and robotics. In this dissertation we develop a framework that allows novel solutions to this problem, in the context of Markov Decision Processes (MDPs) and reinforcement learning. In this dissertation, we present a general framework for prediction, control and learning at multipl...
Optimal and approximate Q-value functions for decentralized POMDPs
J. Artificial Intelligence Research
Cited by 62 (26 self)
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
Efficient structure learning in factored-state MDPs
2007
Cited by 60 (9 self)
We consider the problem of reinforcement learning in factored-state MDPs in the setting in which learning is conducted in one long trial with no resets allowed. We show how to extend existing efficient algorithms that learn the conditional probability tables of dynamic Bayesian networks (DBNs) given their structure to the case in which DBN structure is not known in advance. Our method learns the DBN structures as part of the reinforcement-learning process and provably provides an efficient learning algorithm when combined with factored Rmax.
Sunny: A new algorithm for trust inference in social networks using probabilistic confidence models
In Proceedings of the National Conference on Artificial Intelligence (AAAI), 2007
Cited by 56 (7 self)
In many computing systems, information is produced and processed by many people. Knowing how much a user trusts a source can be very useful for aggregating, filtering, and ordering information. Furthermore, if trust is used to support decision making, it is important to have an accurate estimate of trust when it is not directly available, as well as a measure of confidence in that estimate. This paper describes a new approach that gives an explicit probabilistic interpretation for confidence in social networks. We describe SUNNY, a new trust inference algorithm that uses a probabilistic sampling technique to estimate our confidence in the trust information from some designated sources. SUNNY computes an estimate of trust based on only those information sources with high confidence estimates. In our experiments, SUNNY produced more accurate trust estimates than the well-known trust inference algorithm TIDALTRUST (Golbeck 2005), demonstrating its effectiveness.
Effective approaches for partial satisfaction (oversubscription) planning
In: Proceedings of AAAI-04, 2004
Cited by 55 (12 self)
In many real-world planning scenarios, agents often do not have enough resources to achieve all of their goals. Consequently, they are forced to find plans that satisfy only a subset of the goals. Solving such partial satisfaction planning (PSP) problems poses several challenges, including an increased emphasis on modeling and handling plan quality (in terms of action costs and goal utilities). Despite the ubiquity of such PSP problems, very little attention has been paid to them in the planning community. In this paper, we start by describing a spectrum of PSP problems and focus on one of the more general PSP problems, termed PSP NET BENEFIT. We develop three techniques: (i) one based on integer programming, called OptiPlan, (ii) the second based on regression planning with reachability heuristics, called AltAlt^ps, and (iii) the third based on anytime heuristic search for a forward state-space heuristic planner, called Sapa^ps. Our empirical studies with these planners show that the heuristic planners generate plans that are comparable to the quality of plans generated by OptiPlan, while incurring only a small fraction of the cost.
Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback
Proc. 18th International Joint Conf. on Artificial Intelligence, 2003
Cited by 55 (7 self)
Recent algorithms like RTDP and LAO* combine the strength of Heuristic Search (HS) and Dynamic Programming (DP) methods by exploiting knowledge of the initial state and an admissible heuristic function for producing optimal policies without evaluating the entire space. In this paper, we introduce and analyze three new HS/DP algorithms.