Results 1–10 of 14
On the Undecidability of Probabilistic Planning and Related Stochastic Optimization Problems
Artificial Intelligence, 2003
Abstract

Cited by 48 (0 self)
Automated planning, the problem of how an agent achieves a goal given a repertoire of actions, is one of the foundational and most widely studied problems in the AI literature. The original formulation of the problem makes strong assumptions regarding the agent's knowledge and control over the world, namely that its information is complete and correct, and that the results of its actions are deterministic and known.
Learning Accuracy and Availability of Humans who Help Mobile Robots
Abstract

Cited by 9 (2 self)
When mobile robots perform tasks in environments with humans, it seems appropriate for the robots to rely on such humans for help instead of dedicated human oracles or supervisors. However, these humans are neither always available nor always accurate. In this work, we consider human help to a robot as concretely providing observations about the robot’s state to reduce state uncertainty as it executes its policy autonomously. We model the probability of receiving an observation from a human in terms of their availability and accuracy by introducing Human Observation Providers POMDPs (HOP-POMDPs). We contribute an algorithm to learn human availability and accuracy online while the robot is executing its current task policy. We demonstrate that our algorithm is effective in approximating the true availability and accuracy of humans without depending on oracles to learn, thus increasing the tractability of deploying a robot that can occasionally ask for help.
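The online availability/accuracy estimate described in this abstract can be approximated with a simple Bayesian sketch. The Beta-Bernoulli update below is a plausible stand-in for the paper's algorithm, not its actual method; the 60%/90% simulated human is invented for illustration.

```python
import random

class HumanHelpEstimator:
    """Sketch: online Beta-Bernoulli estimates of a human's availability
    (probability of answering at all) and accuracy (probability an
    answer turns out to be correct). Not the paper's HOP-POMDP learner."""

    def __init__(self):
        # Beta(1, 1) uniform priors for both quantities.
        self.avail = [1, 1]  # [answered + 1, ignored + 1]
        self.acc = [1, 1]    # [correct + 1, incorrect + 1]

    def record_request(self, answered, correct=None):
        self.avail[0 if answered else 1] += 1
        if answered and correct is not None:
            self.acc[0 if correct else 1] += 1

    def availability(self):
        return self.avail[0] / (self.avail[0] + self.avail[1])

    def accuracy(self):
        return self.acc[0] / (self.acc[0] + self.acc[1])

# Simulate asking a human who answers 60% of the time, correctly 90%.
random.seed(0)
est = HumanHelpEstimator()
for _ in range(2000):
    answered = random.random() < 0.6
    correct = (random.random() < 0.9) if answered else None
    est.record_request(answered, correct)
print(round(est.availability(), 2), round(est.accuracy(), 2))
```

After enough requests, the posterior means approach the simulated human's true availability and accuracy without ever consulting an oracle.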
Average-reward decentralized Markov decision processes
2007
Abstract

Cited by 7 (3 self)
Formal analysis of decentralized decision making has become a thriving research area in recent years, producing a number of multiagent extensions of Markov decision processes. While much of the work has focused on optimizing discounted cumulative reward, optimizing average reward is sometimes a more suitable criterion. We formalize a class of such problems and analyze its characteristics, showing that it is NP-complete and that optimal policies are deterministic. Our analysis lays the foundation for designing two optimal algorithms. Experimental results with a standard problem from the literature illustrate the applicability of these solution techniques.
On policy iteration as a Newton’s method and polynomial policy iteration algorithms
2002
Abstract

Cited by 6 (1 self)
Policy iteration is a popular technique for solving Markov decision processes (MDPs). It is easy to describe and implement, and has excellent performance in practice. But not much is known about its complexity. The best upper bound remains exponential, and the best lower bound is a trivial Ω(n) on the number of iterations, where n is the number of states. This paper improves the upper bound to a polynomial for policy iteration on MDPs with special graph structure. Our analysis is based on the connection between policy iteration and Newton’s method for finding the zero of a convex function, and it offers an explanation as to why policy iteration is fast. It also leads to polynomial bounds on several variants of policy iteration for MDPs whose linear programming formulation requires at most two variables per inequality (MDP(2)). The MDP(2) class includes deterministic MDPs under discounted and average reward criteria. The resulting running-time bounds include O(mn² log m log W) for MDP(2) and O(mn² log m) for deterministic MDPs, where m denotes the number of actions and W denotes the magnitude of the largest number in the problem description.
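For context, textbook policy iteration — the procedure whose complexity this abstract analyzes — alternates exact policy evaluation (a linear solve) with greedy improvement. The sketch below is the standard algorithm, not the paper's Newton-style analysis, and the 2-state MDP is a made-up example.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Standard policy iteration for a discounted MDP.
    P[a] is the |S|x|S| transition matrix for action a, R[a] the
    per-state expected reward under action a."""
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Evaluate: solve (I - gamma * P_pi) v = r_pi for the current policy.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Improve: act greedily with respect to v.
        q = np.array([R[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

# Hypothetical 2-state, 2-action MDP (all numbers invented).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.1, 0.9], [0.6, 0.4]])]   # action 1
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
policy, v = policy_iteration(P, R)
print(policy, np.round(v, 2))
```

Each improvement step is the analogue of one Newton step in the paper's analysis; in practice the loop terminates after very few iterations.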
Active Reinforcement Learning
Abstract

Cited by 4 (0 self)
When the transition probabilities and rewards of a Markov Decision Process (MDP) are known, an agent can obtain the optimal policy without any interaction with the environment. However, exact transition probabilities are difficult for experts to specify. One option left to an agent is a long and potentially costly exploration of the environment. In this paper, we propose another alternative: given an initial (possibly inaccurate) specification of the MDP, the agent determines the sensitivity of the optimal policy to changes in transitions and rewards. It then focuses its exploration on the regions of space to which the optimal policy is most sensitive. We show that the proposed exploration strategy performs well on several control and planning problems.
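The sensitivity-directed exploration idea can be illustrated with a naive finite-difference probe: perturb one transition row, re-solve the MDP, and score how much the optimal values move. This is an illustrative stand-in for the paper's sensitivity computation, with an invented 2-state MDP.

```python
import numpy as np

def solve(P, R, gamma=0.9, iters=1000):
    """Value iteration; returns an (approximately) optimal value function."""
    v = np.zeros(P[0].shape[0])
    for _ in range(iters):
        v = np.max([R[a] + gamma * P[a] @ v for a in range(len(P))], axis=0)
    return v

def transition_sensitivity(P, R, s, a, eps=1e-3, gamma=0.9):
    """Finite-difference probe: how much do optimal values change if
    transition row P[a][s] is nudged toward state 0? A crude stand-in
    for an analytical sensitivity."""
    base = solve(P, R, gamma)
    P2 = [p.copy() for p in P]
    row = P2[a][s]
    row += eps * (np.eye(len(row))[0] - row)  # mix toward state 0; still a distribution
    return np.abs(solve(P2, R, gamma) - base).max() / eps

# Hypothetical 2-state, 2-action MDP (all numbers invented).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.1, 0.9], [0.6, 0.4]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
scores = {(s, a): transition_sensitivity(P, R, s, a) for s in range(2) for a in range(2)}
focus = max(scores, key=scores.get)  # explore the most sensitive (s, a) first
print(focus, round(scores[focus], 3))
```

The agent would then spend its exploration budget on the state-action pairs with the highest sensitivity scores.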
A risk-sensitive approach to total productive maintenance
Automatica
Abstract

Cited by 3 (3 self)
While risk-sensitive (RS) approaches for designing plans of total productive maintenance are critical in manufacturing systems, there is little in the literature by way of theoretical modeling. Developing such plans often requires the solution of a discrete-time stochastic control-optimization problem. Renewal theory and Markov decision processes (MDPs) are commonly employed tools for solving the underlying problem. The literature on preventive maintenance, for the most part, focuses on minimizing the expected net cost and disregards issues related to minimizing risks. RS maintenance managers employ safety factors to modify the risk-neutral solution in an attempt to heuristically accommodate elements of risk in their decision making. In this paper, our efforts are directed toward developing a formal theory for RS preventive-maintenance plans. We employ the Markowitz paradigm, in which one seeks to optimize a function of the expected cost and its variance. In particular, we present (i) a result for an RS approach in the setting of renewal processes and (ii) a result for solving an RS MDP. We also provide computational results to demonstrate the efficacy of these results. Finally, the theory developed here is sufficiently general that it can be applied to problems in other relevant domains.
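The Markowitz-style objective — a function of expected cost and its variance — can be illustrated with a toy simulation that picks a preventive-maintenance threshold. The degradation model, cost figures, and risk weight below are all invented for illustration and are not from the paper.

```python
import random

def simulate_cost(threshold, horizon=200, rng=None):
    """Toy machine: degradation rises by 1 each period; preventive
    maintenance at the threshold costs 5, a random failure costs 50 and
    grows more likely as degradation grows. All numbers are illustrative."""
    rng = rng or random
    level, cost = 0, 0.0
    for _ in range(horizon):
        level += 1
        if level >= threshold:
            cost += 5.0      # planned preventive maintenance
            level = 0
        elif rng.random() < 0.02 * level:
            cost += 50.0     # unplanned failure
            level = 0
    return cost

def mean_variance_score(threshold, theta=0.01, n=400, seed=0):
    """Markowitz-style risk-adjusted objective: E[cost] + theta * Var[cost]."""
    rng = random.Random(seed)
    costs = [simulate_cost(threshold, rng=rng) for _ in range(n)]
    mean = sum(costs) / n
    var = sum((c - mean) ** 2 for c in costs) / n
    return mean + theta * var

best = min(range(2, 10), key=mean_variance_score)
print(best)
```

With theta = 0 this reduces to the risk-neutral plan the abstract criticizes; raising theta trades expected cost for lower variance, which is the RS manager's safety factor made explicit.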
Discounted deterministic Markov decision processes and discounted all-pairs shortest paths
ACM Transactions on Algorithms
Abstract

Cited by 3 (1 self)
We present two new algorithms for finding optimal strategies for discounted, infinite-horizon Deterministic Markov Decision Processes (DMDPs). The first one is an adaptation of an algorithm of Young, Tarjan and Orlin for finding minimum mean-weight cycles. It runs in O(mn + n² log n) time, where n is the number of vertices (or states) and m is the number of edges (or actions). The second one is an adaptation of a classical algorithm of Karp for finding minimum mean-weight cycles. It runs in O(mn) time. The first algorithm has a slightly slower worst-case complexity, but is faster than the second in many situations. Both algorithms improve on a recent O(mn²)-time algorithm of Andersson and Vorobyov. We also present a randomized Õ(m^(1/2)n²)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving on several previous algorithms.
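Karp's classical minimum mean-weight cycle algorithm, which the second algorithm above adapts, can be sketched directly. This is the textbook O(mn) subroutine, not the DMDP adaptation itself; the sample graph is invented.

```python
def min_mean_cycle(n, edges):
    """Karp's O(mn) algorithm for the minimum mean-weight cycle.
    edges is a list of (u, v, w) triples on vertices 0..n-1.
    Returns the minimum cycle mean, or None if the graph is acyclic."""
    INF = float("inf")
    # d[k][v] = minimum weight of a k-edge walk ending at v (from any start).
    d = [[INF] * n for _ in range(n + 1)]
    for v in range(n):
        d[0][v] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = None
    for v in range(n):
        if d[n][v] == INF:
            continue  # no n-edge walk ends here
        # Karp's formula: min over v of max over k of (d_n(v) - d_k(v)) / (n - k).
        worst = max((d[n][v] - d[k][v]) / (n - k)
                    for k in range(n) if d[k][v] < INF)
        if best is None or worst < best:
            best = worst
    return best

# Two cycles: 0->1->0 with mean 2, and 1->2->1 with mean 1.5.
edges = [(0, 1, 3), (1, 0, 1), (1, 2, 2), (2, 1, 1)]
print(min_mean_cycle(3, edges))
```

The DMDP adaptation in the paper builds on this table-filling structure, which is why it inherits the O(mn) bound.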
Purpose Restrictions on Information Use
Abstract

Cited by 2 (2 self)
Privacy policies in sectors as diverse as Web services, finance and healthcare often place restrictions on the purposes for which a governed entity may use personal information. Thus, automated methods for enforcing privacy policies require a semantics of purpose restrictions to determine whether a governed agent used information for a purpose. We provide such a semantics using a formalism based on planning. We model planning using Partially Observable Markov Decision Processes (POMDPs), which support an explicit model of information. We argue that information use is for a purpose if and only if the information is used while planning to optimize the satisfaction of that purpose under the POMDP model. We determine information use by simulating ignorance of the information prohibited by the purpose restriction, which we relate to noninterference. We use this semantics to develop a sound audit algorithm that automates the enforcement of purpose restrictions.
Partially-Synchronized DEC-MDPs in Dynamic Mechanism Design
Abstract

Cited by 1 (1 self)
In this paper, we combine for the first time the methods of dynamic mechanism design with techniques from decentralized decision making under uncertainty. Consider a multiagent system with self-interested agents acting in an uncertain environment, each with private actions, states and rewards. There is also a social planner with its own actions, rewards, and states, acting as a coordinator and able to influence the agents via actions (e.g., resource allocations). Agents can only communicate with the center, but may become inaccessible, e.g., when their communication device fails. When accessible to the center, agents can report their local state (and models) and receive recommendations from the center about local policies to follow for the present period and also, should they become inaccessible, until becoming accessible again. Without self-interest, this poses a new problem class, which we call partially-synchronized DEC-MDPs, and for which we establish some positive complexity results under reasonable assumptions. Allowing for self-interested agents, we are able to bridge to methods of dynamic mechanism design, aligning incentives so that agents truthfully report local state when accessible and choose to follow the prescribed “emergency policies” of the center.
An Approximate Algorithm for Solving Oracular POMDPs
Abstract
We propose a new approximate algorithm, LAJIV (Lookahead JMDP Information Value), to solve Oracular Partially Observable Markov Decision Problems (OPOMDPs), a special type of POMDP that, rather than standard observations, includes an “oracle” that can be consulted for full state information at a fixed cost. We previously introduced JIV (JMDP Information Value) to solve OPOMDPs, a heuristic algorithm that utilizes the solution of the underlying MDP and weighs the value of consulting the oracle against the value of taking a state-modifying action. While efficient, JIV will rarely find the optimal solution. In this paper, we extend JIV to include lookahead, thereby permitting arbitrarily small deviation from the optimal policy’s long-term expected reward at the cost of added computation time. The depth of the lookahead is a parameter that governs this tradeoff; by iteratively increasing this depth, we provide an anytime algorithm that yields an ever-improving solution. LAJIV leverages the OPOMDP framework’s unique characteristics to outperform general-purpose approximate POMDP solvers; in fact, we prove that LAJIV is a polynomial-time approximation scheme (PTAS) with respect to the size of the state and observation spaces, thereby showing rigorously that OPOMDPs are “easier” than POMDPs. Finally, we substantiate our theoretical results via an empirical analysis of a benchmark OPOMDP instance.
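The trade-off JIV makes — weighing a consult of the oracle against acting — can be illustrated with a myopic one-step value-of-information test. This is a generic expected-value-of-perfect-information check with invented Q-values, not the JIV or LAJIV algorithm itself.

```python
import numpy as np

def consult_oracle(belief, Q, cost):
    """Myopic value-of-information test: consult the oracle when knowing
    the true state would raise expected one-step value by more than the
    oracle's fixed cost. belief[s] is the probability of state s;
    Q[s][a] are hypothetical Q-values of the underlying MDP."""
    Q = np.asarray(Q)
    value_without = (belief @ Q).max()    # best single action under uncertainty
    value_with = belief @ Q.max(axis=1)   # best action once the state is revealed
    return value_with - value_without > cost

belief = np.array([0.5, 0.5])
Q = [[10.0, 0.0],    # state 0 strongly prefers action 0
     [0.0, 10.0]]    # state 1 strongly prefers action 1
print(consult_oracle(belief, Q, cost=3.0))   # here EVPI = 5, so a cost-3 consult pays off
```

LAJIV's lookahead generalizes this one-step comparison to deeper horizons, which is what closes the gap to the optimal policy.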