Results 1 - 10 of 88
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
- Journal of Artificial Intelligence Research
, 1999
"... Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives ..."
Abstract
-
Cited by 515 (4 self)
- Add to MetaCart
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
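The survey's unifying object, the MDP, is easy to make concrete. Below is a minimal value-iteration sketch for a finite MDP; the array shapes, discount factor, and the toy two-state model are illustrative assumptions, not anything specified in the paper.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Solve a finite MDP by repeated Bellman optimality backups.

    P -- transitions, shape (A, S, S): P[a, s, s2] = Pr(s2 | s, a)
    R -- expected rewards, shape (A, S)
    Returns the optimal value function and a greedy policy.
    """
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)          # Q[a, s] = R[a, s] + gamma * E[V(s')]
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy two-state, two-action model (made up for illustration).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
V_star, policy = value_iteration(P, R)
```

The structural properties the paper surveys (factored rewards, reachability structure, and so on) are ways to avoid ever materializing the dense P and R arrays this brute-force sketch assumes.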
SMDP Homomorphisms: An Algebraic Approach to Abstraction in Semi-Markov Decision Processes
, 2003
"... To operate effectively in complex environments learning agents require the ability to selectively ignore irrelevant details and form useful abstractions. ..."
Abstract
-
Cited by 52 (9 self)
- Add to MetaCart
To operate effectively in complex environments, learning agents require the ability to selectively ignore irrelevant details and form useful abstractions.
An Analysis of Model-Based Interval Estimation for Markov Decision Processes
, 2007
"... Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper pres ..."
Abstract
-
Cited by 46 (5 self)
- Add to MetaCart
(Show Context)
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents a theoretical analysis of MBIE and a new variation called MBIE-EB, proving their efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less “online” cousins from the literature.
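The core of MBIE-EB is planning on the empirical model with an exploration bonus that decays as 1/sqrt(n(s, a)). The sketch below shows that idea only; the constant beta, the clamping of unvisited pairs, and the fixed iteration count are simplifying assumptions, not the paper's exact construction.

```python
import numpy as np

def mbie_eb_backup(counts, reward_sums, trans_counts, gamma=0.95, beta=0.5,
                   n_iters=200):
    """MBIE-EB-style planning on the empirical model.

    counts[s, a]           -- visit count n(s, a)
    reward_sums[s, a]      -- summed observed rewards for (s, a)
    trans_counts[s, a, s2] -- observed transition counts
    The bonus beta / sqrt(n(s, a)) is added to the empirical reward, so
    rarely tried actions look optimistic.
    """
    n = np.maximum(counts, 1)                    # avoid division by zero
    R_hat = reward_sums / n + beta / np.sqrt(n)  # optimistic reward estimate
    T_hat = trans_counts / n[:, :, None]         # empirical transition model
    Q = np.zeros(counts.shape)
    for _ in range(n_iters):                     # value iteration on the model
        V = Q.max(axis=1)
        Q = R_hat + gamma * (T_hat @ V)
    return Q
```

Acting greedily with respect to this optimistic Q is what steers the agent toward under-explored state-action pairs, which is the exploration-exploitation balance the analysis quantifies.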
Percentile Optimization for Markov Decision Processes with Parameter Uncertainty
"... Markov decision processes are an effective tool in modeling decision-making in uncertain dynamic environments. Since the parameters of these models are typically estimated from data or learned from experience, it is not surprising that the actual performance of a chosen strategy often significantl ..."
Abstract
-
Cited by 28 (7 self)
- Add to MetaCart
(Show Context)
Markov decision processes are an effective tool in modeling decision-making in uncertain dynamic environments. Since the parameters of these models are typically estimated from data or learned from experience, it is not surprising that the actual performance of a chosen strategy often differs significantly from the designer's initial expectations due to unavoidable modeling ambiguity. In this paper, we present a set of percentile criteria that are conceptually natural and representative of the tradeoff between optimistic and pessimistic points of view on the question. We study the use of these criteria under different forms of uncertainty for both the rewards and the transitions. Some forms will be shown to be efficiently solvable and others highly intractable. In each case, we will outline solution concepts that take parametric uncertainty into account in the process of decision making.
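One way to read a percentile criterion: fix a policy, sample plausible models (for instance, posterior draws of the parameters), and score the policy by a low percentile of its value across those samples. The Monte Carlo sketch below illustrates that reading; it is a stand-in for the paper's exact formulations, some of which are solved without any sampling.

```python
import numpy as np

def policy_value(P, R, policy, gamma=0.95):
    """Exact value of a fixed policy on one sampled model.
    P: (A, S, S) transitions, R: (A, S) rewards, policy: length-S int array.
    Solves the linear system (I - gamma * P_pi) V = R_pi."""
    S = P.shape[1]
    P_pi = P[policy, np.arange(S), :]   # row s is P[policy[s], s, :]
    R_pi = R[policy, np.arange(S)]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

def percentile_score(models, policy, eta=10, s0=0):
    """eta-th percentile of the policy's value at start state s0 across
    sampled models -- a Monte Carlo proxy for a percentile criterion."""
    vals = [policy_value(P, R, policy)[s0] for P, R in models]
    return np.percentile(vals, eta)
```

Maximizing such a score over policies is exactly the optimistic/pessimistic tradeoff the abstract describes: a low eta is conservative, a high eta is optimistic.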
Resource-aware wireless sensor-actuator networks
- IEEE Data Engineering
, 2005
"... Innovations in wireless sensor networks (WSNs) have dramatically expanded the applicability of control technology in day-to-day life, by enabling the cost-effective deployment of large scale sensor-actuator systems. In this paper, we discuss the issues and challenges involved in deploying control-or ..."
Abstract
-
Cited by 25 (0 self)
- Add to MetaCart
(Show Context)
Innovations in wireless sensor networks (WSNs) have dramatically expanded the applicability of control technology in day-to-day life, by enabling the cost-effective deployment of large scale sensor-actuator systems. In this paper, we discuss the issues and challenges involved in deploying control-oriented applications over unreliable, resource-constrained WSNs, and describe the design of our planned Sensor Control System (SCS) that can enable the rapid development and deployment of such applications.
epsilon-MDPs: Learning in Varying Environments
, 2002
"... In this paper #-MDP-models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvari and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. ..."
Abstract
-
Cited by 20 (4 self)
- Add to MetaCart
In this paper, ε-MDP models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments.
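For reference, here is standard tabular Q-learning with epsilon-greedy exploration, the algorithm whose near-optimality the paper extends to ε-MDPs; the env interface (reset/step) is an assumption of this sketch, not something defined in the paper.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = (rng.integers(n_actions) if rng.random() < epsilon
                 else int(Q[s].argmax()))
            s2, r, done = env.step(a)
            # Standard Q-learning update; in an epsilon-MDP the environment
            # itself drifts, and the paper's result is that this update
            # still tracks near-optimal policies.
            Q[s, a] += alpha * (r + gamma * (0 if done else Q[s2].max()) - Q[s, a])
            s = s2
    return Q
```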
Relativized Options: Choosing the Right Transformation
- Proceedings of the Twentieth International Conference on Machine Learning
, 2003
"... Relativized options combine model minimization methods and a hierarchical reinforcement learning framework to derive compact reduced representations of a related family of tasks. Relativized options are defined without an absolute frame of reference, and an option's policy is transformed ..."
Abstract
-
Cited by 20 (8 self)
- Add to MetaCart
(Show Context)
Relativized options combine model minimization methods and a hierarchical reinforcement learning framework to derive compact reduced representations of a related family of tasks. Relativized options are defined without an absolute frame of reference, and an option's policy is transformed suitably based on the circumstances under which the option is invoked. In earlier work we addressed the issue of learning the option policy online. In this article we develop an algorithm for choosing, from among a set of candidate transformations, the right transformation for each member of the family of tasks.
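A natural way to pick among candidate transformations is to track how well each one explains the observed transitions and keep a normalized weight per candidate. The sketch below shows that likelihood-style bookkeeping; it is one plausible scheme under assumed interfaces (transforms, option_model), not necessarily the algorithm developed in the paper.

```python
import numpy as np

def update_transformation_weights(weights, transforms, option_model, transition):
    """Reweight candidate transformations by how well each one explains an
    observed transition (s, a, s2).

    transforms   -- list of functions mapping a raw transition into the
                    option's canonical frame of reference (assumed given)
    option_model -- option_model(s, a, s2): probability of the canonical
                    transition under the option's reduced model
    """
    s, a, s2 = transition
    likes = np.array([option_model(*h((s, a, s2))) for h in transforms])
    weights = weights * likes + 1e-12   # avoid zeroing out any candidate
    return weights / weights.sum()      # normalized weights over transforms

# The transformation applied when the option is invoked is the current argmax:
# best = transforms[int(np.argmax(weights))]
```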
SAVES: A sustainable multiagent application to conserve building energy considering occupants
- In AAMAS
, 2012
"... This paper describes an innovative multiagent system called SAVES with the goal of conserving energy in commercial buildings. We specifically focus on an application to be deployed in an existing university building that provides several key novelties: (i) jointly performed with the university facil ..."
Abstract
-
Cited by 19 (10 self)
- Add to MetaCart
(Show Context)
This paper describes an innovative multiagent system called SAVES with the goal of conserving energy in commercial buildings. We specifically focus on an application to be deployed in an existing university building that provides several key novelties: (i) jointly performed with the university facility management team, SAVES is based on actual occupant preferences and schedules, actual energy consumption and loss data, real sensors and hand-held devices, etc.; (ii) it addresses novel scenarios that require negotiations with groups of building occupants to conserve energy; (iii) it focuses on a non-residential building, where human occupants do not have a direct financial incentive in saving energy and thus requires a different mechanism to effectively motivate occupants; and (iv) SAVES uses a novel algorithm for generating optimal MDP policies that explicitly consider multiple criteria optimization (energy
Dynamic programming for deterministic discrete-time systems with uncertain gain
- International Journal of Approximate Reasoning
, 2004
"... We generalise the optimisation technique of dynamic programming for discretetime systems with an uncertain gain function. We assume that uncertainty about the gain function is described by an imprecise probability model, which generalises the well-known Bayesian, or precise, models. We compare vario ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
(Show Context)
We generalise the optimisation technique of dynamic programming for discrete-time systems with an uncertain gain function. We assume that uncertainty about the gain function is described by an imprecise probability model, which generalises the well-known Bayesian, or precise, models. We compare various optimality criteria that can be associated with such a model, and which coincide in the precise case: maximality, robust optimality and maximinity. We show that only for the first two can an optimal feedback be constructed by solving a Bellman-like equation.
Key words: optimal control, dynamic programming, uncertainty, imprecise probabilities, lower previsions, sets of probabilities.
The main objective in optimal control is to find out how a system can be influenced, or controlled, in such a way that its behaviour satisfies certain requirements, while at the same time maximising a given gain function. A very efficient method for solving optimal control problems for discrete-time systems is the recursive dynamic...
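For intuition, consider the special case where the imprecise model is a finite set of candidate gain functions: running one Bellman recursion per candidate and taking, at each state, the union of optimizing controls yields a robust optimal feedback. The interface below (f, gains, states, controls) is an assumption of this sketch, not the paper's formal setup with lower previsions.

```python
def robust_optimal_feedback(f, gains, states, controls, horizon, tol=1e-12):
    """Robust optimality for a deterministic system x' = f(x, u) whose
    uncertain gain is represented by a finite set of candidate gain
    functions g(x, u) (a simplifying stand-in for an imprecise model).

    Returns feedback[t][x]: the set of controls optimal at stage t in
    state x for at least one candidate gain.
    """
    feedback = [{x: set() for x in states} for _ in range(horizon)]
    for g in gains:                       # one Bellman recursion per candidate
        V = {x: 0.0 for x in states}      # terminal value is zero
        for t in reversed(range(horizon)):
            V_new = {}
            for x in states:
                q = {u: g(x, u) + V[f(x, u)] for u in controls}
                best = max(q.values())
                V_new[x] = best
                feedback[t][x] |= {u for u, v in q.items() if v >= best - tol}
            V = V_new
    return feedback
```

Maximinity, by contrast, would take a minimum over gains inside the recursion, and the paper's result is precisely that this third criterion does not admit such a Bellman-like construction.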