Results 1 - 10
of
273
Dyna, an Integrated Architecture for Learning, Planning, and Reacting
- WORKING NOTES OF THE 1991 AAAI SPRING SYMPOSIUM
, 1991
"... Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and ofttimes ..."
Abstract
-
Cited by 427 (13 self)
- Add to MetaCart
Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and ofttimes incorrect world models generated by learning processes. Execution is fully reactive in the sense that no planning intervenes between perception and action. Dyna relies on machine learning methods for learning from examples -- these are among the basic building blocks making up the architecture -- yet is not tied to any particular method. This paper briefly introduces Dyna and discusses its strengths and weaknesses with respect to other architectures.
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1999
"... Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives ..."
Abstract
-
Cited by 342 (3 self)
- Add to MetaCart
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
Prioritized sweeping: Reinforcement learning with less data and less time
- Machine Learning
, 1993
"... We present a new algorithm, Prioritized Sweeping, for e cient prediction and control of stochas-tic Markov systems. Incremental learning methods such asTemporal Di erencing and Q-learning have fast real time performance. Classical methods are slower, but more accurate, because they make full use of ..."
Abstract
-
Cited by 275 (5 self)
- Add to MetaCart
We present a new algorithm, Prioritized Sweeping, for e cient prediction and control of stochas-tic Markov systems. Incremental learning methods such asTemporal Di erencing and Q-learning have fast real time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare Prioritized Sweeping with other reinforcement learning schemes for a number of di erent stochastic optimal control prob-lems. It successfully solves large state-space real time problems with which other methods have di culty. 1 1
Geographical and energy aware routing: A recursive data dissemination protocol for wireless sensor networks
, 2001
"... Future sensor networks will be composed of a large number of densely deployed sensors/actuators. A key feature of such networks is that their nodes are untethered and unattended. Consequently, energy efficiency is an important design consideration for these networks. Motivated by the fact that senso ..."
Abstract
-
Cited by 182 (4 self)
- Add to MetaCart
Future sensor networks will be composed of a large number of densely deployed sensors/actuators. A key feature of such networks is that their nodes are untethered and unattended. Consequently, energy efficiency is an important design consideration for these networks. Motivated by the fact that sensor network queries may often be geographical, we design and evaluate an energy efficient routing algorithm that propagates a query to the appropriate geographical region, without flooding. The proposed Geographic and Energy Aware Routing (GEAR) algorithm uses energy aware neighbor selection to route a packet towards the target region and Recursive Geographic Forwarding or Restricted Flooding algorithm to disseminate the packet inside the destination region. We evaluate the GEAR protocol using simulation. We find that, especially for non-uniform traffic distribution, GEAR exhibits noticeably longer network lifetime than non-energyaware geographic routing algorithms. 1
Planning with Incomplete Information as Heuristic Search in Belief Space
, 2000
"... The formulation of planning as heuristic search with heuristics derived from problem representations has turned out to be a fruitful approach for classical planning. In this paper, we pursue a similar idea in the context planning with incomplete information. Planning with incomplete information ..."
Abstract
-
Cited by 174 (23 self)
- Add to MetaCart
The formulation of planning as heuristic search with heuristics derived from problem representations has turned out to be a fruitful approach for classical planning. In this paper, we pursue a similar idea in the context planning with incomplete information. Planning with incomplete information can be formulated as a problem of search in belief space, where belief states can be either sets of states or more generally probability distribution over states. While the formulation (as the formulation of classical planning as heuristic search) is not particularly novel, the contribution of this paper is to make it explicit, to test it over a number of domains, and to extend it to tasks like planning with sensing where the standard search algorithms do not apply. The resulting planner appears to be competitive with the most recent conformant and contingent planners (e.g., cgp, sgp, and cmbp) while at the same time is more general as it can handle probabilistic actions and se...
Decision-Theoretic Deliberation Scheduling for Problem Solving In . . .
- ARTIFICIAL INTELLIGENCE
, 1994
"... We are interested in the problem faced byanagent with limited computational capabilities, embedded in a complex environment with other agents and processes not under its control. Careful management of computational resources is important for complex problem-solving tasks in which the time spent in ..."
Abstract
-
Cited by 152 (3 self)
- Add to MetaCart
We are interested in the problem faced byanagent with limited computational capabilities, embedded in a complex environment with other agents and processes not under its control. Careful management of computational resources is important for complex problem-solving tasks in which the time spent in decision making affects the quality of the responses generated by a system.
Planning as Heuristic Search: New Results
- IN PROCEEDINGS OF ECP-99
, 1999
"... In the recent AIPS98 Planning Competition, the hsp planner, based on a forward state search and a domain-independent heuristic, showed that heuristic search planners can be competitive with state of the art Graphplan and Satisfiability planners. hsp solved more problems than the other planners b ..."
Abstract
-
Cited by 148 (14 self)
- Add to MetaCart
In the recent AIPS98 Planning Competition, the hsp planner, based on a forward state search and a domain-independent heuristic, showed that heuristic search planners can be competitive with state of the art Graphplan and Satisfiability planners. hsp solved more problems than the other planners but it often took more time or produced longer plans. The main bottleneck in hsp is the computation of the heuristic for every new state. This computation may take up to 85% of the processing time. In this paper, we present a solution to this problem that uses a simple change in the direction of the search. The new planner, that we call hspr, is based on the same ideas and heuristic as hsp, but searches backward from the goal rather than forward from the initial state. This allows hspr to compute the heuristic estimates only once. As a result, hspr can produce better plans, often in less time. For example, hspr solves each of the 30 logistics problems from Kautz and Selman in less than 3 seconds. This is two orders of magnitude faster than blackbox. At the same time
Distributed Rational Decision Making
, 1999
"... Introduction Automated negotiation systems with self-interested agents are becoming increasingly important. One reason for this is the technology push of a growing standardized communication infrastructure---Internet, WWW, NII, EDI, KQML, FIPA, Concordia, Voyager, Odyssey, Telescript, Java, etc---o ..."
Abstract
-
Cited by 148 (0 self)
- Add to MetaCart
Introduction Automated negotiation systems with self-interested agents are becoming increasingly important. One reason for this is the technology push of a growing standardized communication infrastructure---Internet, WWW, NII, EDI, KQML, FIPA, Concordia, Voyager, Odyssey, Telescript, Java, etc---over which separately designed agents belonging to different organizations can interact in an open environment in realtime and safely carry out transactions. The second reason is strong application pull for computer support for negotiation at the operative decision making level. For example, we are witnessing the advent of small transaction electronic commerce on the Internet for purchasing goods, information, and communication bandwidth [29]. There is also an industrial trend toward virtual enterprises: dynamic alliances of small, agile enterprises which together can take advantage of economies of scale when available (e.g., respond to mor
A sparse sampling algorithm for near-optimal planning in large Markov decision processes
- Machine Learning
, 1999
"... An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or even in-nite state spaces, traditional planning and reinforcement learning algorith ..."
Abstract
-
Cited by 137 (5 self)
- Add to MetaCart
An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or even in-nite state spaces, traditional planning and reinforcement learning algorithms are often inapplicable, since their running time typically scales linearly with the state space size in the worst case. In this paper we present a new algorithm that, given only a generative model (simulator) for an arbitrary MDP, performs near-optimal planning with a running time that has no dependence on the number of states. Although the running time is exponential in the horizon time (which depends only on the discount factor and the desired degree of approximation to the optimal policy), our results establish for the rst time that there are no theoretical barriers to computing near-optimal policies in arbitrarily large, unstructured MDPs. 1

