Learning symbolic models of stochastic domains
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 2005
Cited by 26 (1 self)
In this article, we work towards the goal of developing agents that can learn to act in complex worlds. We develop a new probabilistic planning rule representation to compactly model noisy, nondeterministic action effects and show how these rules can be effectively learned. Through experiments in simple planning domains and a 3D simulated blocks world with realistic physics, we demonstrate that this learning algorithm allows agents to effectively model world dynamics.
Relational reinforcement learning: An overview
- In Proceedings of the ICML’04 Workshop on Relational Reinforcement Learning
, 2004
Cited by 23 (3 self)
Relational reinforcement learning (RRL) is both a young and an old field. In this paper, we trace the history of the field to related disciplines, outline some current work and promising new directions, and survey the research issues and opportunities that lie ahead.
Learning partially observable deterministic action models
- In Proc. Nineteenth International Joint Conference on Artificial Intelligence (IJCAI ’05)
, 2005
Cited by 21 (0 self)
We present exact algorithms for identifying the effects and preconditions of deterministic actions in dynamic partially observable domains. They apply when one does not know the action model (the way actions affect the world) of a domain and must learn it from partial observations over time. Such scenarios are common in real-world applications. They are challenging for AI tasks because traditional domain structures that underlie tractability (e.g., conditional independence) fail there (e.g., world features become correlated). Our work departs from traditional assumptions about partial observations and action models. In particular, it focuses on problems in which actions are deterministic and of simple logical structure, and observation models have all features observed with some frequency. We obtain tractable algorithms for the modified problem for such domains. Our algorithms take sequences of partial observations over time as input, and output deterministic action models that could have led to those observations. The algorithms output all or one of those models (depending on our choice), and are exact in that no model is misclassified given the observations. Our algorithms take polynomial time in the number of time steps and state features for some traditional action classes examined in the AI-planning literature, e.g., STRIPS actions. In contrast, traditional approaches for HMMs and Reinforcement Learning are inexact and exponentially intractable for such domains. Our experiments verify the theoretical tractability guarantees, and show that we identify action models exactly. Several applications in planning, autonomous exploration, and adventure-game playing already use these results. They are also promising for probabilistic settings, partially observable reinforcement learning, and diagnosis.
Learning planning rules in noisy stochastic worlds
- In AAAI
, 2005
Cited by 18 (2 self)
We present an algorithm for learning a model of the effects of actions in noisy stochastic worlds. We consider learning in a 3D simulated blocks world with realistic physics. To model this world, we develop a planning representation with explicit mechanisms for expressing object reference and noise. We then present a learning algorithm that can create rules while also learning derived predicates, and evaluate this algorithm in the blocks world simulator, demonstrating that we can learn rules that effectively model the world dynamics.
Search control in planning for temporally extended goals
- In Proc. ICAPS-05
, 2005
Cited by 6 (1 self)
Current techniques for reasoning about search control knowledge in AI planning, such as those used in TLPlan, TALPlanner, or SHOP2, assume that search control knowledge is conditioned upon and interpreted with respect to a fixed set of goal states. Therefore, these techniques can deal with reachability goals but do not apply to temporally extended goals, such as goals of achieving a condition whenever a certain fact becomes true. Temporally extended goals convey several intermediate reachability goals to be achieved at different points of execution, sometimes with cyclic executions; that is, the notion of goal state becomes dynamic. In this paper, we describe a method for reasoning about search control knowledge in the presence of temporally extended goals. Given such a goal, we generate an equivalent Büchi automaton (an automaton recognising the language of the executions satisfying the goal) and interpret control knowledge over this automaton and the world state trajectories generated by a forward search planner. This method is implemented and evaluated as an extension of the TLPlan planner, which incidentally becomes capable of handling cyclic goals.
Learning models of relational stochastic processes
- In Proceedings of the Sixteenth European Conference on Machine Learning
, 2005
Cited by 4 (2 self)
Processes involving change over time, uncertainty, and rich relational structure are common in the real world, but no general algorithms exist for learning models of them. In this paper we show how Markov logic networks (MLNs), a recently developed approach to combining logic and probability, can be applied to time-changing domains. We then show how existing algorithms for parameter and structure learning in MLNs can be extended to this setting. We apply this approach in two domains: modeling the spread of research topics in scientific communities, and modeling faults in factory assembly processes. Our experiments show that it greatly outperforms purely logical (ILP) and purely probabilistic (DBN) learners.
DeJong: Explanation-Based Acquisition of Planning Operators
- ICAPS
, 2006
Cited by 4 (0 self)
Classical planning algorithms require that their operators be simple in order for planning to be tractable. However, the complexities of real-world domains suggest that, in order to be accurate, planning operators must be complex. We demonstrate how, by taking advantage of background knowledge and the distribution of planning problems encountered, it is possible to automatically construct planning operators that are both reliable and succinct. The acquired operator is an encapsulated control loop that is specialized to best fit observed world behavior. Succinctness is achieved by publishing to the planner only those conditions required to succeed over the estimated distribution of problems. We demonstrate the acquisition of a context-appropriate “take-off” operator that can successfully control a complex flight simulator.
Learning action durations from executions
Cited by 3 (3 self)
Accurate action models are essential for efficiently solving automated planning tasks. An accurate action model allows the planner to precisely foresee the consequences of executing actions in a given environment and therefore to find robust, good-quality plans. But when addressing planning tasks in the real world, even hand-coding a simple STRIPS action model is complex, so defining action models that capture further features, such as execution duration or cost, is more difficult still. Moreover, even if these features can be captured at a given instant, they may vary over time. In this paper we automatically model the duration of action execution as relational regression trees learned from observing plan executions. We also show how planners find better plans after incorporating these models into their domain definitions.
Learning Models of Relational MDPs using Graph Kernels
Cited by 3 (0 self)
Relational reinforcement learning is the application of reinforcement learning to structured state descriptions. Model-based methods learn a policy based on a known model that comprises a description of the actions and their effects as well as the reward function. If the model is initially unknown, one might learn the model first and then apply the model-based method (indirect reinforcement learning). In this paper, we propose a method for model-learning that is based on a combination of several SVMs using graph kernels. Indeterministic processes can be dealt with by combining the kernel approach with a clustering technique. We demonstrate the validity of the approach by a range of experiments on various Blocksworld scenarios.
Learning Non-Deterministic Multi-Agent Planning Domains
Cited by 2 (0 self)
In this paper, we present an algorithm for learning nondeterministic multi-agent planning domains from execution examples. The algorithm uses a master-slave decomposition of two population-based stochastic local search algorithms and integrates binary decision diagrams to reduce the size of the search space. Our experimental results show that the learner has high convergence rates due to an aggressive exploitation of example-driven search and an efficient separation of concurrent activities. Moreover, even though the learning problem is at least as hard as learning disjoint DNF formulas, large domains can be learned accurately within a few minutes.

