Results 1 - 10 of 12
Bellman goes Relational
- In ICML, 2004
"... Motivated by the interest in relational reinforcement learning, we introduce a novel relational Bellman update operator called ReBel. It employs a constraint logic programming language to compactly represent Markov decision processes over relational domains. ..."
Abstract
-
Cited by 45 (3 self)
- Add to MetaCart
Motivated by the interest in relational reinforcement learning, we introduce a novel relational Bellman update operator called ReBel. It employs a constraint logic programming language to compactly represent Markov decision processes over relational domains.
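ReBel's contribution is lifting the Bellman backup to logically abstracted states; the paper's constraint-logic machinery is not reproduced here, but as a point of reference, here is a minimal sketch of the ordinary ground, tabular Bellman backup that such an operator generalizes (the names and the dictionary-based MDP encoding are illustrative assumptions, not the paper's representation):

```python
# Ground-level Bellman backup for reference only; ReBel performs the analogous
# update over abstract (logically described) states. All names are illustrative.
def bellman_backup(V, states, actions, P, R, gamma=0.9):
    """One sweep of V(s) <- max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V(s'))."""
    new_V = {}
    for s in states:
        new_V[s] = max(
            sum(P[s][a][s2] * (R[s][a][s2] + gamma * V[s2]) for s2 in P[s][a])
            for a in actions
        )
    return new_V
```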
On Using Guidance in Relational Reinforcement Learning
- In Machine Learning, 2004
"... Reinforcement learning, and Q-learning in particular, encounter two major problems when dealing with large state spaces. First, learning the Q-function in tabular form may be infeasible because of the excessive amount of memory needed to store the table and because the Q-function only converges afte ..."
Abstract
-
Cited by 36 (8 self)
- Add to MetaCart
Reinforcement learning, and Q-learning in particular, encounters two major problems when dealing with large state spaces. First, learning the Q-function in tabular form may be infeasible because of the excessive amount of memory needed to store the table and because the Q-function only converges after each state has been visited multiple times. Second, rewards in the state space may be so sparse that with random exploration they will only be discovered extremely slowly. The first problem is often solved by learning a generalization of the encountered examples (e.g., using a neural net or decision tree). Relational reinforcement learning (RRL) is such an approach; it makes Q-learning feasible in structural domains by incorporating a relational learner into Q-learning. To solve the second problem, the use of "reasonable policies" to provide guidance has been suggested. In this paper we investigate the best ways to provide guidance in two different domains.
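The guidance idea contrasts purely random exploration with occasionally following a "reasonable policy". Below is a minimal tabular sketch of one common way such guidance is injected into Q-learning, assuming a hypothetical `env` interface and `guidance_policy` function; the paper's relational learner and its specific guidance strategies are not reproduced:

```python
import random
from collections import defaultdict

def guided_q_learning(env, guidance_policy, episodes=100,
                      alpha=0.1, gamma=0.9, epsilon=0.1, p_guidance=0.5):
    """Tabular Q-learning where exploration is sometimes delegated to a
    'reasonable' guidance policy instead of being purely random."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            if random.random() < p_guidance:
                a = guidance_policy(s)               # follow the provided guidance
            elif random.random() < epsilon:
                a = random.choice(env.actions(s))    # ordinary random exploration
            else:
                a = max(env.actions(s), key=lambda b: Q[(s, b)])  # greedy choice
            s2, r, done = env.step(a)
            target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in env.actions(s2)))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```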
Logical Markov Decision Programs and the Convergence of Logical TD(λ)
- In Proc. of ILP’04, 2004
"... Recent developments in the area of relational reinforcement learning (RRL) have resulted in a number of new algorithms. A theory, however, that explains why RRL works, seems to be lacking. In this paper, we provide some initial results on a theory of RRL. To realize this, we introduce a novel re ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
(Show Context)
Recent developments in the area of relational reinforcement learning (RRL) have resulted in a number of new algorithms. A theory that explains why RRL works, however, seems to be lacking. In this paper, we provide some initial results on a theory of RRL. To realize this, we introduce a novel representation formalism, called logical Markov decision programs (LOMDPs), that integrates Markov Decision Processes (MDPs) with Logic Programs. Using LOMDPs one can compactly and declaratively represent complex MDPs. Within this framework we then devise a relational upgrade of TD(λ) called logical TD(λ) and prove convergence.
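For reference, the standard ground TD(λ) update with accumulating eligibility traces, which the logical variant applies over the abstract states of a LOMDP rather than over ground states (this is the textbook form, not the paper's notation):

```latex
\begin{aligned}
\delta_t &= r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t) \\
e_t(s)   &= \gamma \lambda\, e_{t-1}(s) + \mathbb{1}[s = s_t] \\
V(s)     &\leftarrow V(s) + \alpha\, \delta_t\, e_t(s) \quad \text{for all } s
\end{aligned}
```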
Decision tree methods for finding reusable MDP homomorphisms
- In AAAI, 2006
"... State abstraction is a useful tool for agents interacting with complex environments. Good state abstractions are compact, reuseable, and easy to learn from sample data. This paper combines and extends two existing classes of state abstraction methods to achieve these criteria. The first class of met ..."
Abstract
-
Cited by 10 (1 self)
- Add to MetaCart
State abstraction is a useful tool for agents interacting with complex environments. Good state abstractions are compact, reusable, and easy to learn from sample data. This paper combines and extends two existing classes of state abstraction methods to achieve these criteria. The first class of methods searches for MDP homomorphisms (Ravindran 2004), which produce models of reward and transition probabilities in an abstract state space. The second class of methods, like the UTree algorithm (McCallum 1995), learns compact models of the value function quickly from sample data. Models based on MDP homomorphisms can easily be extended such that they are usable across tasks with similar reward functions. However, value-based methods like UTree cannot be extended in this fashion. We present results showing a new, combined algorithm that fulfills all three criteria: the resulting models are compact, can be learned quickly from sample data, and can be used across a class of reward functions.
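As an illustration of the homomorphism side of this combination, a small sketch of how one might test whether a candidate state aggregation preserves rewards and block transition probabilities; this is a simplified, state-only check (MDP homomorphisms in Ravindran 2004 also map actions), and it is not the paper's decision-tree algorithm:

```python
from collections import defaultdict

def is_mdp_homomorphism(states, actions, P, R, phi, tol=1e-9):
    """Check whether a state aggregation phi: s -> abstract state preserves
    rewards and block transition probabilities, i.e. whether states mapped
    together have equal R(s, a) and equal total probability of reaching each
    abstract block. Simplified, state-only version for illustration."""
    reward, block_prob = {}, {}
    for s in states:
        for a in actions:
            key = (phi(s), a)
            probs = defaultdict(float)
            for s2, p in P[s][a].items():
                probs[phi(s2)] += p          # aggregate ground probabilities into blocks
            if key in reward:
                if abs(reward[key] - R[s][a]) > tol:
                    return False
                if any(abs(block_prob[key][b] - probs[b]) > tol
                       for b in set(block_prob[key]) | set(probs)):
                    return False
            else:
                reward[key], block_prob[key] = R[s][a], probs
    return True
```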
Reinforcement Learning for Relational MDPs
- In Proceedings of the Machine Learning Conference of Belgium and the Netherlands, 2004
"... In this paper we present a new method for reinforcement learning in relational domains. A logical language is employed to abstract over states and actions, thereby decreasing the size of the state-action space signi cantly. A probabilistic transition model of the abstracted MarkovDecision -Process ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
In this paper we present a new method for reinforcement learning in relational domains. A logical language is employed to abstract over states and actions, thereby decreasing the size of the state-action space significantly. A probabilistic transition model of the abstracted Markov Decision Process is estimated to speed up learning. We present a theoretical and experimental analysis of our new representation. Some insights concerning the problems and opportunities of logical representations for reinforcement learning are obtained in the context of a growing interest in the use of abstraction in reinforcement learning.
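A generic sketch of the model-based ingredient described here: counting transitions and rewards between abstract states, then planning on the estimated model with value iteration. The logical abstraction itself is assumed to be supplied by some abstraction function; the class and its names are illustrative and this is not the paper's exact procedure:

```python
from collections import defaultdict

class AbstractModelLearner:
    """Count-based estimates of transition probabilities and rewards over
    abstract states z, used to plan with value iteration on the learned model."""
    def __init__(self, gamma=0.9):
        self.gamma = gamma
        self.counts = defaultdict(lambda: defaultdict(float))  # (z, a) -> z' -> count
        self.rewards = defaultdict(float)                      # (z, a) -> summed reward

    def observe(self, z, a, r, z2):
        self.counts[(z, a)][z2] += 1
        self.rewards[(z, a)] += r

    def plan(self, sweeps=50):
        V = defaultdict(float)
        for _ in range(sweeps):
            Q = defaultdict(dict)
            for (z, a), nxt in self.counts.items():
                n = sum(nxt.values())
                Q[z][a] = self.rewards[(z, a)] / n + self.gamma * sum(
                    (c / n) * V[z2] for z2, c in nxt.items())
            V = defaultdict(float, {z: max(qa.values()) for z, qa in Q.items()})
        return V
```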
Reinforcement Learning with Markov Logic Networks
- In Proceedings of the European Workshop on Reinforcement Learning, 2008
"... Abstract. In this paper, we propose a method to combine reinforcement learning (RL) and Markov logic networks (MLN). RL usually does not consider the inherent relations or logical connections of the features. Markov logic networks combines first-order logic and graphical model and it can represent a ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
In this paper, we propose a method to combine reinforcement learning (RL) and Markov logic networks (MLNs). RL usually does not consider the inherent relations or logical connections of the features. Markov logic networks combine first-order logic with graphical models and can represent a wide variety of knowledge compactly and abstractly. We propose a new method, a reinforcement learning algorithm with Markov logic networks (RLMLN), to deal with difficult RL problems that have substantial prior knowledge to exploit and need a relational representation of states. With RLMLN, prior knowledge can be easily introduced into the learning system and the learning process becomes more efficient. Experiments on the blocks world illustrate that RLMLN is a promising method.
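To make the combination concrete at the simplest level, here is a sketch of a Q-function represented as a weighted sum of relational features (loosely, counts of satisfied formulas) with a gradient Q-learning weight update. The actual RLMLN algorithm and MLN inference machinery are not reproduced, and the `features` function is a hypothetical placeholder:

```python
# Sketch of the general idea only: Q(s, a) as a weighted linear combination of
# relational features, with weights adjusted by a gradient Q-learning update.
def q_value(weights, features):
    """features(s, a) is assumed to return a dict {formula_id: value}."""
    return lambda s, a: sum(weights.get(f, 0.0) * v for f, v in features(s, a).items())

def td_weight_update(weights, features, s, a, r, s2, actions2,
                     alpha=0.05, gamma=0.9, done=False):
    Q = q_value(weights, features)
    target = r if done else r + gamma * max(Q(s2, b) for b in actions2)
    delta = target - Q(s, a)
    for f, v in features(s, a).items():   # gradient of a linear Q is the feature vector
        weights[f] = weights.get(f, 0.0) + alpha * delta * v
    return weights
```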
Defining Object Types and Options Using MDP Homomorphisms
"... Agents in complex environments can have a wide range of tasks to perform over time. However, often there are sets of tasks that involve similar goals on similar objects, e.g., the skill of making a car move to a destination is similar for all cars. This paper lays out a framework for specifying goal ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Agents in complex environments can have a wide range of tasks to perform over time. However, often there are sets of tasks that involve similar goals on similar objects, e.g., the skill of making a car move to a destination is similar for all cars. This paper lays out a framework for specifying goals that are parameterized with focus objects, as well as defining object types in such a way that objects of the same type share policies. The method is agnostic as to the underlying state representation, as long as simple functions of the state of the object can be calculated.
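A minimal sketch of the framework's central idea, under the assumption that an option carries a focus object and that all objects of one type reuse a single policy defined over simple features of that object; the names below are illustrative, not the paper's API:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class ParameterizedOption:
    """An option parameterized by a focus object: the policy and termination
    test see only simple features computed from that object's state."""
    object_type: str
    policy: Callable[[dict], Any]          # object features -> action
    termination: Callable[[dict], bool]    # goal test on the focus object

    def act(self, object_features: dict):
        return self.policy(object_features)

# One shared policy per object type, reused by every object of that type.
policies_by_type: Dict[str, ParameterizedOption] = {}
```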
Learning for applying Reinforcement Learning
"... algorithms on the relational domains with the states and actions in relational form. In the model, the logical negation is represented explicitly, so that the abstract state space can be constructed from the goal state(s) of a given task simply by applying a generating method and an expanding method ..."
Abstract
- Add to MetaCart
(Show Context)
… algorithms on relational domains with states and actions in relational form. In the model, logical negation is represented explicitly, so that the abstract state space can be constructed from the goal state(s) of a given task simply by applying a generating method and an expanding method, and each ground state can be represented by one and only one abstract state. Prototype actions are also introduced into the model, so that the applicable abstract actions can be obtained automatically. Based on the model, a model-free Θ(λ)-learning algorithm is implemented to evaluate the state-action-substitution value function. We also propose a state refinement method, guided by two formal definitions of states, to construct the abstract state space automatically by the agent itself rather than manually. The experiments show that the agent can capture the core of the given task, and the final state space is intuitive.
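The generating and expanding methods are not reproduced here, but the matching step they rely on can be sketched: deciding whether a ground relational state is covered by an abstract state consisting of positive and negated literals, by searching for a substitution. A small backtracking matcher for illustration (not the paper's algorithm; literal encoding and names are assumptions):

```python
def is_var(term):
    """Variables are written as strings starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()

def substitute(literal, theta):
    return tuple(theta.get(t, t) for t in literal)

def match(positives, negatives, ground, theta=None):
    """Return a substitution under which every positive literal of the abstract
    state occurs in the ground state and no negated literal does, else None.
    Variables in negated literals are assumed to be bound by the positives."""
    theta = dict(theta or {})
    if not positives:
        if any(substitute(neg, theta) in ground for neg in negatives):
            return None
        return theta
    first, rest = positives[0], positives[1:]
    for fact in ground:
        if len(fact) != len(first) or fact[0] != first[0]:
            continue                       # different predicate or arity
        trial, ok = dict(theta), True
        for a, g in zip(first[1:], fact[1:]):
            if is_var(a):
                if trial.get(a, g) != g:   # clashes with an earlier binding
                    ok = False
                    break
                trial[a] = g
            elif a != g:                   # constant mismatch
                ok = False
                break
        if ok:
            result = match(rest, negatives, ground, trial)
            if result is not None:
                return result
    return None

# Example: does the abstract state "clear(X), on(X, a), not on(b, X)" cover this ground state?
ground_state = {("clear", "c"), ("on", "c", "a"), ("on", "a", "table")}
print(match([("clear", "X"), ("on", "X", "a")], [("on", "b", "X")], ground_state))
# -> {'X': 'c'}, since clear(c) and on(c, a) hold while on(b, c) does not
```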