Results 1 - 10 of 13
Evolutionary function approximation for reinforcement learning
- Journal of Machine Learning Research, 2006
Cited by 60 (15 self)
Reinforcement learning in continuous action spaces through sequential Monte Carlo methods
- In: Adv. Neural Information Proc. Systems, 2007
Cited by 9 (1 self)
Learning in real-world domains often requires dealing with continuous state and action spaces. Although many solutions have been proposed for applying Reinforcement Learning algorithms to continuous state problems, the same techniques can hardly be extended to continuous action spaces, where, besides the computation of a good approximation of the value function, a fast method for identifying the highest-valued action is needed. In this paper, we propose a novel actor-critic approach in which the policy of the actor is estimated through sequential Monte Carlo methods. The importance sampling step is performed on the basis of the values learned by the critic, while the resampling step modifies the actor’s policy. The proposed approach has been empirically compared to other learning algorithms in several domains; in this paper, we report results obtained in a control problem consisting of steering a boat across a river.
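The two steps named in this abstract (importance weighting from critic values, then resampling of the actor's action samples) can be illustrated with a minimal sketch. It is not the paper's algorithm: the softmax weighting, the Gaussian jitter, and the names `smc_actor_update`, `temperature`, and `jitter` are assumptions introduced here for illustration only.

```python
import math
import random


def smc_actor_update(actions, q_values, rng, temperature=1.0, jitter=0.05):
    """One importance-sampling + resampling step for a single state.

    actions  : list of sampled continuous actions (the actor's particles)
    q_values : critic's value estimate for each action
    Returns the resampled actions and the normalized importance weights.
    """
    # Importance weights from critic values (softmax is an assumption).
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Resampling: draw a new particle set in proportion to the weights,
    # then perturb it slightly so new actions can be explored (assumption).
    resampled = rng.choices(actions, weights=weights, k=len(actions))
    new_actions = [a + rng.gauss(0.0, jitter) for a in resampled]
    return new_actions, weights


rng = random.Random(0)
actions = [-1.0, 0.0, 0.3, 1.0]
q_values = [-(a - 0.3) ** 2 for a in actions]  # toy critic peaked at 0.3
new_actions, weights = smc_actor_update(actions, q_values, rng)
```

Repeating the step concentrates the particles around high-valued actions, which is how a sample-based actor can avoid an explicit search for the maximizing action.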
Adaptive Tile Coding for Value Function Approximation
Cited by 2 (0 self)
Reinforcement learning problems are commonly tackled by estimating the optimal value function. In many real-world problems, learning this value function requires a function approximator, which maps states to values via a parameterized function. In practice, the success of function approximators depends on the ability of the human designer to select an appropriate representation for the value function. This paper presents adaptive tile coding, a novel method that automates this design process for tile coding, a popular function approximator, by beginning with a simple representation with few tiles and refining it during learning by splitting existing tiles into smaller ones. In addition to automatically discovering effective representations, this approach provides a natural way to reduce the function approximator’s level of generalization over time. Empirical results in multiple domains compare two different criteria for deciding which tiles to split and verify that adaptive tile coding can automatically discover effective representations and that its speed of learning is competitive with the best fixed representations.
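The refinement idea in this abstract (start with few tiles, split existing tiles during learning) can be sketched in one dimension. The abstract does not state the two splitting criteria, so the accumulated absolute update error used below is a stand-in, and the names `Tile`, `AdaptiveTiling1D`, and `split_worst` are hypothetical.

```python
class Tile:
    """A half-open interval [lo, hi) with one value estimate."""

    def __init__(self, lo, hi, value=0.0):
        self.lo, self.hi, self.value = lo, hi, value
        self.error = 0.0  # accumulated |update| since creation (stand-in criterion)


class AdaptiveTiling1D:
    def __init__(self, lo, hi):
        self.tiles = [Tile(lo, hi)]  # begin with a single coarse tile

    def find(self, x):
        for tile in self.tiles:
            if tile.lo <= x < tile.hi:
                return tile
        return self.tiles[-1]  # x on or beyond the upper boundary

    def update(self, x, target, alpha=0.1):
        tile = self.find(x)
        delta = target - tile.value
        tile.value += alpha * delta
        tile.error += abs(delta)

    def split_worst(self):
        """Split the tile with the largest accumulated error into two halves."""
        worst = max(self.tiles, key=lambda t: t.error)
        mid = (worst.lo + worst.hi) / 2.0
        i = self.tiles.index(worst)
        # Children inherit the parent's value, so the function is unchanged
        # at the moment of the split; only its resolution increases.
        self.tiles[i:i + 1] = [Tile(worst.lo, mid, worst.value),
                               Tile(mid, worst.hi, worst.value)]
```

Because children copy the parent's value, each split reduces generalization locally without discarding what has been learned so far.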
Continuous-State Reinforcement Learning with Fuzzy Approximation
Cited by 1 (1 self)
Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. There exist several convergent and consistent RL algorithms which have been intensively studied. In their original form, these algorithms require that the environment states and agent actions take values in a relatively small discrete set. Fuzzy representations for approximate, model-free RL have been proposed in the literature for the more difficult case where the state-action space is continuous. In this work, we propose a fuzzy approximation architecture similar to those previously used for Q-learning, but we combine it with the model-based Q-value iteration algorithm. We prove that the resulting algorithm converges. We also give a modified, asynchronous variant of the algorithm that converges at least as fast as the original version. An illustrative simulation example is provided.
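A rough sketch of how a fuzzy approximator can be combined with model-based Q-value iteration is given below. It assumes a one-dimensional state, triangular membership functions centered on a grid, a finite action set, and a known deterministic model; none of these details come from the abstract, and the names `memberships`, `fuzzy_q_iteration`, `f`, and `rho` are hypothetical. The sweep shown is the synchronous form; the asynchronous variant mentioned in the abstract is not reproduced.

```python
def memberships(x, centers):
    """Triangular membership degrees of x over sorted centers (they sum to 1)."""
    x = min(max(x, centers[0]), centers[-1])  # clip to the grid
    phi = [0.0] * len(centers)
    for i in range(len(centers) - 1):
        a, b = centers[i], centers[i + 1]
        if a <= x <= b:
            phi[i + 1] = (x - a) / (b - a)
            phi[i] = 1.0 - phi[i + 1]
            break
    return phi


def fuzzy_q_iteration(centers, actions, f, rho, gamma=0.9, sweeps=200):
    """Synchronous Q-value iteration on parameters theta[i][j].

    f(x, u)   : assumed known transition model, returns the next state
    rho(x, u) : assumed known reward function
    """
    theta = [[0.0] * len(actions) for _ in centers]
    for _ in range(sweeps):
        new_theta = [[0.0] * len(actions) for _ in centers]
        for i, x in enumerate(centers):
            for j, u in enumerate(actions):
                phi = memberships(f(x, u), centers)
                # Approximate Q at the next state is a membership-weighted
                # sum of parameters; take the best action there.
                best = max(
                    sum(p * theta[m][jj] for m, p in enumerate(phi))
                    for jj in range(len(actions))
                )
                new_theta[i][j] = rho(x, u) + gamma * best
        theta = new_theta
    return theta
```

An in-place update of `theta` (instead of writing to `new_theta`) would give an asynchronous sweep in the spirit of the variant the abstract describes.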
Adaptive Tile Coding for Reinforcement Learning
• Most real-world reinforcement learning tasks have large or continuous state spaces
• Table-based approaches are infeasible; function approximators (FAs) are needed instead
• Many types of function approximation exist, e.g. neural networks, radial basis functions, tile coding
• These typically require a manually designed representation
• Question: Can agents learn their own representations?
• Conclusion: Yes!
A New Approach for Value Function Approximation Based on Automatic State Partition
Value functions are commonly used to solve reinforcement learning problems. In large or continuous state spaces, function approximation must be used to represent the value function. Much of the current work, however, requires the structure of the function approximator to be designed in advance, and this structure cannot be adjusted during learning. In this paper, we propose a novel function approximator, Fuzzy CMAC (FCMAC) with automatic state partition (ASP-FCMAC), which automates the structure design for FCMAC. Based on CMAC (also known as tile coding), ASP-FCMAC employs fuzzy membership functions to avoid the setting of parameters in CMAC, and makes use of the Bellman error to partition the state space automatically so as to generate the structure of FCMAC. Empirical results in both the mountain car and RoboCup Keepaway domains demonstrate that ASP-FCMAC can automatically generate the structure of FCMAC and that an agent using it can learn efficiently.
Adaptive Tile-Coding for Reinforcement Learning
Difficult problems in reinforcement learning typically require function approximators to effectively estimate value functions. Many different kinds of function approximators are currently in use, including neural networks, radial basis functions, and instance-based methods. Among the most successful are tile-codings (or CMACs), which consist of piecewise-constant approximations formed by discretizing the state space into disjoint tiles and aggregating values from multiple, slightly offset tilings. However, to make tile-coding work well in practice, a human expert must manually design the tile-coding representation, i.e. the size and shape of each tile. We present a new method, called adaptive tile coding, which automates this process. Borrowing the idea of "complexification" from methods that learn representations for neural network function approximators [1], this approach begins with simple representations with few tiles and adds new complexity during learning by splitting existing tiles into smaller ones. Local estimates of Bellman error are used to determine which tiles should be split. This approach, in addition to automatically discovering effective
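For readers unfamiliar with fixed tile coding, the piecewise-constant approximation built from several offset tilings can be sketched as follows. This is a generic one-dimensional illustration, not the adaptive method of the abstract; the class name `TileCoder1D`, the uniform offsets, and summing (rather than averaging) the active weights are assumptions.

```python
class TileCoder1D:
    """Fixed tile coding over [lo, hi) with several offset tilings."""

    def __init__(self, lo, hi, n_tiles, n_tilings):
        self.lo = lo
        self.width = (hi - lo) / n_tiles
        self.n_tilings = n_tilings
        # One extra tile per tiling covers the shift introduced by the offset.
        self.weights = [[0.0] * (n_tiles + 1) for _ in range(n_tilings)]

    def active(self, x):
        """Index of the single active tile in each tiling."""
        indices = []
        for k in range(self.n_tilings):
            offset = k * self.width / self.n_tilings  # slight shift per tiling
            indices.append(int((x - self.lo + offset) // self.width))
        return indices

    def value(self, x):
        # The estimate aggregates one weight from every tiling.
        return sum(self.weights[k][i] for k, i in enumerate(self.active(x)))

    def update(self, x, target, alpha=0.1):
        step = alpha * (target - self.value(x)) / self.n_tilings
        for k, i in enumerate(self.active(x)):
            self.weights[k][i] += step


coder = TileCoder1D(0.0, 1.0, n_tiles=4, n_tilings=2)
coder.update(0.1, target=1.0, alpha=1.0)
# 0.2 shares one of its two active tiles with 0.1, so part of the
# update generalizes to it; 0.9 shares none and is unaffected.
```

The tile width and the number of tilings are exactly the hand-chosen design parameters that the adaptive method aims to remove.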
Autonomous Qualitative Learning of Distinctions and Actions in a Developing Agent
How can an agent bootstrap up from a pixel-level representation to autonomously learn high-level states and actions using only domain-general knowledge? This thesis attacks a piece of this problem and assumes that an agent has a set of continuous variables describing the environment and a set of continuous motor primitives, and proposes a solution to the problem of how an agent can learn a set of useful states and effective higher-level actions through autonomous experience with the environment. There exist methods for learning models of the environment, and there also exist methods for planning. However, for autonomous learning, these methods have been used almost exclusively in discrete environments. This thesis proposes attacking the problem of learning high-level states and actions in continuous environments by using a qualitative representation to bridge the gap between continuous and discrete variable representations. In this approach, the agent begins with a broad discretization and initially can only tell if the value of each variable is increasing, decreasing, or remaining steady. The agent then simultaneously learns a qualitative representation (discretization) and a set of predictive models of the environment. The agent then converts these models into plans to form actions, and uses those learned actions to explore the environment. The method is evaluated using a simulated robot with realistic physics. The robot is sitting at a table that contains one or two blocks, as well as other distractor objects that are out of reach. The agent autonomously explores the environment without being given a task. After learning, the agent is given various tasks to determine if it learned the necessary states and actions to complete them. The results show that the agent was able to use this method to autonomously learn to perform the tasks.
1.1 Mini Golf Game
In the mini golf game the agent has to shoot a ball into a hole with the minimum number of strokes. Given the distance $x_0$ of the ball from the hole, the agent must determine the initial velocity $v_0$ that puts the ball in the hole in one stroke. For each distance $x_0$, the ball falls in the hole if its initial velocity $v_0$ ranges from $v_0^{\min} = \sqrt{2 g k x_0}$ to $v_0^{\max} = \sqrt{2 g k x_0 + v_{\max}^2}$, where $g$ is the acceleration due to gravity, $k = 0.0305$ is the coefficient of friction between the ball and the ground, and $v_{\max}$ is the maximum velocity allowed at the border of the hole for the ball to enter the hole rather than pass over it. Assuming that the ball has a diameter of 4.5 cm and that the hole has a diameter of 7.5 cm, $v_{\max}$ is equal to 110.7 cm/s. At the beginning of each trial the ball is placed at a random distance between 0 cm and 2000 cm from the hole. The initial velocity is limited to the interval [0, 500] cm/s. When the ball enters the hole the episode ends with reward 0. If $v_0 > v_0^{\max}$, the ball is lost and the episode ends with reward −10. Finally, if $v_0 < v_0^{\min}$ the episode goes on and the agent can try another hit with reward −1. The state variable $x$ is discretized into ten 200 cm wide intervals. This discretization has been chosen so that, for each interval, there is only one value of velocity that makes the ball enter the hole, independently of the actual position in the interval. This experiment aims at investigating the capability of SMC-learning to adapt the distribution of
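The velocity window and reward rules described above can be written down as a small sketch. Several details are assumptions rather than statements from the text: $g$ is taken as 981 cm/s² to match the centimetre units, a short shot is assumed to roll $v_0^2 / (2 g k)$ before stopping (the same friction model that yields the minimum velocity), and the names `velocity_window`, `step`, and `state_index` are hypothetical.

```python
import math

G = 981.0            # cm/s^2, assumed value for the acceleration due to gravity
K = 0.0305           # coefficient of friction (from the text)
V_HOLE_MAX = 110.7   # cm/s, maximum speed at the border of the hole (from the text)


def velocity_window(x0):
    """Return (v_min, v_max): initial velocities that hole the ball from x0 cm."""
    v_min = math.sqrt(2.0 * G * K * x0)
    v_max = math.sqrt(2.0 * G * K * x0 + V_HOLE_MAX ** 2)
    return v_min, v_max


def step(x0, v0):
    """Apply one stroke; return (reward, next_distance, done)."""
    v_min, v_max = velocity_window(x0)
    if v0 > v_max:
        return -10.0, None, True          # ball is lost
    if v0 < v_min:
        rolled = v0 ** 2 / (2.0 * G * K)  # assumption: distance covered before stopping
        return -1.0, x0 - rolled, False   # episode goes on from the new distance
    return 0.0, 0.0, True                 # ball enters the hole


def state_index(x, width=200.0, n_intervals=10):
    """Map a distance in [0, 2000] cm to one of ten 200 cm wide intervals."""
    return min(int(x // width), n_intervals - 1)
```

Under these assumptions the window at 200 cm starts at roughly 109.4 cm/s, just below the 110.7 cm/s upper limit at 0 cm, which is consistent with the claim that a single velocity can serve a whole 200 cm interval.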

