## Learning a partial behavior for a competitive robotic soccer agent (2006)


### Download Links

- [www.ni.uos.de]
- [ml.informatik.uni-freiburg.de]
- [www2.informatik.uni-freiburg.de]
- DBLP

### Other Repositories/Bibliography

- Venue: KI Zeitschrift
- Citations: 7 (5 self)

### BibTeX

```bibtex
@ARTICLE{Gabel06learninga,
  author  = {Thomas Gabel and Martin Riedmiller},
  title   = {Learning a partial behavior for a competitive robotic soccer agent},
  journal = {KI Zeitschrift},
  year    = {2006},
  volume  = {20},
  pages   = {18--23}
}
```


### Abstract

Robotic soccer is a highly competitive domain. Accordingly, the use of learnt behaviors in this application field presumes not only learning algorithms that are known to converge and produce stable results, but also imposes the wish for obtaining optimal or at least near-optimal behaviors, even when working within high-dimensional and continuous state/action spaces. This paper deals with the continuous amelioration of adaptive soccer playing skills in robotic soccer simulation, documenting and presenting results of our hunt for optimal policies. We show that not too much effort is necessary to realize straightforward Reinforcement Learning algorithms in this domain, but that a heavy load of work is required when tweaking them towards competitiveness.

### Citations

1328 | Learning to predict by the method of temporal differences
- Sutton
- 1988
Citation Context: ...ne, i.e. the processes of collecting (simulated) experience and learning the value function run in parallel. In this work we update the value function's estimates according to the TD(1) update rule [14], where the new estimate for V(s_k) is calculated as V(s_k) := (1 − α) · V(s_k) + α · ret(s_k), with ret(s_k) = Σ_{j=k}^{N} r(s_j, π(s_j)) indicating the summed rewards following state s_k and α as a decaying l...
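The TD(1) update quoted in this context can be sketched in a few lines. This is a minimal illustration, not the authors' code: the episode representation as a list of (state, reward) pairs and the externally supplied α are assumptions.

```python
# Minimal sketch of the TD(1) / Monte-Carlo-return update described above.
# Assumptions (not from the paper): an episode is a list of (state, reward)
# pairs, V is a dict of value estimates, alpha is managed by the caller.
def td1_update(V, episode, alpha):
    ret = 0.0
    # iterate backwards so ret(s_k) accumulates the rewards from step k onward
    for state, reward in reversed(episode):
        ret += reward
        V[state] = (1 - alpha) * V.get(state, 0.0) + alpha * ret
    return V
```

In the paper α decays over time; here the caller would simply pass a smaller α on later updates.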

730 | A direct adaptive method for faster back-propagation learning: The RPROP algorithm
- Riedmiller, Braun
- 1993
Citation Context: ...en, the actual training means determining w by solving the least squares optimization problem min_w Σ_{s ∈ S̃} (Ṽ(s, w) − V(s))². For the minimization we rely on the back-propagation variant RPROP [11]. Making use of a neural function approximator with 24 sigmoidal neurons in its single hidden layer (best among several net architectures tested) we got ahead a significant step: After 500k training e...
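The RPROP variant referenced here adapts a per-weight step size from the sign of successive gradients. A simplified sketch follows; the η⁺/η⁻ constants and step bounds are the conventional defaults, and the exact backtracking behaviour of the cited algorithm is omitted.

```python
# Simplified sign-based RPROP step (a sketch, not the cited implementation).
# eta_plus/eta_minus and the step-size bounds are conventional defaults.
def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_max=50.0, step_min=1e-6):
    new_w, new_step = [], []
    for wi, g, pg, s in zip(w, grad, prev_grad, step):
        if g * pg > 0:            # same gradient sign: grow the step size
            s = min(s * eta_plus, step_max)
        elif g * pg < 0:          # sign flip: we overshot, shrink the step
            s = max(s * eta_minus, step_min)
        sign = (g > 0) - (g < 0)  # move against the gradient's sign only
        new_w.append(wi - s * sign)
        new_step.append(s)
    return new_w, new_step
```

The key property RPROP exploits is that only the sign of the gradient, never its magnitude, determines the weight change.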

190 | Policy invariance under reward transformations: Theory and application to reward shaping
- Ng, Harada, et al.
- 1999
Citation Context: ...pt the ball successfully, but often does not find the most "aggressive" way to the ball and executes too many turns. To combat the problem of "avoidable turns" we now pursue a reward shaping approach [9]: Turn actions shall incur higher immediate costs than dashes. In the soccer simulation context this is not intuitive since here turns are free of charge whereas dash actions reduce the player's stam...
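The shaping idea in this context, charging turns more than dashes, amounts to a modified immediate-cost function. The numeric costs below are illustrative assumptions, not the values used by the authors.

```python
# Illustrative shaped immediate costs: turning is made more expensive than
# dashing to discourage "avoidable turns" (the cost values are assumptions).
TURN_COST = 1.5
DASH_COST = 1.0

def immediate_cost(action):
    return TURN_COST if action == "turn" else DASH_COST

def trajectory_cost(actions):
    # total cost the learner minimizes over an action sequence
    return sum(immediate_cost(a) for a in actions)
```

Under such costs a policy prefers a dash-only path over one containing turns whenever both reach the ball equally fast.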

147 | Tree-based batch mode reinforcement learning
- Ernst, Geurts, et al.
- 2005
Citation Context: ... each s ∈ S̃ we have an estimated value V(s) as calculated by Algorithm 2. So, our learning algorithm is in the spirit of fitted value iteration [4] and other current off-policy RL approaches (e.g. [2]). Let the state value function approximation provided by the net be denoted as Ṽ(s, w), where w corresponds to a vector of tunable parameters, i.e. the net's connection weights. Then, the actual tra...

143 | Soccer server: A tool for research on multiagent systems
- Noda, Matsubara, et al.
- 1998
Citation Context: ...r-playing robots to simulated ones. The context for this paper is RoboCup's 2D Simulation League, where two teams of simulated soccer-playing agents compete against one another using the Soccer Server [10], a real-time soccer simulation system. With our competition team, Brainstormers, we have been participating in the RoboCup [15] championship tournaments for several years, whereupon our main research...

90 | The CMUnited-99 champion simulator team
- Stone, Riley, et al.
- 2000
Citation Context: ...en by the agent. Figure 1: Difficulties in Ball Interception 3 Ball Interception Methods In its first version our team made use of the ball interception routine from Carnegie Mellon's team CMUnited98 [13]. Intending to apply Reinforcement Learning in a competitive domain like robotic soccer, we soon realized a straightforward RL approach based on value iteration and state value function approximation ...

69 | Approximate Solutions to Markov Decision Processes
- Gordon
- 1999
Citation Context: ...f representative states S̃ ⊂ S is built up where for each s ∈ S̃ we have an estimated value V(s) as calculated by Algorithm 2. So, our learning algorithm is in the spirit of fitted value iteration [4] and other current off-policy RL approaches (e.g. [2]). Let the state value function approximation provided by the net be denoted as Ṽ(s, w), where w corresponds to a vector of tunable parameters, i...
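The fitting step described in this context, a least-squares regression of Ṽ(s, w) onto the targets V(s) over the representative states, can be sketched with a stand-in model. A linear approximator trained by plain stochastic gradient descent replaces the paper's multi-layer perceptron and RPROP.

```python
# Sketch of fitting a parametric value function to targets on representative
# states. A linear model with SGD stands in for the paper's neural net + RPROP.
def fit_value_function(states, targets, lr=0.1, epochs=200):
    # states: list of feature vectors; targets: estimated values V(s)
    w = [0.0] * len(states[0])
    for _ in range(epochs):
        for s, v in zip(states, targets):
            pred = sum(wi * si for wi, si in zip(w, s))
            err = pred - v  # gradient of (pred - v)^2 / 2 w.r.t. pred
            w = [wi - lr * err * si for wi, si in zip(w, s)]
    return w
```

Any regression model could play the role of Ṽ(s, w) here; the off-policy character comes from fitting to stored targets rather than on-line experience.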

48 | Qualitative velocity and ball interception, in
- Stolzenburg, Obst, et al.
- 2002
Citation Context: ...are required. This also refers to the task of ball interception: Other authors have presented very efficient numerical algorithms for computing the time t it takes a player to intercept a moving ball [12]. They show that there is no closed form for calculating t and therefore make use of Newton's method to numerically find a near-optimal t. However, their algorithm makes two simplifying assumptions ab...
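A Newton-style search for the interception time t, in the spirit of the cited approach: find the t at which the player's reachable distance equals the distance to the predicted ball position. The geometric ball decay, the constant player speed, and the numeric derivative are assumptions of this sketch, not details from [12].

```python
import math

# Hedged sketch of Newton's method for the interception time t.
# decay, player_speed, and the starting guess t0 are assumed values.
def intercept_time(ball_pos, ball_vel, player_pos, player_speed=1.0,
                   decay=0.94, t0=10.0, iters=50):
    def ball_at(t):
        # distance covered by a geometrically decaying ball after t steps
        travelled = (1 - decay ** t) / (1 - decay)
        return (ball_pos[0] + ball_vel[0] * travelled,
                ball_pos[1] + ball_vel[1] * travelled)

    def f(t):
        # root of f: player can just reach the ball's position at time t
        bx, by = ball_at(t)
        return math.hypot(bx - player_pos[0], by - player_pos[1]) - player_speed * t

    t = t0
    for _ in range(iters):
        h = 1e-5
        df = (f(t + h) - f(t - h)) / (2 * h)  # central-difference derivative
        if abs(df) < 1e-12:
            break
        t -= f(t) / df                        # Newton update
    return max(t, 0.0)
```

For a stationary ball 10 m away and unit player speed, the root is simply t = 10.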

38 | Multilayer Feedforward Networks are Universal Approximators
- Hornik
- 1989
Citation Context: ...a more powerful function approximation mechanism. Feedforward neural networks are known to be capable of approximating arbitrarily closely any function f : S → R that is continuous on a bounded set S [6]. As we will argue in the following, the usage of multi-layer perceptrons brings us nearer to the target on our way to an optimal policy and allows us to gain insights into the problem structure. We p...

30 | Reinforcement learning for 3 vs. 2 keepaway
- Stone, Sutton, et al.
- 2000
Citation Context: ... robotic soccer competitions, we aim at learning competitive behavior policies. Several research groups have dealt with the task of learning parts of a soccer-playing agent's behavior autonomously [7, 8]. The focus of this paper is on learning a soccer player's basic behaviors, its so-called skills. One of the most important fundamental capabilities of a soccer player is to intercept a running ball a...

18 | Cbr for state value function approximation in reinforcement learning
- Gabel, Riedmiller
- 2005
Citation Context: ...se management and thus for disassociating from some of its experience from time to time, and which predicts other state values using k-nearest neighbor regression, we could achieve clear improvements [3]. With much less memory consumption (|CB| = 2000) we succeeded in making it half the way to the optimum. Certainly, the numbers summarized in Table 1 (lines 4-5) could even have been improved, if we h...
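The k-nearest-neighbor value regression mentioned here can be sketched as averaging the values of the k closest stored cases. The Euclidean metric and uniform weighting are assumptions of this sketch, not necessarily the choices of the cited CBR system.

```python
import math

# Sketch of k-NN state value regression over a case base of (state, value)
# pairs. Euclidean distance and a plain mean are illustrative assumptions.
def knn_value(query, case_base, k=3):
    nearest = sorted(case_base, key=lambda c: math.dist(c[0], query))[:k]
    return sum(v for _, v in nearest) / len(nearest)
```

Bounding the case base (|CB| = 2000 in the text) keeps both the memory footprint and the per-query sort cost constant.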

3 | RoboCup 2001: The Fifth Robotic Soccer World Championships
- Veloso, Balch, et al.
- 2002
Citation Context: ...occer-playing agents compete against one another using the Soccer Server [10], a real-time soccer simulation system. With our competition team, Brainstormers, we have been participating in the RoboCup [15] championship tournaments for several years, whereupon our main research effort is to realize a growing part of the soccer-playing agent's behavior by machine learning techniques. The complexity of th...

2 | Karlsruhe Brainstormers – A Reinforcement Learning Way to Robotic Soccer II
- Merke, Riedmiller
- 2001
Citation Context: ... robotic soccer competitions, we aim at learning competitive behavior policies. Several research groups have dealt with the task of learning parts of a soccer-playing agent's behavior autonomously [7, 8]. The focus of this paper is on learning a soccer player's basic behaviors, its so-called skills. One of the most important fundamental capabilities of a soccer player is to intercept a running ball a...