## Maximizing Reward in a Non-Stationary Mobile Robot Environment (2002)

Venue: Autonomous Agents and Multi-Agent Systems

Citations: 19 (0 self)

### BibTeX

@ARTICLE{Goldberg02maximizingreward,

author = {Dani Goldberg and Maja J Mataric},

title = {Maximizing Reward in a Non-Stationary Mobile Robot Environment},

journal = {Autonomous Agents and Multi-Agent Systems},

year = {2002},

volume = {6},

pages = {2003}

}

### Abstract

The ability of a robot to improve its performance on a task can be critical, especially in poorly known and non-stationary environments where the best action or strategy is dependent upon the current state of the environment. In such systems, a good estimate of the current state of the environment is key to establishing high performance, however quantified. In this paper, we present an approach to state estimation in poorly known and non-stationary mobile robot environments, focusing on its application to a mine collection scenario where performance is quantified using reward maximization. The approach is based on the use of augmented Markov models (AMMs), a sub-class of semi-Markov processes. We have developed an algorithm for incrementally constructing arbitrary-order AMMs online. It is used to capture the interaction dynamics between a robot and its environment in terms of behavior sequences executed during the performance of a task. For the purposes of reward maximization in a non-stationary environment, multiple AMMs monitor events at different timescales and provide statistics used to select the AMM likely to have a good estimate of the environmental state. AMMs with redundant or outdated information are discarded, while attempting to maintain sufficient data to reduce conformation to noise. This approach has been successfully implemented on a mobile robot performing a mine collection task. In the context of this task, we first present experimental results validating our reward maximization performance criterion. We then incorporate our algorithm for state estimation using multiple AMMs, allowing the robot to select appropriate actions based on the estimated state of the environment. The approach is tested first with a physical robot, in a non-stationary environment...
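The abstract describes AMMs constructed incrementally, online, from the behavior sequences a robot executes. The sketch below covers only the simplest first-order case and rests on stated assumptions: each behavior label is one state whose single observation symbol is that label, and the "augmentation" is taken to be visit counts, transition counts, and mean dwell time per state. The class `FirstOrderAMM`, its methods, and the behavior labels are hypothetical; the paper's arbitrary-order construction, which decides when to split states, is not reproduced.

```python
from collections import defaultdict


class FirstOrderAMM:
    """Hypothetical first-order augmented Markov model built online.

    Each distinct behavior label is one state, and that label is the state's
    only observation symbol. The assumed augmentations are per-state visit
    counts, per-transition counts, and a running mean of how many consecutive
    time steps each visit lasted.
    """

    def __init__(self):
        self.visits = defaultdict(int)        # state -> completed visits
        self.transitions = defaultdict(int)   # (src, dst) -> traversal count
        self.mean_dwell = defaultdict(float)  # state -> mean visit length
        self._current = None                  # state of the visit in progress
        self._dwell = 0                       # time steps spent in it so far

    def observe(self, behavior):
        """Feed the label of the behavior active at one time step."""
        if behavior == self._current:
            self._dwell += 1
            return
        self._close_visit()
        if self._current is not None:
            self.transitions[(self._current, behavior)] += 1
        self._current = behavior
        self._dwell = 1

    def _close_visit(self):
        # Fold the finished visit into the state's running mean dwell time.
        if self._current is None:
            return
        s = self._current
        self.visits[s] += 1
        self.mean_dwell[s] += (self._dwell - self.mean_dwell[s]) / self.visits[s]

    def flush(self):
        """Close the visit in progress (e.g. at the end of a trial)."""
        self._close_visit()
        self._current = None
        self._dwell = 0

    def transition_probability(self, src, dst):
        """Maximum-likelihood estimate: traversals src->dst / all exits from src."""
        exits = sum(c for (a, _), c in self.transitions.items() if a == src)
        return self.transitions.get((src, dst), 0) / exits if exits else 0.0


# Hypothetical behavior labels for a mine-collection run, one per time step.
amm = FirstOrderAMM()
for b in ["search", "search", "grasp", "home", "search", "grasp", "grasp", "home"]:
    amm.observe(b)
amm.flush()
print(amm.transition_probability("search", "grasp"))  # 1.0
print(amm.mean_dwell["search"])                       # 1.5
```

A higher-order model would key its transition counts on the last n states rather than only the current one; deciding when that extra history is warranted is what the paper's state-splitting test addresses.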

### Citations

9281 | Introduction to Algorithms - Cormen, Leiserson, et al. |

1072 | Behavior-Based Robotics - Arkin - 1998 |
Citation Context: ...ony pan-tilt-zoom camera on the left, and effectors by the Sarcos Dextrous Arm on the right.) proven to be an effective paradigm for developing single-robot and multi-robot controllers (Mataric, 1997; Arkin, 1998). AMMs with BBC are the synergistic combination integral to the work in this paper. AMMs provide the ability for on-line, real-time model construction in higher-order Markovian systems, while BBC pro...

915 | Planning and acting in partially observable stochastic domains - Kaelbling, Littman - 1998 |
Citation Context: ...ons, or a reward signal to indicate the value of an action taken in a particular state. It also does not capture partial observability, as does a partially observable Markov decision process (POMDP) (Kaelbling et al., 1998). In an AMM, the state of the world is assumed to be known, with one exception: the AMM construction algorithm does not decide a priori on the structure (order) of the Markov model best capable of ca...

864 | Intelligence without reason - Brooks - 1991 |
Citation Context: ...control (BBC) as a substrate for AMM construction provides the remaining representational expressiveness. Behavior-based control (BBC) is a paradigm for constructing controllers for situated agents (Brooks, 1991; Mataric, 1992). In BBC, a controller is organized as a collection of processing modules, called behaviors, that receive input from sensors and/or other behaviors, process the input (possibly modify...

472 | MDPs and Semi-MDPs: A Framework for Temporal Abstraction - Sutton, Precup, et al. - 1999 |
Citation Context: ...industrial manufacturing. The goal is to optimize production using reinforcement learning. Other work has also used such hybrid SMP/MDP models, or semi-Markov decision processes (Sutton et al., 1999; Wang and Mahadevan, 1999), as well as dynamical systems approaches (Beer, 1993; Smithers, 1995) to model the interaction between an agent (robot) and its environment. The basic structure of augmente...

400 | A Tutorial on - Rabiner - 1989 |
Citation Context: ...er, 1993; Smithers, 1995) to model the interaction between an agent (robot) and its environment. The basic structure of augmented Markov models is very similar to that of hidden Markov models (HMMs) (Rabiner, 1989). The difference is that in an AMM, there is only one observation symbol per state, as opposed to a probability distribution over observation symbols in an HMM. In addition, an AMM assumes that the st...
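The context above states the key structural difference between the two model families: one observation symbol per state in an AMM versus an emission distribution in an HMM. A small sketch (function names and the two-state example are hypothetical) shows one consequence: if every HMM state emits only its own label with probability 1, the forward algorithm described by Rabiner (1989) collapses to a plain product of transition probabilities along the observed sequence, so no hidden-state inference is needed.

```python
def forward_likelihood(obs, states, start, trans, emit):
    """HMM forward algorithm: P(obs), summed over all hidden state paths."""
    alpha = {s: start[s] * emit[s].get(obs[0], 0.0) for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans[r][s] for r in states) * emit[s].get(o, 0.0)
            for s in states
        }
    return sum(alpha.values())


def amm_likelihood(obs, start, trans):
    """AMM-style likelihood: each state emits only its own label, so the
    observed symbols *are* the state sequence and no summation is needed."""
    p = start[obs[0]]
    for a, b in zip(obs, obs[1:]):
        p *= trans[a][b]
    return p


states = ["A", "B"]
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
one_symbol_per_state = {s: {s: 1.0} for s in states}  # deterministic emissions

obs = ["A", "A", "B"]
print(forward_likelihood(obs, states, start, trans, one_symbol_per_state))
print(amm_likelihood(obs, start, trans))  # both approximately 0.126
```

With a genuinely probabilistic `emit` table the two functions diverge, which is exactly the case an AMM rules out by assuming the state is known.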

336 | Cooperative mobile robotics: Antecedents and directions - Cao, Fukunaga, et al. - 1995 |

293 | Reinforcement Learning with Selective Perception and Hidden State - McCallum - 1995 |

266 | Applied Probability Models with Optimization Applications - Ross - 1970 |
Citation Context: ...ace adhering to the following property: $P\{X_{m+1} = j \mid X_m = i, X_{m-1} = i_{m-1}, \ldots, X_2 = i_2, X_1 = i_1\} = P\{X_{m+1} = j \mid X_m = i\}$ (1) for all states $i_1, i_2, \ldots, i_{m-1}, i, j$, and all $m \ge 1$ (Ross, 1992). In other words, the probability that the next state $X_{m+1}$ is $j$, given the current state ($X_m = i$) and any past state ($X_1 = i_1, \ldots, X_{m-1} = i_{m-1}$), is dependent only upon the current state $i$. I...

228 | A dynamical systems perspective on agent-environment interactions - Beer - 1995 |
Citation Context: ...einforcement learning. Other work has also used such hybrid SMP/MDP models, or semi-Markov decision processes (Sutton et al., 1999; Wang and Mahadevan, 1999), as well as dynamical systems approaches (Beer, 1993; Smithers, 1995) to model the interaction between an agent (robot) and its environment. The basic structure of augmented Markov models is very similar to that of hidden Markov models (HMMs) (Rabiner,...

202 | Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach - Chrisman - 1992 |

200 | Behavior-based control: Examples from navigation, learning, and group behavior - Matarić - 1997 |

190 | Acting under uncertainty: Discrete bayesian models for mobile robot navigation - Cassandra, Kaelbling, et al. - 1996 |

183 | On Three-Layer Architectures - Gat - 1998 |
Citation Context: ...he large majority of current approaches to robot control, ranging from hybrid to behavior-based, utilize sequences and priorities over the actions and behaviors executed on the robot (Mataric, 1997; Gat, 1998). As long as the behavior of the robot can be decomposed into a behavior space, AMMs apply as a modeling tool. In the next section, we describe how to derive AMM statistics used later in our moving a...

172 | Learning hidden Markov model structure for Information Extraction - Seymore, McCallum, et al. - 1999 |

140 | Hidden Markov Model Induction by Bayesian Model Merging - Stolcke, Omohundro - 1992 |

109 | Heterogeneous Multi-Robot Cooperation - Parker - 1994 |
Citation Context: ...t on constructing the controller. Much research has also been conducted on the performance and properties of robot collection tasks similar to the one we use (Arkin et al., 1993; Arkin and Ali, 1994; Parker, 1994; Fontan and Mataric, 1998; Balch, 2000). The reader is encouraged to see Cao et al. (1997) and Mataric (1995) for a more complete set of references from Artificial Intelligence, Robotics, Distribut...

103 | The Behavior Language; User's Guide - Brooks - 1990 |
Citation Context: ...lor sensor in the gripper, a radio transmitter/receiver for communication and data gathering, and an ultrasound triangulation system for positioning. The robot is programmed in the Behavior Language (Brooks, 1990). Experiments were performed in an 11 × 14 foot rectangular enclosure (the Corrall). The Corrall had up to 36 small plastic cylinders (pucks) of two different colors: clear (representing large mines) ...

68 | Sequential Changepoint Detection in Quality Control and Dynamical Systems - Lai - 1995 |

66 | Automated robot behavior recognition applied to robotic soccer - Han, Veloso - 1999 |

63 | Discrete Mathematical Models with Applications to Social, Biological, and Environmental Problems - Roberts - 1976 |
Citation Context: ...e algorithm using multiple AMMs. One such statistic is the expected number of time steps the system takes to reach a destination state from a given start state. This is known as the mean first passage (Roberts, 1976). Markov chain theory provides tools for easily calculating such expectations. We apply these tools to AMMs, then use the results to calculate two other statistics: the total variance associated with...
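The standard Markov chain recipe for the mean first passage time to a destination state $j$ is a linear system: for every other state $i$, $m_i = 1 + \sum_{k \ne j} P_{ik} m_k$. A pure-Python sketch follows, assuming the AMM's transition probabilities have already been collected into a row-stochastic matrix `P`; the function name and the three-state example are hypothetical, and the paper's variance statistics are not reproduced. Times here are counted in transitions of `P`; for a semi-Markov model with mean holding time $\tau_i$ in state $i$, the 1 on the right-hand side would be replaced by $\tau_i$.

```python
def mean_first_passage(P, target):
    """Expected number of transitions to first reach `target` from each other state.

    Solves m_i = 1 + sum_{k != target} P[i][k] * m_k for every i != target,
    i.e. (I - Q) m = 1, where Q is P with the target's row and column removed.
    Assumes `target` is reachable from every state; otherwise I - Q is singular.
    """
    others = [i for i in range(len(P)) if i != target]
    n = len(others)
    # Augmented matrix [I - Q | 1].
    A = [[(1.0 if r == c else 0.0) - P[others[r]][others[c]] for c in range(n)] + [1.0]
         for r in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("target is not reachable from every state")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [x / scale for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return {others[r]: A[r][n] for r in range(n)}


# Hypothetical 3-state chain (0 = search, 1 = grasp, 2 = home); home is absorbing.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
print(mean_first_passage(P, target=2))  # {0: 4.0, 1: 2.0}
```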

61 | Communication of behavioral state in multi-agent retrieval tasks - Arkin, Balch, et al. - 1993 |
Citation Context: ... robot interacts with its environment, not on constructing the controller. Much research has also been conducted on the performance and properties of robot collection tasks similar to the one we use (Arkin et al., 1993; Arkin and Ali, 1994; Parker, 1994; Fontan and Mataric, 1998; Balch, 2000). The reader is encouraged to see Cao et al. (1997) and Mataric (1995) for a more complete set of references from Artificia...

59 | Hierarchic social entropy: An information theoretic measure of robot group diversity - Balch - 2000 |

56 | Mathematical Statistics - Freund - 1992 |
Citation Context: ...inomial confidence intervals (step 6). Given a fixed amount of data, there is a tradeoff between incorrectly splitting states (Type I error) and not splitting states that should be split (Type II error) (Freund, 1992). In the extreme, being 100% certain that no erroneous state splitting occurs means performing no state splitting at all. The computational complexity of the algorithm per input symbol is at most O(N...

52 | Multiple Objective Action Selection & Behaviour Fusion Using Voting - Pirjanian - 1998 |
Citation Context: ...ronously and in parallel, simultaneously receiving input and producing output. An action selection mechanism prevents conflicts when signals are simultaneously sent to the same actuators or behaviors (Pirjanian, 1998). Behavior-based control has...
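The context above notes that an action selection mechanism resolves conflicts when several behaviors drive the same actuators, citing Pirjanian's voting-based approach. A minimal sketch of weighted voting follows, assuming hypothetical behaviors (`avoid_obstacle`, `seek_puck`), sensor keys, action names, and weights. It illustrates the general idea only; the excerpt does not specify the arbitration scheme used on the paper's robot.

```python
from typing import Callable, Dict, List

Votes = Dict[str, float]
Behavior = Callable[[dict], Votes]  # sensor readings -> votes over actions


def avoid_obstacle(sensors: dict) -> Votes:
    if sensors["obstacle_ahead"]:
        return {"forward": -1.0, "turn_left": 0.5, "turn_right": 0.5}
    return {"forward": 0.2}


def seek_puck(sensors: dict) -> Votes:
    bearing = sensors["puck_bearing"]  # hypothetical: > 0 means puck is to the right
    if bearing > 0:
        return {"turn_right": 1.0}
    if bearing < 0:
        return {"turn_left": 1.0}
    return {"forward": 1.0}


def select_action(behaviors: List[Behavior], weights: List[float],
                  actions: List[str], sensors: dict) -> str:
    """Weighted voting: sum each behavior's votes times its weight and return
    the action with the highest total (ties go to the earlier action)."""
    totals = {a: 0.0 for a in actions}
    for behavior, weight in zip(behaviors, weights):
        for action, vote in behavior(sensors).items():
            if action in totals:
                totals[action] += weight * vote
    return max(actions, key=lambda a: totals[a])


ACTIONS = ["forward", "turn_left", "turn_right"]
controller = [avoid_obstacle, seek_puck]
weights = [2.0, 1.0]  # obstacle avoidance outweighs puck seeking
print(select_action(controller, weights, ACTIONS,
                    {"obstacle_ahead": True, "puck_bearing": 0.0}))   # turn_left
print(select_action(controller, weights, ACTIONS,
                    {"obstacle_ahead": False, "puck_bearing": 0.3}))  # turn_right
```

Voting lets every behavior contribute to each decision instead of one behavior suppressing the rest; a strict priority scheme would be the other common choice.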

51 | Informal Networks - Krackhardt, Hanson |
Citation Context: ...used in updating mean and variance estimates on the weights of feed-forward neural networks. Meiosis Networks use these mean and variance estimates to decide when to split states in the hidden layer (Hanson, 1990). A complementary notion is that of state merging, explored by Stolcke and Omohundro (1993) for learning the number of states and topology of an HMM. The approach begins by constructing a specialized...

50 | Integration of Reactive and Telerobotic Control in Multi-agent Robotic Systems - Arkin |
Citation Context: ...h its environment, not on constructing the controller. Much research has also been conducted on the performance and properties of robot collection tasks similar to the one we use (Arkin et al., 1993; Arkin and Ali, 1994; Parker, 1994; Fontan and Mataric, 1998; Balch, 2000). The reader is encouraged to see Cao et al. (1997) and Mataric (1995) for a more complete set of references from Artificial Intelligence, Robot...

45 | Estimating the current mean of a normal distribution which is subjected - Chernoff, Zacks - 1964 |

43 | Territorial multi-robot task division - Fontan, Mataric - 1998 |

38 | Discrete event systems for autonomous mobile agents - Kosecka, Bajcsy - 1993 |

17 | Behavior-based systems: Key properties and implications - Matarić - 1992 |

12 | Issues and Approaches - Matarić - 1995 |

10 | Evaluating the Dynamics of Agent-Environment Interaction - Goldberg - 2001 |

10 | Optimizing Production Manufacturing using Reinforcement Learning - Mahadevan, Theocharous - 1998 |

8 | What the dynamics of adaptive behavior and cognition might look like in agent–environment interaction systems - Smithers - 1994 |
Citation Context: ... learning. Other work has also used such hybrid SMP/MDP models, or semi-Markov decision processes (Sutton et al., 1999; Wang and Mahadevan, 1999), as well as dynamical systems approaches (Beer, 1993; Smithers, 1995) to model the interaction between an agent (robot) and its environment. The basic structure of augmented Markov models is very similar to that of hidden Markov models (HMMs) (Rabiner, 1989). The diffe...

5 | A Normal Approximation for Binomial - Pratt - 1968 |

3 | Approximate binomial confidence limits - Blyth - 1986 |
Citation Context: ... $n$, that have $t_{n,2} = s_c$ and $t_{n,3:n+1}$ equal to the previous $n-1$ states. Based on this total, calculate the transition probabilities for $s_c$ and their associated binomial confidence intervals (Blyth, 1986). If the actual number of nth-order traversals does not fall outside of the confidence intervals, then there is no nth-order inconsistency in the model. If all orders have been checked, go to step 7. ...
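Step 6 above asks whether traversals of state $s_c$ that share a particular history of previous states go to a successor more or less often than the current model predicts. One way to read that test is sketched below, with hypothetical function names: treat the successor count among $N$ history-specific traversals as Binomial($N$, $p$) under the model's transition probability $p$, and flag an nth-order inconsistency when the observed count falls outside a central confidence region. Exact binomial tails are used here for simplicity; Blyth (1986) concerns approximate confidence limits, and the paper's interval construction may differ. The `confidence` parameter is where the Type I / Type II tradeoff noted in the Freund (1992) context appears: at 1.0 the region covers every possible count, so no state is ever split.

```python
from math import comb


def binomial_acceptance_region(n, p, confidence=0.95):
    """Central region [lo, hi] for a Binomial(n, p) count.

    Each tail outside the region has probability at most (1 - confidence) / 2,
    computed from the exact binomial pmf.
    """
    alpha = (1.0 - confidence) / 2.0
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    lo, tail = 0, 0.0
    while lo < n and tail + pmf[lo] <= alpha:
        tail += pmf[lo]
        lo += 1
    hi, tail = n, 0.0
    while hi > 0 and tail + pmf[hi] <= alpha:
        tail += pmf[hi]
        hi -= 1
    return lo, hi


def has_nth_order_inconsistency(model_prob, traversals, observed, confidence=0.95):
    """Hypothetical step-6 style test.

    model_prob: current model's probability of the transition s_c -> successor
    traversals: visits to s_c preceded by one particular history of states
    observed:   how many of those visits actually went to that successor
    """
    lo, hi = binomial_acceptance_region(traversals, model_prob, confidence)
    return not (lo <= observed <= hi)


print(binomial_acceptance_region(20, 0.5))        # (6, 14)
print(has_nth_order_inconsistency(0.5, 20, 18))   # True: candidate for a split
print(has_nth_order_inconsistency(0.5, 20, 12))   # False
print(binomial_acceptance_region(20, 0.5, 1.0))   # (0, 20): never split
```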

1 | Coordinating Mobile Robot Group Behavior Using a Model of Interaction Dynamics - Goldberg, Mataric - 1999 |

1 | Detecting Regime Changes with a Mobile Robot using Multiple Models - Goldberg, Mataric - 2001 |

1 | Denumerable Markov Chains. D. Van Nostrand Company, Inc. - Kemeny, Snell, et al. - 1966 |

1 | A Study of the Accuracy of Some Approximations for t - Ling - 1978 |

1 | Learning from History for Behavior-Based Mobile Robots in Non-stationary Conditions. Autonomous Robots - Michaud, Mataric - 1998 |

1 | Numerical Recipes in C: The Art of Scientific Computing - Press et al. - 1992 |