Learning to solve multiple goals (1997)

by J Karlsson

Results 1 - 10 of 15

Eye Movements for Reward Maximization

by Nathan Sprague, Dana Ballard - In Advances in Neural Information Processing Systems 15, 2003
Cited by 18 (3 self)
Recent eye tracking studies in natural tasks suggest that there is a tight link between eye movements and goal directed motor actions. However, most existing models of human eye movements provide a bottom up account that relates visual attention to attributes of the visual scene. The purpose of this paper is to introduce a new model of human eye movements that directly ties eye movements to the ongoing demands of behavior.

Multiple-goal reinforcement learning with modular Sarsa(0)

by Nathan Sprague, Dana Ballard, 2003
Cited by 11 (6 self)
We present a new algorithm, GM-Sarsa(0), for finding approximate solutions to multiple-goal reinforcement learning problems that are modeled as composite Markov decision processes. According to our formulation different sub-goals are modeled as MDPs that are coupled by the requirement that they share actions. Existing reinforcement learning algorithms address similar problem formulations by first finding optimal policies for the component MDPs, and then merging these into a policy for the composite task. The problem with such methods is that policies that are optimized separately may or may not perform well when they are merged into a composite solution. Instead of searching for optimal policies for the component MDPs in isolation, our approach finds good policies in the context of the composite task.
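The idea in this abstract, sub-goal MDPs coupled by a shared action and learned in the context of the composite task, can be illustrated with a minimal tabular sketch. This is not the authors' code. The class name `ModularSarsa`, the summed-Q arbitration rule, the epsilon-greedy exploration, and all parameter values are assumptions made for illustration; the abstract does not specify them.

```python
import random
from collections import defaultdict


class ModularSarsa:
    """Hypothetical sketch: one tabular Q-function per sub-goal, one shared action set."""

    def __init__(self, n_modules, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # q[i][(state_i, action)] is module i's value estimate (defaults to 0.0).
        self.q = [defaultdict(float) for _ in range(n_modules)]
        self.rng = random.Random(seed)

    def total_value(self, states, action):
        # Assumed arbitration rule: add up every module's Q-value for the action.
        return sum(q[(s, action)] for q, s in zip(self.q, states))

    def select_action(self, states):
        # Epsilon-greedy over the summed values (exploration scheme is an assumption).
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.total_value(states, a))

    def update(self, states, action, rewards, next_states, next_action):
        # On-policy one-step update per module. Each module bootstraps from the
        # action the composite agent actually takes next, not from its own
        # greedy action, so values are learned in the context of the joint policy.
        for q, s, r, s2 in zip(self.q, states, rewards, next_states):
            target = r + self.gamma * q[(s2, next_action)]
            q[(s, action)] += self.alpha * (target - q[(s, action)])
```

Each module sees only its own state and reward component, but all modules are updated with the same executed action pair, which is what distinguishes this from learning each component policy in isolation and merging afterwards.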

Modeling embodied visual behaviors

by Nathan Sprague, Dana Ballard, Al Robinson - ACM Trans. Appl. Percept., 2007
Cited by 9 (3 self)
To make progress in understanding human visuo-motor behavior, we will need to understand its basic components at an abstract level. One way to achieve such an understanding would be to create a model of a human that has a sufficient amount of complexity so as to be capable of generating such behaviors. Recent technological advances have been made that allow progress to be made in this direction. Graphics models that simulate extensive human capabilities can be used as platforms from which to develop synthetic models of visuo-motor behavior. Currently such models can capture only a small portion of a full behavioral repertoire, but for the behaviors that they do model, they can describe complete visuo-motor subsystems at a useful level of detail. The value in doing so is that the body’s elaborate visuo-motor structures greatly simplify the specification of the abstract behaviors that guide them. The net result is that, essentially, one is behaviors at each instant. This paper outlines one such model. A centerpiece of the model uses vision to aid the behavior that has the most to gain from taking environmental measurements. Preliminary tests of the model against human performance in realistic VR environments show that main features of the model show up in human behavior. Categories and Subject Descriptors: I.2.10 [Vision and Scene Understanding]: Perceptual reasoning

Balancing Multiple Sources of Reward in Reinforcement Learning

by Christian R. Shelton - Neural Information Processing Systems-2000, 2000
Cited by 8 (3 self)
For many problems which would be natural for reinforcement learning, the reward signal is not a single scalar value but has multiple scalar components. Examples of such problems include agents with multiple goals and agents with multiple users. Creating a single reward value by combining the multiple components can throw away vital information and can lead to incorrect solutions. We describe the multiple reward source problem and discuss the problems with applying traditional reinforcement learning. We then present a new algorithm for finding a solution and results on simulated environments.
1 Introduction
In the traditional reinforcement learning framework, the learning agent is given a single scalar value of reward at each time step. The goal is for the agent to optimize the sum of these rewards over time (the return). For many applications, there is more information available. Consider the case of a home entertainment system designed to sense which residents are currentl...
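The claim that collapsing reward components into one scalar can discard information has a simple numeric illustration. The sketch below is not the paper's algorithm. The function `scalarize`, the weights, and the two example reward vectors are hypothetical, chosen only to show two clearly different outcomes becoming indistinguishable after a weighted sum.

```python
def scalarize(reward_vector, weights):
    """Collapse a vector of reward components into one scalar by weighted sum."""
    return sum(w * r for w, r in zip(weights, reward_vector))


# Hypothetical outcomes for two reward sources (for example, two users).
conflict = (1.0, -1.0)   # first source is pleased, second is displeased
neutral = (0.0, 0.0)     # neither source cares

equal_weights = (0.5, 0.5)

# Both outcomes map to the same scalar, so a learner that only sees the
# combined value cannot tell a sharp conflict from indifference.
same_scalar = scalarize(conflict, equal_weights) == scalarize(neutral, equal_weights)
print(same_scalar)
```

Other weight choices separate these two outcomes but merge different pairs, so the loss is a property of reducing a vector to a scalar rather than of one particular weighting.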

Strength or Accuracy? - Fitness calculation in learning classifier systems.

by Tim Kovacs - Learning Classifier Systems: An Introduction to Contemporary Research, 2000
Cited by 8 (4 self)
Wilson's XCS is a clear departure from earlier classifier systems in terms of the way it calculates the fitness of classifiers for use in the genetic algorithm. Despite the growing body of work on XCS and the advantages claimed for it there has been no detailed comparison of XCS and traditional strength-based systems. This work takes a step towards rectifying this situation by surveying a number of issues related to the change in fitness. I distinguish different definitions of overgenerality for strength and accuracy-based fitness and analyse some implications of the use of accuracy, including an apparent advantage in addressing the explore/exploit problem. I analyse the formation of strong overgenerals, a major problem for strength-based systems, and illustrate their dependence on biased reward functions. I consider motivations for biasing reward functions in single step environments, and show that non-trivial multi step environments have biased Q-functions. I conclude that XC...

The World-Wide-Mind: Draft Proposal

by Mark Humphrys, 2001
Cited by 6 (6 self)
In the first part of this paper, a change in methodology for the future of AI and Adaptive Behavior research is proposed. It is proposed that researchers construct their agent minds and their agent worlds as servers on the Internet. 3rd parties will use these servers as components in larger systems. In this scheme, any user on the Internet will be able to (a) select multiple minds from different remote "mind servers", (b) select a remote "Action Selection server" to resolve the (inevitable) conflicts between these minds, and (c) run the resulting constructed "society of mind" in the world provided on another "world server". All this without necessarily having to consult with the server authors. This constructed society may now also be presented as just another primitive mind server, ready for reuse by others as a component in a larger system. From the current situation of isolated experiments we will move to a situation where not only can researchers use each other's agent worlds, but they can also use each other's agent minds as components in larger systems. Servers may call other servers, and it is expected that 3rd parties will continuously write wrappers and filters for existing mind servers, overriding and modifying their default behaviour (to produce new, co-existing mind servers). None of this necessarily means that the mind being used ever leaves its server (or that its insides are even made public). Hence the term, the "World-Wide-Mind" (WWM), referring to the fact that the mind may be physically distributed across the world, with parts of the mind at different remote servers. Part of the motivation for the WWM is that if the AI project is to be successful, it may be too big for any single laboratory to complete. So it will be necessary both to decentralise t...

On the Difficulty of Modular Reinforcement Learning for Real-World Partial Programming

by Sooraj Bhat, et al., 2006
Cited by 4 (2 self)
In recent years there has been a great deal of interest in “modular reinforcement learning” (MRL). Typically, problems are decomposed into concurrent subgoals, allowing increased scalability and state abstraction. An arbitrator combines the subagents’ preferences to select an action. In this work, we contrast treating an MRL agent as a set of subagents with the same goal with treating an MRL agent as a set of subagents who may have different, possibly conflicting goals. We argue that the latter is a more realistic description of real-world problems, especially when building partial programs. We address a range of algorithms for single-goal MRL, and leveraging social choice theory, we present an impossibility result for applications of such algorithms to multigoal MRL. We suggest an alternative formulation of arbitration as scheduling that avoids the assumptions of comparability of preference that are implicit in single-goal MRL. A notable feature of this formulation is the explicit codification of the tradeoffs between the subproblems. Finally, we introduce A²BL, a language that encapsulates many of these ideas.
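The comparability assumption mentioned in this abstract can be made concrete with a small sketch. It is not taken from the paper. The function `arbitrate_by_sum`, the two subagents, and their preference numbers are hypothetical. The arbitrator shown simply adds subagent preferences, one common single-goal style rule, and the example shows that rescaling one subagent's values changes the chosen action even though that subagent's own ranking of actions is unchanged.

```python
def arbitrate_by_sum(preferences):
    """Pick the action with the largest summed preference across subagents.

    `preferences` is a list of dicts mapping action name -> numeric preference.
    Summing silently assumes all subagents use a common, comparable scale.
    """
    actions = preferences[0].keys()
    return max(actions, key=lambda a: sum(p[a] for p in preferences))


# Hypothetical subagents with conflicting goals.
subagent_a = {"left": 1.0, "right": 0.0}
subagent_b = {"left": 0.0, "right": 0.6}

# Same ordering of actions for subagent B, expressed on a 10x larger scale.
subagent_b_rescaled = {action: 10.0 * value for action, value in subagent_b.items()}

choice_original = arbitrate_by_sum([subagent_a, subagent_b])
choice_rescaled = arbitrate_by_sum([subagent_a, subagent_b_rescaled])
print(choice_original, choice_rescaled)
```

When subagents pursue different goals with independently designed reward scales, nothing guarantees such numbers are comparable, which is the kind of implicit assumption the scheduling formulation in the abstract is meant to avoid.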

Modeling attention with embodied visual behaviors

by Nathan Sprague, Dana Ballard, Al Robinson - 2005, http://www.cs.rochester.edu/~dana/WalterTheory25.pdf
Cited by 2 (0 self)
Most experimental investigations of visual attention essentially measure a subject’s differential performance with respect to an attended condition and a control. This design makes it difficult to integrate the results of different attentional studies, as they are typically measured under very different experimental conditions. One way of accomplishing such an integration is to create a model of a human that has a useful amount of complexity. Essentially, one is faced with proposing an embodied “operating system” model that can be tested against human performance. Recent technological advances have been made that allow progress to be made in this direction. Graphics models that simulate extensive human capabilities can be used as platforms from which to develop synthetic models of visuo-motor behavior. Currently such models can capture only a small portion of a full behavioral repertoire, but for the behaviors that they do model, they can describe complete visuo-motor subsystems at a level of detail that can be tested against human performance in realistic environments. This paper outlines one such model and shows both that it can produce interesting new hypotheses as to the role of vision and also that it can enhance our understanding of visual attention.

Constructing Complex Minds Through Multiple Authors

by Mark Humphrys, Ciarán O'Leary - In Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior, 2002
The World-Wide-Mind (WWM) was introduced in [Humphrys, 2001]. For a short introduction see [Humphrys, 2001a]. Briefly, this is a scheme for putting animat "minds" online (as WWM "servers") so that large complex minds may be constructed from many remote components. The aim is to address the scaling up of animat research, or how to construct minds more complex than could be written by one author (or one research group).

Reinforcement Learning for Several Environments -- Theory and Applications

by Andreas Matt, Georg Regensburger, 2004
Abstract not found