CiteSeerX
Results 1 - 10 of 1,983

A reinforcement procedure leading to correlated equilibrium

by Sergiu Hart, Andreu Mas-Colell - In: Debreu, G., Neuefeind, W., Trockel, W. (Eds.), Economic
"... Abstract. We consider repeated games where at any period each player knows only his set of actions and the stream of payoffs that he has received in the past. He knows neither his own payoff function, nor the characteristics of the other players (how many there are, their strategies and payoffs). In ..."
Cited by 28 (1 self)
In this context, we present an adaptive procedure for play, called “modified-regret-matching”, which is interpretable as a stimulus-response or reinforcement procedure, and which has the property that any limit point of the empirical distribution of play is a correlated equilibrium of the stage game.
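The regret-matching idea behind this procedure can be sketched in a few lines. This is a simplified, full-information variant for one player (Hart and Mas-Colell's modified procedure assumes much less: the player observes only her own realized payoffs); the 2x2 payoff matrix, the uniformly random opponent, and the inertia parameter mu are assumptions for illustration only.

```python
import random

random.seed(0)

# Row player's payoffs in an assumed 2x2 game (illustrative, not from the paper).
U = [[6, 2],
     [7, 0]]

def play(T=2000, mu=20.0):
    """Regret matching with inertia: switch away from the last action
    with probability proportional to average positive regret."""
    regret = [0.0, 0.0]   # cumulative regret for each action
    counts = [0, 0]
    a = 0                 # last action played
    for t in range(1, T + 1):
        # probability of switching to k != a is (average positive regret)/mu
        probs = [max(r, 0.0) / (mu * t) for r in regret]
        probs[a] = 0.0
        probs[a] = max(0.0, 1.0 - sum(probs))   # inertia: stay otherwise
        a = random.choices([0, 1], weights=probs)[0]
        b = random.randrange(2)                 # opponent plays uniformly (assumed)
        counts[a] += 1
        # regret: what each alternative k would have earned minus what a earned
        for k in range(2):
            regret[k] += U[k][b] - U[a][b]
    return counts
```

Against this particular opponent, play concentrates on the action with the lower accumulated regret; the convergence guarantee in the paper concerns the joint empirical distribution of play, which this single-player sketch does not track.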

Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

by Thomas G. Dietterich - Journal of Artificial Intelligence Research, 2000
"... This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. Th ..."
Cited by 443 (6 self)
The decomposition, known as the MAXQ decomposition, has both a procedural semantics (as a subroutine hierarchy) and a declarative semantics (as a representation of the value function of a hierarchical policy). MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling
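The additive decomposition the abstract describes can be shown with a toy hierarchy: the value of invoking subtask a from parent task i in state s splits as Q(i, s, a) = V(a, s) + C(i, s, a), where V(a, s) is the value of completing a itself and C(i, s, a) is the completion value of i afterwards. The tiny task hierarchy and the numbers below are invented for illustration; they are not the paper's Taxi domain.

```python
# Assumed toy hierarchy: root -> go -> {north, south}, in a single state 0.
V_prim = {('north', 0): -1.0, ('south', 0): -1.0}      # primitive action values
C = {('root', 0, 'go'): 5.0, ('go', 0, 'north'): 2.0}  # completion terms
children = {'root': ['go'], 'go': ['north', 'south']}  # subroutine hierarchy

def V(task, s):
    if task not in children:               # primitive: value is expected reward
        return V_prim.get((task, s), 0.0)
    return max(Q(task, s, a) for a in children[task])

def Q(parent, s, a):
    # MAXQ decomposition: value of the subtask plus the completion value
    return V(a, s) + C.get((parent, s, a), 0.0)
```

Evaluating V('root', 0) recursively sums the value of the best child at each level, which is the "procedural semantics as a subroutine hierarchy" in miniature.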

Policy gradient methods for reinforcement learning with function approximation.

by Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour - In NIPS, 1999
"... Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Cited by 439 (20 self)
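The alternative the abstract points to, representing the policy directly and following the gradient of expected reward, can be sketched with a REINFORCE-style update on a two-armed bandit. The bandit's mean rewards, the softmax parameterization, and the learning rates are assumptions for illustration, not the paper's construction.

```python
import math
import random

random.seed(1)

mean_reward = [0.2, 0.8]   # assumed true mean rewards of the two arms
theta = [0.0, 0.0]         # policy parameters: one preference per action

def softmax(prefs):
    z = [math.exp(p - max(prefs)) for p in prefs]  # shift for stability
    s = sum(z)
    return [x / s for x in z]

alpha, baseline = 0.1, 0.0
for _ in range(2000):
    pi = softmax(theta)
    a = random.choices([0, 1], weights=pi)[0]
    r = mean_reward[a] + random.gauss(0.0, 0.1)
    baseline += 0.01 * (r - baseline)                # variance-reducing baseline
    for k in range(2):
        grad_log = (1.0 if k == a else 0.0) - pi[k]  # d log pi(a) / d theta_k
        theta[k] += alpha * (r - baseline) * grad_log
```

The update moves the parameters along an unbiased estimate of the gradient of expected reward, so the policy itself improves without ever fitting a value function.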

EUROPEAN JOURNAL OF BEHAVIOR ANALYSIS

by Iver H. Iversen
"... reinforcement procedures ..."

CONTINGENT REINFORCEMENT PROCEDURES (TOKEN ECONOMY) ON A "CHRONIC" PSYCHIATRIC WARD

by John M. Atthowe, Leonard Krasner
"... An 86-bed closed ward in a Veterans Administration hospital was used in a 2-yr. study involving the application of a "token economy. " For the patients, labeled chronic schizophrenics or brain damaged, every important phase of ward life was incorporated within a systematic contingency prog ..."
The results at the end of a year indicated a significant increase in the performance of reinforced "desirable" behaviors and a general improvement in patient initiative, responsibility, and social interaction. Although investigators may disagree as to what specific strategies or tactics to pursue

Genotypic Influence on Aversive Conditioning in Honeybees, Using a Novel Thermal Reinforcement Procedure

by Pierre Junca, Julie Carcaud, Sibyle Moulin, Lionel Garnery, Jean-Christophe S, 2013
In Pavlovian conditioning, animals learn to associate initially neutral stimuli with positive or negative outcomes, leading to appetitive and aversive learning respectively. The honeybee (Apis mellifera) is a prominent invertebrate model for studying both versions of olfactory learning and for unraveling the influence of genotype. As a queen bee mates with about 15 males, her worker offspring belong to as many, genetically-different patrilines. While the genetic dependency of appetitive learning is well established in bees, it is not the case for aversive learning, as a robust protocol was only developed recently. In the original conditioning of the sting extension response (SER), bees learn to associate an odor (conditioned stimulus, CS) with an electric shock (unconditioned stimulus, US). This US is however not a natural stimulus for bees, which may represent a potential caveat for dissecting the genetics underlying aversive learning. We thus first tested heat as a potential new US for SER conditioning. We show that thermal stimulation of several sensory structures on the bee’s body triggers the SER, in a temperature-dependent manner. Moreover, heat applied to the antennae, mouthparts or legs is an efficient US for SER conditioning. Then, using microsatellite analysis, we analyzed heat sensitivity and aversive learning performances in ten worker patrilines issued from a naturally inseminated queen. We demonstrate a strong influence of genotype on aversive learning, possibly indicating the existence of a genetic determinism of this capacity. Such determinism could be instrumental for efficient task partitioning within the hive.

Relative and absolute strength of response as a function of frequency of reinforcement

by R. J. Herrnstein - J. Experimental Analysis Behavior, 1961
"... pigeons behave on a concurrent schedule under which they peck at either of two response-keys. The significant finding of this investigation was that the relative frequency of responding to each of the keys may be controlled within narrow limits by adjustments in an independent variable. In brief, ..."
Cited by 226 (0 self)
The requirement for reinforcement in this procedure is the emission of a minimum number of pecks to each of the keys. The pigeon receives food when it completes the requirement on both keys. The frequency of responding to each key was a close approximation to the minimum requirement. The present experiment
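The regularity this paper established, Herrnstein's matching law, says in its simplest form that the proportion of responses emitted on each key equals the proportion of reinforcements obtained on it. A one-function sketch, with illustrative reinforcement rates that are not from the paper:

```python
def matching_prediction(r1, r2):
    """Predicted share of responses on key 1, given reinforcement
    rates r1 and r2 (e.g. reinforcements per hour) on the two keys."""
    return r1 / (r1 + r2)

# e.g. 40 vs 20 reinforcements/hour -> 2/3 of pecks predicted on key 1
share = matching_prediction(40, 20)
```

Later quantitative work adds bias and sensitivity parameters, but this strict-matching form is the relation the 1961 experiment reports.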

A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli

by John M. Pearce - Psychological Review, 1980
"... Several formal models of excitatory classical conditioning are reviewed. It is suggested that a central problem for all of them is the explanation of cases in which learning does not occur in spite of the fact that the conditioned stimulus is a signal for the reinforcer. We propose a new model that ..."
Cited by 290 (11 self)
that deals with this problem by specifying that certain procedures cause a conditioned stimulus (CS) to lose effectiveness; in particular, we argue that a CS will lose associability when its consequences are accurately predicted. In contrast to other current models, the effectiveness of the reinforcer
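The mechanism described, a CS losing associability once its consequences are accurately predicted, can be sketched with a simplified Pearce-Hall-style update: learning is scaled by an associability term that itself tracks the prediction error. The update rule and parameter values below are illustrative assumptions, not the paper's full model.

```python
def conditioning(trials, S=0.5, lam=1.0):
    """S: stimulus intensity parameter; lam: asymptote set by the reinforcer."""
    V, alpha = 0.0, 1.0           # associative strength, associability
    history = []
    for _ in range(trials):
        V += S * alpha * lam      # learning scaled by current associability
        alpha = abs(lam - V)      # associability tracks the prediction error
        history.append((V, alpha))
    return history
```

As V approaches the asymptote lam, the prediction error shrinks and alpha falls toward zero, so a fully predicted CS stops supporting new learning, which is exactly the loss of effectiveness the abstract argues for.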

By carrot or by stick: Cognitive reinforcement learning in parkinsonism

by Michael J. Frank, Lauren C. Seeberger, Randall C. O’Reilly - Science, 2004
"... To what extent do we learn from the positive versus negative outcomes of our decisions? The neuromodulator dopamine plays a key role in these reinforcement learning processes. Patients with Parkinson’s disease, who have depleted dopamine in the basal ganglia, are impaired in tasks that require learn ..."
Cited by 177 (25 self)
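The positive-versus-negative asymmetry the abstract asks about is commonly modeled with separate learning rates for gains and losses in a simple action-value learner. The two-choice task, reward probabilities, and rates below are illustrative assumptions, not the paper's experimental design.

```python
import random

random.seed(2)

def learn(alpha_gain, alpha_loss, p_reward=(0.8, 0.2), trials=500):
    """Value learner with distinct rates for positive and negative
    prediction errors ('carrot' vs 'stick' sensitivity)."""
    q = [0.0, 0.0]                               # action values
    for _ in range(trials):
        a = 0 if q[0] >= q[1] else 1             # mostly greedy choice
        if random.random() < 0.1:
            a = random.randrange(2)              # occasional exploration
        r = 1.0 if random.random() < p_reward[a] else -1.0
        delta = r - q[a]                         # reward prediction error
        rate = alpha_gain if delta > 0 else alpha_loss
        q[a] += rate * delta
    return q
```

Setting alpha_gain above alpha_loss yields a learner shaped mainly by rewards, and the reverse a learner shaped mainly by punishments, the dissociation the study probes with dopamine medication.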

The MAXQ Method for Hierarchical Reinforcement Learning

by Thomas G. Dietterich - In Proceedings of the Fifteenth International Conference on Machine Learning, 1998
"... This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decomposition of the value function. The MAXQ decomposition has both a procedural semantics---as a subroutine hierarchy---and a declarative semantics---as a representation of the value function of a hierarchi ..."
Cited by 146 (5 self)

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University