Results 1-10 of 1,086
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
 Journal of Artificial Intelligence Research
, 2000
"... This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. Th ..."
Abstract

Cited by 439 (6 self)
The decomposition, known as the MAXQ decomposition, has both a procedural semantics, as a subroutine hierarchy, and a declarative semantics, as a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling
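A minimal sketch of how the additive decomposition assembles a value, assuming the usual MAXQ form Q(i, s, a) = V(a, s) + C(i, s, a). Here V(a, s) is the value of executing child subtask a from state s (for a primitive action, its expected one-step reward), V(i, s) of a composite task is the maximum of Q(i, s, a) over its children, and the completion value C(i, s, a) is the expected reward for finishing parent task i after a terminates. The task names, the single state "s0" and all numbers are hypothetical, and the completion table is given rather than learned, so this shows only the value-function structure, not the paper's learning algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy: composite task -> child subtasks.
# Any name that is not a key here is treated as a primitive action.
CHILDREN: Dict[str, List[str]] = {
    "Root": ["GoLeft", "GoRight"],
    "GoLeft": ["step_left", "wait"],
    "GoRight": ["step_right", "wait"],
}

# Hypothetical expected one-step rewards of primitive actions, per state.
PRIMITIVE_REWARD: Dict[Tuple[str, str], float] = {
    ("step_left", "s0"): -1.0,
    ("step_right", "s0"): -1.0,
    ("wait", "s0"): -0.5,
}

# Hypothetical completion values C(parent, state, child). Whatever happens
# after the child terminates (including state changes) is folded into C.
COMPLETION: Dict[Tuple[str, str, str], float] = {
    ("Root", "s0", "GoLeft"): 5.0,
    ("Root", "s0", "GoRight"): 2.0,
    ("GoLeft", "s0", "step_left"): 0.0,
    ("GoLeft", "s0", "wait"): -2.0,
    ("GoRight", "s0", "step_right"): 0.0,
    ("GoRight", "s0", "wait"): -1.0,
}


def value(task: str, state: str) -> float:
    """V(task, state): primitive reward, or max over children of Q."""
    if task not in CHILDREN:
        return PRIMITIVE_REWARD[(task, state)]
    return max(q_value(task, state, child) for child in CHILDREN[task])


def q_value(parent: str, state: str, child: str) -> float:
    """Additive decomposition: Q(i, s, a) = V(a, s) + C(i, s, a)."""
    return value(child, state) + COMPLETION[(parent, state, child)]


print(value("Root", "s0"))  # 4.0
```

The root's value is computed purely from the children's values plus completion terms, which is the sense in which the target MDP's value function is an additive combination of the smaller MDPs' value functions.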
The MAXQ Method for Hierarchical Reinforcement Learning
 In Proceedings of the Fifteenth International Conference on Machine Learning
, 1998
"... This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decomposition of the value function. The MAXQ decomposition has both a procedural semantics, as a subroutine hierarchy, and a declarative semantics, as a representation of the value function of a hierarchi ..."
Abstract

Cited by 146 (5 self)
hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. Conditions under which the MAXQ decomposition can represent the optimal value function are derived. The paper defines a hierarchical Q-learning algorithm, proves its
Diversity and Multiplexing: A Fundamental Tradeoff in Multiple Antenna Channels
 IEEE Trans. Inform. Theory
, 2002
"... Multiple antennas can be used for increasing the amount of diversity or the number of degrees of freedom in wireless communication systems. In this paper, we propose the point of view that both types of gains can be simultaneously obtained for a given multiple antenna channel, but there is a fund ..."
Abstract

Cited by 1143 (20 self)
Multiple antennas can be used for increasing the amount of diversity or the number of degrees of freedom in wireless communication systems. In this paper, we propose the point of view that both types of gains can be simultaneously obtained for a given multiple antenna channel, but there is a fundamental tradeoff between how much of each any coding scheme can get. For the richly scattered Rayleigh fading channel, we give a simple characterization of the optimal tradeoff curve and use it to evaluate the performance of existing multiple antenna schemes.
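The abstract does not reproduce the curve itself. As the result is commonly quoted, for $M$ transmit and $N$ receive antennas, i.i.d. Rayleigh fading and a block length of at least $M + N - 1$, the optimal tradeoff $d^*(r)$ is the piecewise-linear curve joining the points $(k, (M-k)(N-k))$ for $k = 0, 1, \dots, \min(M, N)$. The sketch below, with the hypothetical function name `dmt_optimal`, only evaluates that curve under those stated assumptions; it is an illustration, not code from the paper.

```python
import math


def dmt_optimal(r: float, m: int, n: int) -> float:
    """Optimal diversity gain d*(r) for an m x n i.i.d. Rayleigh channel.

    Assumes block length >= m + n - 1. Linearly interpolates between the
    corner points (k, (m - k)(n - k)) for integer k = 0, ..., min(m, n).
    """
    if m < 1 or n < 1:
        raise ValueError("need at least one transmit and one receive antenna")
    k_max = min(m, n)
    if not 0 <= r <= k_max:
        raise ValueError("multiplexing gain r must lie in [0, min(m, n)]")
    # Left corner of the linear segment containing r (r = k_max uses the last one).
    k = min(math.floor(r), k_max - 1)
    d_left = (m - k) * (n - k)
    d_right = (m - k - 1) * (n - k - 1)
    return d_left + (r - k) * (d_right - d_left)


print(dmt_optimal(0.5, 2, 2))  # 2.5
```

At $r = 0$ this gives the full diversity $MN$, and at $r = \min(M, N)$ it gives zero, matching the two extremes the abstract contrasts (pure diversity versus pure degrees of freedom).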
Synchronization and linearity: an algebra for discrete event systems
, 2001
"... The first edition of this book was published in 1992 by Wiley (ISBN 0 471 93609 X). Since this book is now out of print, and to answer the request of several colleagues, the authors have decided to make it available freely on the Web, while retaining the copyright, for the benefit of the scientific ..."
Abstract

Cited by 369 (11 self)
The first edition of this book was published in 1992 by Wiley (ISBN 0 471 93609 X). Since this book is now out of print, and to answer the request of several colleagues, the authors have decided to make it available freely on the Web, while retaining the copyright, for the benefit of the scientific community. Copyright Statement This electronic document is in PDF format. One needs Acrobat Reader (available freely for most platforms from the Adobe web site) to benefit from the full interactive machinery: using the package hyperref by Sebastian Rahtz, the table of contents and all LaTeX cross-references are automatically converted into clickable hyperlinks, bookmarks are generated automatically, etc. So, do not hesitate to click on references to equation or section numbers, on items of the table of contents and of the index, etc. One may freely use and print this document for one's own purpose or even distribute it freely, but not commercially, provided it is distributed in its entirety and without modifications, including this preface and copyright statement. Any use of the contents should be acknowledged according to the standard scientific practice. The
Algorithms for Sequential Decision Making
, 1996
"... Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of ..."
Abstract

Cited by 212 (8 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is "maximize a long-run measure of reward," and "I" is an automated planning or learning system (agent). In particular,
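For the setting the abstract describes (finite states, finite actions, maximizing a long-run measure of reward), one standard planning technique is value iteration under a discounted-reward criterion. The sketch below assumes a tabular model and a discount factor `gamma`; the model format, the function name `value_iteration` and the two-state example MDP are hypothetical, and this is not presented as the thesis's own algorithm.

```python
from typing import Dict, List, Tuple

# Tabular model: P[s][a] is a list of (probability, next_state, reward).
Model = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q(P: Model, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P: Model, gamma: float = 0.9, tol: float = 1e-10):
    """Return (V, policy) maximizing expected discounted reward."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup, updating V in place.
            best = max(q(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q(P, V, s, a, gamma)) for s in P}
    return V, policy


# Hypothetical MDP: in state 1, action 0 ("stay") earns 2 per step forever;
# in state 0, action 1 earns 1 and moves to state 1.
EXAMPLE: Model = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}
V, policy = value_iteration(EXAMPLE, gamma=0.9)
print(policy)  # {0: 1, 1: 0}
```

Here "should" is made concrete as "maximize expected discounted reward"; other long-run criteria (for example average reward) would need a different backup.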
Simulating ratios of normalizing constants via a simple identity: A theoretical exploration
 Statistica Sinica
, 1996
"... Abstract: Let $p_i(w)$, $i = 1, 2$, be two densities with common support where each density is known up to a normalizing constant: $p_i(w) = q_i(w)/c_i$. We have draws from each density (e.g., via Markov chain Monte Carlo), and we want to use these draws to simulate the ratio of the normalizing constants, $c_1/c_2$. ..."
Abstract

Cited by 180 (3 self)
Abstract: Let $p_i(w)$, $i = 1, 2$, be two densities with common support where each density is known up to a normalizing constant: $p_i(w) = q_i(w)/c_i$. We have draws from each density (e.g., via Markov chain Monte Carlo), and we want to use these draws to simulate the ratio of the normalizing constants, $c_1/c_2$. Such a computational problem is often encountered in likelihood and Bayesian inference, and arises in fields such as physics and genetics. Many methods proposed in statistical and other literature (e.g., computational physics) for dealing with this problem are based on various special cases of the following simple identity:
$$\frac{c_1}{c_2} = \frac{E_2[q_1(w)\,\alpha(w)]}{E_1[q_2(w)\,\alpha(w)]}.$$
Here $E_i$ denotes the expectation with respect to $p_i$ ($i = 1, 2$), and $\alpha$ is an arbitrary function such that the denominator is nonzero. A main purpose of this paper is to provide a theoretical study of the usefulness of this identity, with focus on (asymptotically) optimal and practical choices of $\alpha$. Using a simple but informative example, we demonstrate that with sensible (not necessarily optimal) choices of $\alpha$, we can reduce the simulation error by orders of magnitude when compared to the conventional importance sampling method, which corresponds to $\alpha = 1/q_2$. We also introduce several generalizations of this identity for handling more complicated settings (e.g., estimating several ratios simultaneously) and pose several open problems that appear to have practical as well as theoretical value. Furthermore, we discuss related theoretical and empirical work.
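The identity becomes an estimator by replacing each expectation with a sample mean over the corresponding draws. The sketch below assumes two hypothetical unnormalized Gaussian densities, $q_1$ for $N(0, 1)$ and $q_2$ for $N(0, 2^2)$, chosen so the true ratio $c_1/c_2 = \sqrt{2\pi}/(2\sqrt{2\pi}) = 0.5$ is known, and uses exact independent draws in place of MCMC output. It compares the importance-sampling choice $\alpha = 1/q_2$ with the geometric choice $\alpha = (q_1 q_2)^{-1/2}$, used here only as one simple non-optimal $\alpha$; the function name `ratio_estimate` is hypothetical and the numbers it prints are not results from the paper.

```python
import math
import random


def q1(w: float) -> float:
    # Unnormalized N(0, 1) density; true c1 = sqrt(2*pi).
    return math.exp(-0.5 * w * w)


def q2(w: float) -> float:
    # Unnormalized N(0, 2^2) density; true c2 = 2*sqrt(2*pi).
    return math.exp(-0.5 * (w / 2.0) ** 2)


def ratio_estimate(alpha, draws1, draws2) -> float:
    """Estimate c1/c2 as mean_{w~p2}[q1*alpha] / mean_{w~p1}[q2*alpha]."""
    num = sum(q1(w) * alpha(w) for w in draws2) / len(draws2)
    den = sum(q2(w) * alpha(w) for w in draws1) / len(draws1)
    return num / den


rng = random.Random(0)
draws1 = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # draws from p1
draws2 = [rng.gauss(0.0, 2.0) for _ in range(20000)]  # draws from p2

# alpha = 1/q2: the denominator is exactly 1, giving plain importance sampling.
importance = ratio_estimate(lambda w: 1.0 / q2(w), draws1, draws2)
# alpha = (q1*q2)**-0.5, written in closed form (exp(5 w^2 / 16)) to avoid
# dividing by an underflowed product in the far tails.
geometric = ratio_estimate(lambda w: math.exp(0.3125 * w * w), draws1, draws2)
print(importance, geometric)  # both should be close to the true ratio 0.5
```

Swapping in a different $\alpha$ only changes the lambda passed to `ratio_estimate`; the choice matters for the variance of the estimate, which is the question the paper studies.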