Results 1–10 of 46
Recent advances in hierarchical reinforcement learning
, 2003
Cited by 229 (24 self)
A preliminary unedited version of this paper was incorrectly published as part of Volume
Reinforcement Learning for Long-Run Average Cost
, 2004
Cited by 21 (6 self)
A large class of sequential decision-making problems under uncertainty can be modeled as Markov and semi-Markov decision problems (SMDPs), when their underlying probability structure has a Markov chain. They may be solved by using classical dynamic programming (DP) methods. However, DP methods suffer from the curse of dimensionality and break down rapidly in the face of large state spaces. In addition,
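The average-cost SMDP setting described in this abstract can be attacked model-free. Below is a minimal tabular sketch in the spirit of SMART-style average-reward Q-learning; the two-state SMDP, its rewards, and its sojourn-time distributions are invented purely for illustration and are not from the paper.

```python
import random

def smart_q(transition, n_steps=20000, alpha=0.1, eps=0.1, seed=0):
    """Tabular average-reward Q-learning for an SMDP (a SMART-style sketch).

    `transition(s, a, rng)` returns (next_state, reward, sojourn_time).
    """
    rng = random.Random(seed)
    states, actions = [0, 1], [0, 1]
    Q = {(s, a): 0.0 for s in states for a in actions}
    total_r, total_t, s = 0.0, 1e-9, 0
    for _ in range(n_steps):
        greedy = rng.random() >= eps
        a = max(actions, key=lambda b: Q[(s, b)]) if greedy else rng.choice(actions)
        s2, r, tau = transition(s, a, rng)
        rho = total_r / total_t               # running average-reward-rate estimate
        target = r - rho * tau + max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        if greedy:                            # update the rate estimate on greedy steps only
            total_r += r
            total_t += tau
        s = s2
    return Q

# A made-up two-state SMDP: action 1 pays more but holds the state longer.
def toy_transition(s, a, rng):
    tau = (2.0 if a == 1 else 0.5) + rng.expovariate(1.0)
    reward = 1.0 if a == 1 else 0.3
    return rng.choice([0, 1]), reward, tau
```

The key SMDP twist is the `r - rho * tau` term: reward is charged against the estimated average reward *rate* times the sojourn time, which is what distinguishes this from ordinary discounted Q-learning.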
A new QoS provisioning method for adaptive multimedia in cellular wireless networks
 in: Proc. IEEE Infocom '04, Hong Kong
, 2004
Cited by 13 (3 self)
Abstract—Future wireless networks are designed to support adaptive multimedia by controlling individual ongoing flows to increase or decrease their bandwidths in response to changes in traffic load. There is growing interest in quality-of-service (QoS) provisioning under this adaptive multimedia framework, in which a bandwidth adaptation algorithm needs to be used in conjunction with the call admission control algorithm. This paper presents a novel method for QoS provisioning via average reward reinforcement learning in conjunction with stochastic approximation, which can maximize the network revenue subject to several predetermined QoS constraints. Unlike other model-based algorithms (e.g., linear programming), our scheme does not require explicit state transition probabilities, and therefore, the assumptions behind the underlying system model are more realistic than those in previous schemes. In addition, when we consider the status of neighboring cells, the proposed scheme can dynamically adapt to changes in traffic condition. Moreover, the algorithm can control the bandwidth adaptation frequency effectively by accounting for the cost of bandwidth switching in the model. The effectiveness of the proposed approach is demonstrated using simulation results in adaptive multimedia wireless networks. Index Terms—Adaptive multimedia, QoS, reinforcement learning, wireless networks.
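As a toy illustration of coupling an admission rule with stochastic approximation on a QoS constraint (this is not the paper's algorithm; the capacity, arrival mix, and departure rate are all invented), one can tune a guard-channel reservation level so that handoff blocking tracks a target probability:

```python
import random

def adaptive_guard_channels(n_events=50000, target=0.02, lr=0.01, seed=1):
    """Sketch: adapt a Lagrange-multiplier-like reservation level `lam` by
    stochastic approximation so that handoff blocking tracks `target`.
    All traffic parameters below are made up for illustration."""
    rng = random.Random(seed)
    capacity, load, lam = 10, 0, 0.0
    handoffs, blocked = 0, 0
    for _ in range(n_events):
        if load > 0 and rng.random() < 0.5:           # a call departs
            load -= 1
            continue
        is_handoff = rng.random() < 0.3               # otherwise, an arrival
        if is_handoff:
            admitted = load < capacity
            handoffs += 1
            if admitted:
                lam = max(0.0, lam - lr * target)     # Robbins-Monro step down
            else:
                blocked += 1
                lam += lr * (1.0 - target)            # step up on a blocked handoff
        else:
            admitted = load < capacity - round(lam)   # reserve ~lam guard channels
        if admitted:
            load += 1
    return blocked / max(handoffs, 1), lam
```

The two multiplier updates have zero expectation exactly when the blocking probability equals `target`, which is the standard stochastic-approximation fixed-point argument.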
Optimal Admission Control Using Handover Prediction in Mobile Cellular Networks
 in Proceedings of the 2nd International Working Conference on Performance Modelling and Evaluation of Heterogeneous Networks (HETNETS '04
, 2004
Cited by 5 (2 self)
In this paper we study the impact of incorporating handover prediction information into the call admission control process in mobile cellular networks. The comparison is done between the performance of optimal policies obtained with and without the predictive information. The prediction agent classifies mobile users in the neighborhood of a cell into two classes: those that will probably be handed over into the cell and those that probably will not. We consider the classification error by modeling the false-positive and non-detection probabilities. Two different approaches to compute the optimal admission policy were studied: dynamic programming and reinforcement learning. Preliminary results show significant performance gains when the predictive information is used in the admission process.
A Large Scale Simulation Model of Pandemic Influenza Outbreaks for Development of Dynamic Mitigation Strategies
, 2007
Cited by 3 (0 self)
Limited stockpiles of vaccine and antiviral drugs and other resources pose a formidable healthcare delivery challenge for an impending human-to-human transmittable influenza pandemic. The existing preparedness plans by the Center for Disease Control and Health and Human Services strongly underscore the need for efficient mitigation strategies. Such a strategy entails decisions for early response, vaccination, prophylaxis, hospitalization, and quarantine enforcement. This paper presents a large-scale simulation model that mimics stochastic propagation of an influenza pandemic controlled by mitigation strategies. Impact of a pandemic is assessed via measures including total numbers of infected, dead, denied hospital admission, and denied vaccine/antiviral drugs, and also through an aggregate cost measure incorporating healthcare cost and lost wages. The model considers numerous demographic and community features, daily human activities, vaccination, prophylaxis, hospitalization, social distancing, and hourly accounting of infection spread. The simulation model can serve as the foundation for developing dynamic mitigation strategies. The simulation model is tested on a hypothetical community with over 1.1 million people. A designed experiment is conducted to examine the statistical significance of a number of model
A Novel Scheduling Algorithm for Video Traffic in High-Rate WPANs
Cited by 2 (1 self)
Abstract — The emerging high-rate wireless personal area network (WPAN) technology is capable of supporting high-speed and high-quality real-time multimedia applications. In particular, MPEG-4 video streams are deemed to be a widespread traffic type. However, in the current IEEE 802.15.3 standard for media access control (MAC) of high-rate WPANs, the implementation details of some key issues such as scheduling and quality of service (QoS) provisioning have not been addressed. In this paper, we first propose a mathematical model for the optimal scheduling scheme for MPEG-4 flows in high-rate WPANs. We also propose an RL scheduler based on the reinforcement learning (RL) technique. Simulation results show that our proposed RL scheduler achieves nearly optimal performance and performs better than the FSRPT [1], EDD+SRPT [2], and PAP [3] scheduling algorithms in terms of a lower decoding failure rate.
An afterstates reinforcement learning approach to optimize admission control in mobile cellular networks
 Lecture Notes in Computer Science: Wireless Systems and Network Architectures in Next Generation Internet
, 2006
Cited by 2 (1 self)
Abstract. We deploy a novel Reinforcement Learning optimization technique based on afterstates learning to determine the gain that can be achieved by incorporating movement prediction information into the session admission control process in mobile cellular networks. The novel technique is able to find better solutions with less dispersion. The gain is obtained by evaluating the performance of optimal policies achieved with and without the predictive information, while taking into account possible prediction errors. The prediction agent is able to determine the handover instants both stochastically and deterministically. Numerical results show significant performance gains when the predictive information is used in the admission process, and that higher gains are obtained when deterministic handover instants can be determined.
Inference strategies for solving semi-Markov decision processes
Cited by 1 (0 self)
Semi-Markov decision processes (SMDPs) generalize standard MDPs to domains where time is not discretized equally between every set of states and actions [3]. Instead we can define a jump-Markov process where the amount of time spent in each state is a random variable. This formulation gives us an intuitive way to reason about actions where it is also necessary to take into account how long these actions will take to perform. Formally we can define an SMDP as a continuous-time controlled stochastic process (x(t), u(t)) consisting, respectively, of states and actions at every point in time t where state transitions occur at random arrival times T_n. In particular, the process is stationary in between jumps, i.e. x(t) = x_n and u(t) = u_n
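The jump-process formulation in this abstract is straightforward to simulate. A minimal sketch follows; the exponential sojourn times and uniform transition kernel are arbitrary choices for illustration, since a general SMDP allows any sojourn distribution.

```python
import random

def simulate_smdp(horizon=10.0, rate=1.0, seed=0):
    """Simulate a toy jump-Markov process: the state x(t) is piecewise
    constant between random jump times T_n. Exponential sojourns and a
    uniform next-state kernel are illustrative assumptions."""
    rng = random.Random(seed)
    t, x = 0.0, 0
    trajectory = [(t, x)]                  # list of (T_n, x_n) pairs
    while t < horizon:
        t += rng.expovariate(rate)         # sojourn time in the current state
        x = rng.choice([0, 1, 2])          # next state
        trajectory.append((t, x))
    return trajectory

def state_at(trajectory, t):
    """x(t) = x_n for T_n <= t < T_{n+1}: hold the state of the last jump."""
    x = trajectory[0][1]
    for tn, xn in trajectory:
        if tn > t:
            break
        x = xn
    return x
```

`state_at` encodes exactly the "stationary between jumps" property: the trajectory is fully described by the jump times and the states taken at those jumps.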
A Policy Gradient Method for SMDPs with Application to Call Admission Control
, 2002
Cited by 1 (0 self)
Classical methods for solving a semi-Markov decision process such as value iteration and policy iteration require precise knowledge of the underlying probabilistic model and are known to suffer from the curse of dimensionality. To overcome both these limitations, this paper presents a reinforcement learning approach where one optimizes directly the performance criterion with respect to a family of parameterised policies. We propose an online algorithm that simultaneously estimates the gradient of the performance criterion and optimises it through stochastic approximation. The gradient estimator is based on the discounted score method as introduced in [1]. We demonstrate the utility of our algorithm in a Call Admission Control problem.
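A minimal score-function (likelihood-ratio) policy-gradient sketch for a toy admission problem follows, in the spirit of optimizing a parameterised policy directly. The sigmoid parameterisation, dynamics, and rewards are invented for illustration and are not the paper's method.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_admission_policy(n_episodes=2000, lr=0.01, seed=0):
    """REINFORCE-style sketch: theta parameterises the probability of
    admitting a call given the current occupancy; the gradient of the episode
    return is estimated via the score function sum_t d/dtheta log pi(a_t|s_t)."""
    rng = random.Random(seed)
    theta, capacity = 0.0, 5
    for _ in range(n_episodes):
        load, ret, score = 0, 0.0, 0.0
        for _ in range(20):                       # one short episode
            p = sigmoid(theta - load)             # admission probability
            admit = rng.random() < p
            score += (1.0 - p) if admit else -p   # d/dtheta log pi(a|s)
            if admit:
                ret += 1.0 if load < capacity else -2.0   # revenue vs. overload
                load = min(load + 1, capacity)
            if load > 0 and rng.random() < 0.3:   # a call completes
                load -= 1
        theta += lr * ret * score                 # stochastic gradient ascent
        theta = max(-10.0, min(10.0, theta))      # clip for stability in this sketch
    return theta
```

The product of the episode return and the accumulated score is an unbiased (if high-variance) gradient estimate; the stochastic-approximation flavour comes from updating `theta` from one noisy episode at a time.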