Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems
Cited by 141 (6 self)
In cellular telephone systems, an important problem is to dynamically allocate the communication resource (channels) so as to maximize service in a stochastic caller environment. This problem is naturally formulated as a dynamic programming problem and we use a reinforcement learning (RL) method to find dynamic channel allocation policies that are better than previous heuristic solutions. The policies obtained perform well for a broad variety of call traffic patterns. We present results on a large cellular system with approximately 49^49 states.
Capacity of the Trapdoor Channel with Feedback
Cited by 40 (15 self)
We establish that the feedback capacity of the trapdoor channel is the logarithm of the golden ratio and provide a simple communication scheme that achieves capacity. As part of the analysis, we formulate a class of dynamic programs that characterize capacities of unifilar finite-state channels. The trapdoor channel is an instance that admits a simple analytic solution.
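As a quick numerical illustration of the capacity value quoted above, the following minimal sketch evaluates the logarithm of the golden ratio. The abstract does not state the base of the logarithm; base 2 (bits per channel use) is assumed here, and the variable names are illustrative only.

```python
import math

# Golden ratio: the positive root of x^2 = x + 1.
phi = (1 + math.sqrt(5)) / 2

# Capacity as stated in the abstract: log(phi).
# ASSUMPTION: base 2, so the result is in bits per channel use.
capacity_bits = math.log2(phi)

print(f"golden ratio      = {phi:.6f}")
print(f"log2(golden ratio) = {capacity_bits:.4f} bits per channel use")
```

Under the base-2 assumption the value is roughly 0.694 bits per channel use.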
A Comparison of Discrete and Parametric Approximation Methods for Continuous-State Dynamic Programming Problems
, 2000
Cited by 23 (11 self)
We compare alternative numerical methods for approximating solutions to continuous-state dynamic programming (DP) problems. We distinguish two approaches: discrete approximation and parametric approximation. In the former, the continuous state space is discretized into a finite number of points N, and the resulting finite-state DP problem is solved numerically. In the latter, a function associated with the DP problem such as the value function, the policy function, or some other related function is approximated by a smooth function of K unknown parameters. Values of the parameters are chosen so that the parametric function approximates the true function as closely as possible. We focus on approximations that are linear in parameters, i.e. where the parametric approximation is a linear combination of K basis functions. We also focus on methods that approximate the value function V as the solution to the Bellman equation associated with the DP problem. In finite-state DP problems...
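The idea of an approximation that is linear in parameters can be sketched in a few lines. The example below is not the authors' method: it assumes a polynomial basis, a hypothetical stand-in for the function being approximated, and an ordinary least-squares fit on N grid points, purely to show what "a linear combination of K basis functions" looks like in practice.

```python
def basis(s, K):
    """K polynomial basis functions 1, s, ..., s^(K-1) (an illustrative choice)."""
    return [s ** k for k in range(K)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_linear(points, values, K):
    """Least-squares parameters from the normal equations (Phi^T Phi) theta = Phi^T v."""
    Phi = [basis(s, K) for s in points]
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(K)] for i in range(K)]
    b = [sum(row[i] * v for row, v in zip(Phi, values)) for i in range(K)]
    return solve(A, b)

def evaluate(theta, s):
    """Parametric approximation: linear combination of the basis functions at s."""
    return sum(t * f for t, f in zip(theta, basis(s, len(theta))))

# Discretize the state space [0, 1] into N points.
N, K = 11, 3
grid = [i / (N - 1) for i in range(N)]
# Hypothetical stand-in for the true function (not taken from the paper).
target = [1.0 + 2.0 * s - 0.5 * s * s for s in grid]
theta = fit_linear(grid, target, K)
print([round(t, 6) for t in theta])
```

Because the stand-in target is itself a quadratic, three basis functions recover it essentially exactly; for a real value function the fit would only be approximate, and the choice of basis matters.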
A QoS-Aware Reinforcement Learning Based MAC Protocol for Wireless Sensor Networks
 in Proc. of Intl. Conference on Networking, Sensing, and Control
, 2006
On Queueing and Multi-Layer Coding
, 2004
Cited by 4 (1 self)
A single-server queue concatenated with a multilevel channel encoder is considered. The main focus of this work is on minimization of the average delay of a packet from entering the queue until completion of successful service. Tight bounds are derived for the average delay for different numbers of coded layers. Numerical optimization is applied to find the optimal resource allocation minimizing the average delay. Delay bounds are also derived for continuous layering (single user broadcast approach). The optimizing power distribution of the minimal delay is approximated, and numerically evaluated. It is demonstrated that code layering may give pronounced performance gains in terms of delay, which are more impressive than those associated with throughput. This makes layering more attractive when communicating under stringent delay constraints.
How to Make Software Agents Do the Right Thing: An Introduction to Reinforcement Learning
 Adaptive Systems Group, Harlequin Inc
, 1996
Cited by 2 (0 self)
This article explains why programming agents is not just business-as-usual; rather it requires a new way of looking at problems and their solutions. When you hire a human agent to do something for you, you rarely spell out a detailed plan of action. Instead, you define the state of the environment that you want to achieve (e.g., you tell a contractor that you want a new front porch with comfortable seating, for under $2000). In more complex and uncertain situations, you specify your preferences rather than stating outright goals, as when you tell a stock broker agent that the more money you make the better, but count capital gains as, say, 30% better than dividend income. Your hired agent then takes actions on your behalf, even negotiates with other agents, all to help you achieve your preferences. We would like our software agents to behave the same way. That means we will need a way to describe our preferences to software agents, and a methodology for building agents that best satisfy our preferences. The pleasant surprise is that for many problems, once we know the preferences, we're almost done! Given the preferences, a list of possible actions, and enough time to practice taking actions, we can apply the formalism of Reinforcement Learning (or RL) to build an agent that acts according to the preferences in a near-optimal way. This article shows how.
Battery-Health Conscious Power Management in Plug-In Hybrid
 IEEE Transactions on Control Systems Technology (accepted for inclusion in a future issue)
This paper develops techniques to design plug-in hybrid electric vehicle (PHEV) power management algorithms that optimally balance lithium-ion battery pack health and energy consumption cost. As such, this research is the first to utilize electrochemical battery models to optimize the power management in PHEVs. Daily trip length distributions are integrated into the problem using Markov chains with absorbing states. We capture battery aging by integrating two example degradation models: solid–electrolyte interphase (SEI) film formation and the “Ah-processed” model. This enables us to optimally trade off energy cost versus battery health. We analyze this tradeoff to explore how optimal control strategies and physical battery system properties are related. Specifically, we find that the slope and convexity properties of the health degradation model profoundly impact the optimal charge depletion strategy. For example, solutions that balance energy cost and SEI layer growth aggressively deplete battery charge at high states-of-charge (SoCs), then blend engine and battery power at lower SoCs. Index Terms: Batteries, electrochemical modeling, optimal control, plug-in hybrid vehicles, power management, stochastic
Kernel-Based Reinforcement Learning
 © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.
We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state spaces. First, our algorithm converges to a unique solution of an approximate Bellman’s equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the bias-variance tradeoff in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or nonparametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem.
Preliminary and Incomplete. Comments Welcome
, 2000
We compare alternative numerical methods for approximating solutions to continuous-state dynamic programming (DP) problems. We distinguish two approaches: discrete approximation and parametric approximation. In the former, the continuous state space is discretized into a finite number of points N, and the resulting finite-state DP problem is solved numerically. In the latter, a function associated with the DP problem such as the value function, the policy function, or some other related function is approximated by a smooth function of K unknown parameters. Values of the parameters are chosen so that the parametric function approximates the true function as closely as possible. We focus on approximations that are linear in parameters, i.e. where the parametric approximation is a linear combination of K basis functions. We also focus on methods that approximate the value function V as the solution to the Bellman equation associated with the DP problem. In finite-state DP problems the method of policy iteration is an effective iterative method for solving the Bellman equation that converges to V in a finite number of steps. Each iteration involves a policy valuation step that computes the value function Vα corresponding to a trial policy α. We show how policy iteration can be extended to continuous-state DP problems. For discrete approximation, we refer to the resulting algorithm as discrete policy iteration (DPI). Each
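For readers unfamiliar with policy iteration on a finite-state problem, the sketch below alternates a policy valuation step with a policy improvement step on a tiny two-state example. It is a textbook-style illustration, not the paper's DPI algorithm: the transition table `P`, reward table `R`, and discount factor `beta` are hypothetical, and the valuation step uses repeated sweeps to a tolerance rather than an exact linear solve.

```python
def evaluate_policy(P, R, policy, beta, tol=1e-12):
    """Policy valuation: value of following `policy` forever (iterative sweeps)."""
    n = len(policy)
    V = [0.0] * n
    while True:
        V_new = [
            R[s][policy[s]]
            + beta * sum(p * V[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new

def policy_iteration(P, R, beta):
    """Alternate valuation and greedy improvement until the policy stops changing."""
    n = len(P)
    policy = [0] * n
    while True:
        V = evaluate_policy(P, R, policy, beta)

        def q(s, a):
            # One-step lookahead value of taking action a in state s.
            return R[s][a] + beta * sum(p * V[t] for t, p in enumerate(P[s][a]))

        improved = [max(range(len(P[s])), key=lambda a: q(s, a)) for s in range(n)]
        if improved == policy:
            return policy, V
        policy = improved

# Hypothetical two-state example: action 0 stays put, action 1 switches state.
# P[s][a] is the distribution over next states; R[s][a] is the immediate reward.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[1.0, 0.0],
     [2.0, 0.0]]
beta = 0.9

policy, V = policy_iteration(P, R, beta)
print(policy, [round(v, 4) for v in V])
```

In this example the iteration settles on switching out of state 0 and staying in state 1, since the reward stream in state 1 is worth more than the one in state 0 even after one period of discounting.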