Results 1-10 of 122
Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning
 In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, 2000
"... We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (POMDP), and focus on gradient ascent approaches to this problem. In [3] we introduced GPOMDP, an algorithm for estimating the performance gradient of a POMDP ..."
Cited by 24 (8 self)
Reward Design via Online Gradient Ascent
"... Recent work has demonstrated that when artificial agents are limited in their ability to achieve their goals, the agent designer can benefit by making the agent’s goals different from the designer’s. This gives rise to the optimization problem of designing the artificial agent’s goals—in the RL fram ..."
Cited by 10 (1 self)
framework, designing the agent’s reward function. Existing attempts at solving this optimal reward problem do not leverage experience gained online during the agent’s lifetime nor do they take advantage of knowledge about the agent’s structure. In this work, we develop a gradient ascent approach with formal
Online Pricing for Bandwidth Provisioning in Multiclass Networks
"... We consider the problem of pricing for bandwidth provisioning over a single link, where users arrive according to a known stochastic traffic model. The network administrator controls the resource allocation by setting a price at every epoch, and each user's response to the price is governed by a ..."
Cited by 6 (0 self)
demand function. We formulate this problem as a partially observable Markov decision process (POMDP), and explore two novel pricing schemes, reactive pricing and spot pricing, and compare their performance to appropriately tuned flat pricing. We use a gradient-ascent approach in all the three pricing
Dynamic Pricing for Bandwidth Provisioning
 In Proceedings of the 36th Annual Conference on Information Sciences and Systems, 2002
"... We consider the problem of pricing for bandwidth provisioning over a single link, where users arrive according to a known stochastic traffic model. The network administrator controls the resource allocation by setting a price at every epoch, and each user's response to the price is governed by ..."
Cited by 5 (2 self)
by a demand function. We formulate this problem as a partially observable Markov decision process (POMDP), and explore two novel pricing schemes, reactive pricing and spot pricing, and compare their performance to appropriately tuned flat pricing. We use a gradient-ascent approach in all the three
Multi-objective Optimization of Co-Clustering Ensembles
"... Co-clustering is a machine learning task where the goal is to simultaneously develop clusters of the data and of their respective features. We address the use of co-clustering ensembles to establish a consensus co-clustering over the data. In this paper we develop a new preference-based multi-objecti ..."
based multi-objective optimization algorithm to compete with a previous gradient ascent approach in finding optimal co-clustering ensembles. Unlike the gradient ascent algorithm, our approach once tackles the co-clustering problem with multiple heuristics, then applies the gradient ascent algorithm’s joint heuristic
Multiagent learning with policy prediction
 In AAAI, 2010
"... Due to the nonstationary environment, learning in multiagent systems is a challenging problem. This paper first introduces a new gradient-based learning algorithm, augmenting the basic gradient ascent approach with policy prediction. We prove that this augmentation results in a stronger notion ..."
Cited by 8 (3 self)
Contour
"... Example: identify all objects person car frontal kaizer Intuitive, easy to get ground truth data A pastoral example: identify all objects Approaches to bounding box localization person bottle sheep l. Target/quality function, score, grade: all refer to a function f: window → real f is an order r ..."
relation, reflects the likelihood of the object to be found in the given window sheep right sheep l. Gradient ascent approach flaw: Finds local maxima Sliding window algorithms flaw: slow, O(n^4) windows to check 2 Gradient ascent example Sliding Window: example f(w) = +0.1 f(w) =0.2 f(w) =0.1 3 f
Reinforcement Learning in POMDP's via Direct Gradient Ascent
 In Proc. 17th International Conf. on Machine Learning, 2000
"... This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled POMDPs. We introduce GPOMDP, a REINFORCE-like algorithm for estimating an approximation to the gradient of the average reward as a function of ..."
Cited by 76 (2 self)
Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent Algorithms and Experiments
1999
"... In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate approximations to the performance gradient of parameterized partially observable Markov decision processes (POMDPs). The algorithm's chief advantages are that it requires only a single sample path of the under ..."
present CONJPOMDP, a conjugate-gradient ascent algorithm that uses GPOMDP as a subroutine to estimate the gradient direction. CONJPOMDP uses a novel line-search routine that relies solely on gradient estimates and hence is robust to noise in the performance estimates. OLPOMDP, an online gradient ascent
unknown title
"... We consider the problem of pricing for bandwidth provisioning over a single link, where users arrive according to a known stochastic traffic model. The network administrator controls the resource allocation by setting a price at every epoch, and each user's response to the price is gov ..."
is governed by a demand function. We formulate this problem as a partially observable Markov decision process (POMDP), and explore two novel pricing schemes, reactive pricing and spot pricing, and compare their performance to appropriately tuned flat pricing. We use a gradient-ascent approach in all