Results 11–20 of 356
Bidding under Uncertainty: Theory and Experiments
In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2004
"... This paper describes a study of agent bidding strategies, assuming combinatorial valuations for complementary and substitutable goods, in three auction environments: sequential auctions, simultaneous auctions, and the Trading Agent Competition (TAC) Classic hotel auction design, a hybrid of se ..."
Cited by 35 (6 self)
"... by example that marginal utility bidding is not an optimal bidding policy, even in deterministic settings. Two alternative methods of approximating a solution to this stochastic program are presented: the first method, which relies on expected values, is optimal in deterministic environments ..."
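The marginal-utility bidding policy the snippet argues against can be illustrated with a toy example (the goods and the valuation table below are hypothetical, chosen only to show complementary goods; this is not the paper's implementation):

```python
from itertools import combinations

def valuation(bundle):
    # Hypothetical combinatorial valuation: the pair ('flight', 'hotel')
    # is complementary, worth more together than the sum of its parts.
    v = {frozenset(): 0,
         frozenset({'flight'}): 10,
         frozenset({'hotel'}): 15,
         frozenset({'flight', 'hotel'}): 60}
    return v[frozenset(bundle)]

def marginal_utility(good, goods):
    # Marginal utility of `good`: value of the best bundle containing it
    # minus the value of the best bundle that excludes it.
    bundles = [set(c) for r in range(len(goods) + 1)
               for c in combinations(goods, r)]
    best_with = max(valuation(b) for b in bundles if good in b)
    best_without = max(valuation(b) for b in bundles if good not in b)
    return best_with - best_without

print(marginal_utility('flight', {'flight', 'hotel'}))  # 60 - 15 = 45
print(marginal_utility('hotel', {'flight', 'hotel'}))   # 60 - 10 = 50
```

A marginal-utility bidder would bid these quantities on each good separately; the snippet's point is that this per-good policy can be suboptimal even when prices are deterministic.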
Practical Reinforcement Learning in Continuous Spaces
2000
"... Dynamic control tasks are good candidates for the application of reinforcement learning techniques. However, many of these tasks inherently have continuous state or action variables. This can cause problems for traditional reinforcement learning algorithms which assume discrete states and actions. I ..."
Cited by 104 (4 self)
"... In this paper, we introduce an algorithm that safely approximates the value function for continuous state control tasks, and that learns quickly from a small amount of data. We give experimental results using this algorithm to learn policies for both a simulated task and also for a real robot, operating ..."
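The general idea of approximating a value function over a continuous state space from a small set of samples can be sketched with distance-weighted nearest neighbours (a generic illustration; the paper's own algorithm and its safety machinery are not reproduced here):

```python
import numpy as np

def knn_value(query, states, values, k=3):
    # Generic value-function approximation for a continuous state space:
    # estimate V(query) as the inverse-distance-weighted average of the
    # k closest stored (state, value) samples.
    d = np.linalg.norm(states - query, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-8)            # inverse-distance weights
    return float(np.dot(w, values[idx]) / w.sum())

# Toy 1-D task: a handful of sampled state values on a line.
states = np.array([[0.0], [0.5], [1.0], [1.5]])
values = np.array([0.0, 1.0, 2.0, 3.0])
print(knn_value(np.array([0.75]), states, values))
```

Interpolating between stored samples like this gives estimates at states never visited, which is exactly what a discrete-table method cannot do.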
Approximate
"... Abstract — In general, it is difficult to determine an optimal closed-loop policy in nonlinear control problems with continuous-valued state and control domains. Hence, approximations are often inevitable. The standard method of discretizing states and controls suffers from the curse of dimensionali ..."
Cited by 1 (0 self)
Manifold Representations for Value-Function Approximation
"... Reinforcement learning (RL) has been shown to be an effective paradigm for learning control policies for problems with discrete state spaces. For problems with continuous multidimensional state spaces, the results are ..."
Geometric Asymptotic Approximation of Value Functions
2009
"... This paper characterizes the behavior of value functions in dynamic stochastic discounted programming models near fixed points of the state space. When the second derivative of the flow payoff function is bounded, the value function is proportional to a linear function plus x^ψδ. A specific formula ..."
"... for ψδ is provided, which implies ψδ continuously falls in the rate of patience. If the state variable is a martingale, the second derivative of the value function is unbounded. If the state variable is instead a strict local submartingale, then the same holds for the first derivative of the value ..."
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
Journal of Machine Learning Research, 2006
"... This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by d ..."
Cited by 92 (10 self)
"... phased procedure called representation policy iteration, comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned ..."
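The representation-learning phase can be sketched on a toy chain MDP: proto-value functions are the smoothest eigenvectors of the graph Laplacian of the state-space graph (the chain adjacency below is an assumed example; in the paper the graph is built from sampled transitions):

```python
import numpy as np

n = 10                                   # states 0..9 in a chain MDP
A = np.zeros((n, n))
for i in range(n - 1):                   # undirected chain graph
    A[i, i + 1] = A[i + 1, i] = 1.0

D = np.diag(A.sum(axis=1))               # degree matrix
L = D - A                                # combinatorial graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
basis = eigvecs[:, :4]                   # 4 smoothest proto-value functions
print(basis.shape)                       # (10, 4)
```

The value function is then approximated as a linear combination of these basis columns during the parameter-estimation phase, e.g. by least-squares fitting against sampled returns.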
Reinforcement learning in continuous action spaces through sequential Monte Carlo methods
In: Adv. Neural Information Proc. Systems, 2007
"... Learning in real-world domains often requires to deal with continuous state and action spaces. Although many solutions have been proposed to apply Reinforcement Learning algorithms to continuous state problems, the same techniques can be hardly extended to continuous action spaces, where, besides th ..."
Cited by 22 (1 self)
"... the computation of a good approximation of the value function, a fast method for the identification of the highest-valued action is needed. In this paper, we propose a novel actor-critic approach in which the policy of the actor is estimated through sequential Monte Carlo methods. The importance sampling step ..."
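The resampling idea behind such a sequential-Monte-Carlo actor can be sketched generically (the quadratic critic and step sizes below are toy choices for illustration, not the paper's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def smc_policy_step(particles, critic_q, noise=0.05):
    # One SMC-style update of the actor's set of candidate actions:
    # weight each action particle by the critic's value (importance
    # step), resample in proportion to the weights, then jitter the
    # survivors to keep the particle set diverse.
    q = np.array([critic_q(a) for a in particles])
    w = np.exp(q - q.max())              # soft, numerically stable weights
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx] + rng.normal(0.0, noise, size=len(particles))

# Toy critic: the best action lies near a = 0.7.
critic = lambda a: -(a - 0.7) ** 2
actions = rng.uniform(-1, 1, size=100)
for _ in range(20):
    actions = smc_policy_step(actions, critic)
print(actions.mean())                    # concentrates near 0.7
```

Because the particle set itself represents the policy, no global maximisation over a continuous action space is ever required, which is the bottleneck the snippet identifies.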