Results 1 - 10 of 452
Learning Greedy Policies for the Easy-First Framework
In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2015
Cited by 2 (1 self)
"... Easy-first, a search-based structured prediction approach, has been applied to many NLP tasks including dependency parsing and coreference resolution. This approach employs a learned greedy policy (action-scoring function) to make easy decisions first, which constrains the remaining decisions and ..."
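For concreteness, the easy-first loop this abstract describes can be sketched as follows. This is a minimal illustration under my own assumptions, not the paper's implementation: `scorer` stands in for the learned action-scoring function, and the toy label set is hypothetical.

```python
# Illustrative easy-first greedy labeling (a sketch, not the paper's code).
# Each step commits to the single open decision the scorer is most
# confident about; earlier decisions constrain later ones.
from typing import Dict, List, Tuple


def easy_first_label(tokens: List[str], scorer) -> Dict[int, str]:
    """scorer(tokens, labeled, i, label) -> confidence score (assumed API)."""
    labeled: Dict[int, str] = {}
    labels = ["NOUN", "VERB", "OTHER"]  # toy label set, hypothetical
    while len(labeled) < len(tokens):
        # Score every (position, label) decision that is still open.
        candidates: List[Tuple[float, int, str]] = [
            (scorer(tokens, labeled, i, lab), i, lab)
            for i in range(len(tokens)) if i not in labeled
            for lab in labels
        ]
        score, i, lab = max(candidates)  # greedy: easiest decision first
        labeled[i] = lab                 # commit; constrains later steps
    return labeled
```

The key property is that the loop is globally greedy over all still-open decisions rather than proceeding left to right.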
A Structured Multiarmed Bandit Problem and the Greedy Policy
Cited by 10 (2 self)
"... We consider a multiarmed bandit problem where the expected reward of each arm is a linear function of an unknown scalar with a prior distribution. The objective is to choose a sequence of arms that maximizes the expected total (or discounted total) reward. We demonstrate the effectiveness of a greedy policy that takes advantage of the known statistical correlation structure among the arms. In the infinite horizon discounted reward setting, we show that the greedy and optimal policies eventually coincide, and both settle on the best arm. This is in contrast with the Incomplete Learning Theorem ..."
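A minimal sketch of such a greedy policy, assuming a Gaussian prior on the unknown scalar and Gaussian reward noise (both my assumptions; the paper's exact model may differ). Because every arm's mean reward depends on the same scalar z, a pull of any arm updates the posterior belief for all arms, and the greedy rule pulls whichever arm has the highest posterior-mean reward.

```python
# Sketch of a greedy policy for the structured bandit in the abstract:
# arm i's mean reward is a[i] + b[i] * z for an unknown scalar z.
# The Gaussian prior/noise and all constants here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
a = np.array([0.0, 0.5, -0.2])   # known intercepts (toy values)
b = np.array([1.0, -0.5, 2.0])   # known slopes (toy values)
z_true, noise_sd = 0.7, 1.0

mu, tau2 = 0.0, 1.0              # Gaussian prior on z: N(mu, tau2)
for t in range(200):
    # Greedy step: pull the arm with the highest posterior-mean reward.
    # One observation informs beliefs about *every* arm through z.
    i = int(np.argmax(a + b * mu))
    r = a[i] + b[i] * z_true + rng.normal(0.0, noise_sd)
    # Conjugate Gaussian update given r = a[i] + b[i] * z + noise.
    prec = 1.0 / tau2 + b[i] ** 2 / noise_sd ** 2
    mu = (mu / tau2 + b[i] * (r - a[i]) / noise_sd ** 2) / prec
    tau2 = 1.0 / prec

print("posterior mean of z:", round(mu, 3))
print("greedy arm:", int(np.argmax(a + b * mu)))
```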
Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions
1993
Cited by 104 (1 self)
"... Consider a given value function on states of a Markov decision problem, as might result from applying a reinforcement learning algorithm. Unless this value function equals the corresponding optimal value function, at some states there will be a discrepancy, which is natural to call the Bellman residual ..."
"... greedy policy based on the given value function will be as a function of the maximum norm magnitude of this Bellman residual. A corresponding result is also obtained for value functions defined on state-action pairs, as are used in Q-learning. One significant application of these results is to problems ..."
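For orientation, guarantees of this kind typically take the following form; this is the standard statement of the Bellman-residual bound, and the paper's exact constants may differ:

```latex
% If the Bellman residual of V is small, the policy that acts greedily
% with respect to V is near-optimal (standard form of the bound).
\[
\|TV - V\|_\infty \le \epsilon
\quad\Longrightarrow\quad
V^{\pi_{\mathrm{greedy}(V)}}(s) \;\ge\; V^*(s) - \frac{2\gamma\epsilon}{1-\gamma}
\quad \text{for all } s,
\]
% where $T$ is the Bellman optimality operator and $\gamma < 1$ is the
% discount factor.
```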
Stock Repurchase with an Adaptive Reservation Price: A Study of the Greedy Policy
2010
"... We consider the problem of stock repurchase over a finite time horizon. We assume that a firm has a reservation price for the stock, which is the highest price that the firm is willing to pay to repurchase its own stock. We characterize the optimal policy for the trader to maximize the total number of shares he can buy over a fixed time horizon. In particular, we study a greedy policy, which involves, in each period, buying a quantity that drives the stock price to the reservation price. Key words: stock repurchase, dynamic programming, reservation price ..."
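The greedy rule in this abstract is easy to sketch once a price-impact model is fixed. The linear impact model and all parameters below are my own assumptions for illustration, not the paper's specification; only the rule itself (buy enough each period to push the price up to the reservation price) comes from the abstract.

```python
# Sketch of the greedy repurchase rule: each period, buy just enough
# to drive the price up to the reservation price. The linear impact
# model and the toy random walk are illustrative assumptions.
import random

reservation = 10.0   # highest price the firm will pay
impact = 0.05        # assumed price increase per share bought
price = 9.0
shares = 0.0

random.seed(1)
for t in range(10):
    if price < reservation:
        q = (reservation - price) / impact  # quantity that drives the
        shares += q                         # price to the reservation price
        price = reservation
    # Exogenous price move between periods (toy random walk).
    price = max(0.0, price + random.gauss(-0.1, 0.3))

print(f"total shares repurchased: {shares:.1f}")
```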
Policy Gradient Methods for Reinforcement Learning with Function Approximation
In NIPS, 1999
Cited by 439 (20 self)
"... Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented ..."
"... into estimating a value function, with the action-selection policy represented implicitly as the "greedy" policy with respect to the estimated values (e.g., as the policy that selects in each state the action with highest estimated value). The value-function approach has worked well in many applications ..."
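The contrast the excerpt draws can be made concrete with a small sketch; the tabular setup and softmax parameterization below are my assumptions for illustration, not the paper's experimental setting.

```python
# Illustrative contrast between the two approaches in the excerpt:
# an implicit "greedy" policy over estimated values vs. an explicitly
# parameterized (softmax) policy suited to gradient updates.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
Q = rng.normal(size=(n_states, n_actions))      # estimated action values
theta = rng.normal(size=(n_states, n_actions))  # explicit policy parameters

def greedy_action(s: int) -> int:
    # Value-function approach: the policy is implicit, defined as
    # "pick the action with the highest estimated value in state s".
    return int(np.argmax(Q[s]))

def policy_action(s: int) -> int:
    # Policy-gradient approach: the policy is explicitly represented
    # (here a softmax over theta) and would be updated along the
    # gradient of expected reward with respect to theta.
    p = np.exp(theta[s] - theta[s].max())
    p /= p.sum()
    return int(rng.choice(n_actions, p=p))

print(greedy_action(0), policy_action(0))
```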