Results 1 - 10 of 452

Learning greedy policies for the easy-first framework

by Jun Xie, Chao Ma, Prashanth Mannem, Xiaoli Fern, Tom Dietterich, Prasad Tadepalli - In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2015
"... Easy-first, a search-based structured prediction ap-proach, has been applied to many NLP tasks including dependency parsing and coreference resolution. This approach employs a learned greedy policy (action scor-ing function) to make easy decisions first, which con-strains the remaining decisions and ..."
Abstract - Cited by 2 (1 self)
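As an illustration of the inference loop the snippet describes, here is a minimal sketch of easy-first greedy decoding, assuming hypothetical callables legal_actions, score, and apply_action that stand in for the task-specific pieces (none of these names come from the paper).

def easy_first_inference(state, legal_actions, score, apply_action):
    # Greedy easy-first loop: repeatedly take the highest-scoring
    # ("easiest") remaining action, which then constrains the rest of
    # the decisions.  `legal_actions`, `score`, and `apply_action` are
    # hypothetical interfaces, not the paper's API.
    while True:
        actions = legal_actions(state)
        if not actions:
            return state
        best = max(actions, key=lambda a: score(state, a))
        state = apply_action(state, best)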

Learning Greedy Policies for the Easy-First Framework

by unknown authors
"... Easy-first, a search-based structured prediction ap-proach, has been applied to many NLP tasks including dependency parsing and coreference resolution. This approach employs a learned greedy policy (action scor-ing function) to make easy decisions first, which con-strains the remaining decisions and ..."

A Structured Multiarmed Bandit Problem and the Greedy Policy

by Adam J. Mersereau, Paat Rusmevichientong, John N. Tsitsiklis
"... We consider a multiarmed bandit problem where the expected reward of each arm is a linear function of an unknown scalar with a prior distribution. The objective is to choose a sequence of arms that maximizes the expected total (or discounted total) reward. We demonstrate the effectiveness of a greed ..."
Abstract - Cited by 10 (2 self)
greedy policy that takes advantage of the known statistical correlation structure among the arms. In the infinite horizon discounted reward setting, we show that the greedy and optimal policies eventually coincide, and both settle on the best arm. This is in contrast with the Incomplete Learning Theorem
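To make the correlation structure concrete, the following is a small sketch under the assumption (taken from the abstract, with illustrative details filled in) that each arm i has expected reward alpha[i] + beta[i] * z for a single unknown scalar z; the greedy policy re-estimates z from all observed rewards (here via least squares, purely for illustration) and pulls the arm with the highest estimated mean. pull(i) is a hypothetical environment call, not the paper's interface.

import numpy as np

def greedy_structured_bandit(alpha, beta, pull, horizon, z_init=0.0):
    # Greedy policy for a bandit whose arm means are linear in one
    # unknown scalar z: E[reward_i] = alpha[i] + beta[i] * z.
    # The estimate of z is shared across arms, which is how the policy
    # exploits the known correlation structure.
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    pulled, rewards = [], []
    z_hat, total = z_init, 0.0
    for _ in range(horizon):
        arm = int(np.argmax(alpha + beta * z_hat))  # greedy arm choice
        r = pull(arm)                               # hypothetical environment call
        pulled.append(arm)
        rewards.append(r)
        total += r
        b = beta[pulled]
        if np.dot(b, b) > 0:
            # Least-squares estimate of z from (reward - alpha) ~ beta * z.
            z_hat = float(np.dot(b, np.asarray(rewards) - alpha[pulled]) / np.dot(b, b))
    return total, z_hat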

Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions

by Ronald Williams, Leemon C. Baird, 1993
"... Consider a given value function on states of a Markov decision problem, as might result from applying a reinforcement learning algorithm. Unless this value function equals the corresponding optimal value function, at some states there will be a discrepancy, which is natural to call the Bellman resid ..."
Abstract - Cited by 104 (1 self)
greedy policy based on the given value function will be as a function of the maximum norm magnitude of this Bellman residual. A corresponding result is also obtained for value functions defined on state-action pairs, as are used in Q-learning. One significant application of these results is to problems
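For reference, a standard form of the kind of bound the abstract refers to is sketched below in generic discounted-MDP notation (a reconstruction under standard assumptions, not text quoted from the paper): if the max-norm Bellman residual of V is at most epsilon, the policy that is greedy with respect to V is within 2*gamma*epsilon/(1 - gamma) of optimal at every state.

% Hedged restatement in standard discounted-MDP notation (an assumption
% of this sketch, not text from the paper).
\[
  (TV)(s) = \max_{a}\Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big],
  \qquad
  \varepsilon = \max_{s} \big| (TV)(s) - V(s) \big| .
\]
\[
  \pi_V(s) \in \arg\max_{a}\Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big]
  \;\;\Longrightarrow\;\;
  V^{\pi_V}(s) \;\ge\; V^{*}(s) - \frac{2\gamma\varepsilon}{1-\gamma}
  \;\; \text{for all } s .
\]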

Learning Greedy Policies for the Easy-First Framework

by Xiaoli Z. Fern, 2014
"... Abstract approved: ..."

The two headed disk: Stochastic dominance of the greedy policy

by S. Seshadri, D. Rotem, 1995
"... Information pgyg ..."

Stock Repurchase with an Adaptive Reservation Price: A Study of the Greedy Policy

by Ye Lu, Asuman Ozdaglar, David Simchi-Levi, 2010
"... Abstract. We consider the problem of stock repurchase over a finite time horizon. We assume that a firm has a reservation price for the stock, which is the highest price that the firm is willing to pay to repurchase its own stock. We characterize the optimal policy for the trader to maximize the tot ..."
Abstract
the total number of shares he can buy over a fixed time horizon. In particular, we study a greedy policy, which involves, in each period, buying a quantity that drives the stock price to the reservation price. Key words: stock repurchase, dynamic programming, reservation price.
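The per-period rule of that greedy policy can be sketched as follows, under a hypothetical linear price-impact model (an assumption made here for illustration, not the paper's model) in which buying q shares moves the price from p to p + impact * q.

def greedy_purchase_quantity(price, reservation_price, impact):
    # Greedy rule sketched above: buy just enough this period to drive
    # the post-trade price up to the reservation price.  The linear
    # price-impact model (price + impact * q) is an illustrative
    # assumption, not taken from the paper.
    if impact <= 0 or price >= reservation_price:
        return 0.0
    return (reservation_price - price) / impact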

Enhancing Greedy Policy Techniques for Complex Cost-Sensitive Problems

by Camelia Vidrighin Bratu, Rodica Potolea
"... ..."
Abstract - Add to MetaCart
Abstract not found

The two headed disk: Stochastic dominance of the greedy policy

by Sridhar Seshadri, Doron Rotem, 1995
"... ..."
Abstract - Add to MetaCart
Abstract not found

Policy gradient methods for reinforcement learning with function approximation.

by Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour - In NIPS, 1999
"... Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Abstract - Cited by 439 (20 self)
into estimating a value function, with the action-selection policy represented implicitly as the "greedy" policy with respect to the estimated values (e.g., as the policy that selects in each state the action with highest estimated value). The value-function approach has worked well in many applications
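The implicit greedy policy described in that passage can be written in a couple of lines; the sketch below assumes tabular action-value estimates stored in a (num_states, num_actions) array, which is an illustrative simplification rather than the paper's function-approximation setting.

import numpy as np

def greedy_policy_from_q(q_values):
    # Implicit "greedy" policy over estimated action values: in every
    # state, select the action with the highest estimate.  `q_values`
    # is assumed to be an array of shape (num_states, num_actions).
    # Example: greedy_policy_from_q([[0.1, 0.4], [0.7, 0.2]]) -> array([1, 0])
    return np.argmax(np.asarray(q_values), axis=1)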