A Cost-Shaping Linear Program for Average-Cost Approximate Dynamic Programming with Performance Guarantees
, 2006
Samplerank: Learning preference from atomic gradients
- In NIPS WS on Advances in Ranking
, 2009
Cited by 5 (3 self)
Large templated factor graphs with complex structure that changes during inference have been shown to provide state-of-the-art experimental results on tasks such as identity uncertainty and information integration. However, learning parameters in these models is difficult because computing the gradients requires expensive inference routines. In this paper we propose an online algorithm that instead learns preferences over hypotheses from the gradients between the atomic steps of inference. Although there is a combinatorial number of ranking constraints over the entire hypothesis space, a connection to the framework of sampled convex programs reveals a polynomial bound on the number of rankings that need to be satisfied in practice. We further apply ideas from passive-aggressive algorithms to our update rules, enabling us to extend recent work in confidence-weighted classification to structured prediction problems. We compare our algorithm to structured perceptron, contrastive divergence, and persistent contrastive divergence, demonstrating substantial error reductions on two real-world problems (20% over contrastive divergence).
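The idea of learning from the ranking of two neighboring hypotheses can be illustrated with a perceptron-style pairwise update. This is a minimal sketch under stated assumptions, not the algorithm from the paper: the function names, the sparse feature dictionaries, the fixed learning rate, and the margin are all hypothetical, and the passive-aggressive step size mentioned in the abstract is omitted.

```python
def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def rank_update(weights, feats_a, feats_b, obj_a, obj_b, lr=1.0, margin=1.0):
    """Update weights in place if the model misranks two neighboring hypotheses.

    feats_a, feats_b: feature dicts of two hypotheses that differ by one
                      atomic inference step (hypothetical representation).
    obj_a, obj_b:     their scores under a ground-truth objective.
    Returns True if an update was made.
    """
    if obj_a == obj_b:
        return False  # the objective expresses no preference
    better, worse = (feats_a, feats_b) if obj_a > obj_b else (feats_b, feats_a)
    if dot(weights, better) - dot(weights, worse) >= margin:
        return False  # the model already ranks the pair correctly
    # Move the weights along the feature difference between the two hypotheses.
    for k in set(better) | set(worse):
        weights[k] = weights.get(k, 0.0) + lr * (better.get(k, 0.0) - worse.get(k, 0.0))
    return True


w = {}
changed = rank_update(w, {"x": 1.0}, {"y": 1.0}, obj_a=0.9, obj_b=0.5)
```

Only the features that differ between the two hypotheses enter the update, so no full inference pass is needed to compute it.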
The Smoothed Approximate Linear Program
, 2009
Cited by 4 (0 self)
We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural ‘projection’ of a well-studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds to the optimal cost-to-go function. Our program, the ‘smoothed approximate linear program’, is distinct from such approaches and relaxes the restriction to lower-bounding approximations in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages: First, we demonstrate substantially superior bounds on the quality of approximation to the optimal cost-to-go function afforded by our approach. Second, experiments with our approach on a challenging problem (the game of Tetris) show that the approach outperforms the existing LP approach (which has previously been shown to be competitive with several ADP algorithms) by an order of magnitude.
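The contrast between a lower-bounding LP and a relaxed one can be written out as follows. This is a sketch in assumed notation, none of which appears in the abstract: $\Phi$ is a matrix of basis functions, $r$ the weight vector, $\nu$ a state-relevance weighting, $\alpha$ a discount factor, $g$ a per-stage cost, $s$ a vector of slack variables, $\pi$ a weighting over constraint violations, and $\theta$ a violation budget. The budgeted-slack form is one way such a relaxation can be expressed and should be checked against the paper itself.

```latex
% Bellman operator for a discounted-cost problem (assumed notation)
(TJ)(x) = \min_{a} \Big[ g(x,a) + \alpha \sum_{y} P_a(x,y)\, J(y) \Big]

% Approximate LP: every feasible \Phi r satisfies \Phi r \le T \Phi r,
% which implies \Phi r \le J^*, i.e. a lower bound on the optimal cost-to-go
\max_{r} \; \nu^\top \Phi r
\quad \text{s.t.} \quad \Phi r \le T \Phi r

% Relaxed variant: slack s permits bounded violation of the lower-bound constraints
\max_{r,\, s} \; \nu^\top \Phi r
\quad \text{s.t.} \quad \Phi r \le T \Phi r + s, \qquad \pi^\top s \le \theta, \qquad s \ge 0
```

Setting $\theta = 0$ forces $s = 0$ wherever $\pi$ is positive, which recovers the lower-bounding program on those states.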
Approximate Dynamic Programming via a Smoothed Linear Program
Cited by 2 (1 self)
We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural ‘projection’ of a well-studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds to the optimal cost-to-go function. Our program, the ‘smoothed approximate linear program’, is distinct from such approaches and relaxes the restriction to lower-bounding approximations in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages: First, we demonstrate substantially superior bounds on the quality of approximation to the optimal cost-to-go function afforded by our approach. Second, experiments with our approach on a challenging problem (the game of Tetris) show that the approach outperforms the existing LP approach (which has previously been shown to be competitive with several ADP algorithms) by an order of magnitude.
A reinterpretation of the policy oscillation phenomenon in approximate policy iteration
Cited by 1 (1 self)
A majority of approximate dynamic programming approaches to the reinforcement learning problem can be categorized into greedy value function methods and value-based policy gradient methods. The former approach, although fast, is well known to be susceptible to the policy oscillation phenomenon. We take a fresh view of this phenomenon by casting a considerable subset of the former approach as a limiting special case of the latter. We explain the phenomenon in terms of this view and illustrate the underlying mechanism with artificial examples. We also use it to derive the constrained natural actor-critic algorithm that can interpolate between the aforementioned approaches. In addition, it has been suggested in the literature that the oscillation phenomenon might be subtly connected to the grossly suboptimal performance in the Tetris benchmark problem of all attempted approximate dynamic programming methods. We report empirical evidence against such a connection and in favor of an alternative explanation. Finally, we report scores in the Tetris problem that improve on existing dynamic-programming-based results.
Apply Ant Colony Optimization to Tetris
Tetris is a falling-block game in which the player’s objective is to arrange a sequence of differently shaped tetrominoes smoothly in order to survive. In game-playing AI, an agent imitates a real player and chooses the best move based on a linear value function. In this paper, we apply the Ant Colony Optimization (ACO) method to learn the weights of the function, searching for an optimal weight path in the weight graph. We use a dynamic heuristic to prevent premature convergence to local optima. Our experimental results are better than those of most traditional reinforcement learning methods.
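Choosing a move with a linear value function, as described above, amounts to scoring the board that each candidate placement would produce and taking the highest score. The sketch below is illustrative only: the three features (holes, maximum column height, lines cleared), the weights, and the move names are hypothetical, and the weights are fixed by hand rather than learned with ACO.

```python
def evaluate(weights, features):
    """Linear value function: weighted sum of board features."""
    return sum(w * f for w, f in zip(weights, features))


def best_move(weights, candidates):
    """Return the move whose resulting board has the highest linear value.

    candidates maps a move label to the feature vector of the board that
    results from that move (hypothetical representation).
    """
    return max(candidates, key=lambda move: evaluate(weights, candidates[move]))


# Hypothetical features: [holes, max column height, lines cleared]
weights = [-1.0, -0.5, 2.0]
candidates = {
    "left": [2, 4, 0],   # value: -2.0 - 2.0 + 0.0 = -4.0
    "right": [0, 5, 1],  # value:  0.0 - 2.5 + 2.0 = -0.5
}
chosen = best_move(weights, candidates)  # "right"
```

Any weight-learning method, ACO included, only has to supply the weights vector; the move-selection step stays the same.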
AC/RL HW 1: Tetris
The goal of this project is to design a controller to play a simple game of Tetris. This controller needs to decide where to place each piece, aiming to maximize its score, which in this case is the number of lines completed.
Supervised by:
Reinforcement Learning methods have been successfully applied to various optimization problems. Scaling them up to real-world-sized problems has, however, proved more difficult. In this research we apply Reinforcement Learning to the game of Tetris, which has a very large state space. We not only try to learn policies for Standard Tetris but also try to learn parameterized policies for Generalized Tetris, which varies from game to game in field size and in the chance of a single block occurring. We find that a parameterized policy is able to outperform a non-parameterized policy. The increased complexity of learning such a policy using the Cross-Entropy method, however, reaches a limit at which the policy is too complex to learn and performance drops below that of the non-parameterized policy.
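The Cross-Entropy method mentioned above can be sketched as a loop that samples weight vectors from a Gaussian, keeps the best-scoring ones, and refits the Gaussian to them. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, population sizes, and the toy quadratic score that stands in for playing Tetris are all hypothetical.

```python
import random
import statistics


def cross_entropy_search(score, dim, iterations=30, population=50,
                         elite=10, init_std=5.0, seed=0):
    """Maximize score(weights) with a diagonal-Gaussian Cross-Entropy method."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    for _ in range(iterations):
        # Sample candidate weight vectors from the current distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Keep the highest-scoring candidates (the elite set).
        samples.sort(key=score, reverse=True)
        best = samples[:elite]
        # Refit the mean and standard deviation to the elite set.
        mean = [statistics.fmean([w[i] for w in best]) for i in range(dim)]
        std = [statistics.pstdev([w[i] for w in best]) for i in range(dim)]
    return mean


def toy_score(w):
    """Hypothetical stand-in for a game score; maximal at w = (1, -2)."""
    return -((w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2)


result = cross_entropy_search(toy_score, dim=2)
```

In a Tetris setting, the score function would instead play one or more games with the sampled weights and return the number of lines cleared, which makes each evaluation far more expensive and noisier than in this toy case.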

