Results 1 - 8 of 8
Learning Evaluation Functions to Improve Optimization by Local Search
Journal of Machine Learning Research, 2000
Abstract

Cited by 56 (0 self)
This paper describes algorithms that learn to improve search performance on large-scale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome of a local search algorithm, such as hill-climbing or Walksat, from features of states visited during search. The learned evaluation function is then used to bias future search trajectories toward better optima on the same problem. Another algorithm, X-Stage, transfers previously learned evaluation functions to new, similar optimization problems. Empirical results are provided on seven large-scale optimization domains: bin-packing, channel routing, Bayesian network structure-finding, radiotherapy treatment planning, cartogram design, Boolean satisfiability, and Boggle board setup.
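The Stage idea in this abstract (run a base local search, record features of visited states together with the eventual outcome, fit a predictor of that outcome, and then hill-climb on the predictor to choose a promising restart point) can be sketched in a few lines of Python. Everything concrete below, including the bit-string objective, the three features, and the plain SGD regression, is an invented stand-in for illustration rather than the paper's actual setup:

```python
import random

random.seed(0)

# Toy objective, invented for illustration: minimize the distance of a
# bit-string's popcount from a target.
N, TARGET = 20, 12

def obj(state):
    return abs(sum(state) - TARGET)

def features(state):
    # hypothetical state features, normalized to [0, 1]
    runs = sum(state[i] == state[i + 1] for i in range(N - 1))
    return [1.0, sum(state) / N, runs / N]

def hillclimb(state, evalf):
    # greedy hill-climbing under an arbitrary evaluation function
    while True:
        nbrs = [state[:i] + [1 - state[i]] + state[i + 1:] for i in range(N)]
        best = min(nbrs, key=evalf)
        if evalf(best) >= evalf(state):
            return state
        state = best

# Stage-style outer loop: run the base search on the true objective,
# record (start features, endpoint objective) pairs, fit a linear
# predictor, then hill-climb on the predictor to pick the next start.
data, weights = [], [0.0, 0.0, 0.0]

def predict(state):
    return sum(w * f for w, f in zip(weights, features(state)))

def fit():
    global weights
    for _ in range(200):                      # plain SGD regression
        for x, y in data:
            err = sum(w * f for w, f in zip(weights, x)) - y
            weights = [w - 0.1 * err * f for w, f in zip(weights, x)]

best_seen = None
state = [random.randint(0, 1) for _ in range(N)]
for _ in range(10):
    start = features(state)
    end = hillclimb(state, obj)               # run the base local search
    data.append((start, float(obj(end))))
    if best_seen is None or obj(end) < obj(best_seen):
        best_seen = end
    fit()
    state = hillclimb(end, predict)           # restart guided by the model
print("best objective found:", obj(best_seen))
```

On this trivially easy objective the base search alone reaches the optimum; the point of the sketch is only the alternation between searching on the true objective and searching on the learned evaluation.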
Batch Value Function Approximation via Support Vectors
Advances in Neural Information Processing Systems 14, 2001
Abstract

Cited by 30 (3 self)
We present three ways of combining linear programming with the kernel trick to find value function approximations for reinforcement learning. One formulation is based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attempt to minimize the number of support vectors while fitting the data. Experiments in a difficult, synthetic maze problem show that all three formulations give excellent performance, but the advantage formulation is much easier to train. Unlike policy gradient methods, the kernel methods described here can easily adjust the complexity of the function approximator to fit the complexity of the value function.
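As a rough, hypothetical illustration of the third ("advantage") formulation described in this abstract, the sketch below solves a linear program that forces the good move to beat the bad move by a margin while minimizing the L1 norm of kernel coefficients, which pushes most coefficients to zero in the spirit of minimizing support vectors. The toy chain MDP, the RBF kernel width, and the unit margin are all assumptions, not details from the paper:

```python
import numpy as np
from scipy.optimize import linprog

# Toy chain MDP (an assumption, not the paper's maze): states 0..4,
# goal at 4; the "good" move from an interior state is right, the
# "bad" move is left.
states = np.arange(5)

def k(a, b, gamma=0.5):
    return np.exp(-gamma * (a - b) ** 2)   # RBF kernel (the kernel trick)

K = k(states[:, None], states[None, :])    # 5x5 Gram matrix

# Advantage constraints for interior states s = 1..3:
#   V(s+1) - V(s-1) >= 1,  with V(s) = sum_j alpha_j * k(s, j)
A = np.array([K[s + 1] - K[s - 1] for s in range(1, 4)])

# Write alpha = p - n with p, n >= 0 and minimize sum(p) + sum(n):
# an L1 objective that drives most coefficients to zero.
c = np.ones(10)
A_ub = np.hstack([-A, A])                  # -A(p - n) <= -1  <=>  A alpha >= 1
b_ub = -np.ones(3)
res = linprog(c, A_ub=A_ub, b_ub=b_ub)     # default bounds are x >= 0
alpha = res.x[:5] - res.x[5:]
V = K @ alpha
print("fitted values along the chain:", np.round(V, 2))
```

The fitted values satisfy every advantage constraint, so greedy action selection under V moves toward the goal from every interior state.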
Learning and Inference in Weighted Logic with Application to Natural Language Processing, 2008
Reinforcement Learning in Distributed Domains: Beyond Team Games
In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 2001
Abstract

Cited by 7 (4 self)
Using a distributed algorithm rather than a centralized one can be extremely beneficial in large search problems. In addition, the incorporation of machine learning techniques like Reinforcement Learning (RL) into search algorithms has often been found to improve their performance. In this article we investigate a search algorithm that combines these properties by employing RL in a distributed manner, essentially using the team game approach. We then present bi-utility search, which interleaves our distributed algorithm with (centralized) simulated annealing, by using the distributed algorithm to guide the exploration step of the simulated annealing. We investigate using these algorithms in the domain of minimizing the loss of importance-weighted communication data traversing a constellation of communication satellites. To do this we introduce the idea of running these algorithms "on top" of an underlying, learning-free routing algorithm. They do this by having the actions of the distributed learners be the introduction of virtual "ghost" traffic into the decision-making of the underlying routing algorithm, traffic that "misleads" the routing algorithm in a way that actually improves performance. We find that using our original distributed RL algorithm to set ghost traffic improves performance, and that bi-utility search, a semi-distributed search algorithm that is widely applicable, substantially outperforms both that distributed RL algorithm and (centralized) simulated annealing in our problem domain.
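The ghost-traffic idea admits a tiny self-contained illustration. In this invented Pigou-style example (not from the article), a greedy router sends every packet down the congestible link; injecting the right amount of virtual load "misleads" it into a split with lower real total delay:

```python
# Invented toy network: link A's delay grows with its load (a / 10);
# link B's delay is a constant 1. A greedy router sends each of 10
# packets to whichever link has the lower apparent delay, so without
# intervention every packet piles onto A. "Ghost" traffic inflates
# A's apparent load only; the real delay is unchanged.

def route(ghost):
    a = b = 0
    for _ in range(10):
        if (a + ghost) / 10 < 1:     # apparent delay on A vs. B's constant 1
            a += 1
        else:
            b += 1
    return a * (a / 10) + b * 1.0    # real total delay actually incurred

costs = {g: route(g) for g in range(11)}
best_g = min(costs, key=costs.get)
print("no ghost:", costs[0], "| best ghost load:", best_g, "->", costs[best_g])
```

With no ghost traffic the greedy router incurs total delay 10.0; a ghost load of 5 misleads it into a 5/5 split with real total delay 7.5, the optimum for this instance.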
Stabilizing Value Function Approximation with the BFBP Algorithm
Advances in Neural Information Processing Systems 14, 2002
Abstract

Cited by 2 (1 self)
We address the problem of non-convergence of online reinforcement learning algorithms (e.g., Q-learning and SARSA(λ)) by adopting an incremental-batch approach that separates the exploration process from the function fitting process. Our BFBP (Batch Fit to Best Paths) algorithm alternates between an exploration phase (during which trajectories are generated to try to find fragments of the optimal policy) and a function fitting phase (during which a function approximator is fit to the best known paths from start states to terminal states). An advantage of this approach is that batch value-function fitting is a global process, which allows it to address the trade-offs in function approximation that cannot be handled by local, online algorithms. This approach was pioneered by Boyan and Moore with their Grow-Support and ROUT algorithms.
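A minimal sketch of the BFBP alternation, under invented assumptions: a toy 1-D chain task stands in for the real problems, a tabular value table stands in for the function approximator, and negative remaining path length plays the role of cost-to-go along the best known path:

```python
import random

random.seed(0)

GOAL = 5  # terminal state of a toy 1-D chain; the start state is 0

def rollout(policy):
    s, path = 0, [0]
    while s != GOAL and len(path) < 200:
        s = max(0, min(GOAL, s + policy(s)))
        path.append(s)
    return path

best_path = None
V = {}  # value table, fit in batch to the best known path

def policy(s):
    # explore randomly, otherwise exploit the fitted values greedily
    if random.random() < 0.3 or (s + 1) not in V or (s - 1) not in V:
        return random.choice([-1, 1])
    return 1 if V[s + 1] > V[s - 1] else -1

for phase in range(30):
    path = rollout(policy)                      # exploration phase
    if path[-1] == GOAL and (best_path is None or len(path) < len(best_path)):
        best_path = path
    if best_path:                               # batch fitting phase:
        for i, s in enumerate(best_path):       # V(s) = negative cost-to-go
            V[s] = -(len(best_path) - 1 - i)    # along the best known path
print("shortest path found:", len(best_path) - 1, "steps")
```

Because fitting touches only the best known paths, the value table never chases the noise of individual exploratory trajectories, which is the stabilizing idea the abstract describes.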
Cost Function Optimization based on Active Learning
Abstract
Optimization is one of the most important issues in all fields of science and engineering. There are two main categories of optimization problems: continuous optimization and discrete optimization. Traditional methods, such as gradient descent, are used for solving continuous optimization problems, but for discrete optimization, both traditional and many new algorithms have been introduced. Due to the long time required to solve NP-Hard problems, a special subset of discrete optimization problems which require non-polynomial time to find the exact (global) optimum, some researchers in artificial intelligence have suggested heuristic algorithms for solving these hard problems. They model these problems as a state-space search in a search graph, where candidate solutions are nodes and actions specify links between them. Algorithms based on Reinforcement Learning, Simulated Annealing, and Multi-Start Local Search fall into this heuristic category. Other researchers in artificial intelligence have attacked these problems with algorithms inspired by nature; Evolutionary Algorithms and Ant Colony Optimization are in this category (we can also consider Simulated Annealing an algorithm that mimics nature). It is worth noting that Molecular (DNA) Computing has also been used for solving the TSP. In this project, we consider the problem of optimizing discrete cost functions, especially those of ...
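Since the abstract frames discrete optimization as search over a graph whose nodes are candidate solutions, here is a minimal simulated-annealing sketch over such a state graph; the five-point tour instance, the swap neighborhood, and the cooling schedule are invented for illustration:

```python
import math
import random

random.seed(1)

# Candidate solutions (nodes) are tours over five fixed points; an
# action (link) swaps two cities. Instance invented for illustration.
pts = [(0, 0), (3, 0), (3, 4), (0, 4), (1, 2)]

def cost(tour):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def neighbor(tour):
    i, j = random.sample(range(len(tour)), 2)
    t = tour[:]
    t[i], t[j] = t[j], t[i]
    return t

state = list(range(len(pts)))
best, T = state, 2.0
for _ in range(2000):
    cand = neighbor(state)
    d = cost(cand) - cost(state)
    if d < 0 or random.random() < math.exp(-d / T):
        state = cand              # accept downhill; uphill with prob e^(-d/T)
    if cost(state) < cost(best):
        best = state
    T *= 0.995                    # geometric cooling schedule
print("best tour cost found:", round(cost(best), 2))
```

The occasional uphill acceptance is what lets the walk escape local optima of the state graph, which pure hill-climbing cannot do.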
Guiding Constructive Search
Abstract
Several real world applications involve solving combinatorial optimization problems.
Placement and Routing for 3D-FPGAs Using Reinforcement Learning and Support Vector Machines
Abstract
The primary advantage of using a 3D-FPGA over a 2D-FPGA is that the vertical stacking of active layers reduces the Manhattan distance between components relative to their placement on a 2D-FPGA. This results in a considerable reduction in total interconnect length. Reduced wire length eventually leads to reduced delay and hence improved performance and speed. The design of an efficient placement and routing algorithm for 3D-FPGAs that fully exploits the above-mentioned advantage is a problem of deep research and commercial interest. In this paper, an efficient placement and routing algorithm is proposed for 3D-FPGAs which yields better results in terms of total interconnect length and channel-width. The proposed algorithm employs two important techniques, namely Reinforcement Learning (RL) and Support Vector Machines (SVMs), to perform the placement. The proposed algorithm is implemented and tested on standard benchmark circuits, and the results obtained are encouraging. This is one of the very few instances where reinforcement learning is used for solving a problem in the area of VLSI.