Results 1–10 of 37
Technical update: Least-squares temporal difference learning
Machine Learning, 2002
"... Abstract. TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a ste ..."
Abstract

Cited by 128 (2 self)
 Add to MetaCart
(Show Context)
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a step-size schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22(1–3), 33–57) eliminates all step-size parameters and improves data efficiency. This paper updates Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting new algorithm is shown to be a practical, incremental formulation of supervised linear regression. Third, it presents a novel and intuitive interpretation of LSTD as a model-based reinforcement learning technique.
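The batch LSTD(0) construction this abstract refers to can be sketched in a few lines: accumulate A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ r·φ(s) over observed transitions, then solve A w = b for the linear value-function weights. A minimal NumPy sketch (the `lstd` name and the small ridge term `eps` are our own illustrative choices, not Bradtke and Barto's exact formulation):

```python
import numpy as np

def lstd(transitions, gamma=0.9, eps=1e-6):
    """Least-Squares TD(0) sketch: accumulate A and b over observed
    (phi(s), r, phi(s')) transitions and solve A w = b, giving linear
    value-function weights with V(s) ~ phi(s) . w."""
    k = len(transitions[0][0])
    A = eps * np.eye(k)                    # small ridge keeps A invertible
    b = np.zeros(k)
    for phi, r, phi_next in transitions:
        phi = np.asarray(phi, dtype=float)
        phi_next = np.asarray(phi_next, dtype=float)
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)
```

On a tiny two-state chain with tabular features and γ = 1, one batch solve recovers the exact values with no step-size schedule, which is the data-efficiency point the abstract makes.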
Least-Squares Temporal Difference Learning
In Proceedings of the Sixteenth International Conference on Machine Learning, 1999
"... TD() is a popular family of algorithms for approximate policy evaluation in large MDPs. TD() works by incrementally updating the value function after each observed transition. It has two major drawbacks: it makes inefficient use of data, and it requires the user to manually tune a stepsize schedule ..."
Abstract

Cited by 118 (0 self)
 Add to MetaCart
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it makes inefficient use of data, and it requires the user to manually tune a step-size schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996) eliminates all step-size parameters and improves data efficiency. This paper extends Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting algorithm is shown to be a practical formulation of supervised linear regression. Third, it presents a novel, intuitive interpretation of LSTD as a model-based reinforcement learning technique.
Learning Evaluation Functions for Global Optimization and Boolean Satisfiability
In Proc. of 15th National Conf. on Artificial Intelligence (AAAI), 1998
"... This paper describes STAGE, a learning approach to automatically improving search performance on optimization problems. STAGE learns an evaluation function which predicts the outcome of a local search algorithm, such as hillclimbing or WALKSAT, as a function of state features along its search ..."
Abstract

Cited by 65 (3 self)
 Add to MetaCart
(Show Context)
This paper describes STAGE, a learning approach to automatically improving search performance on optimization problems. STAGE learns an evaluation function which predicts the outcome of a local search algorithm, such as hill-climbing or WALKSAT, as a function of state features along its search trajectories. The learned evaluation function is used to bias future search trajectories toward better optima. We present positive results on six large-scale optimization domains.
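The STAGE loop described in this abstract can be sketched on a toy integer search space: run hill-climbing, fit a linear model from state features to each trajectory's final cost, then restart from the state the model predicts is most promising. Everything below (the toy objective, the features, and the function names) is an illustrative assumption, not the paper's actual code or domains:

```python
import numpy as np

def hillclimb(f, x, lo, hi):
    """Greedy descent over integers in [lo, hi]: move to the better
    neighbour until none improves; return (trajectory, final cost)."""
    traj = [x]
    while True:
        nbrs = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = min(nbrs, key=f)
        if f(best) >= f(x):
            return traj, f(x)
        x = best
        traj.append(x)

def stage(f, train_starts, lo, hi, feats):
    """STAGE sketch: learn V(feats(x)) ~ final outcome of hill-climbing
    from x, then restart hill-climbing from the best-predicted state."""
    X, y = [], []
    for s in train_starts:
        traj, outcome = hillclimb(f, s, lo, hi)
        for x in traj:                 # every state on a trajectory shares
            X.append(feats(x))         # that trajectory's final outcome
            y.append(outcome)
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    restart = min(range(lo, hi + 1), key=lambda x: float(np.dot(feats(x), w)))
    return hillclimb(f, restart, lo, hi)[1]
```

On a "washboard" objective with many local minima of slowly increasing depth, the fitted model learns the large-scale trend that the raw cost function hides from a local neighbourhood, which is exactly the bias toward better optima the abstract describes.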
Learning Evaluation Functions to Improve Optimization by Local Search
Journal of Machine Learning Research, 2000
"... This paper describes algorithms that learn to improve search performance on largescale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome of a local search algorithm, such as hillclimbing or Walksat, from features of states visited durin ..."
Abstract

Cited by 59 (0 self)
 Add to MetaCart
(Show Context)
This paper describes algorithms that learn to improve search performance on large-scale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome of a local search algorithm, such as hill-climbing or Walksat, from features of states visited during search. The learned evaluation function is then used to bias future search trajectories toward better optima on the same problem. Another algorithm, X-Stage, transfers previously learned evaluation functions to new, similar optimization problems. Empirical results are provided on seven large-scale optimization domains: bin-packing, channel routing, Bayesian network structure-finding, radiotherapy treatment planning, cartogram design, Boolean satisfiability, and Boggle board setup.
Credit Card Fraud Detection Using Bayesian and Neural Networks
In: Maciunas RJ, editor. Interactive image-guided neurosurgery. American Association of Neurological Surgeons, 1993
"... This paper discusses automated credit card fraud detection by means of machine learning. In an era of digitalization, credit card fraud detection is of great importance to financial institutions. We apply two machine learning techniques suited for reasoning under uncertainty: artificial neural netwo ..."
Abstract

Cited by 32 (1 self)
 Add to MetaCart
(Show Context)
This paper discusses automated credit card fraud detection by means of machine learning. In an era of digitalization, credit card fraud detection is of great importance to financial institutions. We apply two machine learning techniques suited for reasoning under uncertainty, artificial neural networks and Bayesian belief networks, to the problem, and show significant results on real-world financial data. Finally, future directions are indicated for improving both techniques and results.
Reusing Old Policies to Accelerate Learning on New MDPs
, 1999
"... We consider the reuse of policies for previous MDPs in learning on a new MDP, under the assumption that the vector of parameters of each MDP is drawn from a fixed probability distribution. We use the options framework, in which an option consists of a set of initiation states, a policy, and a te ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
(Show Context)
We consider the reuse of policies for previous MDPs in learning on a new MDP, under the assumption that the vector of parameters of each MDP is drawn from a fixed probability distribution. We use the options framework, in which an option consists of a set of initiation states, a policy, and a termination condition. We use an option called a reuse option, for which the set of initiation states is the set of all states, the policy is a combination of policies from the old MDPs, and the termination condition is based on the number of time steps since the option was initiated. Given policies for m of the MDPs from the distribution, we construct reuse options from the policies and compare performance on an (m+1)st MDP both with and without various reuse options. We find that reuse options can speed initial learning of the (m+1)st task. We also present a distribution of MDPs for which reuse options can slow initial learning. We discuss reasons for this and suggest other ways to design reuse options.
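The three ingredients of a reuse option named in this abstract (initiation set of all states, a policy combined from old policies, termination by step count) map directly onto a small class. The majority-vote combination rule and all names below are illustrative assumptions; the paper does not necessarily combine old policies by voting:

```python
from collections import Counter

class ReuseOption:
    """Sketch of a 'reuse option': initiable in every state, acts by a
    majority vote over policies learned on earlier MDPs, and terminates
    after a fixed number of steps since initiation."""

    def __init__(self, old_policies, max_steps):
        self.old_policies = old_policies   # each maps state -> action
        self.max_steps = max_steps
        self.steps = 0

    def initiable(self, state):
        return True                        # initiation set = all states

    def act(self, state):
        self.steps += 1
        votes = Counter(p(state) for p in self.old_policies)
        return votes.most_common(1)[0][0]  # most popular old-policy action

    def terminated(self):
        return self.steps >= self.max_steps
```

A learner on the new MDP can then treat this option as one more temporally extended action alongside its primitive actions, which is how the paper's with/without comparison is set up.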
Learning Instance-Independent Value Functions to Enhance Local Search
Advances in Neural Information Processing Systems, 1998
"... Reinforcement learning methods can be used to improve the performance of local search algorithms for combinatorial optimization by learning an evaluation function that predicts the outcome of search. The evaluation function is therefore able to guide search to lowcost solutions better than can ..."
Abstract

Cited by 14 (0 self)
 Add to MetaCart
(Show Context)
Reinforcement learning methods can be used to improve the performance of local search algorithms for combinatorial optimization by learning an evaluation function that predicts the outcome of search. The evaluation function is therefore able to guide search to low-cost solutions better than the original cost function can. We describe a reinforcement learning method for enhancing local search that combines aspects of previous work by Zhang and Dietterich (1995) and Boyan and Moore (1997; Boyan, 1998). In an offline learning phase, a value function is learned that is useful for guiding search for multiple problem sizes and instances. We illustrate our technique by developing several such functions for the Dial-A-Ride Problem. Our learning-enhanced local search algorithm exhibits an improvement of more than 30% over a standard local search algorithm.
Enhancing stochastic search performance by value-biased randomization of heuristics
 Journal of Heuristics
, 2005
"... Abstract. Stochastic search algorithms are often robust, scalable problem solvers. In this paper, we concern ourselves with the class of stochastic search algorithms called stochastic sampling. Randomization in such a search framework can be an effective means of expanding search around a stochastic ..."
Abstract

Cited by 12 (4 self)
 Add to MetaCart
(Show Context)
Stochastic search algorithms are often robust, scalable problem solvers. In this paper, we concern ourselves with the class of stochastic search algorithms called stochastic sampling. Randomization in such a search framework can be an effective means of expanding search around a stochastic neighborhood of a strong domain heuristic. Specifically, we show that a value-biased approach can be more effective than the rank-biased approach of the heuristic-biased stochastic sampling algorithm. We also illustrate the effectiveness of value-biasing the starting configurations of a local hill-climber. We use the weighted tardiness scheduling problem to evaluate our approach.
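The contrast this abstract draws can be made concrete: rank-biased HBSS sorts candidates and weights them by position alone, while value biasing weights each candidate by its actual heuristic score, so near-ties stay near-equiprobable. A small sketch (the Boltzmann-style weighting, the `temp` parameter, and all names are our assumptions, not the paper's exact bias functions):

```python
import math
import random

def value_biased_choice(candidates, h, temp=1.0, rng=random):
    """Pick a candidate with probability decaying in its heuristic cost
    h(c) (value-biased). Rank-biased sampling would instead sort the
    candidates and weight them by rank only, ignoring how close the
    underlying scores actually are."""
    weights = [math.exp(-h(c) / temp) for c in candidates]
    r = rng.random() * sum(weights)
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]   # guard against floating-point round-off
```

Lowering `temp` sharpens the bias toward the heuristic's favorite, so the same routine spans the range from near-greedy search to near-uniform sampling.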
Boosting Stochastic Problem Solvers through Online Self-Analysis of Performance
, 2003
"... In many combinatorial domains, simple stochastic algorithms often exhibit superior performance when compared to highly customized approaches. Many of these simple algorithms outperform more sophisticated approaches on difficult benchmark problems; and often lead to better solutions as the algorithms ..."
Abstract

Cited by 9 (3 self)
 Add to MetaCart
(Show Context)
In many combinatorial domains, simple stochastic algorithms often exhibit superior performance compared to highly customized approaches. Many of these simple algorithms outperform more sophisticated approaches on difficult benchmark problems, and they often lead to better solutions as the algorithms are taken out of the world of benchmarks and into the real world. Simple stochastic algorithms are often robust, scalable problem solvers.
Guiding Conformation Space Search with an All-Atom Energy Potential (short title: Model-Based Search for Protein Folding)
"... Keywords:Protein structure prediction, conformational space search, multiple energy functions, active learning, Rosetta, Monte Carlo. The most significant impediment for protein structure prediction is the inadequacy of conformation space search. Conformation space is too large and the energy landsc ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
(Show Context)
Keywords: protein structure prediction, conformational space search, multiple energy functions, active learning, Rosetta, Monte Carlo.
The most significant impediment for protein structure prediction is the inadequacy of conformation space search. Conformation space is too large and the energy landscape too rugged for existing search methods to consistently find near-optimal minima. To alleviate this problem, we present model-based search, a novel conformation space search method. Model-based search uses highly accurate information obtained during search to build an approximate, partial model of the energy landscape. Model-based search aggregates information in the model as it progresses, and in turn uses this information to guide exploration towards regions most likely to contain a near-optimal minimum. We validate our method by predicting the structure of 32 proteins, ranging in length from 49 to 213 amino acids. Our results demonstrate that model-based search is more effective at finding low-energy conformations in high-dimensional conformation spaces than existing search methods. The reduction in energy translates into structure predictions of increased accuracy.
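The loop this abstract outlines (evaluate a few conformations, keep them as a partial model of the landscape, and spend the next expensive evaluation where the model is most optimistic) can be sketched on a 1-D toy landscape. The nearest-neighbour surrogate and all names here are illustrative assumptions; the paper's model of Rosetta's all-atom energy landscape is far more elaborate:

```python
import random

def model_based_search(energy, propose, n_iters, n_cands, rng):
    """Surrogate-model search sketch: remember every (point, energy)
    evaluation as a partial model, score fresh random proposals by the
    energy of their nearest evaluated neighbour, and spend the next
    real evaluation only on the most promising proposal."""
    x0 = propose(rng)
    model = [(x0, energy(x0))]
    for _ in range(n_iters):
        cands = [propose(rng) for _ in range(n_cands)]

        def predicted(c):               # nearest-neighbour estimate
            return min(model, key=lambda p: abs(p[0] - c))[1]

        best = min(cands, key=predicted)
        model.append((best, energy(best)))
    return min(model, key=lambda p: p[1])   # lowest-energy point found
```

The key economy is that `energy` is called once per iteration while the cheap surrogate screens `n_cands` proposals, mirroring the paper's point that the expensive all-atom potential should only be evaluated where the partial model is promising.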