Results 1  10
of
23
LeastSquares Temporal Difference Learning
 In Proceedings of the Sixteenth International Conference on Machine Learning
, 1999
"... TD() is a popular family of algorithms for approximate policy evaluation in large MDPs. TD() works by incrementally updating the value function after each observed transition. It has two major drawbacks: it makes inefficient use of data, and it requires the user to manually tune a stepsize schedule ..."
Abstract

Cited by 95 (0 self)
 Add to MetaCart
TD() is a popular family of algorithms for approximate policy evaluation in large MDPs. TD() works by incrementally updating the value function after each observed transition. It has two major drawbacks: it makes inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and = 0, the LeastSquares TD (LSTD) algorithm of Bradtke and Barto (Bradtke and Barto, 1996) eliminates all stepsize parameters and improves data efficiency. This paper extends Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from = 0 to arbitrary values of ; at the extreme of = 1, the resulting algorithm is shown to be a practical formulation of supervised linear regression. Third, it presents a novel, intuitive interpretation of LSTD as a modelbased reinforcement learning technique.
Technical update: Leastsquares temporal difference learning
 Machine Learning
, 2002
"... Abstract. TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a ste ..."
Abstract

Cited by 88 (2 self)
 Add to MetaCart
Abstract. TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the LeastSquares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine learning, 22:1–3, 33–57) eliminates all stepsize parameters and improves data efficiency. This paper updates Bradtke and Barto’s work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting new algorithm is shown to be a practical, incremental formulation of supervised linear regression. Third, it presents a novel and intuitive interpretation of LSTD as a modelbased reinforcement learning technique.
Learning Evaluation Functions for Global Optimization and Boolean Satisfiability
 In Proc. of 15th National Conf. on Artificial Intelligence (AAAI
, 1998
"... This paper describes STAGE, a learning approach to automatically improving search performance on optimization problems. STAGE learns an evaluation function which predicts the outcome of a local search algorithm, such as hillclimbing or WALKSAT, as a function of state features along its search ..."
Abstract

Cited by 59 (3 self)
 Add to MetaCart
This paper describes STAGE, a learning approach to automatically improving search performance on optimization problems. STAGE learns an evaluation function which predicts the outcome of a local search algorithm, such as hillclimbing or WALKSAT, as a function of state features along its search trajectories. The learned evaluation function is used to bias future search trajectories toward better optima. We present positive results on six largescale optimization domains.
Learning Evaluation Functions to Improve Optimization by Local Search
 Journal of Machine Learning Research
, 2000
"... This paper describes algorithms that learn to improve search performance on largescale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome of a local search algorithm, such as hillclimbing or Walksat, from features of states visited durin ..."
Abstract

Cited by 56 (0 self)
 Add to MetaCart
This paper describes algorithms that learn to improve search performance on largescale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome of a local search algorithm, such as hillclimbing or Walksat, from features of states visited during search. The learned evaluation function is then used to bias future search trajectories toward better optima on the same problem. Another algorithm, XStage, transfers previously learned evaluation functions to new, similar optimization problems. Empirical results are provided on seven largescale optimization domains: binpacking, channel routing, Bayesian network structurefinding, radiotherapy treatment planning, cartogram design, Boolean satisfiability, and Boggle board setup.
Reusing Old Policies to Accelerate Learning on New MDPs
, 1999
"... We consider the reuse of policies for previous MDPs in learning on a new MDP, under the assumption that the vector of parameters of each MDP is drawn from a fixed probability distribution. We use the options framework, in which an option consists of a set of initiation states, a policy, and a te ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
We consider the reuse of policies for previous MDPs in learning on a new MDP, under the assumption that the vector of parameters of each MDP is drawn from a fixed probability distribution. We use the options framework, in which an option consists of a set of initiation states, a policy, and a termination condition. We use an option called a reuse option, for which the set of initiation states is the set of all states, the policy is a combination of policies from the old MDPs, and the termination condition is based on the number of time steps since the option was initiated. Given policies for m of the MDPs from the distribution, we construct reuse options from the policies and compare performance on an m + 1st MDP both with and without various reuse options. We find that reuse options can speed initial learning of the m+ 1st task. We also present a distribution of MDPs for which reuse options can slow initial learning. We discuss reasons for this and suggest other ways to design reuse options.
Learning InstanceIndependent Value Functions to Enhance Local Search
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
, 1998
"... Reinforcement learning methods can be used to improve the performance of local search algorithms for combinatorial optimization by learning an evaluation function that predicts the outcome of search. The evaluation function is therefore able to guide search to lowcost solutions better than can ..."
Abstract

Cited by 12 (0 self)
 Add to MetaCart
Reinforcement learning methods can be used to improve the performance of local search algorithms for combinatorial optimization by learning an evaluation function that predicts the outcome of search. The evaluation function is therefore able to guide search to lowcost solutions better than can the original cost function. We describe a reinforcement learning method for enhancing local search that combines aspects of previous work by Zhang and Dietterich (1995) and Boyan and Moore (1997, Boyan 1998). In an offline learning phase, a value function is learned that is useful for guiding search for multiple problem sizes and instances. We illustrate our technique by developing several such functions for the DialARide Problem. Our learningenhanced local search algorithm exhibits an improvement of more then 30% over a standard local search algorithm.
Credit Card Fraud Detection Using Bayesian and Neural Networks
 In: Maciunas RJ, editor. Interactive imageguided neurosurgery. American Association Neurological Surgeons
, 1993
"... This paper discusses automated credit card fraud detection by means of machine learning. In an era of digitalization, credit card fraud detection is of great importance to financial institutions. We apply two machine learning techniques suited for reasoning under uncertainty: artificial neural netwo ..."
Abstract

Cited by 12 (0 self)
 Add to MetaCart
This paper discusses automated credit card fraud detection by means of machine learning. In an era of digitalization, credit card fraud detection is of great importance to financial institutions. We apply two machine learning techniques suited for reasoning under uncertainty: artificial neural networks and Bayesian belief networks to the problem and show their significant results on real world financial data. Finally, future directions are indicated to improve both techniques and results.
Enhancing stochastic search performance by valuebiased randomization of heuristics
 Journal of Heuristics
, 2005
"... Abstract. Stochastic search algorithms are often robust, scalable problem solvers. In this paper, we concern ourselves with the class of stochastic search algorithms called stochastic sampling. Randomization in such a search framework can be an effective means of expanding search around a stochastic ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
Abstract. Stochastic search algorithms are often robust, scalable problem solvers. In this paper, we concern ourselves with the class of stochastic search algorithms called stochastic sampling. Randomization in such a search framework can be an effective means of expanding search around a stochastic neighborhood of a strong domain heuristic. Specifically, we show that a valuebiased approach can be more effective than the rankbiased approach of the heuristicbiased stochastic sampling algorithm. We also illustrate the effectiveness of valuebiasing the starting configurations of a local hillclimber. We use the weighted tardiness scheduling problem to evaluate our approach.
Boosting Stochastic Problem Solvers through Online SelfAnalysis of Performance
, 2003
"... In many combinatorial domains, simple stochastic algorithms often exhibit superior performance when compared to highly customized approaches. Many of these simple algorithms outperform more sophisticated approaches on difficult benchmark problems; and often lead to better solutions as the algorithms ..."
Abstract

Cited by 8 (3 self)
 Add to MetaCart
In many combinatorial domains, simple stochastic algorithms often exhibit superior performance when compared to highly customized approaches. Many of these simple algorithms outperform more sophisticated approaches on difficult benchmark problems; and often lead to better solutions as the algorithms are taken out of the world of benchmarks and into the realworld. Simple stochastic algorithms are often robust, scalable problem solvers.
Using Training Regimens to Teach Expanding Function Approximators
"... In complex realworld environments, traditional (tabular) techniques for solving Reinforcement Learning (RL) do not scale. Function approximation is needed, but unfortunately, existing approaches generally have poor convergence and optimality guarantees. Additionally, for the case of human environme ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
In complex realworld environments, traditional (tabular) techniques for solving Reinforcement Learning (RL) do not scale. Function approximation is needed, but unfortunately, existing approaches generally have poor convergence and optimality guarantees. Additionally, for the case of human environments, it is valuable to be able to leverage human input. In this paper we introduce Expanding Value Function Approximation (EVFA), a function approximation algorithm that returns the optimal value function given sufficient rounds. To leverage human input, we introduce a new humanagent interaction scheme, training regimens, which allow humans to interact with and improve agent learning in the setting of a machine learning game. In experiments, we show EVFA compares favorably to standard value approximation approaches. We also show that training regimens enable humans to further improve EVFA performance. In our user study, we find that nonexperts are able to provide effective regimens and that they found the game fun. Categories and Subject Descriptors