Least-Squares Temporal Difference Learning
In Proceedings of the Sixteenth International Conference on Machine Learning, 1999
Cited by 82 (0 self)
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it makes inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996) eliminates all stepsize parameters and improves data efficiency. This paper extends Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting algorithm is shown to be a practical formulation of supervised linear regression. Third, it presents a novel, intuitive interpretation of LSTD as a model-based reinforcement learning technique.
Technical update: Least-squares temporal difference learning
Machine Learning, 2002
Cited by 65 (2 self)
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22:1–3, 33–57) eliminates all stepsize parameters and improves data efficiency. This paper updates Bradtke and Barto’s work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting new algorithm is shown to be a practical, incremental formulation of supervised linear regression. Third, it presents a novel and intuitive interpretation of LSTD as a model-based reinforcement learning technique.
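The stepsize-free idea in this abstract can be illustrated with a minimal sketch of LSTD(λ) for linear value functions. This is not the paper's code: the function names (`lstd_lambda`, `solve_linear`), the episode format, and the discount parameter `gamma` are assumptions made for illustration. The sketch accumulates a matrix A and a vector b over observed transitions using an eligibility trace, then solves A θ = b once, so no stepsize schedule is involved.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is singular; more data is needed")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd_lambda(episodes, n_features, lam=0.0, gamma=1.0):
    """Estimate linear value-function weights from sampled episodes.

    Each episode is a list of (phi, reward, phi_next) tuples, where phi and
    phi_next are feature vectors; phi_next is all zeros at a terminal state.
    (This data layout is an assumption of the sketch.)
    """
    A = [[0.0] * n_features for _ in range(n_features)]
    b = [0.0] * n_features
    for episode in episodes:
        z = [0.0] * n_features  # eligibility trace, reset per episode
        for phi, reward, phi_next in episode:
            # Decay the trace and add the current state's features.
            z = [gamma * lam * zi + pi for zi, pi in zip(z, phi)]
            diff = [p - gamma * pn for p, pn in zip(phi, phi_next)]
            for i in range(n_features):
                b[i] += z[i] * reward
                for j in range(n_features):
                    A[i][j] += z[i] * diff[j]
    return solve_linear(A, b)


# Two-state chain 0 -> 1 -> terminal, reward 1 per step, one-hot features.
episode = [([1.0, 0.0], 1.0, [0.0, 1.0]),
           ([0.0, 1.0], 1.0, [0.0, 0.0])]
theta_td0 = lstd_lambda([episode], n_features=2, lam=0.0)
theta_td1 = lstd_lambda([episode], n_features=2, lam=1.0)
```

On this deterministic chain both settings of λ recover the true values (2 for the first state, 1 for the second), since one-hot features can represent the value function exactly.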
Evolutionary Search of Approximated N-Dimensional Landscapes
International Journal of Knowledge-based Intelligent Engineering Systems, 2000
Cited by 17 (2 self)
Finding the global optimum on a large, multimodal, complex, and discontinuous (or nondifferentiable) landscape is usually very hard, even using the evolutionary approach. However, some of these complex landscapes can be approximated and smoothened without changing the nature of the problem, i.e., without modifying the global optimum and its location. The approximated and smoothened landscape is often much easier to search than the original one. In this paper, we propose a new algorithm using landscape approximation and hybrid evolutionary and local search. We also list several algorithm design principles. Following the basic algorithm, an example algorithm is given from our previous work of the combination of landscape approximation and local search (LALS). Furthermore, we develop a novel evolutionary algorithm with n-dimensional approximation (EANA), which shares the same rules as the basic algorithm, but remedies some of the drawbacks found in the LALS. Comparisons with evo...
Incomplete Tree Search using Adaptive Probing
2001
Cited by 14 (2 self)
When not enough time is available to fully explore a search tree, different algorithms will visit different leaves. Depth-first search and depth-bounded discrepancy search, for example, make opposite assumptions about the distribution of good leaves. Unfortunately, it is rarely clear a priori which algorithm will be most appropriate for a particular problem. Rather than fixing strong assumptions in advance, we propose an approach in which an algorithm attempts to adjust to the distribution of leaf costs in the tree while exploring it. By sacrificing completeness, such flexible algorithms can exploit information gathered during the search using only weak assumptions. As an example, we show how a simple depth-based additive cost model of the tree can be learned on-line. Empirical analysis using a generic tree search problem shows that adaptive probing is competitive with systematic algorithms on a variety of hard trees and outperforms them when the node-ordering heuristic makes many mistakes. Results on boolean satisfiability and two different representations of number partitioning confirm these observations. Adaptive probing combines the flexibility and robustness of local search with the ability to take advantage of constructive heuristics.
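A depth-based additive cost model of the kind mentioned above might look like the following sketch. Everything here is an assumption for illustration, since the abstract gives no details: the class name `AdditiveCostModel`, the error-sharing update rule, and the `probe` function with its `explore` parameter are all hypothetical. The model predicts a leaf's cost as the sum of one learned cost per (depth, choice) pair, and each observed leaf nudges the costs along its path toward the observed total.

```python
import random


class AdditiveCostModel:
    """Leaf cost is modeled as a sum of per-depth, per-choice costs."""

    def __init__(self, depth, branching=2, step=0.2):
        self.cost = [[0.0] * branching for _ in range(depth)]
        self.step = step  # learning rate (hypothetical default)

    def predict(self, path):
        return sum(self.cost[d][a] for d, a in enumerate(path))

    def update(self, path, leaf_cost):
        # Spread the prediction error evenly over the choices on the path.
        error = leaf_cost - self.predict(path)
        share = self.step * error / len(path)
        for d, a in enumerate(path):
            self.cost[d][a] += share


def probe(model, rng, explore=0.1):
    """Walk from root to leaf, usually taking the cheapest-looking choice."""
    path = []
    for costs in model.cost:
        if rng.random() < explore:
            choice = rng.randrange(len(costs))  # occasional random choice
        else:
            choice = min(range(len(costs)), key=costs.__getitem__)
        path.append(choice)
    return path


model = AdditiveCostModel(depth=2, step=1.0)
model.update([0, 1], leaf_cost=4.0)           # observe one expensive leaf
next_path = probe(model, random.Random(0), explore=0.0)
```

After the single update, the probe steers away from both choices that led to the expensive leaf. A search loop would alternate `probe`, evaluation of the reached leaf, and `update`.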
Ten challenges redux: Recent progress in propositional reasoning and search
In Proceedings of CP ’03, 2003
Cited by 14 (0 self)
In 1997 we presented ten challenges for research on satisfiability testing [1]. In this paper we review recent progress towards each of these challenges, including our own work on the power of clause learning and randomized restart policies.
A System for Building Intelligent Agents that Learn to Retrieve and Extract Information
2001
Cited by 13 (5 self)
We present a system for rapidly and easily building instructable and self-adaptive software agents that retrieve and extract information. Our Wisconsin Adaptive Web Assistant (Wawa) constructs intelligent agents by accepting user preferences in the form of instructions. These user-provided instructions are compiled into neural networks that are responsible for the adaptive capabilities of an intelligent agent. The agent's neural networks are modified via user-provided and system-constructed training examples. Users can create training examples by rating Web pages (or documents), but more importantly Wawa's agents use techniques from reinforcement learning to internally create their own examples. Users can also provide additional instruction throughout the life of an agent. Our experimental evaluations on a "home-page finder" agent and a "seminar-announcement extractor" agent illustrate the value of using instructable and adaptive agents for retrieving and extracting information.
Reactive search: machine learning for memory-based heuristics
Teofilo F. Gonzalez (Ed.), Approximation Algorithms and Metaheuristics, Taylor & Francis Books (CRC Press), 2005
Cited by 13 (5 self)
1 Introduction: the role of the user in heuristics
Most state-of-the-art heuristics are characterized by a certain number of choices and free parameters, whose appropriate setting is a subject that raises issues of research methodology [5, 41, 51]. In some cases, these parameters are tuned through a feedback loop that includes the user as a crucial learning component: depending on preliminary algorithm tests some parameter values are changed by the
Global Optimization For Constrained Nonlinear Programming
2001
Cited by 11 (2 self)
In this thesis, we develop constrained simulated annealing (CSA), a global optimization algorithm that asymptotically converges to constrained global minima (CGM_dn) with probability one, for solving discrete constrained nonlinear programming problems (NLPs). The algorithm is based on the necessary and sufficient condition for constrained local minima (CLM_dn) in the theory of discrete constrained optimization using Lagrange multipliers developed in our group. The theory proves the equivalence between the set of discrete saddle points and the set of CLM_dn, leading to the first-order necessary and sufficient condition for CLM_dn.
Mining Scientific Data
2001
Cited by 11 (3 self)
The past two decades have seen rapid advances in high performance computing and
Imitation and Reinforcement Learning in Agents with Heterogeneous Actions
Seventeenth International Conference on Machine Learning (ICML 2000), 2000
Cited by 11 (1 self)
We study the problem of accelerating reinforcement learning (RL) through the observation and implicit imitation of expert agents (mentors) acting in the same domain. In this paper, we consider problems that arise when the learner and mentor have heterogeneous actions. We extend an earlier implicit imitation model to allow for feasibility testing (determining whether a specific mentor action can be duplicated) and repair (discovering a "plan" that simulates a mentor's trajectory) and demonstrate empirically that both of these components allow agents to learn much more readily than standard RL agents and implicit imitation agents without these capabilities.
1. Introduction
Cooperative multiagent systems rely on shared models and communication to coordinate their actions in a common environment. While many researchers have examined explicit communication, we have argued (as have others) that implicit communication techniques such as imitation increase the range of applicati...

