Results 1–10 of 45
Locally Weighted Learning for Control
, 1996
Abstract

Cited by 165 (17 self)
Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We explain various forms that control tasks can take, and how this affects the choice of learning paradigm. The discussion section explores the interesting impact that explicitly remembering all previous experiences has on the problem of learning to control.
A Machine Learning Architecture for Optimizing Web Search Engines
 In AAAI Workshop on Internet-based Information Systems
, 1996
Abstract

Cited by 69 (10 self)
Indexing systems for the World Wide Web, such as Lycos and Alta Vista, play an essential role in making the Web useful and usable. These systems are based on Information Retrieval methods for indexing plain text documents, but also include heuristics for adjusting their document rankings based on the special HTML structure of Web documents. In this paper, we describe a wide range of such heuristics, including a novel one inspired by reinforcement learning techniques for propagating rewards through a graph, which can be used to affect a search engine's rankings. We then demonstrate a system which learns to combine these heuristics automatically, based on feedback collected unintrusively from users, resulting in much improved rankings. 1 Introduction Lycos (Mauldin & Leavitt 1994), Alta Vista, and similar Web search engines have become essential as tools for locating information on the ever-growing World Wide Web. Underlying these systems are statistical methods for indexing plain te...
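The combination step described in this abstract can be illustrated with a toy sketch. The three heuristic features, the scores, the learning rate, and the perceptron-style update below are all illustrative assumptions; the paper's actual system learns from unintrusively collected user feedback, not this exact rule.

```python
import numpy as np

# hypothetical per-document scores from three ranking heuristics
# (say, title-word match, anchor-text match, propagated reward);
# the numbers and feature names are made up for illustration
scores = np.array([
    [0.3, 0.9, 0.3],   # doc A
    [0.8, 0.1, 0.3],   # doc B
    [0.3, 0.2, 0.9],   # doc C
])
w = np.full(3, 1.0 / 3.0)               # current combination weights

def rank(w, scores):
    """Indices of documents, best first, under the weighted combination."""
    return np.argsort(-(scores @ w))

# feedback: the user clicked doc C but skipped doc A ranked above it, so
# nudge the weights toward features where C outscores A (perceptron-style)
clicked, skipped = 2, 0
w = w + 0.5 * (scores[clicked] - scores[skipped])
w = np.maximum(w, 0.0)
w = w / w.sum()                          # keep a convex combination
```

After the update, the weighted combination ranks the clicked document above the skipped one.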
Learning Evaluation Functions to Improve Optimization by Local Search
 Journal of Machine Learning Research
, 2000
Abstract

Cited by 56 (0 self)
This paper describes algorithms that learn to improve search performance on large-scale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome of a local search algorithm, such as hill-climbing or Walksat, from features of states visited during search. The learned evaluation function is then used to bias future search trajectories toward better optima on the same problem. Another algorithm, XStage, transfers previously learned evaluation functions to new, similar optimization problems. Empirical results are provided on seven large-scale optimization domains: bin-packing, channel routing, Bayesian network structure-finding, radiotherapy treatment planning, cartogram design, Boolean satisfiability, and Boggle board setup.
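The core Stage idea (learn a function predicting the eventual outcome of local search from the start state, then search on that learned surface to pick better restarts) can be sketched on a toy minimization problem. The objective, polynomial features, and least-squares fit below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

N = 100
xs = np.arange(N)
# rugged objective to MINIMIZE: a smooth bowl plus high-frequency ripples,
# so plain greedy hill-climbing gets trapped in shallow local minima
f = (xs / N - 0.7) ** 2 + 0.05 * np.sin(1.3 * xs)

def hillclimb(s, value):
    """Greedy +/-1 moves on `value` until no neighbor improves."""
    while True:
        nbrs = [t for t in (s - 1, s + 1) if 0 <= t < N]
        step = min(nbrs, key=lambda t: value[t])
        if value[step] >= value[s]:
            return s
        s = step

feats = lambda s: np.array([1.0, s / N, (s / N) ** 2])

# 1) run local search from many starts; record (start features, outcome)
starts = np.arange(0, N, 4)
X = np.stack([feats(int(s)) for s in starts])
y = np.array([f[hillclimb(int(s), f)] for s in starts])

# 2) fit the evaluation function: predicted search outcome from a state
w, *_ = np.linalg.lstsq(X, y, rcond=None)
vhat = np.array([feats(s) @ w for s in range(N)])

# 3) hill-climb on the smooth learned surface to pick a promising restart,
# then run the real search from there
guided_start = hillclimb(int(starts[0]), vhat)
guided_outcome = f[hillclimb(guided_start, f)]
```

The guided restart typically reaches a better optimum than the median random restart, because the fitted surface smooths away the ripples that trap greedy search.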
Active policy learning for robot planning and exploration under uncertainty
 IN PROCEEDINGS OF ROBOTICS: SCIENCE AND SYSTEMS
, 2007
Abstract

Cited by 23 (1 self)
This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially observed sequential decision processes. The algorithm is tested in the domain of robot navigation and exploration under uncertainty. In such a setting, the expected cost, which must be minimized, is a function of the belief state (filtering distribution). This filtering distribution is in turn nonlinear and subject to discontinuities, which arise because of constraints in the robot motion and control models. As a result, the expected cost is non-differentiable and very expensive to simulate. The new algorithm overcomes the first difficulty and reduces the number of required simulations as follows. First, it assumes that we have carried out previous simulations which returned values of the expected cost for different corresponding policy parameters. Second, it fits a Gaussian process (GP) regression model to these values, so as to approximate the expected cost as a function of the policy parameters. Third, it uses the GP predicted mean and variance to construct a statistical measure that determines which policy parameters should be used in the next simulation. The process is then repeated using the new parameters and the newly gathered expected cost observation. Since the objective is to find the policy parameters that minimize the expected cost, this iterative active learning approach effectively trades off between exploration (in regions where the GP variance is large) and exploitation (where the GP mean is low). In our experiments, a robot uses the proposed algorithm to plan an optimal path for accomplishing a series of tasks, while maximizing the information about its pose and map estimates. These estimates are obtained with a standard filter for simultaneous localization and mapping. Upon gathering new observations, the robot updates the state estimates and is able to replan a new path in the spirit of open-loop feedback control.
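The First/Second/Third loop in this abstract is, in essence, Bayesian optimization of the policy parameters. A minimal sketch, assuming a one-dimensional parameter, an RBF-kernel GP, a synthetic cost function, and a lower-confidence-bound selection measure (the paper's statistical measure may differ):

```python
import numpy as np

def gp_posterior(X, y, Xq, ell=0.2, noise=1e-4):
    """Posterior mean/variance of a zero-mean GP with an RBF kernel."""
    k = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    Kq = k(Xq, X)
    mean = Kq @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Kq * np.linalg.solve(K, Kq.T).T, axis=1)
    return mean, np.maximum(var, 1e-12)

# stand-in for one expensive simulation of the expected cost at a policy
# parameter (illustrative; the paper's cost comes from belief-state
# simulations of the robot)
def simulate_cost(theta):
    return (theta - 0.3) ** 2 + 0.1 * np.sin(8.0 * theta)

thetas = np.array([0.0, 0.5, 1.0])          # initial policy parameters
costs = simulate_cost(thetas)
grid = np.linspace(0.0, 1.0, 201)
for _ in range(10):
    mean, var = gp_posterior(thetas, costs, grid)
    lcb = mean - 2.0 * np.sqrt(var)         # explore where variance is large,
    nxt = grid[np.argmin(lcb)]              # exploit where the mean is low
    thetas = np.append(thetas, nxt)
    costs = np.append(costs, simulate_cost(nxt))

best_theta = thetas[np.argmin(costs)]
```

Each loop iteration replaces one expensive simulation with a cheap GP fit plus an acquisition minimization, which is exactly the trade-off the abstract describes.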
Q2: Memory-based active learning for optimizing noisy continuous functions
, 1998
Abstract

Cited by 16 (3 self)
This paper introduces a new algorithm, Q2, for optimizing the expected output of a multi-input noisy continuous function. Q2 is designed to need only a few experiments; it avoids strong assumptions about the form of the function; and it is autonomous, in that it requires little problem-specific tweaking. These capabilities are directly applicable to industrial processes, and may become increasingly valuable elsewhere as the machine learning field expands beyond prediction and function identification, and into embedded active learning subsystems in robots, vehicles and consumer products. Four existing approaches to this problem (response surface methods, numerical optimization, supervised learning, and evolutionary methods) all have inadequacies when the requirement of "black box" behavior is combined with the need for few experiments. Q2 uses instance-based determination of a convex region of interest for performing experiments. In conventional instance-based approaches to learning, a neigh...
A Nonparametric Approach to Noisy and Costly Optimization
, 2000
Abstract

Cited by 15 (3 self)
This paper describes PAIRWISE BISECTION: a nonparametric approach to optimizing a noisy function with few function evaluations.
Reactive search: machine learning for memory-based heuristics
 Teofilo F. Gonzalez (Ed.), Approximation Algorithms and Metaheuristics, Taylor & Francis Books (CRC Press
, 2005
Abstract

Cited by 13 (5 self)
1 Introduction: the role of the user in heuristics. Most state-of-the-art heuristics are characterized by a certain number of choices and free parameters, whose appropriate setting is a subject that raises issues of research methodology [5, 41, 51]. In some cases, these parameters are tuned through a feedback loop that includes the user as a crucial learning component: depending on preliminary algorithm tests some parameter values are changed by the
Active Learning For Identifying Function Threshold Boundaries
Abstract

Cited by 10 (4 self)
We present an efficient algorithm to actively select queries for learning the boundaries separating a function domain into regions where the function is above and below a given threshold. We develop experiment selection methods based on entropy, misclassification rates, variance, and their combinations, and show how they perform on a number of data sets. We then show how these algorithms are used to determine simultaneously valid 1 − α confidence intervals for seven cosmological parameters. Experimentation shows that the algorithm reduces the computation necessary for the parameter estimation problem by an order of magnitude.
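For intuition, one selection rule from this line of work combines posterior variance with closeness to the threshold (often called the straddle criterion): a candidate is worth querying when the surrogate model is both uncertain about it and predicts a value near the threshold. The grid and posterior values below are made up for illustration; in practice they would come from a fitted surrogate model.

```python
import numpy as np

# hypothetical surrogate posterior over a 1-D grid of candidate inputs
grid = np.linspace(0.0, 1.0, 5)
mean = np.array([0.2, 0.9, 1.1, 1.6, 2.0])      # posterior mean of f
std = np.array([0.05, 0.30, 0.10, 0.40, 0.05])  # posterior std of f
threshold = 1.0

# straddle criterion: high uncertainty AND predicted value near threshold
scores = 1.96 * std - np.abs(mean - threshold)
query = grid[np.argmax(scores)]   # next point to evaluate
```

Here the second grid point wins: it sits close to the threshold and still has substantial uncertainty, whereas the confident points far from the threshold score poorly.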
A memory-based RASH optimizer
 IN AAAI06 WORKSHOP ON HEURISTIC SEARCH, MEMORY BASED HEURISTICS AND THEIR APPLICATIONS
, 2006
Abstract

Cited by 7 (2 self)
This paper presents a memory-based Reactive Affine Shaker (MRASH) algorithm for global optimization. The Reactive Affine Shaker is an adaptive search algorithm based only on the function values. MRASH is an extension of RASH in which good starting points for RASH are suggested online by using Bayesian Locally Weighted Regression (BLWR). Both techniques use memory of the previous history of the search to guide the future exploration, but in very different ways. RASH compiles the previous experience into a local search area where sample points are drawn, while locally weighted regression saves the entire previous history to be mined extensively when an additional sample point is generated. Because of the high computational cost of the BLWR model, it is applied only to evaluate the potential of an initial point for a local search run. The experimental results, focused on the case when the dominant computational cost is the evaluation of the target function f, show that MRASH is indeed capable of leading to good results with a smaller number of function evaluations.
Using Response Surfaces and Expected Improvement to Optimize Snake Robot Gait Parameters
Abstract

Cited by 6 (1 self)
Abstract — Several categories of optimization problems suffer from expensive objective function evaluation, driving the need for smart selection of subsequent experiments. One such category of problems involves physical robotic systems, which often require significant time, effort, and monetary expenditure in order to run tests. To assist in the selection of the next experiment, there has been a focus on the idea of response surfaces in recent years. These surfaces interpolate the existing data and provide a measure of confidence in their error, serving as a low-fidelity surrogate function that can be used to more intelligently choose the next experiment. In this paper, we robustly implement a previous algorithm based on the response surface methodology with an expected improvement criterion. We apply this technique to optimize open-loop gait parameters for snake robots, and demonstrate improved locomotive capabilities.
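The expected improvement criterion mentioned in this abstract has a standard closed form under a Gaussian surrogate posterior. A minimal implementation for minimization (the textbook formula, not the paper's exact code):

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Expected improvement for MINIMIZATION under a Gaussian posterior
    N(mu, sigma^2) at a candidate point, relative to the best cost f_best:
        EI = (f_best - mu) * Phi(z) + sigma * phi(z),  z = (f_best - mu)/sigma
    where Phi/phi are the standard normal CDF/PDF."""
    if sigma <= 0.0:
        return max(f_best - mu, 0.0)   # no predictive uncertainty
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (f_best - mu) * cdf + sigma * pdf
```

The next experiment is the candidate maximizing this quantity, which rewards both a low predicted cost and a high predictive variance, so experiments are spent where the surrogate is either promising or poorly informed.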