Results 1 to 10 of 3,648
Reinforcement Learning I: Introduction
1998
Cited by 5500 (120 self)
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs from and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, and control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve long-term goals.
No Free Lunch Theorems for Optimization
1997
Cited by 928 (10 self)
A framework is developed to explore the connection between effective optimization algorithms and the problems they are solving. A number of “no free lunch” (NFL) theorems are presented which establish that for any algorithm, any elevated performance over one class of problems is offset by performance over another class. These theorems result in a geometric interpretation of what it means for an algorithm to be well suited to an optimization problem. Applications of the NFL theorems to information-theoretic aspects of optimization and benchmark measures of performance are also presented. Other issues addressed include time-varying optimization problems and a priori “head-to-head” minimax distinctions between optimization algorithms, distinctions that result despite the NFL theorems’ enforcing of a type of uniformity over all algorithms.
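The offsetting of performance described in this abstract can be checked by brute force on a tiny search space. The sketch below is a toy enumeration, not the paper's proof: the three-point domain, the binary cost values, and the two search rules (`fixed_order`, `adaptive`) are all assumptions chosen for illustration. It averages over every possible objective function and compares the distribution of the best value found after a given number of evaluations.

```python
from collections import Counter
from itertools import product

DOMAIN = (0, 1, 2)   # hypothetical search space
VALUES = (0, 1)      # hypothetical objective values

def fixed_order(history):
    """Visit the unvisited points in increasing order, ignoring observed values."""
    visited = {x for x, _ in history}
    return next(x for x in DOMAIN if x not in visited)

def adaptive(history):
    """Pick the next point based on the last observed value (never revisits)."""
    visited = {x for x, _ in history}
    unvisited = [x for x in DOMAIN if x not in visited]
    if history and history[-1][1] == max(VALUES):
        return unvisited[0]
    return unvisited[-1]

def best_value_histogram(algorithm, n_evals):
    """Histogram of the best value seen, taken over ALL functions DOMAIN -> VALUES."""
    hist = Counter()
    for values in product(VALUES, repeat=len(DOMAIN)):
        f = dict(zip(DOMAIN, values))
        history = []
        for _ in range(n_evals):
            x = algorithm(history)
            history.append((x, f[x]))
        hist[max(y for _, y in history)] += 1
    return hist

if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, dict(best_value_histogram(fixed_order, m)),
              dict(best_value_histogram(adaptive, m)))
```

Both rules produce the same histogram for every evaluation budget, even though they visit points in different orders on individual functions. Any advantage one rule has on some functions is cancelled on others once all functions are weighted equally.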
Finite-time analysis of the multiarmed bandit problem
Machine Learning, 2002
Cited by 804 (15 self)
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e., the search for a balance between exploring the environment to find profitable actions and taking the empirically best action as often as possible. A popular measure of a policy’s success in addressing this dilemma is the regret, that is, the loss due to the fact that the globally optimal policy is not followed at all times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies that asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support. Keywords: bandit problems, adaptive allocation rules, finite horizon regret.
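The abstract does not spell out its policies, so the following is only a minimal sketch of one well-known index policy of this kind, usually called UCB1: play each arm once, then always play the arm with the highest empirical mean plus a confidence bonus. The function name `ucb1`, the `pull` callback, and the Bernoulli arms in the demo are assumptions made for illustration; rewards are assumed to lie in [0, 1].

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Run a UCB1-style policy for `horizon` plays and return the play counts.

    `pull(arm)` is a caller-supplied function returning a reward in [0, 1].
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialisation: play every arm once
        else:
            total = t - 1  # number of plays made so far
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(total) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts

if __name__ == "__main__":
    rng = random.Random(0)
    success_prob = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arms
    plays = ucb1(lambda a: 1.0 if rng.random() < success_prob[a] else 0.0, 3, 5000)
    print(plays)
```

The bonus term shrinks as an arm is played more often, so a suboptimal arm is revisited only occasionally; this is the exploration and exploitation balance the abstract refers to.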
Community detection in graphs
2009
Cited by 801 (1 self)
The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e., the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such
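The notion of community structure stated above (many edges inside clusters, comparatively few between them) can be made concrete by counting both kinds of edge for a candidate partition. This is a small sketch on a hypothetical six-vertex graph; the function name `edge_counts` and both partitions are illustrative assumptions, and no particular detection algorithm is implied.

```python
def edge_counts(edges, membership):
    """Return (intra, inter): edges inside a cluster vs. edges between clusters."""
    intra = sum(1 for u, v in edges if membership[u] == membership[v])
    return intra, len(edges) - intra

# Hypothetical toy graph: two triangles joined by a single bridge edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# A partition that follows the triangles, and one that cuts across them.
GOOD_SPLIT = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
POOR_SPLIT = {0: "a", 1: "b", 2: "a", 3: "b", 4: "a", 5: "b"}

if __name__ == "__main__":
    print(edge_counts(EDGES, GOOD_SPLIT))
    print(edge_counts(EDGES, POOR_SPLIT))
```

The partition that follows the triangles keeps six edges internal and cuts only the bridge, while the alternating partition cuts five of the seven edges.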
TABU SEARCH
Cited by 790 (44 self)
Tabu Search is a metaheuristic that guides a local heuristic search procedure to explore the solution space beyond local optimality. One of the main components of tabu search is its use of adaptive memory, which creates a more flexible search behavior. Memory-based strategies are therefore the hallmark of tabu search approaches, founded on a quest for "integrating principles," by which alternative forms of memory are appropriately combined with effective strategies for exploiting them. In this chapter we address the problem of training multilayer feedforward neural networks. These networks have been widely used for both prediction and classification in many different areas. Although the most popular method for training these networks is backpropagation, other optimization methods such as tabu search have been applied to solve this problem. This chapter describes two training algorithms based on tabu search. The experimentation shows that the procedures provide high-quality solutions to the training problem and consume a reasonable computational effort.
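The role of short-term memory in moving past a local optimum can be shown on a toy problem. The sketch below is not one of the chapter's neural-network training algorithms; it assumes a bit-string search space, a single-bit-flip neighbourhood, a recency-based tabu list of flipped positions, and a simple aspiration rule. The deceptive cost function `trap_cost` is a hypothetical example.

```python
from collections import deque

def tabu_search(cost, start, n_iters, tenure):
    """Minimise `cost` over bit lists; positions flipped recently are tabu."""
    current = list(start)
    best, best_cost = list(current), cost(current)
    tabu = deque(maxlen=tenure)  # short-term memory of recently flipped positions
    for _ in range(n_iters):
        candidates = []
        for i in range(len(current)):
            neighbour = current[:]
            neighbour[i] = 1 - neighbour[i]
            c = cost(neighbour)
            # Aspiration: a tabu move is still allowed if it beats the best so far.
            if i not in tabu or c < best_cost:
                candidates.append((c, i, neighbour))
        if not candidates:
            break
        # Move to the best admissible neighbour, even if it is worse than current.
        c, i, current = min(candidates, key=lambda item: (item[0], item[1]))
        tabu.append(i)
        if c < best_cost:
            best, best_cost = list(current), c
    return best, best_cost

def trap_cost(bits):
    """Deceptive cost: all zeros is a local optimum, all ones is the global one."""
    ones = sum(bits)
    return -(len(bits) + 1) if ones == len(bits) else ones - len(bits)

if __name__ == "__main__":
    print(tabu_search(trap_cost, [0] * 5, n_iters=10, tenure=4))
    print(tabu_search(trap_cost, [0] * 5, n_iters=10, tenure=0))
```

With a tenure of 4 the search is prevented from undoing its recent flips and climbs out of the all-zeros trap; with a tenure of 0 it has no memory and cycles back to the starting point.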
Stacked generalization
Neural Networks, 1992
Cited by 714 (8 self)
This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation’s crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory. Key Words: generalization and induction, combining generalizers, learning set preprocessing, cross-validation, error estimation and correction.
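The two-level scheme described above can be sketched for a one-dimensional regression task. Several choices here are assumptions rather than the paper's setup: the two level-0 generalizers (`fit_line`, `fit_mean`), the leave-one-out partitioning of the learning set, and a linear level-1 combiner fitted by least squares. The level-1 inputs are the held-out guesses and the level-1 target is the correct value, as in the abstract.

```python
def fit_line(xs, ys):
    """Level-0 generalizer: ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_mean(xs, ys):
    """Level-0 generalizer: always guess the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def held_out_guesses(fitters, xs, ys):
    """Teach each generalizer on all points but one, then guess the held-out one."""
    rows = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        rows.append([fit(train_x, train_y)(xs[i]) for fit in fitters])
    return rows

def fit_level1(rows, ys):
    """Least-squares weights for y ~ w0*g0 + w1*g1 (two generalizers only)."""
    a = sum(g0 * g0 for g0, _ in rows)
    b = sum(g0 * g1 for g0, g1 in rows)
    d = sum(g1 * g1 for _, g1 in rows)
    p = sum(g0 * y for (g0, _), y in zip(rows, ys))
    q = sum(g1 * y for (_, g1), y in zip(rows, ys))
    det = a * d - b * b
    return (d * p - b * q) / det, (a * q - b * p) / det

def stack(fitters, xs, ys):
    """Return a predictor combining two level-0 generalizers via level-1 weights."""
    w0, w1 = fit_level1(held_out_guesses(fitters, xs, ys), ys)
    m0, m1 = (fit(xs, ys) for fit in fitters)  # retrain on the full learning set
    return lambda x: w0 * m0(x) + w1 * m1(x)

if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.0 * x + 1.0 for x in xs]
    print(stack([fit_line, fit_mean], xs, ys)(10.0))
```

On exactly linear data the held-out guesses of the line fitter match the targets, so the level-1 fit puts essentially all weight on it rather than picking a winner outright.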
Evolutionary Computing
2005
Cited by 610 (35 self)
Evolutionary computing (EC) is an exciting development in Computer Science. It amounts to building, applying and studying algorithms based on the Darwinian principles of natural selection. In this paper we briefly introduce the main concepts behind evolutionary computing. We present the main components of all evolutionary algorithms (EAs), sketch the differences between different types of EAs and survey application areas ranging from optimization, modeling and simulation to entertainment.
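The abstract does not list the components it has in mind, so the sketch below assumes a common decomposition: a population of candidate solutions, a fitness function, parent selection, variation by recombination and mutation, and survivor selection. The bit-string representation, the OneMax fitness `one_max`, tournament selection, one-point crossover, and the elitist survivor rule are all illustrative assumptions.

```python
import random

def one_max(bits):
    """Hypothetical fitness: the number of ones in the bit string."""
    return sum(bits)

def evolve(fitness, n_bits, pop_size, generations, rng):
    """Minimal evolutionary algorithm; requires n_bits >= 2 and pop_size >= 2."""
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best_per_generation = [max(fitness(ind) for ind in population)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            # Parent selection: two binary tournaments.
            mum = max(rng.sample(population, 2), key=fitness)
            dad = max(rng.sample(population, 2), key=fitness)
            # Variation: one-point crossover, then bit-flip mutation.
            cut = rng.randrange(1, n_bits)
            child = mum[:cut] + dad[cut:]
            child = [1 - b if rng.random() < 1.0 / n_bits else b for b in child]
            offspring.append(child)
        # Survivor selection: keep the fittest of parents plus offspring.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
        best_per_generation.append(fitness(population[0]))
    return population[0], best_per_generation

if __name__ == "__main__":
    best, history = evolve(one_max, n_bits=20, pop_size=10, generations=30,
                           rng=random.Random(0))
    print(one_max(best), history)
```

Because survivors are chosen from parents and offspring together, the best fitness in the population can never decrease from one generation to the next.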
Ant algorithms for discrete optimization
Artificial Life, 1999
Cited by 475 (42 self)
This article presents an overview of recent work on ant algorithms, that is, algorithms for discrete optimization that took inspiration from the observation of ant colonies’ foraging behavior, and introduces the ant colony optimization (ACO) metaheuristic. In the first part of the article the basic biological findings on real ants are reviewed and their artificial counterparts as well as the ACO metaheuristic are defined. In the second part of the article a number of applications of ACO algorithms to combinatorial optimization and routing in communications networks are described. We conclude with a discussion of related work and of some of the most important aspects of the ACO metaheuristic.