Results 1 – 6 of 6
No Free Lunch Theorems for Optimization, 1997
Cited by 640 (9 self)
A framework is developed to explore the connection between effective optimization algorithms and the problems they are solving. A number of “no free lunch” (NFL) theorems are presented which establish that for any algorithm, any elevated performance over one class of problems is offset by performance over another class. These theorems result in a geometric interpretation of what it means for an algorithm to be well suited to an optimization problem. Applications of the NFL theorems to information-theoretic aspects of optimization and benchmark measures of performance are also presented. Other issues addressed include time-varying optimization problems and a priori “head-to-head” minimax distinctions between optimization algorithms, distinctions that result despite the NFL theorems’ enforcing of a type of uniformity over all algorithms.
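As a toy illustration of the uniformity the theorems enforce (this sketch is not taken from the paper; the search space, the two orderings, and the performance measure are all illustrative assumptions), one can enumerate every binary objective function on a tiny search space and check that two different non-repeating search strategies achieve exactly the same average performance at every evaluation budget:

```python
from itertools import product

X = range(4)                                   # tiny finite search space
functions = list(product([0, 1], repeat=4))    # all 16 objectives f: X -> {0, 1}

def best_after(order, f, m):
    """Best objective value seen after evaluating the first m points of a fixed ordering."""
    return max(f[x] for x in order[:m])

alg_a = [0, 1, 2, 3]                           # "left to right" search
alg_b = [3, 1, 0, 2]                           # a different non-repeating ordering

for m in range(1, 5):
    avg_a = sum(best_after(alg_a, f, m) for f in functions) / len(functions)
    avg_b = sum(best_after(alg_b, f, m) for f in functions) / len(functions)
    assert avg_a == avg_b                      # identical average performance at every budget m
```

Any gain either ordering shows on one subset of the 16 functions is exactly cancelled on the complementary subset, which is the NFL conclusion in miniature.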
The Supervised Learning No-Free-Lunch Theorems
In Proc. 6th Online World Conference on Soft Computing in Industrial Applications, 2001
Cited by 25 (0 self)
This paper reviews the supervised learning versions of the no-free-lunch theorems in a simplified form. It also discusses the significance of those theorems and their relation to other aspects of supervised learning.
Combining Stacking With Bagging To Improve A Learning Algorithm, 1996
Cited by 8 (1 self)
In bagging [Bre94a] one uses bootstrap replicates of the training set [Efr79, ET93] to improve a learning algorithm's performance, often by tens of percent. This paper presents several ways that stacking [Wol92b, Bre92] can be used in concert with the bootstrap procedure to achieve a further improvement over the performance of bagging for some regression problems. In particular, in some of the work presented here, one first converts a single underlying learning algorithm into several learning algorithms. This is done by bootstrap resampling the training set, exactly as in bagging. The resultant algorithms are then combined via stacking. This procedure can be viewed as a variant of bagging in which stacking rather than uniform averaging is used to achieve the combining. The stacking improves performance over simple bagging by up to a factor of 2 on the tested problems and never results in worse performance than simple bagging. In other work presented here, there is no step of converting t...
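The procedure described above can be sketched roughly as follows. This is a minimal illustration only, with ordinary least squares as the base learner and a held-out set used to fit the stacking weights; the data, sizes, and names are assumptions, not the paper's actual experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy regression data (illustrative, not from the paper)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3 * rng.normal(size=n)

train, held = slice(0, 150), slice(150, 200)   # held-out set for the stacking weights

def fit_linear(Xt, yt):
    """Ordinary least squares base learner; returns a coefficient vector."""
    w, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
    return w

# step 1: bootstrap-resample the training set, exactly as in bagging,
# turning one base learner into several fitted models
B = 10
models = []
for _ in range(B):
    idx = rng.integers(0, 150, size=150)       # bootstrap replicate
    models.append(fit_linear(X[train][idx], y[train][idx]))

# step 2: bagging combines the replicates by uniform averaging ...
preds_held = np.column_stack([X[held] @ w for w in models])
bagged = preds_held.mean(axis=1)

# ... while the stacked variant instead learns combining weights
# from the models' predictions on held-out data
alpha, *_ = np.linalg.lstsq(preds_held, y[held], rcond=None)
stacked = preds_held @ alpha
```

In practice the combining weights are often constrained to be non-negative and learned from out-of-bag rather than held-out predictions, to reduce overfitting of the combiner.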
Any Two Learning Algorithms Are (Almost) Exactly Identical, 2000
Cited by 1 (0 self)
This paper shows that if one is provided with a loss function, it can be used in a natural way to specify a distance measure quantifying the similarity of any two supervised learning algorithms, even nonparametric algorithms. Intuitively, this measure gives the fraction of targets and training sets for which the expected performance of the two algorithms differs significantly. Bounds on the value of this distance are calculated for the case of binary outputs and 0-1 loss, indicating that any two learning algorithms are almost exactly identical for such scenarios. As an example, for any two algorithms A and B, even for small input spaces and training sets, for less than 2e of all targets will the difference between A's and B's generalization performance exceed 1%. In particular, this is true if B is bagging applied to A, or boosting applied to A. These bounds can be viewed alternatively as telling us, for example, that the simple English phrase "I expect that algorithm will generalize from the training set with an accuracy of at least 75% on the rest of the target" conveys 20,000 bytes of information concerning the target. The paper ends by discussing some of the subtleties of extending the distance measure to give a full (nonparametric) differential geometry of the manifold of learning algorithms.
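The flavor of such a distance measure can be shown by brute force on a tiny input space; the two algorithms, the fixed training set, and the threshold below are illustrative assumptions for the sketch, not the paper's construction:

```python
from itertools import product

X = [0, 1, 2, 3]
train_x, test_x = X[:2], X[2:]                 # a fixed 2-point training set
targets = list(product([0, 1], repeat=4))      # all 16 binary targets on X

def alg_majority(train):
    """Predict the majority training label everywhere (ties broken toward 0)."""
    vote = sum(y for _, y in train)
    return 1 if vote > len(train) / 2 else 0

def alg_zero(train):
    """Always predict 0, regardless of the data."""
    return 0

def ots_error(alg, f):
    """Off-training-set generalization error under 0-1 loss."""
    train = [(x, f[x]) for x in train_x]
    pred = alg(train)
    return sum(pred != f[x] for x in test_x) / len(test_x)

# fraction of targets on which the two algorithms' performance
# differs by more than a threshold eps
eps = 0.25
differ = sum(abs(ots_error(alg_majority, f) - ots_error(alg_zero, f)) > eps
             for f in targets)
distance = differ / len(targets)
```

Even these two deliberately different learners disagree significantly on only a small fraction of targets here, which is the qualitative point of the paper's bounds.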