## Wrapped progressive sampling search for optimizing learning algorithm parameters (2004)

Venue: Proceedings of the Sixteenth Belgian-Dutch Conference on Artificial Intelligence

Citations: 17 (8 self)

### BibTeX

```bibtex
@inproceedings{Bosch04wrappedprogressive,
  author    = {Antal van den Bosch},
  title     = {Wrapped progressive sampling search for optimizing learning algorithm parameters},
  booktitle = {Proceedings of the Sixteenth Belgian-Dutch Conference on Artificial Intelligence},
  year      = {2004},
  pages     = {219--226}
}
```

### Abstract

We present a heuristic meta-learning search method for finding a set of optimized algorithmic parameters for a range of machine learning algorithms. The method, wrapped progressive sampling, combines classifier wrapping with progressive sampling of training data. In a series of experiments on UCI benchmark data sets with nominal features, simple wrapping and wrapped progressive sampling are applied to five machine learning algorithms. The results show little improvement for the algorithm that offers few parameter variations, but marked improvements for the algorithms offering many testable parameter combinations, with up to 32.2% error reduction for the winnow learning algorithm.
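The abstract's method can be pictured as a simple loop: score all candidate parameter settings on a small data sample, prune the weaker settings, and repeat on a larger sample. The sketch below is an illustrative reconstruction of that idea, not the paper's exact procedure; the `evaluate` function, pruning rate, and growth schedule are all assumed stand-ins.

```python
# Illustrative sketch of wrapped progressive sampling: score candidate
# parameter settings on growing data samples, pruning the weaker half at
# each step. evaluate() is a deterministic stand-in here; in the real
# method it would be a cross-validated ("wrapped") accuracy estimate on
# a training sample of the given size.

def wrapped_progressive_sampling(settings, evaluate, n_total,
                                 start=100, factor=2, keep=0.5):
    """Prune candidate `settings` over progressively larger sample sizes."""
    size = start
    candidates = list(settings)
    while len(candidates) > 1 and size < n_total:
        ranked = sorted(candidates, key=lambda s: evaluate(s, size), reverse=True)
        candidates = ranked[: max(1, int(len(ranked) * keep))]  # keep the best
        size = min(n_total, size * factor)                      # grow the sample
    return candidates[0]

# Toy evaluation: the hypothetical "true best" setting is 0.3.
def toy_accuracy(setting, size):
    return 1.0 - abs(setting - 0.3)

best = wrapped_progressive_sampling([0.1, 0.2, 0.3, 0.4, 0.5],
                                    toy_accuracy, n_total=10000)
# best -> 0.3
```

With a noisy, cross-validated `evaluate`, early rounds on small samples cheaply discard clearly bad settings, so the expensive large-sample evaluations are spent only on the surviving candidates.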

### Citations

4955 | C4.5: Programs for Machine Learning
- Quinlan
- 1993

Citation Context: ...learning algorithms. (1) Ripper [4] is a rule-induction algorithm that compresses a data set of labeled examples into an ordered rule list. We employed implementation V1 Release 2.5 patch 1. (2) C4.5 [11] is an algorithm for the top-down induction of decision trees. It compresses a data set of labeled examples into a decision tree that has class labels at its nodes, and tests on feature values on its branches...

2870 | UCI repository of machine learning databases
- Blake, Merz
- 1998

Citation Context: ...six UCI repository data sets. On the top five tasks normal wrapping is performed rather than wps. (4 Experimental setup) For our experiments we used ten benchmark data sets from the UCI repository [2]. Table 2 lists some data set statistics for the ten selected data sets, which all have nominal attributes. Note that “soybean” is short for “soybean-large”, and “car” is short for “car evaluation”...

1056 | Instance-based learning algorithms
- Aha, Kibler, et al.
- 1991

Citation Context: ...ic learner in which the central probability matrix between feature values and class labels is smoothed towards a state of maximum entropy. We used the implementation by [8], version 20040315. (4) Ib1 [1] is an instance-based classifier based on the k-nearest neighbor (k-NN) classification rule. We used an implementation that supports a range of k-NN kernel plugins, TiMBL [5], version 5.0.0 patch 3...

1033 | Wrappers for feature subset selection
- Kohavi, John
- 1997

Citation Context: ...new data. One can estimate it on the labeled data available for training purposes, but optimizing parameters on that easily leads to overfitting. A remedy for overfitting is to use classifier wrapping [7], which partitions the available labeled training material into internal training and test data, and which performs cross-validation experiments to estimate a training-set-internal generalization accuracy...
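The wrapping idea quoted above, estimating generalization accuracy from the training data alone via internal cross-validation, can be sketched in a few lines. Everything here (the fold scheme, the toy threshold learner, the parameter grid) is an illustrative assumption, not the implementation used in the paper.

```python
# Classifier wrapping sketch: score each candidate parameter setting by
# k-fold cross-validation on the labeled training data only, then pick
# the best-scoring setting. The real test set is never touched.

def cross_val_accuracy(train_fn, data, k=5):
    """Estimate accuracy of `train_fn` by k-fold cross-validation on `data`."""
    folds = [data[i::k] for i in range(k)]
    correct = total = 0
    for i in range(k):
        held_out = folds[i]
        train = [ex for j in range(k) if j != i for ex in folds[j]]
        predict = train_fn(train)   # train on k-1 folds, get a predictor
        correct += sum(predict(x) == y for x, y in held_out)
        total += len(held_out)
    return correct / total

def wrap_select(param_grid, make_learner, train_data):
    """Pick the parameter setting with the best internal CV accuracy."""
    return max(param_grid,
               key=lambda p: cross_val_accuracy(make_learner(p), train_data))

# Toy data: the label is 1 exactly when the feature is >= 0.5.
data = [(i / 100, int(i >= 50)) for i in range(100)]

def make_threshold_learner(t):
    # "Training" is trivial here; t is the parameter being wrapped.
    return lambda train: (lambda x: int(x >= t))

best_t = wrap_select([0.2, 0.5, 0.8], make_threshold_learner, data)
# best_t -> 0.5
```

Because the estimate uses only internal splits of the training material, the selected setting can later be evaluated honestly on held-out test data.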

979 | Fast effective rule induction
- Cohen
- 1995

Citation Context: ...velopers) and about the algorithms’ parameter value constraints (which may only be known as rules of thumb). We customized wps to the following five well-known machine-learning algorithms. (1) Ripper [4] is a rule-induction algorithm that compresses a data set of labeled examples into an ordered rule list. We employed implementation V1 Release 2.5 patch 1. (2) C4.5 [11] is an algorithm for the top-down induction of decision trees...

674 | Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm
- Littlestone
- 1988

Citation Context: ...tance-based classifier based on the k-nearest neighbor (k-NN) classification rule. We used an implementation that supports a range of k-NN kernel plugins, TiMBL [5], version 5.0.0 patch 3. (5) Winnow [9] is a linear-threshold classifier which learns weights on the model parameters (features) in a learning phase through an error-based weight update rule, which as a side effect removes weights that end up below a threshold...

92 | Efficient progressive sampling
- Provost, Jensen, et al.

Citation Context: ...ing, but it does not venture into searching among all possible combinations of parameter settings exhaustively on all training data available. Rather, it borrows a heuristic from progressive sampling [10]. The goal of progressive sampling is to iteratively seek a data set size at which generalization performance on test material converges. The method is to start with a small training set sample, and...
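The progressive-sampling heuristic described in this context, growing the training sample until performance converges, can be sketched as follows. The geometric growth schedule and the simple improvement-threshold stopping rule are assumptions for illustration, not the cited paper's exact choices.

```python
# Progressive sampling sketch: train on a small sample, grow the sample
# geometrically, and stop once accuracy gains drop below a tolerance.

def progressive_sample_size(evaluate, n_total, start=100, factor=2, tol=0.005):
    """Return the first sample size at which accuracy gains fall below `tol`.

    evaluate(size) -> estimated accuracy when training on `size` examples.
    """
    size = start
    prev = evaluate(size)
    while size < n_total:
        size = min(n_total, size * factor)
        score = evaluate(size)
        if score - prev < tol:      # converged: more data no longer helps
            return size
        prev = score
    return size

# Toy learning curve: accuracy approaches 0.9 as the sample grows.
converged_at = progressive_sample_size(lambda n: 0.9 - 10 / n, n_total=100000)
# converged_at -> 3200
```

The payoff is that anything expensive done per sample size, such as evaluating many parameter settings, stops as soon as larger samples no longer change the outcome, rather than always running on the full training set.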

54 | TiMBL: Tilburg Memory Based Learner, version 1.0, Reference Guide. ILK Technical Report 98-03, available from: http://ilk.kub.nl
- Daelemans, Zavrel, et al.
- 1998

Citation Context: ...version 20040315. (4) Ib1 [1] is an instance-based classifier based on the k-nearest neighbor (k-NN) classification rule. We used an implementation that supports a range of k-NN kernel plugins, TiMBL [5], version 5.0.0 patch 3. (5) Winnow [9] is a linear-threshold classifier which learns weights on the model parameters (features) in a learning phase through an error-based weight update rule, which...

39 | Maximum entropy modeling toolkit for Python and C
- Le
- 2004

Citation Context: ...(3) Maxent [6] is a probabilistic learner in which the central probability matrix between feature values and class labels is smoothed towards a state of maximum entropy. We used the implementation by [8], version 20040315. (4) Ib1 [1] is an instance-based classifier based on the k-nearest neighbor (k-NN) classification rule. We used an implementation that supports a range of k-NN kernel plugins, TiMBL...

27 | The principle of maximum entropy
- Guiasu, Shenitzer
- 1985

Citation Context: ...es. It compresses a data set of labeled examples into a decision tree that has class labels at its nodes, and tests on feature values on its branches. We employed implementation Release 8. (3) Maxent [6] is a probabilistic learner in which the central probability matrix between feature values and class labels is smoothed towards a state of maximum entropy. We used the implementation by [8], version 20040315...

21 | SNoW user guide
- Carlson, Cumby, et al.
- 1999

Citation Context: ...hich learns weights on the model parameters (features) in a learning phase through an error-based weight update rule, which as a side effect removes weights that end up below a threshold. We employed SNoW [3], a sparse implementation of Winnow classifiers, version 3.1.3. Table 1 lists the parameters with their values that were varied, and the total number of combinations of parameter settings tested in the...