## A Bias-Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000 (2004)

Venue: Machine Learning

Citations: 13 (0 self)

### BibTeX

@ARTICLE{Putten04abias-variance,
  author = {Peter van der Putten and Maarten van Someren},
  title = {A Bias-Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000},
  journal = {Machine Learning},
  year = {2004},
  pages = {177--195},
  publisher = {Kluwer Academic Publishers}
}

### Abstract

The CoIL Challenge 2000 data mining competition attracted a wide variety of solutions, both in terms of approaches and performance. The goal of the competition was to predict who would be interested in buying a specific insurance product and to explain why people would buy. Unlike in most other competitions, the majority of participants provided a report describing the path to their solution. In this article we use the framework of bias-variance decomposition of error to analyze what caused the wide range of prediction performance. We characterize the challenge problem to make it comparable to other problems and evaluate why certain methods work or not. We also include an evaluation of the submitted explanations by a marketing expert. We find that variance is the key component of error for this problem. Participants use various strategies in data preparation and model development that reduce variance error, such as feature selection and the use of simple, robust and low variance learners like Naive Bayes. Adding constructed features, modeling with complex, weak bias learners and extensive fine tuning by the participants often increase the variance error.

### Citations

3329 | Data Mining: Practical machine learning tools and techniques, 2nd edition - Witten, Frank - 2005

Citation Context: ...ds and in that case the intrinsic error contributes as a constant factor. The combined bias/intrinsic error effect and the variance error are estimated using the implementation in the WEKA toolkit (Witten & Frank, 2000) following Kohavi and Wolpert (1996). The data are split into two parts. From one part samples are drawn, learning is applied and the prediction error on the other half is calculated. We used 50 samp...
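The estimation procedure quoted above (split the data in two, draw repeated training samples from one half, learn, and score on the other half) can be sketched as follows. This is a minimal illustration of the Kohavi & Wolpert (1996) zero-one-loss decomposition, not the WEKA implementation the authors used; the 1-NN learner and synthetic data are hypothetical stand-ins, and the targets are assumed noise-free so the intrinsic-error term is zero.

```python
import random
from collections import Counter

def kohavi_wolpert(train_pool, test_set, learner, n_samples=50, seed=0):
    """Estimate the Kohavi-Wolpert bias^2/variance decomposition of 0/1 loss.

    Draws n_samples training sets from train_pool, fits `learner` on each,
    and records the distribution of predictions at every test point.
    Targets are assumed deterministic, so intrinsic error is zero and
    average error = bias^2 + variance exactly.
    """
    rng = random.Random(seed)
    size = len(train_pool) // 2
    counts = [Counter() for _ in test_set]
    for _ in range(n_samples):
        model = learner(rng.sample(train_pool, size))
        for i, (x, _) in enumerate(test_set):
            counts[i][model(x)] += 1
    bias2 = variance = error = 0.0
    for (_, y_true), c in zip(test_set, counts):
        probs = {y: n / n_samples for y, n in c.items()}
        p_true = probs.get(y_true, 0.0)
        sum_sq = sum(p * p for p in probs.values())
        error += 1.0 - p_true                # expected 0/1 loss at this point
        variance += 0.5 * (1.0 - sum_sq)     # spread of the predictions
        bias2 += 0.5 * ((1.0 - p_true) ** 2 + sum_sq - p_true ** 2)
    n = len(test_set)
    return bias2 / n, variance / n, error / n

def one_nn(sample):
    """Toy 1-NN learner on 1-D inputs: returns a predict function."""
    return lambda x: min(sample, key=lambda xy: abs(xy[0] - x))[1]

rng = random.Random(1)
pool = [(v, int(v > 0.5)) for v in (rng.random() for _ in range(200))]
test = [(v, int(v > 0.5)) for v in (rng.random() for _ in range(100))]
b2, var, err = kohavi_wolpert(pool, test, one_nn)
```

With noise-free targets the identity error = bias² + variance holds exactly at every test point, which makes the estimate easy to sanity-check.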

3056 | UCI repository of machine learning databases - Blake, Merz - 1998

Citation Context: ...geared towards illustrating the strengths of particular machine learning algorithms rather than representing real world problems. The challenge data is now part of the KDD section of the UCI Archive (Blake & Merz, 1998). The problem was split into a prediction and a description task. 2.1. Prediction task: From a business perspective the goal of the predic...

1122 | Wrappers for feature subset selection - Kohavi - 1995

Citation Context: ...ategory contains methods that select subsets of features rather than just evaluate features individually and independently. There may be several reasons to look at subsets instead of single features (Kohavi & John, 1997). A feature with high individual predictive power but also high correlation to variables that are already selected does not add much information to the model, so it should not be included. A feature ...

790 | An Introduction to Variable and Feature Selection - Guyon, Elisseeff - 2003

Citation Context: ...he choice of learning algorithm. Feature selection reduces variance error for all eight learners. The feature selection methods used by the participants can be divided into three main categories (see Guyon & Elisseeff, 2003 for a recent overview of the state of the art in feature selection). The first category consists of approaches where candidate features are evaluated independently of other features. Simple evaluatio...

643 | Neural networks and the bias/variance dilemma - Geman, Bienenstock, et al. - 1992

Citation Context: ...icy owners possible. The main question we address here is what caused this wide range of performance. To explain the results we will evaluate the various approaches using bias-variance decomposition (Geman, Bienenstock, & Doursat, 1992; Friedman, 1997; Kohavi & Wolpert, 1996; Breiman, 1996; James, 2003). This separates the error component resulting from the inability of a learner to represent or find the appropriate model for the b...

637 | On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29:103–130 - Domingos, Pazzani - 1997

471 | Very simple classification rules perform well on most commonly used datasets. Machine Learning 11:63–91 - Holte - 1993

Citation Context: ...e power of a predictor and its correlation to features that are already selected into account. To see the effects of extreme feature selection, we included decision stumps (decision trees of depth 1, Holte, 1993), which were not used by any participant. As can be seen from figure 4, feature selection improves classification results for seven out of eight learners. When all 16 models are compared, six out of ...
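A decision stump of the kind Holte studied is small enough to sketch in full. The code below is an illustrative implementation, not the one used in the paper: it tries every midpoint threshold on every numeric feature and keeps the single split with the lowest training error, labelling each side with its majority class.

```python
from collections import Counter

def fit_stump(data):
    """Fit a depth-1 decision tree on (feature_vector, label) pairs.

    Scans every feature and every midpoint between consecutive observed
    values, assigns each side of the split its majority training label,
    and returns the predictor with the fewest training errors.
    """
    def majority(labels):
        return Counter(labels).most_common(1)[0][0]

    best = None  # (training_error, feature, threshold, left_label, right_label)
    for f in range(len(data[0][0])):
        values = sorted({x[f] for x, _ in data})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [y for x, y in data if x[f] <= t]
            right = [y for x, y in data if x[f] > t]
            ll, rl = majority(left), majority(right)
            err = sum(y != ll for y in left) + sum(y != rl for y in right)
            if best is None or err < best[0]:
                best = (err, f, t, ll, rl)
    _, f, t, ll, rl = best
    return lambda x: ll if x[f] <= t else rl

# Hypothetical toy data: the first feature separates the classes.
stump = fit_stump([((0.1, 5.0), 0), ((0.2, 1.0), 0),
                   ((0.8, 4.0), 1), ((0.9, 1.8), 1)])
```

Because the stump commits to a single feature, it is an extreme form of feature selection: very high bias, but very low variance.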

255 | No free lunch theorems for search - Wolpert, Macready - 1995

204 | On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery 1:55–77 - Friedman - 1997

Citation Context: ...n we address here is what caused this wide range of performance. To explain the results we will evaluate the various approaches using bias-variance decomposition (Geman, Bienenstock, & Doursat, 1992; Friedman, 1997; Kohavi & Wolpert, 1996; Breiman, 1996; James, 2003). This separates the error component resulting from the inability of a learner to represent or find the appropriate model for the behavior from the...

185 | Bias plus variance decomposition for zero-one loss functions - Kohavi, Wolpert - 1996

Citation Context: ...e is what caused this wide range of performance. To explain the results we will evaluate the various approaches using bias-variance decomposition (Geman, Bienenstock, & Doursat, 1992; Friedman, 1997; Kohavi & Wolpert, 1996; Breiman, 1996; James, 2003). This separates the error component resulting from the inability of a learner to represent or find the appropriate model for the behavior from the error component resulti...

117 | Correlation-based Feature Subset Selection for Machine Learning - Hall - 1998

Citation Context: ...d bias-variance analysis on eight different learners, given the full set and a data set that was reduced to seven variables using the best first version of the CFS feature subset selection algorithm (Hall, 1998; Witten & Frank, 2000). This algorithm takes both the predictive power of a predictor and its correlation to features that are already selected into account. To see the effects of extreme feature sel...
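Hall's CFS heuristic scores a subset S of k features by merit(S) = k·r̄_cf / sqrt(k + k(k−1)·r̄_ff), where r̄_cf is the mean feature-class correlation and r̄_ff the mean feature-feature correlation, so a redundant feature inflates the denominator more than the numerator. A minimal sketch, with greedy forward search standing in for Hall's best-first search and Pearson correlation standing in for his discretised measure; the toy data are hypothetical:

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def cfs_merit(subset, features, target):
    """CFS merit: k*avg|feature-class corr| / sqrt(k + k(k-1)*avg|feature-feature corr|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[f], features[g])) for f, g in pairs) / len(pairs)
    return (k * r_cf) / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(features, target):
    """Greedy forward search over the CFS merit (a simplification of best-first)."""
    selected, remaining, best = [], list(features), 0.0
    while remaining:
        merit, f = max((cfs_merit(selected + [f], features, target), f)
                       for f in remaining)
        if merit <= best:
            break
        best = merit
        selected.append(f)
        remaining.remove(f)
    return selected

# Hypothetical data: 'a' matches the target, 'b' is a noisy copy, 'c' is unrelated.
features = {'a': [0, 1, 0, 1, 1, 0], 'b': [0, 1, 0, 1, 1, 1], 'c': [1, 0, 0, 1, 0, 1]}
target = [0, 1, 0, 1, 1, 0]
selected = greedy_cfs(features, target)
```

On this toy data the search keeps only 'a': adding the redundant copy 'b' raises the feature-feature correlation term and lowers the merit, which is exactly the behaviour the snippet above describes.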

97 | Bias, variance and arcing classifiers - Breiman - 1996

Citation Context: ...de range of performance. To explain the results we will evaluate the various approaches using bias-variance decomposition (Geman, Bienenstock, & Doursat, 1992; Friedman, 1997; Kohavi & Wolpert, 1996; Breiman, 1996; James, 2003). This separates the error component resulting from the inability of a learner to represent or find the appropriate model for the behavior from the error component resulting from varianc...

88 | Oversearching and layered search in empirical learning - Quinlan, Cameron-Jones - 1995

85 | The Role of Occam's Razor in Knowledge Discovery. Data Mining and Knowledge Discovery - Domingos - 1999

Citation Context: ...nal learning bias. Many authors mentioned that they experimented with a number of learning tools, and parameters of tools. This experimentation causes "procedural bias" (Quinlan & Cameron-Jones, 1995; Domingos, 1997): a new method is tried, or a variation of an earlier method, and if the accuracy increases then it is assumed that the new method is better. This may not be true because the new method may have a...

49 | A unified bias-variance decomposition and its applications - Domingos

43 | Towards principled feature selection: Relevancy, filters and wrappers - Tsamardinos, Aliferis - 2003

Citation Context: ...res individually, but for this problem we see no significant difference between subset filter and wrapper methods. This has been confirmed for other domains by a recent study on wrappers and filters (Tsamardinos & Aliferis, 2003). 6. Lessons learned: Model representation and the learning method. The selection of a method for constructing the model of the data is generally considered an important decision in the data mining pr...
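A wrapper, unlike a filter, scores a candidate feature subset by training the target learner on it and measuring held-out accuracy, rather than by a learner-independent statistic. A minimal sketch with greedy forward search wrapped around a toy 1-NN learner (both hypothetical choices, not the participants' setups):

```python
import random

def predict_1nn(train_pts, x):
    """1-NN prediction: label of the closest training point."""
    return min(train_pts, key=lambda py: sum((a - b) ** 2
                                             for a, b in zip(py[0], x)))[1]

def wrapper_forward(rows, labels, seed=0):
    """Greedy forward wrapper: grow the feature subset while the holdout
    accuracy of the wrapped learner (here 1-NN) keeps improving."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = len(idx) // 2
    tr, te = idx[:cut], idx[cut:]

    def acc(subset):
        train_pts = [(tuple(rows[i][f] for f in subset), labels[i]) for i in tr]
        hits = sum(predict_1nn(train_pts,
                               tuple(rows[i][f] for f in subset)) == labels[i]
                   for i in te)
        return hits / len(te)

    selected, best = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining:
        score, f = max((acc(selected + [f]), f) for f in remaining)
        if score <= best:
            break
        best, selected = score, selected + [f]
        remaining.remove(f)
    return selected, best

# Hypothetical data: feature 0 determines the label, feature 1 is noise.
noise = random.Random(42)
rows = [(i / 40, noise.random()) for i in range(40)]
labels = [int(r[0] > 0.5) for r in rows]
subset, holdout_acc = wrapper_forward(rows, labels)
```

The wrapper inherits the learner's own inductive bias, which is why it can in principle beat a filter; the finding quoted above is that on this problem the two performed indistinguishably.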

30 | Variance and Bias for General Loss Functions - James - 2003

Citation Context: ...formance. To explain the results we will evaluate the various approaches using bias-variance decomposition (Geman, Bienenstock, & Doursat, 1992; Friedman, 1997; Kohavi & Wolpert, 1996; Breiman, 1996; James, 2003). This separates the error component resulting from the inability of a learner to represent or find the appropriate model for the behavior from the error component resulting from variance in predicti...

28 | Magical thinking in data mining: lessons from CoIL challenge 2000 - Elkan - 2001

Citation Context: ...t receive the test set targets nor were they informed of the CoIL Challenge or any of the results. In contrast, the second group of students read a paper written by the winner of the prediction task (Elkan, 2001). Both groups compete very well with the... Figure 1. Histogram of prediction task performance for CoIL Challenge participants and two reference groups of studen...

16 | The CRISP-DM process model - Chapman, Clinton, et al. - 1999

Citation Context: ...tter understand the factors that determine the success of real world data mining projects. We organize the analysis by the main steps in the data mining process. According to the CRISP process model (Chapman et al., 1999) the top-level knowledge discovery process consists of business understanding, data understanding, data preparation, modeling, evaluation and deployment. Neither the business and data understanding s...

9 | CoIL Challenge 2000: The Insurance Company Case - Putten, Someren - 2000

1 | Workshop notes on discovery challenge PKDD-99 - Berka - 1999

Citation Context: ...tudents in this paper. A previous report by van der Putten & van Someren (2000) on an earlier version of this competition covers 6 entries. With the notable exception of the PKDD Discovery Challenge (Berka, 1999), most competitions such as the KDD Cup report only the top entries. The objective of the competition was to predict who will be interested in a particular in...