## Magical Thinking in Data Mining: Lessons From CoIL Challenge 2000 (2001)

Venue: Knowledge Discovery and Data Mining

Citations: 28 (0 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Elkan01magicalthinking,
  author    = {Charles Elkan},
  title     = {Magical Thinking in Data Mining: Lessons From CoIL Challenge 2000},
  booktitle = {Knowledge Discovery and Data Mining},
  year      = {2001},
  pages     = {426--431}
}
```

### Abstract

CoIL challenge 2000 was a supervised learning contest that attracted 43 entries. The authors of 29 entries later wrote explanations of their work. This paper discusses these reports and reaches three main conclusions. First, naive Bayesian classifiers remain competitive in practice: they were used by both the winning entry and the next best entry. Second, identifying feature interactions correctly is important for maximizing predictive accuracy: this was the difference between the winning classifier and all others. Third and most important, too many researchers and practitioners in data mining do not appreciate properly the issue of statistical significance and the danger of overfitting. Given a dataset such as the one for the CoIL contest, it is pointless to apply a very complicated learning algorithm, or to perform a very time-consuming model search. In either case, one is likely to overfit the training data and to fool oneself in estimating predictive accuracy and in discovering useful correlations.

### Citations

2379 | The Structure of Scientific Revolutions
- Kuhn
- 1996

Citation Context: ...to use to organize our perceptions. Whatever these principles are, if we have learned them, it is because they appeared to be useful in the past. As pointed out in a different context by Thomas Kuhn [6], practitioners of science do not unlearn their scientific worldview when science cannot explain a certain phenomenon. Instead, they either ignore the phenomenon, or they redouble their efforts to und...

562 | Approximate statistical tests for comparing supervised classification learning algorithms
- Dietterich
- 1998
Citation Context: ...possible to compare two different learning methods with the same training and test datasets in a way that is more sensitive than the simple binomial calculation above, using McNemar's hypothesis test [2]. Let A and B designate two learning algorithms and let n10 be the number of test examples classified correctly by A but incorrectly by B. Similarly, let n01 be the number classified incorrectly by ...
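The test described in this context can be sketched in a few lines. A minimal illustration (the function name and example counts are mine, not from the paper):

```python
def mcnemar_statistic(n10, n01):
    """McNemar's test statistic with continuity correction, for comparing
    classifiers A and B on the same test set.
    n10: examples A classifies correctly but B incorrectly.
    n01: examples A classifies incorrectly but B correctly.
    """
    return (abs(n10 - n01) - 1) ** 2 / (n10 + n01)

# Under the null hypothesis (A and B have the same error rate), the
# statistic is approximately chi-squared with 1 degree of freedom, so a
# value above 3.84 indicates a difference at the 5% significance level.
print(mcnemar_statistic(30, 10))  # 9.025: the difference is significant
```

Note that only the disagreements between A and B enter the statistic; examples both classifiers get right or both get wrong carry no information about which is better.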

102 | Obtaining calibrated probability estimates from decision trees and naive bayesian classifiers
- Zadrozny, Elkan
- 2001
Citation Context: ...is sensitive to unbalanced data, and probability estimates at leaves are smoothed, then decision trees can be fully competitive with naive Bayesian classifiers on commercial response prediction tasks [11]. 9. CONCLUSIONS In summary, there are three main lessons to be learned from the CoIL data mining contest. The first two lessons are technical, one positive and one negative. The positive lesson is th...
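The smoothing mentioned in this context is commonly done with the Laplace correction; a minimal sketch (the function name and numbers are illustrative; [11] studies calibration methods more broadly):

```python
def smoothed_leaf_estimate(positives, total, num_classes=2):
    """Laplace-corrected class probability at a decision-tree leaf:
    (positives + 1) / (total + num_classes) instead of the raw frequency.
    Small leaves no longer produce overconfident estimates of 0.0 or 1.0."""
    return (positives + 1) / (total + num_classes)

print(smoothed_leaf_estimate(2, 2))  # 0.75, not an overconfident 1.0
print(smoothed_leaf_estimate(0, 3))  # 0.2, not 0.0
```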

99 | Learning and making decisions when costs and probabilities are both unknown
- Zadrozny, Elkan
- 2001
Citation Context: ...making the offer. Therefore, the aim of data mining should be to estimate the probability that a customer would accept an offer, and also the costs and benefits of the customer accepting or declining [10]. Second, a customer should not be offered an insurance policy just because he or she resembles other customers who have the same type of policy. The characteristics that predict who is most likely to...
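The decision rule implied here weighs an estimated acceptance probability against costs and benefits. A sketch of the general idea (the names and numbers are illustrative, not the specific model of [10]):

```python
def expected_profit(p_accept, benefit_accept, cost_of_offer):
    """Expected profit of making the offer to one customer."""
    return p_accept * benefit_accept - cost_of_offer

def should_offer(p_accept, benefit_accept, cost_of_offer):
    """Make the offer iff its expected profit is positive."""
    return expected_profit(p_accept, benefit_accept, cost_of_offer) > 0

print(should_offer(0.06, 100.0, 2.0))  # True: 0.06 * 100 = 6 > 2
print(should_offer(0.01, 100.0, 2.0))  # False: 0.01 * 100 = 1 < 2
```

A customer with a low response probability can still be worth contacting if the benefit of acceptance is large enough, which is why ranking by probability alone is not the right objective.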

98 | Adaptive Thinking: Rationality in the Real World
- Gigerenzer
- 2000

Citation Context: ...fic thinking. Being primed to see patterns in small datasets is an innate characteristic of humans and perhaps other animals also, and this characteristic is often useful for success in everyday life [5]. Moreover, the starting point of scientific thinking is often a type of magical thinking: scientists commonly posit hypotheses based on a low number of observations. These hypotheses are frequently u...

68 | Statistical Inference
- Silvey
- 1975

Citation Context: ...1 yields an improved estimate of the true standard deviation of the parent population that will not systematically be too small. Technically, using 1/(n - 1) gives the minimum variance unbiased estimator [7]. Many contest submissions reveal basic misunderstandings about the issue of overfitting. For example, one team wrote that they used "... evolutionary search for choosing the predictive features. T...
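The estimator referred to in this context divides by n - 1 rather than n (Bessel's correction); a small sketch (the example data is mine):

```python
def unbiased_variance(xs):
    """Sample variance with the n - 1 denominator.
    Dividing by n instead would systematically underestimate the
    variance of the parent population."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(unbiased_variance(data))  # 32/7 ~ 4.571; dividing by n gives 4.0
```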

66 | Boosting and naive bayesian learning
- Elkan
- 1997
Citation Context: ...eatures were discretized in advance, the CoIL competition could not serve as a test of discretization methods. The predictive accuracy of a naive Bayesian classifier can often be improved by boosting [3], and by adding new attributes derived from combinations of existing attributes. Both boosting and derived attributes are ways of relaxing the conditional independence assumptions that constitute the ...
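One way a derived attribute relaxes the independence assumption is to treat a pair of features as a single categorical value. A toy sketch (the class and helper names are mine, not from [3]): on XOR-labelled data, plain naive Bayes cannot separate the classes, but the derived pair feature can.

```python
import math
from collections import Counter, defaultdict

def add_interaction(x, i, j):
    """Derived attribute: the pair (x[i], x[j]) treated as one value."""
    return list(x) + [(x[i], x[j])]

class NaiveBayes:
    """Minimal categorical naive Bayes with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.totals = Counter(y)
        self.log_prior = {c: math.log(self.totals[c] / len(y))
                          for c in self.classes}
        # counts[c][k] maps values of feature k to counts among class-c rows
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        for x, c in zip(X, y):
            for k, v in enumerate(x):
                self.counts[c][k][v] += 1
        return self

    def predict(self, x):
        best, best_lp = None, -math.inf
        for c in self.classes:
            lp = self.log_prior[c]
            for k, v in enumerate(x):
                # Laplace smoothing, assuming two values per feature
                lp += math.log((self.counts[c][k][v] + 1) /
                               (self.totals[c] + 2))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR: a pure feature interaction
X2 = [add_interaction(x, 0, 1) for x in X]
nb = NaiveBayes().fit(X2, y)
print([nb.predict(x) for x in X2])  # [0, 1, 1, 0]
```

On the original two features alone, every per-feature likelihood is identical across the two classes, so the model is blind to the XOR pattern; the pair feature makes the interaction visible.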

52 | Scalability for clustering algorithms revisited, SIGKDD Explorations 2(1) (2000) 51–57
- Farnstrom, Lewis, et al.
- 2000

Citation Context: ...ustering methods are false. The k-means algorithm, for example, can handle datasets with millions of records and hundreds of dimensions, where no two records are identical, in effectively linear time [4]. Of course the k-means algorithm is not a panacea: it assumes that all features are numerical and a Euclidean distance metric, and no universally good method is known for relaxing these assumptions. ...
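The linear-time behaviour comes from Lloyd's algorithm: each iteration touches every record once per center. A minimal sketch, not the scalable variant of [4] (initialization here is naively deterministic for reproducibility, where k-means++ would be used in practice):

```python
def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm). Each iteration costs
    O(n * k * d): effectively linear in the number of records n."""
    centers = [list(p) for p in points[:k]]  # naive deterministic init
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        # Update step: each center moves to its cluster's mean.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl
                   else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(sorted(kmeans(points, 2)))  # two centers, near [1/3, 1/3] and [31/3, 31/3]
```

The Euclidean distance in the assignment step is exactly the assumption the context flags: all features must be numerical and comparable under one metric.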

20 | CRISP-DM: towards a standard process model for data mining
- Wirth, Hipp
- 2000
Citation Context: ...also methodologies for data mining. It is noteworthy that none of the reports written by CoIL contest participants mention using any part of the CRISP-DM European standard methodology for data mining [9]. 8. CHOICE OF LEARNING METHOD Although it is difficult to say with certainty that one learning method gives more accurate classifiers than another, it is possible to say with certainty that for pract...

7 | The logical categories of learning and communication
- Bateson
- 1971

Citation Context: ...itioner of magic does not unlearn his magical view of events when the magic does not work. In fact, the propositions which govern punctuation have the general characteristic of being self-validating" [1]. In any culture, humans have a certain set of expectations that they use to explain the results of their actions. When something surprising happens, rather than question the expectations, people typi...

5 | CoIL Challenge 2000: The Insurance Company Case
- Greenyer
- 2000

Citation Context: ...ater wrote reports explaining their methods and results. The authors of these reports appear to be data mining practitioners or researchers, as opposed to students. The reports have been published by [8]. The CoIL contest was quite similar to the competitions organized in conjunction with the KDD conference in recent years, and to other data mining competitions. The contest task was to learn a classi...