## Learning and Making Decisions When Costs and Probabilities are Both Unknown (2001)

### Download Links

- [www-cse.ucsd.edu]
- DBLP

### Other Repositories/Bibliography

Venue: | Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining |

Citations: | 100 - 9 self |

### BibTeX

@INPROCEEDINGS{Zadrozny01learningand,
  author = {Bianca Zadrozny and Charles Elkan},
  title = {Learning and Making Decisions When Costs and Probabilities are Both Unknown},
  booktitle = {Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining},
  year = {2001},
  pages = {204--213},
  publisher = {ACM Press}
}


### Abstract

In many machine learning domains, misclassification costs are different for different examples, in the same way that class membership probabilities are example-dependent. In these domains, both costs and probabilities are unknown for test examples, so both cost estimators and probability estimators must be learned. This paper first discusses how to make optimal decisions given cost and probability estimates, and then presents decision tree learning methods for obtaining well-calibrated probability estimates. The paper then explains how to obtain unbiased estimators for example-dependent costs, taking into account the difficulty that in general, probabilities and costs are not independent random variables, and the training examples for which costs are known are not representative of all examples. The latter problem is called sample selection bias in econometrics. Our solution to it is based on Nobel prize-winning work due to the economist James Heckman. We show that the methods we propose are s...
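The optimal-decision principle the abstract refers to can be sketched in a few lines (an illustrative sketch, not the authors' code; the function names and dollar amounts below are made up for a donation-style scenario): given estimated class probabilities and estimated example-dependent benefits, choose the action with maximal expected benefit.

```python
def expected_benefit(p, benefit):
    """p[j]: estimated probability that the true class is j.
    benefit[i][j]: estimated benefit of taking action i when truth is j."""
    return [sum(p[j] * benefit[i][j] for j in range(len(p)))
            for i in range(len(benefit))]

def optimal_decision(p, benefit):
    """Index of the action with maximal expected benefit."""
    eb = expected_benefit(p, benefit)
    return max(range(len(eb)), key=eb.__getitem__)

# Illustrative numbers: mail a donation request (action 1) only if the
# expected gift outweighs the mailing cost.
p = [0.95, 0.05]              # P(no donation), P(donation)
benefit = [[0.0, 0.0],        # action 0: do not mail
           [-0.68, 14.32]]    # action 1: mail (cost $0.68, net gift $14.32)
print(optimal_decision(p, benefit))  # → 1 (expected benefit 0.07 > 0)
```

Note that a small probability of the positive class can still justify acting, because the benefits are example-dependent rather than uniform per class.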

### Citations

5438 | C4.5: Programs for Machine Learning - Quinlan - 1993 |

4457 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984 |

2765 | Bagging predictors - Breiman - 1996 |

313 | Beyond independence: conditions for the optimality of the simple Bayesian classifier
- Domingos, Pazzani
- 1996
Citation Context ...ers are based on the assumption that within each class, the values of the attributes of examples are independent. It is well-known that these classifiers tend to give inaccurate probability estimates [9]. Given an example x, suppose that a naive Bayesian classifier computes the score n(x). Because attributes tend to be positively correlated, these scores are typically too extreme: for most x, either n... |
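The over-extremeness this context describes can be shown with a toy log-odds calculation (illustrative code, not from the paper): under the independence assumption, per-attribute evidence simply adds up, so perfectly correlated copies of a feature double-count its evidence and push the score toward 0 or 1.

```python
import math

def nb_posterior(log_likelihood_ratios, prior_log_odds=0.0):
    """Naive Bayes posterior P(class = 1 | x): with independence assumed,
    the per-attribute log-likelihood ratios are summed."""
    log_odds = prior_log_odds + sum(log_likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))

# One attribute giving moderate evidence for class 1:
print(round(nb_posterior([1.0]), 3))      # 0.731
# Five perfectly correlated copies of that attribute: the score becomes
# far more extreme than the true probability warrants.
print(round(nb_posterior([1.0] * 5), 3))  # 0.993
```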

286 | The foundations of cost-sensitive learning
- Elkan
- 2001
Citation Context ...alk consistently about benefits than about costs. The reason is that all benefits are straightforward cash flows relative to a baseline wealth of $0, while some costs are counterfactual opportunity costs [11]. Accordingly, our formulation of the problem is in terms of benefits instead of costs. This formulation applies very generally, including to all the scenarios mentioned in the Introduction, because be... |

163 | The UCI KDD archive
- Hettich, Bay
- 1999
Citation Context ..., large and challenging dataset that was first used in the data mining contest associated with the 1998 KDD conference. This dataset and associated documentation are available in the UCI KDD repository [2]. The dataset contains information about persons who have made donations in the past to a certain charity. The decision-making task is to choose which donors to request a new donation from. This task i... |

96 | A Comparative Analysis of Methods For Pruning Decision Trees - Esposito, Malerba, et al. - 1997 |

41 | An empirical comparison of voting classification algorithms: Bagging, boosting, and variants - Bauer, Kohavi - 1998 |

41 | Well-Trained PETs: Improving Probability Estimation Trees. CeDER Working Paper #IS-00-04 - Provost, Domingos - 2000 |

36 | Neural Networks for Pattern Recognition, chapter 6.4: Modelling conditional distributions - Bishop - 1995 |

31 | Sample Selection Bias as a Specification Error - Heckman |

25 | Assessing the calibration of Naive Bayes' posterior estimates
- Bennett
Citation Context ...ee of detail. Sobehart et al. [20] use a Gaussian kernel regression method in a similar context. Applying parametric methods to calibrate naive Bayes scores is not straightforward. For example, Bennett [3] reports that sigmoid functions cannot transform naive Bayes scores into well-calibrated probability estimates. With most learning methods, in order to obtain binned estimates that do not overfit the t... |

19 | An empirical comparison of voting classification algorithms: Bagging, boosting, and variants
- Bauer, Kohavi
Citation Context ...K+98], who find that performing no pruning and variants of pruning adapted to loss minimization both lead to similar performance. Not using pruning is also suggested by Bauer and Kohavi (Section 7.3, [BK99]). 4.1 Improving probability estimates by binning The binning or histogram method is a simple non-parametric approach to probability density estimation [Bis95]. Given a set of examples for which an at... |
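The binning method mentioned in this context can be sketched briefly (an illustrative sketch under assumptions, not the paper's implementation; the function names are made up): sort examples by raw score, split them into equal-frequency bins, and use each bin's empirical positive rate as the calibrated probability estimate.

```python
def fit_bins(scores, labels, n_bins=10):
    """Return a list of (upper_score, positive_rate) per equal-frequency bin."""
    pairs = sorted(zip(scores, labels))
    size = len(pairs) // n_bins
    bins = []
    for b in range(n_bins):
        lo = b * size
        hi = (b + 1) * size if b < n_bins - 1 else len(pairs)
        chunk = pairs[lo:hi]
        upper = chunk[-1][0]                           # bin's score boundary
        rate = sum(y for _, y in chunk) / len(chunk)   # empirical P(y = 1)
        bins.append((upper, rate))
    return bins

def calibrate(score, bins):
    """Map a raw score to the positive rate of the bin it falls into."""
    for upper, rate in bins:
        if score <= upper:
            return rate
    return bins[-1][1]

# Twenty examples whose raw scores are monotone in the label:
scores = [i / 20 for i in range(20)]
labels = [0] * 10 + [1] * 10
bins = fit_bins(scores, labels, n_bins=4)
print(calibrate(0.30, bins))  # 0.0 -- low-score bin contains no positives
print(calibrate(0.80, bins))  # 1.0 -- high-score bin is all positives
```

Because each estimate is an average over a whole bin, this non-parametric approach smooths the overly extreme raw scores without assuming any parametric form such as a sigmoid.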

15 | MetaCost: A general method for making classifiers cost-sensitive - Domingos - 1999 |

12 | Robust Classification for Imprecise Environments - Provost, Fawcett - 2000 |

11 | On class probability estimates and cost-sensitive evaluation of classifiers - Margineantu - 2000 |

8 | Cost-sensitive learning bibliography - Turney - 2000 |

6 | Pruning decision trees with misclassification costs - Bradford, Kunz, et al. - 1998 |

6 | Bayes and Pseudo-Bayes Estimates of Conditional Probabilities and Their Reliability - 1993 |

6 | Cost-sensitive learning and decision-making when costs are unknown - Elkan - 2000 |

5 | KDD'99 competition: Knowledge discovery contest report. Available at http://www-cse.ucsd.edu/ users/elkan/kdresults.html - Georges, Milley - 1999 |

3 | Assessing the performance of direct marketing scoring models
- Malthouse
- 2001
Citation Context ...istically independent. Always using the same training and test set removes one source of variance, so even small differences in performance between data mining methods are in fact likely to be genuine [14]. 7. CONCLUSIONS The main contributions of this paper are the following: 1. We explain a general method of cost-sensitive learning that performs systematically better than MetaCost in our experiments.... |

1 | Probability estimation for classification trees - Walker - 1992 |