@INPROCEEDINGS{Kononenko95onbiases,
    author = {Igor Kononenko},
    title = {On Biases in Estimating Multi-Valued Attributes},
    booktitle = {},
    year = {1995},
    pages = {1034--1040},
    publisher = {Morgan Kaufmann}
}
Abstract
We analyse the biases of eleven measures for estimating the quality of multi-valued attributes. The values of information gain, J-measure, gini-index, and relevance tend to increase linearly with the number of values of an attribute. The values of gain-ratio, distance measure, Relief, and the weight of evidence decrease for informative attributes and increase for irrelevant attributes. The bias of the statistical tests based on the chi-square distribution is similar, but these functions are not able to discriminate among attributes of different quality. We also introduce a new function based on the MDL principle whose value slightly decreases with the increasing number of an attribute's values.

1 Introduction

In top-down induction of decision trees, various impurity functions are used to estimate the quality of attributes in order to select the "best" one to split on. However, various heuristics tend to overestimate multi-valued attributes. One possible approach to this proble...
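
The bias of information gain noted in the abstract can be illustrated with a small simulation. The sketch below is not the paper's experimental setup: the function names, the two-class problem, the sample size, and the number of trials are all assumptions chosen for illustration. It draws an attribute that is independent of the class (so its true quality is zero) and estimates its average information gain for different numbers of attribute values.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attr_values, labels):
    """Class entropy minus the expected class entropy after splitting on the attribute."""
    groups = defaultdict(list)
    for a, y in zip(attr_values, labels):
        groups[a].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def mean_gain_of_irrelevant_attribute(num_values, n_examples=200, trials=50, seed=0):
    """Average estimated gain of an attribute drawn independently of the class.

    All parameters are hypothetical defaults, not values taken from the paper.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        labels = [rng.randrange(2) for _ in range(n_examples)]
        attr = [rng.randrange(num_values) for _ in range(n_examples)]
        total += information_gain(attr, labels)
    return total / trials


if __name__ == "__main__":
    # Print the average estimated gain for an increasing number of attribute values.
    for v in (2, 5, 10, 20, 40):
        print(v, round(mean_gain_of_irrelevant_attribute(v), 4))
```

Because the attribute carries no information about the class, any growth in the printed estimates with the number of values reflects estimation bias on a finite sample rather than attribute quality.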