C4.5, Class Imbalance, and Cost Sensitivity: Why Under-sampling beats Over-sampling (2003)

by Chris Drummond, Robert C. Holte
Citations: 102 (2 self-citations)

BibTeX

@INPROCEEDINGS{Drummond03c4.5class,
    author = {Chris Drummond and Robert C. Holte},
    title = {C4.5, Class Imbalance, and Cost Sensitivity: Why Under-sampling beats Over-sampling},
    booktitle = {},
    year = {2003},
    pages = {1--8}
}

Abstract

This paper takes a new look at two sampling schemes commonly used to adapt machine learning algorithms to imbalanced classes and misclassification costs. It uses a performance analysis technique called cost curves to explore the interaction of over- and under-sampling with the decision tree learner C4.5. C4.5 was chosen because, when combined with one of the sampling schemes, it is quickly becoming the community standard for evaluating new cost-sensitive learning algorithms. This paper shows that using C4.5 with under-sampling establishes a reasonable standard for algorithmic comparison. However, it is recommended that the cheapest class classifier be part of that standard, as it can be better than under-sampling for relatively modest costs. Over-sampling, by contrast, shows little sensitivity to costs: there is often little difference in performance when misclassification costs are changed.
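The abstract compares two ways of making C4.5 cost-sensitive by changing the class distribution of the training set, and evaluates them with cost curves. The sketch below is a minimal illustration under stated assumptions, not the authors' code. It assumes binary labels (1 = positive, 0 = negative), treats C(-|+) as the false-negative cost and C(+|-) as the false-positive cost, and uses one common rebalancing rule: shift the training class ratio by the cost ratio, either by discarding examples of the cheaper class (under-sampling) or by replicating examples of the costlier class (over-sampling). All function names are hypothetical. The cost-curve helpers follow the usual definitions of the probability-cost function PC(+) (x-axis) and normalized expected cost (y-axis); `cheapest_class_cost` is the lower envelope of the two trivial classifiers, which is one reading of the "cheapest class classifier".

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical example type: (feature vector, label), label 1 = positive, 0 = negative.
Example = Tuple[Sequence[float], int]


def under_sample(data: List[Example], c_fn: float, c_fp: float, seed: int = 0) -> List[Example]:
    """Assumed rule: keep only a (cheap cost / dear cost) fraction of the cheaper class."""
    rng = random.Random(seed)
    # The class whose errors cost less is the one that gets thinned out.
    cheap_label, keep = (0, c_fp / c_fn) if c_fn >= c_fp else (1, c_fn / c_fp)
    cheap = [ex for ex in data if ex[1] == cheap_label]
    other = [ex for ex in data if ex[1] != cheap_label]
    return other + rng.sample(cheap, round(len(cheap) * keep))


def over_sample(data: List[Example], c_fn: float, c_fp: float, seed: int = 0) -> List[Example]:
    """Assumed rule: replicate the costlier class by the factor (dear cost / cheap cost)."""
    rng = random.Random(seed)
    dear_label, factor = (1, c_fn / c_fp) if c_fn >= c_fp else (0, c_fp / c_fn)
    dear = [ex for ex in data if ex[1] == dear_label]
    other = [ex for ex in data if ex[1] != dear_label]
    if not dear:
        return list(other)
    target = round(len(dear) * factor)
    copies = dear * (target // len(dear))             # whole replications
    copies += rng.sample(dear, target - len(copies))  # fractional remainder, no repeats
    return other + copies


def pc_plus(p_pos: float, c_fn: float, c_fp: float) -> float:
    """Probability-cost function PC(+): the x-axis of a cost curve."""
    return p_pos * c_fn / (p_pos * c_fn + (1 - p_pos) * c_fp)


def normalized_expected_cost(fnr: float, fpr: float, pc: float) -> float:
    """Normalized expected cost of a classifier with rates (FNR, FPR) at PC(+) = pc."""
    return fnr * pc + fpr * (1 - pc)


def cheapest_class_cost(pc: float) -> float:
    """Better of the trivial classifiers: always-negative costs pc, always-positive 1 - pc."""
    return min(pc, 1 - pc)
```

For example, with 10 positives, 90 negatives, and a false negative three times as costly as a false positive, `under_sample` keeps all 10 positives and 30 negatives, while `over_sample` keeps all 90 negatives and produces 30 positives (each original repeated three times). Under-sampling shrinks the training set and over-sampling duplicates examples, which is the trade-off the paper examines with C4.5.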
