Data Mining using MLC++ (1996) [34 citations — 0 self]

by Ron Kohavi, Dan Sommerfield, James Dougherty

Abstract:

Data mining algorithms, including machine learning, statistical analysis, and pattern recognition techniques, can greatly improve our understanding of data warehouses, which are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called MLC++, which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest. MLC++ not only provides a workbench for such comparisons, but also provides a library of C++ classes to aid in the development of new algorithms, especially hybrid algorithms and multi-strategy algorithms. Such algorithms are generally hard to code from scratch. We discuss design issues, interfaces to other programs, and visualization of the resulting classifiers.

Citations

3011 Pattern Classification and Scene Analysis – Duda, Hart - 1973
1565 Bagging predictors – Breiman - 1996
1405 Introduction to the Theory of Neural Computation – Hertz, Krogh, et al. - 1991
620 The CN2 induction algorithm – Clark, Niblett - 1989
477 Irrelevant features and the subset selection problem – John, Kohavi, et al. - 1994
470 From data mining to knowledge discovery: An overview – Fayyad, Piatetsky-Shapiro, et al. - 1996
330 Very simple classification rules perform well on most commonly used datasets – Holte - 1993
304 Supervised and unsupervised discretization of continuous features – Dougherty, Kohavi, et al. - 1995
251 Rule induction with CN2: Some recent improvements – Clark, Boswell - 1991
238 A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features – Cost, Salzberg - 1993
234 Beyond independence: Conditions for the optimality of the simple Bayesian classifier – Domingos, Pazzani - 1996
121 Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms – Aha - 1992
115 The Estimation of Probabilities: An Essay on Modern Bayesian Methods – Good - 1965
76 Lazy decision trees – Friedman, Kohavi, et al. - 1996
64 Theory and applications of agnostic PAC-learning with small decision trees – Auer, Holte, et al. - 1995
59 Nearest Neighbor (NN) Norms – Dasarathy - 1991
27 Cross-validation and the bootstrap: Estimating the error rate of a prediction rule – Efron, Tibshirani - 1995
21 Learning Probabilistic Relational Concept Descriptions – Ali - 1996