SLIQ: A Fast Scalable Classifier for Data Mining (1996) [159 citations — 7 self]

Abstract:

Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for data mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree growing strategy to enable classification of disk-resident datasets. SLIQ also uses a new tree-pruning algorithm that is inexpensive, and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale for large data sets and classify data sets irrespective of the number of classes, attributes, and examples (records), thus making it an ...
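The abstract names pre-sorting but does not spell out the mechanics, so the following is only a minimal in-memory sketch of the general idea: sort each numeric attribute once, keep the class labels in a separate list indexed by record id, and evaluate every candidate split in a single pass over the sorted list. The Gini criterion, the function names (`presort`, `best_numeric_split`), and the sample records are all assumptions made for illustration. The breadth-first, disk-resident machinery the abstract mentions is not modelled.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class histogram (assumed split criterion)."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def presort(values):
    """Sort one numeric attribute once, pairing each value with its record id."""
    return sorted((v, rid) for rid, v in enumerate(values))


def best_numeric_split(attribute_list, class_list):
    """Scan a pre-sorted attribute list once; return (threshold, weighted_gini)."""
    n = len(attribute_list)
    below = Counter()  # class histogram of records at or below the split
    above = Counter(class_list[rid] for _, rid in attribute_list)
    best = (None, float("inf"))
    for i in range(n - 1):
        value, rid = attribute_list[i]
        label = class_list[rid]
        # Move one record across the split by updating both histograms.
        below[label] += 1
        above[label] -= 1
        next_value = attribute_list[i + 1][0]
        if value == next_value:
            continue  # no valid threshold between equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below * gini(below, n_below) + n_above * gini(above, n_above)) / n
        if score < best[1]:
            best = ((value + next_value) / 2, score)
    return best


# Hypothetical records: one numeric attribute and a class label per record.
ages = [23, 17, 43, 68, 32, 20]
labels = ["high", "high", "high", "low", "low", "high"]

threshold, score = best_numeric_split(presort(ages), labels)
print(threshold, round(score, 4))
```

Because the sort happens once up front, the scan can be repeated at later tree levels without re-sorting. Here it is shown for a single node only.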

Citations

2573 Classification and Regression Trees – Breiman, Friedman, et al. - 1984
385 Stochastic Complexity – Rissanen - 1987
251 Inferring decision trees using the minimum description length principle – Quinlan, Rivest - 1989
215 Database Mining: A Performance Perspective – Agrawal, Imielinski, et al. - 1993
193 Computer Systems That Learn: Classification and Prediction Methods from Statistics – Weiss, Kulikowski - 1991
143 Classification and Regression Trees – Breiman, Friedman, et al. - 1984
100 An interval classifier for database mining applications – Agrawal, Ghosh, et al. - 1992
78 Megainduction: A Machine Learning on Very Large Databases – Catlett - 1991
78 Coding decision trees – Wallace, Patrick - 1993
64 Stochastic Complexity in Statistical Inquiry. World Scientific Series in Computer Science – Rissanen - 1989
58 Meta-learning for multistrategy and parallel learning – Chan, Stolfo - 1993
44 MDL-based decision tree pruning – Mehta, Rissanen, et al. - 1995
18 Computer Systems that Learn: Classification and Prediction Methods from – Weiss, Kulikowski - 1991
11 An interval classifier for database mining applications – Agrawal, Ghosh, et al. - 1992
9 Classification and Regression Trees – al - 1984
1 An interval classifier for database mining applications – al - 1992
1 Computer Systems that Learn: Classification and Prediction Methods from Statistics – Weiss, Kulikowski - 1991