Results 1 -
2 of
2
SPRINT: A scalable parallel classifier for data mining
, 1996
"... Classification is an important data mining problem. Although classification is a well-studied problem, most of the current classi-fication algorithms require that all or a por-tion of the the entire dataset remain perma-nently in memory. This limits their suitability for mining over large databases. ..."
Abstract
-
Cited by 228 (7 self)
- Add to MetaCart
Classification is an important data mining problem. Although classification is a well-studied problem, most of the current classi-fication algorithms require that all or a por-tion of the the entire dataset remain perma-nently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classification algo-rithm, called SPRINT that removes all of the memory restrictions, and is fast and scalable. The algorithm has also been designed to be easily parallelized, allowing many processors to work together to build a single consistent model. This parallelization, also presented here, exhibits excellent scalability as well. The combination of these characteristics makes the proposed algorithm an ideal tool for data min-ing. 1
SPRINT: A scalable parallel classi er for data mining
- Research report, IBM Almaden Research
, 1996
"... Classi cation is an important data mining problem. Although classi cation is a wellstudied problem, most of the current classication algorithms require that all or a portion of the the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We pres ..."
Abstract
-
Cited by 18 (6 self)
- Add to MetaCart
Classi cation is an important data mining problem. Although classi cation is a wellstudied problem, most of the current classication algorithms require that all or a portion of the the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classi cation algorithm, called SPRINT that removes all of the memory restrictions, and is fast and scalable. The algorithm has also been designed to be easily parallelized, allowing many processors to work together to build a single consistent model. This parallelization, also presented here, exhibits excellent scalabilityaswell. The combination of these characteristics makes the proposed algorithm an ideal tool for data mining. 1

