Distributed Learning on Very Large Data Sets
In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000
"... One approach to learning from intractably large data sets is to utilize all the training data by learning models on tractably sized subsets of the data. The subsets of data may be disjoint or partially overlapping. The individual learned models may be combined into a single model or a voting approac ..."
Abstract

Cited by 11 (4 self)
One approach to learning from intractably large data sets is to utilize all the training data by learning models on tractably sized subsets of the data. The subsets of data may be disjoint or partially overlapping. The individual learned models may be combined into a single model, or a voting approach may be used to combine the classifications of a set of models. An approach to learning models in parallel from arbitrarily large training data sets and combining them into a classifier is described. The training sets are disjoint in the work described here. A parallel implementation on the DOE's ASCI Red parallel supercomputer is described. Results with data sets small enough to be handled by a single processor show that data sets can be divided into a moderate number of distinct subsets without degrading classifier accuracy. Speedup results are shown for a parallel implementation on the ASCI Red with data sets too large to be handled on a single processor. Training sets of size 3 to 50 million ...
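The scheme the abstract describes — partition the training data into disjoint subsets, learn an independent model on each, and combine their classifications by majority vote — can be sketched as follows. This is a minimal illustration with a toy nearest-centroid learner on one-dimensional data; all function names and the learner itself are hypothetical, not taken from the paper:

```python
from collections import Counter

# Toy training data: (feature, label) pairs for two classes.
data = [(x, 0) for x in range(0, 50)] + [(x, 1) for x in range(50, 100)]

def partition(data, k):
    """Split the training data into k disjoint subsets."""
    return [data[i::k] for i in range(k)]

def train(subset):
    """Learn a trivial model on one subset: the mean feature per class."""
    sums, counts = {}, {}
    for x, y in subset:
        sums[y] = sums.get(y, 0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}  # class -> centroid

def classify(model, x):
    """Predict the class whose centroid is nearest to x."""
    return min(model, key=lambda y: abs(model[y] - x))

def vote(models, x):
    """Combine the per-subset models' classifications by majority vote."""
    return Counter(classify(m, x) for m in models).most_common(1)[0][0]

# One model per disjoint subset; in the paper's setting these would be
# trained in parallel, one subset per processor.
models = [train(s) for s in partition(data, 4)]
print(vote(models, 10))  # -> 0
print(vote(models, 90))  # -> 1
```

Because each subset is trained independently, the `train` calls have no shared state and parallelize trivially; only the final voting step needs all the models.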