## Scalable Parallel Clustering for Data Mining on Multicomputers (2000)

Venue: Lecture Notes in Computer Science

Citations: 20 (1 self)

### BibTeX

@INPROCEEDINGS{Foti00scalableparallel,

author = {D. Foti and D. Lipari and C. Pizzuti and D. Talia},

title = {Scalable Parallel Clustering for Data Mining on Multicomputers},

booktitle = {Lecture Notes in Computer Science},

year = {2000},

pages = {390--398},

publisher = {Springer Verlag}

}

### Abstract

This paper describes the design and implementation on MIMD parallel machines of P-AutoClass, a parallel version of the AutoClass system based upon the Bayesian method for determining optimal classes in large datasets. The P-AutoClass implementation divides the clustering task among the processors of a multicomputer so that they work on their own partition and exchange their intermediate results. The system architecture, its implementation and experimental performance results on different processor numbers and dataset sizes are presented and discussed. In particular, efficiency and scalability of P-AutoClass versus the sequential AutoClass system are evaluated and compared.
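The partition-and-exchange scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the P-AutoClass code: it assumes a simplified one-dimensional Gaussian mixture with equal priors and fixed variance in place of the full AutoClass Bayesian model, simulates the processors with a plain loop, and uses hypothetical helper names (`class_weights`, `local_stats`, `all_reduce_sum`, `update_means`, `parallel_cycle`). It only shows why per-partition statistics combined by a global sum yield the same parameter update as a sequential pass.

```python
import math

def class_weights(x, means, sigma=1.0):
    """Normalized membership weights of one 1-D item (assumed equal priors)."""
    likes = [math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means]
    total = sum(likes)
    return [like / total for like in likes]

def local_stats(partition, means):
    """Per-class (sum of weights, weighted sum of values) for one partition."""
    k = len(means)
    w_sum = [0.0] * k
    wx_sum = [0.0] * k
    for x in partition:
        for j, w in enumerate(class_weights(x, means)):
            w_sum[j] += w
            wx_sum[j] += w * x
    return w_sum, wx_sum

def all_reduce_sum(stats_list):
    """Stand-in for a message-passing reduction: element-wise sum over processors."""
    k = len(stats_list[0][0])
    w_tot = [sum(s[0][j] for s in stats_list) for j in range(k)]
    wx_tot = [sum(s[1][j] for s in stats_list) for j in range(k)]
    return w_tot, wx_tot

def update_means(w_sum, wx_sum):
    """New class means from the globally combined statistics."""
    return [wx / w for w, wx in zip(w_sum, wx_sum)]

def parallel_cycle(data, means, n_procs):
    """One cycle: partition the data, compute local statistics, reduce, update."""
    partitions = [data[p::n_procs] for p in range(n_procs)]
    stats = [local_stats(part, means) for part in partitions]
    return update_means(*all_reduce_sum(stats))

# Hypothetical toy data with two obvious groups.
data = [0.1, 0.3, -0.2, 0.0, 4.9, 5.2, 5.1, 4.8]
means = [1.0, 4.0]
sequential = parallel_cycle(data, means, 1)  # one "processor"
parallel = parallel_cycle(data, means, 4)    # four simulated "processors"
```

On a real distributed-memory machine, the loop over partitions would run concurrently and `all_reduce_sum` would be replaced by a collective communication call. The two results agree up to floating-point rounding because the statistics are plain sums over data items.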

### Citations

479 | Bayesian classification (AutoClass): Theory and results
- Cheeseman, Stutz
- 1996
Citation Context: ...These statistics are then used to generate new MAP values for the parameters and the cycle is repeated. Based on this theory, Cheeseman and colleagues at NASA Ames Research Center developed AutoClass [1] originally in Lisp. Then the system has been ported from Lisp to C. The C version of AutoClass improved the performance of the system of about ten times and has provided a version of the system that ...

80 | Parallel algorithms for hierarchical clustering
- Olson
- 1995
Citation Context: ...nowledge from large-scale data repositories. Recently there has been an increasing interest in parallel implementations of data clustering algorithms. Parallel approaches to clustering can be found in [8, 4, 9, 5, 10]. In this paper we consider a parallel clustering algorithm based on Bayesian classification for distributed memory multicomputers. We propose a parallel implementation of the AutoClass algorithm, cal...

69 | Mining very large databases with parallel processing
- Freitas, Lavington
- 1998
Citation Context: ...In the past few years there has been an increasing interest in parallel implementations of data mining algorithms [2]. The first approaches to the parallelization of AutoClass have been done on SIMD parallel machines by using compilers that automatically generated data-parallel code starting from the sequential prog...

40 | Large-scale parallel data clustering
- Judd, McKinley, et al.
- 1996
Citation Context: ...nowledge from large-scale data repositories. Recently there has been an increasing interest in parallel implementations of data clustering algorithms. Parallel approaches to clustering can be found in [8, 4, 9, 5, 10]. In this paper we consider a parallel clustering algorithm based on Bayesian classification for distributed memory multicomputers. We propose a parallel implementation of the AutoClass algorithm, cal...

16 | Bayesian Classification on Protein Structure
- Hunter, States
- 1992
Citation Context: ... is not a very large dataset. For the clustering of a satellite image AutoClass took more than 130 hours [6] and the analysis of protein sequences the discovery process required from 300 to 400 hours [3]. These considerations and experiences suggest that it is necessary to implement faster versions of AutoClass to handle very large data set in reasonable time. This can be done by exploiting the inher...

8 | Parallel K-means clustering for large data sets
- Stoffel, Belkoniene
- 1999
Citation Context: ...nowledge from large-scale data repositories. Recently there has been an increasing interest in parallel implementations of data clustering algorithms. Parallel approaches to clustering can be found in [8, 4, 9, 5, 10]. In this paper we consider a parallel clustering algorithm based on Bayesian classification for distributed memory multicomputers. We propose a parallel implementation of the AutoClass algorithm, cal...

5 | Performance Evaluation on Large-Scale Parallel Clustering
- Judd, McKinley, et al.
- 1997

3 | An Improved Automatic Classification of a Landsat/TM Image from
- Kanefsky, Stutz, et al.
- 1994
Citation Context: ... dataset, more than 1 day is necessary to analyze a dataset composed of about 140K tuples, that is not a very large dataset. For the clustering of a satellite image AutoClass took more than 130 hours [6] and the analysis of protein sequences the discovery process required from 300 to 400 hours [3]. These considerations and experiences suggest that it is necessary to implement faster versions of AutoC...

3 | Seeking Parallelism in Discovery Programs
- Potts
- 1996

1 | Parallelisation of AutoClass
- Miller, Guo
- 1997
Citation Context: ...roximations Fig. 3. The structure of the base_cycle function. In particular, analyzing the time spent in each of the three functions called by base_cycle, it appears, as observed in other experiences [7], that the update_wts and update_parameters functions are the time consuming functions whereas the time spent in the update_approximation is negligible. Therefore, we studied the parallelization of th...