## Initialization of Iterative Refinement Clustering Algorithms (1998)

### Download Links

- [www.aaai.org]
- [ftp.research.microsoft.com]
- DBLP

### Other Repositories/Bibliography

Citations: 67 (2 self)

### BibTeX

@INPROCEEDINGS{Fayyad98initializationof,
  author    = {Usama M. Fayyad and Cory A. Reina and Paul S. Bradley},
  title     = {Initialization of Iterative Refinement Clustering Algorithms},
  booktitle = {},
  year      = {1998},
  pages     = {194--198},
  publisher = {AAAI Press}
}

### Abstract

Iterative refinement clustering algorithms (e.g. K-Means, EM) converge to one of numerous local minima. It is known that they are especially sensitive to initial conditions. We present a procedure for computing a refined starting condition from a given initial one that is based on an efficient technique for estimating the modes of a distribution. The refined initial starting condition leads to convergence to "better" local minima. The procedure is applicable to a wide class of clustering algorithms for both discrete and continuous data. We demonstrate the application of this method to the Expectation Maximization (EM) clustering algorithm and show that refined initial points indeed lead to improved solutions. Refinement run time is considerably lower than the time required to cluster the full database. The method is scalable and can be coupled with a scalable clustering algorithm to address the large-scale clustering problem in data mining.
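The abstract's refinement idea (cluster several small subsamples from the given initial point, then "smooth" the pooled subsample solutions and keep the best) can be sketched roughly as follows. This is a hedged reconstruction of the general approach, not the paper's exact algorithm; the function names and the choice of plain K-Means (rather than EM) as the clustering subroutine are illustrative assumptions.

```python
import numpy as np

def kmeans(data, centers, iters=20):
    """Plain Lloyd iteration from the given starting centers;
    returns final centers and total squared distortion."""
    centers = centers.copy()
    for _ in range(iters):
        dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for k in range(len(centers)):
            members = data[labels == k]
            if len(members):
                centers[k] = members.mean(0)
    dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return centers, dist.min(1).sum()

def refine_initial_point(data, init, n_subsamples=5, frac=0.2, seed=0):
    """Sketch of subsample-based refinement (names are illustrative):
    cluster several small random subsamples starting from `init`,
    pool the resulting solutions, cluster the pool itself from each
    solution ("smoothing"), and keep the lowest-distortion result."""
    rng = np.random.default_rng(seed)
    solutions = []
    for _ in range(n_subsamples):
        idx = rng.choice(len(data), max(1, int(frac * len(data))), replace=False)
        centers, _ = kmeans(data[idx], init)
        solutions.append(centers)
    pool = np.vstack(solutions)  # all subsample solutions together
    best, best_cost = None, np.inf
    for start in solutions:
        smoothed, cost = kmeans(pool, start)
        if cost < best_cost:
            best, best_cost = smoothed, cost
    return best
```

Because each subsample is a small fraction of the data, the refinement passes are cheap relative to clustering the full database, which is the efficiency property the abstract claims.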

### Citations

9193 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977 |

5432 | Neural Networks for Pattern Recognition - Bishop - 1995 |

4204 | Pattern Classification and Scene Analysis - Duda, Hart - 1973

Citation Context: ... p(x | l) = (2π)^(−d/2) |Σ|^(−1/2) exp(−(1/2)(x − μ_l)ᵀ Σ⁻¹ (x − μ_l)), where μ_l is the d-dimensional mean and Σ is the d×d covariance matrix. There is little prior work on initialization methods for clustering. According to [DH73] (p. 228): "One question that plagues all hill-climbing procedures is the choice of the starting point. Unfortunately, there is no simple, universally good solution to this problem." "Repetition with ... |
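The multivariate normal density used in the [DH73] context above is easy to sanity-check numerically. The sketch below is an illustrative helper (the name `gaussian_pdf` is an assumption, not anything from the paper):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density:
    p(x | l) = (2*pi)^(-d/2) |Sigma|^(-1/2)
               * exp(-0.5 * (x - mu)^T Sigma^{-1} (x - mu))."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(sigma) ** -0.5
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))
```

At the mean of a standard 2-D Gaussian this evaluates to 1/(2π), the normalizing constant alone.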

2966 | Introduction to Statistical Pattern Recognition - Fukunaga - 1990

Citation Context: ...clustering approach is to estimate the density and attempt to find the maxima ("bumps") of the estimated density function. Density estimation in high dimensions is difficult [S92], as is bump hunting [F90]. We propose a method, inspired by this procedure, that refines the initial point to a point likely to be closer to the modes. The challenge is to perform refinement efficiently. The basic heuristic is... |
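The "estimate the density, then find its bumps" idea in the context above is easiest to see in one dimension, where it is tractable. A toy sketch with a Gaussian kernel density estimate (the function name and grid-based mode search are illustrative assumptions; the paper's point is precisely that this becomes hard in high dimensions):

```python
import numpy as np

def kde_modes(samples, grid, bandwidth):
    """Gaussian KDE evaluated on `grid`; returns the interior local
    maxima ("bumps") of the estimated density -- a 1-D toy version
    of the mode-hunting idea."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    dens = np.exp(-0.5 * z * z).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    return [grid[i] for i in range(1, len(grid) - 1)
            if dens[i] > dens[i - 1] and dens[i] > dens[i + 1]]
```

For a bimodal sample the recovered modes sit near the two component means; in d dimensions the grid grows exponentially, which is the difficulty [S92] refers to.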

2700 | Density Estimation for Statistics and Data Analysis - Silverman - 1986 |

679 | Knowledge Acquisition via Incremental Conceptual Clustering - Fisher - 1987

Citation Context: ...lable and can be coupled with a scalable clustering algorithm to address the large-scale clustering in data mining. 1 Background Clustering has been formulated in various ways in the machine learning [F87], pattern recognition [DH73,F90], optimization [BMS97,SI84], and statistics literature [KR89,BR93,B95,S92,S86]. The fundamental clustering problem is that of grouping together data items which are sim... |

519 | Bayesian classification (AutoClass): theory and results - Cheeseman, Stutz - 1996 |

403 | Multivariate Density Estimation - Scott - 1992

Citation Context: ...d at each mode. Hence one clustering approach is to estimate the density and attempt to find the maxima ("bumps") of the estimated density function. Density estimation in high dimensions is difficult [S92], as is bump hunting [F90]. We propose a method, inspired by this procedure, that refines the initial point to a point likely to be closer to the modes. The challenge is to perform refinement efficient... |

357 | Model-based Gaussian and non-Gaussian clustering - Banfield, Raftery - 1993 |

260 | Scaling clustering algorithms to large databases - Bradley, Fayyad, et al. - 1998

Citation Context: ...imensionality), efficient and accurate initialization becomes critical. A clustering session on a data set with many dimensions and tens of thousands or millions of records can take hours to days. In [BFR98], we present a method for scaling clustering to very large databases, specifically targeted at databases not fitting in RAM. We show that accurate clustering can be achieved with improved results over... |

252 | Refining Initial Points for KMeans Clustering - Bradley, Fayyad - 1998

Citation Context: ...ed so far has been in the context of the EM. However, we note that the same method is generalizable to other algorithms: an example of this method used to initialize the K-Means algorithm is given in [BF98]. Generalization is possible to discrete data (on which means are not defined). The key insight here is that if some algorithm ClusterA is being used to cluster the data, then ClusterA is also used to... |

143 | Clustering Algorithms - Rasmussen - 1992 |

134 | K-means-type algorithms: A generalized convergence theorem and characterization of local optimality - Selim, Ismail - 1984 |

85 | An experimental comparison of several clustering methods, Microsoft Research Report MSR-TR98-06 - Meila, Heckerman

Citation Context: ...nsists of taking the mean of the entire data and then randomly perturbing it K times [TMCH97]. This method does not appear to be better than random initialization in the case of EM over discrete data [MH98]. In [BMS97], the values of initial means along any one of the d coordinate axes is determined by selecting the K densest "bins" along that coordinate. Methods to initialize EM include K-Means solutio... |

51 | Clustering via Concave Minimization - Bradley, Mangasarian, et al. - 1997

Citation Context: ...aking the mean of the entire data and then randomly perturbing it K times [TMCH97]. This method does not appear to be better than random initialization in the case of EM over discrete data [MH98]. In [BMS97], the values of initial means along any one of the d coordinate axes is determined by selecting the K densest "bins" along that coordinate. Methods to initialize EM include K-Means solutions, hierarch... |
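The [TMCH97] variant described in the context above — take the mean of the entire data set and randomly perturb it K times — is simple enough to sketch directly. The function name, the Gaussian perturbation, and scaling by the per-dimension standard deviation are assumptions for illustration; the source does not specify the perturbation distribution:

```python
import numpy as np

def perturb_mean_init(data, k, scale=0.1, seed=0):
    """Initialize k means as K random perturbations of the overall
    data mean (perturbation scaled by each dimension's std. dev.)."""
    rng = np.random.default_rng(seed)
    mu = data.mean(axis=0)
    return mu + rng.normal(0.0, scale, (k, data.shape[1])) * data.std(axis=0)
```

As the context notes, this does not appear to beat plain random initialization for EM on discrete data [MH98], which motivates the refinement procedure this paper proposes instead.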


13 | Mining Science Data - Fayyad, Haussler, et al. - 1996 |

4 | A note on sampling from a tape file - Jones - 1962

Citation Context: ... can guarantee that the records in a database are not ordered by some property, random sampling can be as expensive as scanning the entire database (using some scheme such as reservoir sampling, e.g. [J62]). Note that in a database environment a data view may not exist as a physical table. The result of a query may involve joins, groupings, and sorts. In many cases database operations impose a special ... |
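The reservoir-sampling scheme alluded to in the context above (a uniform sample of fixed size drawn in one pass over a stream of unknown length) is standard Algorithm R; a minimal sketch, with the function name chosen here for illustration:

```python
import random

def reservoir_sample(stream, n, rng=None):
    """Algorithm R: keep the first n items, then replace a random
    reservoir slot with item i with probability n/(i+1), yielding a
    uniform n-sample from a single sequential pass."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < n:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive upper bound
            if j < n:
                reservoir[j] = item
    return reservoir
```

The context's point is that even this one-pass scheme still touches every record, so "random sampling" in a database can cost as much as a full scan.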

2 | Learning Mixtures of Bayesian Networks, Microsoft Research - Thiesson, Meek, et al. - 1997

Citation Context: ...method for initializing the means by running K clustering problems is mentioned in [DH73] for K-Means. A variant consists of taking the mean of the entire data and then randomly perturbing it K times [TMCH97]. This method does not appear to be better than random initialization in the case of EM over discrete data [MH98]. In [BMS97], the values of initial means along any one of the d coordinate axes is det... |

