Cluster Stability and the Use of Noise in Interpretation of Clustering (2001)

by George Davidson Brian

Abstract:

A clustering and ordination algorithm suitable for mining extremely large databases, including those produced by microarray expression studies, is described and analyzed for stability. Data from a yeast cell cycle experiment with 6000 genes and 18 experimental measurements per gene are used to test this algorithm under practical conditions. Ordination, the process of assigning each database object an (X, Y) coordinate, is shown to be stable with respect to random starting conditions and with respect to minor perturbations in the starting similarity estimates. Careful analysis of the way clusters typically co-locate, versus the occasional large displacements under different starting conditions, is shown to be useful in interpreting the data. This extra stability information is lost when only a single clustering is reported, which is currently the accepted practice. It is believed, however, that the approaches presented here should become a standard part of best practices in analyzing computer clustering of large data collections.
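
The stability check described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the ordination here is a toy stress-minimizing layout, the data are synthetic (three made-up groups of 20 objects with 18 measurements each), Euclidean distance stands in for the paper's similarity estimates, and the names `layout`, `align`, and `displacement` are hypothetical. The idea is only to show how two layouts from different random starts, or from slightly perturbed inputs, might be aligned and compared point by point.

```python
import numpy as np

def layout(dist, seed, steps=200, lr=1.0):
    """Toy 2-D ordination: descend the raw stress sum (d_ij - dist_ij)^2
    from a random start. Assumption: a stand-in, not the paper's method."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    xy = rng.normal(size=(n, 2))
    for _ in range(steps):
        diff = xy[:, None, :] - xy[None, :, :]            # pairwise offsets, (n, n, 2)
        d = np.linalg.norm(diff, axis=2) + np.eye(n)      # +I avoids 0/0 on the diagonal
        grad = (((d - dist) / d)[:, :, None] * diff).sum(axis=1)
        xy -= lr * grad / n
    return xy

def align(ref, other):
    """Center both layouts, then rotate/reflect `other` onto `ref`
    (orthogonal Procrustes), since a layout is only defined up to such moves."""
    a = ref - ref.mean(axis=0)
    b = other - other.mean(axis=0)
    u, _, vt = np.linalg.svd(b.T @ a)
    return a, b @ (u @ vt)

def displacement(ref, other):
    """Per-object distance between two aligned layouts."""
    a, b = align(ref, other)
    return np.linalg.norm(a - b, axis=1)

# Synthetic data: 3 groups x 20 objects, 18 measurements per object.
rng = np.random.default_rng(0)
centers = rng.normal(scale=4.0, size=(3, 18))
data = np.vstack([c + rng.normal(size=(20, 18)) for c in centers])
dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)

# Stability with respect to random starting conditions.
base = layout(dist, seed=1)
disp_seed = displacement(base, layout(dist, seed=2))

# Stability with respect to a minor, symmetric perturbation of the input.
noise = rng.normal(scale=0.05, size=dist.shape)
noisy = np.clip(dist + (noise + noise.T) / 2, 0.0, None)
np.fill_diagonal(noisy, 0.0)
disp_noise = displacement(base, layout(noisy, seed=1))

print("median displacement, new seed:   ", np.median(disp_seed))
print("median displacement, noisy input:", np.median(disp_noise))
print("largest movers, new seed:", np.argsort(disp_seed)[-5:])
```

Objects that barely move across runs sit in stable neighborhoods, while the few with large displacements are the ones whose placement deserves a closer look; reporting only one run would hide that distinction.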
