Abstract:
A clustering and ordination algorithm suitable for mining extremely large databases, including those produced by microarray expression studies, is described and analyzed for stability. Data from a yeast cell cycle experiment with 6000 genes and 18 experimental measurements per gene are used to test the algorithm under practical conditions. Ordination, the process of assigning each database object an (X, Y) coordinate, is shown to be stable with respect to random starting conditions and to minor perturbations in the starting similarity estimates. Careful analysis of the way clusters typically co-locate, versus the occasional large displacements under different starting conditions, is shown to be useful in interpreting the data. This extra stability information is lost when only a single clustering is reported, which is currently the accepted practice. It is believed that the approaches presented here should become a standard part of best practices in analyzing computer clustering of large data collections.