by Adrian E. Raftery, Ka Yee Yeung, Chris Fraley, Alejandro Murua, Walter L. Ruzzo
Bioinformatics
Abstract:
Clustering is a useful exploratory technique for the analysis of gene expression data. Many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. In particular, model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal distributions. This Gaussian mixture model has been shown to be a powerful tool for many applications. In addition, the issues of selecting a "good" clustering method and determining the "correct" number of clusters are reduced to model selection problems in the probability framework.
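The two ideas in the abstract can be illustrated together: fit a finite Gaussian mixture by expectation-maximization (EM), then treat "how many clusters?" as a model-selection problem. The sketch below is a minimal illustration, not the paper's implementation. It assumes univariate data, components with unequal variances, a deterministic initialization that splits the sorted data into contiguous chunks, and the Bayesian information criterion in the form BIC = 2 log L − p log n (larger is better) as the selection criterion. The function names `fit_gmm`, `bic` and `select_g` are hypothetical.

```python
import math
import random


def log_normal(x, mu, var):
    """Log density of a univariate normal N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def log_sum_exp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def log_likelihood(data, weights, means, variances):
    """Mixture log-likelihood: sum_i log sum_k tau_k * N(x_i; mu_k, var_k)."""
    return sum(
        log_sum_exp([math.log(w) + log_normal(x, m, v)
                     for w, m, v in zip(weights, means, variances)])
        for x in data
    )


def fit_gmm(data, g, iters=200, tol=1e-8, min_var=1e-6):
    """Fit a 1-D, g-component Gaussian mixture (unequal variances) by EM."""
    n = len(data)
    if n < g:
        raise ValueError("need at least g observations")
    xs = sorted(data)
    # Assumed initialization: split the sorted data into g contiguous chunks.
    chunks = [xs[k * n // g:(k + 1) * n // g] for k in range(g)]
    means = [sum(c) / len(c) for c in chunks]
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, min_var)
    variances = [var_all] * g
    weights = [1.0 / g] * g
    prev = -math.inf
    ll = prev
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            logs = [math.log(w) + log_normal(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            lse = log_sum_exp(logs)
            resp.append([math.exp(l - lse) for l in logs])
        # M-step: responsibility-weighted proportions, means and variances.
        nk = [max(sum(r[k] for r in resp), 1e-12) for k in range(g)]
        weights = [nk[k] / n for k in range(g)]
        means = [sum(r[k] * x for r, x in zip(resp, data)) / nk[k]
                 for k in range(g)]
        # Variances are floored at min_var to avoid collapsing onto one point.
        variances = [max(sum(r[k] * (x - means[k]) ** 2
                             for r, x in zip(resp, data)) / nk[k], min_var)
                     for k in range(g)]
        ll = log_likelihood(data, weights, means, variances)
        if abs(ll - prev) < tol:
            break
        prev = ll
    return weights, means, variances, ll


def bic(loglik, g, n):
    """BIC = 2*logL - p*log(n); p = (g-1) weights + g means + g variances."""
    p = 3 * g - 1
    return 2 * loglik - p * math.log(n)


def select_g(data, candidates):
    """Fit each candidate number of clusters; return (best_g, {g: BIC})."""
    scores = {}
    for g in candidates:
        *_, ll = fit_gmm(data, g)
        scores[g] = bic(ll, g, len(data))
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Synthetic example: three well-separated groups of 100 points each.
    rng = random.Random(0)
    data = ([rng.gauss(-5, 1) for _ in range(100)]
            + [rng.gauss(0, 1) for _ in range(100)]
            + [rng.gauss(5, 1) for _ in range(100)])
    best, scores = select_g(data, range(1, 5))
    print("selected number of clusters:", best)
```

Choosing the number of clusters here reduces to comparing penalized likelihoods across fitted models, which is the sense in which the abstract says the question becomes a model-selection problem. Multivariate data and constrained covariance structures would need a fuller implementation.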