A Unified Framework for Model-based Clustering
- Journal of Machine Learning Research
, 2003
Cited by 43 (6 self)
Model-based clustering techniques have been widely used and have shown promising results in many applications involving complex data. This paper presents a unified framework for probabilistic model-based clustering based on a bipartite graph view of data and models that highlights the commonalities and differences among existing model-based clustering algorithms. In this view, clusters are represented as probabilistic models in a model space that is conceptually separate from the data space. For partitional clustering, the view is conceptually similar to the Expectation-Maximization (EM) algorithm. For hierarchical clustering, the graph-based view helps to visualize critical distinctions between similarity-based approaches and model-based approaches.
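As a rough illustration of the data-to-model view for partitional clustering, the sketch below alternates between linking each data point to its most likely model and re-estimating each model from the points linked to it. It is not the paper's algorithm: it assumes one-dimensional Gaussian models and hard assignments, and the function names (`log_likelihood`, `fit_model`, `model_based_kmeans`) and the toy data are made up for this example.

```python
import math

def log_likelihood(x, mean, var):
    """Log density of x under a 1-D Gaussian model (assumed model family)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def fit_model(points, min_var=1e-6):
    """Maximum-likelihood mean and variance, with a small variance floor."""
    mean = sum(points) / len(points)
    var = sum((p - mean) ** 2 for p in points) / len(points)
    return mean, max(var, min_var)

def model_based_kmeans(data, models, max_iter=100):
    """Alternate data-to-model assignment and model re-estimation."""
    assignment = None
    for _ in range(max_iter):
        # Assignment step: link each data point to its most likely model.
        new_assignment = [
            max(range(len(models)), key=lambda k: log_likelihood(x, *models[k]))
            for x in data
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the models will not change
        assignment = new_assignment
        # Re-estimation step: refit each model on the points linked to it.
        for k in range(len(models)):
            members = [x for x, a in zip(data, assignment) if a == k]
            if members:  # keep the old model if a cluster becomes empty
                models[k] = fit_model(members)
    return assignment, models

# Hypothetical toy data: two well-separated groups.
data = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
labels, fitted = model_based_kmeans(data, [(0.0, 1.0), (6.0, 1.0)])
```

Replacing the hard `max` assignment with posterior weights would give a soft, EM-style variant of the same loop.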
Cluster-based network model for time-course gene expression data
- Biostatistics
, 2007
Cited by 6 (1 self)
We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes. Genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models of prostate cancer progression. The latter application shows that with a combined statistical/bioinformatics analysis we are able to extract gene-to-gene relationships supported by the literature as well as new plausible relationships. Keywords: model-based clustering, Bayesian network, dynamic linear model, mixture model, time-course gene expression, prostate cancer, bioinformatics.
Mining Yeast Transcriptional Regulatory Modules from Factor DNA-Binding Sites and Gene Expression Data
, 2004
Cited by 1 (1 self)
In eukaryotes, gene expression is controlled by various transcription factors that bind to the promoter regions. Transcription factors may act positively, negatively or not at all. Different combinations of them may also activate or repress gene expression, and form regulatory networks of transcription. Uncovering such regulatory networks is a central challenge in genomic biology.
Computational method for temporal pattern discovery in biomedical genomic databases
- Proc 2005 IEEE CSBCON
, 2005
Cited by 1 (1 self)
With the rapid growth of biomedical research databases, opportunities for scientific inquiry have expanded quickly and led to a demand for computational methods that can extract biologically relevant patterns among vast amounts of data. A significant challenge is identifying temporal relationships among genotypic and clinical (phenotypic) data. Few software tools are available for such pattern matching, and they are not interoperable with existing databases. We are developing and validating a novel software method for temporal pattern discovery in biomedical genomics. In this paper, we present an efficient and flexible query algorithm (called TEMF) to extract statistical patterns from time-oriented relational databases. We show that TEMF—as an extension to our modular temporal querying application (Chronus II)—can express a wide range of complex temporal aggregations without the need for data processing in a statistical software package. We show the expressivity of TEMF using example queries from the Stanford HIV Database.
Similarity searches in genome-wide numerical data sets
Cited by 1 (0 self)
Background. Clustering approaches are commonly used to navigate and interrogate gene expression profiles, protein-protein interaction information, and other large genomic datasets. However, many biological questions that are investigated by analysis of genome-wide measurements are not global-clustering problems at all. Rather, a frequent problem is to find the neighbors of a query, which is a vector in the multidimensional measurement space, and to rank these neighbors by similarity to the query. Results. To address this local-clustering problem, we developed an iterative pattern-matching program called psi-square. The program searches the space of genome-wide vectors, finds a group of highly similar vectors, derives a probabilistic model of that group, and repeats the database search using this model as a query. We applied the method to several pathway-discovery problems, which use three types of genome-wide datasets, namely gene content in microbes, gene expression in the blood stage of the malaria parasite, and protein-protein interactions in yeast. Conclusion. The unified method of analysis is generally more sensitive and in many cases also more specific than each of the specialized methods applied to these data before.
Characterization of protein structure and function at genome scale with a computational prediction pipeline
- In Genetic Engineering, Principles and Methods
, 2003
Cited by 1 (1 self)
Recent advances in high-throughput production capabilities for biological data such as genomic sequence (Lander et al., 2001; Venter et al., 2001), large-scale gene expression data (DeRisi et al., 1997; Chu et al., 1997; Zhu et al., 2000), genome-scale protein-protein interactions (Fields & Song, 1989; Ho et al., 2000), and protein structures (Chance et al., 2002) are revolutionizing the biological sciences. Essential to this new
Systematic discovery of functional modules and context-specific functional annotation of human genome
- Bioinformatics, Vol. 23, ISMB/ECCB 2007, pages i222–i229, doi:10.1093/bioinformatics/btm222
Identification and Evaluation of Functional Modules in Gene Co-expression Networks
Identifying gene functional modules is an important step towards elucidating gene functions at a global scale. In this paper, we introduce a simple method to construct gene co-expression networks from microarray data, and then propose an efficient spectral clustering algorithm to identify natural communities, which are relatively densely connected sub-graphs, in the network. To assess the effectiveness of our approach and its advantage over existing methods, we develop a novel method to measure the agreement between the gene communities and the modular structures in other reference networks, including protein-protein interaction networks, transcriptional regulatory networks, and gene networks derived from gene annotations. We evaluate the proposed methods on two large-scale gene expression data sets, from budding yeast and Arabidopsis thaliana. The results show that the clusters identified by our method are functionally more coherent than the clusters from several standard clustering algorithms, such as k-means, self-organizing maps, and spectral clustering, and agree closely with the modular structures in the reference networks.
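The abstract does not say how the co-expression network is built or how the spectral step works, so the sketch below only illustrates the general idea with swapped-in techniques: it links two genes when the Pearson correlation of their expression profiles exceeds a threshold, then reports connected components as a crude stand-in for community detection. The threshold, function names, and toy profiles are all assumptions made for this example.

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def coexpression_network(profiles, threshold=0.9):
    """Adjacency sets: an edge joins genes whose correlation >= threshold."""
    edges = {gene: set() for gene in profiles}
    for g, h in combinations(profiles, 2):
        if pearson(profiles[g], profiles[h]) >= threshold:
            edges[g].add(h)
            edges[h].add(g)
    return edges

def connected_components(edges):
    """Depth-first search; a simple substitute for spectral clustering."""
    seen, components = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(edges[node] - seen)
        components.append(component)
    return components

# Hypothetical toy profiles: g1/g2 rise together, g3/g4 share a zigzag.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [5.0, 1.0, 4.0, 2.0],
    "g4": [10.0, 2.0, 8.0, 4.5],
}
modules = connected_components(coexpression_network(profiles))
```

On real microarray data the components of a thresholded network are usually too coarse, which is one reason a dedicated community-detection step is needed.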
Similarity searches in genome-wide numerical data sets
- Biology Direct (Research)
, 2006
We present psi-square, a program for searching the space of gene vectors. The program starts with a gene vector, i.e., the set of measurements associated with a gene, and finds similar vectors, derives a probabilistic model of these vectors, then repeats the search using this model as a query, and continues to update the model and search again, until convergence. When applied to three different pathway-discovery problems, psi-square was generally more sensitive and sometimes more specific than the ad hoc methods developed for solving each of these problems before.
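The search-model-search loop described above can be sketched as follows. This is not the psi-square implementation: it assumes cosine similarity as the match score and uses the mean of the current hits as a simplified substitute for the probabilistic model, and the names `cosine` and `iterative_search` and the toy vectors are made up for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def iterative_search(vectors, query_name, threshold=0.9, max_iter=50):
    """Search with a query, rebuild the query from the hits, and repeat."""
    model = vectors[query_name]
    hits = None
    for _ in range(max_iter):
        # Search step: collect every vector similar enough to the model.
        new_hits = {
            name for name, vec in vectors.items()
            if cosine(model, vec) >= threshold
        }
        if new_hits == hits:
            break  # converged: the hit set no longer changes
        hits = new_hits
        # Model step: summarize the hits by their mean vector
        # (a stand-in for the probabilistic model in the abstract).
        members = [vectors[name] for name in sorted(hits)]
        model = [sum(column) / len(members) for column in zip(*members)]
    return hits

# Hypothetical toy data: "c" is too far from "a" alone but close to the
# model built from "a" and "b"; "d" is unrelated.
vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.3, 0.0],
    "c": [0.8, 0.5, 0.0],
    "d": [0.0, 0.0, 1.0],
}
hits = iterative_search(vectors, "a")
```

In this toy run the first pass finds only `a` and `b`; the model built from them then pulls in `c`, which shows how iterating can raise sensitivity over a single query.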

