Cluster Analysis for Gene Expression Data: A Survey
- IEEE Transactions on Knowledge and Data Engineering
, 2004
Cited by 149 (5 self)
Abstract: DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and new algorithms have recently been proposed specifically for gene expression data. These clustering algorithms have proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods to assess the quality and reliability of clustering results. Finally, we conclude this paper and suggest promising trends in this field.
Index Terms: Microarray technology, gene expression data, clustering.
Clustering of time-course gene expression data using a mixed-effects model with splines
- 04, 2002, Rowe Program in Human Genetics, UC Davis School of Medicine
, 2002
Cited by 138 (4 self)
Motivation: Time-course gene expression data are often measured to study dynamic biological systems and gene regulatory networks. To account for the time dependency of the gene expression measurements and the noisy nature of the microarray data, the mixed-effects model using B-splines was introduced. This paper further explores such a mixed-effects model in analyzing time-course gene expression data and in performing clustering of genes in a mixture model framework. Results: After fitting the mixture model in the framework of the mixed-effects model using an EM algorithm, we obtained the smooth mean gene expression curve for each cluster. For each gene, we obtained the best linear unbiased smooth estimate of its gene expression trajectory over time, combining data from that gene and other genes in the same cluster. Simulated data indicate that the methods can effectively cluster noisy curves into clusters differing in either the shapes of the curves or the times to the peaks of the curves. We further demonstrate the proposed method by clustering the yeast genes based on their cell cycle gene expression data and the human genes based on the temporal transcriptional response of fibroblasts to serum. Clear periodic patterns and varying times to peaks are observed for different clusters of the cell-cycle regulated genes. Results of the analysis of the human fibroblasts data show seven distinct transcriptional response profiles with biological relevance. Availability: Matlab programs are available on request from the authors.
From patterns to pathways: gene expression data analysis comes of age.
- Nature Genetics
, 2002
Mining the Biomedical Literature in the Genomic Era: An Overview
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 2003
Cited by 132 (5 self)
The past decade has seen tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years there has been a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature and find the nuggets of information most relevant and useful for specific analysis tasks. This paper
Validating Clustering for Gene Expression Data
- Bioinformatics
, 2000
Cited by 129 (5 self)
Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. We provide a systematic and quantitative framework to assess the results of clustering algorithms. A typical gene expression data set contains measurements of the expression levels of a fixed set of genes under various experimental conditions. Clustering algorithms attempt to partition the genes into groups exhibiting similar patterns of variation in expression level, hopefully revealing biologically meaningful patterns of activity or control. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition. The remaining condition is used to assess the predictive power of the resulting clusters: meaningful clusters should exhibit less variation in the remaining condition than clusters formed by coincidence. We have successfully applied the methodology to compare three clustering algorithms on three p...
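The leave-one-condition-out idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the root-mean-square deviation used as the "variation" measure, the function names, and the toy clusterer `sign_clusters` are all assumptions made here for illustration. The abstract only states that clusters are built without one condition and then judged by how little they vary in that held-out condition.

```python
import math
from collections import defaultdict


def sign_clusters(rows):
    """Hypothetical toy clusterer: group genes by the sign of their mean."""
    return [1 if sum(r) / len(r) > 0 else 0 for r in rows]


def left_out_deviation(values, labels):
    """RMS deviation of each held-out value from its cluster's mean.

    The RMS form is an assumption; the abstract only asks for a measure
    of within-cluster variation in the held-out condition.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    total = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return math.sqrt(total / len(values))


def leave_one_condition_out(data, cluster_fn):
    """Sum the held-out deviation over every choice of left-out condition.

    data: list of gene rows, each a list of expression values (one per
    condition). cluster_fn: maps the reduced rows to one label per gene.
    """
    score = 0.0
    for e in range(len(data[0])):
        reduced = [row[:e] + row[e + 1:] for row in data]  # drop condition e
        labels = cluster_fn(reduced)                       # cluster without it
        held_out = [row[e] for row in data]
        score += left_out_deviation(held_out, labels)
    return score


# Two obvious groups of genes: consistently high and consistently low.
data = [
    [1.0, 1.1, 0.9],
    [1.2, 1.0, 1.1],
    [-1.0, -0.9, -1.1],
    [-1.1, -1.0, -0.9],
]
good = leave_one_condition_out(data, sign_clusters)
# An arbitrary labeling that ignores the data, for comparison.
arbitrary = leave_one_condition_out(
    data, lambda rows: [i % 2 for i in range(len(rows))]
)
print(good < arbitrary)
```

A lower score means the clusters predict the held-out condition better, so under these assumptions the data-driven labeling should score lower than the arbitrary one.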
Clustering with qualitative information
- In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
, 2003
Cited by 122 (9 self)
We consider the problem of clustering a collection of elements based on pairwise judgments of similarity and dissimilarity. Bansal, Blum and Chawla [1] cast the problem thus: given a graph G whose edges are labeled “+” (similar) or “−” (dissimilar), partition the vertices into clusters so that the number of pairs correctly (resp. incorrectly) classified with respect to the input labeling is maximized (resp. minimized). Complete graphs, where the classifier labels every edge, and general graphs, where some edges are not labeled, are both worth studying. We answer several questions left open in [1] and provide a sound overview of clustering with qualitative information. We give a factor 4 approximation for minimization on complete graphs, and a factor O(log n) approximation for general graphs. For the maximization version, a PTAS for complete graphs is shown in [1]; we give a factor 0.7664 approximation for general graphs, noting that a PTAS is unlikely by proving APX-hardness. We also prove the APX-hardness of minimization on complete graphs.
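To make the objective in this abstract concrete, the following Python sketch counts agreements and disagreements of a given clustering against a signed edge labeling. It only evaluates the objective; it does not implement any of the approximation algorithms mentioned in the abstract. The data layout (a dict of cluster ids and a list of `(u, v, sign)` triples) and the function name are assumptions chosen for illustration.

```python
def count_agreements(labels, edges):
    """Return (agreements, disagreements) of a clustering on signed edges.

    labels: dict mapping each vertex to a cluster id (hypothetical layout).
    edges:  list of (u, v, sign) with sign "+" (similar) or "-" (dissimilar).
    A "+" edge agrees when both endpoints share a cluster; a "-" edge
    agrees when they do not. Unlabeled pairs are simply absent from edges.
    """
    agree = 0
    disagree = 0
    for u, v, sign in edges:
        same_cluster = labels[u] == labels[v]
        if (sign == "+") == same_cluster:
            agree += 1
        else:
            disagree += 1
    return agree, disagree


# A triangle with inconsistent judgments: a~b, b~c, but a and c dissimilar.
edges = [("a", "b", "+"), ("b", "c", "+"), ("a", "c", "-")]

print(count_agreements({"a": 0, "b": 0, "c": 0}, edges))  # one cluster
print(count_agreements({"a": 0, "b": 0, "c": 1}, edges))  # c split off
print(count_agreements({"a": 0, "b": 1, "c": 2}, edges))  # all singletons
```

On this triangle no clustering agrees with all three labels, which is why the problem is posed as maximizing agreements or minimizing disagreements rather than satisfying every judgment.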
CLICK and EXPANDER: a system for clustering and visualizing gene expression data
- Bioinformatics
, 2003
Cited by 99 (6 self)
Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. Results: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new Java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms.
Genes, Themes and Microarrays - Using Information Retrieval for Large-Scale Gene Analysis
, 2000
Cited by 98 (8 self)
The immense volume of data resulting from DNA microarray experiments, accompanied by an increase in the number of publications discussing gene-related discoveries, presents a major data analysis challenge. Current methods for genome-wide analysis of expression data typically rely on cluster analysis of gene expression patterns. Clustering indeed reveals potentially meaningful relationships among genes, but cannot explain the underlying biological mechanisms. In an attempt to address this problem, we have developed a new approach for utilizing the literature in order to establish functional relationships among genes on a genome-wide scale. Our method is based on revealing coherent themes within the literature, using a similarity-based search in document space. Content-based relationships among abstracts are then translated into functional connections among genes. We describe preliminary experiments applying our algorithm to a database of documents discussing yeast genes...
Rich Probabilistic Models for Gene Expression
, 2001
Cited by 89 (8 self)
Clustering is commonly used for analyzing gene expression data. Despite their successes, clustering methods suffer from a number of limitations. First, these methods reveal similarities that exist over all of the measurements, while obscuring relationships that exist over only a subset of the data. Second, clustering methods cannot readily incorporate additional types of information, such as clinical data or known attributes of genes. To circumvent these shortcomings, we propose the use of a single coherent probabilistic model that encompasses much of the rich structure in the genomic expression data, while incorporating additional information such as experiment type, putative binding sites, or functional information. We show how this model can be learned from the data, allowing us to discover patterns in the data and dependencies between the gene expression patterns and additional attributes. The learned model reveals context-specific relationships that exist only over a subset of the experiments in the dataset. We demonstrate the power of our approach on synthetic data and on two real-world gene expression data sets for yeast. For example, we demonstrate a novel functionality that falls naturally out of our framework: predicting the “cluster” of the array resulting from a gene mutation based only on the gene’s expression pattern in the context of other mutations.