Results 11 - 20
of
265
Exploring expression data: Identification and analysis of coexpressed genes
- Genome Research
, 1999
"... service ..."
Modelling gene expression data using dynamic bayesian networks
, 1999
"... Recently, there has been much interest in reverse engineering genetic networks from time series data. In this paper, we show that most of the proposed discrete time models — including the boolean network model [Kau93, SS96], the linear model of D’haeseleer et al. [DWFS99], and the nonlinear model of ..."
Abstract
-
Cited by 119 (1 self)
- Add to MetaCart
Recently, there has been much interest in reverse engineering genetic networks from time series data. In this paper, we show that most of the proposed discrete time models — including the boolean network model [Kau93, SS96], the linear model of D’haeseleer et al. [DWFS99], and the nonlinear model of Weaver et al. [WWS99] — are all special cases of a general class of models called Dynamic Bayesian Networks (DBNs). The advantages of DBNs include the ability to model stochasticity, to incorporate prior knowledge, and to handle hidden variables and missing data in a principled way. This paper provides a review of techniques for learning DBNs. Keywords: Genetic networks, boolean networks, Bayesian networks, neural networks, reverse engineering, machine learning. 1
A Hierarchical Unsupervised Growing Neural Network for Clustering Gene Expression Patterns
, 2001
"... Motivation: We describe a new approach to the analysis of gene expression data coming from DNA array experiments, using an unsupervised neural network. DNA array technologies allow monitoring thousands of genes rapidly and efficiently. One of the interests of these studies is the search for correlat ..."
Abstract
-
Cited by 98 (8 self)
- Add to MetaCart
Motivation: We describe a new approach to the analysis of gene expression data coming from DNA array experiments, using an unsupervised neural network. DNA array technologies allow monitoring thousands of genes rapidly and efficiently. One of the interests of these studies is the search for correlated gene expression patterns, and this is usually achieved by clustering them. The Self-Organising Tree Algorithm, (SOTA) (Dopazo,J. and Carazo,J.M. (1997) J. Mol. Evol., 44, 226--233), is a neural network that grows adopting the topology of a binary tree. The result of the algorithm is a hierarchical cluster obtained with the accuracy and robustness of a neural network. Results: SOTA clustering confers several advantages over classical hierarchical clustering methods. SOTA is a divisive method: the clustering process is performed from top to bottom, i.e. the highest hierarchical levels are resolved before going to the details of the lowest levels. The growing can be stopped at the desired hierarchical level. Moreover, a criterion to stop the growing of the tree, based on the approximate distribution of probability obtained by randomisation of the original data set, is provided. By means of this criterion, a statistical support for the definition of clusters is proposed. In addition, obtaining average gene expression patterns is a built-in feature of the algorithm. Different neurons defining the different hierarchical levels represent the averages of the gene expression patterns contained in the clusters. Since SOTA runtimes are approximately linear with the number of items to be classified, it is especially suitable for dealing with huge amounts of data. The method proposed is very general and applies to any data providing that they can be coded as a series of numbers and t...
Aligning Gene Expression Time Series With Time Warping Algorithms
, 2001
"... Motivation: Increasingly, biological processes are being studied through time series of RNA expression data collected for large numbers of genes. Because common processes may unfold at varying rates in different experiments or individuals, methods are needed that will allow corresponding expression ..."
Abstract
-
Cited by 76 (2 self)
- Add to MetaCart
Motivation: Increasingly, biological processes are being studied through time series of RNA expression data collected for large numbers of genes. Because common processes may unfold at varying rates in different experiments or individuals, methods are needed that will allow corresponding expression states in different time series to be mapped to one another. Results: We present implementations of time warping algorithms applicable to RNA and protein expression data and demonstrate their application to published yeast RNA expression time series. Programs executing two warping algorithms are described, a simple warping algorithm and an interpolative algorithm, along with programs that generate graphics that visually present alignment information. We show time warping to be superior to simple clustering at mapping corresponding time states. We document the impact of statistical measurement noise and sample size on the quality of time alignments, and present issues related to statistical assessment of alignment quality through alignment scores. We also discuss directions for algorithm improvement including development of multiple time series alignments and possible applications to causality searches and non-temporal processes (`concentration warping'). Availability: Academic implementations of alignment programs genewarp and genewarpi and the graphics generation programs grphwarp and grphwarpi are available as Win32 system DOS box executables on our web site along with documentation on their use. The publicly available data on which they were demonstrated may be found at http://genome-www.stanford.edu/cellcycle/. Postscript files generated by grphwarp and grphwarpi may be directly printed or viewed using GhostView software available at http://www.cs.wisc.edu/#ghost/. Con...
Mining the Biomedical Literature in the Genomic Era: An Overview
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 2003
"... The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last f ..."
Abstract
-
Cited by 72 (2 self)
- Add to MetaCart
The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years there is a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature, and find the nuggets of information most relevant and useful for specific analysis tasks. This paper
Genes, Themes and Microarrays - Using Information Retrieval for Large-Scale Gene Analysis
, 2000
"... The immense volume of data resulting from DNA microarray experiments, accompanied byanincrease in the number of publications discussing gene-related discoveries, presents a major data analysis challenge. Current methods for genome-wide analysis of expression data typically rely on cluster analy ..."
Abstract
-
Cited by 68 (4 self)
- Add to MetaCart
The immense volume of data resulting from DNA microarray experiments, accompanied byanincrease in the number of publications discussing gene-related discoveries, presents a major data analysis challenge. Current methods for genome-wide analysis of expression data typically rely on cluster analysis of gene expression patterns. Clustering indeed reveals potentially meaningful relationships among genes, but can not explain the underlying biological mechanisms. In an attempt to address this problem, we have developed a new approach for utilizing the literature in order to establish functional relationships among genes on a genome-wide scale. Our method is based on revealing coherent themes within the literature, using a similarity-based search in document space. Contentbased relationships among abstracts are then translated into functional connections among genes. We describe preliminary experiments applying our algorithm to a database of documents discussing yeast genes...
Validating Clustering for Gene Expression Data
- Bioinformatics
, 2000
"... Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. We provide a systematic and quantitative framework to assess the results of clustering algorithms. A typical gene expression data set contains measurements of ..."
Abstract
-
Cited by 65 (5 self)
- Add to MetaCart
Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. We provide a systematic and quantitative framework to assess the results of clustering algorithms. A typical gene expression data set contains measurements of the expression levels of a fixed set of genes under various experimental conditions. Clustering algorithms attempt to partition the genes into groups exhibiting similar patterns of variation in expression level, hopefully revealing biologically meaningful patterns of activity or control. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition. The remaining condition is used to assess the predictive power of the resulting clusters---meaningful clusters should exhibit less variation in the remaining condition than clusters formed by coincidence. We have successfully applied the methodology to compare three clustering algorithms on three p...
WN: Gene functional classification from heterogeneous data
- In Proceedings of the Fifth Annual International Conference on Computational Biology: April 22-25, 2001
"... In our attempts to understand cellular function at the molecular level, we must be able to synthesize information from disparate types of genomic data. We consider the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measureme ..."
Abstract
-
Cited by 57 (1 self)
- Add to MetaCart
In our attempts to understand cellular function at the molecular level, we must be able to synthesize information from disparate types of genomic data. We consider the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles from whole-genome sequence comparisons. We demonstrate the application of the support vector machine (SVM) learning algorithm to this functional inference task. Our results suggest the importance of exploiting prior information about the heterogeneity of the data. In particular, we propose an SVM kernel function that is explicitly heterogeneous. We also show how to use knowledge about heterogeneity to aid in feature selection. 1
Minimum sumsquared residue co-clustering of gene expression data
- In SDM
, 2004
"... Microarray experiments have been extensively used for simultaneously measuring DNA expression levels of thousands of genes in genome research. A key step in the analysis of gene expression data is the clustering of genes into groups that show similar expression values over a range of conditions. Sin ..."
Abstract
-
Cited by 55 (4 self)
- Add to MetaCart
Microarray experiments have been extensively used for simultaneously measuring DNA expression levels of thousands of genes in genome research. A key step in the analysis of gene expression data is the clustering of genes into groups that show similar expression values over a range of conditions. Since only a small subset of the genes participate in any cellular process of interest, by focusing on subsets of genes and conditions, we can lower the noise induced by other genes and conditions — a co-cluster characterizes such a subset of interest. Cheng and Church [3] introduced an effective measure of co-cluster quality based on mean squared residue. In this paper, we use two similar squared residue measures and propose two fast k-means like co-clustering algorithms corresponding to the two residue measures. Our algorithms discover k row clusters and l column clusters simultaneously while monotonically decreasing the respective squared residues. Our co-clustering algorithms inherit the simplicity, efficiency and wide applicability of the k-means algorithm. Minimizing the residues may also be formulated as trace optimization problems that allow us to obtain a spectral relaxation that we use for a principled initialization for our iterative algorithms. We further enhance our algorithms by an incremental local search strategy that helps avoid empty clusters and escape poor local minima. We illustrate co-clustering results on a yeast cell cycle dataset and a human B-cell lymphoma dataset. Our experiments show that our co-clustering algorithms are efficient and are able to discover coherent co-clusters. Keywords: Gene-expression, co-clustering, biclustering, residue, spectral relaxation
Identifying Target Sites for Cooperatively Binding Factors
, 2001
"... Motivation: Transcriptional activation in eukaryotic organisms normally requires combinatorial interactions of multiple transcription factors. Though several methods exist for identification of individual protein binding site patterns in DNA sequences, there are few methods for discovery of binding ..."
Abstract
-
Cited by 54 (3 self)
- Add to MetaCart
Motivation: Transcriptional activation in eukaryotic organisms normally requires combinatorial interactions of multiple transcription factors. Though several methods exist for identification of individual protein binding site patterns in DNA sequences, there are few methods for discovery of binding site patterns for cooperatively acting factors. Here we present an algorithm, Co-Bind (for COperative BINDing) , for discovering DNA target sites for cooperatively acting transcription factors. The method utilizes a Gibbs sampling strategy to model the cooperativity between two transcription factors and defines position weight matrices for the binding sites. Sequences from both the training set and the entire genome are taken into account, in order to discriminate against commonly occurring patterns in the genome, and produce patterns which are significant only in the training set. Results: We have tested Co-Bind on semi-synthetic and real data sets to show it can efficiently identify DNA target site patterns for cooperatively binding transcription factors. In cases where binding site patterns are weak and cannot be identified by other available methods, Co-Bind, by virtue of modeling the cooperativity between factors, can identify those sites efficiently. Though developed to model protein-- DNA interactions, the scope of Co-Bind may be extended to combinatorial, sequence specific, interactions in other macromolecules. Availability: The program is available upon request from the authors or may be downloaded from http://ural.wustl. edu. Contact: dg@genetics.wustl.edu; stormo@genetics.wustl.edu 1

