Genome-wide discovery of transcriptional modules from DNA sequence and gene expression (2003)

by E Segal
Venue: Bioinformatics

Results 1 - 10 of 35

Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks

by Seiya Imoto, Tomoyuki Higuchi, Takao Goto, Kousuke Tashiro, Satoru Kuhara, Satoru Miyano - In Proceedings of the IEEE Computer Society Bioinformatics Conference (CSB 03), 2003
Cited by 38 (4 self)
We propose a statistical method for estimating a gene network based on Bayesian networks from microarray gene expression data together with biological knowledge including protein-protein interactions, protein-DNA interactions, binding site information, existing literature and so on. Unfortunately, microarray data do not contain enough information for constructing gene networks accurately in many cases. Our method adds biological knowledge to the estimation method of gene networks under a Bayesian statistical framework, and also controls the trade-off between microarray information and biological knowledge automatically. We conduct Monte Carlo simulations to show the effectiveness of the proposed method. We analyze Saccharomyces cerevisiae gene expression data as an application.

Predicting genetic regulatory response using classification

by Manuel Middendorf, Anshul Kundaje, Chris Wiggins, Yoav Freund, Christina Leslie - Bioinformatics, 2004
Cited by 28 (4 self)
Abstract not found

A discriminative model for identifying spatial cis-regulatory modules

by Eran Segal, Roded Sharan - In Proc. RECOMB’04, 2004
Cited by 18 (1 self)
Transcriptional regulation is mediated by the coordinated binding of transcription factors to the upstream regions of genes. In higher eukaryotes, the binding sites of cooperating transcription factors are organized into short sequence units, called cis-regulatory modules. In this paper, we propose a method for identifying modules of transcription factor binding sites in a set of co-regulated genes, using only the raw sequence data as input. Our method is based on a novel probabilistic model that describes the mechanism of cis-regulation, including the binding sites of cooperating transcription factors, the organization of these binding sites into short sequence modules, and the regulation of a gene by its modules. We show that our method is successful in discovering planted modules in simulated data and known modules in yeast. More importantly, we applied our method to a large collection of human gene sets and found 83 significant cis-regulatory modules, which included 36 known motifs and many novel ones. Thus, our results provide one of the first comprehensive compendiums of putative cis-regulatory modules in human. Key words: cis-regulatory module, probabilistic model, transcriptional regulation.

Motif discovery through predictive modeling of gene regulation

by Manuel Middendorf, Anshul Kundaje, Mihir Shah, Yoav Freund, Chris H. Wiggins, Christina Leslie - Proceedings of the Ninth Annual International Conference on Research in Computational Molecular Biology (RECOMB), 2005
Cited by 9 (3 self)
We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a k-length sequence, a dimer, or a PSSM that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature.

Boosted Bayesian Network Classifiers

by Yushi Jing, Vladimir Pavlović, James M. Rehg
Cited by 6 (0 self)
The use of Bayesian networks for classification problems has received significant recent attention. Although computationally efficient, the standard maximum likelihood learning method tends to be suboptimal due to the mismatch between its optimization criteria (data likelihood) and the actual goal of classification (label prediction accuracy). Recent approaches to optimizing classification performance during parameter or structure learning show promise, but lack the favorable computational properties of maximum likelihood learning. In this paper we present Boosted Bayesian Network Classifiers, a framework to combine discriminative data-weighting with generative training of intermediate models. We show that Boosted Bayesian Network Classifiers encompass the basic generative models in isolation, but improve their classification performance when the model structure is suboptimal. This framework can be easily extended to temporal Bayesian network models including HMM and DBN. On a large suite of benchmark data-sets, this approach outperforms generative graphical models such as naive Bayes, TAN, unrestricted Bayesian network and DBN in classification accuracy. Boosted Bayesian network classifiers have comparable or better performance in comparison to other discriminatively trained graphical models including ELR-NB, ELR-TAN, BNC-2P, BNC-MDL and CRF. Furthermore, boosted Bayesian networks require significantly less training time than all of the competing methods.

A feature-based approach to modeling protein-DNA interactions

by Eilon Sharon, Eran Segal - In Proc. RECOMB’07, 2007
Cited by 5 (0 self)
Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. In many cases this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF-DNA interactions, based on Markov networks. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our models, and devise an algorithm for learning their structural features from binding site data. We evaluate our approach on synthetic data, and then apply it to binding site and ChIP-chip data from yeast. We reveal sequence features that are present in the binding specificities of yeast TFs, and show that FMMs explain the binding data significantly better than PSSMs. Key words: transcription factor binding sites, DNA sequence motifs, probabilistic graphical models, Markov networks, motif finder.

Labeled graph notations for graphical models: Extended Report

by Eric Mjolsness, 2004
Cited by 4 (0 self)
We introduce new diagrammatic notations for probabilistic independence networks (including Bayes nets and graphical models). These notations include new node and link types that allow for natural representation of a wide range of probabilistic data models including complex hierarchical models. The diagrammatic notations also support models defined on variable numbers of complex objects and relationships. Node types include random variable nodes, index nodes, constraint nodes, and an object supernode. Link types include conditional dependency, indexing and index limitation, variable value limitation, and gating a dependency between nodes or objects by an arbitrary graph. Examples are shown for clustering problems, information retrieval, unknown graph structures in biological regulation, and other scientific domains. The diagrams may be taken as a shorthand notation for a more detailed syntactic representation by an algebraic expression for factored probability distributions, which in turn may be specified by stochastic parameterized grammar or graph grammar models. We illustrate these ideas with previously described applications and potential new ones.

On the feasibility of Heterogeneous Analysis of Large Scale Biological Data

by Ivan G. Costa, Alexander Schliep - Proceedings of ECML/PKDD 2006 Workshop on Data and Text Mining for Integrative Biology
Cited by 3 (1 self)
Secondary information such as Gene Ontology (GO) annotations or location analysis of transcription factor binding is often relied upon to demonstrate validity of clusters, by considering whether individual terms or factors are significantly enriched in clusters. If such an enrichment indeed supports validity, it should be helpful in finding biologically meaningful clusters in the first place. One simple framework which allows to do so and which does not rely on strong assumptions about the data is semi-supervised learning. A primary data source, gene expression time-courses, is clustered and GO annotation or transcription factor binding information, the secondary data, is used to define soft pair-wise constraints for pairs of genes for the computation of clusters. We show that this approach improves performance when high quality labels are available, but naive use of the heterogeneous data routinely used for cluster validation will actually decrease performance in clustering.

GenRate: a generative model that finds and scores new genes and exons in genomic microarray data

by Brendan J. Frey, Quaid D. Morris, Wen Zhang, Naveed Mohammad, Timothy R. Hughes - Pacific Symposium on Biocomputing (PSB), 2005
Cited by 3 (2 self)
Recently, researchers have made some progress in using microarrays to validate predicted exons in genome sequence and find new gene structures. However, current methods rely on separately making threshold-based decisions on intensity of expression, similarity of expression profiles, and arrangements of exons in the genome. We have taken a Bayesian approach and developed GenRate, a generative model that accounts for both genome-wide expression data taken from multiple conditions (e.g. tissues) and co-location and density of probes in DNA sequence data. GenRate balances probabilistic evidence derived from different sources and outputs scores (log-likelihoods) for each gene model, enabling the estimation of false-positive and false-negative rates. The model has a number of local minima that is exponential in the length of the DNA sequence data, so direct application of the EM learning algorithm produces poor results. We describe a novel way of parameterizing the model using examples from the data set, so that good solutions are found using an efficient algorithm. We apply GenRate to a subset of mouse genome-wide expression data that we have created, and discuss the statistical significance of the genes found by GenRate. Three of the highest-ranking gene structures found by GenRate, each containing thousands of bases from the genome, are confirmed using RT-PCR experiments.

An Interactive Approach to Mining Gene Expression Data

by Daxin Jiang, Jian Pei, Aidong Zhang - IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 3 (0 self)
Effective identification of coexpressed genes and coherent patterns in gene expression data is an important task in bioinformatics research and biomedical applications. Several clustering methods have recently been proposed to identify coexpressed genes that share similar coherent patterns. However, there is no objective standard for groups of coexpressed genes. The interpretation of co-expression heavily depends on domain knowledge. Furthermore, groups of coexpressed genes in gene expression data are often highly connected through a large number of “intermediate” genes. There may be no clear boundaries to separate clusters. Clustering gene expression data also faces the challenges of satisfying biological domain requirements and addressing the high connectivity of the data sets. In this paper, we propose an interactive framework for exploring coherent patterns in gene expression data. A novel coherent pattern index is proposed to give users highly confident indications of the existence of coherent patterns. To derive a coherent pattern index and facilitate clustering, we devise an attraction tree structure that summarizes the coherence information among genes in the data set. We present efficient and scalable algorithms for constructing attraction trees and coherent pattern indices from gene expression data sets. Our experimental results show that our approach is effective in mining gene expression data and is scalable for mining large data sets. Index Terms: Bioinformatics, gene expression (microarray) data, clustering, interactive data mining.