Results 1 - 10
of
41
Gibbs recursive sampler: finding transcription factor binding sites
- Nucleic Acids Res
, 2003
"... The Gibbs Motif Sampler is a software package for locating common elements in collections of biopolymer sequences. In this paper we describe a new variation of the Gibbs Motif Sampler, the Gibbs Recursive Sampler, which has been developed specifically for locating multiple transcription factor bindi ..."
Abstract
-
Cited by 37 (4 self)
- Add to MetaCart
The Gibbs Motif Sampler is a software package for locating common elements in collections of biopolymer sequences. In this paper we describe a new variation of the Gibbs Motif Sampler, the Gibbs Recursive Sampler, which has been developed specifically for locating multiple transcription factor binding sites for multiple transcription factors simultaneously in unaligned DNA sequences that may be heterogeneous in DNA composition. Here we describe the basic operation of the web-based version of this sampler. The sampler may be accessed at
What is bioinformatics? A proposed definition and overview of the field
"... BACKGROUND: The recent flood of data from genome sequencing and functional genomics has given rise to new field, bioinformatics, which combines elements of biology and computer science. OBJECTIVES: Here we propose a definition for this new field and review some the research that is being pursued, p ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
BACKGROUND: The recent flood of data from genome sequencing and functional genomics has given rise to new field, bioinformatics, which combines elements of biology and computer science. OBJECTIVES: Here we propose a definition for this new field and review some the research that is being pursued, particularly in relation to transcriptional regulatory systems. METHODS: Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. RESULTS & CONCLUSIONS: Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (eg expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and proteinprotein interaction networks. Bioinformatics employs a wide range of computational topics including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches that integrate a variety of computational techniques and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the w...
ab initio prediction of transcription factor targets using structural knowledge
- PLoS Comput Biol
, 2005
"... Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structur ..."
Abstract
-
Cited by 19 (1 self)
- Add to MetaCart
Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid–nucleotide recognition preferences. These are used to predict binding sites for novel transcription factors from the same structural family. We demonstrate our approach on the Cys 2His 2 Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with experimental results. We use these preferences to perform a genome-wide scan for direct targets of Drosophila melanogaster Cys 2His 2 transcription factors. By analyzing the predicted targets along with gene annotation and expression data we infer the function and activity of these proteins. Citation: Kaplan T, Friedman N, Margalit H (2005) Ab initio prediction of transcription factor targets using structural knowledge. PLoS Comp Biol 1(1): e1.
The evolution of DNA regulatory regions for proteo-gamma bacteria by interspecies comparisons
- Genome Res
, 2002
"... service ..."
Virtual Footprint and PRODORIC: an integrative framework for regulon prediction in prokaryotes
, 2005
"... A new online framework for the accurate and integrative prediction of transcription factor binding sites (TFBSs) in prokaryotes was developed. The system consists of three interconnected modules: 1. The PRODORIC database as a comprehensive data source and extensive collection of TFBSs with correspon ..."
Abstract
-
Cited by 11 (1 self)
- Add to MetaCart
A new online framework for the accurate and integrative prediction of transcription factor binding sites (TFBSs) in prokaryotes was developed. The system consists of three interconnected modules: 1. The PRODORIC database as a comprehensive data source and extensive collection of TFBSs with corresponding position weight matrices (PWMs). 2. The pattern matching tool Virtual Footprint for the prediction of genome based regulons and for the analysis of individual promoter regions. 3. The interactive genome browser GBPro for the visualization of TFBS search results in their genomic context and links to gene and regulator-specific information in PRODORIC. The aim of this service is to provide researchers a free and easy to use collection of interconnected tools in the field of molecular microbiology, infection and systems biology.
Pattern Discovery in Sequences under a Markov Assumption
, 2002
"... In this paper we investigate the general problem of discovering recurrent patterns that are embedded in categorical sequences. An important real-world problem of this nature is motif discovery in DNA sequences. We investigate the fundamental aspects of this data mining problem that can make discover ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
In this paper we investigate the general problem of discovering recurrent patterns that are embedded in categorical sequences. An important real-world problem of this nature is motif discovery in DNA sequences. We investigate the fundamental aspects of this data mining problem that can make discovery "easy" or "hard." We present a general framework for characterizing learning in this context by deriving the Bayes error rate for this problem under a Markov assumption. The Bayes error framework demonstrates why certain patterns are much harder to discover than others. It also explains the role of di#erent parameters such as pattern length and pattern frequency in sequential discovery. We demonstrate how the Bayes error can be used to calibrate existing discovery algorithms, providing a lower bound on achievable performance. We discuss a number of fundamental issues that characterize sequential pattern discovery in this context, present a variety of empirical results to complement and verify the theoretical analysis, and apply our methodology to real-world motif-discovery problems in computational biology.
What is bioinformatics? An introduction and overview
, 2001
"... A flood of data means that many of the challenges in biology are now challenges in computing. Bioinformatics, the application of computational techniques to analyse the information associated with biomolecules on a large-scale, has now firmly established itself as a discipline in molecular biology, ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
A flood of data means that many of the challenges in biology are now challenges in computing. Bioinformatics, the application of computational techniques to analyse the information associated with biomolecules on a large-scale, has now firmly established itself as a discipline in molecular biology, and encompasses a wide range of subject areas from structural biology, genomics to gene expression studies. In this review we provide an introduction and overview of the current state of the field. We discuss the main principles that underpin bioinformatics analyses, look at the types of biological information and databases that are commonly used, and finally examine some of the studies that are being conducted, particularly with reference to transcription regulatory systems. 2. Introduction Biological data are flooding in at an unprecedented rate (1). For example as of August 2000, the GenBank repository of nucleic acid sequences contained 8,214,000 entries (2) and the SWISS-PROT databas...
CorreLogo: An online server for 3D sequence logos of RNA
- and DNA alignments,” Nucleic Acids Res
, 2006
"... We present an online server that generates a 3D representation of properties of user-submitted RNA or DNA alignments. The visualized properties are information of single alignment columns, mutual information of two alignment positions as well as the position-specific fraction of gaps. The nucleotide ..."
Abstract
-
Cited by 8 (3 self)
- Add to MetaCart
We present an online server that generates a 3D representation of properties of user-submitted RNA or DNA alignments. The visualized properties are information of single alignment columns, mutual information of two alignment positions as well as the position-specific fraction of gaps. The nucleotide composition of both single columns and column pairs is visualized with the help of color-coded 3D bars labeled with letters. The server generates both VRML and JVX output that can be viewed with a VRML viewer or the JavaView applet, respectively. We show that combining these different features of an alignment into one 3D representation is helpful in identifying correlations between bases and potential RNA and DNA base pairs. Significant known correlations between the tRNA 30 anticodon cardinal nucleotide and the extended anticodon were observed, as were correlations within the amino acid acceptor stem and between the cardinal nucleotide and the acceptor stem. The online server can be accessed using the
H (2005) Predicting transcription factor binding sites using structural knowledge
- Proceedings of the Ninth International Conference on Research in Computational Molecular Biology: Lecture notes in computer science, Volume 3,500
, 2005
"... Abstract. Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data an ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
Abstract. Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid-nucleotide recognition preferences. These are used to predict binding sites for novel transcription factors from the same structural family. We apply our approach to the Cys2His2 Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with various experimental results. To demonstrate the potential of our algorithm, we use the learned preferences to predict binding site models for novel proteins from the same family. These models are then used in genomic scans to find putative binding sites of the novel proteins. 1

