A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length
, 2005
Reliable prediction of regulator targets using 12 Drosophila genomes
- Genome Res
, 2007
Cited by 9 (2 self)
Comparative prediction of motif instances * shared first authors + corresponding authors:
Apples to apples: improving the performance of motif finders and their significance analysis in the Twilight Zone
- Bioinformatics
, 2006
doi:10.1093/bioinformatics/btl245
Nucleosome Occupancy Information Improves de novo Motif Discovery
- RECOMB 2007. LNCS (LNBI
, 2007
Cited by 8 (2 self)
Abstract. A complete understanding of transcriptional regulatory processes in the cell requires identification of transcription factor binding sites on a genome-wide scale. Unfortunately, these binding sites are typically short and degenerate, posing a significant statistical challenge: many more matches to known transcription factor binding sites occur in the genome than are actually functional. Chromatin structure is known to play an important role in guiding transcription factors to those sites that are functional. In particular, it has been shown that active regulatory regions are usually depleted of nucleosomes, thereby enabling transcription factors to bind DNA in those regions [1]. In this paper, we describe a novel algorithm that employs an informative prior over DNA sequence positions based on a discriminative view of nucleosome occupancy; the nucleosome occupancy information comes from a recently published computational model [2]. When a Gibbs sampling algorithm with our informative prior is applied to yeast sequence sets identified by ChIP-chip [3], the correct motif is found in 50% more cases than with an uninformative uniform prior. Moreover, if nucleosome occupancy information is not available, our informative prior reduces to a new kind of prior that can exploit discriminative information in a purely generative setting.
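The role of a positional prior in a Gibbs sampler can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-site-per-sequence model, a uniform background, and a hypothetical per-position prior list for each sequence (for example, higher values where nucleosome occupancy is predicted to be low). All function names are illustrative.

```python
import random

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Estimate column-wise base probabilities from aligned sites."""
    width = len(sites[0])
    pwm = []
    for col in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in ALPHABET})
    return pwm


def position_weights(seq, pwm, prior, background=0.25):
    """Unnormalized sampling weight of each start position.

    weight(start) = prior[start] * product over columns of
                    pwm[column][base] / background
    `prior` is the (assumed) informative prior over start positions.
    """
    width = len(pwm)
    weights = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for offset in range(width):
            ratio *= pwm[offset][seq[start + offset]] / background
        weights.append(prior[start] * ratio)
    return weights


def resample_site(seqs, starts, index, width, priors, rng=random):
    """One Gibbs step: redraw the site position in sequence `index`.

    The motif model is rebuilt from the sites in all other sequences,
    then a new start is drawn in proportion to prior times likelihood ratio.
    """
    others = [
        seq[pos:pos + width]
        for i, (seq, pos) in enumerate(zip(seqs, starts))
        if i != index
    ]
    pwm = build_pwm(others)
    weights = position_weights(seqs[index], pwm, priors[index])
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

With a uniform prior this reduces to an ordinary site sampler; a prior that is zero at some positions excludes them from sampling altogether, which is how positional information narrows the search.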
ABS: a database of annotated regulatory binding sites from orthologous promoters
- Nucleic Acids Res
, 2006
Cited by 6 (1 self)
Information about the genomic coordinates and the sequence of experimentally identified transcription factor binding sites is found scattered under a variety of diverse formats. The availability of standard collections of such high-quality data is important to design, evaluate and improve novel computational approaches to identify binding motifs on promoter sequences from related genes. ABS
Regulatory Motif Discovery Using a Population Clustering Evolutionary Algorithm
Cited by 5 (1 self)
Abstract—This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm’s capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm’s ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences. Index Terms—Evolutionary computation, population-based data clustering, motif discovery, transcription factor binding sites, muscle-specific gene expression.
Improving computational predictions of cis-regulatory binding sites
- Pac Symp Biocomput
, 2006
Cited by 4 (1 self)
The locations of cis-regulatory binding sites determine the connectivity of genetic regulatory networks and therefore constitute a natural focal point for research into the many biological systems controlled by such regulatory networks. Accurate computational prediction of these binding sites would facilitate research into a multitude of key areas, including embryonic development, evolution, pharmacogenomics, cancer and many other transcriptional diseases, and is likely to be an important precursor for the reverse engineering of genome-wide genetic regulatory networks. Many algorithmic strategies have been developed for the computational prediction of cis-regulatory binding sites, but currently all approaches are prone to high rates of false positive predictions, and many are highly dependent on additional information, limiting their usefulness as research tools. In this paper we present an approach for improving the accuracy of a selection of established prediction algorithms. Firstly, it is shown that species-specific optimization of algorithmic parameters can, in some cases, significantly improve the accuracy of algorithmic predictions. Secondly, it is demonstrated that the use of non-linear classification algorithms to integrate predictions from multiple sources can result in more accurate predictions. Finally, it is shown that further improvements in prediction accuracy can be gained with the use of biologically inspired post-processing of predictions.
Exact p-value calculation for heterotypic clusters of regulatory motifs and its application in computational annotation of cis-regulatory modules
- ALGORITHMS FOR MOLECULAR BIOLOGY
, 2007
Cited by 3 (0 self)
Background: cis-Regulatory modules (CRMs) of eukaryotic genes often contain multiple binding sites for transcription factors. The phenomenon that binding sites form clusters in CRMs is exploited in many algorithms to locate CRMs in a genome. This gives rise to the problem of calculating the statistical significance of the event that multiple sites, recognized by different factors, would be found simultaneously in a text of a fixed length. The main difficulty comes from overlapping occurrences of motifs. So far, no tools have been developed allowing the computation of p-values for simultaneous occurrences of different motifs which can overlap.
Results: We developed and implemented an algorithm computing the p-value that $s$ different motifs occur respectively $k_1, \dots, k_s$ or more times, possibly overlapping, in a random text. Motifs can be represented with a majority of popular motif models, but in all cases, without indels. Zero or first order Markov chains can be adopted as a model for the random text. The computational tool was tested on the set of cis-regulatory modules involved in D. melanogaster early development, for which there exists an annotation of binding sites for transcription factors. Our test allowed us to correctly identify transcription factors cooperatively/competitively binding to DNA.
Method: The algorithm that precisely computes the probability of simultaneous motif occurrences is inspired by the Aho-Corasick automaton and employs a prefix tree together with a transition function. The algorithm runs with the $O\bigl(n|\Sigma|(m|\mathcal{H}| + K|\sigma|^K)\prod_i k_i\bigr)$ time complexity, where $n$ is the length of the text, $|\Sigma|$ is the alphabet size, $m$ is the maximal motif length, $|\mathcal{H}|$ is the total number of words in motifs, $K$ is the order of Markov model, and $k_i$ is the number of occurrences of the $i$th motif.
Conclusion: The primary objective of the program is to assess the likelihood that a given DNA segment is a CRM regulated by a known set of regulatory factors. In addition, the program can also be used to select the appropriate threshold for PWM scanning. Another application is assessing the similarity of different motifs.
Availability: Project web page, stand-alone version and documentation can be found at http://bioinform.genetika.ru/AhoPro/
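The automaton-based p-value computation described in this abstract can be sketched in a heavily simplified form. The code below is not AhoPro: it assumes a single exact word rather than several motifs, a zero-order (i.i.d.) text model, and a KMP-style automaton in place of the full Aho-Corasick prefix tree. The function names are illustrative.

```python
def build_transitions(word, alphabet):
    """Automaton whose state is the length of the longest prefix of
    `word` that is a suffix of the text read so far."""
    m = len(word)
    fail = [0] * (m + 1)  # fail[j] = longest proper border of word[:j]
    k = 0
    for i in range(1, m):
        while k and word[i] != word[k]:
            k = fail[k]
        if word[i] == word[k]:
            k += 1
        fail[i + 1] = k
    trans = []
    for state in range(m + 1):
        row = {}
        for ch in alphabet:
            s = state
            if s == m:
                # After a full match, fall back so overlaps are counted.
                s = fail[s]
            while s and word[s] != ch:
                s = fail[s]
            if word[s] == ch:
                s += 1
            row[ch] = s
        trans.append(row)
    return trans


def pvalue_at_least(word, k, n, probs):
    """Probability that `word` occurs k or more times (overlaps allowed)
    in an i.i.d. random text of length n with letter probabilities `probs`."""
    alphabet = list(probs)
    trans = build_transitions(word, alphabet)
    m = len(word)
    # dist[state][count]: probability mass; count is capped at k.
    dist = [[0.0] * (k + 1) for _ in range(m + 1)]
    dist[0][0] = 1.0
    for _ in range(n):
        nxt = [[0.0] * (k + 1) for _ in range(m + 1)]
        for state in range(m + 1):
            for count in range(k + 1):
                p = dist[state][count]
                if p == 0.0:
                    continue
                for ch in alphabet:
                    s = trans[state][ch]
                    c = min(k, count + 1) if s == m else count
                    nxt[s][c] += p * probs[ch]
        dist = nxt
    return sum(row[k] for row in dist)
```

For example, `pvalue_at_least("AA", 2, 3, {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})` is the probability of the text `AAA`, since two overlapping occurrences of `AA` in three letters require all three to be `A`. The dynamic programme visits every (text position, automaton state, capped count, letter) combination, which in this single-word case mirrors the product structure of the complexity bound quoted in the abstract.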
MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes
- Nucleic Acids Res
, 2006

