De novo identification and biophysical characterization of transcription-factor binding sites with microfluidic affinity analysis
Nat. Biotechnol., 2010
Cited by 21 (4 self)
Abstract
Gene expression is regulated in part by protein transcription factors (TFs) that bind target regulatory DNA sequences. Predicting DNA binding sites and affinities from transcription factor sequence or structure is difficult; therefore, experimental data are required to link TFs to target sequences. We present a microfluidics-based approach for de novo discovery and quantitative biophysical characterization of DNA target sequences. We validated our technique by measuring sequence preferences for 28 S. cerevisiae TFs with a variety of DNA binding domains, including several that have proven difficult to study via other techniques. For each TF, we measured relative binding affinities to oligonucleotides covering all possible 8-bp DNA sequences to create a comprehensive map of sequence preferences; for 4 TFs, we also determined absolute affinities. We anticipate that these data and future use of this technique will provide information essential for understanding TF specificity, improving identification of regulatory sites, and reconstructing regulatory interactions.
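The abstract states that the oligonucleotides covered all possible 8-bp sequences, i.e. 4^8 = 65,536 distinct 8-mers, but does not describe how that library was laid out. As an illustration only, the sketch below assumes a de Bruijn sequence, a standard construction in which every k-mer over an alphabet occurs exactly once as a cyclic substring. The function names `de_bruijn` and `linear_cover` are hypothetical, and this is not presented as the authors' library design.

```python
def de_bruijn(alphabet: str, n: int) -> str:
    """Cyclic de Bruijn sequence B(len(alphabet), n) via the standard Lyndon-word recursion."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence: list[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            # Append the current Lyndon word when its length p divides n.
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def linear_cover(alphabet: str, n: int) -> str:
    """Linearize the cyclic sequence so every n-mer appears as an ordinary substring."""
    cyclic = de_bruijn(alphabet, n)
    return cyclic + cyclic[:n - 1]


if __name__ == "__main__":
    seq = linear_cover("ACGT", 8)
    windows = {seq[i:i + 8] for i in range(len(seq) - 7)}
    print(len(seq), len(windows))  # 65543 65536
```

Splitting such a sequence into individual oligonucleotides would require neighbouring pieces to overlap by n - 1 bases (7 for 8-mers) so that no k-mer is lost at a boundary.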
DNA motif representation with nucleotide dependency
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Cited by 5 (1 self)
©2008 IEEE.
OPTIMAL ALGORITHM FOR FINDING DNA MOTIFS WITH NUCLEOTIDE ADJACENT DEPENDENCY
Cited by 2 (0 self)
Abstract
Finding motifs and the corresponding binding sites is a critical and challenging problem in studying the process of gene expression. String and matrix representations are two popular models for representing a motif. However, both share an important weakness: they assume that the occurrence of a nucleotide in a binding site is independent of the other nucleotides. More complicated representations, such as HMMs or regular expressions, can capture nucleotide dependency. Unfortunately, these models are impractical because they have too many parameters and require many known binding sites. Recently, Chin and Leung introduced the SPSP representation, which overcomes the limitations of these complicated models. However, discovering novel motifs in the SPSP representation is still an NP-hard problem. In this paper, based on our observations of real binding sites, we propose the Dependency Pattern Sets (DPS) representation, which is simpler than the SPSP model but can still capture nucleotide dependency. We develop a branch-and-bound algorithm (DPS-Finder) for finding optimal DPS motifs. Experimental results show that DPS-Finder can discover a length-10 motif from 22 length-500 DNA sequences within a few minutes, and that the DPS representation performs similarly to the SPSP representation.
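The weakness attributed to matrix representations can be seen on a toy example. The sketch below uses hypothetical binding sites in which the first two positions co-vary (A with C, G with T). A position frequency matrix, which multiplies per-position frequencies independently, gives the never-observed site ATG the same probability as the real site ACG, while a simple first-order model over adjacent positions does not. The adjacent-pair model is a generic illustration of nucleotide adjacent dependency, not the DPS or SPSP representation, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy binding sites: positions 0 and 1 co-vary (A-C, G-T).
SITES = ["ACG", "ACG", "GTG", "GTG"]


def matrix_probability(seq: str, sites: list[str]) -> float:
    """Position frequency matrix: each position is scored independently."""
    n = len(sites)
    prob = 1.0
    for i, base in enumerate(seq):
        counts = Counter(site[i] for site in sites)
        prob *= counts[base] / n
    return prob


def adjacent_probability(seq: str, sites: list[str]) -> float:
    """First-order model: P(x0) times P(x_i | x_{i-1}), estimated per position."""
    prob = sum(site[0] == seq[0] for site in sites) / len(sites)
    for i in range(1, len(seq)):
        # Keep only the sites that agree with seq at the previous position.
        context = [site for site in sites if site[i - 1] == seq[i - 1]]
        if not context:
            return 0.0
        prob *= sum(site[i] == seq[i] for site in context) / len(context)
    return prob


for candidate in ["ACG", "ATG"]:
    print(candidate, matrix_probability(candidate, SITES), adjacent_probability(candidate, SITES))
# ACG 0.25 0.5
# ATG 0.25 0.0
```

The extra context costs more parameters per position, which is the trade-off between expressiveness and the number of known binding sites required that the abstract describes.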
Detecting riboSNitches with RNA folding algorithms: a genome-wide benchmark
2014
Cited by 2 (0 self)
Abstract
Ribonucleic acid (RNA) secondary structure prediction continues to be a significant challenge, particularly when modeling sequences with less rigidly defined structures, such as messenger and non-coding RNAs. Crucial to interpreting RNA structures as they pertain to individual phenotypes is the ability to detect RNAs with large structural disparities caused by a single nucleotide variant (SNV), termed riboSNitches. A recently published human genome-wide parallel analysis of RNA structure (PARS) study identified a large number of riboSNitches as well as non-riboSNitches, providing an unprecedented set of RNA sequences against which to benchmark structure prediction algorithms. Here we evaluate the riboSNitch prediction performance of 11 different RNA folding algorithms on these data. We find that recent algorithms designed specifically to predict the effects of SNVs on RNA structure, in particular remuRNA, RNAsnp and SNPfold, perform best on the most rigorously validated subsets of the benchmark data. In addition, our benchmark indicates that general structure prediction algorithms (e.g. RNAfold and RNAstructure) perform better overall when base-pairing probabilities are considered rather than minimum free energy calculations. Although aggregate algorithmic performance on the full set of riboSNitches is relatively low, significant improvement is possible if the highest-confidence predictions are evaluated independently.
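The finding that base-pairing probabilities outperform single minimum free energy structures suggests comparing whole probability profiles between alleles. A minimal sketch, assuming per-nucleotide pairing probabilities for the wild-type and variant sequences have already been computed by a folding tool (the profiles below are made-up numbers): it flags a candidate riboSNitch when the Pearson correlation between the two profiles falls below a cutoff. The function names and the 0.8 threshold are hypothetical, and the sketch is not claimed to reproduce any of the benchmarked tools.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def flag_ribosnitch(wt_pairing: list[float], variant_pairing: list[float], threshold: float = 0.8) -> bool:
    """Flag a candidate riboSNitch when the two pairing profiles correlate poorly."""
    return pearson(wt_pairing, variant_pairing) < threshold


# Made-up per-nucleotide pairing probabilities for a 6-nt window.
WT = [0.9, 0.85, 0.1, 0.05, 0.8, 0.9]
SIMILAR = [0.88, 0.8, 0.12, 0.1, 0.75, 0.85]
DISRUPTED = [0.1, 0.2, 0.9, 0.85, 0.15, 0.1]

print(flag_ribosnitch(WT, SIMILAR))    # False
print(flag_ribosnitch(WT, DISRUPTED))  # True
```

Because the score compares pairing-probability profiles rather than two single predicted structures, it reflects the distinction the benchmark draws between base-pairing probabilities and minimum free energy.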
Machine Learning in Computational Biology: Models of Alternative Splicing
2009
Abstract
Alternative splicing, the process by which a single gene may code for similar but different proteins, is an important process in biology, linked to development, cellular differentiation, genetic diseases, and more. Genome-wide analysis of alternative splicing patterns and regulation has recently become possible thanks to new high-throughput techniques for monitoring gene expression and genomic sequencing. This thesis introduces two algorithms for alternative splicing analysis based on large microarray and genomic sequence data. The algorithms, based on generative probabilistic models that capture structure and patterns in the data, are used to study global properties of alternative splicing. In the first part of the thesis, a microarray platform for monitoring alternative splicing is introduced. A spatial noise removal algorithm that removes artifacts and improves data fidelity is presented. The GenASAP algorithm (generative model for alternative splicing array platform) models the non-linear process by which targeted molecules bind to a microarray's probes and is used to predict patterns of alternative splicing. Two versions of GenASAP have been developed. The first uses a variational approximation to infer the relative amounts of the targeted molecules, while the second incorporates a more accurate ...
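The abstract notes that probe hybridization is non-linear but does not give GenASAP's model. As a hedged stand-in, the sketch below uses a generic saturating (Langmuir-type) binding curve, intensity = background + a_max * c / (k_d + c), and inverts it to recover concentration from intensity. The curve, parameter names, and values are assumptions for illustration; GenASAP itself infers abundances with a generative probabilistic model rather than a closed-form inversion like this one.

```python
def probe_intensity(conc: float, a_max: float, k_d: float, background: float) -> float:
    """Saturating binding curve: background + a_max * c / (k_d + c)."""
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    return background + a_max * conc / (k_d + conc)


def infer_concentration(intensity: float, a_max: float, k_d: float, background: float) -> float:
    """Invert the curve; defined only for background <= intensity < background + a_max."""
    signal = intensity - background
    if not 0 <= signal < a_max:
        raise ValueError("intensity is outside the invertible range of the curve")
    # From s = a_max * c / (k_d + c) it follows that c = k_d * s / (a_max - s).
    return k_d * signal / (a_max - signal)


# Hypothetical probe parameters.
A_MAX, K_D, BACKGROUND = 1000.0, 10.0, 50.0

observed = probe_intensity(5.0, A_MAX, K_D, BACKGROUND)
print(round(infer_concentration(observed, A_MAX, K_D, BACKGROUND), 6))  # 5.0

# Equal steps in concentration give shrinking steps in intensity near saturation.
low_step = probe_intensity(100.0, A_MAX, K_D, BACKGROUND) - probe_intensity(0.0, A_MAX, K_D, BACKGROUND)
high_step = probe_intensity(200.0, A_MAX, K_D, BACKGROUND) - probe_intensity(100.0, A_MAX, K_D, BACKGROUND)
print(low_step > high_step)  # True
```

The inversion becomes ill-conditioned as intensity approaches saturation, since small intensity errors then map to large concentration errors; this is one reason a probabilistic treatment of noise is attractive for such data.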
unknown title
2011
Abstract
miRvestigator: web application to identify miRNAs responsible for co-regulated gene expression patterns discovered through transcriptome profiling
RNA-binding proteins controlled by
2012
Abstract
Hyper conserved elements in vertebrate mRNA 3′-UTRs reveal a translational network of
Funding: This w...
2013
Abstract
Background: Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only a few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings: Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) showed similarity to sequences in the NCBI database. Of these 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database, and 679 and 277 have hits to the Helianthus and Lactuca databases, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequences. Of 100 randomly chosen primer pairs, 81 produced amplicons and 20 were polymorphic in the genotype analysis of Chrysanthemum. Most (but not all) of the assays were transferable across species, and they revealed a significant amount of allelic diversity. Conclusions/Significance: SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera.
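The abstract reports that MISA identified putative EST-SSRs but does not give the search settings. A minimal sketch of the underlying idea scans a sequence for perfect tandem repeats of 1- to 6-nt units that reach a minimum number of copies. The thresholds in `MIN_COPIES` are hypothetical and not taken from MISA's documentation, and the functions below are illustrative, not MISA's implementation.

```python
# Hypothetical minimum copy numbers per repeat-unit length (1-6 nt); not MISA's documented defaults.
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
BASES = set("ACGT")


def is_primitive(unit: str) -> bool:
    """True if unit is not itself a repeat of a shorter unit (e.g. 'AC' yes, 'ACAC' no)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_ssrs(seq: str, min_copies: dict[int, int] = MIN_COPIES) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for perfect tandem repeats meeting the thresholds."""
    seq = seq.upper()
    hits = []
    for k, needed in sorted(min_copies.items()):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            copies = 1
            # Count consecutive, non-overlapping copies of unit starting at i.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= needed and is_primitive(unit) and set(unit) <= BASES:
                hits.append((i, unit, copies))
                i += copies * k  # skip past the repeat just reported
            else:
                i += 1
    return sorted(hits)


demo = "GG" + "AC" * 7 + "TTC" + "AAG" * 5 + "C" + "T" * 12 + "G"
print(find_ssrs(demo))  # [(2, 'AC', 7), (19, 'AAG', 5), (35, 'T', 12)]
```

The sketch reports perfect repeats only; compound or interrupted repeats would need extra merging logic. The reported start and unit also depend on where the scan first enters a repeat, so an (AAG)n run preceded by a G may be reported as (GAA)n.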