CMfinder–a covariance model based RNA motif finding algorithm
- Bioinformatics
, 2006
"... doi:10.1093/bioinformatics/btk008 ..."
Bafna V: Searching Genomes for Noncoding RNA Using FastR
- IEEE/ACM Trans. on Comput. Biol. and Bioinformatics
, 2005
Cited by 26 (5 self)
The discovery of novel noncoding RNAs has been among the most exciting recent developments in biology. It has been hypothesized that there is, in fact, an abundance of functional noncoding RNAs (ncRNAs) with various catalytic and regulatory functions. However, the inherent signal for ncRNA is weaker than the signal for protein coding genes, making these harder to identify. We consider the following problem: Given an RNA sequence with a known secondary structure, efficiently detect all structural homologs in a genomic database by computing the sequence and structure similarity to the query. Our approach, based on structural filters that eliminate a large portion of the database while retaining the true homologs, allows us to search a typical bacterial genome in minutes on a standard PC. The results are two orders of magnitude better than the currently available software for the problem. We applied FastR to the discovery of novel riboswitches, which are a class of RNA domains found in the untranslated regions. They are of interest because they regulate metabolite synthesis by directly binding metabolites. We searched all available eubacterial and archaeal genomes for riboswitches from purine, lysine, thiamin, and riboflavin subfamilies. Our results point to a number of novel candidates for each of these subfamilies and include genomes that were not known to contain riboswitches. Index Terms: Noncoding RNA, database search, filtration, riboswitch, bacterial genome.
Tree decomposition based fast search of RNA structures including pseudoknots in genomes
- In Proceedings of 2005 Computational System Bioinformatics Conference
, 2005
Cited by 14 (7 self)
Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, due to the lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of quickly and effectively searching genomes for RNA secondary structures have not been available. In this paper, a novel RNA structure profiling model is introduced based on the notion of a conformational graph to specify the consensus structure of an RNA family. Tree decomposition yields a small tree width $t$ for such conformational graphs (e.g., $t = 2$ for stem loops and only a slight increase for pseudoknots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph and consequently can be accomplished through dynamic programming on the tree decomposition of the conformational graph in $O(k^t N^2)$ time, where $k$ is a small parameter and $N$ is the size of the profiled RNA structure. Experiments show that the application of the alignment algorithm to search in genomes yields the same search accuracy as methods based on a Covariance model with a significant reduction in computation time. In particular, very accurate searches
V.: A sequence-based filtering method for ncRNA identification and its application to searching for riboswitch elements
- Bioinformatics
, 2006
Cited by 13 (4 self)
Recent studies have uncovered an “RNA world”, in which noncoding RNA (ncRNA) sequences play a central role in the regulation of gene expression. Computational studies on ncRNA have been directed toward developing detection methods for ncRNAs. State-of-the-art methods for the problem, like covariance models, suffer from high computational cost, underscoring the need for efficient filtering approaches that can identify promising sequence segments and accelerate the detection process. In this paper we make several contributions toward this goal. First, we formalize the concept of a filter and provide figures of merit that allow filters to be compared. Second, we design efficient sequence-based filters that dominate the current state-of-the-art HMM filters. Third, we provide a new formulation of the covariance model that allows speeding up RNA alignment. We demonstrate the power of our approach on both synthetic data and real bacterial genomes. We then apply our algorithm to the detection of novel riboswitch elements in whole bacterial and archaeal genomes. Our results point to a number of novel riboswitch candidates, and include genomes that were not previously known to contain riboswitches.
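The abstract above mentions figures of merit for comparing filters without defining them. A minimal sketch, assuming two common measures (the fraction of known family members a filter retains and the fraction of background windows it lets through), might look like the following. The function name `filter_merit` and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def filter_merit(true_scores: Sequence[float],
                 background_scores: Sequence[float],
                 threshold: float) -> tuple[float, float]:
    """Return (sensitivity, pass_fraction) for a score-threshold filter.

    sensitivity   : share of known family members scoring >= threshold
    pass_fraction : share of background windows scoring >= threshold,
                    i.e. the part of the database still sent to the
                    expensive model (assumed measure, not the paper's)
    """
    if not true_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    sensitivity = sum(s >= threshold for s in true_scores) / len(true_scores)
    pass_fraction = (sum(s >= threshold for s in background_scores)
                     / len(background_scores))
    return sensitivity, pass_fraction


# Toy scores, invented purely for illustration.
sens, frac = filter_merit([9.0, 7.5, 4.0, 8.2],
                          [1.0, 2.5, 6.0, 0.5, 3.0],
                          threshold=5.0)
print(sens, frac)
```

Under these assumptions, one filter is preferable to another when it keeps at least the same sensitivity while passing a smaller fraction of the background.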
Fast search of sequences with complex symbol correlations using profile context-sensitive HMMs and pre-screening filters
- Proc. 32nd International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
, 2007
Cited by 9 (8 self)
Recently, profile context-sensitive HMMs (profile-csHMMs) have been proposed which are very effective in modeling the common patterns and motifs in related symbol sequences. Profile-csHMMs are capable of representing long-range correlations between distant symbols, even when these correlations are entangled in a complicated manner. This makes profile-csHMMs a useful tool in computational biology, especially in modeling noncoding RNAs (ncRNAs) and finding new ncRNA genes. However, a profile-csHMM based search is quite slow, hence not practical for searching a large database. In this paper, we propose a practical scheme for making the search speed significantly faster without any degradation in the prediction accuracy. The proposed method utilizes a pre-screening filter based on a profile-HMM, which filters out most sequences that will not be predicted as a match by the original profile-csHMM. Experimental results show that the proposed approach can make the search speed eighty times faster. Index Terms: homology search, profile-csHMM, pseudoknot, noncoding RNA (ncRNA), context-sensitive HMM (csHMM).
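The pre-screening idea described above can be sketched as a two-stage scan: a cheap scorer is applied to every window, and the expensive scorer runs only on windows that survive. This is a generic illustration, not the authors' implementation; `prescreened_search`, both scorer callables, and the cutoffs are hypothetical stand-ins for a profile-HMM filter and a profile-csHMM.

```python
from typing import Callable


def prescreened_search(seq: str,
                       window: int,
                       cheap_score: Callable[[str], float],
                       full_score: Callable[[str], float],
                       cheap_cutoff: float,
                       full_cutoff: float) -> tuple[list[tuple[int, float]], int]:
    """Slide a window over seq and return (hits, number_of_full_model_calls).

    A window reaches the expensive full_score only if cheap_score
    is at least cheap_cutoff; hits are (start, full score) pairs.
    """
    hits: list[tuple[int, float]] = []
    full_calls = 0
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        if cheap_score(segment) < cheap_cutoff:
            continue  # rejected by the pre-screening filter
        full_calls += 1
        score = full_score(segment)
        if score >= full_cutoff:
            hits.append((start, score))
    return hits, full_calls


# Toy scorers standing in for real models (assumptions for illustration).
def gc_count(segment: str) -> float:
    return float(segment.count("G") + segment.count("C"))


def exact_motif(segment: str) -> float:
    return 1.0 if segment == "GGCC" else 0.0


hits, calls = prescreened_search("AAAAGGCCAAAA", 4, gc_count, exact_motif,
                                 cheap_cutoff=3, full_cutoff=1.0)
print(hits, calls)
```

The saving comes from `full_calls` being much smaller than the number of windows. Accuracy is preserved only if the cheap cutoff is loose enough that no true match is rejected.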
Designing Secondary Structure Profiles for Fast ncRNA Identification
, 2008
Cited by 7 (1 self)
Detecting non-coding RNAs (ncRNAs) in genomic DNA is an important part of annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for search. This cost can be reduced by using a filter to exclude sequence that is unlikely to contain the ncRNA of interest, applying the CM only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect nearly all ncRNA instances while excluding most irrelevant sequences remains challenging. This work proposes a systematic procedure to convert a CM for an ncRNA family to a secondary structure profile (SSP), which augments a conservation profile with secondary structure information but can still be efficiently scanned against long sequences. We use dynamic programming to estimate an SSP’s sensitivity and false-positive (FP) rate, yielding an efficient, fully automated filter design algorithm. Our experiments demonstrate that designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families, including those with and without strong sequence conservation. For highly structured ncRNA families, including secondary structure conservation yields better performance than using primary sequence conservation alone.
Structural alignment of pseudoknotted RNA
- In Proceedings of the Annual Intl. Conference on Computational Biology (RECOMB)
, 2006
Cited by 5 (2 self)
In this paper, we address the problem of discovering novel non-coding RNA (ncRNA) using primary sequence and secondary structure conservation, focusing on ncRNA families with pseudoknotted structures. Our main technical result is an efficient algorithm for computing an optimum structural alignment of an RNA sequence against a genomic substring. This algorithm finds two applications. First, by scanning a genome, we can identify novel (homologous) pseudoknotted ncRNA, and second, we can infer the secondary structure of the target aligned sequence. We test an implementation of our algorithm (PAL), and show that it has near-perfect behavior for predicting the structure of many known pseudoknots. Additionally, it can detect the true homologs with high sensitivity and specificity in controlled tests. We also use PAL to search entire viral genomes and the mouse genome for novel homologs of some viral and eukaryotic pseudoknots, respectively. In each case, we have found strong support for novel homologs.
Fast Structural Similarity Search of Noncoding RNAs Based on Matched Filtering of Stem Patterns
- Proc. 41st Asilomar Conference on Signals, Systems, and Computers
, 2007
Cited by 3 (2 self)
Many noncoding RNAs (ncRNAs) have characteristic secondary structures that give rise to complicated base correlations in their primary sequences. Therefore, when performing an RNA similarity search to find new members of an ncRNA family, we need a statistical model, such as the profile-csHMM or the covariance model (CM), that can effectively describe the correlations between distant bases. However, these models are computationally expensive, making the resulting RNA search very slow. To overcome this problem, various prescreening methods have been proposed that first use a simpler model to scan the database and filter out the dissimilar regions. Only the remaining regions that bear some similarity are passed to a more complex model for closer inspection. It has been shown that the prescreening approach can make the search speed significantly faster at no (or a slight) loss of prediction accuracy. In this paper, we propose a novel prescreening method based on matched filtering of stem patterns. Unlike many existing methods, the proposed method can prescreen the database solely based on structural similarity. The proposed method can handle RNAs with arbitrary secondary structures, and it can be easily incorporated into various search methods that use different statistical models. Furthermore, the proposed approach has a low computational cost, yet is very effective for prescreening, as will be demonstrated in the paper.
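To make "prescreening solely based on structural similarity" concrete, the sketch below scores each window by the fraction of expected stem positions that can form a base pair, ignoring which bases they are. It is a simplified stand-in and not the matched-filtering method of the paper. The stem position list, the set of allowed pairs (Watson-Crick plus G-U wobble), and the function names are assumptions made for illustration.

```python
# Base pairs treated as complementary (assumption: Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_match_fraction(window: str, stem_pairs: list[tuple[int, int]]) -> float:
    """Fraction of expected stem positions (i, j) whose bases can pair."""
    if not stem_pairs:
        raise ValueError("stem_pairs must be non-empty")
    paired = sum((window[i], window[j]) in PAIRS for i, j in stem_pairs)
    return paired / len(stem_pairs)


def prescreen(seq: str, window_len: int,
              stem_pairs: list[tuple[int, int]],
              min_fraction: float) -> list[int]:
    """Return start offsets of windows whose stem positions mostly pair."""
    return [start
            for start in range(len(seq) - window_len + 1)
            if stem_match_fraction(seq[start:start + window_len],
                                   stem_pairs) >= min_fraction]


# Hypothetical hairpin: a 3-base-pair stem closing a 4-base loop.
hairpin_pairs = [(0, 9), (1, 8), (2, 7)]
candidates = prescreen("AAGGCAAAAGCCAA", 10, hairpin_pairs, min_fraction=1.0)
print(candidates)
```

Because only pairing ability is tested, a window with a completely different primary sequence but the same stem geometry would still pass, which is the property the abstract highlights.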
Effective Annotation of Noncoding RNA Families Using Profile Context-Sensitive HMMs
Cited by 1 (1 self)
that function without being translated into proteins. Systematic research on ncRNAs has shown that there exist many ncRNAs that are actively involved in various biological processes, playing key roles in controlling them. As the annotation of ncRNAs is still at an early stage, developing efficient computational tools for finding ncRNAs is of great importance. One effective way of finding new ncRNAs is to look for new RNAs that resemble the RNAs that have already been identified. Recently, a new model called the profile context-sensitive HMM (profile-csHMM) has been proposed, and it has been shown to provide a convenient framework for finding RNA homologues. In this paper, we give a brief review of profile-csHMMs and their application in RNA similarity search. We also introduce a number of recent advances related to profile-csHMMs and profile-csHMM based search. Index Terms: Profile context-sensitive HMM (profile-csHMM), noncoding RNA (ncRNA), RNA secondary structure, RNA similarity search, pseudoknot.
RNA Search with Decision Trees and Partial Covariance Models
Cited by 1 (0 self)
The use of partial covariance models to search for RNA family members in genomic sequence databases is explored. The partial models are formed from contiguous subranges of the overall RNA family multiple alignment columns. A binary decision-tree framework is presented for choosing the order to apply the partial models and the score thresholds on which to make the decisions. The decision trees are chosen to minimize computation time subject to the constraint that all of the training sequences are passed to the full covariance model for final evaluation. Computational intelligence methods are suggested to select the decision tree since the tree can be quite complex and there is no obvious method to build the tree in these cases. Experimental results from seven RNA families show execution times of 0.066-0.268 relative to using the full covariance model alone. Tests on the full sets of known sequences for each family show that at least 95 percent of these sequences are found for two families and 100 percent for five others. Since the full covariance model is run on all sequences accepted by the partial model decision tree, the false alarm rate is at least as low as that of the full model alone. Index Terms: Bioinformatics, computational intelligence, covariance models, decision trees, RNA database search.
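The binary decision-tree framework described above can be pictured as a chain of threshold tests, where each node scores the candidate with one partial model and branches on the result. The sketch below is a hypothetical illustration of that control flow only. The `Node` layout, the convention that a missing child means "accept" or "reject", and the toy scorers are assumptions, and how the tree and thresholds are chosen is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    score: Callable[[str], float]      # hypothetical partial-model scorer
    threshold: float
    if_pass: Optional["Node"] = None   # None: hand the sequence to the full model
    if_fail: Optional["Node"] = None   # None: reject the sequence


def accept_for_full_model(seq: str, root: Node) -> bool:
    """Walk the tree and report whether seq should reach the full model."""
    node = root
    while True:
        if node.score(seq) >= node.threshold:
            if node.if_pass is None:
                return True
            node = node.if_pass
        else:
            if node.if_fail is None:
                return False
            node = node.if_fail


# Toy tree: accept if the sequence has at least two G's; otherwise give it
# a second chance with a different partial test (at least three C's).
second = Node(score=lambda s: float(s.count("C")), threshold=3.0)
root = Node(score=lambda s: float(s.count("G")), threshold=2.0, if_fail=second)

print(accept_for_full_model("GGAA", root),
      accept_for_full_model("CCCA", root),
      accept_for_full_model("AAAA", root))
```

Since every accepted sequence is still evaluated by the full model, a tree like this can only remove candidates, which is consistent with the abstract's remark that the false alarm rate cannot exceed that of the full model alone.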