Improvement in the Accuracy of Multiple Sequence Alignment Program MAFFT
- Cited by 7 (0 self)
In 2002, we developed and released MAFFT, a rapid multiple sequence alignment program designed to handle large (up to ∼5,000 sequences) and long (∼2,000 aa or ∼5,000 nt) data sets in a reasonable time on a standard desktop PC. In terms of accuracy, however, the previous versions (v.4 and lower) of MAFFT were outperformed in several benchmark tests by ProbCons and TCoffee v.2, both of which were released in 2004. Here we report a recent extension of MAFFT that aims to improve accuracy at as little cost in calculation time as possible. The extended version of MAFFT (v.5) has new iterative refinement options, G-INS-i and L-INS-i (collectively denoted as [GL]-INS-i in this report). These options use a new objective function combining the weighted sum-of-pairs (WSP) score and a score similar to COFFEE derived from all pairwise alignments. We discuss the improvement in accuracy brought by this extension, mainly using two benchmark tests released very recently, BAliBASE v.3 (for protein alignments) and BRAliBASE (for RNA alignments). According to BAliBASE v.3, the overall average accuracy of L-INS-i was higher than that of the other methods released in 2004, although the difference among the most accurate methods (ProbCons, TCoffee v.2 and the new options of MAFFT) was small. The advantage
Parallel multiple sequence alignment with decentralized cache support
- In Proc. of Euro-Par ’05, 2005
- Cited by 4 (0 self)
In this paper we present a new method for aligning large sets of biological sequences. The method performs a sequence alignment in parallel and uses a decentralized cache to store intermediate results. The method allows alignments to be recomputed efficiently when new sequences are added or when alignments of different precisions are requested. Our method can be used to solve important biological problems like the adaptive update of a complete evolution tree when new sequences are added (without recomputing the whole tree). To validate the method, some experiments were performed using up to 512 Small Subunit Ribosomal RNA sequences, which were analyzed with different levels of precision.
A max-margin model for efficient simultaneous alignment and folding of RNA sequences
- 2008
Comparative analysis of RNA genes: the caRNAc software
- Cited by 3 (2 self)
RNA genes are ubiquitous in the cell and are involved in a number of biochemical processes. Because there is a close relationship between function and structure, software tools that predict the secondary structure of noncoding RNAs from the base sequence are very helpful. In this article, we focus our attention on the inference of conserved secondary structure for a group of homologous RNA sequences. We present the caRNAc software which enables the analysis of families of homologous sequences without prior alignment. The method relies both on comparative analysis and thermodynamic information.
Accurate Multiple Sequence-Structure Alignment of RNA Sequences Using Combinatorial Optimization
- 2007
- Cited by 3 (0 self)
Background: The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. Results: We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. Conclusions: The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested: This is especially the case with an increasing number of input
MAGNOLIA: multiple alignment of protein-coding and structural RNA sequences
- Nucleic Acids Research, 2008
- Cited by 1 (1 self)
MAGNOLIA is a new software tool for multiple alignment of nucleic acid sequences, which are recognized to be hard to align. The idea is that the multiple alignment process should be improved by taking into account the putative function of the sequences. To this end, MAGNOLIA is especially designed for sequences that are expected to be either protein-coding or structural RNAs. It extracts information from the similarities and differences in the data, and searches for a specific evolutionary pattern between sequences before aligning them. The alignment step then incorporates this information to achieve higher accuracy. The website is available at
Can Clustal-style progressive pairwise alignment of multiple sequences be used in RNA secondary structure prediction?
- BMC Bioinformatics, 2007
Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions
Recent computational scans for noncoding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure (frequent compensating base changes) is increasingly likely to cause sequence-based alignment methods to misalign, or even refuse to align, homologous ncRNAs, consequently obscuring that structural signal. We have used CMfinder, a structure-oriented local alignment tool, to search the ENCODE regions of vertebrate multiple alignments. In agreement with other studies, we find a large number of potential RNA structures in the ENCODE regions. We report 6,587 candidate regions with an estimated false positive rate of 50%. More intriguingly, many of these candidates may be better represented by alignments taking the RNA secondary structure into account than those based on primary sequence alone, often quite dramatically. For example, approximately one quarter of our predicted motifs show revisions in more than 50% of their aligned positions. Furthermore, our results are strongly complementary to those discovered by sequence-alignment-based approaches: 84% of our candidates are not covered by Washietl et al., increasing

