Results 1 - 10 of 156
A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental Study
 Journal of Computational Biology
, 2001
Abstract

Cited by 112 (15 self)
Hannenhalli and Pevzner gave the first polynomial-time algorithm for computing the inversion distance between two signed permutations, as part of the larger task of determining the shortest sequence of inversions needed to transform one permutation into the other. Their algorithm (restricted to distance calculation) proceeds in two stages: in the first stage, the overlap graph induced by the permutation is decomposed into connected components; then, in the second stage, certain graph structures (hurdles and others) are identified. Berman and Hannenhalli avoided the explicit computation of the overlap graph and gave an O(n alpha(n)) algorithm, based on a Union-Find structure, to find its connected components, where alpha is the inverse Ackermann function. Since for all practical purposes alpha(n) is a constant no larger than four, this algorithm has been the fastest practical algorithm to date. In this paper, we present a new linear-time algorithm for computing the connected components, which is more efficient than that of Berman and Hannenhalli in both theory and practice. Our algorithm uses only a stack and is very easy to implement. We give the results of computational experiments over a large range of permutation pairs produced through simulated evolution; our experiments show a speedup by a factor of 2 to 5 in the computation of the connected components and by a factor of 1.3 to 2 in the overall distance computation.
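For orientation, the Union-Find structure mentioned in this abstract can be sketched as below. This is a generic disjoint-set forest with path halving and union by rank, which gives the near-constant alpha(n) amortized cost per operation. It is not the Berman-Hannenhalli procedure and not the stack-based linear-time algorithm of the paper. The names `UnionFind` and `count_components` and the toy edge list are assumptions made for illustration.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on tree height

    def find(self, x):
        # Path halving: point every visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return True if they were distinct."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra           # attach the shallower tree under the deeper
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def count_components(n, edges):
    """Number of connected components of a graph on vertices 0..n-1."""
    uf = UnionFind(n)
    components = n
    for a, b in edges:
        if uf.union(a, b):
            components -= 1
    return components
```

Note that this sketch needs the edges explicitly, whereas the algorithms described in the abstract find the components without building the overlap graph.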
Within the Twilight Zone: A Sensitive Profile-Profile Comparison Tool Based on Information Theory
 J. Mol. Biol
, 2002
Abstract

Cited by 99 (4 self)
This paper presents a novel approach to profile-profile comparison. The method compares two input profiles (like those that are generated by PSI-BLAST) and assigns a similarity score to assess their statistical similarity. Our profile-profile comparison tool, which allows for gaps, can be used to detect weak similarities between protein families. It has also been optimized to produce alignments that are in very good agreement with structural alignments. Tests show that the profile-profile alignments are indeed highly correlated with similarities between secondary structure elements and tertiary structure. Exhaustive evaluations show that our method is significantly more sensitive in detecting distant homologies than the popular profile-based search programs PSI-BLAST and IMPALA. The relative improvement is the same order of magnitude as the improvement of PSI-BLAST relative to BLAST. Our new tool often detects similarities that fall within the twilight zone of sequence similarity.
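The abstract does not say which information-theoretic measure the tool uses. As a hedged illustration of how two profile columns (probability distributions over residues) could be compared, here is a sketch of the Jensen-Shannon divergence, one standard symmetric measure. Treating it as the column score is an assumption, and the function names `kl_divergence` and `js_divergence` are illustrative.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions of equal length.

    The midpoint m is positive wherever p or q is positive, so the
    KL terms below never divide by zero.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

With base-2 logarithms the value lies between 0 (identical columns) and 1 (columns with disjoint support).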
Multiple Genome Rearrangement and Breakpoint Phylogeny
, 1998
Abstract

Cited by 76 (9 self)
Multiple alignment of macromolecular sequences generalizes from N = 2 to N ≥ 3 the comparison of N sequences which have diverged through the local processes of insertion, deletion and substitution. Gene-order sequences diverge through non-local genome rearrangement processes such as inversion (or reversal) and transposition. In this paper we show which formulations of multiple alignment have counterparts in multiple rearrangement. Based on difficulties inherent in rearrangement edit-distance calculation and interpretation, we argue for the simpler "breakpoint analysis". Consensus-based multiple rearrangement of N ≥ 3 orders can be solved exactly through reduction to instances of the Travelling Salesman Problem (TSP). We propose a branch-and-bound solution to TSP particularly suited to these instances. Simulations show how non-uniqueness of the solution is attenuated with increasing numbers of data genomes. Tree-based multiple alignment can be achieved to a great degree o...
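To make the notion of a breakpoint concrete, the sketch below counts breakpoints between two gene orders. It assumes linear genomes on genes 1..n written as signed permutations and framed by 0 and n+1, which is a common convention but is not stated in the abstract. The function name `breakpoints` is illustrative.

```python
def breakpoints(a, b):
    """Count adjacencies of signed gene order a that are not conserved in b.

    An adjacency (x, y) in a is conserved if b contains x followed by y,
    or -y followed by -x (the same pair read on the opposite strand).
    """
    n = len(a)
    framed_a = [0] + list(a) + [n + 1]
    framed_b = [0] + list(b) + [n + 1]
    adjacencies_b = set()
    for x, y in zip(framed_b, framed_b[1:]):
        adjacencies_b.add((x, y))
        adjacencies_b.add((-y, -x))
    return sum((x, y) not in adjacencies_b
               for x, y in zip(framed_a, framed_a[1:]))
```

For example, inverting the segment 2, 3 of the identity order 1, 2, 3, 4 gives 1, -3, -2, 4, which has two breakpoints relative to the identity, one at each end of the inverted segment.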
An Algorithm for Approximate Tandem Repeats
 In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science
, 1993
Abstract

Cited by 75 (2 self)
A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g. abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g. abcdaacd.
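The two definitions above can be checked on the abstract's own examples with a brute-force sketch. It measures similarity with unit-cost edit distance and tries every split point. Both choices are assumptions for illustration, and this is not the algorithm of the paper. The function names are hypothetical.

```python
def is_perfect_tandem_repeat(s):
    """True if s is non-empty and its two halves are identical."""
    half, remainder = divmod(len(s), 2)
    return len(s) > 0 and remainder == 0 and s[:half] == s[half:]


def edit_distance(u, v):
    """Unit-cost edit distance, keeping only one previous row of the table."""
    prev = list(range(len(v) + 1))
    for i, cu in enumerate(u, 1):
        cur = [i]
        for j, cv in enumerate(v, 1):
            cur.append(min(prev[j] + 1,                # delete cu
                           cur[j - 1] + 1,             # insert cv
                           prev[j - 1] + (cu != cv)))  # match or substitute
        prev = cur
    return prev[-1]


def best_split(s):
    """Return (distance, k) minimizing edit_distance(s[:k], s[k:]); needs len(s) >= 2."""
    return min((edit_distance(s[:k], s[k:]), k) for k in range(1, len(s)))
```

On `abcdaacd` the best split is in the middle, where `abcd` and `aacd` differ by a single substitution.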
Divide-and-conquer frontier search applied to optimal sequence alignment
 In National Conference on Artificial Intelligence (AAAI)
, 2000
Abstract

Cited by 48 (5 self)
We present a new algorithm that reduces the space complexity of heuristic search. It is most effective for problem spaces that grow polynomially with problem size, but contain large numbers of short cycles. For example, the problem of finding an optimal global alignment of several DNA or amino-acid sequences can be solved by finding a lowest-cost corner-to-corner path in a d-dimensional grid. A previous algorithm, called divide-and-conquer bidirectional search (Korf 1999), saves memory by storing only the Open lists and not the Closed lists. We show that this idea can be applied in a unidirectional search as well. This extends the technique to problems where bidirectional search is not applicable, and is more efficient in both time and space than the bidirectional version. If n is the length of the strings, and d is the number of strings, this algorithm can reduce the memory requirement from O(n^d) to O(n^(d-1)). While our current implementation of DCFS is somewhat slower than existing dynamic programming approaches for optimal alignment of multiple gene sequences, DCFS is a more general algorithm.
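The grid formulation in this abstract can be illustrated with a small memoized search for the lowest-cost corner-to-corner path. The sketch assumes a sum-of-pairs column cost with unit mismatch and gap penalties, neither of which is specified in the abstract. It also stores every visited grid cell, so it uses the O(n^d) memory that DCFS is designed to avoid. The name `msa_cost` is illustrative, and the recursion is only meant for short strings.

```python
from functools import lru_cache
from itertools import product


def msa_cost(strings, mismatch=1, gap=1):
    """Lowest sum-of-pairs cost of a global alignment of the given strings."""
    d = len(strings)
    goal = tuple(len(s) for s in strings)
    # A move advances a non-empty subset of the d strings by one character.
    moves = [m for m in product((0, 1), repeat=d) if any(m)]

    def column_cost(pos, move):
        # Characters placed in this column; None stands for a gap.
        chars = [strings[i][pos[i]] if move[i] else None for i in range(d)]
        cost = 0
        for i in range(d):
            for j in range(i + 1, d):
                a, b = chars[i], chars[j]
                if a is None and b is None:
                    continue          # gap against gap costs nothing
                if a is None or b is None:
                    cost += gap
                elif a != b:
                    cost += mismatch
        return cost

    @lru_cache(maxsize=None)
    def best(pos):
        # Cheapest path from grid cell pos to the far corner.
        if pos == goal:
            return 0
        options = []
        for move in moves:
            nxt = tuple(p + m for p, m in zip(pos, move))
            if all(x <= g for x, g in zip(nxt, goal)):
                options.append(column_cost(pos, move) + best(nxt))
        return min(options)

    return best(tuple([0] * d))
```

Each cell has up to 2^d - 1 outgoing moves, which is where the many short cycles mentioned in the abstract come from when the grid is viewed as an undirected search space.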
Elimination Methods
, 2000
Abstract

Cited by 44 (4 self)
As pointed out by Duarte and Pyle (1), the two-dimensional (2D) η-θ plot is a Ramachandran-like diagram that can provide a graphical representation of quantitatively distinct structural features for analyzing and modeling RNA three-dimensional (3D) structures. In particular, they showed that on this η-θ plot, clusters of nucleotides with similar η and θ pseudotorsional angles have similar conformational properties and vice versa. To depict this η-θ plot, we prepared a dataset that includes non-redundant crystal structures with minimum resolution of 3.0 Å from the PDB database (2). This dataset contains 117 crystal RNA structures, including 74 structures used by Wadley et al. (3), with 9,527 nucleotides in total. We then used AMIGOS, which was developed by Duarte and Pyle (1), to calculate the η and θ pseudotorsion angles for all non-terminal nucleotides (9,267 nt in total) from all RNA molecules in the above dataset and plotted these calculated pseudotorsion angles on the axes of a 2D plot, as illustrated in Figure 1. Instead of using the vector quantization (VQ) approach as in our previous work (4), we here applied the so-called affinity propagation (AP) clustering algorithm, recently introduced by Frey and Dueck (5), to classify all the non-terminal nucleotides in our prepared
New Approximation Techniques for Some Ordering Problems
 In 9th ACM-SIAM Symposium on Discrete Algorithms
, 1998
Abstract

Cited by 40 (1 self)
We describe logarithmic times optimal approximation algorithms for the NP-hard graph optimization problems of minimum linear arrangement, minimum containing interval graph, and minimum storage-time product. This improves on the best previous approximation bounds of Even, Naor, Rao, and Schieber for these problems by an Ω(log log n) factor. Even, Naor, Rao, and Schieber defined "spreading metrics" for each of the ordering problems above (and for other problems); for each of these problems, they provided a spreading metric of volume W, such that W is a lower bound on the cost of a solution to the problem. They used this spreading metric to find a solution of cost O(W log n log log n) (for simplicity, assume that all tasks have unit processing time in the minimum storage-time product problem). In this paper, we show how to find a solution within a logarithmic factor times W for these problems. We develop a recursion where at each level we identify cost which, if incurred, yi...
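As a reminder of what is being approximated, the objective of minimum linear arrangement is the total edge length when the vertices are placed at positions 0, 1, ..., n-1. The sketch below evaluates that cost and finds an optimum by exhaustive search, which is feasible only for a handful of vertices. It is unrelated to the spreading-metric technique of the paper, and both function names are illustrative.

```python
from itertools import permutations


def arrangement_cost(order, edges):
    """Sum over edges (u, v) of |position(u) - position(v)| in the given order."""
    position = {v: i for i, v in enumerate(order)}
    return sum(abs(position[u] - position[v]) for u, v in edges)


def brute_force_mla(vertices, edges):
    """Return an optimal ordering by trying all n! permutations."""
    return min(permutations(vertices),
               key=lambda order: arrangement_cost(order, edges))
```

For a star with center c and three leaves, the best layouts put c in an interior position, giving cost 1 + 1 + 2 = 4.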
Finding an optimal inversion median: experimental results
 In Proc. 1st Workshop on Algs. in Bioinformatics WABI 2001
, 2001
Abstract

Cited by 25 (10 self)
We derive a branch-and-bound algorithm to find an optimal inversion median of three signed permutations. The algorithm prunes to manageable size an extremely large search tree using simple geometric properties of the problem and a newly available linear-time routine for inversion distance. Our experiments on simulated data sets indicate that the algorithm finds optimal medians in reasonable time for genomes of medium size when distances are not too large, as commonly occurs in phylogeny reconstruction. In addition, we have compared inversion and breakpoint medians, and found that inversion medians generally score significantly better and tend to be far more unique, which should make them valuable in median-based tree-building algorithms.
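The inversion median problem can be stated in a few lines of code for very small genomes. The sketch below computes inversion distances by breadth-first search over all signed permutations and then picks the permutation minimizing the summed distance to the three inputs. It is exhaustive (2^n n! states), so it only illustrates the definition. It is neither the branch-and-bound search nor the linear-time distance routine mentioned in the abstract, and all function names are assumptions.

```python
from collections import deque


def invert(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of each element."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def distances_from(source):
    """BFS inversion distance from source to every signed permutation."""
    dist = {source: 0}
    queue = deque([source])
    n = len(source)
    while queue:
        p = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                q = invert(p, i, j)
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    return dist


def inversion_median(a, b, c):
    """Return (median, score) minimizing d(m, a) + d(m, b) + d(m, c)."""
    da, db, dc = (distances_from(x) for x in (a, b, c))
    best = min(da, key=lambda p: da[p] + db[p] + dc[p])
    return best, da[best] + db[best] + dc[best]
```

Inputs are tuples of signed integers of equal length, for example `inversion_median((1, 2, 3), (1, -2, 3), (1, 2, 3))`.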