A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental Study
- Journal of Computational Biology, 2001
Cited by 99 (15 self)
Hannenhalli and Pevzner gave the first polynomial-time algorithm for computing the inversion distance between two signed permutations, as part of the larger task of determining the shortest sequence of inversions needed to transform one permutation into the other. Their algorithm (restricted to distance calculation) proceeds in two stages: in the first stage, the overlap graph induced by the permutation is decomposed into connected components; then, in the second stage, certain graph structures (hurdles and others) are identified. Berman and Hannenhalli avoided the explicit computation of the overlap graph and gave an O(n alpha(n)) algorithm, based on a Union-Find structure, to find its connected components, where alpha(n) is the inverse Ackermann function. Since for all practical purposes alpha(n) is a constant no larger than four, this algorithm has been the fastest practical algorithm to date. In this paper, we present a new linear-time algorithm for computing the connected components, which is more efficient than that of Berman and Hannenhalli in both theory and practice. Our algorithm uses only a stack and is very easy to implement. We give the results of computational experiments over a large range of permutation pairs produced through simulated evolution; our experiments show a speed-up by a factor of 2 to 5 in the computation of the connected components and by a factor of 1.3 to 2 in the overall distance computation.
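To make the notion of inversion distance in the abstract above concrete, the following is a minimal Python sketch. It assumes the standard definition of an inversion on a signed permutation (reverse a contiguous segment and negate every element in it). The names invert and inversion_distance_bruteforce are hypothetical, and the breadth-first search is an exponential brute force suitable only for very small permutations; it is not the linear-time algorithm of the paper, nor the Hannenhalli-Pevzner or Berman-Hannenhalli method.

```python
from collections import deque


def invert(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of each element."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def inversion_distance_bruteforce(src, dst):
    """Fewest inversions turning src into dst, found by breadth-first search.

    Illustrative only: the state space has 2^n * n! signed permutations,
    so this is usable for tiny n and nothing more.
    """
    src, dst = tuple(src), tuple(dst)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == dst:
            return dist
        n = len(perm)
        # Try every contiguous segment [i, j] as the next inversion.
        for i in range(n):
            for j in range(i, n):
                nxt = invert(perm, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("dst cannot be reached from src by inversions")
```

For example, invert((1, -3, -2, 4), 1, 2) yields (1, 2, 3, 4), so those two signed permutations are at inversion distance 1.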
Multiple Genome Rearrangement and Breakpoint Phylogeny
- 1998
Cited by 69 (9 self)
Multiple alignment of macromolecular sequences generalizes from N = 2 to N ≥ 3 the comparison of N sequences which have diverged through the local processes of insertion, deletion and substitution. Gene-order sequences diverge through non-local genome rearrangement processes such as inversion (or reversal) and transposition. In this paper we show which formulations of multiple alignment have counterparts in multiple rearrangement. Based on difficulties inherent in rearrangement edit-distance calculation and interpretation, we argue for the simpler "breakpoint analysis". Consensus-based multiple rearrangement of N ≥ 3 orders can be solved exactly through reduction to instances of the Travelling Salesman Problem (TSP). We propose a branch-and-bound solution to TSP particularly suited to these instances. Simulations show how non-uniqueness of the solution is attenuated with increasing numbers of data genomes. Tree-based multiple alignment can be achieved to a great degree o...
An Algorithm for Approximate Tandem Repeats
- In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science, 1993
Cited by 68 (2 self)
A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g. abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g. abcdaacd.
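The two definitions above can be illustrated with a short Python sketch. It assumes that "similar" is measured by unit-cost edit distance, which is an assumption made here for illustration; the abstract does not state the paper's scoring scheme. The function names are hypothetical, and the quadratic scan over split points is a naive baseline, not the algorithm of the paper.

```python
def is_perfect_tandem_repeat(s: str) -> bool:
    """True if s is nonempty and its two halves are identical."""
    half, rem = divmod(len(s), 2)
    return len(s) > 0 and rem == 0 and s[:half] == s[half:]


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance, keeping only one row of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


def best_split(s: str):
    """Return (distance, k) for the split s[:k] | s[k:] whose two parts
    are closest in edit distance. Requires len(s) >= 2."""
    return min((edit_distance(s[:k], s[k:]), k) for k in range(1, len(s)))
```

On the examples from the abstract, is_perfect_tandem_repeat("abcabc") is True, and best_split("abcdaacd") returns (1, 4): the halves abcd and aacd differ by a single substitution.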
Within the Twilight Zone: A Sensitive Profile-Profile Comparison Tool Based on Information Theory
- J. Mol. Biol, 2002
Cited by 68 (4 self)
This paper presents a novel approach to profile-profile comparison. The method compares two input profiles (like those that are generated by PSI-BLAST) and assigns a similarity score to assess their statistical similarity. Our profile-profile comparison tool, which allows for gaps, can be used to detect weak similarities between protein families. It has also been optimized to produce alignments that are in very good agreement with structural alignments. Tests show that the profile-profile alignments are indeed highly correlated with similarities between secondary structure elements and tertiary structure. Exhaustive evaluations show that our method is significantly more sensitive in detecting distant homologies than the popular profile-based search programs PSI-BLAST and IMPALA. The relative improvement is the same order of magnitude as the improvement of PSI-BLAST relative to BLAST. Our new tool often detects similarities that fall within the twilight zone of sequence similarity.
Divide-and-conquer frontier search applied to optimal sequence alignment
- In National Conference on Artificial Intelligence (AAAI), 2000
Cited by 41 (4 self)
We present a new algorithm that reduces the space complexity of heuristic search. It is most effective for problem spaces that grow polynomially with problem size, but contain large numbers of short cycles. For example, the problem of finding an optimal global alignment of several DNA or amino-acid sequences can be solved by finding a lowest-cost corner-to-corner path in a d-dimensional grid. A previous algorithm, called divide-and-conquer bidirectional search (Korf 1999), saves memory by storing only the Open lists and not the Closed lists. We show that this idea can be applied in a unidirectional search as well. This extends the technique to problems where bidirectional search is not applicable, and is more efficient in both time and space than the bidirectional version. If n is the length of the strings, and d is the number of strings, this algorithm can reduce the memory requirement from O(n^d) to O(n^(d-1)). While our current implementation of DCFS is somewhat slower than existing dynamic programming approaches for optimal alignment of multiple gene sequences, DCFS is a more general algorithm.
New Approximation Techniques for Some Ordering Problems
- In 9th ACM-SIAM Symposium on Discrete Algorithms, 1998
Cited by 37 (1 self)
We describe logarithmic times optimal approximation algorithms for the NP-hard graph optimization problems of minimum linear arrangement, minimum containing interval graph, and minimum storage-time product. This improves on the best previous approximation bounds of Even, Naor, Rao, and Schieber for these problems by an Omega(log log n) factor. Even, Naor, Rao, and Schieber defined "spreading metrics" for each of the ordering problems above (and for other problems); for each of these problems, they provided a spreading metric of volume W, such that W is a lower bound on the cost of a solution to the problem. They used this spreading metric to find a solution of cost O(W log n log log n) (for simplicity, assume that all tasks have unit processing time in the minimum storage-time product problem). In this paper, we show how to find a solution within a logarithmic factor times W for these problems. We develop a recursion where at each level we identify cost which, if incurred, yi...
Finding an optimal inversion median: experimental results
- In Proc. 1st Workshop on Algs. in Bioinformatics WABI 2001, 2001
Cited by 23 (10 self)
We derive a branch-and-bound algorithm to find an optimal inversion median of three signed permutations. The algorithm prunes to manageable size an extremely large search tree using simple geometric properties of the problem and a newly available linear-time routine for inversion distance. Our experiments on simulated data sets indicate that the algorithm finds optimal medians in reasonable time for genomes of medium size when distances are not too large, as commonly occurs in phylogeny reconstruction. In addition, we have compared inversion and breakpoint medians, and found that inversion medians generally score significantly better and tend to be far more unique, which should make them valuable in median-based tree-building algorithms.
Parallel Dynamic Programming for Solving the String Editing Problem on a CGM/BSP
- 2002
Cited by 18 (5 self)
In this paper we present a coarse-grained parallel algorithm for solving the string edit distance problem for a string A and all substrings of a string C. Our method is based on a novel CGM/BSP parallel dynamic programming technique for computing all highest scoring paths in a weighted grid graph. The algorithm requires log p rounds/supersteps and O( p log m) local computation, where p is the number of processors, p n. To our knowledge, this is the first efficient CGM/BSP algorithm for the alignment of all substrings of C with A. Furthermore, the CGM/BSP parallel dynamic programming technique presented is of interest in its own right and we expect it to lead to other parallel dynamic programming methods for the CGM/BSP.

