## Near Optimal Multiple Alignment Within a Band In Polynomial Time (2000)

Venue: In Proc. of 32nd ACM STOC

Citations: 8 (0 self)

### BibTeX

    @INPROCEEDINGS{Li00nearoptimal,
        author = {Ming Li and Bin Ma and Lusheng Wang},
        title = {Near Optimal Multiple Alignment Within a Band In Polynomial Time},
        booktitle = {In Proc. of 32nd ACM STOC},
        year = {2000},
        pages = {425--434},
        publisher = {Cambridge University Press}
    }

### Abstract

Multiple sequence alignment is one of the most important problems in computational biology. Because of its notorious difficulty, aligning sequences within a constant band is a popular practice in bioinformatics with good results [17; 13; 14; 15; 1; 3; 6; 20; 18]. However, the problem is still NP-hard for multiple sequences. In this paper, we present polynomial time approximation schemes (PTAS) for multiple sequence alignment within a constant band, under the standard models of SP alignment and consensus (star) alignment. The algorithms work for very general score schemes. In order to prove our main results, we also present a PTAS for SP alignment and a PTAS for consensus alignment, allowing only a constant number of insertion and deletion gaps (of arbitrary length) per sequence on average.
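To make the band restriction concrete, here is a minimal sketch of the pairwise case only. It is not the paper's PTAS, which handles many sequences. The function name `banded_alignment_cost`, the unit mismatch and gap costs, and the band condition `|i - j| <= band` are assumptions chosen for illustration.

```python
def banded_alignment_cost(s, t, band, mismatch=1, gap=1):
    """Minimum alignment cost of s and t using only DP cells with |i - j| <= band.

    Illustrative assumption: a match costs 0, a mismatch costs `mismatch`,
    and each inserted or deleted letter costs `gap`.
    """
    INF = float("inf")
    n, m = len(s), len(t)
    if abs(n - m) > band:
        return INF  # the final cell (n, m) lies outside the band

    # Row 0: aligning the empty prefix of s with t[:j] costs j gaps (inside the band only).
    prev = [j * gap if j <= band else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)  # cells outside the band stay at INF
        lo, hi = max(0, i - band), min(m, i + band)
        for j in range(lo, hi + 1):  # only O(band) cells are evaluated per row
            if j == 0:
                cur[j] = i * gap
                continue
            diagonal = prev[j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = min(diagonal, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]


if __name__ == "__main__":
    print(banded_alignment_cost("kitten", "sitting", 1))
    print(banded_alignment_cost("abc", "abd", 0))
```

With `band=0` the sketch allows no gaps at all, so it degenerates to a position-by-position comparison of equal-length strings. Widening the band admits more gaps at the price of more cells per row.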

### Citations

1870 | Randomized Algorithms
- Motwani, Raghavan
- 1995
Citation Context: ... $a \in \Sigma$, it is easy to see that $h_r(a)$ is the sum of $r$ independent Poisson trials with success probability $p_a = \frac{h_k(a)}{k}$. Thus, $E[h_r(a)] = p_a r = \frac{r}{k} h_k(a)$. Therefore, by Chernoff's bound [10], for any $0 < \epsilon \le 1$, $\Pr\left(\frac{r}{k} h_k(a) - h_r(a) > \epsilon r\right) \le \exp\left(-\frac{\epsilon^2 r}{2 p_a}\right) \le \exp\left(-\frac{\epsilon^2 r}{2}\right)$. (20) Moreover, it is easy to see that $\frac{r}{k} h_k(a) - h_r(a) \le r$. Combining with Formul...

1047 | Improved tools for biological sequence comparison
- Pearson, Lipman
- 1988
Citation Context: ...one of the most important problems in computational biology. Because of its notorious difficulties, aligning sequences within a constant band is a popular practice in bioinformatics with good results [17; 13; 14; 15; 1; 3; 6; 20; 18]. However, the problem is still NP-hard for multiple sequences. In this paper, we present polynomial time approximation schemes (PTAS) for multiple sequence alignment within a constant band, under sta...

901 | Algorithms on Strings, Trees and Sequences
- Gusfield
- 1998
Citation Context: ...deletion gaps (of arbitrary length) per sequence on the average. 1. INTRODUCTION Multiple sequence alignment is one of the fundamental and most challenging problems in computational molecular biology [1; 17; 4; 5; 24]. It plays an essential role in finding similarity and highly conserved subregions among a set of biological sequences. Many objective functions have been proposed for measuring the quality of the ali...

571 | Optimization, approximation and complexity classes
- Papadimitriou, Yannakakis
- 1988
Citation Context: ...etween the letters a and b. We reduce Maximum Cut-3 to 0-Gap SP Alignment. Maximum Cut-3 asks for a maximum cut of a graph $G$ whose every node has degree no more than 3. It is known to be Max SNP-hard [12] and hence NP-hard. Let $G = \langle V, E \rangle$ be an instance of Maximum Cut-3, where $G$ is an undirected graph, $V = \{v_1, v_2, \ldots, v_n\}$ and $E = \{e_1, e_2, \ldots, e_m\}$. Let the alphabet $\Sigma = \{0, 1, x\}$. We ...

382 | Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison
- Sankoff, Kruskal
- 1983
Citation Context: ...one of the most important problems in computational biology. Because of its notorious difficulties, aligning sequences within a constant band is a popular practice in bioinformatics with good results [17; 13; 14; 15; 1; 3; 6; 20; 18]. However, the problem is still NP-hard for multiple sequences. In this paper, we present polynomial time approximation schemes (PTAS) for multiple sequence alignment within a constant band, under sta...

265 | Introduction to computational biology
- Waterman
- 2000
Citation Context: ...deletion gaps (of arbitrary length) per sequence on the average. 1. INTRODUCTION Multiple sequence alignment is one of the fundamental and most challenging problems in computational molecular biology [1; 17; 4; 5; 24]. It plays an essential role in finding similarity and highly conserved subregions among a set of biological sequences. Many objective functions have been proposed for measuring the quality of the ali...

231 | On the complexity of multiple sequence alignment
- Wang, Jiang
- 1994
Citation Context: ... $S_n$ and is called the median sequence. Consensus Alignment is NP-hard for a score scheme where a mismatch costs 1 and a match costs 0 [9]. The problem is MAX SNP-hard if the score scheme is arbitrary [21]. The best known approximation algorithm for Consensus Alignment has performance ratio $2 - o(1)$ [5]. SP alignment has been extensively studied recently. With much effort, the best known performan...

187 | Algorithms for approximate string matching
- Ukkonen
- 1985
Citation Context: ...one of the most important problems in computational biology. Because of its notorious difficulties, aligning sequences within a constant band is a popular practice in bioinformatics with good results [17; 13; 14; 15; 1; 3; 6; 20; 18]. However, the problem is still NP-hard for multiple sequences. In this paper, we present polynomial time approximation schemes (PTAS) for multiple sequence alignment within a constant band, under sta...

132 | A Tool for Multiple Sequence Alignment
- Lipman, Altschul, et al.
- 1989

108 | Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics
- Pearson
- 1991

101 | Efficient methods for multiple sequence alignment with guaranteed error
- Gusfield
- 1993
Citation Context: ...deletion gaps (of arbitrary length) per sequence on the average. 1. INTRODUCTION Multiple sequence alignment is one of the fundamental and most challenging problems in computational molecular biology [1; 17; 4; 5; 24]. It plays an essential role in finding similarity and highly conserved subregions among a set of biological sequences. Many objective functions have been proposed for measuring the quality of the ali...

52 | Approximation algorithms for multiple sequence alignment
- Bafna, Lawler, et al.
- 1994
Citation Context: ...ly studied recently. With much effort, the best known performance ratio for SP alignment has been improved from $2 - \frac{2}{k}$ to $2 - \frac{l}{k}$ for any constant $l$, where $k$ is the number of the sequences [4; 16; 2]. The $2 - o(1)$ barrier appears to be formidable. There is an enormous literature and various other models, methods, and heuristics on multiple sequence alignment for which we refer the reader to ...

44 | Approximation algorithms for tree alignment with a given phylogeny, Algorithmica 16
- Wang, Jiang, et al.
- 1996
Citation Context: ...equality [17]. To our knowledge, all proposed approximation algorithms with guaranteed performance ratios either explicitly or implicitly assume that the score schemes satisfy the triangle inequality [4; 16; 2; 22; 23; 25]. In [25], score schemes do not have to satisfy triangle inequality. However, since arbitrary number of intermediate sequences (nodes) are allowed to be added between any two sequences assigned to the...

42 | A Polynomial Time Approximation Scheme for Minimum Routing Cost Spanning Trees
- Wu, Lancia, et al.
- 1998
Citation Context: ...equality [17]. To our knowledge, all proposed approximation algorithms with guaranteed performance ratios either explicitly or implicitly assume that the score schemes satisfy the triangle inequality [4; 16; 2; 22; 23; 25]. In [25], score schemes do not have to satisfy triangle inequality. However, since arbitrary number of intermediate sequences (nodes) are allowed to be added between any two sequences assigned to the...

41 | Improved approximation algorithms for tree alignment
- Wang, Gusfield
Citation Context: ...equality [17]. To our knowledge, all proposed approximation algorithms with guaranteed performance ratios either explicitly or implicitly assume that the score schemes satisfy the triangle inequality [4; 16; 2; 22; 23; 25]. In [25], score schemes do not have to satisfy triangle inequality. However, since arbitrary number of intermediate sequences (nodes) are allowed to be added between any two sequences assigned to the...

32 | Multiple alignment, communication cost, and graph matching
- Pevzner
- 1992
Citation Context: ...ly studied recently. With much effort, the best known performance ratio for SP alignment has been improved from $2 - \frac{2}{k}$ to $2 - \frac{l}{k}$ for any constant $l$, where $k$ is the number of the sequences [4; 16; 2]. The $2 - o(1)$ barrier appears to be formidable. There is an enormous literature and various other models, methods, and heuristics on multiple sequence alignment for which we refer the reader to ...

28 | Finding similar regions in many sequences
- Li, Ma, et al.
- 1999
Citation Context: ...$H(S, S_i)$, where $S$ is the majority sequence of $S_1, S_2, \ldots, S_n$ and is called the median sequence. Consensus Alignment is NP-hard for a score scheme where a mismatch costs 1 and a match costs 0 [9]. The problem is MAX SNP-hard if the score scheme is arbitrary [21]. The best known approximation algorithm for Consensus Alignment has performance ratio $2 - o(1)$ [5]. SP alignment has been exten...

26 | Aligning two sequences within a specified diagonal band
- Chao, Pearson, et al.
- 1992

21 | A polyhedral approach to sequence alignment problems
- Kececioglu, Lenhof, et al.
Citation Context: ... $o(1)$ barrier appears to be formidable. There is an enormous literature and various other models, methods, and heuristics on multiple sequence alignment for which we refer the reader to [5; 24], and [8] for some recent developments. In this paper, we are interested in theoretically resolving a popular special case of the problem: multiple alignment within a band. The restriction of aligning within c...

19 | Progressive multiple alignment with constraints
- Myers, Selznick, et al.
- 1997

18 | A Course in Probability and Statistics
- Stone
- 1995
Citation Context: ...the sum of $r$ independent 0-1 random variables, each taking 1 with probability $\frac{x''_{j,a}}{n}$, i.e., $\xi_{j,a}$ satisfies a binomial distribution $B\left(r, \frac{x''_{j,a}}{n}\right)$. By a simple property of binomial distribution [19], $\mathrm{var}(\xi_{j,a}) = E\left(\xi_{j,a} - r \times \frac{x''_{j,a}}{n}\right)^2 = r \times \frac{x''_{j,a}}{n} \times \left(1 - \frac{x''_{j,a}}{n}\right)$. (4) Multiplying Formula (4) by $\left(\frac{n}{r}\right)^2$, we get $E\left[(\tilde{x}_{j,a} - x''_{j,a})^2\right] = \frac{1}{r}$ ...

17 | Fast optimal alignment
- Fickett
- 1984

7 | Fast optimal alignment
- Spouge
- 1991

3 | On the computational complexity of gap-0 multiple alignment, manuscript
- Just
- 1998
Citation Context: ...ULTS Theorem 1. 0-Gap SP Alignment is NP-hard. Proof. (Sketch) W. Just has recently proved that 0-Gap SP Alignment is NP-hard for the case where the possible gaps are at the two ends of the sequences [7]. The score scheme he used satisfies triangle inequality. However, his result does not imply the NP-hardness for the $w_{a,b} \in \{0, 1\}$ score scheme, where $w_{a,b}$ is the score between the letters $a$ and $b$. W...

1 | Rapid and sensitive comparison with FASTP
- Pearson
- 1990