## A polyhedral approach to sequence alignment problems (2000)

Venue: | DISCRETE APPL. MATH |

Citations: | 21 - 1 self |

### BibTeX

@ARTICLE{Kececioglu00apolyhedral,

author = {John D. Kececioglu and Hans-Peter Lenhof and Kurt Mehlhorn and Petra Mutzel and Knut Reinert and Martin Vingron},

title = {A polyhedral approach to sequence alignment problems},

journal = {DISCRETE APPL. MATH},

year = {2000},

pages = {143--186}

}

### Abstract

We study two new problems in sequence alignment both from a practical and a theoretical view, using tools from combinatorial optimization to develop branch-and-cut algorithms. The Generalized Maximum Trace formulation captures several forms of multiple sequence alignment problems in a common framework, among them the original formulation of Maximum Trace. The RNA Sequence Alignment Problem captures the comparison of RNA molecules on the basis of their primary sequence and their secondary structure. Both problems have a characterization in terms of graphs which we reformulate in terms of integer linear programming. We then study the polytopes (or convex hulls of all feasible solutions) associated with the integer linear program for both problems. For each polytope we derive several classes of facet-defining inequalities and show that for some of these classes the corresponding separation problem can be solved in polynomial time. This leads to a polynomial time algorithm for pairwise sequence alignment that is not based on dynamic programming. Moreover, for multiple sequences the branch-and-cut algorithms for both sequence alignment problems are able to solve to optimality instances that are beyond the range of present dynamic programming approaches.
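The trace concept behind the Maximum Trace formulation can be illustrated for the two-sequence case. The following is a minimal sketch, not the authors' branch-and-cut method: it treats a set of alignment edges between two sequences as a trace when no two edges share a character or cross. The names `is_trace` and `max_weight_trace`, the unit edge weights, and the brute-force search are assumptions made for illustration only; the example edges connect equal characters of the sequences CGC and GCGU.

```python
from itertools import combinations


def is_trace(edges):
    """Return True if no two edges share an endpoint or cross.

    Each edge is a pair (i, j): position i in the first sequence is
    aligned with position j in the second sequence.
    """
    for (i1, j1), (i2, j2) in combinations(edges, 2):
        # Compatible edges are strictly ordered the same way in both
        # sequences; a product <= 0 means a shared endpoint or a crossing.
        if (i1 - i2) * (j1 - j2) <= 0:
            return False
    return True


def max_weight_trace(weights):
    """Brute-force the heaviest trace (exponential time, tiny inputs only).

    `weights` maps each candidate edge (i, j) to its score.
    """
    edges = list(weights)
    best, best_score = (), 0
    for size in range(1, len(edges) + 1):
        for subset in combinations(edges, size):
            if is_trace(subset):
                score = sum(weights[e] for e in subset)
                if score > best_score:
                    best, best_score = subset, score
    return set(best), best_score


# Hypothetical candidate edges between "CGC" and "GCGU" (equal characters).
weights = {(0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 1): 1}
best, score = max_weight_trace(weights)
print(sorted(best), score)
```

The exhaustive search is only meant to make the feasibility condition concrete; the abstract's point is that the pairwise case can be solved in polynomial time and that larger instances call for cutting planes rather than enumeration.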

### Citations

1601 |
A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins
- Needleman, Wunsch
- 1970
Citation Context ...e sequences in order to exhibit their commonalities. It is interesting that while the diversity of alignment problems and their associated algorithms has grown tremendously since Needleman and Wunsch =-=[28]-=- first published their paper on two-sequence alignment in 1970, most alignment problems that have been studied have been solved by dynamic programming. This technique, while quite powerful, has the dr... |

859 |
Amino acid substitution matrices from protein blocks
- Henikoff, Henikoff
- 1992
Citation Context ...es 8 edges (and therefore 8 singleton sets). - Most commonly-used scoring schemes are based on the similarity of single pairs of characters (for instance Dayhoff et al. [5] or Henikoff and Henikoff =-=[14]-=-). This corresponds to a partition of the edges into singleton sets as in Figure 2 and is equivalent to the original MT formulation. It is worth noting that the singleton case includes as a special ca... |

421 |
Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison
- Sankoff, Kruskal
- 1983
Citation Context ... differ only in their arrangement of unaligned regions. The notion of a trace of two strings as illustrated in Figure 1 is a basic concept in sequence comparison (see for instance Sankoff and Kruskal =-=[36]-=- pp. 10-18) which Kececioglu [19] generalized to multiple sequence alignment with the notion of a trace of an alignment graph. The relationship between multiple alignment and multipartite graphs was a... |

396 |
A model of evolutionary change in proteins
- Dayhoff, Schwarz, et al.
- 1978
Citation Context ...ght is an alignment that realizes 8 edges (and therefore 8 singleton sets). - Most commonly-used scoring schemes are based on the similarity of single pairs of characters (for instance Dayhoff et al. =-=[5]-=- or Henikoff and Henikoff [14]). This corresponds to a partition of the edges into singleton sets as in Figure 2 and is equivalent to the original MT formulation. It is worth noting that the singlet... |

386 |
The ellipsoid method and its consequences in combinatorial optimization
- Grötschel, Lovász, et al.
- 1981
Citation Context ...of inequalities in the complete description of the MT polytope is exponential, we will show how to solve the separation problem for it in polynomial time. According to Grötschel, Lovász and Schrijver =-=[10]-=- this implies that the associated relaxed optimization problem can be solved in polynomial time. This equivalence of optimization and separation leads to a polynomial time algorithm for the MT problem... |

273 | RNA sequence analysis using covariance models
- Eddy, Durbin, et al.
- 1994
Citation Context ...programming algorithm to find motifs among many RNA sequences. Instead of using dynamic programming the algorithm of Waterman [40] searches for common motifs among several sequences. Eddy and Durbin =-=[7]-=- describe probabilistic models for measuring the secondary structure and primary sequence consensus of RNA sequence families. They present algorithms for analyzing and comparing RNA sequences as wel... |

242 |
Simultaneous solution of the RNA folding, alignment and protosequence problems
- Sankoff
- 1985
Citation Context ...sequences can yield convincing evidence for how an RNA molecule folds. The computational problem of considering sequence and structure of an RNA molecule simultaneously was first addressed by Sankoff =-=[35]-=- who proposed a dynamic programming algorithm that aligns a set of RNA sequences while at the same time predicting their common fold. Algorithms similar in spirit were proposed later on for the proble... |

136 |
Theory of linear and integer programming. Wiley-Interscience series in discrete mathematics and optimization
- Schrijver
- 1998
Citation Context ...is exponential in the number of sequences in the input. We study a new approach to solving sequence alignment problems based on an area of combinatorial optimization known as polyhedral combinatorics =-=[37,29]-=-. We demonstrate how this approach when applied to the Generalized Maximum Trace (GMT) and RNA Sequence Alignment (RSA) problems yields an algorithm for each problem that is not based on dynamic progr... |

127 |
The Traveling Salesman Problem
- Jünger, Rinaldi
- 1997
Citation Context ...s combine linear programming with the branch-and-bound paradigm, and are currently the most successful algorithms for solving hard combinatorial problems such as the famous Traveling Salesman Problem =-=[15,2]-=-. We view as one of the contributions of our work the introduction of the polyhedral approach to the area of sequence alignment, and our experience with these relatively new techniques has helped us t... |

123 | Multiple DNA and protein sequence alignment based on segment-to-segment comparison
- Morgenstern, Dress, et al.
- 1996

110 | Finding the most significant common sequence and structure motifs in a set of RNA sequences
- Gorodkin, Heyer, et al.
- 1997
Citation Context ...gorithm can then be applied to. Bafna et al. [3] improved the dynamic programming algorithm to a running time of $O(n^4)$ which still does not make it applicable to real-life problems. Gorodkin et al. =-=[8]-=- iterate Sankoff’s dynamic programming algorithm to find motifs among many RNA sequences. Instead of using dynamic programming the algorithm of Waterman [40] searches for common motifs among several ... |

105 |
A new algorithm for generating all the maximal independent sets
- Tsukiyama, Ide, et al.
- 1977
Citation Context ...his situation: (1) If the RSA graph is not too dense, we enumerate all maximal extended cliques containing i (or j). With the use of bit vectors and an adaptation of an algorithm by Tsukiyama et al. =-=[38]-=- this can be done in reasonable time for up to 20000-30000 cliques. This cuts off the current infeasible solution and considerably shrinks the enumeration tree in the branching phase. (2) If the RSA g... |

89 |
A cutting plane algorithm for the linear ordering problem
- Grötschel, Jünger, et al.
- 1984
Citation Context ...≤ |E ∪ B| − 2. Thus x(C ∩ E) ≤ ℓ − 1 is not a facet-defining inequality. ✷ 4 The branch-and-cut algorithms Branch-and-cut algorithms have been first applied successfully to the linear ordering problem =-=[9]-=-, and then for the traveling-salesman problem [32]. In the meantime they are applied in many fields of Operations Research and the Natural Sciences. This is the first time that branch-and-cut algor... |

66 | Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment
- Gupta
- 1995
Citation Context ...ow that we are able to solve problem instances to optimality, the size of which is not tractable for dynamic programming based approaches. Sophisticated implementations of such approaches such as MSA =-=[12]-=- or GSA [22] cannot possibly solve non-trivial problem instances of 18 sequences. This is due to the exponential space consumption of dynamic programming. Although both programs can compute an alignme... |

64 |
Optimization of a 532-City Symmetric Traveling Salesman Problem by Branch-and-Cut
- Padberg, Rinaldi
- 1987
Citation Context ...et-defining inequality. ✷ 4 The branch-and-cut algorithms Branch-and-cut algorithms have been first applied successfully to the linear ordering problem [9], and then for the traveling-salesman problem =-=[32]-=-. In the meantime they are applied in many fields of Operations Research and the Natural Sciences. This is the first time that branch-and-cut algorithms are used in the field of Computational Molec... |

63 |
Comparative analysis of multiple protein-sequence alignment methods
- McClure, Vasi, et al.
- 1994
Citation Context ...probability that a random diagonal of the same length $l_d$ has at least the $m_d$ matches. Then $w_d$ is defined to be $-\log P'(l_d, m_d)$. As input sequences we used a subset of the dataset of McClure et al. =-=[24]-=- and two samples of 15 and 18 prion proteins, respectively, from the SWISSPROT database. All tests were conducted on a single processor of a Sun Enterprise 10000. The prion dataset consists of relatively si... |

56 |
LEDA: A Platform for Combinatorial and Geometric Computing
- Mehlhorn, Näher
- 1999
Citation Context ...ubproblems. 5 Computational results In this section we report on the results generated by our program. The implementation is coded in C++ using the library of efficient data types and algorithms LEDA =-=[25]-=- and the branch-and-cut framework ABACUS [17]. 5.1 The GMT problem We tested three different ways to generate the extended alignment graph. • As an example of a scoring scheme based on the comparison ... |

49 | Database on the structure of small ribosomal subunit RNA
- Peer, Nicolai, et al.
- 1996
Citation Context ...equences. For the cases studied the algorithm reproduced the correct alignments. Here we want to present some more challenging examples of 23S ribosomal RNA sequences from the Antwerpen rRNA database =-=[6]-=-. The base pairs for the first sequence were taken from the common secondary structure given in the database. We present results for three sequences taken from the database: (1) Desulfurococcus Mobili... |

44 |
On linear characterization of combinatorial optimization problems
- BERTSIMAS, KARP, et al.
- 1982
Citation Context ... optimization problem over P can be solved via linear programming. However, it is unlikely to find a complete description of a polytope associated with an NP-hard combinatorial optimization problem =-=[18]-=-. But experience has shown that partial descriptions already suffice in order to solve given instances to provable optimality. Concerning the MT polytope for two sequences, we succeeded in finding the... |

41 |
Properties of vertex packing and independence system polyhedra
- Nemhauser, Trotter
- 1974
Citation Context ...0 > 0. The above theorem for full-dimensional polytopes restricts the number of possible facet-defining inequalities. The next theorem will prove useful in the two-sequence case of the GMT. Theorem 5 =-=[30]-=- Suppose F ⊆ A is a maximal clique in the k-regular independence system (A, I). Then $\sum_{e \in F} x_e \le k - 1$ is a facet of $P_I$. For F ⊆ A we call I′ = (F, I′) where I′ = {C ∈ I | C ⊆ F}, the subsystem gen... |

38 | Computing similarity between RNA strings
- Bafna, Muthukrishnan, et al.
- 1995
Citation Context ...s to the basic linear program. With dynamic programming, on the other hand, accommodating variations such as considering secondary structure in sequence alignment, as in Bafna, Muthukrishnan and Ravi =-=[3]-=-, can cause at a minimum a significant restructuring of the basic recurrences. With the polyhedral approach, much of the code developed for the basic problem can be reused for the problem variations; ... |

38 | A polyhedral approach to RNA sequence structure alignment
- Lenhof, Reinert, et al.
- 1998
Citation Context ...hm is a set of reasonable alignment edges. In principle, we use for this purpose alignment edges realized by some suboptimal alignment, i.e. an alignment with a score close to optimal. In contrast to =-=[21]-=- we do not take all the edges realized by any suboptimal alignments scoring better than a fixed threshold s below the optimal. Rather we employ a windowing technique to make the alignment graph denser... |

30 |
RNAlign program: alignment of RNA sequences using both primary and secondary structures
- Corpet, Michot
- 1994
Citation Context ...e at the same time predicting their common fold. Algorithms similar in spirit were proposed later on for the problem of comparing one RNA sequence to one or more of known structure. Corpet and Michot =-=[4]-=- align simultaneously a sequence with a number of other sequences using both primary and secondary structure. Their dynamic programming algorithm requires $O(n^5)$ running time and $O(n^4)$ space (n is ... |

30 | RAGA: RNA sequence alignment by genetic algorithm
- Notredame, O’Brien, et al.
- 1997
Citation Context ...arch techniques. Since the basic operation in their approach is an expensive dynamic programming procedure, their algorithms cannot analyze sequences longer than 150–200 nucleotides. Notredame et al. =-=[31]-=- implemented a genetic algorithm for the optimization of both alignment and structure correspondence between two RNA molecules. Their procedure produces biologically good results although at the expen... |

29 |
Exact and approximation algorithms for DNA sequence reconstruction
- Kececioglu
- 1991
Citation Context ...d newly-discovered classes. Graphs, traces and multiple alignment. To describe the GMT and RSA problem we first review a formulation of multiple alignment in terms of graphs introduced by Kececioglu =-=[19]-=- and show how to extend this formulation to model the two new problems. Fig. 1. (a) An alignment graph of two sequences CGC and GCGU. Edges e, f a... |

24 |
Detailed molecular model for transfer ribonucleic acid
- Levitt
- 1969
Citation Context ...that the known structure carries over to the unknown sequences and as much sequence similarity is maintained as possible. With the prediction of tRNA structure from a set of similar sequences, Levitt =-=[23]-=- had strikingly demonstrated that sets of similar sequences can yield convincing evidence for how an RNA molecule folds. The computational problem of considering sequence and structure of an RNA molec... |

24 |
The context-dependent comparison of biological sequences
- Wilbur, Lipman
- 1984
Citation Context ...ures more general scoring schemes based on the similarity of pairs of whole segments of the sequences pairs (see for instance Altschul and Erickson [1], Morgenstern et al. [26], and Wilbur and Lipman =-=[41]-=-). To illustrate how this can be done, Figure 3 shows a partition into sets of edges that form consecutive runs of matches. Here the edges of a run form a block. Note that the two blocks d5 and d1 e... |

22 | Practical problem solving with cutting plane algorithms in combinatorial optimization
- Jünger, Reinelt, et al.
- 1995
Citation Context ... the Generalized Maximum Trace (GMT) and RNA Sequence Alignment (RSA) problems yields an algorithm for each problem that is not based on dynamic programming but is known as a branch-and-cut algorithm =-=[16]-=-. Branch-and-cut algorithms combine linear programming with the branch-and-bound paradigm, and are currently the most successful algorithms for solving hard combinatorial problems such as the famous T... |

19 | The practical use of the A* algorithm for exact multiple sequence alignment
- Reinert, Lermen
Citation Context ...re able to solve problem instances to optimality, the size of which is not tractable for dynamic programming based approaches. Sophisticated implementations of such approaches such as MSA [12] or GSA =-=[22]-=- cannot possibly solve non-trivial problem instances of 18 sequences. This is due to the exponential space consumption of dynamic programming. Although both programs can compute an alignment of guaran... |

16 |
Locally optimal subalignments using nonlinear similarity functions. Bulletin of Mathematical Biology 48:633–660
- Altschul, Erickson
- 1986
Citation Context ...m-of-pairs multiple alignment problem. GMT also captures more general scoring schemes based on the similarity of pairs of whole segments of the sequences pairs (see for instance Altschul and Erickson =-=[1]-=-, Morgenstern et al. [26], and Wilbur and Lipman [41]). To illustrate how this can be done, Figure 3 shows a partition into sets of edges that form consecutive runs of matches. Here the edges of a run... |

12 |
Consensus methods for folding single-stranded nucleic acids
- Waterman
- 1989
Citation Context ...ble to real-life problems. Gorodkin et al. [8] iterate Sankoff’s dynamic programming algorithm to find motifs among many RNA sequences. Instead of using dynamic programming the algorithm of Waterman =-=[40]-=- searches for common motifs among several sequences. Eddy and Durbin [7] describe probabilistic models for measuring the secondary structure and primary sequence consensus of RNA sequence families. ... |

6 |
Finding cuts
- Applegate, Bixby, et al.
- 1995
Citation Context ...s combine linear programming with the branch-and-bound paradigm, and are currently the most successful algorithms for solving hard combinatorial problems such as the famous Traveling Salesman Problem =-=[15,2]-=-. We view as one of the contributions of our work the introduction of the polyhedral approach to the area of sequence alignment, and our experience with these relatively new techniques has helped us t... |

6 |
The design of the branch and cut system ABACUS
- Jünger, Thienel
- 1997
Citation Context ...ection we report on the results generated by our program. The implementation is coded in C++ using the library of efficient data types and algorithms LEDA [25] and the branch-and-cut framework ABACUS =-=[17]-=-. 5.1 The GMT problem We tested three different ways to generate the extended alignment graph. • As an example of a scoring scheme based on the comparison of two residues (MT) we adapted the PRIMAL [2... |

5 |
Segment-based scores for pairwise and multiple sequence alignments
- Morgenstern, Atchley, et al.
- 1998
Citation Context ...nment problem. GMT also captures more general scoring schemes based on the similarity of pairs of whole segments of the sequences pairs (see for instance Altschul and Erickson [1], Morgenstern et al. =-=[26]-=-, and Wilbur and Lipman [41]). To illustrate how this can be done, Figure 3 shows a partition into sets of edges that form consecutive runs of matches. Here the edges of a run form a block. Note tha... |

5 | A Polyhedral Approach to Sequence Alignment Problems
- Reinert
- 1999
Citation Context ...o on to Section 4 in which the algorithms are described. As a final note the corresponding author would like to point out that a more detailed discussion of the problems in this paper can be found in =-=[34]-=-. 2 A graph-theoretic characterization of traces In this section we give a graph-theoretic characterization of traces in a form that is helpful for expressing the GMT and RSA problem as integer linear... |

4 |
Generalized sequence alignment and duality
- Pevzner, Waterman
- 1993
Citation Context ...paration problem can be solved in polynomial time. This leads to another polynomial time algorithm for pairwise alignment that is not based on dynamic programming techniques (see Pevzner and Waterman =-=[33]-=- for a thorough presentation of primal-dual approach for a number of sequence alignment problems that is also not based on dynamic programming). It turns out that our branch-and-cut algorithms can sol... |

3 | Multiple sequence comparison and consistency on multipartite graphs
- Vingron, Pevzner
- 1995
Citation Context ...neralized to multiple sequence alignment with the notion of a trace of an alignment graph. The relationship between multiple alignment and multipartite graphs was also examined by Vingron and Pevzner =-=[39]-=- in the context of filtering pairwise dot-plots of a set of sequences. The Generalized Maximum Trace Problem. In the Maximum Trace Problem (MT), introduced originally to model the final multiple alig... |

2 |
Primal: Practical rigorous multiple alignment. Computer software
- Kececioglu
- 1996
Citation Context ...7]. 5.1 The GMT problem We tested three different ways to generate the extended alignment graph. • As an example of a scoring scheme based on the comparison of two residues (MT) we adapted the PRIMAL =-=[20]-=- package by John Kececioglu. The value of the approximate solution of this program is used as a lower bound in our branch-and-cut algorithm. • As an example for a scoring scheme based on the compariso... |

1 |
Polyhedral Theory. Wiley-interscience series in discrete mathematics and optimization
- Grötschel, Padberg
- 1985
Citation Context ... k-regular system (A, I) if |F| ≥ k and all $\binom{|F|}{k}$ k-subsets of F are circuits of (A, I). Let $P_I$ be the polyhedron associated with (A, I). Then the following well known theorems hold. Theorem 3 =-=[11]-=- Let (A, I) be an independence system and let F = A − ⋃ I. Then the dimension of $P_I$ is |A| − |F|. This theorem yields a method to determine whether a polytope is full-dimensional. Theorem 4 [13] If PI... |
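The citation context for McClure et al. above defines a diagonal weight as the negative logarithm of the probability that a random diagonal of the same length has at least as many matches. As a rough sketch of that formula, the code below computes the tail probability under an assumed null model in which each position matches independently with a fixed probability. The function name `diagonal_weight` and the parameter `p_match` (default 0.25) are hypothetical; the excerpt does not state which null model the paper uses.

```python
import math


def diagonal_weight(length, matches, p_match=0.25):
    """Return -log P(at least `matches` matches on a diagonal of `length`).

    Assumption (not from the source): positions match independently with
    probability `p_match`, so the match count is binomially distributed.
    """
    tail = sum(
        math.comb(length, k) * p_match**k * (1 - p_match) ** (length - k)
        for k in range(matches, length + 1)
    )
    return -math.log(tail)


# A diagonal with more matches is less likely by chance, so it weighs more.
print(diagonal_weight(10, 4), diagonal_weight(10, 8))
```

Under this assumed model, a diagonal with no required matches has weight zero, and the weight grows as the observed match count becomes less probable.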