## A SAT-Based Approach to Multiple Sequence Alignment (2003)

Venue: Poster, Ninth International Conference on Principles and Practice of Constraint Programming

Citations: 5 (3 self-citations)

### BibTeX

```bibtex
@INPROCEEDINGS{Prestwich03asat-based,
  author    = {Steven Prestwich and Des Higgins},
  title     = {A SAT-Based Approach to Multiple Sequence Alignment},
  booktitle = {Poster, Ninth International Conference on Principles and Practice of Constraint Programming},
  year      = {2003},
  pages     = {940--944}
}
```

### Abstract

Multiple sequence alignment is a central problem in Bioinformatics. A known integer programming approach is to apply branch-and-cut to exponentially large graph-theoretic models. This paper describes a new integer program formulation that generates models small enough to be passed to generic solvers. The formulation is a hybrid relating the sparse alignment graph with a compact encoding of the alignment matrix via channelling constraints. Alignments obtained with a SAT-based local search algorithm are competitive with those of state-of-the-art algorithms, though execution times are much longer.
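To make the link between the alignment graph and the alignment matrix more concrete, here is a minimal Python sketch of scoring one candidate alignment against a sparse, weighted alignment graph. It assumes (a simplification for illustration, not the paper's exact model) that an alignment is given as one column index per symbol, strictly increasing within each sequence, and that an edge counts as realised exactly when both of its symbols fall in the same column. That "edge realised if and only if the columns are equal" relationship is the kind of link channelling constraints express. The names `is_valid_alignment` and `realised_weight` are hypothetical.

```python
from typing import List, Tuple

# A symbol is identified by (sequence index, position within that sequence).
Symbol = Tuple[int, int]
# A weighted alignment-graph edge between symbols of two different sequences.
Edge = Tuple[Symbol, Symbol, float]


def is_valid_alignment(columns: List[List[int]]) -> bool:
    """columns[s][i] is the alignment-matrix column of symbol i of sequence s.
    Symbols keep their order, so each row must be strictly increasing
    (the skipped columns are gaps)."""
    return all(all(a < b for a, b in zip(row, row[1:])) for row in columns)


def realised_weight(columns: List[List[int]], edges: List[Edge]) -> float:
    """Sum the weights of edges whose two endpoints share a column
    (hypothetical channelling: edge realised <=> equal columns)."""
    total = 0.0
    for (s, i), (t, j), w in edges:
        if columns[s][i] == columns[t][j]:
            total += w
    return total


if __name__ == "__main__":
    # Toy sequences "ACG" and "AG", aligned as
    #   A C G
    #   A - G
    columns = [[0, 1, 2], [0, 2]]
    edges = [((0, 0), (1, 0), 2.0),  # A-A
             ((0, 2), (1, 1), 1.5),  # G-G
             ((0, 1), (1, 1), 0.5)]  # C-G, cannot coexist with G-G
    print(is_valid_alignment(columns), realised_weight(columns, edges))
```

The toy run prints `True 3.5`. The C-G edge is not realised because it competes with the G-G edge for the same symbol of the second sequence.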

### Citations

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice (4089 citations)
- Thompson, Higgins, et al.
- 1994

Citation context: "...ing up the alignment gradually, following the branching order in the guide tree. This is very fast even for hundreds of sequences, and the most widely used software is the well-known ClustalW package [11]. The TCoffee package [8] also uses a progressive heuristic but has been shown to be more accurate than ClustalW, at the expense of extra computing time. There are also several methods based on optimi..."

A general method applicable to the search for similarities in the amino acid sequence of two proteins (1636 citations)
- Needleman, Wunsch
- 1970

Citation context: "...nction which use Genetic Algorithms [7] or iteration [2]. These vary in the extent to which they are practical for more than a few sequences or in the quality of the optimisation. Dynamic programming [6] has been used for MSA problems but is known to scale poorly to more than a few sequences. More successful is the Complete Maximum Weight Trace (CMWT) formulation in which the symbols are viewed as ve..."
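For two sequences, the dynamic programming cited here is the Needleman-Wunsch recurrence. A minimal Python sketch of the global alignment score follows. The scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption chosen only for illustration. The table has (m+1)(n+1) cells. The direct generalisation to k sequences of length about n fills roughly (n+1)^k cells, each with 2^k - 1 predecessors, which is why the approach scales poorly beyond a few sequences.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of a and b with a linear gap penalty."""
    m, n = len(a), len(b)
    # dp[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, n + 1):
        dp[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag,                 # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,   # a[i-1] against a gap
                           dp[i][j - 1] + gap)   # b[j-1] against a gap
    return dp[m][n]


if __name__ == "__main__":
    print(needleman_wunsch("ACG", "AG"))  # A-A, C-gap, G-G: 1 - 1 + 1 = 1
```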

T-coffee: a novel method for fast and accurate multiple sequence alignment (495 citations)
- Notredame
- 2000

Citation context: "...ally, following the branching order in the guide tree. This is very fast even for hundreds of sequences, and the most widely used software is the well-known ClustalW package [11]. The TCoffee package [8] also uses a progressive heuristic but has been shown to be more accurate than ClustalW, at the expense of extra computing time. There are also several methods based on optimising the WSP (weighted su..."

Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments (108 citations)
- Gotoh
- 1996

Citation context: "... ClustalW, at the expense of extra computing time. There are also several methods based on optimising the WSP (weighted sums of pairs) objective function which use Genetic Algorithms [7] or iteration [2]. These vary in the extent to which they are practical for more than a few sequences or in the quality of the optimisation. Dynamic programming [6] has been used for MSA problems but is known to scale..."
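The weighted sum-of-pairs (WSP) objective scores a finished alignment by summing, over every pair of rows, a pair weight times the score of the pairwise alignment those two rows induce. A minimal Python sketch follows. The scoring values (match +1, mismatch -1, gap -1, columns where both rows have a gap ignored) and the default weight of 1.0 are illustrative assumptions, not the settings used in the cited work.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pair_score(x: str, y: str, match: int = 1,
               mismatch: int = -1, gap: int = -1) -> int:
    """Score two equal-length alignment rows column by column."""
    score = 0
    for p, q in zip(x, y):
        if p == "-" and q == "-":
            continue  # both rows gapped here: column ignored for this pair
        elif p == "-" or q == "-":
            score += gap
        else:
            score += match if p == q else mismatch
    return score


def weighted_sum_of_pairs(rows: List[str],
                          weights: Dict[Tuple[int, int], float]) -> float:
    """WSP score; weights[(s, t)] with s < t, missing pairs default to 1.0."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all alignment rows must have the same length")
    return sum(weights.get((s, t), 1.0) * pair_score(rows[s], rows[t])
               for s, t in combinations(range(len(rows)), 2))


if __name__ == "__main__":
    rows = ["AC-G", "A--G", "ACTG"]
    print(weighted_sum_of_pairs(rows, {}))            # 1 + 2 + 0 = 3.0
    print(weighted_sum_of_pairs(rows, {(0, 1): 2.0}))  # 2*1 + 2 + 0 = 4.0
```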

HOMSTRAD: a database of protein structure alignments for homologous families (108 citations)
- Mizuguchi, Deane, et al.
- 1998

Citation context: "...nts). On solving the final CSP an alignment matrix is constructed, then post-processed by applying simple transformations to reduce the number of columns used. We take MSA instances from the HOMSTRAD [5] database of protein alignments. We generate sparse alignment graphs using T-Coffee with default settings. It takes every pair of sequences and outputs weighted pairs of symbols, aligning each pair of..."

SAGA: sequence alignment by genetic algorithm (102 citations)
- Notredame, Higgins
- 1996

Citation context: "...ore accurate than ClustalW, at the expense of extra computing time. There are also several methods based on optimising the WSP (weighted sums of pairs) objective function which use Genetic Algorithms [7] or iteration [2]. These vary in the extent to which they are practical for more than a few sequences or in the quality of the optimisation. Dynamic programming [6] has been used for MSA problems but ..."

A branch-and-cut Algorithm for multiple sequence alignment (30 citations)
- Reinert, Lenhof, et al.
- 1997

Citation context: "...dge via the choice of subgraph. The usual way of ensuring that the realised edges form a valid trace is to enumerate all critical mixed cycles in the graph, adding a constraint to prohibit each cycle [1, 4, 10] (other constraints may also be added). The MWT and related formulations have natural integer linear program (ILP) models. The number of constraints is exponential in the size |E| of the graph [1] but..."
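The ILP models forbid every critical mixed cycle up front because they must constrain all candidate edge sets at once. For a single, fixed set of realised edges, validity can be checked directly, and the check below should be equivalent to the absence of such a cycle. Merge symbols joined by an edge into would-be columns, add a "comes before" arc between the columns of consecutive symbols in each sequence, and test that these arcs contain no directed cycle. A column holding two symbols of one sequence shows up as a self-loop or cycle. The Python sketch (union-find plus Kahn's topological sort) is an illustration, not code from the paper, and `is_trace` is a hypothetical name.

```python
from collections import defaultdict, deque
from typing import List, Tuple

Symbol = Tuple[int, int]  # (sequence index, position within the sequence)


def is_trace(lengths: List[int], edges: List[Tuple[Symbol, Symbol]]) -> bool:
    """True if all edges can be realised together in one alignment.
    lengths[s] is the number of symbols in sequence s."""
    parent = {(s, i): (s, i) for s, n in enumerate(lengths) for i in range(n)}

    def find(x: Symbol) -> Symbol:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Symbols joined by an edge must share a column, so merge them.
    for u, v in edges:
        parent[find(u)] = find(v)

    # Lift the order arcs between consecutive symbols to the merged columns.
    succ = defaultdict(list)
    indeg = {find(x): 0 for x in parent}
    for s, n in enumerate(lengths):
        for i in range(n - 1):
            a, b = find((s, i)), find((s, i + 1))
            succ[a].append(b)  # a == b is a self-loop, i.e. an infeasible column
            indeg[b] += 1

    # Kahn's algorithm: every column can be ordered iff there is no cycle.
    queue = deque(c for c, d in indeg.items() if d == 0)
    ordered = 0
    while queue:
        c = queue.popleft()
        ordered += 1
        for d in succ[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                queue.append(d)
    return ordered == len(indeg)


if __name__ == "__main__":
    print(is_trace([2, 2], [((0, 0), (1, 0)), ((0, 1), (1, 1))]))  # parallel edges
    print(is_trace([2, 2], [((0, 0), (1, 1)), ((0, 1), (1, 0))]))  # crossing edges
```

The parallel edges can be realised together. The crossing pair cannot, since each edge would have to lie both before and after the other.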

Exact and approximation algorithms for DNA sequence reconstruction (29 citations)
- Kececioglu
- 1991

Citation context: "...with those of state-of-the-art algorithms, though execution times are much longer. (Section 1: Background) Multiple sequence alignment (MSA) is a central problem in Bioinformatics and is known to be NP-complete [3]. Given a number of sequences of symbols from an alphabet, the aim is to align them while maximizing some function. Gaps may be introduced between symbols, and in some MSA formulations the objective f..."

Randomised backtracking for linear pseudo-boolean constraint problems (23 citations)
- Prestwich

Citation context: "...v. The interest of this form is that it is only a slight extension of SAT, and many SAT algorithms generalise easily to it. We apply the Saturn hybrid local search algorithm, which was generalised in [9] and gave good results on block design and sports scheduling problems. Saturn uses each solution as a starting point for the next CSP, by reassigning as many variables as possible (under a random vari..."
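A linear pseudo-Boolean constraint has the form a1·l1 + a2·l2 + ... ≥ b over 0/1 literals, and a SAT clause is the special case in which every coefficient is 1 and b = 1. That closeness is one way to see why SAT local search carries over. The minimal Python sketch below evaluates such constraints, counts violations (the cost a local search would typically try to drive to zero), and checks the clause correspondence. The DIMACS-style signed-integer literals and all function names are assumptions for illustration. This is not the Saturn algorithm.

```python
from itertools import product
from typing import Dict, List, Tuple

# A literal is a nonzero int: +v stands for x_v, -v for (1 - x_v), DIMACS style.
PBConstraint = Tuple[List[Tuple[int, int]], int]  # ([(coef, literal), ...], bound)


def lit_value(lit: int, assign: Dict[int, int]) -> int:
    v = assign[abs(lit)]
    return v if lit > 0 else 1 - v


def satisfied(c: PBConstraint, assign: Dict[int, int]) -> bool:
    """True if sum(coef * literal) >= bound under the 0/1 assignment."""
    terms, bound = c
    return sum(a * lit_value(l, assign) for a, l in terms) >= bound


def violated(cs: List[PBConstraint], assign: Dict[int, int]) -> int:
    """Number of violated constraints under the assignment."""
    return sum(not satisfied(c, assign) for c in cs)


def clause_as_pb(clause: List[int]) -> PBConstraint:
    """The clause (l1 or l2 or ...) as the constraint l1 + l2 + ... >= 1."""
    return ([(1, l) for l in clause], 1)


if __name__ == "__main__":
    clause = [1, 2, -3]  # x1 or x2 or not x3
    for bits in product([0, 1], repeat=3):
        assign = dict(zip([1, 2, 3], bits))
        assert any(lit_value(l, assign) for l in clause) == satisfied(clause_as_pb(clause), assign)
    # "At most one of x1, x2, x3", written as (1-x1) + (1-x2) + (1-x3) >= 2.
    at_most_one = ([(1, -1), (1, -2), (1, -3)], 2)
    print(violated([at_most_one], {1: 1, 2: 1, 3: 0}))  # two variables set: 1
```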

A polyhedral approach to sequence alignment problems (21 citations)
- Kececioglu, Lenhof, et al.
- 2000

Citation context: "...dge via the choice of subgraph. The usual way of ensuring that the realised edges form a valid trace is to enumerate all critical mixed cycles in the graph, adding a constraint to prohibit each cycle [1, 4, 10] (other constraints may also be added). The MWT and related formulations have natural integer linear program (ILP) models. The number of constraints is exponential in the size |E| of the graph [1] but..."

Multiple sequence alignment with arbitrary gap costs: computing an optimal solution using polyhedral combinatorics (15 citations)
- Althaus
- 2002

Citation context: "...dge via the choice of subgraph. The usual way of ensuring that the realised edges form a valid trace is to enumerate all critical mixed cycles in the graph, adding a constraint to prohibit each cycle [1, 4, 10] (other constraints may also be added). The MWT and related formulations have natural integer linear program (ILP) models. The number of constraints is exponential in the size |E| of the graph [1] but..."