## Optimization Problems in Molecular Biology: A Survey and Critical Review

### BibTeX

@MISC{Festa_optimizationproblems,
    author = {Paola Festa},
    title = {Optimization Problems in Molecular Biology: A Survey and Critical Review},
    year = {}
}

### Abstract

Computational molecular biology has emerged as one of the most exciting interdisciplinary fields, riding on the success of the ongoing Human Genome Project, which culminated in the 2001 announcement of the complete sequencing of the human genome. It is only in the past few years that it has been shown that a large number of molecular biology problems can be formulated as combinatorial optimization problems, including sequence alignment problems, genome rearrangement problems, string selection and comparison problems, and protein structure prediction and recognition. This paper provides a detailed description of several interesting molecular biology problems that can be formulated as combinatorial optimization problems and surveys the most efficient state-of-the-art techniques and algorithms to exactly or approximately solve them.

### Citations

1680 | Identification of Common Molecular Subsequences
- Smith, Waterman
- 1981
Citation Context ...tal cost denoted by $d^*_A(s, t)$ and also called edit distance of s and t. An exact solution of the problem can be found by applying a Dynamic Programming method proposed in 1981 by Smith and Waterman [28]. Their algorithm has polynomial computational complexity $O(n^2)$, where n denotes the length of the input sequences. [footnote 2] This is not a restrictive assumption, since it is always possible to make two s...
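The quadratic-time dynamic program mentioned in this context can be sketched as follows. This is a minimal illustration that assumes unit costs for insertion, deletion, and substitution; the function name `edit_distance` is hypothetical, and the scoring scheme is not necessarily the one used in [28] or in the surveyed paper.

```python
def edit_distance(s: str, t: str) -> int:
    """Unit-cost edit distance via the standard O(len(s) * len(t)) dynamic program."""
    # prev[j] holds the distance between the current prefix of s and t[:j].
    prev = list(range(len(t) + 1))  # empty prefix of s: j insertions
    for i, a in enumerate(s, start=1):
        curr = [i]  # t[:0] is empty: i deletions
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept at a time, so the sketch uses space linear in the length of t while the running time stays quadratic.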

952 | Approximation Algorithms
- Vazirani
- 2001
Citation Context ...n be done to give a guarantee of quality for the returned solution. A vast literature on approximation algorithms has been developed in the last decade, and good starting points are the books [5] and [30]. Nevertheless, sometimes it is preferable to approach and solve the problem heuristically, especially in the presence of large scale problem instances. A heuristic approach finds a suboptimal solutio...

844 | The Protein Data Bank
- Berman
- 2000
Citation Context ...puter science and physics. The main contribution of computer scientists is in the design and implementation of correct and efficient methods to elaborate a huge amount of collected experimental data ([1, 29]), while with the help of physicists it is possible to better study molecules and their three-dimensional structure. Only in the past few years optimization models have been analyzed and proposed by t...

526 | Greedy randomized adaptive search procedures
- Feo, Resende
- 1995
Citation Context ...art heuristic algorithm, successfully applied to find good quality solutions to several computationally intractable combinatorial problems and originally proposed in the literature by Feo and Resende [6, 7]. For a comprehensive study of GRASP strategies and variants, the reader is referred to the survey chapter by Resende and Ribeiro [24], as well as to the annotated bibliography of Festa and Resende [9...

251 | On the complexity of multiple sequence alignment
- Wang, Jiang
- 1994
Citation Context ... $\sum_{1 \le i < j \le k} \sum_{l=1}^{|A|} \gamma(A[i][l], A[j][l])$, where |A| is the alignment length, i.e. its number of columns. The computational intractability of the multiple alignment problem was proved in 1994 by Wang and Jiang [31]. In fact, a generalization of Smith and Waterman’s dynamic programming algorithm has exponential computational complexity given by $O(2^k l^k)$ and therefore it cannot be used to solve even small size...
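The pairwise sum in this context (summing a column cost γ over all pairs of rows and all columns of an alignment A) can be illustrated with a short sketch. The names `sp_score` and `unit_cost` are hypothetical, and the unit mismatch cost is only an assumed stand-in for γ, whose actual definition is not visible in the excerpt.

```python
from itertools import combinations


def sp_score(alignment, gamma):
    """Sum gamma over every pair of rows (i < j) and every column l of the alignment."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        gamma(a[l], b[l])
        for a, b in combinations(alignment, 2)  # pairs with i < j
        for l in range(width)                   # columns 1..|A|
    )


def unit_cost(x, y):
    """Assumed column cost: 0 for identical symbols (including two gaps), 1 otherwise."""
    return 0 if x == y else 1
```

For example, `sp_score(["AC-T", "A-GT", "ACGT"], unit_cost)` adds the three pairwise costs 2, 1, and 1.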

171 | The multiple sequence alignment problem in biology
- Carrillo, Lipman
- 1988
Citation Context ...lexity given by $O(2^k l^k)$ and therefore it cannot be used to solve even small size instances of the problem. Particularly interesting are some observations reported in 1988 by Carrillo and Lipman in [4], leading to further constraints of the problem that the authors proved useful in reducing computation in the dynamic programming method. In more detail, they showed how to establish a correspondence ...

155 | A probabilistic heuristic for a computationally difficult set covering problem
- Feo, Resende
- 1989
Citation Context ...art heuristic algorithm, successfully applied to find good quality solutions to several computationally intractable combinatorial problems and originally proposed in the literature by Feo and Resende [6, 7]. For a comprehensive study of GRASP strategies and variants, the reader is referred to the survey chapter by Resende and Ribeiro [24], as well as to the annotated bibliography of Festa and Resende [9...

104 | Efficient methods for multiple sequence alignments with guaranteed error bounds
- Gusfield
- 1993
Citation Context ...it is meaningful to apply it for solving optimization problems that are hard even to approximate. One among the most effective approximation methods for multiple alignment problems is due to Gusfield [12] who in 1993 proposed an iterative technique that progressively builds an alignment by considering one sequence at a time. It starts by building a star having a node for each of the k sequences. One of...

99 | SAGA: Sequence alignment by genetic algorithm
- Notredame, Higgins
- 1996
Citation Context ...trategies are used to structure information in order to find efficiently near-optimal solutions. For general alignment problems, besides a genetic algorithm proposed by Notredame and Higgins in 1996 [21], very little effort has been put forth along this line of research. 3 Genome rearrangement problems Genome rearrangement problems consist in finding similarities and diversities among genomes of diff...

82 | GRASP: an annotated bibliography
- Festa, Resende
- 2001
Citation Context ...7]. For a comprehensive study of GRASP strategies and variants, the reader is referred to the survey chapter by Resende and Ribeiro [24], as well as to the annotated bibliography of Festa and Resende [9] for a survey of applications. Generally speaking, GRASP is a randomized heuristic method that for a certain number of iterations realizes two phases: a construction phase and a Optimization in molecu...

81 | The EMBL Nucleotide Sequence Database
- Stoesser, Baker, et al.
- 2002
Citation Context ...puter science and physics. The main contribution of computer scientists is in the design and implementation of correct and efficient methods to elaborate a huge amount of collected experimental data ([1, 29]), while with the help of physicists it is possible to better study molecules and their three-dimensional structure. Only in the past few years optimization models have been analyzed and proposed by t...

65 | Distinguishing string selection problems
- Lanctot, Li, et al.
- 1999
Citation Context ...d Park [27]. For most consensus problems, Hamming distance is used instead of editing distance and biological reasons justifying this choice are very well described and motivated by Lanctot et al. in [17]. Apart from the distance definition, this class of problems is a further important example of molecular biology problems that are in essence combinatorial optimization problems. Subsequent subsectio...

61 | On covering problems of codes
- Frances, Litman
- 1997
Citation Context ... of brevity, this family of problems is not surveyed in this paper. 280 P. Festa Computational intractability of the general sequences consensus problem was first proved in 1997 by Frances and Litman [10] and in 1999 by Sim and Park [27]. For most consensus problems, Hamming distance is used instead of editing distance and biological reasons justifying this choice are very well described and motivated...

57 | Finding similar regions in many strings
- Li, Ma, et al.
- 1999
Citation Context ...0, 1} to be x = 1 with a certain probability y, where y is the value of the continuous variable corresponding to x in the relaxation of the original integer programming problem. In 1999, Li et al. in [19] used the rounding idea to design a polynomial time approximation scheme (PTAS). A PTAS is a special type of approximation algorithm that, [footnote 10] The linear relaxation Pr − c of an integer programming pro...
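The rounding step described in this context (set a binary variable x to 1 with probability y, where y is the value of the corresponding variable in the relaxation) can be sketched in a few lines. The function name `randomized_round` and its `rng` parameter are hypothetical; solving the relaxation itself is assumed to have happened elsewhere, and this is not the PTAS of [19].

```python
import random


def randomized_round(fractional, rng=None):
    """Round each relaxed value y in [0, 1] to 1 with probability y, else to 0."""
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so the test succeeds with probability y.
    return [1 if rng.random() < y else 0 for y in fractional]
```

Passing a seeded generator, for instance `randomized_round([0.2, 0.9], rng=random.Random(0))`, makes a run reproducible; values of exactly 0 and 1 are always rounded to themselves.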

55 | Metaheuristics: A bibliography
- Osman, Laporte
- 1996
Citation Context ...Search Procedure), Simulated Annealing, and Iterated Local Search. It has to be underlined that until now there is not yet a commonly accepted definition for the term metaheuristic. Osman and Laporte [22] in their metaheuristics bibliography define it as: A metaheuristic is formally defined as an iterative generation process which guides a subordinate heuristic by combining intelligently different con...

51 | Divide-and-conquer frontier search applied to optimal sequence alignment
- Korf, Zhang
- 2000
Citation Context ... and not only stars, preserving an approximation guarantee of 2. To solve real instances of the multiple sequence alignment problem, any method requires a huge amount of computer memory. Recently, in [16] Korf and Zhang presented a new algorithm that reduces the space complexity of heuristic search compared to state-of-the-art algorithms. This target is achieved by improving a classical technique to f...

46 | The Maximum Weight Trace Problem in Multiple Sequence Alignment, CPM ’93
- Kececioglu
- 1993
Citation Context ...n) ≤ cg(n) ∀ n ≥ n0}. 274 P. Festa In 2000, Kececioglu et al. [15] proposed an alternative and more effective method based on the maximum weight trace problem, which has been introduced by Kececioglu [14] as a different optimization problem that generalizes the SP-score objective function. A trace has a graph theoretic definition, but in the following it will be specialized for alignment problems. Le...

35 | Parametric optimization of sequence alignment
- Gusfield, Balasubramanian, et al.
- 1994
Citation Context ...equence s corresponds to the k-th element of the second sequence t, no element following i in s can correspond to an element preceding k in t. An interesting paper appeared in 1994 by Gusfield et al. [13] underlines the difficult task of providing a unique definition of similarity and distance among sequences. The authors observe that in DNA and amino acid sequences there is considerable disagreement ...

30 | A branch-and-cut Algorithm for multiple sequence alignment
- Reinert, Lenhof, et al.
- 1997
Citation Context ...al weight. Let $E = \{(v_{ih}, v_{ir}) \mid i = 1, \ldots, k,\ 1 \le h < r \le o_i\}$ be the set of directed arcs connecting each element $v_{ir}$ ($i = 1, \ldots, k$) to the elements that follow it in the sequence $s_i$, then Kececioglu et al. [23] have shown that a trace T must satisfy the following proposition: Proposition 2.1 T is a trace if and only if there is no directed mixed cycle C in the graph $G = (V, T \cup E)$. The property of a trace T...

21 | A polyhedral approach to sequence alignment problems
- Kececioglu, Lenhof, et al.
- 2000
Citation Context ... original. tions that it performs is such that $f(n) \in O(g(n))$, where $O(g(n)) = \{h(n) \mid \text{there exist positive constants } c \text{ and } n_0 \text{ s.t. } 0 \le h(n) \le c\,g(n)\ \forall\, n \ge n_0\}$. 274 P. Festa In 2000, Kececioglu et al. [15] proposed an alternative and more effective method based on the maximum weight trace problem, which has been introduced by Kececioglu [14] as a different optimization problem that generalizes the SP-s...

20 | Opportunities for combinatorial optimization in computational biology
- Greenberg
- 2004
Citation Context ... rearrangement problems, string selection and comparison problems, and protein structure prediction and recognition. An exhaustive survey can be found in a recent paper of Greenberg, Hart, and Lancia [11]. The scope of this paper is to provide a detailed description of several interesting biological system problems that can be formulated as combinatorial optimization problems. Special emphasis will be...

14 | Genomic divergence through gene rearrangement
- Sankoff, Cedergren, et al.
- 1990
Citation Context ...simulates the evolutionary process of biological species. 278 P. Festa genomes represented as two sets of ordered lists of genes, the general genome comparison problem as defined by Sankoff et al. in [26] consists in finding the minimum number of evolutionary events required to turn one of the genomes into the other. Unfortunately, it is still an open problem the development of mathematical models and...

12 | The Consensus String Problem for a Metric is NP-Complete
- Sim, Park
- 2001
Citation Context ...ems is not surveyed in this paper. 280 P. Festa Computational intractability of the general sequences consensus problem was first proved in 1997 by Frances and Litman [10] and in 1999 by Sim and Park [27]. For most consensus problems, Hamming distance is used instead of editing distance and biological reasons justifying this choice are very well described and motivated by Lanctot et al. in [17]. Apart...

5 | Optimization techniques for string selection and comparison problems in genomics
- Meneses, Oliveira, et al.
Citation Context ... In such cases, heuristic methods can be useful in determining good solutions at least for a class of instances. A simple heuristic for the FFMSP has been recently proposed by Pardalos et al. in 2005 [20]. It consists of the following two phases. • A construction phase, that iteratively builds a feasible solution s. Initially, – for each position j, compute the set $V_j$ of characters appearing in that p...

4 | Sorting permutations by reversals through branch and price
- Caprara, Lancia, et al.
Citation Context ...y reversal problem consists in finding $d(\pi)$ and a sequence of reversals such that $\pi = \iota\, r_1 \cdots r_{d(\pi)}$. Successful mathematical programming approaches for this problem are described by Caprara et al. in [2, 3]. 4 String selection and comparison problems String selection and comparison problems belong to the more general molecular biology class of problems known as sequences consensus, where a finite set of...
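To make the reversal distance d(π) in this context concrete, the sketch below computes it by breadth-first search from the identity permutation. It assumes unsigned reversals (the excerpt does not say whether reversals are signed), the names `apply_reversal` and `reversal_distance` are hypothetical, and the search is exponential in the permutation length, so it is only usable for very small inputs. It is not the mathematical programming approach of [2, 3].

```python
from collections import deque


def apply_reversal(perm, i, j):
    """Return perm with the segment perm[i..j] (inclusive, 0-based) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def reversal_distance(perm):
    """Minimum number of unsigned reversals turning the identity into perm (BFS)."""
    target = tuple(perm)
    start = tuple(sorted(target))  # the identity permutation
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(len(cur)):
            for j in range(i + 1, len(cur)):
                nxt = apply_reversal(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    # Not reached for a valid permutation: reversals generate every ordering.
    raise ValueError("input is not a permutation")
```

Because each reversal is its own inverse, the same value is also the number of reversals needed to sort perm back into the identity.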

3 | On some optimization problems in molecular biology
- Festa
Citation Context ...ther character in $V_j$ (at random). The authors show that it empirically improves the solution by making it a local optimum relative to this type of transformation. 4.3.1 A GRASP heuristic Recently, in [8] a GRASP (Greedy Randomized Adaptive Search Procedure) has been proposed to find high quality suboptimal solutions for the FFMSP. GRASP is an iterative multi-start heuristic algorithm, successfully a...

2 | Coding and Information Theory. volume 134
- Roman
- 1992
Citation Context ...ng formulation: $\min d$ s.t. $\sum_{j \in V_k} x_{jk} = 1$ for $k = 1, 2, \ldots, m$; $m - \sum_{j=1}^{m} x_{s^i_j j} \le d$ for $i = 1, 2, \ldots, n$; $d \in \mathbb{N}^+$, $x_{jk} \in \{0, 1\}$ for $k = 1, 2, \ldots, m$, $\forall\, j \in V_k$. This problem was first studied in the area of coding theory [25] and recently has been independently proved computationally intractable by Frances and Litman [10] and Lanctot et al. [17, 18]. The first approximation algorithm for the CSP was proposed by Lanctot et...
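The objective d in this formulation is the largest Hamming distance between the chosen string and any input string: for each input string, m minus the number of agreeing positions must not exceed d. The brute-force sketch below illustrates that reading on tiny instances; the names `hamming` and `closest_string_brute_force` are hypothetical, the enumeration is exponential in the string length, and it is not one of the algorithms surveyed in the paper.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_string_brute_force(strings):
    """Try every string built from the per-position character sets V_k."""
    m = len(strings[0])
    # V_k: characters that occur at position k in at least one input string.
    alphabets = [sorted({s[k] for s in strings}) for k in range(m)]
    best, best_d = None, m + 1
    for chars in product(*alphabets):
        candidate = "".join(chars)
        d = max(hamming(candidate, s) for s in strings)  # objective value d
        if d < best_d:
            best, best_d = candidate, d
    return best, best_d
```

Restricting each position to V_k loses nothing, since a character absent from V_k mismatches every input string at that position.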

1 | A column-generation based branch-and-price algorithm for sorting by reversals
- Caprara, Lancia, et al.
- 1999
Citation Context ...y reversal problem consists in finding $d(\pi)$ and a sequence of reversals such that $\pi = \iota\, r_1 \cdots r_{d(\pi)}$. Successful mathematical programming approaches for this problem are described by Caprara et al. in [2, 3]. 4 String selection and comparison problems String selection and comparison problems belong to the more general molecular biology class of problems known as sequences consensus, where a finite set of...

1 | A polynomial-time approximation scheme for minimum routing cost spanning trees
- Wu, Lancia, et al.
- 1999
Citation Context ...unction value corresponding to the suboptimal solution found by Gusfield’s algorithm, respectively. Then, in [12] Gusfield proved that $\widehat{SP}_I \le \left(2 - \frac{2}{k}\right) SP^*_I$, for any instance I. In 1999, Wu et al. [32] generalized Gusfield’s method to use any tree and not only stars, preserving an approximation guarantee of 2. To solve real instances of the multiple sequence alignment problem, any method requires a...