Results 1-10 of 15
A polyhedral approach to sequence alignment problems
Discrete Appl. Math., 2000
Cited by 20 (1 self)
We study two new problems in sequence alignment from both a practical and a theoretical view, using tools from combinatorial optimization to develop branch-and-cut algorithms. The Generalized Maximum Trace formulation captures several forms of multiple sequence alignment problems in a common framework, among them the original formulation of Maximum Trace. The RNA Sequence Alignment Problem captures the comparison of RNA molecules on the basis of their primary sequence and their secondary structure. Both problems have a characterization in terms of graphs, which we reformulate in terms of integer linear programming. We then study the polytopes (or convex hulls of all feasible solutions) associated with the integer linear program for both problems. For each polytope we derive several classes of facet-defining inequalities and show that for some of these classes the corresponding separation problem can be solved in polynomial time. This leads to a polynomial-time algorithm for pairwise sequence alignment that is not based on dynamic programming. Moreover, for multiple sequences the branch-and-cut algorithms for both sequence alignment problems are able to solve to optimality instances that are beyond the range of present dynamic programming approaches.
A local alignment tool for very long DNA sequences
Comput. Appl. Biosci., 1995
Cited by 11 (1 self)
This paper presents a practical program, called sim2, for building local alignments of two sequences, each of which may be hundreds of kilobases long. Sim2 first constructs the n best non-intersecting chains of "fragments," such as all occurrences of identical 5-tuples in each of two DNA sequences, for any specified n ≥ 1. Each chain is then refined by delivering an optimal alignment in a region delimited by the chain. Sim2 requires only space proportional to the size of the input sequences and the output alignments, and the same source code runs on UNIX machines, on Macintosh, on PC, and on DEC ALPHA PC. We also describe an application of sim2 for aligning long DNA sequences from E. coli. Sim2 facilitates contig-building by providing a complete view of the related sequences, so differences can be analyzed and inconsistencies resolved. Examples are shown using the alignment display and editing functions from the software tool ChromoScope.
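As a rough illustration of the fragment-chaining idea in this abstract, the Python sketch below collects all shared k-tuples between two sequences and then selects a single highest-scoring chain of fragments that do not overlap in either sequence. The function names (find_fragments, best_chain), the unit score per fragment, and the quadratic-time dynamic program are assumptions made for illustration. This is not sim2's own algorithm, which computes the n best chains in space proportional to the input.

```python
from collections import defaultdict


def find_fragments(a, b, k=5):
    """Return sorted (i, j) start positions where a[i:i+k] == b[j:j+k]."""
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(a) - k + 1)
            for j in index.get(a[i:i + k], [])]


def best_chain(fragments, k=5):
    """Pick one longest chain of fragments that overlap in neither sequence.

    Hypothetical scoring: every fragment counts 1. Runs in O(F^2) time
    for F fragments, which is fine for a sketch but not for long genomes.
    """
    frags = sorted(fragments)
    if not frags:
        return []
    score = [1] * len(frags)
    prev = [-1] * len(frags)
    for t, (i, j) in enumerate(frags):
        for s in range(t):
            pi, pj = frags[s]
            # Fragment s may precede fragment t only if it ends before t
            # starts in both sequences.
            if pi + k <= i and pj + k <= j and score[s] + 1 > score[t]:
                score[t] = score[s] + 1
                prev[t] = s
    t = max(range(len(frags)), key=score.__getitem__)
    chain = []
    while t != -1:
        chain.append(frags[t])
        t = prev[t]
    return chain[::-1]
```

A refinement step in the spirit of the abstract would then run an exact alignment only inside the region delimited by the returned chain.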
Algorithms for Transposition Invariant String Matching (Extended Abstract)
Journal of Algorithms, 2002
Cited by 9 (5 self)
Given strings A and B over an alphabet Σ ⊆ U, where U is some numerical universe closed...
Efficient Algorithms for Sequence Analysis with Concave and Convex Gap Costs
1989
Cited by 4 (0 self)
David A. Eppstein. We describe algorithms for two problems in sequence analysis: sequence alignment with gaps (multiple consecutive insertions and deletions treated as a unit) and RNA secondary structure with single loops only. We make the assumption that the gap cost or loop cost is a convex or concave function of the length of the gap or loop, and show how this assumption may be used to develop efficient algorithms for these problems. We show how the restriction to convex or concave functions may be relaxed, and give algorithms for solving the problems when the cost functions are neither convex nor concave, but can be split into a small number of convex or concave functions. Finally we point out some sparsity in the structure of our sequence analysis problems, and describe how we may take advantage of that sparsity to further speed up our algorithms.
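To make "gaps treated as a unit" concrete, the following Python sketch implements the textbook cubic-time recurrence for alignment with an arbitrary gap-cost function. It is the slow baseline that convexity or concavity assumptions are meant to speed up, not the faster algorithm of this thesis. The names align_cost, gap_cost, and mismatch are assumptions chosen for illustration.

```python
import math


def align_cost(a, b, gap_cost, mismatch=1.0):
    """Minimum alignment cost where a gap of length k costs gap_cost(k).

    D[i][j] is the cheapest alignment of a[:i] with b[:j]. Each cell
    considers a match/mismatch step or a gap of every possible length
    ending at that cell, which gives O(n * m * (n + m)) time overall.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = math.inf
            if i > 0 and j > 0:
                sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
                best = D[i - 1][j - 1] + sub
            for k in range(1, i + 1):   # delete a[i-k:i] as one gap
                best = min(best, D[i - k][j] + gap_cost(k))
            for k in range(1, j + 1):   # insert b[j-k:j] as one gap
                best = min(best, D[i][j - k] + gap_cost(k))
            D[i][j] = best
    return D[n][m]
```

For example, with the hypothetical affine cost gap_cost = lambda k: 2 + 0.5 * k, aligning "ACGT" with "AT" is cheapest when "CG" is removed as a single gap of length 2. Because this single-table sketch lets adjacent gaps be split freely, it matches the intended model only when one long gap is never more expensive than the same length split into pieces.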
Large Grain Size Stochastic Optimization Alignment
In Proceedings of the Sixth IEEE Symposium on BioInformatics and BioEngineering (BIBE’06). IEEE Computer Society, 2006
Cited by 4 (4 self)
DNA sequence alignment is a critical step in identifying homology between organisms. The most widely used alignment program, ClustalW, is known to suffer from the local minima problem, where suboptimal guide trees produce incorrect gap insertions. The optimization alignment approach has been shown to be effective in combining alignment and phylogenetic search in order to avoid the problems associated with poor guide trees. The optimization alignment algorithm operates at a small grain size, aligning each tree found and thereby wasting time producing multiple sequence alignments for suboptimal trees. This research develops and analyzes a large grain size algorithm for optimization alignment that iterates through steps of alignment and phylogeny search, thus improving the quality of guide trees used for computation of multiple sequence alignments and eliminating computation of multiple sequence alignments for suboptimal guide trees. Local minima are avoided by the use of stochastic search methods. Large Grain Size Stochastic Optimization Alignment (LGA) exploits the relationship between phylogenies and multiple sequence alignments, and in so doing achieves improved alignment accuracy. LGA is licensed under the GNU General Public License. Source code and data sets are publicly available at
Multiple Sequence Comparison and Consistency on Multipartite Graphs
Adv. Appl. Math., 1995
Cited by 3 (1 self)
Calculation of dot-matrices is a widespread tool in biological sequence comparison. As a visual aid they are used in pairwise sequence comparison but so far have been of little help in the simultaneous comparison of several sequences. Viewing dot-matrices as projections of unknown n-dimensional points, we consider the multiple alignment problem (for n sequences) as an n-dimensional image reconstruction problem with noise. We model this situation using a multipartite graph and introduce a notion of "consistency" on such a graph. From this perspective we introduce and develop the filtering method due to Vingron and Argos (J. Mol. Biol. (1991), 218, pp. 33-43). We discuss a conjecture of theirs regarding the number of iterations their algorithm requires and demonstrate that this number may be large. An improved version of the original algorithm is introduced that avoids costly dot-matrix multiplications and runs in O(n^3 · L^3) time (L is the length of the longest sequence and n i...
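The notion of consistency can be illustrated with a small Python sketch: a dot between position i of sequence A and position j of sequence B is kept only if some position k of a third sequence C has dots with both. This is a deliberately simplified, single-pass illustration under assumed names (dot_matrix, filter_consistent) and single-character identity dots. It is neither the Vingron and Argos procedure as published nor the improved algorithm of this paper.

```python
from collections import defaultdict


def dot_matrix(s, t):
    """Set of (i, j) index pairs with s[i] == t[j] (identity dots only)."""
    return {(i, j)
            for i, x in enumerate(s)
            for j, y in enumerate(t)
            if x == y}


def filter_consistent(ab, ac, cb):
    """Keep dots (i, j) of ab supported through a third sequence C.

    A dot is supported if some k exists with (i, k) in ac and (k, j) in cb.
    This plays the role of a boolean matrix product AC x CB intersected
    with AB, computed on sparse sets of dots.
    """
    via_c = defaultdict(set)
    for k, j in cb:
        via_c[k].add(j)
    supported = set()
    for i, k in ac:
        for j in via_c.get(k, ()):
            supported.add((i, j))
    return ab & supported
```

With A = "ACG", B = "ATG", and C = "A", only the dot pairing the two leading A characters survives, because the G-G dot has no counterpart in C.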
J.G.: Alignment of Noisy Unstructured Text Data
In: Proc. of the IJCAI Workshop on Analytics for Noisy Unstructured Text Data (AND 2007) of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007
Cited by 1 (1 self)
This paper describes a textual aligner named MEDITE whose distinguishing feature is the detection of moves. It was developed to solve a problem from textual genetic criticism, a humanities discipline that compares different versions of authors’ texts in order to highlight invariants and differences between them. Our aligner handles this task and is general enough to handle others. The algorithm, based on the edit distance with moves, aligns duplicated character blocks with an A* heuristic algorithm. We present an experimental evaluation of our algorithm by comparing it with similar ones in four experiments. The first one deals with the alignment of texts with a large number of repetitions; we show it is a very difficult problem. Two other experiments are duplicate linkage and text reuse detection. Finally, the algorithm is tested with synthetic data.
Efficient Algorithms for Sequence Analysis
Proc. Second Workshop on Sequences: Combinatorics, Compression, Security, 1991
Cited by 1 (0 self)
We consider new algorithms for the solution of many dynamic programming recurrences for sequence comparison and for RNA secondary structure prediction. The techniques upon which the algorithms are based effectively exploit the physical constraints of the problem to derive more efficient methods for sequence analysis.
1. Introduction. In this paper we consider algorithms for two problems in sequence analysis. The first problem is sequence alignment, and the second is the prediction of RNA structure. Although the two problems seem quite different from each other, their solutions share a common structure, which can be expressed as a system of dynamic programming recurrence equations. These equations can also be applied to other problems, including text formatting and data storage optimization. We use a number of well-motivated assumptions about the problems in order to provide efficient algorithms. The primary assumption is that of concavity or convexity. The recurrence relations for bo...
Faster Algorithms for Optimal Multiple Sequence Alignment based on Pairwise Comparisons
Cited by 1 (0 self)
Multiple Sequence Alignment (MSA) is one of the most fundamental problems in computational molecular biology. The running time of the best known scheme for finding an optimal alignment, based on dynamic programming, increases exponentially with the number of input sequences. Hence, several heuristics have been suggested for the problem. We present several techniques for making the dynamic programming algorithm more efficient, while still finding an optimal solution. We solve the following version of the MSA problem: in a preprocessing stage, pairwise alignments are found for every pair of sequences. The goal is to find an optimal alignment in which matches are restricted to positions that were matched at the preprocessing stage. We prove that it suffices to find an optimal alignment of sequence segments, rather than single letters, thereby reducing the input size and thus improving the running time. We also identify “shortcuts” that expedite the dynamic programming scheme. Under the additional assumption that matches between segments are transitive, we show how to further improve the running time for finding the optimal solution by restricting the search space of the dynamic programming algorithm.