Extensive Simulations for Longest Common Subsequences: Finite Size Scaling, a Cavity Solution, and Configuration Space Properties
, 1998
Cited by 5 (0 self)
Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Let $L_N$ be the length of an LCS of two random strings of size N. Using extensive Monte Carlo simulations for this problem, we find a finite size scaling law of the form $E(L_N)/N = \gamma_S + A_S/(\ln N \sqrt{N}) + \cdots$, where $\gamma_S$ and $A_S$ are constants depending on S, the alphabet size. We provide precise estimates of $\gamma_S$ for $2 \le S \le 15$. We also study the related Bernoulli Matching model where the different entries of the "strings" are matched independently with probability $1/S$. Let $L^B_{NM}$ be the length of a longest sequence of matches in this case, for a given instance of size $N \times M$. On the basis of a cavity-like analysis we find $\gamma^B_S(r) = (2\sqrt{rS} - r - 1)/(S - 1)$, where $\gamma^B_S(r)$ is the limit of $E(L^B_{NM})/N$ as $N \to \infty$, the ratio $r = M/N$ being fixed. This formula agrees very we...
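As a rough illustration of the quantities in this abstract, the sketch below computes an LCS length with the textbook dynamic program, estimates $E(L_N)/N$ by sampling random strings, and evaluates the Bernoulli Matching formula quoted above. The function names (`lcs_length`, `estimate_ratio`, `bernoulli_gamma`) and the sampling setup are assumptions made for this example; they are not taken from the paper, whose simulations are far more extensive.

```python
import math
import random


def lcs_length(x, y):
    """Length of a longest common subsequence, standard O(N*M) DP."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_ratio(n, s, trials, rng):
    """Monte Carlo estimate of E(L_N)/N for uniform random strings
    of length n over an alphabet of s letters (hypothetical helper)."""
    total = 0
    for _ in range(trials):
        x = [rng.randrange(s) for _ in range(n)]
        y = [rng.randrange(s) for _ in range(n)]
        total += lcs_length(x, y)
    return total / (trials * n)


def bernoulli_gamma(s, r=1.0):
    """Formula quoted in the abstract: (2*sqrt(r*S) - r - 1) / (S - 1)."""
    return (2 * math.sqrt(r * s) - r - 1) / (s - 1)
```

For instance, `estimate_ratio(200, 2, 20, random.Random(0))` gives a small-sample estimate for the binary alphabet that can be set beside `bernoulli_gamma(2)`; at such small N the finite size corrections described in the abstract are still visible.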
Efficient Algorithms for Sequence Analysis with Concave and Convex Gap Costs
, 1989
Cited by 4 (0 self)
David A. Eppstein. We describe algorithms for two problems in sequence analysis: sequence alignment with gaps (multiple consecutive insertions and deletions treated as a unit) and RNA secondary structure with single loops only. We make the assumption that the gap cost or loop cost is a convex or concave function of the length of the gap or loop, and show how this assumption may be used to develop efficient algorithms for these problems. We show how the restriction to convex or concave functions may be relaxed, and give algorithms for solving the problems when the cost functions are neither convex nor concave, but can be split into a small number of convex or concave functions. Finally we point out some sparsity in the structure of our sequence analysis problems, and describe how we may take advantage of that sparsity to further speed up our algorithms.
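To make the problem statement concrete, here is a minimal sketch of alignment where a run of consecutive insertions or deletions is charged as one gap through a cost function of its length. This is the naive cubic-time baseline recurrence, not the faster algorithms the abstract refers to; the name `align_cost`, the `mismatch` parameter, and the example gap function are assumptions chosen for illustration.

```python
import math


def align_cost(x, y, gap_cost, mismatch=1.0):
    """Minimum alignment cost when a gap of length k costs gap_cost(k).

    Naive baseline: every cell scans all possible gap lengths, so the
    running time is O(n * m * (n + m)).
    """
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = math.inf
            if i > 0 and j > 0:
                # Align x[i-1] with y[j-1] (match is free, mismatch is charged).
                sub = 0.0 if x[i - 1] == y[j - 1] else mismatch
                best = D[i - 1][j - 1] + sub
            for k in range(i):
                # A gap that deletes x[k:i] as one unit.
                best = min(best, D[k][j] + gap_cost(i - k))
            for k in range(j):
                # A gap that inserts y[k:j] as one unit.
                best = min(best, D[i][k] + gap_cost(j - k))
            D[i][j] = best
    return D[n][m]


def concave_gap(k):
    """Hypothetical concave gap cost: long gaps are cheaper per symbol."""
    return 1.0 + math.log(k)
```

With `concave_gap`, deleting two adjacent characters as a single gap costs `1 + ln 2`, which is less than two separate single-character gaps; a concave cost function encodes exactly this preference for one long gap over several short ones.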
A Probabilistic Analysis Of A String Editing Problem And Its Variations
- revised Purdue University
, 1994
Cited by 2 (1 self)
We consider a string editing problem in a probabilistic framework. This problem is of considerable interest to many facets of science, most notably molecular biology and computer science. A string editing transforms one string into another by performing a series of weighted edit operations of overall maximum (minimum) cost. The problem is equivalent to finding an optimal path in a weighted grid graph. In this paper, we provide several results regarding the typical behavior of such a path. In particular, we observe that the optimal path (i.e., edit distance) is almost surely (a.s.) equal to $\alpha n$ for large n, where $\alpha$ is a constant and n is the sum of lengths of both strings. More importantly, we show that the edit distance is well concentrated around its average value. In the so-called independent model, in which all weights (in the associated grid graph) are statistically independent, we derive some bounds for the constant $\alpha$. As a by-product of our results, we also present a precise estim...
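The equivalence between string editing and an optimal path in a weighted grid graph can be sketched as follows. The helper names (`min_path_cost`, `edit_distance`, `independent_model_cost`) and the choice of uniform random weights are assumptions for this example only; the paper's own weight distributions and bounds are not reproduced here.

```python
import random


def min_path_cost(n, m, w_del, w_ins, w_sub):
    """Minimum-cost path from node (0, 0) to node (n, m) in a grid graph.

    Each w_*(i, j) returns the weight of the edge entering node (i, j):
    w_del from (i-1, j), w_ins from (i, j-1), w_sub from (i-1, j-1).
    """
    C = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:
                candidates.append(C[i - 1][j] + w_del(i, j))
            if j > 0:
                candidates.append(C[i][j - 1] + w_ins(i, j))
            if i > 0 and j > 0:
                candidates.append(C[i - 1][j - 1] + w_sub(i, j))
            C[i][j] = min(candidates)
    return C[n][m]


def edit_distance(a, b):
    """Unit-cost edit distance expressed as a grid path problem."""
    return min_path_cost(
        len(a), len(b),
        lambda i, j: 1.0,
        lambda i, j: 1.0,
        lambda i, j: 0.0 if a[i - 1] == b[j - 1] else 1.0,
    )


def independent_model_cost(n, m, rng):
    """Grid whose edge weights are all drawn independently, here
    uniform on [0, 1) (an illustrative choice of distribution)."""
    def table():
        return [[rng.random() for _ in range(m + 1)] for _ in range(n + 1)]
    d, s, t = table(), table(), table()
    return min_path_cost(
        n, m,
        lambda i, j: d[i][j],
        lambda i, j: s[i][j],
        lambda i, j: t[i][j],
    )
```

Dividing `independent_model_cost(n, n, rng)` by `2 * n` for growing `n` gives an empirical feel for the linear growth described in the abstract, under the assumed uniform weights.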
Efficient Algorithms for Sequence Analysis
- Proc. Second Workshop on Sequences: Combinatorics, Compression, Security
, 1991
Cited by 1 (0 self)
We consider new algorithms for the solution of many dynamic programming recurrences for sequence comparison and for RNA secondary structure prediction. The techniques upon which the algorithms are based effectively exploit the physical constraints of the problem to derive more efficient methods for sequence analysis. 1. INTRODUCTION. In this paper we consider algorithms for two problems in sequence analysis. The first problem is sequence alignment, and the second is the prediction of RNA structure. Although the two problems seem quite different from each other, their solutions share a common structure, which can be expressed as a system of dynamic programming recurrence equations. These equations also can be applied to other problems, including text formatting and data storage optimization. We use a number of well motivated assumptions about the problems in order to provide efficient algorithms. The primary assumption is that of concavity or convexity. The recurrence relations for bo...

