Linear Approximation of Shortest Superstrings
, 1991
Cited by 74 (5 self)
We consider the following problem: given a collection of strings $s_1, \ldots, s_m$, find the shortest string $s$ such that each $s_i$ appears as a substring (a consecutive block) of $s$. Although this problem is known to be NP-hard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing and data compression practice, namely: repeatedly merge the pair of distinct strings with maximum overlap until only one string remains. Let $n$ denote the length of the optimal superstring. A common conjecture states that the above greedy procedure produces a superstring of length $O(n)$ (in fact, $2n$), yet the only previous nontrivial bound known for any polynomial-time algorithm is a recent $O(n \log n)$ result. We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of $4n$. Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most $3n$. We also show the sup...
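The greedy procedure described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `overlap` and `greedy_superstring` are hypothetical, ties between equally good pairs are broken arbitrarily (first pair found), and strings contained in other strings are discarded up front, which is an assumption the abstract does not spell out.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Repeatedly merge the pair of distinct strings with maximum overlap."""
    # Assumption: drop duplicates and strings that occur inside another string.
    items = list(dict.fromkeys(strings))
    items = [s for s in items if not any(s != t and s in t for t in items)]
    while len(items) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:  # first maximum wins; tie-breaking is arbitrary
                        best_k, best_i, best_j = k, i, j
        # Glue the two strings together, writing the shared part only once.
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items) if idx not in (best_i, best_j)]
        items.append(merged)
    return items[0] if items else ""
```

For example, `greedy_superstring(["abc", "bcd", "cde"])` merges `abc` with `bcd` (overlap 2) and then the result with `cde`, giving `abcde`. The quadratic pair search is kept for readability; it is not meant to reflect an efficient implementation.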
Rotation of Periodic Strings and Short Superstrings
, 1996
Cited by 25 (0 self)
This paper presents two simple approximation algorithms for the shortest superstring problem, with approximation ratios $2\frac{2}{3}$ ($\approx 2.67$) and $2\frac{25}{42}$ ($\approx 2.596$), improving the best previously published $2\frac{3}{4}$ approximation. The framework of our improved algorithms is similar to that of previous algorithms in the sense that they construct a superstring by computing some optimal cycle covers on the distance graph of the given strings, and then break and merge the cycles to finally obtain a Hamiltonian path, but we make use of new bounds on the overlap between two strings. We prove that for each periodic semi-infinite string $\alpha = a_1 a_2 \cdots$ of period $q$, there exists an integer $k$, such that for any (finite) string $s$ of period $p$ which is inequivalent to $\alpha$, the overlap between $s$ and the rotation $\alpha[k] = a_k a_{k+1} \cdots$ is at most $p + \frac{1}{2}q$. Moreover, if $p \le q$, then the overlap between $s$ and $\alpha[k]$ is not larger than $\frac{2}{3}(p+q)$. In the previous shortes...
How Many Squares Can a String Contain?
, 1998
Cited by 23 (0 self)
All our words (strings) are over a fixed alphabet. A square is a subword of the form $uu = u^2$, where $u$ is a nonempty word. Two squares are distinct if they are of different shape, not just translates of each other. A word $u$ is primitive if $u$ cannot be written in the form $u = v^j$ for some $j \ge 2$. A square $u^2$ with $u$ primitive is primitive rooted. Let $M(n)$ denote the maximum number of distinct squares, $P(n)$ the number of distinct primitive rooted squares in a word of length $n$. We prove: no position in any word can be the beginning of the rightmost occurrence of more than two squares, from which we deduce $M(n) < 2n$ for all $n > 0$, and $P(n) = n - o(n)$ for infinitely many $n$.
1. Introduction. We consider words (strings, sequences) over a fixed alphabet throughout. A square is a pair of identical adjacent subwords, such as $1011010110 = (10110)^2$ over the binary alphabet. Two squares are distinct if they are of different shape, not just translates of each other. Denote by M(n...
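The definition of distinct squares can be checked by brute force on small words. The sketch below is only an illustration of the definition, not the paper's proof technique; the function name `distinct_squares` is hypothetical, and the cubic running time is accepted for clarity.

```python
def distinct_squares(word):
    """Return the set of distinct squares uu (u nonempty) occurring in word."""
    found = set()
    n = len(word)
    for start in range(n):
        # half is |u|; the square uu must fit inside the word.
        for half in range(1, (n - start) // 2 + 1):
            u = word[start:start + half]
            if u == word[start + half:start + 2 * half]:
                found.add(u + u)  # a set keeps each shape once, ignoring translates
    return found


squares = distinct_squares("1011010110")
# The abstract's example word is itself a square, and the bound M(n) < 2n applies.
assert "1011010110" in squares
assert len(squares) < 2 * len("1011010110")
```

Collecting the squares in a set is what implements "distinct by shape": two occurrences of the same square at different positions are counted once.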
RECENT RESULTS ON EXTENSIONS OF STURMIAN WORDS
, 2002
Cited by 20 (0 self)
Sturmian words are infinite words over a two-letter alphabet that admit a great number of equivalent definitions. Most of them have been given in the past ten years. Among several extensions of Sturmian words to larger alphabets, the Arnoux–Rauzy words appear to share many of the properties of Sturmian words. In this survey, combinatorial properties of these two families are considered and compared.
On a Special Class of Primitive Words
Cited by 14 (12 self)
When representing DNA molecules as words, it is necessary to take into account the fact that a word $u$ encodes basically the same information as its Watson-Crick complement $\theta(u)$, where $\theta$ denotes the Watson-Crick complementarity function. Thus, an expression which involves only a word $u$ and its complement can still be considered as a repeating sequence. In this context, we define and investigate the properties of a special class of primitive words, called $\theta$-primitive, which cannot be expressed as such repeating sequences. For instance, we prove the existence of a unique $\theta$-primitive root of a given word, and we give some constraints forcing two distinct words to share their $\theta$-primitive root. Also, we present an extension of the well-known Fine and Wilf Theorem, for which we give an optimal bound.
Periods and Binary Words
 J. Combin. Theory Ser. A
, 2000
Cited by 13 (6 self)
We give an elementary short proof for a well-known theorem of Guibas and Odlyzko stating that the sets of periods of words are independent of the alphabet size. As a consequence of our constructive proof, we give a linear time algorithm which, given a word, computes a binary one with the same periods. We also give a very short proof for the famous Fine and Wilf periodicity lemma.
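To make the notion of a period set concrete, the sketch below computes all periods of a word directly from the definition and checks the Fine and Wilf lemma on it: if a word of length at least $p + q - \gcd(p, q)$ has periods $p$ and $q$, then $\gcd(p, q)$ is also a period. The function names `periods` and `fine_wilf_holds` are hypothetical, the length of the word is counted as a (trivial) period by assumption, and this is not the paper's linear time algorithm.

```python
from math import gcd


def periods(word):
    """All p in 1..len(word) with word[i] == word[i + p] wherever both exist."""
    n = len(word)
    return [p for p in range(1, n + 1)
            if all(word[i] == word[i + p] for i in range(n - p))]


def fine_wilf_holds(word):
    """Check the Fine and Wilf lemma for every pair of periods of word."""
    n = len(word)
    ps = set(periods(word))
    for p in ps:
        for q in ps:
            g = gcd(p, q)
            # The lemma only applies when the word is long enough.
            if n >= p + q - g and g not in ps:
                return False
    return True
```

For instance, `periods("abab")` is `[2, 4]` and `periods("aabaa")` is `[3, 4, 5]`. The word `aabaa` has periods 3 and 4 but length 5, which is below $3 + 4 - 1 = 6$, so the lemma correctly draws no conclusion about period 1.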
A 2 2/3-Approximation Algorithm for the Shortest Superstring Problem
 DIMACS WORKSHOP ON SEQUENCING AND MAPPING
, 1995
Cited by 13 (0 self)
Given a collection of strings $S = \{s_1, \ldots, s_n\}$ over an alphabet, a superstring of $S$ is a string containing each $s_i$ as a substring; that is, for each $i$, $1 \le i \le n$, the superstring contains a block of $|s_i|$ consecutive characters that match $s_i$ exactly. The shortest superstring problem is the problem of finding a superstring of minimum length. The shortest superstring problem has applications in both data compression and computational biology. In data compression, the problem is a part of a general model of string compression proposed by Gallant, Maier and Storer (JCSS '80). Much of the recent interest in the problem is due to its application to DNA sequence assembly. The problem has been shown to be NP-hard; in fact, it was shown by Blum et al. (JACM '94) to be MAX SNP-hard. The first $O(1)$-approximation was also due to Blum et al., who gave an algorithm that always returns a superstring no more than 3 times the length of an optimal solution. Several researchers have published results that improve on the approximation ratio; of these, the best previous result is our algorithm ShortString, which achieves a 2 3
Combinatorics of Periods in Strings
Cited by 12 (4 self)
We consider the set $\Gamma(n)$ of all period sets of strings of length $n$ over a finite alphabet. We show that there is redundancy in period sets and introduce the notion of an irreducible period set. We prove that $\Gamma(n)$ is a lattice under set inclusion and does not satisfy the Jordan-Dedekind condition. We propose the first enumeration algorithm for $\Gamma(n)$ and improve upon the previously known asymptotic lower bounds on the cardinality of $\Gamma(n)$. Finally, we provide a new recurrence to compute the number of strings sharing a given period set.
Pattern Matching and Membership for Hierarchical Message Sequence Charts
 In Proc. of LATIN 2002, LNCS 2286
, 2002
Cited by 12 (4 self)
Several formalisms and tools for software development use hierarchy for system design, for instance statecharts and diagrams in UML. Message sequence charts are an ITU standardized notation for asynchronously communicating processes. The standard Z.120 allows (high-level) MSC-references that correspond to the use of macros. We consider in this paper two basic verification tasks for hierarchical MSCs (nested high-level MSCs, nHMSC), the membership and the pattern matching problem. We show that the membership problem for nHMSCs is PSPACE-complete, even using a weaker semantics for nMSCs than the partial-order semantics. For pattern matching nMSCs $M, N$ we exhibit a polynomial algorithm of time $O(|M|^2 \cdot |N|^2)$. We use here techniques stemming from algorithms on compressed texts.