Results 1  10
of
85
Suffix arrays: A new method for online string searches
, 1991
"... A new and conceptually simple data structure, called a suffix array, for online string searches is introduced in this paper. Constructing and querying suffix arrays is reduced to a sort and search paradigm that employs novel algorithms. The main advantage of suffix arrays over suffix trees is that ..."
Abstract

Cited by 765 (0 self)
 Add to MetaCart
A new and conceptually simple data structure, called a suffix array, for online string searches is introduced in this paper. Constructing and querying suffix arrays is reduced to a sort and search paradigm that employs novel algorithms. The main advantage of suffix arrays over suffix trees is that, in practice, they use three to five times less space. From a complexity standpoint, suffix arrays permit online string searches of the type, "Is W a substring of A?" to be answered in time O(P + log N), where P is the length of W and N is the length of A, which is competitive with (and in some cases slightly better than) suffix trees. The only drawback is that in those instances where the underlying alphabet is finite and small, suffix trees can be constructed in O(N) time in the worst case, versus O(N log N) time for suffix arrays. However, we give an augmented algorithm that, regardless of the alphabet size, constructs suffix arrays in O(N) expected time, albeit with lesser space efficiency. We believe that suffix arrays will prove to be better in practice than suffix trees for many applications.
Transducers and repetitions
 Theoretical Computer Science
, 1986
"... Abstract. The factor transducer of a word associates to each of its factors (or subwc~rds) their first occurrence. Optimal bounds on the size of minimal factor transducers together with an algorithm for building them are given. Analogue results and a simple algorithm are given for the case of subseq ..."
Abstract

Cited by 94 (18 self)
 Add to MetaCart
(Show Context)
Abstract. The factor transducer of a word associates to each of its factors (or subwc~rds) their first occurrence. Optimal bounds on the size of minimal factor transducers together with an algorithm for building them are given. Analogue results and a simple algorithm are given for the case of subsequential suffix transducers. Algorithms are applied to repetition searching in words. Rl~sum~. Le transducteur des facteurs d'un mot associe a chacun de ses facteurs leur premiere occurrence. On donne des bornes optimales sur la taille du transducteur minimal d'un mot ainsi qu'un algorithme pour sa construction. On donne des r6sultats analogues et un algorithme simple dans le cas du transducteur souss~luentiel des suffixes d'un mot. On donne une application la d6tection de r6p6titions dans les mots. Contents
A theory of parameterized pattern matching: Algorithms and applications
 In ACM Symposium on Theory of Computing
, 1993
"... ..."
Finding Maximal Repetitions in a Word in Linear Time
 In Symposium on Foundations of Computer Science
, 1999
"... A repetition in a word is a subword with the period of at most half of the subword length. We study maximal repetitions occurring in, that is those for which any extended subword of has a bigger period. The set of such repetitions represents in a compact way all repetitions in.We first prove a combi ..."
Abstract

Cited by 81 (4 self)
 Add to MetaCart
(Show Context)
A repetition in a word is a subword with the period of at most half of the subword length. We study maximal repetitions occurring in, that is those for which any extended subword of has a bigger period. The set of such repetitions represents in a compact way all repetitions in.We first prove a combinatorial result asserting that the sum of exponents of all maximal repetitions of a word of length is bounded by a linear function in. This implies, in particular, that there is only a linear number of maximal repetitions in a word. This allows us to construct a lineartime algorithm for finding all maximal repetitions. Some consequences and applications of these results are discussed, as well as related works. 1.
Algorithms for Discovering Repeated Patterns in Multidimensional Representations of Polyphonic Music
, 2003
"... In this paper we give an overview of four algorithms that we have developed for pattern matching, pattern discovery and data compression in multidimensional datasets. We show that these algorithms can fruitfully be used for processing musical data. In particular, we show that our algorithms can disc ..."
Abstract

Cited by 60 (18 self)
 Add to MetaCart
In this paper we give an overview of four algorithms that we have developed for pattern matching, pattern discovery and data compression in multidimensional datasets. We show that these algorithms can fruitfully be used for processing musical data. In particular, we show that our algorithms can discover instances of perceptually signifrant musica 1 repetition that cannot be found using previous approaches. We also describe results that suggest the possibility of using our datacompression algorithm for modelling expert motivicthematic music analysis.
Linear Time Algorithms for Finding and Representing all the Tandem Repeats in a String
 TREES, AND SEQUENCES: COMPUTER SCIENCE AND COMPUTATIONAL BIOLOGY
, 1998
"... A tandem repeat (or square) is a string ffff, where ff is a nonempty string. We present an O(jSj)time algorithm that operates on the suffix tree T (S) for a string S, finding and marking the endpoint in T (S) of every tandem repeat that occurs in S. This decorated suffix tree implicitly represents ..."
Abstract

Cited by 44 (2 self)
 Add to MetaCart
(Show Context)
A tandem repeat (or square) is a string ffff, where ff is a nonempty string. We present an O(jSj)time algorithm that operates on the suffix tree T (S) for a string S, finding and marking the endpoint in T (S) of every tandem repeat that occurs in S. This decorated suffix tree implicitly represents all occurrences of tandem repeats in S, and can be used to efficiently solve many questions concerning tandem repeats and tandem arrays in S. This improves and generalizes several prior efforts to efficiently capture large subsets of tandem repeats.
Optimal Parallel Algorithms for Periods, Palindromes and Squares (Extended Abstract)
, 1992
"... ) Alberto Apostolico Purdue University and Universit`a di Padova Dany Breslauer yyz Columbia University Zvi Galil z Columbia University and TelAviv University Summary of results Optimal concurrentread concurrentwrite parallel algorithms for two problems are presented: ffl Finding all the pe ..."
Abstract

Cited by 33 (13 self)
 Add to MetaCart
) Alberto Apostolico Purdue University and Universit`a di Padova Dany Breslauer yyz Columbia University Zvi Galil z Columbia University and TelAviv University Summary of results Optimal concurrentread concurrentwrite parallel algorithms for two problems are presented: ffl Finding all the periods of a string. The period of a string can be computed by previous efficient parallel algorithms only if it is shorter than half of the length of the string. Our new algorithm computes all the periods in optimal O(log log n) time, even if they are longer. The algorithm can be used to compute all initial palindromes of a string within the same bounds. ffl Testing if a string is squarefree. We present an optimal O(log log n) time algorithm for testing if a string is squarefree, improving the previous bound of O(log n) given by Apostolico [1] and Crochemore and Rytter [12]. We show matching lower bounds for the optimal parallel algorithms that solve the problems above on a general alphab...
Finding maximal pairs with bounded gap
 Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 1645 of Lecture Notes in Computer Science
, 1999
"... A pair in a string is the occurrence of the same substring twice. A pair is maximal if the two occurrences of the substring cannot be extended to the left and right without making them different. The gap of a pair is the number of characters between the two occurrences of the substring. In this pape ..."
Abstract

Cited by 29 (5 self)
 Add to MetaCart
A pair in a string is the occurrence of the same substring twice. A pair is maximal if the two occurrences of the substring cannot be extended to the left and right without making them different. The gap of a pair is the number of characters between the two occurrences of the substring. In this paper we present methods for finding all maximal pairs under various constraints on the gap. In a string of length n we can find all maximal pairs with gap in an upper and lower bounded interval in time O(n log n + z) where z is the number of reported pairs. If the upper bound is removed the time reduces to O(n+z). Since a tandem repeat is a pair where the gap is zero, our methods can be seen as a generalization of finding tandem repeats. The running time of our methods equals the running time of well known methods for finding tandem repeats.
Maximal repetitions in strings
, 2008
"... The cornerstone of any algorithm computing all repetitions in strings of length n in O(n) time is the fact that the number of maximal repetitions (runs) is linear. Therefore, the most important part of the analysis of the running time of such algorithms is counting the number of runs. Kolpakov and K ..."
Abstract

Cited by 26 (6 self)
 Add to MetaCart
The cornerstone of any algorithm computing all repetitions in strings of length n in O(n) time is the fact that the number of maximal repetitions (runs) is linear. Therefore, the most important part of the analysis of the running time of such algorithms is counting the number of runs. Kolpakov and Kucherov [FOCS’99] proved it to be cn but could not provide any value for c. Recently, Rytter [STACS’06] proved that c ≤ 5. His analysis has been improved by Puglisi et al. to obtain 3.48 and by Rytter to 3.44 (both submitted). The conjecture of Kolpakov and Kucherov, supported by computations, is that c = 1. Here we improve dramatically the previous results by proving that c ≤ 1.6 and show how it could be improved by computer verification down to 1.18 or less. While the conjecture may be very difficult to prove, we believe that our work provides a good approximation for all practical purposes. For the stronger result concerning the linearity of the sum of exponents, we give the first explicit bound: 5.6n. Kolpakov and Kucherov did not have any and Rytter considered “unsatisfactory” the bound that could be deduced from his proof. Our bound could be as well improved by computer verification down to 2.9n or less.