## An Algorithm for Approximate Tandem Repeats (1993)

Venue: | In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science |

Citations: | 72 - 2 self |

### BibTeX

@INPROCEEDINGS{Landau93analgorithm,

author = {Gad M. Landau and Jeanette P. Schmidt and Dina Sokol},

title = {An Algorithm for Approximate Tandem Repeats},

booktitle = {Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science},

year = {1993},

pages = {120--133},

publisher = {Springer-Verlag}

}

### Abstract

A perfect single tandem repeat is a nonempty string that can be divided into two identical substrings, e.g., abcabc. An approximate single tandem repeat is one in which the two substrings are similar but not identical, e.g., abcdaacd.
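The two definitions above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only checks a single given string, and it assumes the number of mismatching positions between the two halves (Hamming distance) as the similarity measure, which is one possible choice. The function names `is_perfect_tandem_repeat` and `hamming_between_halves` are hypothetical.

```python
def is_perfect_tandem_repeat(s: str) -> bool:
    """Return True if s is nonempty and splits into two identical halves."""
    half, odd = divmod(len(s), 2)
    return len(s) > 0 and odd == 0 and s[:half] == s[half:]


def hamming_between_halves(s: str) -> int:
    """Count mismatching positions between the two halves of s.

    Assumption: similarity is measured by Hamming distance, so s must be
    nonempty and of even length.
    """
    if len(s) == 0 or len(s) % 2:
        raise ValueError("expected a nonempty string of even length")
    half = len(s) // 2
    return sum(a != b for a, b in zip(s[:half], s[half:]))


if __name__ == "__main__":
    for example in ("abcabc", "abcdaacd"):
        print(example,
              is_perfect_tandem_repeat(example),
              hamming_between_halves(example))
```

Under this assumed measure, abcabc has zero mismatches between its halves, while the halves abcd and aacd of abcdaacd differ in one position.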

### Citations

2541 |
The Design and Analysis of Computer Algorithms
- Aho, Hopcroft, et al.
- 1974
Citation Context ...t the special circumstances we are dealing with could not be somehow exploited. In the current paper we are using mergeable 2-3 heaps to compute P[j, h] in O(log k) time, as for example described in [AHU74], Sections 4.11-4.12. Each list corresponds to the list of leaves of a 2-3 tree. The leaves of each subtree are hence reports of consecutive diagonals. The information stored in each internal node is ... |

1564 | A general method applicable to the search for similarities in the amino acid sequence of two proteins - Needleman, Wunsch - 1970 |

1330 |
Binary codes capable of correcting deletions, insertions, and reversals
- Levenshtein
- 1966
Citation Context ...ches between two strings of equal length. The string s = bearbeer is an approximate single repeat with a Hamming distance of 1. A more commonly used measure is the edit distance, defined by Levenshtein [Lev66] as the minimum number of insertions, deletions, and substitutions necessary to transform one string into the other. An example of a repeat with edit distance of 2 is s = actta|gctt. The most rigorou... |
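The two distance measures named in this context can be sketched as follows. This is the textbook dynamic program for Levenshtein distance with unit costs, not the diagonal-based method the paper builds on. The function names are hypothetical, and the pair actta / gctt is used only as an illustrative input.

```python
def hamming_distance(u: str, v: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(u, v))


def edit_distance(u: str, v: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to transform u into v (unit cost per operation)."""
    # prev[j] holds the distance between the processed prefix of u and v[:j].
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        curr = [i]  # distance from u[:i] to the empty prefix of v
        for j, b in enumerate(v, start=1):
            curr.append(min(prev[j] + 1,              # delete a from u
                            curr[j - 1] + 1,          # insert b into u
                            prev[j - 1] + (a != b)))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(hamming_distance("bear", "beer"))
    print(edit_distance("actta", "gctt"))
```

The table is filled row by row and only two rows are kept, so the sketch uses O(|u| · |v|) time and O(|v|) space.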

942 | Algorithms on Strings, Trees, And Sequences - Gusfield - 1997 |

658 | Fast pattern matching in strings - Knuth, Morris, et al. - 1977 |

444 | Tandem repeats finder: A program to analyze DNA sequences - Benson - 1999 |

223 | Introduction to Computational Molecular Biology - Setubal, Meidanis - 1997 |

198 |
Algorithms for approximate string matching
- Ukkonen
- 1985
Citation Context ...es along the diagonals in PR increase and successive values differ by at most one, it is hence sufficient to record, for each h ≤ k, the last element on each diagonal whose value is h, as observed in [U83]. In order to simplify the explanation we give the diagonals fixed numbers, which are independent of the reference point j. Define the diagonal through [d, n/2] as diagonal d. (Note that all diagonals... |

114 |
Fast parallel and serial approximate string matching
- Landau, Vishkin
- 1989
Citation Context ...ttern and n is the length of the text. Since we call [LV86] with the pattern and text of length O(n), the complexity of each iteration becomes O(nk log n). Although the theoretical time complexity of [LV89] is better, the suffix trees are difficult to program, and the practical runtime may not be significantly faster. As mentioned, we must compute ck positions of mismatches to comply with our definition... |

71 |
An O(n log n) algorithm for finding all repetitions in a string
- Main, Lorentz
- 1984
Citation Context ...tterns within a sequence. A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g. abcabc. It is a well-studied problem. Main and Lorentz [ML84] present an O(n log n) algorithm, Department of Computer Science, Haifa University, Haifa 31905, Israel, phone: (972-4) 824-0103, FAX: (972-4) 824-9331; Department of Computer and Information Science,... |

59 |
Highest Scoring Paths In Weighted Grid Graphs and Their Application To Finding All Approximate Repeats
- Schmidt
- 1998
Citation Context ...ve algorithm for finding all highest scoring non-overlapping alignments in O(n^2 log^2 n) time and O(n^2 log n) space. Benson [B95] simplifies this algorithm and reduces the space to O(n^2). Schmidt [S98] uses weighted grid graphs to find all locally optimal approximate repeats, improving the time to O(n^2 log n), with O(n^2) space. Practical use of these algorithms for searching large databases is p... |

54 |
Fast String Matching with k Differences
- Landau, Vishkin
Citation Context ...s the last row, (upward from j), with value h on that diagonal. P[j, h] (resp. S[j, h]) can be computed from P[j, h - 1] (resp. S[j, h - 1]) in O(1) time using suffix trees, as shown in [LV88], in place of the above while loop. Therefore, for a given j, Part 1 of step 1 (resp. 2) is computed in O(k) time. It remains to combine the pieces. Part 2. We can now identify all repeats r =suu for ... |

42 |
Efficient string matching with k mismatches
- Landau, Vishkin
- 1986
Citation Context ...the following input and output. INPUT: 1) a biological sequence S, 2) an integer k. OUTPUT: all canonical k-repeats that occur in S. In the implementation, P and S are computed using the algorithm of [LV86], with complexity O(k(m log m + n)) where m is the length of the pattern and n is the length of the text. Since we call [LV86] with the pattern and text of length O(n), the complexity of each iteration... |

40 | Incremental string comparison
- Landau, Myers, et al.
- 1998
Citation Context ...maximum number of columns to the right of n/2 reached with at most h differences. Since there might be more than one such point we define (j, h) as the set of corresponding end points on the wave. In [LMS98] we discuss how to compute the k waves (wave 0, wave 1, ..., wave k). Section 2.3.1 describes an algorithm to find P[j, h] and (j, h) for each wave. Until the end of this subsection assume that P [j... |

33 | The complete 685-kilobase DNA sequence of the human β T cell receptor locus. Science 272:1755–1762 - Rowen, Koop, et al. - 1996 |

29 | A method for fast database search for all k-nucleotide repeats
- Benson, Waterman
- 1994
Citation Context ...leading to much more efficient algorithms. Algorithms for finding all approximate single repeats, using the weighted edit distance as the similarity measure, can be found in [KM96, B95, S98, M92] and [BW94]. Kannan and Myers [KM93, KM96] describe a recursive algorithm for finding all highest scoring non-overlapping alignments in O(n^2 log^2 n) time and O(n^2 log n) space. Benson [B95] simplifies this a... |

28 | An Algorithm For Locating Non-Overlapping Regions of Maximum Alignment Score - Kannan, Myers - 1996 |

21 | Sequence alignment with tandem duplication
- Benson
- 1997
Citation Context ...s. Then, using a collection of statistical criteria, the k-tuple matches are used to detect the tandem repeats. This algorithm has already been used in conjunction with a sequence alignment algorithm [B97]. Multiple repeats occur frequently in both DNA and protein sequences. Some multiple repeats have been associated with human genetic diseases. For example, the triplet CGG is tandemly repeated 6 to 54... |

12 |
Fast parallel detection of squares in strings
- Apostolico
- 1992
Citation Context ...of Computer Science, Bar-Ilan University, Ramat Gan 52900, Israel; email: sokold@macs.biu.ac.il; partially supported by NSF grants CCR-9305873. which reports all perfect tandem repeats and Apostolico [Ap92] describes an optimal speed-up parallel algorithm for the problem. Motivations for the exact repeat problem can be found in research in formal languages (see a survey in [ML85]). Repeats occur frequen... |

11 | Algorithms on Strings, Trees, and Sequences - Gusfield - 1997 |

8 |
Linear time recognition of square free strings
- Main, Lorentz
- 1985
Citation Context ... repeats and Apostolico [Ap92] describes an optimal speed-up parallel algorithm for the problem. Motivations for the exact repeat problem can be found in research in formal languages (see a survey in [ML85]). Repeats occur frequently in biological sequences, yet they are seldom exact. Hence, we focus our attention on approximately repeated patterns. An approximate single repeat is a nonempty string that... |

8 | An O(n log n) algorithm for finding all repetitions in a string - Main, Lorentz - 1984 |

7 | Computational Molecular Biology - Lesk, ed - 1988 |

6 | Approximate Periods of Strings
- Sim, Iliopoulos, et al.
Citation Context ...of the repeat are approximate. The difficulties of defining approximate multiple repeats are addressed in Section 3, where we present a simple and precise definition. A similar notion is discussed in [SPIS99] where approximate periodicity of strings is defined as follows. Given two strings x and p, p is a t-approximate period of x if there exists a partition of x into disjoint blocks of substrings p 1 : :... |

6 | On the Distribution of K-tuple Matches for Sequence Homology: A Constant Time Exact Calculation of the Variance - Benson, Su - 1998 |

4 | Searching through sequence databases - Doolittle - 1990 |

3 |
A space-efficient algorithm for finding best scoring non-overlapping alignments
- Benson
- 1995
Citation Context ...5, S98, M92] and [BW94]. Kannan and Myers [KM93, KM96] describe a recursive algorithm for finding all highest scoring non-overlapping alignments in O(n^2 log^2 n) time and O(n^2 log n) space. Benson [B95] simplifies this algorithm and reduces the space to O(n^2). Schmidt [S98] uses weighted grid graphs to find all locally optimal approximate repeats, improving the time to O(n^2 log n), with O(n^2) s... |

3 |
An unstable triplet repeat in a gene related to Myotonic Dystrophy
- Caskey, et al.
- 1992
Citation Context ...gene. In patients with the Fragile X Syndrome, the pattern occurs more than 200 times. Kennedy disease and Myotonic Dystrophy (DM) are two other diseases that have been associated with triplet repeats [C92]. Another important application for finding multiple repeats in biological sequences is related to the multiple sequence alignment. Producing multiple alignments becomes very complicated when the 2 se... |

2 | An unstable triplet repeat in a gene related to Myotonic Dystrophy - Caskey - 1992 |

1 | On the distribution of k tuple matches for sequence homology: a constant time-exact calculation of the variance - Benson, Su - 1998 |

1 | An algorithm for locating a repeated region - Miller - 1992 |

1 | A space-efficient algorithm for finding best scoring non-overlapping alignments. Theoret - Benson - 1995 |

1 | Tandem repeats finder: a program to analyze DNA sequences - Benson - 1999 |

1 | An algorithm for locating a repeated region (manuscript) - Miller - 1992 |