## Space-Economical Algorithms for Finding Maximal Unique Matches (2002)

Venue: In Proc. 13th Annual Symp. Combinatorial Pattern Matching (CPM 2002)

Citations: 13 (2 self)

### BibTeX

@INPROCEEDINGS{Hon02space-economicalalgorithms,

author = {Wing-kai Hon and Kunihiko Sadakane},

title = {Space-Economical Algorithms for Finding Maximal Unique Matches},

booktitle = {In Proc. 13th Annual Symp. Combinatorial Pattern Matching (CPM 2002)},

year = {2002},

pages = {144--152},

publisher = {Springer}

}

### Abstract

We present space-economical algorithms for finding maximal unique matches (MUM's) between two strings; MUM's are important in large-scale genome sequence alignment. Our algorithms require only O(n) bits (O(n/log n) words), where n is the total length of the strings. We propose three algorithms for different inputs: the strings alone, their compressed suffix array, or their compressed suffix tree. Their time complexities are O(n log n), O(n log^ε n) and O(n), respectively, where ε is any constant between 0 and 1. We also give an algorithm that constructs the compressed suffix tree from the compressed suffix array in O(n log^ε n) time using O(n) bits of space.
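
To make the problem concrete, here is a minimal sketch of the classic, space-hungry baseline that the abstract improves on: build an ordinary suffix array and LCP array of A#B$ and report adjacent suffix pairs that (i) share a prefix no other suffix shares, (ii) come from different strings, and (iii) are preceded by different characters. This is not the authors' O(n)-bit algorithm. It stores full integer arrays, sorts suffixes naively, and assumes that `#` and `$` occur in neither input. The function name `find_mums` and the `min_len` parameter are hypothetical.

```python
def find_mums(a: str, b: str, min_len: int = 1) -> list[tuple[int, int, int]]:
    """Report MUMs of a and b as (start_in_a, start_in_b, length), 0-based.

    Baseline sketch only: it keeps a plain suffix array and LCP array
    (Theta(n) machine words), not the O(n)-bit structures of the paper.
    """
    if "#" in a + b or "$" in a + b:
        raise ValueError("inputs must not contain the separators '#' or '$'")
    s = a + "#" + b + "$"
    n = len(s)
    boundary = len(a)  # index of '#'; suffixes of a start before it, of b after it
    # Naive suffix sorting: quadratic memory, fine for a small illustration.
    sa = sorted(range(n), key=lambda i: s[i:])
    # lcp[i] = LCP of suffixes sa[i-1] and sa[i]; lcp[0] and lcp[n] are 0 sentinels.
    lcp = [0] * (n + 1)
    for i in range(1, n):
        p, q = sa[i - 1], sa[i]
        k = 0
        while p + k < n and q + k < n and s[p + k] == s[q + k]:
            k += 1
        lcp[i] = k
    mums = []
    for i in range(1, n):
        length = lcp[i]
        if length == 0 or length < min_len:
            continue
        # Unique in both strings: no third suffix shares the first `length` chars.
        if lcp[i - 1] >= length or lcp[i + 1] >= length:
            continue
        p, q = sorted((sa[i - 1], sa[i]))
        # One occurrence must lie in a, the other in b.
        if not (p < boundary < q):
            continue
        # Left-maximal: preceding characters differ (or p is the start of a).
        # Right-maximality is automatic: by definition of lcp the next chars differ.
        if p > 0 and s[p - 1] == s[q - 1]:
            continue
        mums.append((p, q - boundary - 1, length))
    return sorted(mums)
```

For example, `find_mums("xabcy", "zabcw")` returns `[(1, 1, 3)]`: the match `abc` starts at offset 1 in both strings, while the shorter common substrings `bc` and `c` are rejected by the left-maximality test.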

### Citations

970 | Algorithms on strings, trees and sequences
- Gusfield
- 1997
Citation Context: ...ay using O(n log^ε n) time and O(n) bits space. 1 Introduction The suffix tree is a quite useful data structure for solving string problems. Many problems can be efficiently solved by using the suffix tree [5]. However, the problem of using the suffix tree is its size. It is said that the suffix tree occupies about 17n bytes for a string of length n. Although a space-efficient representation of the suffix tree [7] ...

363 | Universal codeword sets and representations of the integers
- Elias
- 1975
Citation Context: ...er et al. [1], and the properties of MUM. In Section 4 we propose new algorithms to find MUM's. Section 5 shows concluding remarks. 2 Preliminaries 2.1 Suffix Trees and Suffix Arrays Let T[1..n] = T[1]T[2]...T[n] be a string of length n on an alphabet A. Assume that the alphabet size |A| is constant. The j-th suffix of T is defined as T[j..n] = T[j]T[j+1]...T[n] and expressed by T_j. A substrin...
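
The excerpt defines the j-th suffix T_j = T[j..n] of a 1-indexed string; a suffix array lists the positions j in lexicographic order of T_j. The sketch below builds one by prefix doubling (sorting by the first 2^k characters in successive rounds), an O(n log² n) textbook method chosen here for brevity. It is not the construction used in the paper, and the name `suffix_array` is hypothetical. It returns 1-based positions to match the T_j notation.

```python
def suffix_array(t: str) -> list[int]:
    """Suffix array of t as 1-based start positions j of the suffixes T_j."""
    n = len(t)
    sa = list(range(n))           # 0-based internally
    rank = [ord(c) for c in t]    # initial rank: the first character
    k = 1
    while n > 0:
        def key(i: int) -> tuple[int, int]:
            # Rank of the first k characters, then of the next k (-1 = past the end,
            # so a suffix that is a prefix of another sorts first).
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for idx in range(1, n):
            prev, cur = sa[idx - 1], sa[idx]
            new_rank[cur] = new_rank[prev] + (key(cur) != key(prev))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: suffixes fully sorted
            break
        k *= 2
    return [i + 1 for i in sa]
```

For `banana`, the result is `[6, 4, 2, 1, 5, 3]`, i.e. T_6 = a < T_4 = ana < T_2 = anana < T_1 = banana < T_5 = na < T_3 = nana.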

191 | Opportunistic data structures with applications
- Ferragina, Manzini
- 2000
Citation Context: ...to develop space-economical alternatives to the suffix tree. Recently many such data structures were proposed, for example space-efficient suffix trees [10], the compressed suffix array [4, 11], the FM-index [3], data structures for bottom-up traversal of the suffix tree [6], and data structures for longest common prefixes [12]. However, none of them has the same functions as the suffix tree. In this paper we con...

167 | Alignment of whole genomes
- Delcher, Kasif, et al.
- 1999
Citation Context: ...es (MUM's) between two strings A and B. An MUM is a substring that appears once in both A and B and is not contained in any longer such substring. The MUM's are used in the algorithm of Delcher et al. [1] for aligning two long genome sequences. Details are described in Section 3. Although this problem can be solved in linear time by using the suffix tree of the strings A and B, it is not space-efficient. ...
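
The MUM definition in this excerpt can be checked directly on tiny inputs. The brute-force sketch below (hypothetical name `mums_by_definition`, cubic or worse running time) enumerates the substrings of A that occur exactly once in A and exactly once in B, then discards any that are contained in a longer such substring. It is meant only as a reference for testing faster methods, not as anything taken from the paper.

```python
def mums_by_definition(a: str, b: str) -> set[str]:
    """All MUMs of a and b, found by brute force straight from the definition."""

    def occurrences(s: str, w: str) -> int:
        # Count occurrences of w in s, overlapping ones included
        # (str.count would skip overlaps).
        return sum(1 for i in range(len(s) - len(w) + 1) if s.startswith(w, i))

    # Substrings of a that occur exactly once in a and exactly once in b.
    unique = {
        a[i:j]
        for i in range(len(a))
        for j in range(i + 1, len(a) + 1)
        if occurrences(a, a[i:j]) == 1 and occurrences(b, a[i:j]) == 1
    }
    # Maximal: not contained in any longer unique match.
    return {w for w in unique if not any(w != u and w in u for u in unique)}
```

For instance, `mums_by_definition("xabcy", "zabcw")` gives `{"abc"}`: `a`, `b`, `c`, `ab` and `bc` are also unique in both strings, but each is contained in `abc`.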

144 | Succinct representation of balanced parentheses, static trees and planar graphs
- Munro, Raman
- 1997
Citation Context: ...by a factor of O(log n) because the suffix tree requires O(n) pointers, or equivalently O(n log n) bits. Our data structure includes the compressed suffix array (CSA), the parentheses representation of a tree [9] and the data structures for longest common prefixes (Hgt array) [12]. Note that these data structures do not store suffix links in a suffix tree. Therefore we may not be able to solve the problem efficient...
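
The parentheses representation of a tree [9] mentioned here writes `(` when a depth-first traversal enters a node and `)` when it leaves, so an ordered tree with m nodes takes 2m bits. Below is a minimal sketch, assuming the tree is given as a hypothetical `children` dictionary. The succinct structure of [9] additionally keeps o(m)-bit auxiliary tables so that operations such as finding the matching parenthesis take constant time, whereas `find_close` here (also a hypothetical name) simply scans.

```python
def to_parentheses(children: dict[int, list[int]], root: int) -> str:
    """Encode an ordered tree: '(' on entering a node, ')' on leaving it."""
    out: list[str] = []
    stack = [(root, False)]  # (node, already_expanded)
    while stack:
        node, expanded = stack.pop()
        if expanded:
            out.append(")")
            continue
        out.append("(")
        stack.append((node, True))
        # Push children in reverse so they are visited left to right.
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return "".join(out)


def find_close(bp: str, i: int) -> int:
    """Index of the ')' matching the '(' at index i, by a linear scan."""
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced parentheses")
```

For a root 0 with children 1 and 2, where node 1 has child 3, the encoding is `((())())`. The parenthesis at index 1 (node 1) closes at index 4, so node 1's subtree has (4 - 1 + 1) / 2 = 2 nodes.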

80 | Linear-time longest-common-prefix computation in suffix arrays and its applications
- Kasai, Lee, et al.
Citation Context: ...ently many such data structures were proposed, for example space-efficient suffix trees [10], the compressed suffix array [4, 11], the FM-index [3], data structures for bottom-up traversal of the suffix tree [6], and data structures for longest common prefixes [12]. However, none of them has the same functions as the suffix tree. In this paper we consider the problem of finding maximal unique matches (MUM's) be...
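
This entry is the linear-time LCP algorithm of Kasai et al., whose output is the Hgt (height) array named in the paper. Below is a minimal sketch of the usual formulation, assuming an uncompressed suffix array `sa` of 0-based positions is already available; the function name `lcp_kasai` is hypothetical. The key fact is that, moving from suffix p to suffix p+1 in text order, the LCP with the lexicographic predecessor drops by at most one, so the character comparisons total O(n).

```python
def lcp_kasai(s: str, sa: list[int]) -> list[int]:
    """Hgt/LCP array: lcp[i] = LCP of suffixes sa[i-1] and sa[i]; lcp[0] = 0."""
    n = len(s)
    rank = [0] * n
    for i, p in enumerate(sa):
        rank[p] = i                  # inverse suffix array
    lcp = [0] * n
    h = 0
    for p in range(n):               # visit suffixes in text order
        if rank[p] > 0:
            q = sa[rank[p] - 1]      # lexicographic predecessor of suffix p
            while p + h < n and q + h < n and s[p + h] == s[q + h]:
                h += 1
            lcp[rank[p]] = h
            if h > 0:
                h -= 1               # suffix p+1 keeps at least h-1 matching chars
        else:
            h = 0                    # smallest suffix has no predecessor
    return lcp
```

For `s = "banana"` with `sa = [5, 3, 1, 0, 4, 2]`, the result is `[0, 1, 3, 0, 0, 2]`.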

51 | Succinct representations of lcp information and improvements in the compressed suffix arrays
- Sadakane
- 2002
Citation Context: ...ample space-efficient suffix trees [10], the compressed suffix array [4, 11], the FM-index [3], data structures for bottom-up traversal of the suffix tree [6], and data structures for longest common prefixes [12]. However, none of them has the same functions as the suffix tree. In this paper we consider the problem of finding maximal unique matches (MUM's) between two strings A and B. An MUM is a substring that ...
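
Reference [12] concerns storing longest-common-prefix information succinctly. One standard way to fit the Hgt array into about 2n bits, which we understand to be the idea used there, relies on the same fact as Kasai's algorithm: if h[p] is the LCP of suffix p with its suffix-array predecessor, then h[p] + p never decreases as p grows, and it is at most n. Writing each increment in unary followed by a 1 gives at most 2n bits, and Hgt[i] = h[SA[i]] is recovered by counting zeros. The sketch assumes 0-based `sa` and `hgt` lists are given; `encode_hgt` and `hgt_at` are hypothetical names, and the decoder scans linearly where a real structure would use a constant-time select.

```python
def encode_hgt(hgt: list[int], sa: list[int]) -> list[int]:
    """Encode Hgt (hgt[i] = LCP of suffixes sa[i-1], sa[i]) as <= 2n bits."""
    n = len(sa)
    h = [0] * n
    for i, p in enumerate(sa):
        h[p] = hgt[i]                 # h[p]: LCP of suffix p with its SA predecessor
    bits: list[int] = []
    prev = 0
    for p in range(n):
        g = h[p] + p                  # non-decreasing because h[p+1] >= h[p] - 1
        bits.extend([0] * (g - prev)) # unary-coded increment
        bits.append(1)
        prev = g
    return bits


def hgt_at(bits: list[int], sa: list[int], i: int) -> int:
    """Recover Hgt[i]: zeros before the (sa[i]+1)-th one, minus sa[i]."""
    p = sa[i]
    ones = zeros = 0
    for bit in bits:
        if bit == 1:
            if ones == p:
                return zeros - p      # zeros seen so far equal h[p] + p
            ones += 1
        else:
            zeros += 1
    raise IndexError(i)
```

For `banana` (with `sa = [5, 3, 1, 0, 4, 2]` and Hgt `[0, 1, 3, 0, 0, 2]`) the encoding uses 11 bits, within the 2n = 12 bound, and every Hgt value decodes correctly.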

15 | Compressed suffix arrays and suffix trees with applications to text indexing and string matching
- Grossi, Vitter
- 2000
Citation Context: ...efore it is important to develop space-economical alternatives to the suffix tree. Recently many such data structures were proposed, for example space-efficient suffix trees [10], the compressed suffix array [4, 11], the FM-index [3], data structures for bottom-up traversal of the suffix tree [6], and data structures for longest common prefixes [12]. However, none of them has the same functions as the suffix tree. In...

13 | Reducing the Space Requirement of Suffix Trees
- Kurtz
- 1998
Citation Context: ... [5]. However, the problem of using the suffix tree is its size. It is said that the suffix tree occupies about 17n bytes for a string of length n. Although a space-efficient representation of the suffix tree [7] has been proposed, it still occupies more than 10n bytes. Therefore it is difficult to apply the suffix tree to solve large-scale problems. The problem is severe in treating genome-scale strings. For exa...

8 | Compressed text databases with efficient query algorithms based on the compressed suffix array
- Sadakane
- 2000
Citation Context: ...efore it is important to develop space-economical alternatives to the suffix tree. Recently many such data structures were proposed, for example space-efficient suffix trees [10], the compressed suffix array [4, 11], the FM-index [3], data structures for bottom-up traversal of the suffix tree [6], and data structures for longest common prefixes [12]. However, none of them has the same functions as the suffix tree. In...
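
Compressed suffix arrays such as those cited here [4, 11] replace the suffix array by a neighbor function, commonly written Ψ: Ψ[i] is the rank of the suffix that starts one position after SA[i]. Within each block of suffixes that begin with the same character, Ψ is increasing, which is what makes it compressible, and following Ψ from the rank of position 0 spells out the text. The sketch below shows both properties under stated assumptions: the names `psi_array` and `text_from_psi` are hypothetical, suffixes are sorted naively, `$` is assumed absent from the text and smaller than its characters, and Ψ is kept as a plain list rather than in compressed form.

```python
def psi_array(t: str) -> tuple[list[int], list[int]]:
    """Return (SA, Psi) for t + '$', with Psi[i] = rank of suffix SA[i] + 1 (wrapping)."""
    s = t + "$"                       # '$' assumed absent from t and smallest
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])
    rank = [0] * n
    for i, p in enumerate(sa):
        rank[p] = i                   # inverse suffix array
    psi = [rank[(p + 1) % n] for p in sa]
    return sa, psi


def text_from_psi(psi: list[int], first_chars: str) -> str:
    """Rebuild t by following Psi; first_chars[i] is the first char of suffix SA[i]."""
    # Row 0 is the suffix '$', so psi[0] is the rank of the suffix at position 0.
    out = []
    i = psi[0]
    for _ in range(len(psi) - 1):
        out.append(first_chars[i])
        i = psi[i]
    return "".join(out)
```

For `banana`, Ψ is `[4, 0, 5, 6, 3, 1, 2]`; the three rows whose suffixes start with `a` have Ψ values 0, 5 and 6, in increasing order.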

6 | Space efficient suffix trees
- Munro, Raman, et al.
- 2001
Citation Context: ...es, which is not realistic. Therefore it is important to develop space-economical alternatives to the suffix tree. Recently many such data structures were proposed, for example space-efficient suffix trees [10], the compressed suffix array [4, 11], the FM-index [3], data structures for bottom-up traversal of the suffix tree [6], and data structures for longest common prefixes [12]. However, none of them has the ...