## Better External Memory Suffix Array Construction

### Citations

823 | Suffix arrays: a new method for on-line string searches
- Manber, Myers
- 1993
Citation Context ...r an array of indexes is a suffix array. As a tool of possible independent interest we present a systematic way to design, analyze, and implement pipelined algorithms. 1 Introduction The suffix array =-=[21, 12]-=-, a lexicographically sorted array of the suffixes of a string, has numerous applications, e.g., in string matching [21, 12], genome analysis [1] and text compression [6]. For example, one can use it ... |

791 | A block sorting lossless data compression algorithm
- Burrows, Wheeler
- 1994
Citation Context ...duction The suffix array [21, 12], a lexicographically sorted array of the suffixes of a string, has numerous applications, e.g., in string matching [21, 12], genome analysis [1] and text compression =-=[6]-=-. For example, one can use it as full text index: To find all occurrences of a pattern P in a text T do binary search in the suffix array of T , i.e., look for the interval of suffixes that have P as ... |
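
The full-text search described in this excerpt can be sketched as follows. This is a minimal in-memory illustration, not code from the paper: the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the suffix array is built by naive sorting, which is only suitable for short demonstration strings.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction (sorts the suffixes directly); an assumption made for
    brevity, not one of the construction algorithms cited on this page.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text using suffix array sa."""
    m = len(pattern)

    # First binary search: leftmost suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Second binary search: leftmost suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # sa[start:lo] is the interval of suffixes that have pattern as a prefix.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    t = "banana"
    sa = build_suffix_array(t)
    print(sa, find_occurrences(t, sa, "ana"))
```

Both searches compare only the first `len(pattern)` characters of each suffix, so every suffix starting with the pattern falls into one contiguous interval of the array.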

591 | The input/output complexity of sorting and related problems
- Aggarwal, Vitter
- 1988
Citation Context ...rk and can usually be implemented using bulk I/Os in the sense of [7] (we then need larger buffers b(v) for file nodes) whereas sorting requires many random accesses for information theoretic reasons =-=[2]-=-. Now we apply Theorem 2 to the doubling algorithm: Theorem 3. The doubling algorithm from Figure 1 can be implemented to run using sort(5n) ⌈log(1 + maxlcp)⌉ + O(scan(n)) I/Os. Proof. The following f... |

235 | Algorithms for parallel memory I: Two-level memories
- Vitter, Shriver
- 1994
Citation Context ... obtain an optimal algorithm for external memory: Consider a machine with fast memory of size M and a secondary memory that can be accessed by I/Os to blocks of B consecutive words on each of D disks =-=[25]-=-. The DC3-algorithm [16] constructs a suffix array of a text T of length n using O(sort(n)) I/Os, where $\mathrm{sort}(n) = O\left(\frac{n}{DB} \log_{M/B} \frac{n}{M}\right)$ is the number of I/Os needed for sorting the characters of T whi... |

212 | Linear work suffix array construction
- Kärkkäinen, Sanders, et al.
- 2006
Citation Context ...e., look for the interval of suffixes that have P as a prefix. A lot of effort has been devoted to efficient construction of suffix arrays, culminating recently in three direct linear time algorithms =-=[16, 18, 19]-=-. Considering all this, suffix arrays can therefore be viewed as a concept of at least equal importance to suffix trees. One of the linear time algorithms [16] is very simple and can also be adapted 1... |

112 | Linear-time longest-common-prefix computation in suffix arrays and its applications
- Kasai, Lee, et al.

101 | Space efficient linear time construction of suffix arrays
- Ko, Aluru
Citation Context ...e., look for the interval of suffixes that have P as a prefix. A lot of effort has been devoted to efficient construction of suffix arrays, culminating recently in three direct linear time algorithms =-=[16, 18, 19]-=-. Considering all this, suffix arrays can therefore be viewed as a concept of at least equal importance to suffix trees. One of the linear time algorithms [16] is very simple and can also be adapted 1... |

79 | Engineering a lightweight suffix array construction algorithm
- Manzini, Ferragina
- 2004
Citation Context ... GBS-algorithm might be interesting for small inputs and fast machines with slow I/O. There has been considerable interest in space efficient internal memory algorithms for constructing suffix arrays =-=[22, 5]-=- and even more compact full-text indexes [20, 13, 14]. We view this as an indication that internal memory is too expensive for the big suffix arrays one would like to build. Going to external memory c... |

68 | Linear-time construction of suffix arrays
- Kim, Sim, et al.
Citation Context ...e., look for the interval of suffixes that have P as a prefix. A lot of effort has been devoted to efficient construction of suffix arrays, culminating recently in three direct linear time algorithms =-=[16, 18, 19]-=-. Considering all this, suffix arrays can therefore be viewed as a concept of at least equal importance to suffix trees. One of the linear time algorithms [16] is very simple and can also be adapted 1... |

59 | The enhanced suffix array and its applications to genome analysis
- Abouelhoda, Kurtz, et al.
- 2002
Citation Context ...lined algorithms. 1 Introduction The suffix array [21, 12], a lexicographically sorted array of the suffixes of a string, has numerous applications, e.g., in string matching [21, 12], genome analysis =-=[1]-=- and text compression [6]. For example, one can use it as full text index: To find all occurrences of a pattern P in a text T do binary search in the suffix array of T , i.e., look for the interval of... |

59 | Breaking a Time-and-Space Barrier in Constructing Full-Text Indices
- Hon, Sadakane, et al.
Citation Context ...puts and fast machines with slow I/O. There has been considerable interest in space efficient internal memory algorithms for constructing suffix arrays [22, 5] and even more compact full-text indexes =-=[20, 13, 14]-=-. We view this as an indication that internal memory is too expensive for the big suffix arrays one would like to build. Going to external memory can be viewed as an alternative and more 1 There is al... |

56 | On the sorting-complexity of suffix tree construction
- Farach-Colton, Ferragina, et al.
Citation Context ...ow even larger suffix arrays could be built. The appendix contains further details that will be part of the full paper. More Related Work The first I/O optimal algorithm for suffix array construction =-=[11]-=- is based on suffix tree construction and introduced the basic divide-and-conquer approach that is also used by DC3. However, the algorithm from [11] is so complicated that an implementation looks not... |

41 | New indices for text
- Gonnet, Baeza-Yates, et al.
- 1992
Citation Context ...r an array of indexes is a suffix array. As a tool of possible independent interest we present a systematic way to design, analyze, and implement pipelined algorithms. 1 Introduction The suffix array =-=[21, 12]-=-, a lexicographically sorted array of the suffixes of a string, has numerous applications, e.g., in string matching [21, 12], genome analysis [1] and text compression [6]. For example, one can use it ... |

34 | Fast lightweight suffix array construction and checking
- Burkhardt, Kärkkäinen
Citation Context ... GBS-algorithm might be interesting for small inputs and fast machines with slow I/O. There has been considerable interest in space efficient internal memory algorithms for constructing suffix arrays =-=[22, 5]-=- and even more compact full-text indexes [20, 13, 14]. We view this as an indication that internal memory is too expensive for the big suffix arrays one would like to build. Going to external memory c... |

31 | On sorting strings in external memory
- Arge, Ferragina, et al.
- 1997
Citation Context ...dentation. We extend set notation to sequences in the obvious way. For example [i : i is prime] = 〈2, 3, 5, 7, 11, 13, . . .〉 in that order. 2 Overview: In Section 2 we present the doubling algorithm =-=[3, 7]-=- for suffix array construction that has I/O complexity O(sort(n log maxlcp)). This algorithm sorts strings of size $2^k$ in the k-th iteration. Our variant already yields some small optimization opportun... |
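
The doubling idea mentioned in this excerpt can be illustrated with a small in-memory sketch. It is an assumption-laden simplification: the function name `suffix_array_doubling` is hypothetical, everything is kept in ordinary Python lists, and none of the external-memory sorting, scanning, or pipelining discussed in the paper is modeled.

```python
def suffix_array_doubling(text):
    """Build a suffix array by prefix doubling (in-memory illustration only).

    After the round with offset h, suffixes are ordered by their first 2*h
    characters, so the compared prefix length doubles every round.
    """
    n = len(text)
    if n == 0:
        return []

    # Initial ranks: order suffixes by their first character.
    rank = [ord(c) for c in text]
    sa = list(range(n))
    h = 1

    while True:
        # Pair the rank of the first h characters with the rank of the next h.
        # -1 marks "past the end of the text" and sorts before any real rank.
        def key(i):
            return (rank[i], rank[i + h] if i + h < n else -1)

        sa.sort(key=key)

        # Rename: suffixes with equal key pairs share a rank.
        new_rank = [0] * n
        for j in range(1, n):
            bump = 1 if key(sa[j]) != key(sa[j - 1]) else 0
            new_rank[sa[j]] = new_rank[sa[j - 1]] + bump
        rank = new_rank

        # All ranks distinct means the order is final.
        if rank[sa[-1]] == n - 1:
            return sa
        h *= 2


if __name__ == "__main__":
    print(suffix_array_doubling("mississippi"))
```

The loop stops as soon as all names are unique, which is why the number of rounds depends on the longest common prefix between suffixes rather than on the text length alone.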

31 | Implementing I/O-efficient data structures using TPIE
- Arge, Procopiuc, et al.
- 2002
Citation Context ...algorithms have comparable speed using external memory. Pipelining to reduce I/Os is well known technique in executing database queries [24]. However, previous algorithm libraries for external memory =-=[4, 8]-=- do not support it. We decided quite early in the design of our library Stxxl [9] that we wanted to remove this deficit. Since suffix array construction can profit immensely from pipelining and since ... |

30 | A theoretical and experimental study on the construction of suffix arrays in external memory
- Crauser, Ferragina
Citation Context ...e favorable constant factors and can be implemented to work well with external memory for practical inputs. In contrast, the only previous external memory implementations of suffix array construction =-=[7]-=- are not only asymptotically suboptimal but also so slow that measurements could only be done for small inputs and artificially reduced internal memory size. The main objective of the present paper is... |

28 | Asynchronous parallel disk sorting
- Dementiev, Sanders
- 2003
Citation Context ...quite expensive with respect to internal work. Our system (multiple modern disks controlled by a performance oriented library [9]) supports disk I/O at a speed up to one third of its memory bandwidth =-=[10]-=- so that the high internal cost makes the GBS-algorithm even more questionable for the present study. Nevertheless it should be kept in mind that the GBS-algorithm might be interesting for small input... |

27 | A space and time efficient algorithm for constructing compressed suffix arrays
- Lam, Sadakane, et al.
- 2002
Citation Context ...puts and fast machines with slow I/O. There has been considerable interest in space efficient internal memory algorithms for constructing suffix arrays [22, 5] and even more compact full-text indexes =-=[20, 13, 14]-=-. We view this as an indication that internal memory is too expensive for the big suffix arrays one would like to build. Going to external memory can be viewed as an alternative and more 1 There is al... |

23 | Indexing huge genome sequences for solving various problems
- Sadakane, Shibuya
- 2001
Citation Context ...this step is made, space consumption is less of an issue because disk space is two orders of magnitude cheaper than RAM. The biggest suffix array computations we are aware of are for the human genome =-=[23, 20]-=-. One [20] computes the compressed suffix array on PC with 3GByte of memory in 21 h. Compressed suffix arrays work well in this case (they need only 2 GByte of space) because the small alphabet size p... |

19 | Constructing compressed suffix arrays with large alphabets
- Hon, Lam, et al.
- 2003
Citation Context ...puts and fast machines with slow I/O. There has been considerable interest in space efficient internal memory algorithms for constructing suffix arrays [22, 5] and even more compact full-text indexes =-=[20, 13, 14]-=-. We view this as an indication that internal memory is too expensive for the big suffix arrays one would like to build. Going to external memory can be viewed as an alternative and more 1 There is al... |

17 | Constructing Suffix Tree for Gigabyte Sequences with Megabyte Memory
- Cheung, Yu, et al.
Citation Context ...ithm might be interesting for small inputs and fast machines with slow I/O. There is a very recent study of external suffix tree construction for the case that the text itself fits in internal memory =-=[7]-=-. The proposed algorithm has quadratic worst case complexity. Experiments are reported for up to $224 \cdot 10^6$ characters (human chromosome 1) and only optimistic estimates of I/O performance rather than... |

6 | LEDA-SM a platform for secondary memory computations
- Crauser, Mehlhorn
- 1998
Citation Context ...algorithms have comparable speed using external memory. Pipelining to reduce I/Os is well known technique in executing database queries [24]. However, previous algorithm libraries for external memory =-=[4, 8]-=- do not support it. We decided quite early in the design of our library Stxxl [9] that we wanted to remove this deficit. Since suffix array construction can profit immensely from pipelining and since ... |

1 | The Stxxl library. Documentation and download at http://www.mpi-sb.mpg.de/~rdementi/stxxl.html
- Dementiev
Citation Context ...m needs a local suffix array search for each suffix scanned so that it is quite expensive with respect to internal work. Our system (multiple modern disks controlled by a performance oriented library =-=[9]-=-) supports disk I/O at a speed up to one third of its memory bandwidth [10] so that the high internal cost makes the GBS-algorithm even more questionable for the present study. Nevertheless it should ... |

1 | Algorithms for Memory Hierarchies, volume 2625
- Kärkkäinen
- 2003
Citation Context ...rs refer to line numbers in Figure 3. The edge weights are sums over the whole execution with N = n log lcp ↔ rithm. A slightly different algorithm with the same asymptotic complexity is described in =-=[15]-=-. Function doubling + discarding(T ) S:= [((T [i], T [i + 1]), i) : i ∈ [0, n)] (1) sort S (2) U:= name(S) // undiscarded (3) P := 〈〉 // partially discarded F := 〈〉 // fully discarded for k := 1 to ⌈l... |