## An alphabet-friendly FM-index (2004)

### Download Links

- [www.mfn.unipmn.it]
- [people.unipmn.it]
- [www.dcc.uchile.cl]
- DBLP

### Other Repositories/Bibliography

Venue: In Proc. SPIRE’04, LNCS 3246

Citations: 43 (19 self)

### BibTeX

@INPROCEEDINGS{Ferragina04analphabet-friendly,

author = {Paolo Ferragina and Giovanni Manzini and Veli Mäkinen and Gonzalo Navarro},

title = {An alphabet-friendly FM-index},

booktitle = {In Proc. SPIRE’04, LNCS 3246},

year = {2004},

pages = {150--160},

publisher = {Springer}

}

### Abstract

We show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index which scales well with the size of the input alphabet $\Sigma$. The size of the new index built on a string $T[1, n]$ is bounded by $nH_k(T) + O\left((n \log\log n) / \log_{|\Sigma|} n\right)$ bits, where $H_k(T)$ is the $k$-th order empirical entropy of $T$. The above bound holds simultaneously for all $k \le \alpha \log_{|\Sigma|} n$ and $0 < \alpha < 1$. Moreover, the index design does not depend on the parameter $k$, which plays a role only in the analysis of the space occupancy. Using our index, counting the occurrences of an arbitrary pattern $P[1, p]$ as a substring of $T$ takes $O(p \log |\Sigma|)$ time. Locating each pattern occurrence takes $O(\log |\Sigma| \, (\log^2 n / \log\log n))$ time. Reporting a text substring of length $\ell$ takes $O((\ell + \log^2 n / \log\log n) \log |\Sigma|)$ time.
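The counting step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `bwt`, `WaveletTree`, and `count` are hypothetical, the bit vectors are stored as plain prefix-count lists rather than compressed dictionaries, and no compression boosting is applied, so the sketch only shows why each backward-search step costs $O(\log |\Sigma|)$ rank operations, not how the space bound is reached. It also assumes that the terminator `#` is smaller than every text character and does not occur in the text.

```python
def bwt(text, end="#"):
    """Burrows-Wheeler transform via sorted rotations (quadratic; for illustration only)."""
    t = text + end  # assumption: `end` is smaller than every character in `text`
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(r[-1] for r in rotations)


class WaveletTree:
    """Balanced wavelet tree; rank(c, i) counts occurrences of c in s[:i]."""

    def __init__(self, s, alphabet=None):
        self.alphabet = sorted(set(s)) if alphabet is None else alphabet
        self.left = self.right = None
        self.prefix = None  # stays None at leaves
        if len(self.alphabet) <= 1:
            return
        mid = len(self.alphabet) // 2
        self.right_syms = set(self.alphabet[mid:])
        # prefix[i] = number of symbols in s[:i] routed to the right child
        self.prefix = [0]
        for ch in s:
            self.prefix.append(self.prefix[-1] + (ch in self.right_syms))
        self.left = WaveletTree(
            [ch for ch in s if ch not in self.right_syms], self.alphabet[:mid])
        self.right = WaveletTree(
            [ch for ch in s if ch in self.right_syms], self.alphabet[mid:])

    def rank(self, c, i):
        node = self
        while node.prefix is not None:  # one binary rank per tree level
            ones = node.prefix[i]
            if c in node.right_syms:
                i, node = ones, node.right
            else:
                i, node = i - ones, node.left
        return i if node.alphabet == [c] else 0


def count(pattern, bwt_str, wt):
    """Backward search: number of occurrences of `pattern` in the indexed text."""
    # C[c] = number of characters in the text that are smaller than c
    C, total = {}, 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)
    lo, hi = 0, len(bwt_str)  # half-open range of matching rows
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + wt.rank(c, lo)
        hi = C[c] + wt.rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


L = bwt("mississippi")
wt = WaveletTree(L)
print(count("ssi", L, wt))  # 2
```

Each pattern character triggers two `rank` calls, and each call walks one root-to-leaf path of the tree, which is where the $\log |\Sigma|$ factor in the counting time comes from.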

### Citations

730 | Compression of individual sequences via variable-rate coding
- Ziv, Lempel
- 1978
Citation Context: ... achieves O(p + occ) query time and uses $O(nH_k(T) \log^{\epsilon} n) + o(n)$ bits of storage. This data structure exploits the interplay between the Burrows-Wheeler compression algorithm and the LZ78 algorithm [22]. Notice that this is the first full-text index achieving o(n log n) bits of storage, possibly o(n) on highly compressible texts, and output sensitivity in the query execution. As we mentioned in the Intr...

644 | Suffix arrays: a new method for on-line string searches
- Manber, Myers
- 1990
Citation Context: ...ext substring of length ℓ in $O(\ell + \log^{1+\epsilon} n)$ time. The design of the FM-index is based upon the relationship between the Burrows-Wheeler compression algorithm [1] and the suffix array data structure [16, 9]. It is therefore a sort of compressed suffix array that takes advantage of the compressibility of the indexed text in order to achieve space occupancy close to the Information Theoretic minimum. Inde...

565 | A Block-sorting Lossless Data Compression Algorithm
- Burrows, Wheeler
- 1994
Citation Context: ...t fixed in advance. It can display any text substring of length ℓ in $O(\ell + \log^{1+\epsilon} n)$ time. The design of the FM-index is based upon the relationship between the Burrows-Wheeler compression algorithm [1] and the suffix array data structure [16, 9]. It is therefore a sort of compressed suffix array that takes advantage of the compressibility of the indexed text in order to achieve space occupancy clos...

193 | High-order entropy-compressed text indexes
- Grossi, Gupta, et al.
- 2003
Citation Context: ...it is worthwhile to investigate whether it is possible to build a more “alphabet-friendly” FM-index. In this paper we use the compression boosting technique [2, 7] and the wavelet tree data structure [11] to design a version of the FM-index which scales well with the size of the alphabet. Compression boosting partitions the Burrows-Wheeler transformed text into contiguous areas in order to maximize the...

191 | Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees and Multisets
- Raman, Raman, et al.
- 2002
Citation Context: ...en a binary sequence S[1, m] and b ∈ {0, 1}, consider the following operations: $\mathrm{Rank}_b(S, i)$ computes the number of b’s in S[1, i], and $\mathrm{Select}_b(S, i)$ computes the position of the i-th b in S. The following has been proven in [19]: Theorem 3. Let S[1, m] be a binary sequence containing t occurrences of the digit 1. There exists a data structure (called FID) that supports $\mathrm{Rank}_b(S, i)$ and $\mathrm{Select}_b$...

188 | Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
- Grossi, Vitter
Citation Context: ... that of locating occurrences and displaying text substrings has decreased. Recently, various compressed full-text indexes have been proposed in the literature achieving several time/space trade-offs [13, 20, 18, 11, 12, 10]. Among them, the one with the smallest space occupancy is the data structure described in [11] (Theorems 4.2 and 5.2) that achieves O(p log |Σ| + polylog(n)) time to count the pattern occurrences, O(...

180 | Opportunistic Data Structures with Application
- Ferragina, Manzini
- 2000
Citation Context: ...search for an arbitrary pattern as a substring of the indexed text. A self-index is a full-text index that encapsulates the indexed text T, without hence requiring its explicit storage. The FM-index [3] has been the first self-index in the literature to achieve a space occupancy close to the k-th order entropy of T, hereafter denoted by $H_k(T)$ (see Section 2.1). Precisely, the FM-index occupies at m...

131 | An analysis of the Burrows-Wheeler transform
- Manzini
Citation Context: ...will “group together” several occurrences of the same character. As a result, the transformed string $T^{bwt}$ contains long runs of identical characters and turns out to be highly compressible (see e.g. [1, 17] for details). Because of the special character #, when we sort the rows of $M_T$ we are essentially sorting the suffixes of T. Therefore there is a strong relation between the matrix $M_T$ and the suffix ...

65 | An Experimental Study of an Opportunistic Index
- Ferragina, Manzini
- 2001
Citation Context: ...it and “Piattaforma distribuita ad alte prestazioni”, and by the Chilean Fondecyt Grant 1-020831. ... over all k ≥ 0. These remarkable theoretical properties have been validated by experimental results [4, 5] and applications [14, 21]. The above bounds on the FM-index space occupancy and query time have been obtained assuming that the size of the input alphabet is a constant. Hidden in the big-O notation ...

64 | Indexing text using the Ziv-Lempel trie
- Navarro
- 2004
Citation Context: ... that of locating occurrences and displaying text substrings has decreased. Recently, various compressed full-text indexes have been proposed in the literature achieving several time/space trade-offs [13, 20, 18, 11, 12, 10]. Among them, the one with the smallest space occupancy is the data structure described in [11] (Theorems 4.2 and 5.2) that achieves O(p log |Σ| + polylog(n)) time to count the pattern occurrences, O(...

50 | Succinct representations of LCP information and improvements in the compressed suffix arrays
- Sadakane
- 2002
Citation Context: ... that of locating occurrences and displaying text substrings has decreased. Recently, various compressed full-text indexes have been proposed in the literature achieving several time/space trade-offs [13, 20, 18, 11, 12, 10]. Among them, the one with the smallest space occupancy is the data structure described in [11] (Theorems 4.2 and 5.2) that achieves O(p log |Σ| + polylog(n)) time to count the pattern occurrences, O(...

39 | Boosting textual compression in optimal linear time
- Ferragina, Giancarlo, et al.
- 2005
Citation Context: ...th only a small penalty in the query time, it is worthwhile to investigate whether it is possible to build a more “alphabet-friendly” FM-index. In this paper we use the compression boosting technique [2, 7] and the wavelet tree data structure [11] to design a version of the FM-index which scales well with the size of the alphabet. Compression boosting partitions the Burrows-Wheeler transformed text into ...

39 | New indices for text
- Gonnet, Baeza-Yates, et al.
- 1992
Citation Context: ...ext substring of length ℓ in $O(\ell + \log^{1+\epsilon} n)$ time. The design of the FM-index is based upon the relationship between the Burrows-Wheeler compression algorithm [1] and the suffix array data structure [16, 9]. It is therefore a sort of compressed suffix array that takes advantage of the compressibility of the indexed text in order to achieve space occupancy close to the Information Theoretic minimum. Inde...

25 | Annotating large genomes with exact word matches
- Healy, Thomas, et al.
- 2003
Citation Context: ...tribuita ad alte prestazioni”, and by the Chilean Fondecyt Grant 1-020831. ... over all k ≥ 0. These remarkable theoretical properties have been validated by experimental results [4, 5] and applications [14, 21]. The above bounds on the FM-index space occupancy and query time have been obtained assuming that the size of the input alphabet is a constant. Hidden in the big-O notation there is an exponential de...

22 | On compressing and indexing data
- Ferragina, Manzini
Citation Context: ...g 2 n/ log log n) steps until reaching r. Then we perform ℓ additional LF-steps to collect the text characters. The resulting complexity is $O((\ell + \log^2 n / \log\log n) \log |\Sigma|)$. We point out the existence [6] of a variant of the FM-index that achieves O(p + occ) query time and uses $O(nH_k(T) \log^{\epsilon} n) + o(n)$ bits of storage. This data structure exploits the interplay between the Burrows-Wheeler compression...

16 | Indexing huge genome sequences for solving various problems
- Sadakane, Shibuya
Citation Context: ...tribuita ad alte prestazioni”, and by the Chilean Fondecyt Grant 1-020831. ... over all k ≥ 0. These remarkable theoretical properties have been validated by experimental results [4, 5] and applications [14, 21]. The above bounds on the FM-index space occupancy and query time have been obtained assuming that the size of the input alphabet is a constant. Hidden in the big-O notation there is an exponential de...

12 | An experimental study of a compressed index
- Ferragina, Manzini
Citation Context: ...it and “Piattaforma distribuita ad alte prestazioni”, and by the Chilean Fondecyt Grant 1-020831. ... over all k ≥ 0. These remarkable theoretical properties have been validated by experimental results [4, 5] and applications [14, 21]. The above bounds on the FM-index space occupancy and query time have been obtained assuming that the size of the input alphabet is a constant. Hidden in the big-O notation ...

12 | Compression boosting in optimal linear time using the Burrows-Wheeler Transform
- Ferragina, Manzini
- 2004
Citation Context: ...th only a small penalty in the query time, it is worthwhile to investigate whether it is possible to build a more “alphabet-friendly” FM-index. In this paper we use the compression boosting technique [2, 7] and the wavelet tree data structure [11] to design a version of the FM-index which scales well with the size of the alphabet. Compression boosting partitions the Burrows-Wheeler transformed text into ...

12 | New search algorithms and time/space tradeoffs for succinct suffix arrays
- Mäkinen, Navarro
- 2004
Citation Context: ...ze would depend on $H_0(T)$ rather than $H_k(T)$. The partitioning of the text into areas is crucial to obtain the latter space bounds. A previous technique combining wavelet trees with text partitioning [15] takes each run of equal letters in $T^{bwt}$ as an area. It requires $2n(H_k \log |\Sigma| + 1 + o(1))$ bits of space and counts pattern occurrences in the optimal O(p) time. It would be interesting to retain the...

11 | then Burrows-Wheeler: an alphabet-independent FM-index
- Grabowski, Mäkinen, et al.

8 | Optimal partitions of strings: A new class of Burrows-Wheeler compression algorithms
- Giancarlo, Sciortino
- 2003
Citation Context: ...nHk(T) + O(log |Σ| log n) bits and allows the computation of Occ(c, q) and $T^{bwt}[i]$ in O(log |Σ|) time. 2.4 Compression boosting The concept of compression boosting has been recently introduced in [2, 7, 8] opening the door to a new approach to data compression. The key idea is that one can take an algorithm whose performance can be bounded in terms of the 0-th order entropy and obtain, via the booster,...

8 | When indexing equals compression: experiments on compressing suffix arrays and applications
- Foschini, Grossi, et al.
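The Rank and Select operations quoted in the citation context for [19] above can be made concrete with a small Python sketch. The `BitVector` class below is hypothetical and is not the FID structure of that theorem: it stores an uncompressed prefix-count array and answers Select by binary search over Rank, so it shows only the semantics of the two operations, not the space or time bounds. Positions are 1-based, as in the quoted text.

```python
class BitVector:
    """Naive rank/select over a binary sequence S[1, m] (illustrative, uncompressed)."""

    def __init__(self, bits):
        self.bits = list(bits)
        # ones[i] = number of 1s in S[1, i]; ones[0] = 0
        self.ones = [0]
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)

    def rank(self, b, i):
        """Number of b's in S[1, i], for 0 <= i <= m."""
        return self.ones[i] if b == 1 else i - self.ones[i]

    def select(self, b, i):
        """Position of the i-th b in S, or None if S has fewer than i b's."""
        lo, hi = 1, len(self.bits)
        while lo < hi:  # smallest position whose rank reaches i
            mid = (lo + hi) // 2
            if self.rank(b, mid) >= i:
                hi = mid
            else:
                lo = mid + 1
        return lo if self.bits and i >= 1 and self.rank(b, lo) >= i else None


bv = BitVector([1, 0, 1, 1, 0])
print(bv.rank(1, 3), bv.select(1, 3))  # 2 4
```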