## Dynamic Entropy-Compressed Sequences and Full-Text Indexes

Citations: 49 (26 self)

### BibTeX

@MISC{Mäkinen_dynamicentropy-compressed,

author = {Veli Mäkinen and Gonzalo Navarro},

title = {Dynamic Entropy-Compressed Sequences and Full-Text Indexes},

year = {}

}

### Citations

644 | Suffix arrays: a new method for on-line string searches
- Manber, Myers
- 1990
Citation Context ...veral classical full-text indexes requiring O(n log n) bits of space which can answer counting queries in O(m log σ) time (like suffix trees [Apostolico 1985]) or O(m + log n) time (like suffix arrays [Manber and Myers 1993]). Both locate each occurrence in constant time once the counting is done. Similar complexities are obtained with modern compressed data structures [Ferragina and Manzini 2000; Grossi et al. 2003; Fer... |

617 | Text Compression
- Bell, Cleary, et al.
- 1990
Citation Context ...tisfy two properties: (i) |δ(x)| = log x + o(log x); (ii) we can univocally distinguish x and D from δ(x)D, being D any bit sequence. A well-known encoding satisfying the above properties is Elias' δ [10,3]. To represent x, let l = ⌈log(x + 1)⌉ be the number of bits necessary to encode x, and let ll = ⌈log(l + 1)⌉ be the number of bits necessary to code l. Then δ(x) is formed by three parts: (a) ll 0-bi... |

565 | A Block-sorting Lossless Data Compression Algorithm
- Burrows, Wheeler
- 1994
Citation Context ...representation into the self index of [Chan et al. 2004] yields h-th order compression stems from the fact that the sequence we are representing is the Burrows-Wheeler transform of the text collection [Burrows and Wheeler 1994; Manzini 2001]. This is a striking result we prove in this paper. For several years, much effort has been spent in designing sophisticated (static) data structures on top of the plain wavelet tree so ... |

485 | Information retrieval: Data structures and algorithms. Englewood Cliffs
- Frakes, Baeza-Yates
- 1992
Citation Context ...xts that do not fit in main memory, even compressed. In practice, one of the best algorithms for this problem [Crauser and Ferragina 2002] is still a multi-pass technique with I/O complexity O(n^2/M) [Gonnet et al. 1992], where M is the maximum text size that can be indexed in main memory. The use of our compressed construction technique on main memory translates into much larger values of M, and thus fewer passes o... |

348 | Universal codeword sets and representations of the integers
- Elias
- 1975
Citation Context ...tisfy two properties: (i) |δ(x)| = log x + o(log x); (ii) we can univocally distinguish x and D from δ(x)D, being D any bit sequence. A well-known encoding satisfying the above properties is Elias' δ [10,3]. To represent x, let l = ⌈log(x + 1)⌉ be the number of bits necessary to encode x, and let ll = ⌈log(l + 1)⌉ be the number of bits necessary to code l. Then δ(x) is formed by three parts: (a) ll 0-bi... |
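The excerpt above is cut off before it finishes describing the three parts of δ(x), so the exact bit layout used in the paper cannot be read from it. As a rough illustration only, here is a minimal Python sketch of the textbook Elias δ code for integers x ≥ 1, which may differ in detail from the paper's variant. The function names and the use of Python strings as bit sequences are assumptions made for readability.

```python
def elias_delta_encode(x: int) -> str:
    """Textbook Elias delta code of x >= 1, as a string of '0'/'1'.

    Assumption: this is the standard layout, not necessarily the
    paper's exact variant (the excerpt is truncated).
    """
    if x < 1:
        raise ValueError("the textbook code is defined for x >= 1")
    l = x.bit_length()             # bits needed for x, equals ceil(log2(x + 1))
    ll = l.bit_length()            # bits needed for l, equals ceil(log2(l + 1))
    unary = "0" * (ll - 1)         # announces how long the length field is
    length_bits = format(l, "b")   # l in binary, exactly ll bits, starts with 1
    payload = format(x, "b")[1:]   # x in binary without its leading 1 (l - 1 bits)
    return unary + length_bits + payload


def elias_delta_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at bits[pos].

    Returns (x, next_pos), so x can be separated from any suffix D.
    """
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    ll = zeros + 1
    start = pos + zeros
    l = int(bits[start:start + ll], 2)
    start += ll
    x = int("1" + bits[start:start + l - 1], 2)
    return x, start + l - 1
```

The decoder returns the position where the codeword ends, which is what property (ii) in the excerpt asks for: x and the trailing bits D can be told apart without any external length information.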

193 | High-order entropy-compressed text indexes
- Grossi, Gupta, et al.
- 2003
Citation Context ...1, σ], where operations rank and select generalize to rank_a(A, i), counting the occurrences of symbol a in A[1, i], and select_a(A, j), giving the position of the j-th a in A. By using wavelet trees [Grossi et al. 2003] we achieve nH0 + o(n log σ) bits of space and O(log n log σ) time complexities. This space has been previously achieved only for static data structures, with query time O(⌈log σ / log log n⌉) time f... |
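To make the definitions of rank_a(A, i) and select_a(A, j) in the excerpt concrete, the following Python sketch gives a naive reference implementation with 1-based positions, plus a small pointer-based wavelet tree that answers rank. It is an illustration under simplifying assumptions: the bit vectors are plain uncompressed Python lists scanned linearly, and the structure is static, so it reflects neither the space bounds nor the dynamic operations discussed in the paper. The names `rank`, `select`, and `WaveletTree` are hypothetical.

```python
def rank(A, a, i):
    """Occurrences of symbol a in A[1, i] (1-based, inclusive)."""
    return sum(1 for s in A[:i] if s == a)


def select(A, a, j):
    """1-based position of the j-th occurrence of a in A, or None."""
    count = 0
    for pos, s in enumerate(A, start=1):
        if s == a:
            count += 1
            if count == j:
                return pos
    return None


class WaveletTree:
    """Static wavelet tree over plain bit lists (illustrative only)."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else list(alphabet)
        self.size = len(seq)
        self.bits = None  # stays None at a leaf (at most one symbol)
        if len(self.alphabet) > 1:
            mid = len(self.alphabet) // 2
            left_syms = set(self.alphabet[:mid])
            # bit 0: symbol belongs to the left child; bit 1: to the right child
            self.bits = [0 if s in left_syms else 1 for s in seq]
            self.left = WaveletTree(
                [s for s in seq if s in left_syms], self.alphabet[:mid])
            self.right = WaveletTree(
                [s for s in seq if s not in left_syms], self.alphabet[mid:])

    def rank(self, a, i):
        """Occurrences of symbol a among the first i positions."""
        if a not in self.alphabet:
            return 0
        i = max(0, min(i, self.size))
        if self.bits is None:
            return i  # leaf: every position holds the same symbol
        go_right = a not in self.alphabet[:len(self.alphabet) // 2]
        bit = 1 if go_right else 0
        # Linear scan here; a real structure uses binary rank on the bit vector.
        i_child = sum(1 for b in self.bits[:i] if b == bit)
        child = self.right if go_right else self.left
        return child.rank(a, i_child)
```

Each level of the tree maps a position in the sequence to a position in one child, so a rank query touches one bit vector per level, which is where the log σ factor in the quoted time bound comes from.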

191 | Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees and Multisets
- Raman, Raman, et al.
- 2002
Citation Context ...operations, over a data structure that requires nH0+o(n) bits of space, where 0 ≤ H0 ≤ 1 is the binary zero-order entropy of A. This space has been previously achieved only for static data structures [14], with constant time for rank and select but no support for updates. Ours is the first entropy-bound dynamic data structure answering rank and select queries. Moreover, our result works under weaker a... |
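The quantity nH0 in the excerpt can be computed directly from symbol counts. The sketch below uses the standard definition of empirical zero-order entropy, H0 = Σ (n_a/n) log2(n/n_a) over the symbols a that occur in A; the function name is an assumption, and the excerpt itself does not spell out the formula.

```python
from collections import Counter
from math import log2


def zero_order_entropy(A):
    """Empirical zero-order entropy H0 of a sequence, in bits per symbol."""
    n = len(A)
    if n == 0:
        return 0.0
    # Each distinct symbol with count c contributes (c/n) * log2(n/c).
    return sum((c / n) * log2(n / c) for c in Counter(A).values())
```

For a bit vector the value lies between 0 and 1, matching the range stated in the excerpt: a balanced vector such as `"0101"` gives 1 bit per symbol, while a constant vector gives 0.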

188 | Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
- Grossi, Vitter
Citation Context ...ly difference is that bit vectors are represented in compressed form in the leaves of the binary tree. We note that gap encoding has also been used to achieve zero-order entropy in static schemes. In [19] they explore the idea of inserting some information into the encoding so as to permit solving rank and select queries in logarithmic time via binary searches. In [20] they improve this result and rea... |

180 | Opportunistic Data Structures with Applications
- Ferragina, Manzini
- 2000
Citation Context ...g n) time (like suffix arrays [Manber and Myers 1993]). Both locate each occurrence in constant time once the counting is done. Similar complexities are obtained with modern compressed data structures [Ferragina and Manzini 2000; Grossi et al. 2003; Ferragina et al. 2007], requiring space nHh + o(n log σ) bits (for some small h), where Hh ≤ log σ is the h-th order empirical entropy of T. These indexes are often called c... |

172 | Compressed full-text indexes
- Navarro, Mäkinen
- 2007
Citation Context ...ed into a single wavelet tree and the same space would have been achieved. This would greatly simplify the original arrangement and possibly expose the deep relationship with the BWT-based approaches [28]. Interestingly, in [18] they find out that, if they use gap encoding over the successive values along a column, and they then concatenate all the columns, the total space is O(nHh) without any tab... |

131 | An analysis of the Burrows-Wheeler transform
- Manzini
Citation Context ...bits sequence representation into the self index of [7] yields h-th order compression stems from the fact that the sequence we are representing is the Burrows-Wheeler transform of the text collection [5,27]. This is a striking result we prove in this paper. For several years, much effort has been spent in designing sophisticated (static) data structures on top of the plain wavelet tree so as to reduce i... |

119 | Indexing compressed text
- Ferragina, Manzini
Citation Context ...nical convenience. If this is an issue, the sequences can be handled in reverse order to obtain results on the more standard definition. It is anyway known that both definitions do not differ by much [14]. A more general problem called Searchable Partial Sums with Indels includes also the following operations: • insert(A, i, x) inserts x between a_{i−1} and a_i. • delete(A, i) deletes a_i from the sequen... |

115 | The myriad virtues of subword trees
- Apostolico
- 1985
Citation Context ...P in T ; (b) locate those occ positions in T . There are several classical full-text indexes requiring O(n log n) bits of space which can answer counting queries in O(m log σ) time (like suffix trees [Apostolico 1985]) or O(m + log n) time (like suffix arrays [Manber and Myers 1993]). Both locate each occurrence in constant time once the counting is done. Similar complexities are obtained with modern compressed da... |

110 | Compressed representations of sequences and full-text indexes
- Ferragina, Manzini, et al.
Citation Context ...ime complexities. This space has been previously achieved only for static data structures, with query time O(⌈log σ / log log n⌉) time for rank and select but no support for updates [Raman et al. 2002; Ferragina et al. 2007]. By using multiary wavelet trees [Ferragina et al. 2007], we can reduce the query times to O((1/ε) log n ⌈log σ / log log n⌉) in exchange for increasing the update times to O((1/ε) log^{1+ε} n / log log... |

64 | Indexing text using the Ziv-Lempel trie
- Navarro
- 2004
Citation Context ... construction (the same as the final structure). This is the first construction algorithm for an FM-index [6] variant, whose working space depends on the entropy. For another self-index called LZ-index [12], there is a recent entropy-bound construction algorithm [2]. 2 Definitions To simplify notation, we ignore roundings. When referring to number of bits, we use simply log n to refer to ⌊(log n) + 1⌋. T... |

60 | Rank/select operations on large alphabets: a tool for text indexing - Golynski, Munro, et al. - 2006 |

53 | Succinct suffix arrays based on run-length encoding
- Mäkinen, Navarro
Citation Context ...we prove in this paper. For several years, much effort has been spent in designing sophisticated (static) data structures on top of the plain wavelet tree so as to reduce its nH0-bit size to nHh bits [17,15,24]. In this paper we show that this is automatically achieved by the original wavelet tree without any further effort! Thus, as a byproduct, we obtain a significant simplification in the design of stati... |

50 | Breaking a time-and-space barrier in constructing full-text indices - Hon, Sadakane, et al. - 2003 |

43 | When indexing equals compression: experiments with compressing suffix arrays and applications
- Grossi, Gupta, et al.
- 2004
Citation Context ...ee and the same space would have been achieved. This would greatly simplify the original arrangement and possibly expose the deep relationship with the BWT-based approaches [28]. Interestingly, in [18] they find out that, if they use gap encoding over the successive values along a column, and they then concatenate all the columns, the total space is O(nHh) without any table partitioning as well. Bo... |

36 | Optimal algorithms for list indexing and subset rank
- Dietz
- 1989
Citation Context ...led in O(b) time, for any parameter b = Ω((log n / log log n)^2). Hence, they provide a solution to the Dynamic Bit Vector with Indels problem. Their main structure is a weight-balanced B-tree (WBB) [Dietz 1989; Raman et al. 2001]. Our goal is to obtain nH0 + o(n) bits of space and O(log n) time for all the operations above. We build over a simplified version of their structure, which uses standard balanced ... |

33 | A simple storage scheme for strings achieving entropy bounds
- Ferragina, Venturini
- 2007
Citation Context ...n the other hand, we have achieved zero-order dynamic representations for the data sequence itself. There exist static high-order representations [Sadakane and Grossi 2006; González and Navarro 2006; Ferragina and Venturini 2007] that can be composed with extra data for computing rank and select. If dynamized, such compressed representations would immediately yield high-order dynamic compressed sequences. Note that we have ind... |

32 | Rank and select revisited and extended - Mäkinen, Navarro |

26 | Statistical encoding of succinct data structures - González, Navarro - 2006 |

25 | Succinct dynamic data structures
- Raman, Raman, et al.
- 2001
Citation Context ...time, for any parameter b = Ω((log n / log log n)^2). Hence, they provide a solution to the Dynamic Bit Vector with Indels problem. Their main structure is a weight-balanced B-tree (WBB) [Dietz 1989; Raman et al. 2001]. Our goal is to obtain nH0 + o(n) bits of space and O(log n) time for all the operations above. We build over a simplified version of their structure, which uses standard balanced trees and achieves ... |

23 | Compressed Indexes for Dynamic Text Collections
- Chan, Hon, et al.
Citation Context ... bits of space (with constant 5 at least), O(m log^3 n) counting time, O(log n) amortized insertion time per character, and O(log^2 n) amortized deletion time per character. A newer one [Chan et al. 2004; Chan et al. 2007] requires O(σn) bits of space, O(m log n) counting time, O(log^2 n) locating time per occurrence, and O(σ log n) insertion/deletion time per character. As a plus, we obtain an O(n log n log σ) time ... |

23 | Compressed data structures: Dictionaries and data-aware measures
- Gupta, Hon, et al.
- 2006
Citation Context ...er entropy in static schemes. In [19] they explore the idea of inserting some information into the encoding so as to permit solving rank and select queries in logarithmic time via binary searches. In [20] they improve this result and reach time o((log log n)^2), close to the lower bound on the predecessor problem when the space depends on the number of bits set and only logarithmically on the total n... |

21 | Compact representations of ordered sets
- Blandford, Blelloch
- 2004
Citation Context ...chieve a representation that uses nH0 + o(n) bits of space, where 0 < H0 ≤ 1 is the empirical zero-order entropy of the sequence. This is an improvement over an O(nH0) result by Blandford and Blelloch [4] since, although their results are more general, they do not achieve constant 1 multiplying the entropy term in the space complexity. The comparison of time complexities against partial sums with k = ... |

21 | Theoretical and experimental study on the construction of suffix arrays in external memory
- Crauser, Ferragina
Citation Context ...ceable practical impact on difficult real-life problems such as building indexes for texts that do not fit in main memory, even compressed. In practice, one of the best algorithms for this problem [8] is still a multi-pass technique with I/O complexity O(n^2/M) [16], where M is the maximum text size that can be indexed in main memory. The use of our compressed construction technique on main memor... |

21 | Constructing compressed suffix arrays with large alphabets
- Hon, Lam, et al.
- 2003
Citation Context ...ble Partial Sums problem consists in maintaining a sequence A of nonnegative integers a_1 ... a_n, each of k bits, supporting queries on the prefix sums and limited updates on the values. An extension [Hon et al. 2003b] called Searchable Partial Sums with Indels allows insertions and deletions of values as well. The restricted version where k = 1, and thus the numbers are actually bits, is called the Dynamic Bit Vec... |

20 | Succinct data structures for searchable partial sums
- Hon, Sung
- 2003
Citation Context ...ble Partial Sums problem consists of maintaining a sequence A of nonnegative integers a_1 ... a_n, each of k bits, supporting queries on the prefix sums and limited updates on the values. An extension [22] called Searchable Partial Sums with Indels allows insertions and deletions of values as well. The restricted version where k = 1, and thus the numbers are actually bits, is called the Dynamic Bit Vec... |
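The interface of the Searchable Partial Sums with Indels problem can be pinned down with a deliberately naive Python sketch. Only `insert(A, i, x)` and `delete(A, i)` are named in the excerpts on this page; the method names `prefix_sum`, `search`, and `update`, and their exact semantics, are assumptions chosen to match the usual formulation of the problem. Every operation below takes linear time, so this fixes the meaning of the operations and says nothing about the O(log n) bounds or the compressed space of the paper.

```python
class NaivePartialSums:
    """Reference semantics for searchable partial sums with indels.

    Positions are 1-based, as in the excerpts. All operations are O(n).
    """

    def __init__(self, values=()):
        self.a = list(values)

    def prefix_sum(self, i):
        """a_1 + ... + a_i."""
        return sum(self.a[:i])

    def search(self, j):
        """Smallest i with prefix_sum(i) >= j, or None if there is none."""
        total = 0
        for i, v in enumerate(self.a, start=1):
            total += v
            if total >= j:
                return i
        return None

    def update(self, i, delta):
        """Add delta (possibly negative) to a_i; values stay nonnegative."""
        if self.a[i - 1] + delta < 0:
            raise ValueError("values must remain nonnegative")
        self.a[i - 1] += delta

    def insert(self, i, x):
        """Insert x between a_{i-1} and a_i."""
        self.a.insert(i - 1, x)

    def delete(self, i):
        """Delete a_i from the sequence."""
        del self.a[i - 1]
```

When every value is a single bit (k = 1), `prefix_sum(i)` counts the 1-bits among the first i positions and `search(j)` finds the j-th 1-bit, which is how the Dynamic Bit Vector with Indels problem reduces to rank and select on bits.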

20 | Tight bounds for the partial-sums problem
- Patrascu, Demaine
- 2004
Citation Context ...es achieved for this problem are worst-case O(log n / log(w/δ)), where w is the machine word size under the RAM model of computation, and updates that add/subtract a number of δ bits are permitted [Patrascu and Demaine 2004]. They show that this complexity is optimal, but insertions and deletions are not considered. In this paper we extend this result by achieving kn + o(kn) bits of space and O(log n) worst case time comp... |

19 | The myriad virtues of wavelet trees
- Ferragina, Giancarlo, et al.
Citation Context ... as well. Both findings share the same source: the sum of zero-order entropies of the table cells, no matter the order, adds up to nHh. Finally, it is interesting to point out that, in a recent paper [11], the possibility of achieving h-th order compression when applying wavelet trees over the BWT is explored (among many other results), yet they resort to run-length compression to achieve this. Once m... |

18 | Low redundancy in dictionaries with O(1) worst case lookup time
- Pagh
- 1999
Citation Context ... compressed B^j is $\sum_{i=1}^{t} \lceil \log \binom{b}{l^j_i} \rceil \le \log \binom{b \cdot t}{l^j_1 + \dots + l^j_t} + t \le \log \binom{|B^j|}{l^j} + t \le |B^j| H_0(B^j) + t$, where all the inequalities hold by simple combinatorial arguments [29] and have been reviewed in Section 3.1. Note that those B^j bit vectors are precisely those that would result if we built the wavelet tree just for Lj. According to Lemma 1, adding up those |B^j| H0(B^j... |

16 | Compressed index for a dynamic collection of texts
- Chan, Hon, et al.
Citation Context ...ny text substring without accessing T. A dynamic self-index permits managing a collection of texts and inserting/deleting texts to/from the collection. There exist dynamic self-indexes by Chan et al. [Chan et al. 2004; Chan et al. 2007]. One version requires O(σn) bits of space, and it can count the number of occurrences of a pattern of length m in time O(m log n). Insertions and deletions require O(σ log n) and O(... |

14 | Space-efficient construction of LZ-index
- Arroyuelo, Navarro
- 2005
Citation Context ... the final structure). Previous construction algorithms within entropy space achieve O(nH0) bits of space and O(n log n) time [21], or O(nHh) bits of space (with constant larger than 4) and O(σn) time [2]. Several other compressed indexes can be obtained using our algorithm. Moreover, it is very easy to obtain the Burrows-Wheeler transform of T from the index we build, within the same O(n log n log σ) t... |

14 | Fast BWT in small space by blockwise suffix sorting
- Kärkkäinen
- 2007
Citation Context ...one by one. This takes O(n log n log σ) time, just as the construction, and gives an algorithm to build the BWT of a text within entropy bounds. The best result we know of, in terms of space complexity [23], achieves O(n log^2 n) time (O(n log n) on average) using O(n) bits in addition to the n log σ bits of the text. 8 Final Remarks We have introduced a technique to maintain a dynamic bit sequence of l... |

12 | Compression boosting in optimal linear time using the Burrows-Wheeler Transform
- Ferragina, Manzini
- 2004
Citation Context ...that the base technique they build on naturally achieves the result without need of any further engineering. Our findings also impact several other works that use this technique in one form or another [17,13,11]. Still, the results in [24,15] have practical value. In their actual implementation (http://pizzachili.dcc.uchile.cl or http://pizzachili.di.unipi.it), zero-order entropy is achieved by using uncompre... |

6 | Preliminary version - appear - 2008 |

5 | Rank and select revisited and extended, Theoretical Computer Science 387 - Mäkinen, Navarro - 2007 |

4 | Dynamic rank/select dictionaries with applications to XML indexing - Gupta, Hon, et al. - 2006 |

1 | Dynamic Entropy-Compressed Sequences * 39
- Ferragina, Giancarlo, et al.
- 2006
Citation Context ...g as well. Both findings share the same source: the sum of zero-order entropies of the table cells, no matter the order, adds up to nHh. Finally, it is interesting to point out that, in a recent paper [Ferragina et al. 2006], the possibility of achieving h-th order compression when applying wavelet trees over the BWT is explored (among many other results), yet they resort to run-length compression to achieve this. Once m... |

1 | Static and dynamic rank-select dictionaries for run-length encoded texts
- Lee, Park
- 2007
Citation Context ...t hold, and h-th order entropy would not be achieved if just the simple wavelet tree of the BWT was used. Now, our findings suggest that implementing the technique of 11Indeed, in a very recent paper [Lee and Park 2007], they build over our scheme to achieve n log σ (1 + o(1)) bits of space and O(log n (1 + log σ / log log n)) time. Using the same space, another very recent result [Gupta et al. 2006b] achieves O( 1ff... |
