Results 1 
4 of
4
Implementing the LZindex: Theory versus practice
"... The LZindex is a theoretical proposal of a lightweight data structure for text indexing, based on the ZivLempel trie. If a text of u characters over an alphabet of size σ is compressible to n symbols using the LZ78 algorithm, then the LZindex takes 4n log 2 n(1 + o(1)) bits of space (that is, 4 t ..."
Abstract

Cited by 10 (8 self)
 Add to MetaCart
The LZindex is a theoretical proposal of a lightweight data structure for text indexing, based on the ZivLempel trie. If a text of u characters over an alphabet of size σ is compressible to n symbols using the LZ78 algorithm, then the LZindex takes 4n log 2 n(1 + o(1)) bits of space (that is, 4 times the entropy of the text) and reports the R occurrences of a pattern of length m in worst case time O(m 3 log σ + (m + R) log n). In this paper we face the challenge of obtaining a practical implementation of the LZindex, which is not at all straightforward from the theoretical proposal. We end up with a prototype that takes the promised space and has average search time O(σm log u+ √ uR). This prototype is shown to be faster than other competing approaches when we take into account the time to report the positions or text contexts of the occurrences found. We show in detail the process of implementing the index, which involves interesting lessons of theory versus practice.
Stronger LempelZiv Based Compressed Text Indexing
, 2008
"... Given a text T[1..u] over an alphabet of size σ, the fulltext search problem consists in finding the occ occurrences of a given pattern P[1..m] in T. In indexed text searching we build an index on T to improve the search time, yet increasing the space requirement. The current trend in indexed text ..."
Abstract

Cited by 8 (4 self)
 Add to MetaCart
Given a text T[1..u] over an alphabet of size σ, the fulltext search problem consists in finding the occ occurrences of a given pattern P[1..m] in T. In indexed text searching we build an index on T to improve the search time, yet increasing the space requirement. The current trend in indexed text searching is that of compressed fulltext selfindices, which replace the text with a more spaceefficient representation of it, at the same time providing indexed access to the text. Thus, we can provide efficient access within compressed space. The LZindex of Navarro is a compressed fulltext selfindex able to represent T using 4uHk(T) + o(u log σ) bits of space, where Hk(T) denotes the kth order empirical entropy of T, for any k = o(log σ u). This space is about four times the compressed text size. It can locate all the occ occurrences of a pattern P in T in O(m 3 log σ+(m+occ) log u) worstcase time. Despite this index has shown to be very competitive in practice, the O(m 3 log σ) term can be excessive for long patterns. Also, the factor 4 in its space complexity makes it larger than other stateoftheart alternatives. In this paper we present stronger LempelZiv based indices, improving the overall performance of the LZindex. We achieve indices requiring (2+ǫ)uHk(T)+o(u log σ) bits of space, for any constant ǫ> 0, which makes our indices the smallest existing LZindices. We simultaneously improve the search time to
Indexed Hierarchical Approximate String Matching
"... Abstract. We present a new search procedure for approximate string matching over suffix trees. We show that hierarchical verification, which is a wellestablished technique for online searching, can also be used with an indexed approach. For this, we need that the index supports bidirectionality, m ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Abstract. We present a new search procedure for approximate string matching over suffix trees. We show that hierarchical verification, which is a wellestablished technique for online searching, can also be used with an indexed approach. For this, we need that the index supports bidirectionality, meaning that the search for a pattern can be updated by adding a letter at the right or at the left. This turns out to be easily supported by most compressed text selfindexes, which represent the index and the text essentially in the same space of the compressed text alone. To complete the symbiotic exchange, our hierarchical verification largely reduces the need to access the text, which is expensive in compressed text selfindexes. The resulting algorithm can, in particular, run over an existing fully compressed suffix tree, which makes it very appealing for applications in computational biology. We compare our algorithm with related approaches, showing that our method offers an interesting space/time tradeoff, and in particular does not need of any parameterization,