Results 1–7 of 7
Breaking a Time-and-Space Barrier in Constructing Full-Text Indices
Abstract

Cited by 50 (3 self)
Suffix trees and suffix arrays are the most prominent full-text indices, and their construction algorithms are well studied. It has been open for a long time whether these indices can be constructed in both o(n log n) time and o(n log n)-bit working space, where n denotes the length of the text. In the literature, the fastest algorithm runs in O(n) time, while it requires O(n log n)-bit working space. On the other hand, the most space-efficient algorithm requires O(n)-bit working space while it runs in O(n log n) time. This paper breaks the long-standing time-and-space barrier under the unit-cost word RAM. We give an algorithm for constructing the suffix array which takes O(n) time and O(n)-bit working space, for texts with constant-size alphabets. Note that both the time and the space bounds are optimal. For constructing the suffix tree, our algorithm requires O(n log^ε n) time and O(n)-bit working space for any 0 < ε < 1. Apart from that, our algorithm can also be adopted to build other existing full-text indices, such as ...
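[Editorial note: as a minimal illustration of the data structure these papers construct, the following naive suffix array builder sorts all suffixes directly. It is only a sketch for reference; it runs in O(n² log n) time and is not the paper's O(n)-time, O(n)-bit algorithm.]

```python
def suffix_array(text):
    """Return the suffix array of `text`: the starting positions of
    all suffixes, listed in lexicographic order of the suffixes.
    Naive construction for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])

# For "banana" the sorted suffixes are
#   "a" (5), "ana" (3), "anana" (1), "banana" (0), "na" (4), "nana" (2)
sa = suffix_array("banana")  # [5, 3, 1, 0, 4, 2]
```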
Fast lightweight suffix array construction and checking
14th Annual Symposium on Combinatorial Pattern Matching, 2003
Abstract

Cited by 25 (5 self)
We describe an algorithm that, for any v ∈ [2, n], constructs the suffix array of a string of length n in O(vn + n log n) time using O(v + n/√v) space in addition to the input (the string) and the output (the suffix array). By setting v = log n, we obtain an O(n log n) time algorithm using O(n/√log n) extra space.
A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays
Abstract

Cited by 14 (1 self)
With the first human DNA decoded into a sequence of about 2.8 billion characters, much biological research has centered on analyzing this sequence. Theoretically speaking, it is now feasible to accommodate an index for human DNA in main memory so that any pattern can be located efficiently. This is due to the recent breakthrough on compressed suffix arrays, which reduces the space requirement from O(n log n) bits to O(n) bits for indexing a text of n characters. However, constructing compressed suffix arrays is still not an easy task, because we still have to compute suffix arrays first and need a working memory of O(n log n) bits (i.e., more than 13 gigabytes for human DNA). This paper initiates the study of constructing compressed suffix arrays directly from the text. The main contribution is a construction algorithm that uses only O(n) bits of working memory, with O(n log n) time complexity. Our construction algorithm is also time- and space-efficient for texts with large alphabets such as Chinese or Japanese. Precisely, when the alphabet is Σ, the working space becomes O(n(H0 + 1)) bits, where H0 denotes the order-0 entropy of the text and is at most log |Σ|; the time complexity remains O(n log n), independent of |Σ|.
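[Editorial note: the order-0 entropy H0 appearing in the space bound above is the empirical per-character entropy of the text. A small sketch of its standard definition, for reference:]

```python
import math
from collections import Counter

def order0_entropy(text):
    """Empirical order-0 entropy H0 in bits per character:
    H0 = -sum_c (n_c / n) * log2(n_c / n),
    where n_c is the count of character c in a text of length n.
    H0 is at most log2 of the alphabet size, matching the
    O(n(H0 + 1))-bit working-space bound quoted above."""
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())

# A uniform 4-letter text attains the maximum H0 = log2(4) = 2 bits.
print(order0_entropy("ACGT" * 100))  # 2.0
```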
Compression of a Dictionary
, 2006
Abstract

Cited by 3 (1 self)
Some text compression methods take advantage of more complex compression units than characters. The coder and decoder can then be synchronized by transferring the unit dictionary together with the compressed message. We propose a dictionary compression method based on a proper ordering of the nodes of the tree-organized dictionary. This reordering allows a better compression ratio to be achieved. The proposed dictionary compression method has been tested on dictionaries for word- and syllable-based compression methods. It appears effective for compressing dictionaries of syllables, and promising for larger dictionaries of words.
Compressed Text Indexing and Range Searching
, 2006
Abstract

Cited by 1 (1 self)
We introduce two transformations Text2Points and Points2Text that, respectively, convert text to points in space and vice versa. With these transformations, data structural problems in pattern matching and geometric range searching can be linked. We show strong connections between space-versus-query-time tradeoffs in these fields. Thus, the results in range searching can be applied to compressed indexing and vice versa. In particular, we show that for a given equivalent space, pattern matching queries can be done using 2D range searching and vice versa with query times within a factor of O(log n) of each other. This two-way connection enables us not only to design new data structures for compressed text indexing, but also to derive new lower bounds. For compressed text indexing, we propose alternative data structures based on our Text2Points transform and C-sided orthogonal query structures in 2D. Currently, all proposed compressed text indexes are based on the Burrows-Wheeler transform (BWT) or its inverse [16,17,20,22,42]. We observe that our Text2Points transform is related to BWT on blocked text, and hence we also call it geometric BWT. With this variant, we solve some well-known open problems in this area of compressed text indexing. In particular, we present the first external-memory results for compressed text indexing. We give the first compressed data structures for position-restricted pattern matching [27,34]. We also show lower bounds for these problems and for the problem of text indexing in general. These are the first known lower bounds (hardness results) in this area.
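[Editorial note: the BWT-based indexes mentioned above all start from the same transform. As a minimal reference sketch (not the paper's geometric/blocked variant), the BWT of a text can be read off the suffix array of the text followed by a sentinel character:]

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via the suffix array of text + sentinel.
    The BWT character for each sorted suffix is the character that
    precedes it in the text (wrapping around for position 0).
    Naive suffix sorting, for illustration only."""
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    return "".join(s[i - 1] for i in sa)

print(bwt("banana"))  # annb$aa
```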
Longest-Common-Prefix Computation in Burrows-Wheeler Transformed Text
Abstract
Abstract. In this paper we consider the existing algorithm for computing the Longest-Common-Prefix (LCP) array given a text string and its suffix array, and adapt it to work on Burrows-Wheeler Transform (BWT) text. We do this through a combination of preprocessing steps and improvements to the existing algorithm. Three LCP-array computation algorithms are proposed, namely LCPBA, LCPBB and LCPBC, which need only BWT text as input. LCPBA is a simple adaptation of the existing algorithm: it preprocesses BWT text of length n to generate the suffix array and the original text, then feeds the output of this step to the existing algorithm, which computes the LCP array in O(n) time. LCPBB reduces LCPBA's preprocessing time while requiring an additional 4n space, as it substitutes two auxiliary arrays for the original text in the LCP-array computation. LCPBC goes a step beyond LCPBA and LCPBB by exploiting the structure of BWT text. It effectively reduces preprocessing time and memory requirements at the price of Θ(δn) time for calculating the LCP array, where δ is the average LCP. Experimental results show that, in terms of speed, LCPBA performs practically as well as the original algorithm, compared to LCPBB and LCPBC. However, LCPBC consumes less space than the other algorithms.
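[Editorial note: the "existing algorithm" for computing the LCP array from a text and its suffix array is, presumably, Kasai et al.'s well-known linear-time algorithm. A reference sketch of that baseline (not the paper's BWT-adapted variants):]

```python
def kasai_lcp(text, sa):
    """Kasai et al.'s O(n) LCP-array computation from a text and its
    suffix array sa.  lcp[r] is the length of the longest common prefix
    of the suffixes at ranks r-1 and r in sa (with lcp[0] = 0)."""
    n = len(text)
    rank = [0] * n                      # rank[i] = position of suffix i in sa
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0                               # length of the current common prefix
    for i in range(n):                  # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]         # suffix preceding suffix i in sa
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1                  # key invariant: h drops by at most 1
        else:
            h = 0
    return lcp

sa = sorted(range(6), key=lambda i: "banana"[i:])   # [5, 3, 1, 0, 4, 2]
print(kasai_lcp("banana", sa))  # [0, 1, 3, 0, 0, 2]
```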
Compression Gains in 2D MRCP Biliary Tree Modeling
, 2004
Abstract
MRCP images are used to analyze the biliary tract in the diagnosis of liver diseases. As with many medical imaging techniques, MRCP scans are acquired as a large series of images, usually including an amount of surrounding tissue, organs and acquisition noise. To minimize the storage requirements of patient medical history archives, and to help reduce the influence of distracting noise when diagnosing from MRCP images, the region of interest (ROI) may be modeled. Compression derives from the smaller size of the models, thus saving limited storage resources. This paper presents the compression gains of a technique for modeling the hierarchical biliary tree structure in 2D MRCP images.