Results 1 - 4 of 4
Searching BWT compressed text with the Boyer-Moore algorithm and binary search
 Proceedings, IEEE Data Compression Conference, 2002
Cited by 11 (6 self)
Abstract: This paper explores two techniques for online exact pattern matching in files that have been compressed using the Burrows-Wheeler transform. The first is an application of the Boyer-Moore algorithm (Boyer & Moore 1977) to a transformed string. The second is based on the observation that the transform effectively contains a sorted list of all substrings of the original text, which can be exploited for very rapid searching using a variant of binary search. Both methods are faster than a decompress-and-search approach for small numbers of queries, and binary search is much faster even for large numbers of queries.
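To make the binary-search idea concrete, here is a minimal Python sketch under our own assumptions. It is not the authors' implementation. The text is assumed to end with a unique `$` sentinel that sorts before every other symbol, the BWT is computed naively by sorting rotations, and all function names (`bwt`, `successor_links`, `row_prefix`, `bwt_search`) are hypothetical. The search never rebuilds the whole text: it reads only the first few symbols of each probed row by following the same links the inverse transform uses, and binary-searches for the block of rows whose prefixes equal the pattern.

```python
SENTINEL = "$"  # assumed end-of-text marker that sorts before every other symbol


def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    s = text + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def successor_links(last: str) -> list[int]:
    """links[i] is the row holding row i's rotation shifted left by one symbol.

    A stable sort of row indices by their last symbol pairs the k-th
    occurrence of a symbol in the (implicit) sorted first column with its
    k-th occurrence in the last column, so last[links[i]] is row i's first symbol.
    """
    return sorted(range(len(last)), key=lambda j: last[j])


def row_prefix(last: str, links: list[int], row: int, m: int) -> str:
    """Read the first m symbols of a sorted row without decompressing the text."""
    out = []
    for _ in range(m):
        row = links[row]
        out.append(last[row])
    return "".join(out)


def bwt_search(last: str, pattern: str) -> list[int]:
    """Start positions of a non-empty pattern, found by binary search over rows."""
    links = successor_links(last)
    n, m = len(last), len(pattern)

    def boundary(strict: bool) -> int:
        # First row whose m-symbol prefix is >= pattern (> pattern if strict).
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = row_prefix(last, links, mid, m)
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    positions = []
    for row in range(boundary(False), boundary(True)):
        # Walk forward until the rotation starts with the sentinel; the number
        # of steps tells us where this rotation starts in the original text.
        r, steps = row, 0
        while last[links[r]] != SENTINEL:
            r = links[r]
            steps += 1
        positions.append(n - 1 - steps)
    return sorted(positions)
```

For example, `bwt("banana")` returns `"annb$aa"`, and `bwt_search("annb$aa", "ana")` returns `[1, 3]`. Each probe follows O(m) links, so locating the block of matching rows takes O(m log n) steps. Recovering each offset by walking to the sentinel is O(n) in this sketch; a practical index would typically store sampled positions instead.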
The SCP and compressed domain analysis of biological sequences
 Proc., IEEE Bioinformatics Conference
, 2003
Cited by 4 (3 self)
We introduce the SCP (sorted common prefix) and study some of its properties. Based on the internal representations used by a class of new compression schemes, we show how the SCP table can be constructed using $O(u + \Sigma\kappa_{\max})$ comparisons on average and $O(u\Sigma)$ in the worst case, where $u$ is the size of the sequence, $\Sigma$ is the number of symbols, and $\kappa_{\max}$ is the maximum SCP value. We describe two applications of the SCP in biological sequence analysis. In particular, using the SCP and the compressed representation of the sequence, we present an algorithm for finding all $\eta_{occ}$ canonical tandem arrays in the sequence in $O(u + \eta_{occ} + \Sigma\kappa_{\max})$ time on average and $O(\eta_{occ} + u\Sigma)$ in the worst case. Preliminary results on the statistics of the SCP for some DNA and protein sequences are included.
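The abstract does not spell out the SCP's exact definition. Assuming it is the length of the prefix each row of the sorted BWT matrix shares with the row directly above it (an LCP array in sorted order), a naive Python sketch can read that table straight off the transformed string. The `$` sentinel and the helper names (`bwt`, `successor_links`, `scp_table`) are hypothetical. The sketch spends one comparison per shared symbol plus one per row and makes no attempt at the average- or worst-case bounds quoted above.

```python
SENTINEL = "$"  # assumed unique end-of-text marker, smaller than every other symbol


def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    s = text + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def successor_links(last: str) -> list[int]:
    """links[i] = row whose rotation is row i's rotation shifted left by one."""
    return sorted(range(len(last)), key=lambda j: last[j])  # stable sort


def scp_table(last: str) -> list[int]:
    """Assumed SCP: common-prefix length of sorted rows i-1 and i (scp[0] = 0)."""
    links = successor_links(last)
    scp = [0] * len(last)
    for row in range(1, len(last)):
        a, b, k = row - 1, row, 0
        # Step both rows forward in lockstep, comparing one symbol at a time.
        # With a unique sentinel no two rotations are equal, so this terminates.
        while True:
            a, b = links[a], links[b]
            if last[a] != last[b]:
                break
            k += 1
        scp[row] = k
    return scp
```

For instance, `scp_table(bwt("banana"))` returns `[0, 0, 1, 3, 0, 0, 2]`; the 3 comes from the adjacent rows `ana$ban` and `anana$b`.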
LZW Based Compressed Pattern Matching
Cited by 1 (0 self)
Compressed pattern matching is an emerging research area that addresses the following problem: given a file in compressed format and a pattern, report the occurrence(s) of the pattern in the file with minimal (or no) decompression. In this paper, we report our work on compressed pattern matching in LZW-compressed files. The work is based on Amir’s well-known “almost-optimal” algorithm [1], which we have improved to report not only the first occurrence of the pattern but all of its occurrences. The improvements also include multi-pattern matching and a faster implementation for the so-called “simple pattern”, defined as “a pattern with no symbol appearing more than once”. Extensive experiments were conducted to measure search performance and to compare against both the “decompress-then-search” approach and the best available compressed pattern matching algorithms, particularly the BWT-based algorithms [2, 3]. The results show that our method is competitive with the best of these algorithms.
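Amir’s algorithm itself is not reproduced here. As a rough illustration of the general idea of scanning an LZW stream one code at a time, the Python sketch below rebuilds the dictionary as (parent code, appended symbol) pairs, never materializing phrase strings, and memoizes how a KMP automaton moves across each phrase for a given entry state. It assumes 8-bit text, an unbounded dictionary, and a non-empty pattern; all function names are hypothetical. It is closer to generic automaton-over-phrases matching than to the paper’s method and carries none of its “almost-optimal” guarantees.

```python
def lzw_encode(text: str) -> list[int]:
    """Textbook LZW over an 8-bit alphabet with an unbounded dictionary."""
    dictionary = {chr(i): i for i in range(256)}
    w, codes = "", []
    for c in text:
        if w + c in dictionary:
            w += c
        else:
            codes.append(dictionary[w])
            dictionary[w + c] = len(dictionary)
            w = c
    if w:
        codes.append(dictionary[w])
    return codes


def kmp_failure(p: str) -> list[int]:
    """Standard KMP failure function (longest proper border of each prefix)."""
    fail, k = [0] * len(p), 0
    for i in range(1, len(p)):
        while k and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_step(p, fail, q, c):
    """One KMP automaton transition; the state is the length of the matched prefix."""
    if q == len(p):
        q = fail[q - 1]
    while q and p[q] != c:
        q = fail[q - 1]
    return q + 1 if p[q] == c else 0


def lzw_search(codes: list[int], pattern: str) -> list[int]:
    """Start positions of a non-empty pattern, consuming one LZW code at a time."""
    fail, m = kmp_failure(pattern), len(pattern)
    # Each phrase is kept as (parent code, appended symbol), never as a string.
    parent = {i: None for i in range(256)}
    last = {i: chr(i) for i in range(256)}
    first = dict(last)
    length = {i: 1 for i in range(256)}
    memo = {}  # (entry state, code) -> (exit state, match-end offsets in phrase)

    def run(q, code):
        # Climb to the nearest ancestor already solved for entry state q.
        chain, c = [], code
        while c is not None and (q, c) not in memo:
            chain.append(c)
            c = parent[c]
        state, hits = memo[(q, c)] if c is not None else (q, ())
        # Extend one appended symbol at a time back down to `code`.
        for c in reversed(chain):
            state = kmp_step(pattern, fail, state, last[c])
            if state == m:
                hits = hits + (length[c] - 1,)
            memo[(q, c)] = (state, hits)
        return memo[(q, code)]

    next_code, prev, q, pos, found = 256, None, 0, 0, []
    for code in codes:
        if prev is not None:
            # New entry = previous phrase + first symbol of the current phrase;
            # in the KwKwK case (code == next_code) that symbol is first[prev].
            sym = first[prev] if code == next_code else first[code]
            parent[next_code], last[next_code] = prev, sym
            first[next_code], length[next_code] = first[prev], length[prev] + 1
            next_code += 1
        q, hits = run(q, code)
        found.extend(pos + h - m + 1 for h in hits)
        pos += length[code]
        prev = code
    return found
```

Calling `lzw_search(lzw_encode(text), pattern)` should agree with a plain left-to-right scan of `text`. Repeated (state, code) pairs are answered from the memo table, which is where any saving over full decompression comes from, at the cost of memory that can grow with the number of states times the number of codes.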
COMPRESSED PATTERN MATCHING FOR TEXT AND IMAGES
, 2005
The amount of information that we are dealing with today is being generated at an ever-increasing rate. On one hand, data compression is needed to store and organize the data efficiently and to transport it over limited-bandwidth networks. On the other hand, efficient information retrieval is needed to find the relevant information quickly in this huge mass of data using the available resources. The compressed pattern matching problem can be stated as: given the compressed format of a text or an image and a pattern string or pattern image, report the occurrence(s) of the pattern in the text or image with minimal (or no) decompression. The main advantages of compressed pattern matching over the naïve decompress-then-search approach are as follows. First, reduced storage cost: since no decompression, or only minimal decompression, is required, disk space and memory costs are reduced. Second, less search time: since the compressed data is smaller than the original data, a search performed on the compressed data will take less time. The challenge of efficient compressed pattern matching can be met from two inseparable