Searching BWT compressed text with the Boyer-Moore algorithm and binary search
Proceedings, IEEE Data Compression Conference, 2002
Cited by 12 (8 self)
Abstract: This paper explores two techniques for online exact pattern matching in files that have been compressed using the Burrows-Wheeler transform. We investigate two approaches. The first is an application of the Boyer-Moore algorithm (Boyer & Moore 1977) to a transformed string. The second approach is based on the observation that the transform effectively contains a sorted list of all substrings of the original text, which can be exploited for very rapid searching using a variant of binary search. Both methods are faster than a decompress-and-search approach for small numbers of queries, and binary search is much faster even for large numbers of queries.
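The "sorted list of all substrings" observation in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method: it assumes the text is available uncompressed and builds an explicit suffix array naively, whereas the paper works on the BWT output. The function names `suffix_array` and `find_occurrences` are hypothetical.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically.

    Fine for a small illustration; real BWT tools use faster algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of pattern, found with two binary searches."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with the pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "mississippi"
    sa = suffix_array(text)
    print(find_occurrences(text, sa, "ssi"))
```

Once the sorted order exists, each query costs a logarithmic number of string comparisons, which is consistent with the abstract's claim that binary search stays fast for many queries.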
The SCP and compressed domain analysis of biological sequences
Proc., IEEE Bioinformatics Conference, 2003
Cited by 4 (3 self)
We introduce the SCP, the sorted common prefix, and study some of its properties. Based on the internal representations used by a class of new compression schemes, we show how the SCP table can be constructed using $O(u + \Sigma\kappa_{\max})$ comparisons on average, and $O(u\Sigma)$ in the worst case, where $u$ is the size of the sequence, $\Sigma$ is the number of symbols, and $\kappa_{\max}$ is the maximum SCP value. We describe two applications of the SCP in biological sequence analysis. In particular, using the SCP and the compressed representation of the sequence, we present an algorithm for finding all the $\eta_{\mathrm{occ}}$ canonical tandem arrays in the sequence in $O(u + \eta_{\mathrm{occ}} + \Sigma\kappa_{\max})$ time on average, and $O(\eta_{\mathrm{occ}} + u\Sigma)$ in the worst case. Preliminary results on the statistics of the SCP for some DNA and protein sequences are included.
LZW Based Compressed Pattern Matching
Cited by 3 (1 self)
Compressed pattern matching is an emerging research area that addresses the following problem: given a file in compressed format and a pattern, report the occurrence(s) of the pattern in the file with minimal (or no) decompression. In this paper, we report our work on compressed pattern matching in LZW compressed files. The reported work is based on Amir’s well-known “almost-optimal” algorithm [1] but has been improved to search for not only the first occurrence of the pattern but also all other occurrences. The improvements also include multi-pattern matching and a faster implementation for the so-called “simple pattern”, which is defined as “a pattern with no symbol appearing more than once”. Extensive experiments have been conducted to test the search performance and to compare it not only with the “decompress-then-search” approach but also with the best available compressed pattern matching algorithms, particularly the BWT-based algorithms [2, 3]. The results showed that our method is competitive among the best algorithms.
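For readers unfamiliar with the format being searched, here is a minimal Python sketch of textbook LZW coding. It shows only how the dictionary (the LZW trie mentioned in related entries) grows during compression. It is not the search algorithm from the paper, and it assumes input characters with code points below 256 and unbounded code width. The names `lzw_compress` and `lzw_decompress` are hypothetical.

```python
def lzw_compress(data):
    """Encode a string as a list of integer LZW codes."""
    # Assumption: every input character has a code point below 256.
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate            # keep extending the current phrase
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # new phrase = known phrase + one char
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original string, regrowing the same dictionary."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the phrase that is being defined right now.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        pieces.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(pieces)


if __name__ == "__main__":
    sample = "TOBEORNOTTOBEORTOBEORNOT"
    encoded = lzw_compress(sample)
    print(len(sample), len(encoded))
    print(lzw_decompress(encoded) == sample)
```

Because each code stands for a whole phrase, a pattern occurrence can span several codes, which is the difficulty a search without decompression has to address.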
Approximate pattern match using the Burrows-Wheeler transform
Proceedings of Data Compression Conference, 2003
Cited by 2 (1 self)
Abstract. The compressed pattern matching problem is to locate the occurrence(s) of a pattern P in a text string T using a compressed representation of T, with minimal (or no) decompression. In this paper, we consider approximate pattern matching directly on BWT compressed text. The BWT provides a lexicographic ordering of the input text as part of its inverse transformation process. Based on this observation, pattern matching is performed by text pre-filtering, using a fast q-gram intersection of segments from the pattern P and the text T. Algorithms are proposed that solve the k-mismatch problem in $O(\min\{m(m-k)|\Sigma|^k \log\frac{u}{|\Sigma|},\ mu\log\frac{u}{|\Sigma|}\})$ time in the worst case, and the k-approximate matching problem in $O(|\Sigma|\log|\Sigma| + \frac{m^2}{k}\log\frac{u}{|\Sigma|} + \alpha k)$ time on average ($\alpha \le u$), where $u = |T|$ is the size of the text, $m = |P|$ is the size of the pattern, and $\Sigma$ is the symbol alphabet. Each algorithm requires $O(u)$ auxiliary arrays, which are constructed in $O(u)$ time and space.
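The pre-filtering idea can be illustrated on plain text with a minimal Python sketch. It swaps in the standard pigeonhole q-gram filter rather than the paper's BWT-based q-gram intersection: if a window matches the pattern with at most k mismatches, at least one of k+1 disjoint pattern pieces must match exactly. The function name `k_mismatch` and the choice of piece length are assumptions made for illustration.

```python
def k_mismatch(text, pattern, k):
    """Return start positions where pattern matches text with <= k mismatches."""
    m, u = len(pattern), len(text)
    if m == 0 or m > u:
        return []

    q = m // (k + 1)  # length of each of the k+1 disjoint pattern pieces
    if q == 0:
        # Pieces would be empty, so the filter cannot prune anything.
        candidates = set(range(u - m + 1))
    else:
        # Index every q-gram of the text by its start position.
        index = {}
        for i in range(u - q + 1):
            index.setdefault(text[i:i + q], []).append(i)

        # A window can only qualify if some piece occurs at the right offset.
        candidates = set()
        for j in range(k + 1):
            offset = j * q
            for i in index.get(pattern[offset:offset + q], []):
                start = i - offset
                if 0 <= start <= u - m:
                    candidates.add(start)

    # Verify each surviving candidate by counting mismatches directly.
    result = []
    for start in sorted(candidates):
        window = text[start:start + m]
        mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
        if mismatches <= k:
            result.append(start)
    return result


if __name__ == "__main__":
    print(k_mismatch("abcabdabc", "abc", 1))
```

The filter only shrinks the set of windows that need verification; the final mismatch count is what decides whether a position is reported.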
Pattern Matching in LZW Compressed Files
IEEE Transactions on Computers, 2005
COMPRESSED PATTERN MATCHING FOR TEXT AND IMAGES
2005
Cited by 1 (1 self)
The amount of information that we are dealing with today is being generated at an ever-increasing rate. On one hand, data compression is needed to efficiently store and organize the data and to transport it over limited-bandwidth networks. On the other hand, efficient information retrieval is needed to speedily find the relevant information in this huge mass of data using available resources. The compressed pattern matching problem can be stated as: given the compressed format of a text or an image and a pattern string or a pattern image, report the occurrence(s) of the pattern in the text or image with minimal (or no) decompression. The main advantages of compressed pattern matching over the naïve decompress-then-search approach are as follows. First, reduced storage cost: since there is no need to decompress the data, or only minimal decompression is required, the disk space and memory costs are reduced. Second, less search time: since the compressed data is smaller than the original data, a search performed on the compressed data will take less time. The challenge of efficient compressed pattern matching can be met from two inseparable
Compressed pattern matching for predictive lossless image encoding
Proceedings of Distributed Multimedia Systems, 2003
Cited by 1 (1 self)
Pattern matching in the compressed image domain is a new topic in computer science. Many works have been reported on pattern matching for compressed text and for lossy compressed images. However, searching images in the lossless compressed domain is almost a blank area and needs to be explored. Lossless image compression is widely used in areas such as medical images, satellite images, geometric images and many other areas that need to maintain the image data losslessly. Being able to search in the compressed domain will save disk space and search time and bring considerable economic savings in these areas. In our work, we have studied the possibility of compressed pattern matching for the three most popular lossless image compression schemes: lossless JPEG, CALIC and JPEG-LS. Our study indicates that these algorithms can be made search-aware with minor modification. We also present a modified JPEG-LS algorithm and the corresponding searching algorithm. Experimental results show that our method, compared with the “decompress-then-searching” method, has nearly 30% improvement in searching time for most natural images. The modified JPEG-LS algorithm also has shorter encoding and decoding times, with an improvement of about 12-15% and 8-12%, respectively, for most natural images. The tradeoff is a decrease in compression of about 2%-8%. To the best of our knowledge, this is the first report on a JPEG-LS compressed matching algorithm and the first “competitive” compressed pattern matching algorithm for lossless image compression.
Multiple-Pattern Matching in LZW Compressed Files Using Aho-Corasick Algorithm
Compressed pattern matching is an emerging research area that aims at searching for patterns efficiently in compressed files with minimal (or no) decompression. In this paper, we report our work on multiple-pattern matching in LZW compressed files using the Aho-Corasick algorithm. The algorithm takes O(mt+n+r) time with O(mt) extra space, where n is the size of the compressed file, m is the pattern length, t is the size of the LZW trie and r is the number of occurrences of the patterns. Extensive experiments have been conducted to test the performance of our algorithms. The results showed that our multiple-pattern matching algorithm is practically the fastest among all approaches when the number of patterns is not very large. Therefore, our algorithm is preferable for general string matching applications. The proposed algorithm is efficient for large files and it is particularly efficient when applied to archival search if the archives are compressed with a common LZW trie.
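As background for this entry, here is a minimal Python sketch of the classic Aho-Corasick automaton running over uncompressed text. It does not reproduce the paper's adaptation to LZW codes; it only shows the multi-pattern machinery (trie transitions, failure links, merged outputs) that the paper builds on. The names `build_automaton` and `search` are hypothetical, and patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # failure link of each state
    output = [[]]  # indices of patterns that end at each state

    # Phase 1: insert every pattern into the trie.
    for idx, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(idx)

    # Phase 2: breadth-first pass to set failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search(text, patterns):
    """Return (start_position, pattern) for every occurrence in text."""
    goto, fail, output = build_automaton(patterns)
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in output[state]:
            hits.append((pos - len(patterns[idx]) + 1, patterns[idx]))
    return hits


if __name__ == "__main__":
    print(sorted(search("ushers", ["he", "she", "his", "hers"])))
```

The text is scanned once regardless of how many patterns are loaded, which is why the automaton is a natural fit for multiple-pattern search.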
JPEG-LS Based Two-Dimensional Compressed Pattern Matching
With the phenomenal advances in data acquisition techniques via satellites and in medical diagnostics and forensic sciences, we have encountered a massive growth of image data. On account of efficiency (in terms of both space and time), there is a need to keep the data in compressed form as much as possible, even when it is being searched. The class of images we are concerned with in this paper is compressed losslessly due to application constraints. In this paper, we present new algorithms for two-dimensional compressed pattern matching where the images are compressed based on JPEG-LS variations. To the best of our knowledge, our work is the first reported work that is based on JPEG-LS in the field. We present a global-context search-aware variation of the JPEG-LS algorithm and the corresponding searching algorithm, which requires partial decompression of the compressed data. We also present a new two-pass variation of the JPEG-LS algorithm and the corresponding searching algorithm that achieves search-awareness through a common compression technique called the semi-static dictionary. The searching algorithm based on the two-pass variation requires no decompression at all and therefore works in the fully compressed domain. It runs in time $O(n_c + m_c + nm + m^2)$ with extra space $O(n + m + m_c)$, where $n$ is the number of columns of the image, $m$ is the number of rows and columns of the pattern, $n_c$ is the compressed image size and $m_c$ is the compressed pattern size. The algorithm is the first known two-dimensional CPM algorithm that works in the fully
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) <
Abstract: Compressed pattern matching is an emerging research area that addresses the following problem: given a file in compressed format and a pattern, report the occurrence(s) of the pattern in the file with minimal (or no) decompression. In this paper, we report our work on compressed pattern matching in LZW compressed files. The reported work is based on Amir’s well-known “almost-optimal” algorithm but has been improved to search for not only the first occurrence of the pattern but also all other occurrences. The improvements also include multi-pattern matching and a faster implementation for so-called “simple patterns”. Extensive experiments have been conducted to test the search performance and to compare it with the BWT-based compressed pattern matching algorithms. The results showed that our method is competitive among the best compressed pattern matching algorithms. LZW is one of the most efficient and popular compression algorithms and is used extensively, and our method requires no modification of the compression algorithm. The work reported in this paper, therefore, has great economic and market potential.