Results 1–8 of 8
Fast compression with a static model in high-order entropy
 In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, 2004
Abstract

Cited by 9 (3 self)
We report on a simple encoding format called wzip for decompressing block-sorting transforms, such as the Burrows-Wheeler Transform (BWT). Our compressor uses the simple notions of gamma encoding and RLE, organized with a wavelet tree, to achieve a slightly better compression ratio than bzip2 in less time. In fact, our compression/decompression time depends on H_h, the h-th order empirical entropy. This relationship of performance to the compressibility of the data is a key new idea among compression algorithms. Another key contribution of our compressor is its simplicity. Our compressor can also operate as a full-text index with a small amount of data, while still preserving backward compatibility with just the compressor.
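The two primitives this abstract names, Elias gamma codes and run-length encoding, can be sketched directly; the following is an illustrative reconstruction of those building blocks, not the wzip format itself (function names are mine).

```python
def gamma_encode(n: int) -> str:
    """Elias gamma code for a positive integer n: emit (len-1) zeros,
    then the binary representation of n itself."""
    assert n >= 1
    b = bin(n)[2:]                   # binary digits without the '0b' prefix
    return "0" * (len(b) - 1) + b

def rle(symbols):
    """Collapse a sequence into (symbol, run_length) pairs, as one would
    apply to the long runs a BWT output typically contains."""
    runs, prev, count = [], None, 0
    for s in symbols:
        if s == prev:
            count += 1
        else:
            if prev is not None:
                runs.append((prev, count))
            prev, count = s, 1
    if prev is not None:
        runs.append((prev, count))
    return runs

print(gamma_encode(5))               # 5 = binary 101 -> "00101"
print(rle("aaabbc"))                 # [('a', 3), ('b', 2), ('c', 1)]
```

Gamma codes favor small values, so run lengths produced by `rle` over a BWT-transformed text compress well under them.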
An algorithmic framework for compression and text indexing
Abstract

Cited by 5 (0 self)
We present a unified algorithmic framework to obtain nearly optimal space bounds for text compression and compressed text indexing, apart from lower-order terms. For a text T of n symbols drawn from an alphabet Σ, our bounds are stated in terms of the h-th order empirical entropy of the text, H_h. In particular, we provide a tight analysis of the Burrows-Wheeler transform (bwt) establishing a bound of n·H_h + M(T, Σ, h) bits, where M(T, Σ, h) denotes the asymptotic number of bits required to store the empirical statistical model for contexts of order h appearing in T. Using the same framework, we also obtain an implementation of the compressed suffix array (csa) which achieves n·H_h + M(T, Σ, h) + O(n lg lg n / lg_{|Σ|} n) bits of space while still retaining competitive full-text indexing functionality. The novelty of the proposed framework lies in its use of the finite set model instead of the empirical probability model (as in previous work), giving us new insight into the design and analysis of our algorithms. For example, we show that our analysis gives improved bounds, since M(T, Σ, h) ≤ min{g′_h lg(n/g′_h + 1), H*_h·n + lg n + g″_h}, where g′_h = O(|Σ|^(h+1)) and g″_h = O(|Σ|^(h+1) lg |Σ|^(h+1)) do not depend on the text length n, while H*_h ≥ H_h is the modified h-th order empirical entropy of T. Moreover, we show a strong relationship between a compressed full-text index and the succinct dictionary problem. We also examine the importance of lower-order terms, as these can dwarf any savings achieved by high-order entropy. We report further results and tradeoffs on high-order entropy-compressed text indexes in the paper.
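The h-th order empirical entropy H_h in which these bounds are stated can be computed straight from its definition: weight the zeroth-order entropy of the symbols following each length-h context by that context's frequency. A minimal sketch (the function name and example string are mine, not the paper's):

```python
from collections import Counter
from math import log2

def empirical_entropy(text: str, h: int) -> float:
    """Return H_h(text) in bits per symbol: for each length-h context,
    sum the zeroth-order entropy of its follower distribution."""
    n = len(text)
    followers = {}                          # context -> Counter of next symbols
    for i in range(n - h):
        ctx = text[i:i + h]
        followers.setdefault(ctx, Counter())[text[i + h]] += 1
    total_bits = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        total_bits += sum(-c * log2(c / m) for c in counts.values())
    return total_bits / n

print(round(empirical_entropy("mississippi", 0), 3))   # ≈ 1.823
print(round(empirical_entropy("mississippi", 1), 3))   # ≈ 0.796
```

Note how H_1 is already much smaller than H_0 here: longer contexts sharpen the follower distributions, which is exactly what bounds of the form n·H_h exploit.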
Symbol ranking text compression with Shannon recoding
 J. UCS, 1997
Abstract

Cited by 3 (1 self)
In his work on the information content of English text in 1951, Shannon described a method of recoding the input text, a technique which has apparently lain dormant for the ensuing 45 years. Whereas traditional compressors exploit symbol frequencies and symbol contexts, Shannon’s method adds the concept of “symbol ranking”, as in ‘the next symbol is the third most likely in the present context’. While some other recent compressors can be explained in terms of symbol ranking, few make explicit reference to the concept. This report describes an implementation of Shannon’s method and shows that it forms the basis of a good text compressor.
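The symbol-ranking idea can be sketched as a per-context recency recoder: instead of emitting a symbol, emit its rank among symbols recently seen in the same context. This is an illustrative move-to-front-style variant of the concept, not the paper's implementation (names and parameters are mine).

```python
def rank_encode(text: str, h: int = 1):
    """Replace each symbol with its rank in a per-context recency list
    (rank 0 = most recently seen symbol in this context)."""
    contexts = {}                    # context -> recency-ordered symbol list
    ranks = []
    for i, c in enumerate(text):
        ctx = text[max(0, i - h):i]
        lst = contexts.setdefault(ctx, [])
        if c in lst:
            r = lst.index(c)
            lst.pop(r)
        else:
            r = len(lst)             # symbol not yet seen in this context
        lst.insert(0, c)             # move the symbol to the front
        ranks.append(r)
    return ranks

# Predictable text yields mostly rank 0, which a back-end entropy coder
# then compresses very well.
print(rank_encode("abababab"))       # [0, 0, 0, 0, 0, 0, 0, 0]
```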
Clouseau: Probabilistic Dynamic Verification of Multithreaded Memory Systems
 2004
Abstract

Cited by 2 (0 self)
Dynamic verification enables a system to improve its availability by checking that its execution is correct as it is running. While high performance and low power are desirable, correctness, despite hardware faults and subtle design bugs, is most important. For multithreaded systems, memory system correctness is defined by the memory consistency model. Thus, dynamically verifying memory consistency would ensure that the entire memory system is operating correctly. We present the first implementable design for probabilistic dynamic verification of sequential consistency (pDVSC) in multithreaded systems. The system dynamically creates a total order of memory operations (loads and stores) and verifies that this total order obeys SC. In the theoretical world of systems without resource constraints, DVSC would have to consider the entire total order, but we show how to leverage resource constraints to verify only a sliding window of the total order. While we cannot bound the size of this window and still eliminate all false verifications (false positives or negatives), we can implement probabilistic verification and make the probability of false verification arbitrarily small. We use full-system simulation of a multithreaded system running commercial workloads to evaluate our first implementation of pDVSC, called Clouseau. Clouseau’s implementation costs are kept reasonable via extensive compression and caching of the data that is used for dynamic verification. Clouseau, combined with backward error recovery, improves availability by recovering from injected errors. Clouseau adds only negligible performance overhead. While Clouseau adds to system design complexity, we believe this is a small price to pay for improving system availability.
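The core check underlying such a verifier can be sketched in a few lines: given a candidate total order of memory operations, confirm that every load returns the value of the most recent prior store to the same address. This is a hedged illustration of that check only; Clouseau's sliding-window and probabilistic machinery are not modeled, and all names here are mine.

```python
def check_total_order(ops, init=0):
    """ops: list of ('st', addr, val) or ('ld', addr, val) tuples in a
    candidate total order. Returns True iff every load observes the most
    recent store to its address (unwritten addresses hold `init`)."""
    mem = {}
    for op, addr, val in ops:
        if op == "st":
            mem[addr] = val                  # store updates memory
        elif mem.get(addr, init) != val:     # load saw a stale value
            return False
    return True

good = [("st", 0x10, 1), ("ld", 0x10, 1), ("st", 0x10, 2), ("ld", 0x10, 2)]
bad  = [("st", 0x10, 1), ("st", 0x10, 2), ("ld", 0x10, 1)]
print(check_total_order(good), check_total_order(bad))   # True False
```

A verifier with bounded hardware can only run this check over a window of recent operations, which is why the paper trades completeness for an arbitrarily small probability of a false verification.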
Working with Compressed Concordances
Abstract
A combination of new compression methods is suggested in order to compress the concordance of a large Information Retrieval system. The methods are aimed at allowing most of the processing directly on the compressed file, requesting decompression, if at all, only for small parts of the accessed data, saving I/O operations and CPU time.
other popular file sharing services, compressed
 2002
Abstract
We explore the performance of the Discrete Wavelet Transform (DWT) as applied to the lossless compression of sampled waveform data. Specifically, we have developed a parameterizable algorithm that uses the DWT to losslessly compress RIFF WAVE audio files. Our algorithm uses the DWT as an approximation to our original waveform, and we use a variety of entropy coders to store the difference between the original waveform and our approximation. We explore the performance of our algorithm under a number of different parameters, including different types of music, entropy encoders, and wavelet bases. Despite our optimizations, we find that our algorithm achieves compression that is inferior to existing lossless codecs. This leads us to conclude that wavelets are not the most appropriate models for complex sound data.
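The lossless-wavelet idea can be illustrated with the integer Haar (S-) transform via lifting: averages and differences stay integers, so the transform inverts exactly and the detail coefficients can be entropy coded. This is a sketch of the principle only, not the paper's parameterizable codec (function names and sample values are mine).

```python
def haar_forward(x):
    """One level of the integer Haar (S-) transform on an even-length
    list: floor-averages (approximation) and differences (detail)."""
    s = [(x[2*i] + x[2*i + 1]) // 2 for i in range(len(x) // 2)]
    d = [x[2*i] - x[2*i + 1] for i in range(len(x) // 2)]
    return s, d

def haar_inverse(s, d):
    """Exact inverse: each sample pair is recovered from (s_i, d_i)."""
    x = []
    for si, di in zip(s, d):
        a = si + ((di + 1) // 2)     # undo the floor in the average
        b = a - di
        x.extend([a, b])
    return x

samples = [10, 12, 9, 7, 7, 8, 30, 2]
s, d = haar_forward(samples)
print(haar_inverse(s, d) == samples)   # True: perfectly invertible
```

For smooth audio the detail coefficients cluster near zero, which is what makes them cheap to entropy code; the paper's finding is that even so, this approach trails dedicated lossless audio codecs.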
[Systems]: Storage Management, Main Memory; E.1 [Data Structures]: Arrays, Tables. Journal of the Association for Computing Machinery
Abstract
We present a unified algorithmic framework to obtain nearly optimal space bounds for text compression and compressed text indexing, apart from lower-order terms. For a text T of n symbols drawn from an alphabet Σ, our bounds are stated in terms of the h-th order empirical entropy of the text, H_h. In particular, we provide a tight analysis of the Burrows-Wheeler transform (bwt) establishing a bound of n·H_h + M(T, Σ, h) bits, where M(T, Σ, h) denotes the asymptotic number of bits required to store the empirical statistical model for contexts of order h appearing in T. Using the same framework, we also obtain an implementation of the compressed suffix array (csa) which achieves n·H_h + M(T, Σ, h) + O(n lg lg n / lg_{|Σ|} n) bits of space while still retaining competitive full-text indexing functionality. The novelty of the proposed framework lies in its use of the finite set model instead of the empirical probability model (as in previous work), giving us new insight into the design and analysis of our algorithms. For example, we show that our analysis gives improved bounds, since M(T, Σ, h) ≤ min{g′_h lg(n/g′_h + 1), H*_h·n + lg n + g″_h}, where g′_h and g″_h do not depend on the text length n, while H*_h ≥ H_h is the modified h-th order empirical entropy of T.