Results 1 - 5 of 5
Fast compression with a static model in high-order entropy
- In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, 2004
Cited by 7 (3 self)
We report on a simple encoding format called wzip for decompressing block-sorting transforms, such as the Burrows-Wheeler Transform (BWT). Our compressor uses the simple notions of gamma encoding and RLE, organized with a wavelet tree, to achieve a slightly better compression ratio than bzip2 in less time. In fact, our compression/decompression time is dependent on H_h, the hth-order empirical entropy. This relationship of performance to the compressibility of data is a key new idea among compression algorithms. Another key contribution of our compressor is its simplicity. Our compressor can also operate as a full-text index with a small amount of data, while still preserving backward compatibility with just the compressor.
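The two building blocks named in this abstract, gamma encoding and RLE, can be sketched as follows. This is only an illustration of the general techniques applied to a bit string; it is not the wzip format itself, the wavelet-tree organization is omitted, and the function names are hypothetical.

```python
from itertools import groupby


def elias_gamma(n: int) -> str:
    """Elias gamma code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]
    # Unary length prefix (len - 1 zeros) followed by the binary digits.
    return "0" * (len(binary) - 1) + binary


def rle_gamma_encode(bits: str) -> str:
    """Store the first bit, then the gamma code of each run length."""
    if not bits:
        return ""
    runs = [len(list(group)) for _, group in groupby(bits)]
    return bits[0] + "".join(elias_gamma(r) for r in runs)


def rle_gamma_decode(code: str) -> str:
    """Inverse of rle_gamma_encode."""
    if not code:
        return ""
    bit, i, out = code[0], 1, []
    while i < len(code):
        zeros = 0
        while code[i] == "0":  # count the unary prefix
            zeros += 1
            i += 1
        length = int(code[i:i + zeros + 1], 2)
        i += zeros + 1
        out.append(bit * length)
        bit = "1" if bit == "0" else "0"  # runs alternate
    return "".join(out)
```

Long runs cost only a logarithmic number of bits, which is why run-length schemes of this kind tend to do well on the highly repetitive output of a block-sorting transform.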
Symbol ranking text compression with Shannon recoding
- J. UCS, 1997
Cited by 3 (1 self)
In his work on the information content of English text in 1951, Shannon described a method of recoding the input text, a technique which has apparently lain dormant for the ensuing 45 years. Whereas traditional compressors exploit symbol frequencies and symbol contexts, Shannon’s method adds the concept of “symbol ranking”, as in ‘the next symbol is the third most likely in the present context’. While some other recent compressors can be explained in terms of symbol ranking, few make explicit reference to the concept. This report describes an implementation of Shannon’s method and shows that it forms the basis of a good text compressor.
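The idea of symbol ranking can be illustrated with a small sketch. It assumes a fixed-order context and uses move-to-front recency as a stand-in for "most likely"; both choices are simplifications made here for illustration and are not taken from the report, and the function name is hypothetical.

```python
from collections import defaultdict


def symbol_ranks(text: str, order: int = 2) -> list:
    """Recode text as ranks (1 = top candidate in the current context).

    A symbol not yet seen in its context is emitted as a literal character.
    """
    ranked = defaultdict(list)  # context -> symbols, most recent first
    out = []
    for i, sym in enumerate(text):
        context = text[max(0, i - order):i]
        candidates = ranked[context]
        if sym in candidates:
            out.append(candidates.index(sym) + 1)  # 1-based rank
            candidates.remove(sym)
        else:
            out.append(sym)  # escape: emit the literal symbol
        candidates.insert(0, sym)  # move to front
    return out
```

On predictable text most outputs are rank 1, so the recoded stream is heavily skewed toward small numbers and is easy for a later entropy coder to compress.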
An algorithmic framework for compression and text indexing
Cited by 3 (0 self)
We present a unified algorithmic framework to obtain nearly optimal space bounds for text compression and compressed text indexing, apart from lower-order terms. For a text T of n symbols drawn from an alphabet Σ, our bounds are stated in terms of the hth-order empirical entropy of the text, H_h. In particular, we provide a tight analysis of the Burrows-Wheeler transform (bwt) establishing a bound of nH_h + M(T,Σ,h) bits, where M(T,Σ,h) denotes the asymptotical number of bits required to store the empirical statistical model for contexts of order h appearing in T. Using the same framework, we also obtain an implementation of the compressed suffix array (csa) which achieves nH_h + M(T,Σ,h) + O(n lg lg n / lg_{|Σ|} n) bits of space while still retaining competitive full-text indexing functionality. The novelty of the proposed framework lies in its use of the finite set model instead of the empirical probability model (as in previous work), giving us new insight into the design and analysis of our algorithms. For example, we show that our analysis gives improved bounds since M(T,Σ,h) ≤ min{g'_h lg(n/g'_h + 1), H*_h n + lg n + g''_h}, where g'_h = O(|Σ|^{h+1}) and g''_h = O(|Σ|^{h+1} lg |Σ|^{h+1}) do not depend on the text length n, while H*_h ≥ H_h is the modified hth-order empirical entropy of T. Moreover, we show a strong relationship between a compressed full-text index and the succinct dictionary problem. We also examine the importance of lower-order terms, as these can dwarf any savings achieved by high-order entropy. We report further results and tradeoffs on high-order entropy-compressed text indexes in the paper.
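For readers unfamiliar with the quantity H_h used in these bounds, the following sketch computes it under the usual definition: group each symbol by the h symbols preceding it, take the zeroth-order entropy of each group, and average the results weighted by group size over n. The function names are hypothetical and the code is not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def zeroth_order_entropy(counts: Counter) -> float:
    """Entropy in bits per symbol of the distribution given by counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts.values())


def empirical_entropy(text: str, h: int) -> float:
    """hth-order empirical entropy H_h of text, in bits per symbol."""
    n = len(text)
    if n == 0:
        return 0.0
    followers = defaultdict(Counter)  # context of length h -> next-symbol counts
    for i in range(h, n):
        followers[text[i - h:i]][text[i]] += 1
    weighted = sum(sum(c.values()) * zeroth_order_entropy(c)
                   for c in followers.values())
    return weighted / n
```

For example, the text "abab" has H_0 equal to 1 bit per symbol, while H_1 is 0 because each symbol is fully determined by its predecessor. The cost of storing which symbol follows which context is what a model term such as M(T,Σ,h) has to account for.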
Clouseau: Probabilistic Dynamic Verification of Multithreaded Memory Systems
- 2004
Cited by 2 (0 self)
Dynamic verification enables a system to improve its availability by checking that its execution is correct as it is running. While high performance and low power are desirable, correctness, despite hardware faults and subtle design bugs, is most important. For multithreaded systems, memory system correctness is defined by the memory consistency model. Thus, dynamically verifying memory consistency would ensure that the entire memory system is operating correctly. We present the first implementable design for probabilistic dynamic verification of sequential consistency (pDVSC) in multithreaded systems. The system dynamically creates a total order of memory operations (loads and stores) and verifies that this total order obeys SC. In the theoretical world of systems without resource constraints, DVSC would have to consider the entire total order, but we show how to leverage resource constraints to verify only a sliding window of the total order. While we cannot bound the size of this window and still eliminate all false verifications (false positives or negatives), we can implement probabilistic verification and make the probability of false verification arbitrarily small. We use full-system simulation of a multithreaded system running commercial workloads to evaluate our first implementation of pDVSC, called Clouseau. Clouseau’s implementation costs are kept reasonable via extensive compression and caching of the data that is used for dynamic verification. Clouseau, combined with backward error recovery, improves availability by recovering from injected errors. Clouseau adds only negligible performance overhead. While Clouseau adds to system design complexity, we believe this is a small price to pay for improving system availability.
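What it means for a total order of loads and stores to obey SC can be shown with a small software check. This is a sketch of the property only, over a complete and fully known total order; it is not Clouseau's hardware design and has no sliding window or probabilistic component. The `MemOp` record and `obeys_sc` function are hypothetical names.

```python
from typing import Iterable, NamedTuple


class MemOp(NamedTuple):
    thread: int  # issuing thread
    seq: int     # position in that thread's program order
    kind: str    # "load" or "store"
    addr: int
    value: int   # value stored, or value the load returned


def obeys_sc(total_order: Iterable[MemOp], initial: int = 0) -> bool:
    """Check that a total order is a valid sequentially consistent execution."""
    memory = {}    # addr -> value of the latest store in the total order
    last_seq = {}  # thread -> last program-order index seen
    for op in total_order:
        # Each thread's operations must appear in program order.
        if op.seq <= last_seq.get(op.thread, -1):
            return False
        last_seq[op.thread] = op.seq
        if op.kind == "store":
            memory[op.addr] = op.value
        elif memory.get(op.addr, initial) != op.value:
            # A load must return the most recent store to its address.
            return False
    return True
```

A checker like this needs state for every address ever written, which gives a sense of why the abstract stresses verifying only a window of the order and compressing the verification data.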
Working with Compressed Concordances
A combination of new compression methods is suggested in order to compress the concordance of a large Information Retrieval system. The methods are aimed at allowing most of the processing directly on the compressed file, requiring decompression, if at all, only for small parts of the accessed data, saving I/O operations and CPU time.

