Results 1 - 10
of
25
Data Compression
- ACM Computing Surveys
, 1987
"... This paper surveys a variety of data compression methods spanning almost forty years of research, from the work of Shannon, Fano and Huffman in the late 40's to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effectiv ..."
Abstract
-
Cited by 81 (3 self)
- Add to MetaCart
This paper surveys a variety of data compression methods spanning almost forty years of research, from the work of Shannon, Fano and Huffman in the late 40's to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important application in the areas of file storage and distributed systems. Concepts from information theory, as they relate to the goals and evaluation of data compression methods, are discussed briefly. A framework for evaluation and comparison of methods is constructed and applied to the algorithms presented. Comparisons of both theoretical and empirical natures are reported and possibilities for future research are suggested. INTRODUCTION Data compression is often referred to as coding, where coding is a very general term encompassing any special representation of data which satisfies a given need. Information theory is defined to be the study of eff...
Design and analysis of dynamic Huffman codes
- Journal of the ACM
, 1987
"... Abstract. A new one-pass algorithm for constructing dynamic Huffman codes is introduced and analyzed. We also analyze the one-pass algorithm due to Failer, Gallager, and Knuth. In each algorithm, both the sender and the receiver maintain equivalent dynamically varying Huffman trees, and the coding i ..."
Abstract
-
Cited by 76 (3 self)
- Add to MetaCart
Abstract. A new one-pass algorithm for constructing dynamic Huffman codes is introduced and analyzed. We also analyze the one-pass algorithm due to Failer, Gallager, and Knuth. In each algorithm, both the sender and the receiver maintain equivalent dynamically varying Huffman trees, and the coding is done in real time. We show that the number of bits used by the new algorithm to encode a message containing t letters is < t bits more than that used by the conventional two-pass Huffman scheme, independent of the alphabet size. This is best possible in the worst case, for any one-pass Huffman method. Tight upper and lower bounds are derived. Empirical tests show that the encodings produced by the new algorithm are shorter than those of the other one-pass algorithm and, except for long messages, are shorter than those of the two-pass method. The new algorithm is well suited for on-line encoding/decoding in data networks and for file compression.
Interval and Recency Rank Source Coding: Two On-Line Adaptive Variable-Length Schemes
, 1987
"... In the schemes presented the encoder maps each message into a codeword in a prefix-free codeword set. In interval encoding the codeword is indexed by the interval since the last previous occurrence of that message, and the codeword set must be countably infinite. In recency rank encoding the codewor ..."
Abstract
-
Cited by 31 (0 self)
- Add to MetaCart
In the schemes presented the encoder maps each message into a codeword in a prefix-free codeword set. In interval encoding the codeword is indexed by the interval since the last previous occurrence of that message, and the codeword set must be countably infinite. In recency rank encoding the codeword is indexed by the number of distinct messages in that interval, and there must be no fewer codewords than messages. The decoder decodes each codeword on receipt. Users need not know message probabilities, but must agree on indexings, of the codeword set in an order of increasing length and of the message set in some arbitrary order. The average codeword length over a communications bout is never much larger than the value for an off-line scheme which maps the jth most frequent message in the bout into the jth shortest codeword in the given set, and is never too much larger than the value for off-line Huffman encoding of messages into the best codeword set for the bout message frequencies.
Practical Implementations of Arithmetic Coding
- IN IMAGE AND TEXT
, 1992
"... We provide a tutorial on arithmetic coding, showing how it provides nearly optimal data compression and how it can be matched with almost any probabilistic model. We indicate the main disadvantage of arithmetic coding, its slowness, and give the basis of a fast, space-efficient, approximate arithmet ..."
Abstract
-
Cited by 31 (6 self)
- Add to MetaCart
We provide a tutorial on arithmetic coding, showing how it provides nearly optimal data compression and how it can be matched with almost any probabilistic model. We indicate the main disadvantage of arithmetic coding, its slowness, and give the basis of a fast, space-efficient, approximate arithmetic coder with only minimal loss of compression efficiency. Our coder is based on the replacement of arithmetic by table lookups coupled with a new deterministic probability estimation scheme.
Fast and efficient lossless image compression
- in Proc. 1993 Data Compression Conference, (Snowbird)
, 1993
"... We present a new method for lossless image compression that gives compression comparable to JPEG lossless mode with about five times the speed. Our method, called FELICS, is based on a novel use of two neighboring pixels for both prediction and error modeling. For coding we use single bits, adjusted ..."
Abstract
-
Cited by 30 (0 self)
- Add to MetaCart
We present a new method for lossless image compression that gives compression comparable to JPEG lossless mode with about five times the speed. Our method, called FELICS, is based on a novel use of two neighboring pixels for both prediction and error modeling. For coding we use single bits, adjusted binary codes, and Golomb or Rice codes. For the latter we present and analyze a provably good method for estimating the single coding parameter.
Document Image Compression and Analysis
- PhD of the university of Maryland
, 1997
"... Image compression usually considers the minimization of storage space as its main objective. It is desirable, however, to code images so that we have the ability to process the resulting representation directly. In this thesis we explore an approach to document image compression that is efficient in ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
Image compression usually considers the minimization of storage space as its main objective. It is desirable, however, to code images so that we have the ability to process the resulting representation directly. In this thesis we explore an approach to document image compression that is efficient in both space (storage requirement) and time (processing flexibility). A representation is presented in which component-level redundancy is removed by forming a prototype library and component location table. This representation forms a basis for compression and provides direct access to image components. To generate the prototype library, a new clustering approach is developed which is suitable for document image components. The distance metric is based on a character degradation model so that degraded versions of the same character will be grouped together. To achieve a lossless representation when required, the residuals are encoded efficiently using a structural distance ordering. OCR is...
Splay Trees for Data Compression
, 1995
"... We present applications of splay trees to two topics in data compression. First is a variant of the move-to-front (mtf) data compression (of Bentley,Sleator Tarjan and Wei) algorithm, where we introduce secondary list(s). This seems to capture higher-order correlations. An implementation of this alg ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
We present applications of splay trees to two topics in data compression. First is a variant of the move-to-front (mtf) data compression (of Bentley,Sleator Tarjan and Wei) algorithm, where we introduce secondary list(s). This seems to capture higher-order correlations. An implementation of this algorithm with Sleator-Tarjan splay trees runs in time (provably) proportional to the entropy of the input sequence. When tested on some telephony data, compression ratio and run time showed significant improvements over original mtf-algorithm, making it competitive or better than popular programs. For stationary ergodic sources, we analyse the compression and output distribution of the original mtf-algorithm, which suggests why the secondary list is appropriate to introduce. We also derive analytical upper bounds on the average codeword length in terms of stochastic parameters of the source. Secondly, we consider the compression (or coding) of source sequences where the codewords are required ...
Dynamic Shannon Coding
, 2005
"... We present a new algorithm for dynamic prefixfree coding, based on Shannon coding. We give a simple analysis and prove a better upper bound on the length of the encoding produced than the corresponding bound for dynamic Huffman coding. We show how our algorithm can be modified for efficient length-r ..."
Abstract
-
Cited by 7 (5 self)
- Add to MetaCart
We present a new algorithm for dynamic prefixfree coding, based on Shannon coding. We give a simple analysis and prove a better upper bound on the length of the encoding produced than the corresponding bound for dynamic Huffman coding. We show how our algorithm can be modified for efficient length-restricted coding, alphabetic coding and coding with unequal letter costs.
XComp: An XML compression tool
, 2003
"... I hereby declare that I am the sole author of this thesis. I authorize the University of Waterloo to lend this thesis to other instituions or individuals for the purpose of scholarly research. Signature of Author I further authorize the University of Waterloo to reproduce this thesis by photocopying ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
I hereby declare that I am the sole author of this thesis. I authorize the University of Waterloo to lend this thesis to other instituions or individuals for the purpose of scholarly research. Signature of Author I further authorize the University of Waterloo to reproduce this thesis by photocopying or by other means, in total or in part, at the request of other institutions of individuals for the purpose of scholarly research. ii Signature of Author The University of Waterloo requires the signatures of all persons using or photocopying this thesis. Please sign below, and give address and date. iii Table of Contents Table of Contents iv Abstract vi
Dynamic Length-Restricted Coding
, 2003
"... Suppose that $S$ is a string of length $m$ drawn from an alphabet of $n$ characters, $d$ of which occur in $S$. Let $P$ be the relative frequency distribution of characters in $S$. We present a new algorithm for dynamic coding that uses at most \(\lceil \lg n \rceil 1\) bits to encode each character ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
Suppose that $S$ is a string of length $m$ drawn from an alphabet of $n$ characters, $d$ of which occur in $S$. Let $P$ be the relative frequency distribution of characters in $S$. We present a new algorithm for dynamic coding that uses at most \(\lceil \lg n \rceil 1\) bits to encode each character in $S$

