Adding Compression to a Full-Text Retrieval System
1995
Cited by 81 (25 self)
We describe the implementation of a data compression scheme as an integral and transparent layer within a full-text...
A fast and space-economical algorithm for length-limited coding
Proc. Int. Symp. Algorithms and Computation, pp. 12-21
1995
Cited by 15 (1 self)
Abstract. The minimum-redundancy prefix code problem is to determine a list of integer codeword lengths $l = [l_i \mid i \in \{1, \dots, n\}]$, given a list of $n$ symbol weights $p = [p_i \mid i \in \{1, \dots, n\}]$, such that $\sum_{i=1}^{n} 2^{-l_i} \le 1$ and $\sum_{i=1}^{n} l_i p_i$ is minimised. An extension is the minimum-redundancy length-limited prefix code problem, in which the further constraint $l_i \le L$ is imposed, for all $i \in \{1, \dots, n\}$ and some integer $L \ge \lceil \log_2 n \rceil$. The package-merge algorithm of Larmore and Hirschberg generates length-limited codes in $O(nL)$ time using $O(n)$ words of auxiliary space. Here we show how the size of the work space can be reduced to $O(L^2)$. This represents a useful improvement, since for practical purposes $L$ is $O(\log n)$.
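For readers unfamiliar with package-merge, here is a minimal Python sketch of the basic Larmore-Hirschberg idea the abstract builds on. It is the straightforward $O(nL)$-time formulation and stores each item's symbol list explicitly, so it uses far more memory than the paper's $O(L^2)$-space method and does not reproduce that refinement. The function name `package_merge` is our own, and weights are assumed to be nonnegative numbers.

```python
from typing import List, Tuple

# An item is (weight, symbols it contains); a package holds the symbols of both halves.
Item = Tuple[float, Tuple[int, ...]]


def package_merge(weights: List[float], max_len: int) -> List[int]:
    """Return optimal binary codeword lengths, each at most max_len (basic package-merge)."""
    n = len(weights)
    if n == 0:
        return []
    if n == 1:
        return [1]
    if (1 << max_len) < n:
        raise ValueError("infeasible: need 2**max_len >= number of symbols")

    # Leaves sorted by weight; Python's sort is stable, so ties keep input order.
    leaves: List[Item] = sorted(
        ((w, (i,)) for i, w in enumerate(weights)), key=lambda it: it[0]
    )
    current = leaves
    for _ in range(max_len - 1):
        # Package: pair adjacent items, discarding an unpaired last item.
        packages = [
            (current[k][0] + current[k + 1][0], current[k][1] + current[k + 1][1])
            for k in range(0, len(current) - 1, 2)
        ]
        # Merge: combine the original leaves with the new packages, ordered by weight.
        current = sorted(leaves + packages, key=lambda it: it[0])

    # The 2n - 2 lightest items form the solution; each appearance of a symbol
    # in a selected item adds one to its codeword length.
    lengths = [0] * n
    for _, symbols in current[: 2 * n - 2]:
        for s in symbols:
            lengths[s] += 1
    return lengths


print(package_merge([1, 1, 2, 4], 3))
print(package_merge([1, 1, 2, 4], 2))
```

For weights [1, 1, 2, 4], a limit of 3 reproduces the unrestricted Huffman lengths [3, 3, 2, 1], while a limit of 2 forces the flat code [2, 2, 2, 2].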
Skeleton Trees for the Efficient Decoding of Huffman Encoded Texts
Information Retrieval
1997
Cited by 10 (4 self)
A new data structure is investigated, which allows fast decoding of texts encoded by canonical Huffman codes. The storage requirements are much lower than for conventional Huffman trees, $O(\log^2 n)$ for trees of depth $O(\log n)$, and decoding is faster, because a part of the bit comparisons necessary for the decoding may be saved. Empirical results on large real-life distributions show a reduction of up to 50% and more in the number of bit operations. The basic idea is then generalized, yielding further savings. This is an extended version of a paper which has been presented at the 8th Annual Symposium on Combinatorial Pattern Matching (CPM'97), and appeared in its proceedings, pp. 65-75.
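Skeleton trees speed up decoding of canonical Huffman codes. As background only, the following Python sketch shows the conventional table-driven canonical decoder (a first code value and a count per codeword length) that such structures improve on; it is not the skeleton-tree structure itself, and the helper names are hypothetical. The lengths are assumed to satisfy the Kraft inequality, and the bit string is assumed to be a valid encoding.

```python
from typing import Dict, List, Tuple


def canonical_codes(lengths: List[int]) -> Dict[int, Tuple[int, int]]:
    """Map each symbol to (code value, code length) in canonical order."""
    order = sorted((l, s) for s, l in enumerate(lengths) if l > 0)
    codes: Dict[int, Tuple[int, int]] = {}
    code = 0
    prev_len = order[0][0] if order else 0
    for l, s in order:
        code <<= l - prev_len  # append zeros when moving to a longer length
        codes[s] = (code, l)
        code += 1
        prev_len = l
    return codes


def encode(symbols: List[int], lengths: List[int]) -> str:
    codes = canonical_codes(lengths)
    return "".join(format(codes[s][0], f"0{codes[s][1]}b") for s in symbols)


def decode(bits: str, lengths: List[int]) -> List[int]:
    """Bit-by-bit canonical decoding with first-code and count tables."""
    max_len = max(lengths)
    count = [0] * (max_len + 1)
    for l in lengths:
        if l > 0:
            count[l] += 1
    # Symbols listed in canonical order: by length, then by symbol index.
    ordered = [s for _, s in sorted((l, s) for s, l in enumerate(lengths) if l > 0)]
    first = [0] * (max_len + 1)   # smallest code value of each length
    offset = [0] * (max_len + 1)  # index in `ordered` of that first code
    for l in range(1, max_len + 1):
        first[l] = (first[l - 1] + count[l - 1]) << 1
        offset[l] = offset[l - 1] + count[l - 1]

    out: List[int] = []
    code, l = 0, 0
    for b in bits:
        code = (code << 1) | int(b)
        l += 1
        if code - first[l] < count[l]:  # code is a complete codeword of length l
            out.append(ordered[offset[l] + code - first[l]])
            code, l = 0, 0
    return out


lengths = [3, 3, 2, 1]
bits = encode([3, 2, 0, 1, 0], lengths)
print(bits, decode(bits, lengths))
```

Each step of this decoder compares one more bit against the tables; the abstract's claim is that a skeleton tree lets part of these comparisons be skipped.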
Lossless Compression for Text and Images
International Journal of High Speed Electronics and Systems
1995
Cited by 6 (0 self)
Most data that is inherently discrete needs to be compressed in such a way that it can be recovered exactly, without any loss. Examples include text of all kinds, experimental results, and statistical databases. Other forms of data may need to be stored exactly, such as images, particularly bi-level ones, or ones arising in medical and remote-sensing applications, or ones that may be required to be certified true for legal reasons. Moreover, during the process of lossy compression, many occasions for lossless compression of coefficients or other information arise. This paper surveys techniques for lossless compression. The process of compression can be broken down into modeling and coding. We provide an extensive discussion of coding techniques, and then introduce methods of modeling that are appropriate for text and images. Standard methods used in popular utilities (in the case of text) and international standards (in the case of images) are described. Keywords: Text compression, ima...
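The survey's split between modeling and coding can be illustrated with the simplest possible model, an order-0 symbol-frequency count. Its entropy is a lower bound on the average bits per symbol of any prefix code driven by that static model, so it measures how much the coder could possibly achieve with this model. A minimal Python sketch; the function names and sample text are arbitrary choices of ours.

```python
import math
from collections import Counter
from typing import Dict


def order0_model(text: str) -> Dict[str, float]:
    """Modeling step: estimate each symbol's probability from its frequency."""
    counts = Counter(text)
    total = len(text)
    return {sym: c / total for sym, c in counts.items()}


def entropy_bits(model: Dict[str, float]) -> float:
    """Lower bound, in bits per symbol, for a prefix code built on this model."""
    return -sum(p * math.log2(p) for p in model.values() if p > 0)


text = "abracadabra"
model = order0_model(text)
h = entropy_bits(model)
print(f"{h:.3f} bits/symbol, {h * len(text):.1f} bits total")
```

A better model (for example, one conditioned on preceding symbols) lowers this bound; the coding step then determines how closely it is approached.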
Twenty (or so) questions: D-ary bounded-length Huffman coding, preprint available from http://arxiv.org/abs/cs.IT/0602085
Cited by 3 (2 self)
Abstract. The game of Twenty Questions has long been used to illustrate binary source coding. Recently, a physical device has been developed that mimics the process of playing Twenty Questions, with the device supplying the questions and the user providing the answers. However, this game differs from Twenty Questions in two ways: Answers need not be only “yes” and “no,” and the device continues to ask questions beyond the traditional twenty; typically, at least 20 and at most 25 questions are asked. The nonbinary variation on source coding is one that is well known and understood, but not with such bounds on length. An upper bound on the related property of fringe, the difference between the lengths of the longest and the shortest codewords, has been considered, but no polynomial-time algorithm currently finds optimal fringe-limited codes. An $O(n(l_{\max} - l_{\min}))$-time, $O(n)$-space Package-Merge-based algorithm is presented here for finding an optimal $D$-ary (binary or nonbinary) source code with all $n$ codeword lengths (numbers of questions) bounded to be within the interval $[l_{\min}, l_{\max}]$. This algorithm minimizes average codeword length or, more generally, any other quasi-arithmetic convex coding penalty. In the case of minimizing average codeword length, time complexity can often be improved via an alternative graph-based reduction. This has, as a special case, a method for nonbinary length-limited Huffman coding, which was previously solved via dynamic programming with $O(n^2 l_{\max} \log D)$ time and $O(n^2 \log D)$ space. These algorithms can also be used to efficiently find a code that is optimal given a limit on fringe.
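The constraints in this abstract are easy to state in code. The sketch below only checks a candidate $D$-ary code: a prefix code with lengths $l_i$ exists exactly when the Kraft-McMillan sum $\sum_i D^{-l_i}$ is at most 1, and each length must also lie in $[l_{\min}, l_{\max}]$. It also reports average length and fringe. It does not implement the Package-Merge-based optimizer, and the function names are hypothetical; exact `Fraction` arithmetic avoids floating-point error in the Kraft sum.

```python
from fractions import Fraction
from typing import Sequence


def kraft_sum(lengths: Sequence[int], d: int) -> Fraction:
    """Exact value of sum_i d**(-l_i)."""
    return sum((Fraction(1, d ** l) for l in lengths), Fraction(0))


def is_bounded_prefix_code(lengths: Sequence[int], d: int, l_min: int, l_max: int) -> bool:
    """True if a d-ary prefix code with these lengths exists and all lie in [l_min, l_max]."""
    in_bounds = all(l_min <= l <= l_max for l in lengths)
    return in_bounds and kraft_sum(lengths, d) <= 1


def average_length(lengths: Sequence[int], probs: Sequence[Fraction]) -> Fraction:
    """Expected number of questions (codeword digits) under the given probabilities."""
    return sum((p * l for p, l in zip(probs, lengths)), Fraction(0))


def fringe(lengths: Sequence[int]) -> int:
    """Difference between the longest and shortest codeword lengths."""
    return max(lengths) - min(lengths)


# Example: three possible answers per question, between 1 and 2 questions.
lengths = [1, 2, 2, 2, 2]
probs = [Fraction(1, 3)] + [Fraction(1, 6)] * 4
print(kraft_sum(lengths, 3), is_bounded_prefix_code(lengths, 3, 1, 2),
      average_length(lengths, probs), fringe(lengths))
```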
On the Cost of Worst-Case Coding Length Constraints
IEEE Trans. Information Theory
2000
Cited by 2 (0 self)
It is shown that for any uniquely decipherable code, with a small cost in the expected coding length we can add constraints on the worst-case coding length. Moreover, this cost is related to the Fibonacci numbers. Keywords: data compression, Fibonacci numbers, Huffman codes, source coding, uniquely decipherable, universal coding.
1 Introduction
A fundamental tradeoff in lossless source coding is that we can compress some of the inputs only if we expand some of the others. This is reasonable because our primary goal is to minimize the expected output coding length. However, in some cases we would not like to expand the data. The trivial code, wherein the output is equal to the input, never expands the coding length, but it never compresses either. A reasonable objective is to compress well, while expanding very little in the worst case. The tradeoff between the expected coding length and the worst-case coding expansion has received research attention. In [1] an algorithm for finding a cod...
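The expected-versus-worst-case tradeoff can be felt with a common engineering trick, which is not the construction studied in this paper: wrap a compressor so that any input it would expand is stored raw behind a flag. With a one-byte flag, expansion is bounded by one byte. A minimal Python sketch using the standard `zlib` module; the flag values and function names are our own assumptions.

```python
import zlib

RAW, PACKED = b"\x00", b"\x01"  # hypothetical one-byte flags


def bounded_encode(data: bytes) -> bytes:
    """Compress with zlib, but fall back to raw storage if that would not save space."""
    packed = zlib.compress(data, 9)
    if len(packed) < len(data):
        return PACKED + packed
    return RAW + data  # worst case: exactly one byte longer than the input


def bounded_decode(blob: bytes) -> bytes:
    """Invert bounded_encode by dispatching on the flag byte."""
    flag, body = blob[:1], blob[1:]
    if flag == PACKED:
        return zlib.decompress(body)
    if flag == RAW:
        return body
    raise ValueError("unknown flag byte")


sample = b"to be or not to be " * 50
print(len(sample), len(bounded_encode(sample)))
assert bounded_decode(bounded_encode(sample)) == sample
```

This mirrors the tradeoff only coarsely: every output pays the flag byte, including well-compressed ones, in exchange for a hard bound on worst-case expansion.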
Twenty (or so) questions: bounded-length Huffman coding. arXiv:cs.IT/0602085
2006
Cited by 2 (0 self)
Abstract. The game of Twenty Questions has long been used to illustrate binary source coding. Recently, a physical device has been developed which mimics the process of playing Twenty Questions, with the device supplying the questions and the user providing the answers. However, this game differs from Twenty Questions in two ways: Answers need not be only “yes” and “no,” and the device continues to ask questions beyond the traditional twenty; typically, at least 20 and at most 25 questions are asked. The nonbinary variation on source coding is one that is well known and understood, but not with such bounds on length. An $O(n(l_{\max} - l_{\min}))$-time, $O(n)$-space Package-Merge-based algorithm is presented here for binary and nonbinary source coding with codeword lengths (numbers of questions) bounded to be within a certain interval, one that minimizes average codeword length or, more generally, any other quasi-arithmetic convex coding penalty. In the case of minimizing average codeword length, both time and space complexity can be improved via an alternative reduction. This has, as a special case, a method for nonbinary length-limited Huffman coding, which was previously solved via dynamic programming with $O(n^2 l_{\max} \log D)$ time and space.
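For context on the nonbinary case, here is a minimal Python sketch of plain $D$-ary Huffman coding, which imposes no length bounds. With $m$ nodes in the initial pool, zero-weight dummy symbols are added until $(m - 1) \bmod (D - 1) = 0$, so that every merge combines exactly $D$ nodes. This is not the bounded-length Package-Merge algorithm of the abstract, and the function name is hypothetical.

```python
import heapq
from itertools import count
from typing import List


def dary_huffman_lengths(weights: List[float], d: int) -> List[int]:
    """Optimal d-ary prefix-code lengths (no length bounds) by Huffman's method."""
    if d < 2:
        raise ValueError("d must be at least 2")
    n = len(weights)
    if n == 0:
        return []
    if n == 1:
        return [1]
    tie = count()  # tie-breaker so heap entries never compare their lists
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    # Pad with zero-weight dummies so that each merge takes exactly d nodes.
    while (len(heap) - 1) % (d - 1) != 0:
        heap.append((0, next(tie), []))
    heapq.heapify(heap)

    lengths = [0] * n
    while len(heap) > 1:
        total, members = 0, []
        for _ in range(d):
            w, _, syms = heapq.heappop(heap)
            total += w
            members += syms
        for s in members:  # every real symbol under the new node gets one digit deeper
            lengths[s] += 1
        heapq.heappush(heap, (total, next(tie), members))
    return lengths


print(dary_huffman_lengths([1] * 9, 3))
print(dary_huffman_lengths([5, 2, 2, 1], 3))
```

With $D = 2$ no padding is ever needed and the function reduces to ordinary binary Huffman coding.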
D-ary Bounded-Length Huffman Coding
2007
Cited by 1 (0 self)
Abstract. Efficient optimal prefix coding has long been accomplished via the Huffman algorithm. However, there is still room for improvement and exploration regarding variants of the Huffman problem. Length-limited Huffman coding, useful for many practical applications, is one such variant, in which codes are restricted to the set of codes in which none of the $n$ codewords is longer than a given length, $l_{\max}$. Binary length-limited coding can be done in $O(n l_{\max})$ time and $O(n)$ space using the widely used Package-Merge algorithm. In this paper the Package-Merge approach is generalized in order to introduce a minimum codeword length, $l_{\min}$, to allow for objective functions other than the minimization of expected codeword length, and to be applicable to both binary and nonbinary codes, the latter of which was previously addressed using a slower dynamic programming approach. These extensions have various applications, including faster decompression, and can be used to solve the problem of finding an optimal code with bounded fringe, that is, finding the best code among codes with a maximum difference between the longest and shortest codewords. The previously proposed method for solving this problem was non-polynomial time, whereas the novel algorithm requires only $O(n(l_{\max} - l_{\min})^2)$ time and $O(n)$ space.
REFERENCES
V. CONCLUSION
We proposed an explicit construction of fixed-length codes for SW source networks. The proposed code is linear and has two-step encoding and decoding procedures similar to a concatenated code used for channel coding. Further, if the sources are memoryless, the proposed code is universal and the probability of error vanishes exponentially as the block length tends to infinity. For future research, an open problem is to obtain tight upper and lower bounds on the error exponent achievable by the proposed code for DMSs.
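Linear fixed-length coding for Slepian-Wolf (SW) networks is often introduced with a classic toy example, sketched here in Python. The encoder sends only the 3-bit syndrome of a 7-bit source block $x$ under the (7,4) Hamming code. A decoder holding side information $y$ recovers $x$, assuming (our simplification) that $x$ and $y$ differ in at most one position. This is not the two-step concatenated construction proposed in the paper, and all names are hypothetical.

```python
from typing import List

# Parity-check matrix of the (7,4) Hamming code: column j holds the binary digits of j + 1.
H = [[((j + 1) >> r) & 1 for j in range(7)] for r in range(3)]


def syndrome(bits: List[int]) -> List[int]:
    """Compute H * bits over GF(2)."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


def sw_encode(x: List[int]) -> List[int]:
    """Compress 7 source bits to their 3-bit syndrome (a linear, fixed-length map)."""
    return syndrome(x)


def sw_decode(s: List[int], y: List[int]) -> List[int]:
    """Recover x from its syndrome and side information y differing in at most one bit."""
    # By linearity, s XOR H*y = H*(x XOR y): the syndrome of the difference pattern.
    diff = [a ^ b for a, b in zip(s, syndrome(y))]
    pos = sum(bit << r for r, bit in enumerate(diff))  # 0 means x == y
    x = list(y)
    if pos:
        x[pos - 1] ^= 1  # a difference at index p has syndrome equal to column p, i.e. p + 1
    return x


x = [1, 0, 1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 0, 0, 1]  # side information: differs from x in one position
print(sw_encode(x), sw_decode(sw_encode(x), y) == x)
```

The encoder never sees $y$; compression from 7 to 3 bits is possible only because the decoder's side information is assumed to be highly correlated with the source.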