Results 1 – 10 of 15
Adding Compression to a Full-Text Retrieval System
, 1995
Cited by 81 (25 self)
Abstract:
We describe the implementation of a data compression scheme as an integral and transparent layer within a full-text...
Practical Length-Limited Coding for Large Alphabets
 The Computer Journal
, 1995
Cited by 18 (0 self)
Abstract:
The use of Huffman coding for economical representation of a stream of symbols drawn from a defined source alphabet is widely known. In this paper we consider the problems encountered when Huffman coding is applied to an alphabet containing millions of symbols. Conventional tree-based methods for generating the set of codewords require large amounts of main memory; and worse, the codewords generated may be longer than 32 bits, which can severely limit the usefulness of both software and hardware implementations. The solution to the second problem is to generate "length-limited" codes, but previous algorithms for this restricted problem have required even more memory space than Huffman's unrestricted method. Here we reexamine the "package-merge" algorithm for generating optimal length-limited prefix-free codes and show that with a considered reorganisation of the key steps and careful attention to detail it is possible to implement it to run quickly in modest amounts of memory. As evid...
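For orientation, the package-merge algorithm the abstract refers to can be written down compactly. The sketch below is the generic textbook version, which keeps O(nL) items in memory; the paper's contribution is precisely a reorganisation that avoids this memory cost.

```python
def package_merge(weights, L):
    """Optimal length-limited code lengths via package-merge.

    Textbook O(nL)-item version: each symbol contributes one "coin" per
    level; pairs are packaged level by level, and the 2(n-1) cheapest
    items of the final list give the code lengths.
    """
    n = len(weights)
    if n == 1:
        return [1]
    assert n <= 1 << L, "alphabet too large for length limit L"
    order = sorted(range(n), key=lambda i: weights[i])
    coins = [(weights[i], (i,)) for i in order]   # sorted per-level coins
    lst = list(coins)                             # deepest level, width 2^-L
    for _ in range(L - 1):
        # package adjacent pairs, then merge with the next level's coins
        packages = [(lst[j][0] + lst[j + 1][0], lst[j][1] + lst[j + 1][1])
                    for j in range(0, len(lst) - 1, 2)]
        lst = sorted(coins + packages, key=lambda c: c[0])
    # a symbol's code length = number of selected items containing it
    lengths = [0] * n
    for _, syms in lst[:2 * (n - 1)]:
        for i in syms:
            lengths[i] += 1
    return lengths
```

With a hard limit of L = 2 on the skewed weights below, every codeword is forced to two bits, whereas plain Huffman coding would assign lengths up to 4.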
Is Huffman Coding Dead?
 Computing
, 1993
Cited by 17 (3 self)
Abstract:
In recent publications about data compression, arithmetic codes are often suggested as the state of the art, rather than the more popular Huffman codes. While it is true that Huffman codes are not optimal in all situations, we show that the advantage of arithmetic codes in compression performance is often negligible. Referring also to other criteria, we conclude that for many applications, Huffman codes should still remain a competitive choice.
1. Introduction
It is paradoxical that, as the technology for storing and transmitting information has gotten cheaper and more effective, interest in data compression has increased. There are many explanations, but most conspicuous is that improvements in media have expanded our sense of what we wish to store. For example, CD-ROM technology allows us to store whole libraries instead of records describing individual items; but the requirements of storing full text easily exceed the capabilities even of the optical format. Similarly, there is ...
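As a concrete baseline for the comparison this abstract makes, here is the standard textbook construction of Huffman code lengths (not code from the paper):

```python
import heapq

def huffman_lengths(weights):
    """Code lengths of an optimal (unrestricted) binary Huffman code."""
    n = len(weights)
    if n == 1:
        return [1]
    heap = [(w, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * n
    while len(heap) > 1:
        wa, sa = heapq.heappop(heap)   # two lightest subtrees
        wb, sb = heapq.heappop(heap)
        for i in sa + sb:              # every contained leaf gets one deeper
            lengths[i] += 1
        heapq.heappush(heap, (wa + wb, sa + sb))
    return lengths
```

The resulting average length exceeds the source entropy by less than one bit per symbol, which is why the compression advantage of arithmetic coding is often negligible in practice.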
Improved Bounds on the Inefficiency of Length-Restricted Prefix Codes
 Departamento de Informática, PUC-RJ, Rio de
, 1997
Cited by 14 (5 self)
Abstract:
Consider an alphabet $\Sigma = \{a_1, \ldots, a_n\}$ with corresponding symbol probabilities $p_1, \ldots, p_n$. An $L$-restricted prefix code is a prefix code where all the code lengths are not greater than $L$. The value $L$ is a given integer such that $L \ge \lceil \log n \rceil$. Define the average code length difference by $\epsilon = \sum_{i=1}^{n} p_i \bar{l}_i - \sum_{i=1}^{n} p_i l_i$, where $\bar{l}_1, \ldots, \bar{l}_n$ are the code lengths of the optimal $L$-restricted prefix code for $\Sigma$ and $l_1, \ldots, l_n$ are the code lengths of the optimal prefix code for $\Sigma$. Let $\phi$ be the golden ratio $1.618\ldots$ In this paper, we show that $\epsilon < 1/\phi^{\,L - \lceil \log(n + \lceil \log n \rceil - L) \rceil - 1}$ when $L > \lceil \log n \rceil$. We also prove the sharp bound $\epsilon < \lceil \log n \rceil - 1$ when $L = \lceil \log n \rceil$. By showing the lower bound $1/\phi^{\,L - \lceil \log n \rceil + 2 + \lceil \log \frac{n}{n-L} \rceil - 1}$ on the maximum value of $\epsilon$, we guarantee that our bound is asymptotically tight in the range $\lceil \log n \rceil < L \le n/2$. Furthermore, we present an $O(n)$ time and space ...
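The abstract's upper bound appears to be ε < 1/φ^(L − ⌈log(n + ⌈log n⌉ − L)⌉ − 1), with φ the golden ratio; the exact exponent is my reading of the garbled source, so treat it as an assumption. Under that reading, a quick numeric check shows how fast the penalty for restricting code lengths vanishes as L grows:

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, ~1.618

def epsilon_bound(n, L):
    # Upper bound on the average-length penalty of the optimal
    # L-restricted code, as read from the abstract (exponent assumed).
    assert L > math.ceil(math.log2(n))
    expo = L - math.ceil(math.log2(n + math.ceil(math.log2(n)) - L)) - 1
    return 1 / PHI ** expo

# the bound decays roughly geometrically in L
bounds = [epsilon_bound(1000, L) for L in range(12, 21)]
```

For n = 1000 the penalty bound already drops below 0.02 bits per symbol by L = 19, i.e. allowing a handful of bits beyond ⌈log n⌉ makes the restriction nearly free.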
The WARMUP Algorithm: A Lagrangean Construction of Length Restricted Huffman Codes
 Departamento de Informática, PUC-RJ, Rio de
, 1996
Cited by 13 (8 self)
Abstract:
Given an alphabet $\{a_1, \ldots, a_n\}$ with corresponding set of weights $\{w_1, \ldots, w_n\}$, and a number $L \ge \lceil \log n \rceil$, we introduce an $O(n \log n + n \log w)$ algorithm for constructing a suboptimal prefix code with restricted maximal length $L$, where $w$ is the highest presented weight. The number of additional bits per symbol generated by our code is not greater than $1/\phi^{\,L - \lceil \log(n + \lceil \log n \rceil - L) \rceil - 2}$ when $L > \lceil \log n \rceil + 1$, where $\phi$ is the golden ratio $1.618\ldots$ An important feature of the proposed algorithm is its implementation simplicity. The algorithm is basically a selected sequence of Huffman tree constructions for modified weights. Keywords: Prefix codes, Huffman trees, Lagrangean duality ...
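The abstract describes the method only as "a selected sequence of Huffman tree constructions for modified weights", without giving the weight-modification rule. The following is therefore a hypothetical sketch in that spirit, not the published WARMUP algorithm: raise all weights to a growing floor (playing the role of a Lagrangean multiplier) until a plain Huffman tree respects the depth limit.

```python
import heapq, math

def huffman_lengths(weights):
    # standard Huffman code lengths
    heap = [(w, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        wa, sa = heapq.heappop(heap)
        wb, sb = heapq.heappop(heap)
        for i in sa + sb:
            lengths[i] += 1
        heapq.heappush(heap, (wa + wb, sa + sb))
    return lengths

def length_restricted(weights, L):
    # Hypothetical WARMUP-style sketch (NOT the published algorithm):
    # flatten the distribution with a doubling weight floor `lam`
    # until the Huffman tree fits within depth L.
    assert L >= math.ceil(math.log2(len(weights)))
    lam = 1
    while True:
        lengths = huffman_lengths([max(w, lam) for w in weights])
        if max(lengths) <= L:
            return lengths
        lam *= 2
```

Once the floor reaches the largest weight all modified weights are equal, so the loop terminates with a balanced tree of depth ⌈log n⌉; only O(log w) Huffman constructions are needed, matching the flavour of the stated $O(n \log n + n \log w)$ complexity.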
Skeleton Trees for the Efficient Decoding of Huffman Encoded Texts
 Information Retrieval
, 1997
Cited by 10 (4 self)
Abstract:
A new data structure is investigated, which allows fast decoding of texts encoded by canonical Huffman codes. The storage requirements are much lower than for conventional Huffman trees, $O(\log^2 n)$ for trees of depth $O(\log n)$, and decoding is faster, because a part of the bit comparisons necessary for the decoding may be saved. Empirical results on large real-life distributions show a reduction of up to 50% and more in the number of bit operations. The basic idea is then generalized, yielding further savings. This is an extended version of a paper which has been presented at the 8th Annual Symposium on Combinatorial Pattern Matching (CPM'97), and appeared in its proceedings, pp. 65–75.
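For context, here is what decoding against a canonical Huffman code looks like with the usual first-code/offset tables; this is a generic sketch, and the skeleton trees of the paper prune exactly the bit-by-bit loop shown in `decode`:

```python
def build_tables(lengths):
    """Per-length count, first canonical code, symbol-table offset,
    and the canonical symbol permutation."""
    maxlen = max(lengths)
    count = [0] * (maxlen + 1)
    for l in lengths:
        count[l] += 1
    first = [0] * (maxlen + 1)
    offset = [0] * (maxlen + 1)
    code = acc = 0
    for l in range(1, maxlen + 1):
        first[l] = code
        offset[l] = acc
        code = (code + count[l]) << 1
        acc += count[l]
    symbols = sorted(range(len(lengths)), key=lambda s: (lengths[s], s))
    return count, first, offset, symbols

def decode(bits, tables):
    """Decode a string of '0'/'1' characters to a list of symbols."""
    count, first, offset, symbols = tables
    out, code, l = [], 0, 0
    for b in bits:
        code = (code << 1) | int(b)
        l += 1
        # canonical property: an l-bit value below first[l] + count[l]
        # must be a complete codeword of length l
        if l < len(count) and count[l] and code < first[l] + count[l]:
            out.append(symbols[offset[l] + code - first[l]])
            code = l = 0
    return out
```

With lengths (1, 2, 3, 3) the canonical codewords are 0, 10, 110, 111, so the bit string "010110111" decodes to the four symbols in order.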
Bidirectional Huffman Coding
, 1989
Cited by 10 (2 self)
Abstract:
Under what conditions can Huffman codes be efficiently decoded in both directions? The usual decoding procedure works also for backward decoding only if the code has the affix property, i.e., both prefix and suffix properties. Some affix Huffman codes are exhibited, and necessary conditions for the existence of such codes are given. An algorithm is presented which, for a given set of codeword lengths, constructs an affix code, if there exists one. Since for many distributions there is no affix code giving the same compression as the Huffman code, a new algorithm for backward decoding of non-affix Huffman codes is presented, and its worst case complexity is proved to be linear in the length of the encoded text.
1. Introduction
For a given sequence of $n$ weights $w_1, \ldots, w_n$, with $w_i > 0$, Huffman's well-known algorithm [9] constructs an optimum prefix code. We use throughout the term 'code' as abbreviation for 'set of codewords'. In a prefix code no codeword is the prefix of any o...
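The affix property the paper studies is straightforward to test directly on a set of codeword strings; a small illustrative helper (not from the paper):

```python
def prefix_free(codes):
    """True iff no codeword is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def affix(codes):
    # affix = prefix-free AND suffix-free; suffix-freeness is just
    # prefix-freeness of the reversed codewords
    return prefix_free(codes) and prefix_free([c[::-1] for c in codes])
```

For example, the fixed-length code {00, 01, 10, 11} is affix, while the Huffman code {0, 10, 11} is prefix-free but not affix (0 is a suffix of 10), so the usual decoding procedure cannot run backwards on it.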
Efficient Implementation of the WARMUP Algorithm for the Construction of Length-Restricted Prefix Codes
 in Proceedings of the ALENEX
, 1999
Cited by 5 (0 self)
Abstract:
Given an alphabet $\Sigma = \{a_1, \ldots, a_n\}$ with a corresponding list of positive weights $\{w_1, \ldots, w_n\}$ and a length restriction $L$, the length-restricted prefix code problem is to find a prefix code that minimizes $\sum_{i=1}^{n} w_i l_i$, where $l_i$, the length of the codeword assigned to $a_i$, cannot be greater than $L$, for $i = 1, \ldots, n$. In this paper, we present an efficient implementation of the WARMUP algorithm, an approximative method for this problem. The worst-case time complexity of WARMUP is $O(n \log n + n \log w_n)$, where $w_n$ is the greatest weight. However, some experiments with a previous implementation of WARMUP show that it runs in linear time for several practical cases, if the input weights are already sorted. In addition, it often produces optimal codes. The proposed implementation combines two new enhancements to reduce the space usage of WARMUP and to improve its execution time. As a result, it is about ten times faster than the previous implementat...
Practical Use of The Warmup Algorithm on Length-Restricted Coding
 the Proceedings of the Fourth Latin American Workshop on String Processing
, 1997
Cited by 4 (4 self)
Abstract:
In this paper we present an efficient implementation of the WARMUP algorithm for the construction of length-restricted prefix codes. This algorithm has $O(n \log n + n \log w_n)$ worst case time complexity, where $n$ is the number of symbols of the source alphabet and $w_n$ is the largest weight of the alphabet. An important feature of the proposed algorithm is its implementation simplicity. The algorithm is basically a selected sequence of Huffman tree constructions for modified weights. The proposed implementation has the same time complexity, but requires only additional $O(1)$ space. We also report some empirical experiments showing that this algorithm provides good compression and speed performances.
1 Introduction
An important problem in the field of Coding and Information Theory is the Binary Prefix Code Problem. Given an alphabet $\Sigma = \{a_1, \ldots, a_n\}$ and a corresponding set of positive weights $\{w_1, \ldots, w_n\}$, the problem is to find a prefix code for $\Sigma$ that mi...
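When comparing implementations like these, the two quantities worth checking for any produced length assignment are its weighted cost Σ wᵢlᵢ and Kraft feasibility; a tiny generic evaluation helper (not from the paper):

```python
from fractions import Fraction

def code_cost(weights, lengths):
    """Weighted cost sum(w_i * l_i) of an assignment of code lengths."""
    return sum(w * l for w, l in zip(weights, lengths))

def kraft_feasible(lengths, L=None):
    """True iff a prefix code with these lengths exists (Kraft inequality),
    optionally also enforcing a length restriction L."""
    if L is not None and max(lengths) > L:
        return False
    return sum(Fraction(1, 2 ** l) for l in lengths) <= 1
```

Exact rational arithmetic avoids the floating-point rounding that can misclassify a code whose Kraft sum is exactly 1.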
Dynamic Length-Restricted Coding
, 2003
Cited by 4 (3 self)
Abstract:
Suppose that $S$ is a string of length $m$ drawn from an alphabet of $n$ characters, $d$ of which occur in $S$. Let $P$ be the relative frequency distribution of characters in $S$. We present a new algorithm for dynamic coding that uses at most $\lceil \lg n \rceil + 1$ bits to encode each character in $S$ ...