## Huffman coding with unequal letter costs (Extended Abstract) (2002)

Venue: | IN: PROCEEDINGS OF THE THIRTY-FOURTH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING, ACM |

Citations: | 13 - 4 self |

### BibTeX

@INPROCEEDINGS{Golin02huffmancoding,

author = {Mordecai J. Golin and Claire Kenyon and Neal E. Young},

title = {Huffman coding with unequal letter costs (Extended Abstract)},

booktitle = {IN: PROCEEDINGS OF THE THIRTY-FOURTH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING, ACM},

year = {2002},

pages = {785--791},

publisher = {Press}

}

### Abstract

In the standard Huffman coding problem, one is given a set of words and, for each word, a positive frequency. The goal is to encode each word w as a codeword c(w) over a given alphabet. The encoding must be prefix-free (no codeword is a prefix of any other) and should minimize the weighted average codeword size $\sum_{w} \mathrm{freq}(w)\,|c(w)|$. The problem has a well-known polynomial-time algorithm due to Huffman [15]. Here we consider the generalization in which the letters of the encoding alphabet may have non-uniform lengths. The goal is to minimize the weighted average codeword length $\sum_{w} \mathrm{freq}(w)\,\mathrm{cost}(c(w))$, where cost(s) is the sum of the (possibly non-uniform) lengths of the letters in s. Despite much previous work, the problem is not known to be NP-hard, nor was it previously known to have a polynomial-time approximation algorithm. Here we describe a polynomial-time approximation scheme (PTAS) for the problem.
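To make the objective in the abstract concrete, the following minimal Python sketch evaluates the weighted cost of a given code and checks that it is prefix-free. It is an illustration only, not the paper's PTAS: the function names, the word labels `w1`..`w4`, and the particular codes are assumptions chosen for the example, and the second code is not claimed to be optimal. The letter costs (an "a" of length 1 and a "b" of length 3) follow the example alphabet mentioned in the citation contexts below.

```python
from itertools import combinations


def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of a different codeword."""
    return not any(
        a.startswith(b) or b.startswith(a)
        for a, b in combinations(codewords, 2)
    )


def codeword_cost(codeword, letter_cost):
    """cost(s): sum of the (possibly non-uniform) lengths of the letters in s."""
    return sum(letter_cost[ch] for ch in codeword)


def weighted_cost(freq, code, letter_cost):
    """Objective: sum over words w of freq(w) * cost(c(w))."""
    return sum(freq[w] * codeword_cost(code[w], letter_cost) for w in freq)


# Hypothetical instance: four words with frequencies 2, 2, 1, 1.
freq = {"w1": 2, "w2": 2, "w3": 1, "w4": 1}

# Standard Huffman setting: both letters have length 1, so cost(s) = |s|.
uniform_cost = {"0": 1, "1": 1}
binary_code = {"w1": "00", "w2": "01", "w3": "10", "w4": "11"}

# Unequal letter costs: "a" has length 1 and "b" has length 3.
unequal_cost = {"a": 1, "b": 3}
ab_code = {"w1": "aaa", "w2": "b", "w3": "ab", "w4": "aab"}

uniform_total = weighted_cost(freq, binary_code, uniform_cost)
unequal_total = weighted_cost(freq, ab_code, unequal_cost)
```

With uniform letter costs the objective reduces to the usual weighted codeword length; with unequal costs a short codeword such as `b` can cost as much as the longer `aaa`, which is why the standard Huffman construction does not carry over directly.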

### Citations

941 |
A method for the construction of minimum-redundancy codes
- Huffman
- 1952
Citation Context ... be prefix free (no codeword is a prefix of any other) and should minimize the weighted average codeword size $\sum_{w} \mathrm{freq}(w)\,|c(w)|$. The problem has a well-known polynomial-time algorithm due to Huffman [15]. Here we consider the generalization in which the letters of the encoding alphabet may have non-uniform lengths. The goal is to minimize the weighted average codeword length $\sum_{w} \mathrm{freq}(w)\,\mathrm{cost}(c(w))$, w... |

128 | The Art of Computer Programming, Volume III: Sorting and Searching - Knuth - 1973 |

55 | Code and parse trees for lossless source encoding - Abrahams - 1997 |

41 |
Optimal alphabetic trees
- Itai
- 1976
Citation Context ... upon the outcome of the test [20, 6.2.2, ex. 33] and has also been studied under the names dichotomous search [14] or the leaky shower problem [18]. Alphabetic coding has a polynomial-time algorithm [17]. 2. NOTATIONS AND DEFINITIONS A problem instance is specified by a set W of n words with associated frequencies p1 ≥ p2 ≥ ⋯ ≥ pn > 0, an alphabet Σ of r ≥ 2 letters with associated costs ℓ1 ≤ ℓ2 ... |

39 |
Minimum-redundancy coding for the discrete noiseless channel
- Karp
- 1961
Citation Context ...b} in which the length of an “a” is 1 and the length of a “b” is 3. This generalization is motivated by coding problems in which different characters have different transmission times or storage costs [5; 22; 19; 28; 29]. One example is the telegraph channel [10; 11] in which Σ = {·, −} and ℓ2 = 2ℓ1, i.e., in which dashes are twice as long as dots. Another is the (a, b) run-length-limited codes used in magnetic and o... |

36 | A dynamic programming algorithm for constructing optimal prefix-free codes for unequal costs
- Golin, Rote
- 1998
Citation Context ...s the telegraph channel [10; 11] in which Σ = {·, −} and ℓ2 = 2ℓ1, i.e., in which dashes are twice as long as dots. Another is the (a, b) run-length-limited codes used in magnetic and optical storage [16; 12], in which the codewords are binary and constrained so that each 1 must be preceded by at least a, and at most b, 0’s. (This example can be modeled by the problem studied here by using an encoding alp... |

24 |
Channels which transmit letters of unequal duration
- Krause
Citation Context ... solution (assuming the letter costs are integers); Karp’s algorithm transforms the problem into an integer program and does not run in polynomial time [19]. Karp’s result was followed by many others [21; 9; 8; 23; 3] presenting solutions of cost at most OPT + f(ℓ1, ℓ2, ..., ℓr) where OPT is the cost of the optimal code and f(ℓ1, ℓ2, ..., ℓr) is some fixed function of the edge costs, with the different algor... |

24 |
Optimal variable length codes (arbitrary symbol cost and equal code word probability)
- Varn
- 1971
Citation Context ...b} in which the length of an “a” is 1 and the length of a “b” is 3. This generalization is motivated by coding problems in which different characters have different transmission times or storage costs [5; 22; 19; 28; 29]. One example is the telegraph channel [10; 11] in which Σ = {·, −} and ℓ2 = 2ℓ1, i.e., in which dashes are twice as long as dots. Another is the (a, b) run-length-limited codes used in magnetic and o... |

20 |
Optimum lopsided binary trees
- Kapoor, Reingold
- 1989
Citation Context ... procedures in which the time required by a test depends upon the outcome of the test [20, 6.2.2, ex. 33] and has also been studied under the names dichotomous search [14] or the leaky shower problem [18]. Alphabetic coding has a polynomial-time algorithm [17]. 2. NOTATIONS AND DEFINITIONS A problem instance is specified by a set W of n words with associated frequencies p1 ≥ p2 ≥ ⋯ ≥ pn > 0, an al... |

19 | Codes for Mass Data Storage Systems, Shannon Foundation - Immink - 1999 |

18 |
Coding with digits of unequal costs
- Gilbert
- 1995
Citation Context ...b” is 3. This generalization is motivated by coding problems in which different characters have different transmission times or storage costs [5; 22; 19; 28; 29]. One example is the telegraph channel [10; 11] in which Σ = {·, −} and ℓ2 = 2ℓ1, i.e., in which dashes are twice as long as dots. Another is the (a, b) run-length-limited codes used in magnetic and optical storage [16; 12], in which the codewords... |

17 |
Simple Proofs of Some Theorems on Noiseless Channels
- Csiszar
Citation Context ... solution (assuming the letter costs are integers); Karp’s algorithm transforms the problem into an integer program and does not run in polynomial time [19]. Karp’s result was followed by many others [21; 9; 8; 23; 3] presenting solutions of cost at most OPT + f(ℓ1, ℓ2, ..., ℓr) where OPT is the cost of the optimal code and f(ℓ1, ℓ2, ..., ℓr) is some fixed function of the edge costs, with the different algor... |

17 |
Efficient generation of optimal prefix code: Equiprobable words using unequal cost letters
- Perl, Garey, et al.
- 1975
Citation Context ...tains many algorithms for the generalized problem. The special case when all the probabilities are equal (but not the letter lengths), known as the Varn coding problem, is solvable in polynomial time [29; 1; 7; 25; 13; 6]. For the generalized problem, Blachman [5], Marcus [22], and (much later) Gilbert [11] give heuristic constructions. Karp gave the first algorithm yielding an exact solution (assuming the letter cost... |

14 | Optimal prefix-free codes for unequal letter costs: Dynamic programming with the Monge property
- Bradford, Golin, et al.
- 2002
Citation Context ...etter costs. Golin and Rote [12] gave a dynamic programming algorithm that produces exact solutions in $O(n^{\ell_r+2})$ time for the special case when the ℓi are restricted to be integers. Bradford et al. [24] improved this ... Figure 1: Two minimum-cost codes for the frequencies (p1, p2, p3, p4) = (2, 2, 1, 1) but under different alp... |

12 |
Tree structures for optimal searching
- Stanfel
- 1970
Citation Context ...b} in which the length of an “a” is 1 and the length of a “b” is 3. This generalization is motivated by coding problems in which different characters have different transmission times or storage costs [5; 22; 19; 28; 29]. One example is the telegraph channel [10; 11] in which Σ = {·, −} and ℓ2 = 2ℓ1, i.e., in which dashes are twice as long as dots. Another is the (a, b) run-length-limited codes used in magnetic and o... |

12 | Optimum 1-ended binary prefix codes - Berger, Yeung - 1990 |

11 | Prefix codes: Equiprobable words, unequal letter costs
- Golin, Young
- 1996
Citation Context ...tains many algorithms for the generalized problem. The special case when all the probabilities are equal (but not the letter lengths), known as the Varn coding problem, is solvable in polynomial time [29; 1; 7; 25; 13; 6]. For the generalized problem, Blachman [5], Marcus [22], and (much later) Gilbert [11] give heuristic constructions. Karp gave the first algorithm yielding an exact solution (assuming the letter cost... |

10 |
An algorithm for optimal prefix parsing of a noiseless and memoryless channel
- Lempel, Even, et al.
- 1973
Citation Context ...tains many algorithms for the generalized problem. The special case when all the probabilities are equal (but not the letter lengths), known as the Varn coding problem, is solvable in polynomial time [29; 1; 7; 25; 13; 6]. For the generalized problem, Blachman [5], Marcus [22], and (much later) Gilbert [11] give heuristic constructions. Karp gave the first algorithm yielding an exact solution (assuming the letter cost... |

10 |
An efficient algorithm for constructing nearly optimal prefix codes
- Mehlhorn
- 1980
Citation Context ... solution (assuming the letter costs are integers); Karp’s algorithm transforms the problem into an integer program and does not run in polynomial time [19]. Karp’s result was followed by many others [21; 9; 8; 23; 3] presenting solutions of cost at most OPT + f(ℓ1, ℓ2, ..., ℓr) where OPT is the cost of the optimal code and f(ℓ1, ℓ2, ..., ℓr) is some fixed function of the edge costs, with the different algor... |

9 |
Binary prefix codes ending in a “1”
- Capocelli, Santis, et al.
- 1994
Citation Context ...words belong to a given regular language L. As one example, the binary codes, constrained so all codewords must end in a 1, are used for group testing and the construction of self-synchronizing codes [4; 26]. As another example, binary codes whose codewords contain at most a specified number of 1’s are used for energy minimization of transmissions in mobile environments [27]. Algorithms (other than exhau... |

8 | Complexity of the variable-length encoding problem - Cot - 1975 |

8 |
The sound of silence: Guessing games for saving energy in mobile environment
- Korach, Dolev, et al.
- 1999
Citation Context ...f self-synchronizing codes [4; 26]. As another example, binary codes whose codewords contain at most a specified number of 1’s are used for energy minimization of transmissions in mobile environments [27]. Algorithms (other than exhaustive search) for the regular-language prefix-coding problem generalize [12] and run in time $n^{\Theta(S(L))}$ where S(L) is the number of states in the smallest deterministic fin... |

7 | Discrete Noiseless Coding - Marcus - 1957 |

6 | Minimum cost coding of information - Blachman - 1954 |

6 | Lopsided trees: Algorithms, analyses and applications - Choi, Golin - 1996 |

5 | Codes: Unequal probabilities, unequal letter costs - Altenkamp, Mehlhorn - 1980 |

5 |
On dichotomous search with direction-dependent costs for a uniformly hidden object
- Hinderer
- 1990
Citation Context ...oblem arises in designing testing procedures in which the time required by a test depends upon the outcome of the test [20, 6.2.2, ex. 33] and has also been studied under the names dichotomous search [14] or the leaky shower problem [18]. Alphabetic coding has a polynomial-time algorithm [17]. 2. NOTATIONS AND DEFINITIONS A problem instance is specified by a set W of n words with associated frequencie... |

4 |
How good is Morse code
- Gilbert
- 1969
Citation Context ...b” is 3. This generalization is motivated by coding problems in which different characters have different transmission times or storage costs [5; 22; 19; 28; 29]. One example is the telegraph channel [10; 11] in which Σ = {·, −} and ℓ2 = 2ℓ1, i.e., in which dashes are twice as long as dots. Another is the (a, b) run-length-limited codes used in magnetic and optical storage [16; 12], in which the codewords... |

3 | Characterization and Design of Optimal Prefix Codes - Cot - 1977 |

1 | Optimum 1-ended binary prefix codes - Berger, Yeung - 1990 |

1 | Prefix codes: Equiprobable words, unequal letter costs - Golin, Young - 1996 |

1 | Binary prefix codes ending in a 1 - Capocelli, Persiano - 1994 |

1 | How good is morse code. Inform Control - Gilbert - 1969 |