## Bounding the Depth of Search Trees (1993)

Venue: The Computer Journal

Citations: 16 (5 self)

### BibTeX

@ARTICLE{Fraenkel93boundingthe,

author = {Aviezri S. Fraenkel and Shmuel T. Klein},

title = {Bounding the Depth of Search Trees},

journal = {The Computer Journal},

year = {1993},

volume = {36},

pages = {668--678}

}

### Abstract

For an ordered sequence of n weights, Huffman's algorithm constructs in time and space O(n) a search tree with minimum average path length, or, which is equivalent, a minimum redundancy code. However, if an upper bound B is imposed on the length of the codewords, the best known algorithms for the construction of an optimal code have time and space complexities $O(Bn^2)$. A new algorithm is presented, which yields sub-optimal codes, but in time O(n log n) and space O(n). Under certain conditions, these codes are shown to be close to optimal, and extensive experiments suggest that in many practical applications, the deviation from the optimum is negligible.

1. Motivation and Introduction

We consider the set B(n; b) of extended binary trees with n leaves, labelled 1 to n, and with depth b, henceforth called b-restricted trees. An extended binary tree is a binary tree in which every internal node has two sons (here, and in what follows, we use the terminology of Knuth [16, pp. 399--...
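To make the unbounded case concrete, here is a minimal Python sketch of the classical Huffman construction that returns only the codeword lengths (leaf depths). It is not the algorithm of the paper, and it is not the O(n) variant for presorted weights: it uses a heap and list concatenation for brevity. The function name `huffman_lengths`, the sample weights, and the bound `B = 4` are assumptions chosen for illustration.

```python
import heapq
from itertools import count


def huffman_lengths(weights):
    """Return the codeword length (leaf depth) of each weight in a Huffman tree.

    Simple heap-based sketch; not the linear-time variant for sorted input.
    """
    n = len(weights)
    if n == 1:
        return [0]  # a single leaf is the root itself
    tie = count()  # tie-breaker so the heap never has to compare lists
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * n
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)
        w2, _, leaves2 = heapq.heappop(heap)
        # Every leaf below the new internal node moves one level deeper.
        for i in leaves1 + leaves2:
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), leaves1 + leaves2))
    return lengths


if __name__ == "__main__":
    weights = [1, 1, 2, 3, 5, 8, 13, 21]  # Fibonacci-like weights: skewed tree
    B = 4  # hypothetical bound on the codeword length
    lengths = huffman_lengths(weights)
    print(lengths, "violates bound" if max(lengths) > B else "within bound")
```

With Fibonacci-like weights the Huffman tree degenerates to depth n - 1, which is the kind of input for which a length bound B actually changes the resulting code.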

### Citations

1087 |
A method for the construction of minimum redundancy codes
- Huffman
- 1952
Citation Context ...han B comparisons. The approach is recommended by Gilbert [8] for the case of inaccurately known probabilities $w_i$: if some of the $w_i$ are significantly underestimated, Huffman's well-known procedure [13] would assign long codewords to the corresponding elements and the code thus obtained may be fairly inefficient. Another possible application of bounding the depth of a tree is to reduce the external ... |

118 |
Optimum Binary Search Trees
- Knuth
- 1971
Citation Context ...time and space complexity. Garey's algorithm is based on a procedure proposed by Gilbert & Moore [9] for alphabetical encodings, using time $O(n^3)$. The latter procedure was improved by Knuth [15] to $O(n^2)$ in an application to optimum binary search trees, for which records can be stored also in internal nodes, but with no restriction on the depth of the tree. Garey shows how to extend Knuth'... |

115 |
Dynamic Huffman coding
- Knuth
- 1985
Citation Context ...e and almost as efficient fixed length code. We have therefore decided to test the compression efficiency of the new method empirically on various "real-life" weight distributions, similarly to Knuth [17], who checked his dynamic Huffman coding algorithm on, e.g., a file of Grimm's Fairy Tales. For any given set of n weights, the Huffman tree was built, with depth K. Using then Garey's algorithm, the ... |

85 |
The Art of Computer Programming, Vol I, Fundamental Algorithms
- Knuth
- 1968
Citation Context ... path length $L = \sum_{i=1}^{n} l_i$, a quantity which appears in the complexity function of many algorithms. In the worst case, L is $O(n^2)$ and on the average (with all trees equally likely) $O(n\sqrt{n})$ (see [16]), but imposing a bound B = O(log n) on the depth reduces L to be O(n log n). In [3] this approach is suggested to improve the space requirements of a method which allows efficient decoding of Huffman... |
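As a worked illustration of the worst-case claim in this context (a sketch added here, not taken from the paper): in a degenerate extended binary tree with $n \ge 2$ leaves, the leaf depths are $1, 2, \ldots, n-2, n-1, n-1$, so the total path length $L = \sum_{i=1}^{n} l_i$ is quadratic in $n$, whereas a depth bound $B$ caps every term at $B$.

$$
L_{\text{degenerate}} = \sum_{k=1}^{n-1} k + (n-1) = \frac{(n-1)(n+2)}{2} = \Theta(n^2),
\qquad
L_{\text{bounded}} = \sum_{i=1}^{n} l_i \le nB = O(n \log n) \ \text{ when } B = O(\log n).
$$

For example, $n = 3$ gives depths $1, 2, 2$ and $L = 5 = (2 \cdot 5)/2$.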

56 |
On the construction of Huffman trees
- van Leeuwen
- 1976
Citation Context ...nipulations. When there is no bound imposed, or equivalently, when $B \ge n - 1$, our problem is solved by Huffman's algorithm, which can be implemented in time O(n log n) (see for example Van Leeuwen [21]) and space O(n). In fact, the dominating part of the time complexity is sorting the weights $w_i$, requiring time $\Omega(n \log n)$. If the weights are already given in order, the algorithm can be ... |

51 |
Variable-Length Binary Encodings
- Gilbert, Moore
- 1959
Citation Context ...ime and space. A completely different dynamic programming solution is given by Garey [7] with $O(Bn^2)$ time and space complexity. Garey's algorithm is based on a procedure proposed by Gilbert & Moore [9] for alphabetical encodings, using time $O(n^3)$. The latter procedure was improved by Knuth [15] to $O(n^2)$ in an application to optimum binary search trees, for which records can be stored al... |

45 |
Two Inequalities Implied by Unique Decipherability
- McMillan
- 1956
Citation Context ...$w_n$, and a bound $B \ge \lceil \log_2 n \rceil$; the problem is to find a sequence of integers $l_i$ which minimizes $\sum_{i=1}^{n} w_i l_i$ subject to the constraints $l_i \le B$ and $\sum_{i=1}^{n} 2^{-l_i} = 1$ (1). McMillan [18] has shown that the lengths $l_i$ of the binary codewords of any uniquely decipherable (UD) code C must satisfy $\sum 2^{-l_i} \le 1$; the equality (1) is a sufficient condition for the completeness of the co... |
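The constraints quoted in this context (every length at most B, and the codeword lengths summing to exactly 1 under $\sum 2^{-l_i}$) are easy to check mechanically. The following Python sketch does only that; it does not search for an optimal or near-optimal length vector. The helper names `kraft_sum`, `is_feasible`, and `cost`, and the sample weights and lengths, are assumptions made for illustration.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of sum(2**-l) over the codeword lengths."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


def is_feasible(lengths, B):
    """True if all lengths are <= B and the Kraft sum equals 1 (complete code)."""
    return all(l <= B for l in lengths) and kraft_sum(lengths) == 1


def cost(weights, lengths):
    """Weighted path length: sum of w_i * l_i."""
    return sum(w * l for w, l in zip(weights, lengths))


if __name__ == "__main__":
    weights = [1, 1, 2, 3, 5, 8, 13, 21]
    n = len(weights)
    B = 4  # hypothetical bound
    # ceil(log2 n) computed with integers; a smaller B admits no code at all.
    assert B >= (n - 1).bit_length()
    unbounded = [7, 7, 6, 5, 4, 3, 2, 1]  # Huffman lengths for these weights
    candidate = [4, 4, 4, 4, 3, 3, 2, 2]  # one feasible vector, not claimed optimal
    for lengths in (unbounded, candidate):
        print(lengths, is_feasible(lengths, B), cost(weights, lengths))
```

Exact fractions are used instead of floats so that the equality test against 1 cannot fail through rounding.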

37 |
Generating a canonical prefix encoding
- Schwartz, Kallick
- 1964
Citation Context ...$l_n$). Step 1b: Construct an extended binary tree in which the leaves are, in order from left to right, on levels $l_1, \ldots, l_n$. An algorithm for Step 1b can be found in Schwartz & Kallick [20]. Alternatively, the tree can be generated in linear time by the procedure BUILD, which will be useful later. BUILD passes sequentially over the vector of lengths $l_i$ and simulates a depth first trave... |
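One standard way to place leaves left to right on prescribed levels is canonical codeword assignment. The Python sketch below illustrates that idea only; it is not the BUILD procedure of the paper, whose details are truncated in this excerpt. It assumes the lengths are positive and non-decreasing (the non-increasing case is symmetric); the function name `canonical_codewords` is hypothetical.

```python
def canonical_codewords(lengths):
    """Assign prefix-free binary codewords to positive, non-decreasing lengths.

    The i-th codeword is the root-to-leaf path (0 = left, 1 = right) of the
    i-th leaf from the left. Raises ValueError if the lengths are not sorted
    or do not fit in a binary tree (Kraft sum greater than 1).
    """
    if not lengths:
        return []
    if lengths[0] < 1 or any(a > b for a, b in zip(lengths, lengths[1:])):
        raise ValueError("lengths must be positive and non-decreasing")
    codewords = []
    code = 0
    prev = lengths[0]
    for i, l in enumerate(lengths):
        if i > 0:
            # Step to the next free node, then descend to the new level.
            code = (code + 1) << (l - prev)
        if code >= 1 << l:
            raise ValueError("lengths do not fit in a binary tree")
        codewords.append(format(code, "b").zfill(l))
        prev = l
    return codewords


if __name__ == "__main__":
    for length, word in zip([2, 2, 3, 3, 4, 4, 4, 4],
                            canonical_codewords([2, 2, 3, 3, 4, 4, 4, 4])):
        print(length, word)
```

The pass is linear in the number of leaves and never builds explicit tree nodes; the tree is implicit in the codewords.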

31 |
Optimal Binary Search Trees with Restricted Maximal Depth
- Garey
- 1974
Citation Context ...dynamic programming he solves the problem in $O\big((B - \log_2 n)\, n^2\big)$; this bound applies for both time and space. A completely different dynamic programming solution is given by Garey [7] with $O(Bn^2)$ time and space complexity. Garey's algorithm is based on a procedure proposed by Gilbert & Moore [9] for alphabetical encodings, using time $O(n^3)$. The latter procedure was imp... |

25 |
Codes Based on Inaccurate Source Probabilities
- Gilbert
- 1971
Citation Context ...e; $w_i$ is the probability of record i being requested, and the problem is to minimize the average search time such that no search takes more than B comparisons. The approach is recommended by Gilbert [8] for the case of inaccurately known probabilities $w_i$: if some of the $w_i$ are significantly underestimated, Huffman's well-known procedure [13] would assign long codewords to the corresponding element... |

25 |
Huffman codes and self-information
- Katona, Nemetz |

23 |
Cryptanalysis: A study of ciphers and their solution
- Gaines
- 1956
Citation Context ...tion using the database of the Responsa Retrieval Project (see for example Fraenkel [4]) of about 40 million Hebrew and Aramaic words; the distribution for Italian (26 letters) can be found in Gaines [6], and for Russian (32 letters) in Herdan [11]. The results for this first set are summarized in Table 1. Statistics 5 6 7 8 9 10 11 12 13 14 4.1852 opt 0.01 0.30 opt opt English 10 opt 0.01 o... |

22 |
Path Length of Binary Search Trees
- Hu, Tan
- 1972
Citation Context ...th bounded depth. The solution proposed by Gilbert [8] is an exhaustive search through all the possible trees in B(n; B), which is not feasible for even moderately large values of n and B. Hu and Tan [12] provide a nonenumerative algorithm, in which, however, both time and space complexities grow exponentially with the bound B. A similar idea is used by Van Voorhis [22], but using dynamic programming ... |

17 |
Novel compression of sparse bit-strings, preliminary report
- Fraenkel, Klein
Citation Context ...es correspond different codewords, and every run of 0-bytes is encoded by the codewords of the corresponding basis elements. The set of basis elements is a parameter; various choices are suggested in [5], where this method is described in more detail. Statistics 6 7 8 9 10 11 12 13 14 15 2.6757 14.1 opt 0.60 0.38 0.04 0.06 0.01 0.00 opt 0.00 PLI+ 16 opt opt opt 0.08 0.04 0.02 0.01 0.00 opt 0... |

17 |
The Advanced Theory of Language as Choice and Chance
- Herdan
- 1966
Citation Context ...ieval Project (see for example Fraenkel [4]) of about 40 million Hebrew and Aramaic words; the distribution for Italian (26 letters) can be found in Gaines [6], and for Russian (32 letters) in Herdan [11]. The results for this first set are summarized in Table 1. Statistics 5 6 7 8 9 10 11 12 13 14 4.1852 opt 0.01 0.30 opt opt English 10 opt 0.01 opt opt opt 4.0449 opt 0.96 1.69 0.70 0.45 0.1... |

15 |
Efficient variants of Huffman codes in high level languages
- Choueka, Klein, et al.
- 1985
Citation Context ... many algorithms. In the worst case, L is $O(n^2)$ and on the average (with all trees equally likely) $O(n\sqrt{n})$ (see [16]), but imposing a bound B = O(log n) on the depth reduces L to be O(n log n). In [3] this approach is suggested to improve the space requirements of a method which allows efficient decoding of Huffman codes without bit-manipulations. When there is no bound imposed, or equivalently, w... |

15 |
All about the Responsa retrieval project – what you always wanted to know but were afraid to ask
- Fraenkel
- 1976
Citation Context ...auer & Goos [1]; for Hebrew (30 letters including two kinds of apostrophes and blank), we have computed the distribution using the database of the Responsa Retrieval Project (see for example Fraenkel [4]) of about 40 million Hebrew and Aramaic words; the distribution for Italian (26 letters) can be found in Gaines [6], and for Russian (32 letters) in Herdan [11]. The results for this first set are su... |

10 |
Le Vocabulaire de Jean Giraudoux: Structure et Évolution, Genève: Slatkine
- Brunet
- 1978
Citation Context ...uages. The distribution of the 26 letters of English is in Heaps [10]; the distribution of the 29 letters of Finnish is from Pesonen [19]; the distribution for French (including blank) is from Brunet [2]; for German, the distribution of 30 letters (including blank and Umlaute) is given in Bauer & Goos [1]; for Hebrew (30 letters including two kinds of apostrophes and blank), we have computed the dist... |

10 |
Constructing codes with bounded codeword lengths
- Van Voorhis
- 1974
Citation Context ...e values of n and B. Hu and Tan [12] provide a nonenumerative algorithm, in which, however, both time and space complexities grow exponentially with the bound B. A similar idea is used by Van Voorhis [22], but using dynamic programming he solves the problem in $O\big((B - \log_2 n)\, n^2\big)$; this bound applies for both time and space. A completely different dynamic programming solution is giv... |

2 |
Informatik: Eine einführende Übersicht, Erster Teil
- Bauer, Goos, et al.
- 1973
Citation Context ...rs of Finnish is from Pesonen [19]; the distribution for French (including blank) is from Brunet [2]; for German, the distribution of 30 letters (including blank and Umlaute) is given in Bauer & Goos [1]; for Hebrew (30 letters including two kinds of apostrophes and blank), we have computed the distribution using the database of the Responsa Retrieval Project (see for example Fraenkel [4]) of about 4... |

2 |
Word inflexions and their letter and syllable structure in Finnish newspaper text, Research Rep
- Pesonen
Citation Context ...ibutions of the characters of the alphabet for various natural languages. The distribution of the 26 letters of English is in Heaps [10]; the distribution of the 29 letters of Finnish is from Pesonen [19]; the distribution for French (including blank) is from Brunet [2]; for German, the distribution of 30 letters (including blank and Umlaute) is given in Bauer & Goos [1]; for Hebrew (30 letters includ... |