## An Experimental Study of Compression Methods for Functional Tries (1999)

Venue: | Workshop on Algorithmic Aspects of Advanced Programming Languages (WAAAPL'99) |

Citations: | 6 - 0 self |

### BibTeX

@INPROCEEDINGS{Iivonen99anexperimental,

author = {Jukka-Pekka Iivonen and Stefan Nilsson and Matti Tikkanen},

title = {An Experimental Study of Compression Methods for Functional Tries},

booktitle = {Workshop on Algorithmic Aspects of Advanced Programming Languages (WAAAPL'99)},

year = {1999}

}

### Abstract

We develop compression methods for functional tries and study them experimentally. Trie compression is usually implemented either as a combination of path compression and width compression or as a combination of path compression and level compression. We develop a new efficient implementation for width compression and show that, in functional tries, the combination of all three compression methods yields the best results when taking into account the trie size, the trie depth, the copy cost, and the search and update performance. We also show that the 2-3 tree minimizes the copy cost in balanced trees, and we compare our experimental results for tries to approximate analytical results for internal and external 2-3 trees. Our conclusion is that the path-, width-, and level-compressed trie is an ideal choice for a functional main-memory index structure.

Keywords: functional data structures, imperative data structures, trie, shadowing, path copying, path compression, width compression, level compression
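The copy cost mentioned in the abstract comes from path copying: an update to a functional trie rebuilds only the nodes on the search path and shares every other subtree with the previous version. The following is a minimal sketch of that idea, not the authors' implementation (which, according to the citation contexts below, is written in C on top of Shades). It assumes an uncompressed binary trie over distinct, equal-length bit strings; the names `Leaf`, `Node`, `insert`, and `member` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional, Union


class Leaf(NamedTuple):
    key: str  # assumption: a bit string such as "0010"; all keys have equal length


class Node(NamedTuple):
    zero: Optional["Trie"]  # subtree for keys whose next bit is 0
    one: Optional["Trie"]   # subtree for keys whose next bit is 1


Trie = Union[Leaf, Node]


def insert(t: Optional[Trie], key: str, depth: int = 0) -> Trie:
    """Return a new trie that contains key; the input trie is never modified."""
    if t is None:
        return Leaf(key)
    if isinstance(t, Leaf):
        if t.key == key:
            return t
        # Push the existing leaf one level down, then continue as for a node.
        t = Node(t, None) if t.key[depth] == "0" else Node(None, t)
    # Path copying: build a fresh node, replace only the child on the
    # search path, and reuse the other child unchanged.
    if key[depth] == "0":
        return Node(insert(t.zero, key, depth + 1), t.one)
    return Node(t.zero, insert(t.one, key, depth + 1))


def member(t: Optional[Trie], key: str) -> bool:
    """Follow the bits of key from the root down to a leaf or an empty subtree."""
    depth = 0
    while isinstance(t, Node):
        t = t.zero if key[depth] == "0" else t.one
        depth += 1
    return t is not None and t.key == key


v1 = insert(insert(None, "0010"), "0111")
v2 = insert(v1, "1100")        # copies only the root of v1
print(member(v1, "1100"))      # the old version is unaffected
print(v2.zero is v1.zero)      # the untouched subtree is shared
```

In this sketch two keys with a long common prefix produce a chain of single-child nodes, each of which must be copied on update. That chain is what path compression removes, which is one reason compression matters for the copy cost in the functional setting.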

### Citations

474 | Literate Programming
- Knuth
- 1984
Citation Context ... strings from an alphabet containing m characters is stored in an m-ary tree and each string corresponds to a unique path. There are several methods for implementing trie structures in the literature [4, 5, 7, 16, 19]. One of the drawbacks of these methods is that they need considerably more memory than a balanced search tree due to empty subtrees. We show how to avoid this problem by choosing a small alphabet and...

280 | Trie memory
- Fredkin
- 1960
Citation Context ...ry cell holding the copy is returned. In recursive structures such as trees an update leads to shadowing the entire search path. This is also known as path copying [26]. In its original form the trie [11, 12] is a data structure where a set of strings from an alphabet containing m characters is stored in an m-ary tree and each string corresponds to a unique path. There are several methods for implementing...

259 | Sorting and searching, volume 3 of The Art of Computer Programming
- Knuth
- 1973
Citation Context ...s. We show how to avoid this problem by choosing a small alphabet and applying trie compression. The average-case behavior of trie structures has been the subject of thorough theoretical analysis [10, 17, 24, 25]; an extensive list of references can be found in Handbook of Theoretical Computer Science [18]. The expected average depth of a trie containing n independent random strings from a distribution with d...

243 | Purely Functional Data Structures
- Okasaki
- 1998
Citation Context ...f a functional binary trie and uses the trie for introducing sets and maps into a functional set-oriented programming language. He does not apply any compression methods to the trie, however. Okasaki [22] shows how maps are used in a functional programming language to implement tries. Even though maps can be viewed as a means to implement width compression, they do not bring about the savings in memor...

200 | The Benchmark Handbook for Database and Transaction Processing Systems
- Gray
- 1993
Citation Context ...ructure should have a direct impact on the overall performance. Because our measurements did not take this cumulative cost of copying into account, we ran a main-memory variant of the TPC-B benchmark [14] to investigate the effect of the collection of mature generations using Patricia trees, path- and width-compressed quaternary tries, and path-, width- and level-compressed tries. In the tests we did n...

171 | Planar point location using persistent search trees
- Sarnak, Tarjan
- 1986
Citation Context ...py, and last a reference to the memory cell holding the copy is returned. In recursive structures such as trees an update leads to shadowing the entire search path. This is also known as path copying [26]. In its original form the trie [11, 12] is a data structure where a set of strings from an alphabet containing m characters is stored in an m-ary tree and each string corresponds to a unique path. Th...

166 | Handbook of Algorithms and Data Structures
- Gonnet, Baeza-Yates
- 1991
Citation Context ...ry cell holding the copy is returned. In recursive structures such as trees an update leads to shadowing the entire search path. This is also known as path copying [26]. In its original form the trie [11, 12] is a data structure where a set of strings from an alphabet containing m characters is stored in an m-ary tree and each string corresponds to a unique path. There are several methods for implementing...

86 | Fast address lookup for Internet routers
- Nilsson, Karlsson
- 1997
Citation Context ...thmic depth of uncompressed and path compressed tries. These results also translate to good performance in practice, as shown by a recent software implementation of IP routing tables using an LC-trie [20]. The LC-trie was originally a static data structure that did not support inserts and deletes. To our knowledge, the LPC-trie [21] was the first dynamic variant of the LC-trie. The LPC-trie is an ...

64 | Digital search trees revisited
- Flajolet, Sedgewick
- 1986
Citation Context ...s. We show how to avoid this problem by choosing a small alphabet and applying trie compression. The average-case behavior of trie structures has been the subject of thorough theoretical analysis [10, 17, 24, 25]; an extensive list of references can be found in Handbook of Theoretical Computer Science [18]. The expected average depth of a trie containing n independent random strings from a distribution with d...

60 | Asymptotic growth of a class of random trees
- Pittel
- 1985
Citation Context ...s. We show how to avoid this problem by choosing a small alphabet and applying trie compression. The average-case behavior of trie structures has been the subject of thorough theoretical analysis [10, 17, 24, 25]; an extensive list of references can be found in Handbook of Theoretical Computer Science [18]. The expected average depth of a trie containing n independent random strings from a distribution with d...

55 | Paths in a random digital tree: Limiting distributions
- Pittel
- 1986

39 | On the performance evaluation of extendible hashing and trie searching
- Flajolet
- 1983
Citation Context ...pected average depth of a trie containing n independent random strings from a distribution with density function f ∈ L² is Θ(log n) [6]. This result holds also for data from a Bernoulli-type process [8, 9]. The best known compression technique for tries is path compression. The idea is simple: paths consisting of a sequence of single-child nodes are compressed, as shown in Figure 1b. A path compressed ...

33 | Improved behaviour of tries by adaptive branching
- Andersson, Nilsson
- 1993
Citation Context ...rings are stored in the leaves. The labels 0-14 indicate the presence of an element, not a value stored. Variable s gives the longest common prefix string of a path-compressed node. Level compression [1] is a more recent technique. Once again, the idea is simple: complete nodes (all children are non-empty) are compressed, and this compression is performed top down, see Figure 1c. The level-compressed...

26 | A note on the average depth of tries
- Devroye
- 1982
Citation Context ...n be found in Handbook of Theoretical Computer Science [18]. The expected average depth of a trie containing n independent random strings from a distribution with density function f ∈ L² is Θ(log n) [6]. This result holds also for data from a Bernoulli-type process [8, 9]. The best known compression technique for tries is path compression. The idea is simple: paths consisting of a sequence of single...

19 | An efficient implementation of trie structures
- Aoe, Morimoto, et al.
- 1992
Citation Context ... strings from an alphabet containing m characters is stored in an m-ary tree and each string corresponds to a unique path. There are several methods for implementing trie structures in the literature [4, 5, 7, 16, 19]. One of the drawbacks of these methods is that they need considerably more memory than a balanced search tree due to empty subtrees. We show how to avoid this problem by choosing a small alphabet and...

16 | A Combinatorial Framework for Map Labeling
- Wagner, Wolff
- 1998
Citation Context ...47001325; 715136305; 0) (Table 1). The other data sets are real data: binary strings from Internet routing tables (Table 2), two-dimensional geometric point data (Table 3) visualized in Figure 6 from [27], and ASCII strings from the Calgary Text Compression Corpus, which is a standardized text corpus used frequently in data compression research (Table 4). The performance tests were measured in the Lin...

14 | Algebraic Methods for Trie Statistics
- Flajolet, Regnier, et al.
- 1985
Citation Context ...pected average depth of a trie containing n independent random strings from a distribution with density function f ∈ L² is Θ(log n) [6]. This result holds also for data from a Bernoulli-type process [8, 9]. The best known compression technique for tries is path compression. The idea is simple: paths consisting of a sequence of single-child nodes are compressed, as shown in Figure 1b. A path compressed ...

13 | Some further results on digital search trees
- Kirschenhofer, Prodinger
- 1986
Citation Context ...e the size of the trie dramatically. In fact, the number of nodes in a path compressed binary trie storing n keys is 2n - 1. The asymptotic expected average depth, however, is typically not reduced [15, 17]. Figure 1: A binary trie (a), a path-compr...

12 | Faster searching in tries and quadtrees- an analysis of level compression
- Andersson, Nilsson
- 1994
Citation Context ...pressed trie, LC-trie, has proved to be of interest both in theory and practice. It is known that the average expected depth of an LC-trie is O(log log n) for data from a large class of distributions [2]. This should be compared to the logarithmic depth of uncompressed and path compressed tries. These results also translate to good performance in practice, as shown by a recent software implementation...

12 | Bonsai: a compact representation of trees
- Darragh, Cleary, et al.
- 1993
Citation Context ... strings from an alphabet containing m characters is stored in an m-ary tree and each string corresponds to a unique path. There are several methods for implementing trie structures in the literature [4, 5, 7, 16, 19]. One of the drawbacks of these methods is that they need considerably more memory than a balanced search tree due to empty subtrees. We show how to avoid this problem by choosing a small alphabet and...

11 | Implementing a Dynamic Compressed Trie
- Nilsson, Karlsson
- 1998
Citation Context ...ent software implementation of IP routing tables using an LC-trie [20]. The LC-trie was originally a static data structure that did not support inserts and deletes. To our knowledge, the LPC-trie [21] was the first dynamic variant of the LC-trie. The LPC-trie is an imperative data structure, however. In functional tries, an update leads to copying the entire search path from the root to the leaf s...

9 | HimML: Standard ML with fast sets and maps
- Goubault
- 1994
Citation Context ...aternary nodes to reduce the cost of copying. As before, the labels 0-14 only indicate the presence of an element. Compression methods for functional tries are hardly discussed in the literature. Goubault [13] describes a system-level implementation of a functional binary trie and uses the trie for introducing sets and maps into a functional set-oriented programming language. He does not apply any compress...

9 | Further results on digital search trees
- Kirschenhofer, Prodinger
- 1988
Citation Context ...educe the size of the trie dramatically. In fact, the number of nodes in a path compressed binary trie storing n keys is 2n - 1. The asymptotic expected average depth, however, is typically not reduced [15, 17]. Figure 1: A binary trie (a), a path-compressed binary trie...

8 | Implementing dynamic minimal-prefix tries, Software Practice
- Dundas
- 1991

8 | Real-time Garbage Collection of a Functional Persistent Heap
- Oksanen
- 1999
Citation Context ...ilt-in maps and sets that are implemented by persistent functional tries. In order to obtain maximal computational efficiency, we have implemented our functional data structures in C on top of Shades [23], a persistent functional heap supporting automatic disk-backed memory management in a soft real-time environment. Shades replaces logging with shadowing combined with real-time generational stop-and-...

5 | A method of compressing trie structures
- Morimoto, Iriguchi, et al.
- 1994

1 | Approximate average storage utilization of bucket methods with arbitrary fanout
- Ang, Samet
- 1996
Citation Context ...r ln 2 in computing the copy cost and storage requirements. The scale factor ln 2 gives the approximate average storage utilization of hierarchical access methods based on the binary splitting policy [3]. All the formulas used for computing the values are given in the appendix. It is clear that approximations cannot be taken for measured results. For balanced trees, however, approximations give a fai...

1 | Algorithms and Complexity, volume A
- van Leeuwen (ed.)
- 1990
Citation Context ...erage-case behavior of trie structures has been the subject of thorough theoretical analysis [10, 17, 24, 25]; an extensive list of references can be found in Handbook of Theoretical Computer Science [18]. The expected average depth of a trie containing n independent random strings from a distribution with density function f ∈ L² is Θ(log n) [6]. This result holds also for data from a Bernoulli-type ...