## Confluently Persistent Tries for Efficient Version Control

Citations: 1 (1 self)

### BibTeX

@MISC{Demaine_confluentlypersistent,
    author = {Erik D. Demaine and Stefan Langerman and Eric Price},
    title = {Confluently Persistent Tries for Efficient Version Control},
    year = {}
}

### Abstract

We consider a data-structural problem motivated by version control of a hierarchical directory structure in a system like Subversion. The model is that directories and files can be moved and copied between two arbitrary versions, in addition to being added or removed in an arbitrary version. Equivalently, we wish to maintain a confluently persistent trie (where internal nodes represent directories, leaves represent files, and edge labels represent path names), subject to copying a subtree between two arbitrary versions, adding a new child to an existing node, and deleting an existing subtree in an arbitrary version. Our first data structure represents an n-node degree-∆ trie with O(1) “fingers” in each version while supporting finger movement (navigation) and modifications near the fingers (including subtree copy) in O(lg ∆) time and space per operation. This data structure is essentially a locality-sensitive version of the standard practice, path copying, which costs O(d lg ∆) time and space for modification of a node at depth d and is therefore expensive when performing many deep but nearby updates. Our second data structure supports finger movement in O(lg ∆) time and no space, while modifications take O(lg n) time and space. This data structure is substantially faster for deep updates, i.e., unbalanced tries. Both of these data structures are functional, which is a stronger property than confluent persistence. Without this stronger property, we show how both data structures can be sped up to support movement in O(lg lg ∆) time, which is essentially optimal. Along the way, we present a general technique for global rebuilding of fully persistent data structures, which is nontrivial because amortization and persistence do not usually mix. In particular, this technique improves the best previous result for fully persistent arrays and obtains the first efficient fully persistent hash table.
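To make the path-copying baseline mentioned in the abstract concrete, here is a minimal sketch of a functional trie in which every update copies only the nodes on the root-to-target path and shares everything else. It is not the paper's data structure: the names `Node`, `lookup`, and `graft` are hypothetical, and each node copies a plain Python `dict`, so a node costs O(∆) to copy rather than the O(lg ∆) that a balanced persistent map would give.

```python
class Node:
    """Trie node treated as immutable: maps an edge label to a child Node."""
    __slots__ = ("children",)

    def __init__(self, children=None):
        # Shallow copy: child objects are shared, only the mapping is new.
        self.children = dict(children or {})


def lookup(root, path):
    """Return the node reached by following `path` from `root`, or None."""
    node = root
    for label in path:
        node = node.children.get(label)
        if node is None:
            return None
    return node


def graft(root, path, subtree):
    """Return a new root with `subtree` placed at `path`.

    Only the d nodes along `path` are copied (path copying); all other
    subtrees are shared with the old version, which stays unchanged.
    """
    if not path:
        return subtree
    head, rest = path[0], path[1:]
    child = root.children.get(head, Node())
    new_children = dict(root.children)
    new_children[head] = graft(child, rest, subtree)
    return Node(new_children)


# Version 1: a single file under trunk/src.
v1 = graft(Node(), ("trunk", "src", "a.c"), Node())
# Version 2: add a file; trunk/src is shared with v1, not copied.
v2 = graft(v1, ("trunk", "doc", "readme"), Node())
# Version 3: copy the trunk subtree of v1 into v2 (a confluent update,
# since it combines two different versions).
v3 = graft(v2, ("branches", "b1"), lookup(v1, ("trunk",)))
```

Because a subtree copy only stores a reference to the existing subtree, it costs no more than any other update at the same depth. The cost that remains is the copy of the whole root-to-target path on every update, which is the O(d lg ∆) term the abstract describes as expensive for deep but nearby updates.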

### Citations

8835 | Introduction to algorithms - Cormen, Leiserson, et al. - 1990 |

249 | Making data structures persistent - Driscoll, Sarnak, et al. - 1989 |

Citation Context: ...uires the ability to copy subdirectories (subtrees) from one version into another. Related work in persistence. Partial and full persistence were mostly solved in the 1980s. In 1986, Driscoll et al. [DSST89] developed a technique that converts any pointer-based data structure with bounded in-degree into an equivalent fully persistent data structure with only constant-factor overhead in time and space for...

234 | Purely Functional Data Structures - Okasaki - 1999 |

Citation Context: ...ta structure and the best functional data structure for some problems. On the other hand, many common data structures can be implemented functionally with only a constant-factor overhead; see Okasaki [Oka98]. One example we use frequently is a functional catenable deque, supporting insertion and deletion at either end and concatenation of two deques in constant time per operation [Oka98]. Path copying. P...

173 | Planar point location using persistent search trees - Sarnak, Tarjan - 1986 |

109 | A Large-Scale Study of File-System Contents - Douceur, Bolosky |

Citation Context: ... extremely unbalanced. For example, if the trie is a path of length n, and we repeatedly insert n leaves at the bottommost node, then path copying requires Ω(n²) time and space. Douceur and Bolosky [DB99] studied over 10,000 file systems from nearly 5,000 Windows PCs in a commercial environment, totaling 140 million files and 10.5 terabytes. They found that d roughly follows a Poisson distribution, wi...

85 | Incremental Context-Dependent Analysis for Language-Based Editors - Reps, Teitelbaum, et al. - 1983 |

50 | Dynamic models for file sizes and double Pareto distributions - Mitzenmacher |

Citation Context: ... commercial environment, totaling 140 million files and 10.5 terabytes. They found that d roughly follows a Poisson distribution, with 15% of all directories having depth at least eight. Mitzenmacher [Mit03] studies a variety of theoretical models for file-system creation which all imply that d is usually logarithmic in n. Our results. We develop four trie data structures, two of which are functional and...

42 | Alternatives to splay trees with O(log n) worst-case access times - Iacono |

Citation Context: ...en in the globally balanced functional data structure. This data structure uses the fully persistent hash tables from the previous section combined with ideas from the working-set structure of Iacono [Iac01]. Theorem 8. There is a fully persistent weight-balanced hash table storing n elements, each with a key and a weight, subject to looking up an element by key in O(lg lg r) time, where r is the number o...

41 | Universal classes of hash functions (extended abstract) - Carter, Wegman - 1977 |

36 | Efficient applicative data types - Myers - 1984 |

32 | Fully persistent arrays - Dietz - 1989 |

Citation Context: ...rts any pointer-based data structure with bounded in-degree into an equivalent fully persistent data structure with only constant-factor overhead in time and space for every operation. In 1989, Dietz [Die89] developed a fully persistent array supporting random access in O(lg lg m) time, where m is the number of updates made to any version. This data structure enables simulation of an arbitrary RAM data s...

25 | Persistent lists with catenation via recursive slow-down - Kaplan, Tarjan - 1995 |

Citation Context: ...ized, but in a way that permits confluent usage if one allows memoization. Furthermore, Kaplan and Tarjan have shown a complicated O(1) worst-case purely functional implementation of catenable deques [KT95]. In order to implement predecessor and successor queries, decompose the path in the representation tree into a sequence of right paths (sequence of nodes where the next element on the path is the rig...

22 | Fully persistent lists with catenation - Driscoll, Sleator, et al. - 1994 |

Citation Context: ...Ω(lg lg n) time per operation in the powerful cell-probe model. More relevant is the work on confluent persistence. The idea was first posed as an open problem by [DSST89]. In 1994, Driscoll et al. [DST94] defined confluence and gave a specific data structure for confluently persistent catenable lists. In 2003, Fiat and Kaplan [FK03] developed the first and only general methods for making a pointer-bas...

17 | Pure versus Impure Lisp - Pippenger - 1997 |

Citation Context: ...ewly created cell. Functional data structures have many useful properties other than confluent persistence; for example, multiple threads can use functional data structures without locking. Pippenger [Pip97] proved a logarithmic-factor separation between the best pointer-based data structure and the best functional data structure for some problems. On the other hand, many common data structures can be im...

17 | A data structure for dynamic trees - Sleator, Tarjan - 1981 |

Citation Context: ...ents the trie as a balanced binary tree, then makes this tree functional via path copying. Specifically, we will use a balanced representation of tries similar to link-cut trees of Sleator and Tarjan [ST83]. This representation is natural because the link and cut operations are essentially subtree copy and delete. Sleator and Tarjan’s original formulation of link-cut trees cannot be directly implement...

11 | Purely Functional Data Structures. Cambridge Univ Pr - Okasaki - 1999 |

Citation Context: ...ta structure and the best functional data structure for some problems. On the other hand, many common data structures can be implemented functionally with only a constant-factor overhead; see Okasaki [Oka98]. One example we use frequently is a functional catenable deque, supporting insertion and deletion at either end and concatenation of two deques in constant time per operation [Oka98]. Figure 1: This ...

10 | Making data structures confluently persistent - Fiat, Kaplan - 2001 |

Citation Context: ...first posed as an open problem in [DSST89]. In 1994, Driscoll et al. [DST94] defined confluence and gave a specific data structure for confluently persistent catenable lists. In 2003, Fiat and Kaplan [FK03] developed the first and only general methods for making a pointer-based data structure confluently persistent, but the slowdown is often suboptimal. In particular, their best deterministic result has...

10 | AVL dags - Myers - 1982 |

8 | Efficient algorithms for computing geometric intersections - Swart - 1985 |

4 | Fully persistent arrays (extended abstract) - Dietz - 1989 |

Citation Context: ...rts any pointer-based data structure with bounded in-degree into an equivalent fully persistent data structure with only constant-factor overhead in time and space for every operation. In 1989, Dietz [Die89] developed a fully persistent array supporting random access in O(lg lg m) time, where m is the number of updates made to any version. This data structure enables simulation of an arbitrary RAM data s...

2 | Space-efficient finger search on degree-balanced search trees - Blelloch, Maggs, Woo - 2003 |

Citation Context: ... two pseudofingers that are adjacent in T″. A knuckle is a connected component of T after removing all its pseudofingers and tendons. (Footnote 5: Our hand structure is unrelated to the hand data structure of [BMW03].) [Figure 5: Fingers, prosthetic fingers, tendons, knuckles, and the hand.] At the top level, the functional trie data structure of Th...

1 | Biased search trees - Bent, Sleator, Tarjan - 1985 |

Citation Context: ...low the worst-case logarithmic link-cut trees of [ST83, Section 5]. These link-cut trees decompose the trie into a set of “heavy” paths, and represent each heavy path by a globally biased binary tree [BST85], tied together into one big tree which we call the representation tree. An edge is heavy if more than half of the descendants of the parent are also descendants of the child. A heavy path is a contig...

1 | Making B-trees work for B - Krijnen, Meertens - 1983 |