## 6.897: Advanced data structures (Spring 2005), Lecture 3, February 8 (2005)

### Download Links

- [theory.csail.mit.edu]
- [courses.csail.mit.edu]
- [theory.lcs.mit.edu]
- [ocw.mit.edu]
- [zach.in.tu-clausthal.de]

Citations: 3 (0 self-citations)

### BibTeX

@MISC{Demaine056.897:advanced,

author = {Prof Erik Demaine and Scribes Christos Kapoutsis and Loizos Michael},

title = {6.897: Advanced data structures (Spring 2005), Lecture 3, February 8},

year = {2005}

}

### Abstract

Recall from last lecture that we are looking at the document-retrieval problem, which can be stated as follows: given a set of texts T1, T2, ..., Tk and a pattern P, determine the distinct texts in which the pattern occurs. We are allowed to preprocess the texts so that queries can be answered faster. Our preprocessing choice was a single suffix tree containing all the suffixes of all the texts, each suffix ending with a distinct symbol that identifies the text in which it appears. To answer the query, we reduced the problem to range-min queries, which in turn were reduced to the least common ancestor (LCA) problem on the Cartesian tree of an array of numbers. The Cartesian tree is constructed recursively: its root is the minimum element of the array, and its two subtrees are constructed recursively from the parts of the array to the left and to the right of that minimum. A range-min query on an interval [i, j] is then equivalent to finding the LCA of the two nodes of the Cartesian tree that correspond to i and j. In this lecture we continue by showing how to solve the LCA problem on any static tree. This will involve a reduction of the LCA problem back to the range-min query problem (!) and then a …
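As a rough illustration of the two reductions described above, here is a Python sketch (our own, not code from the lecture; all function names are hypothetical). It builds the Cartesian tree with the usual linear-time stack method, which yields the same tree as the recursive definition when ties go to the leftmost minimum, answers a range-min query as an LCA, and goes back the other way with the standard Euler-tour reduction (we assume this is the kind of reduction meant, since the abstract is cut off before the details). Both the LCA and the range-min steps use naive scans, so the sketch shows correctness only, not the lecture's time bounds.

```python
from typing import List, Optional, Tuple

def cartesian_tree(a: List[int]) -> List[Optional[int]]:
    """Parent array of the Cartesian tree of a (the root's parent is None).

    Built with the standard O(n) stack method; ties go to the leftmost
    minimum, which gives the same tree as the recursive definition.
    """
    parent: List[Optional[int]] = [None] * len(a)
    spine: List[int] = []  # right spine of the tree built so far
    for i, x in enumerate(a):
        last = None
        while spine and a[spine[-1]] > x:
            last = spine.pop()       # popped part becomes i's left subtree
        if last is not None:
            parent[last] = i
        if spine:
            parent[i] = spine[-1]    # i hangs as the right child of the top
        spine.append(i)
    return parent

def lca(parent: List[Optional[int]], u: int, v: int) -> int:
    """Naive LCA: mark every ancestor of u, then climb from v."""
    marked = set()
    node: Optional[int] = u
    while node is not None:
        marked.add(node)
        node = parent[node]
    while v not in marked:
        v = parent[v]
    return v

def rmq_via_lca(parent: List[Optional[int]], i: int, j: int) -> int:
    """Index of the leftmost minimum of a[i..j] = LCA of nodes i and j."""
    return lca(parent, i, j)

Tour = Tuple[List[int], List[int], List[int]]

def euler_tour(parent: List[Optional[int]]) -> Tour:
    """Euler tour: visited nodes, their depths, and each node's first visit."""
    n = len(parent)
    children: List[List[int]] = [[] for _ in range(n)]
    root = 0
    for v, p in enumerate(parent):
        if p is None:
            root = v
        else:
            children[p].append(v)
    nodes: List[int] = []
    depths: List[int] = []
    first = [0] * n

    def visit(v: int, d: int) -> None:
        first[v] = len(nodes)
        nodes.append(v)
        depths.append(d)
        for c in children[v]:
            visit(c, d + 1)
            nodes.append(v)          # record v again after each child
            depths.append(d)

    visit(root, 0)
    return nodes, depths, first

def lca_via_rmq(tour: Tour, u: int, v: int) -> int:
    """LCA(u, v) = shallowest tour entry between the first visits of u, v."""
    nodes, depths, first = tour
    i, j = sorted((first[u], first[v]))
    k = min(range(i, j + 1), key=lambda t: depths[t])  # naive range-min
    return nodes[k]

a = [3, 1, 4, 1, 5, 9, 2, 6]
parent = cartesian_tree(a)
print(rmq_via_lca(parent, 2, 5))               # 3, since a[3] = 1 is min of a[2..5]
print(lca_via_rmq(euler_tour(parent), 2, 5))   # 3 again
```

The recursive `visit` is fine for small trees; a deep tree would need an explicit stack.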

### Citations

8786 |
Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1992
Citation Context ...sh function h is totally random if for all x ∈ U, independent of all y for all y ≠ x ∈ U, $\Pr_h\{h(x) = t\} = 1/m$. Totally random hash functions are the same thing as the simple uniform hashing of CLRS **[1]**. However, with the given definition, a hash function must take Θ(lg m) bits to store the hash of one key x ∈ U in order for it to be totally random. There are u keys, which means in total it requires Θ(u...

614 |
Data Structures and Network Algorithms
- Tarjan
- 1983
Citation Context ...e root of its tree, such as the sum or minimum of the weights of each edge. This augmentation is necessary in flow algorithms. 3 Link-Cut Trees. Link-Cut Trees were developed by Sleator and Tarjan [1] **[2]**. They achieve logarithmic amortized cost per operation for all operations. Link-Cut Trees are similar to Tango trees in that they use the notions of preferred child and preferred path. representati...

551 |
The Input/Output complexity of sorting and related problems
- Aggarwal, Vitter
- 1988
Citation Context ...ost modern computers do not have a flat memory. They have a memory hierarchy consisting of multiple levels of memory with each level getting progressively slower and larger. 4.1 External-Memory Model **[AV88]**. In this model, we only consider a two-level memory hierarchy, on the assumption that the last two levels dominate the total cost. The memory closer to the CPU is called the cache and the level furthe...

335 |
New hash functions and their use in authentication and set equality
- Wegman, Carter
- 1981
Citation Context ...up and Zhang has query time as a function of k [4]. Another hash function that takes up $O(n^{\varepsilon})$ space is presented by Siegel [5]. These hash functions have O(1) query when k = Θ(lg n). Example: Another example of a k-wise independent hash function, presented by Wegman and Carter **[3]**, is $h(x) = \left[\left(\sum_{i=0}^{k-1} a_i x^i\right) \bmod p\right] \bmod m$. In this hash function, the $a_i$ satisfy $0 \le a_i < p$ and $0 < a_{k-1} < p$. p is stil...
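For concreteness, the polynomial family quoted above can be sketched in Python as follows. This assumes integer keys in [0, p) with p prime (2^31 − 1 in the example); `random_coeffs` and `poly_hash` are hypothetical names, and the coefficient ranges follow the quoted text. The polynomial is evaluated with Horner's rule.

```python
import random
from typing import List

def random_coeffs(k: int, p: int, rng: random.Random) -> List[int]:
    """Draw a_0..a_{k-1} with 0 <= a_i < p and 0 < a_{k-1} < p."""
    return [rng.randrange(p) for _ in range(k - 1)] + [rng.randrange(1, p)]

def poly_hash(coeffs: List[int], p: int, m: int, x: int) -> int:
    """h(x) = ((sum_i a_i x^i) mod p) mod m, evaluated by Horner's rule."""
    acc = 0
    for a in reversed(coeffs):  # start from a_{k-1}, the highest degree
        acc = (acc * x + a) % p
    return acc % m

# Example: a function from the k = 4 family into m = 16 buckets,
# with p = 2**31 - 1 (a prime) and keys assumed to lie in [0, p).
rng = random.Random(42)
coeffs = random_coeffs(4, 2**31 - 1, rng)
buckets = [poly_hash(coeffs, 2**31 - 1, 16, x) for x in range(10)]
```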

299 |
A data structure for dynamic trees
- Sleator, Tarjan
- 1983
Citation Context ...o the root of its tree, such as the sum or minimum of the weights of each edge. This augmentation is necessary in flow algorithms. 3 Link-Cut Trees. Link-Cut Trees were developed by Sleator and Tarjan **[1]** [2]. They achieve logarithmic amortized cost per operation for all operations. Link-Cut Trees are similar to Tango trees in that they use the notions of preferred child and preferred path. represen...

181 |
Scaling and related techniques for geometry problems
- Gabow, Bentley, et al.
- 1984
Citation Context ...ie, and store the value of the first chunk along each compressed edge. To build this tree in O(n) time, we use an idea similar to the Cartesian tree construction algorithm of Gabow, Bentley and Tarjan **[4]**. We do not describe the Cartesian tree here, but reformulate the algorithm in the context of our problem. We build the compressed trie by inserting signatures in sorted order. We insert the first sign...

155 |
Two algorithms for maintaining order in a list
- Dietz, Sleator
- 1987
Citation Context ... given (x, y), determines if x precedes y in the list. We will achieve constant-time updates and queries by using list labeling with a polynomial tag space and indirection. This solution is due to **[5]**. On the top level, we will use a maximum label of size $n^2$, and have lg n elements. We will use the O(lg n) solution at this level. At the second level, we will have a number of data structures of si...

113 |
Preserving order in a forest in less than logarithmic time and linear
- van Emde Boas
- 1977
Citation Context ...rik Demaine, Lecture 2, February 12, 2003, Scribe: Jeff Lindy. 1 Overview. In the last lecture we considered the successor problem for a bounded universe of size u. We began looking at the van Emde Boas **[3]** data structure, which implements Insert, Delete, Successor, and Predecessor in O(lg lg u) time per operation. In this lecture we finish up van Emde Boas, and improve the space complexity from our ori...

105 | Chernoff-hoeffding bounds for applications with limited independence
- Schmidt, Siegel, et al.
- 1993
Citation Context ...the totally random hash function assumption with either $\Theta(\lg n / \lg\lg n)$-wise independent hash functions (which is a lot of required independence!), as found by Schmidt, Siegel and Srinivasan (1995) **[11]**, or simple tabulation hashing [10]. Thus, the bound serves as the motivation for moving on to perfect hashing, but in the meantime the outlook for basic chaining is not as bad as it first seems. The m...

85 |
Log-logarithmic worst-case range queries are possible
- Willard
- 1983
Citation Context ...til we have $n^*/4$, we rebuild for $n^*$ instead of $2n^*$. (If we delete enough, we rebuild at half the size.) All operations take amortized O(1) time. 5 y-fast Trees. 5.1 Operation Costs. y-fast trees **[1, 2]** accomplish the same running times as van Emde Boas, O(lg lg u) for Insert, Delete, Successor, and Predecessor, but they are simple once you have dynamic perfect hashing in place. We can define correspo...

68 | Cache-oblivious priority queue and graph algorithm applications
- Arge, Bender, et al.
- 2002
Citation Context ...the labels for O(lg n) elements at once. 4 Cache-oblivious priority queues. We will describe cache-oblivious priority queues which achieve $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ amortized memory transfers per operation **[6]**. Thus, N calls to the priority queue are as efficient as sorting. The main idea is to use lg lg N levels of sizes $N, N^{2/3}, N^{4/9}, \ldots$, reminiscent of exponential trees. Each level has an up buff...

68 |
Structures and Network Algorithms, Society for
- Tarjan
- 1983
Citation Context ...e root of its tree, such as the sum or minimum of the weights of each edge. This augmentation is necessary in flow algorithms. 3 Link-Cut Trees. Link-Cut Trees were developed by Sleator and Tarjan [1] **[2]**. They achieve logarithmic amortized cost per operation for all operations. Link-Cut Trees are similar to Tango trees in that they use the notions of preferred child and preferred path. They also use...

63 | Two simplified algorithms for maintaining order in a list
- Bender, Cole, et al.
- 2002
Citation Context ...viously; inserting and deleting take $O(\lg^2 N / B)$. The solution we present is due to Itai, Konheim, and Rodeh [1]. Worst-case bounds have also been achieved, originally by [2], and then simplified in **[3]**. Solution Overview. We divide the array into buckets of size Θ(lg N). The rough idea is that each time we update, we first consider the updated bucket; if it has too many elements or too few, we look...

62 |
Tabulation based 4-universal hashing with applications to second moment estimation
- Thorup, Zhang
- 2004
Citation Context ...a prime greater than u. There are other interesting k-wise independent hash functions if we allow $O(n^{\varepsilon})$ space. One such hash function, presented by Thorup and Zhang, has query time as a function of k **[4]**. Another hash function that takes up $O(n^{\varepsilon})$ space is presented by Siegel [5]. These hash functions have O(1) query when k = Θ(lg n). Example: Another example of a k-wise independent hash function, presented by Wegman and Carter [3], is...

40 |
The spatial complexity of oblivious k-probe hash functions
- Schmidt, Siegel
- 1990
Citation Context ...ime with only double the space (a luxury in Knuth’s time, but reasonable now!) In 1990, it was shown that O(lg n)-wise independent hash functions also resulted in constant expected time per operation **[7]**. The breakthrough result in 2007 was that we in fact only needed 5-independence to get constant expected time, in [8] (updated in 2009). This was a heavily practical paper, emphasizing machine implem...

40 |
Storing a sparse table with O(1) worst case access time
- Fredman, Komlós, Szemerédi
- 1984
Citation Context ... ≥ cµ] > $1/n^{\varepsilon}$ for some ε. So by caching, we can see that the expected chain-length bounds of basic chaining are still decent, to some extent. 4 FKS Perfect Hashing – Fredman, Komlós, Szemerédi (1984) **[17]**. Perfect hashing changes the idea of chaining by turning the linked list of collisions into a separate collision-free hash table. FKS hashing is a two-layered hashing solution to the static dictionary...
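The two-level FKS idea in this context can be sketched as follows. This is an illustrative Python version under our own assumptions, not the construction from the cited notes: keys are nonnegative integers below the Mersenne prime 2^61 − 1, both levels draw hash functions of the form ((a·x + b) mod p) mod m, and the threshold of 4n on the sum of squared bucket sizes is one common choice of constant. The class name `FKS` and helper `universal` are hypothetical.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple

P = (1 << 61) - 1  # Mersenne prime; keys are assumed to be ints in [0, P)

def universal(m: int, rng: random.Random) -> Callable[[int], int]:
    """Pick h(x) = ((a*x + b) mod P) mod m at random (Carter-Wegman style)."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m

class FKS:
    """Static two-level perfect hash table (sketch)."""

    def __init__(self, keys: Iterable[int], seed: int = 0) -> None:
        rng = random.Random(seed)
        keys = list(set(keys))
        n = max(1, len(keys))
        # Level 1: hash into n buckets; retry until sum of c_t^2 <= 4n.
        while True:
            self.h = universal(n, rng)
            buckets: List[List[int]] = [[] for _ in range(n)]
            for x in keys:
                buckets[self.h(x)].append(x)
            if sum(len(b) ** 2 for b in buckets) <= 4 * n:
                break
        # Level 2: bucket t of size c_t gets a table of c_t^2 slots; redraw
        # its hash function until no two of its keys collide.
        self.tables: List[Tuple[Callable[[int], int], List[Optional[int]]]] = []
        for b in buckets:
            size = len(b) ** 2
            while True:
                g = universal(max(1, size), rng)
                slots: List[Optional[int]] = [None] * size
                for x in b:
                    if slots[g(x)] is not None:
                        break          # collision: draw a new g
                    slots[g(x)] = x
                else:
                    break              # every key placed without collision
            self.tables.append((g, slots))

    def __contains__(self, x: int) -> bool:
        g, slots = self.tables[self.h(x)]
        return bool(slots) and slots[g(x)] == x
```

Bucket t gets $C_t^2$ slots, so a random level-2 function is collision-free with probability roughly at least 1/2 and the retry loops stay short in expectation; a query always makes exactly two hash evaluations.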

33 | Scanning and traversing: Maintaining data for traversals in a memory hierarchy
- Bender, Cole, et al.
Citation Context ..., the gaps between elements are of O(1) size. This is the best cache-oblivious update bound known if we want an O(⌈K/B⌉) worst-case traversal bound. 3.2 Sacrificing Traversals Slightly. Bender et al. **[BCDF00]** have shown that if we relax the worst-case traversal bound slightly and store the elements out of order, we can do updates better. This is somewhat counterintuitive because, by not knowing B, we don’...

32 |
A density control algorithm for doing insertions and deletions in a sequentially ordered in good worst-case time
- Willard
- 1992
Citation Context ...y transfers, even cache-obliviously; inserting and deleting take $O(\lg^2 N / B)$. The solution we present is due to Itai, Konheim, and Rodeh [1]. Worst-case bounds have also been achieved, originally by **[2]**, and then simplified in [3]. Solution Overview. We divide the array into buckets of size Θ(lg N). The rough idea is that each time we update, we first consider the updated bucket; if it has too many ...

32 |
Lower bounds for accessing binary search trees with rotations
- Wilber
- 1989
Citation Context ...s (more tightly fitting $x_j$), or nothing happens (the access is uninteresting for $x_j$). The Wilber number is the number of alternations between $a_i$ increasing and $b_i$ decreasing. Wilber’s 2nd lower bound **[Wil89]** states that the sum of the Wilber numbers of all $x_j$’s is a lower bound on the total cost of the access sequence, for any BST. It should be noted that this only holds in an amortized sense (for the to...

29 |
Integer sorting in O(n√(log log n)) expected time and linear space
- Han, Thorup
- 2002
Citation Context ...rick and Reisch [7]: $O(n \lg \frac{w}{\lg n})$. This is $o(n \lg\lg n)$ for $w = \lg^{1+o(1)} n$. You are asked to prove this result in problem 7. • Han [5]: O(n lg lg n) deterministic and on the AC0 RAM • Han and Thorup **[6]**: $O(n\sqrt{\lg\lg n})$ randomized. Actually, one can achieve $O(n\sqrt{\lg \frac{w}{\lg n}})$, improving the result of [7]. We will prove the result in [2]. Combining this result with van Emde Boas gives an O(n lg lg n) u...

26 |
New trie data structures which support very fast search operations
- Willard
- 1984
Citation Context ...til we have $n^*/4$, we rebuild for $n^*$ instead of $2n^*$. (If we delete enough, we rebuild at half the size.) All operations take amortized O(1) time. 5 y-fast Trees. 5.1 Operation Costs. y-fast trees **[1, 2]** accomplish the same running times as van Emde Boas, O(lg lg u) for Insert, Delete, Successor, and Predecessor, but they are simple once you have dynamic perfect hashing in place. We can define correspo...

26 | On universal classes of extremely random constant-time hash functions
- Siegel
Citation Context ...unctions if we allow $O(n^{\varepsilon})$ space. One such hash function, presented by Thorup and Zhang, has query time as a function of k [4]. Another hash function that takes up $O(n^{\varepsilon})$ space is presented by Siegel **[5]**. These hash functions have O(1) query when k = Θ(lg n). Example: Another example of a k-wise independent hash function, presented by Wegman and Carter [3], is $h(x) = \left[\left(\sum_{i=0}^{k-1} a_i x^i\right) \bmod p\right] \bmod m$. I...

17 |
Cuckoo hashing
- Pagh, Rodler
- 2004
Citation Context ...high probability [10]; the proof is a simple generalization of the argument we gave, except that now we check per batch whether or not something is in a run. 6 Cuckoo Hashing – Pagh and Rodler (2004) **[15]**. Cuckoo hashing is similar to double hashing and perfect hashing. Cuckoo hashing is inspired by the Cuckoo bird, which lays its eggs in other birds’ nests, bumping out the eggs that are originally the...
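A minimal Python sketch of the eviction idea follows. Assumptions, all ours: integer keys below 2^61 − 1, hash functions of the form ((a·x + b) mod p) mod m (which do not have the independence the quoted notes require for the O(1) expected-update guarantee), a kick limit of roughly 6 lg m before a full rehash, and doubling the tables before either could become more than half full. `Cuckoo` and its methods are hypothetical names.

```python
import random
from typing import List, Optional

P = (1 << 61) - 1  # Mersenne prime; keys are assumed to be ints in [0, P)

class Cuckoo:
    """Cuckoo hashing sketch: key x lives in t[0][h0(x)] or t[1][h1(x)]."""

    def __init__(self, m: int = 8, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.n = 0
        self._reset(m)

    def _reset(self, m: int) -> None:
        """Empty tables of size m with freshly drawn hash functions."""
        self.m = m
        self.fns = [(self.rng.randrange(1, P), self.rng.randrange(P)) for _ in range(2)]
        self.t: List[List[Optional[int]]] = [[None] * m, [None] * m]

    def _h(self, i: int, x: int) -> int:
        a, b = self.fns[i]
        return ((a * x + b) % P) % self.m

    def __contains__(self, x: int) -> bool:
        return any(self.t[i][self._h(i, x)] == x for i in (0, 1))  # <= 2 probes

    def _place(self, x: int) -> Optional[int]:
        """Insert x by repeatedly evicting occupants; return None on success,
        or the key left without a slot once the kick limit is reached."""
        i = 0
        for _ in range(8 + 6 * self.m.bit_length()):
            s = self._h(i, x)
            x, self.t[i][s] = self.t[i][s], x   # x takes the slot, old key is out
            if x is None:
                return None
            i = 1 - i                           # evicted key tries its other table
        return x

    def _rehash(self, m: int, pending: List[int]) -> None:
        """Rebuild with new hash functions (and size m) until all keys fit."""
        keys = [x for tab in self.t for x in tab if x is not None] + pending
        while True:
            self._reset(m)
            if all(self._place(x) is None for x in keys):
                return

    def insert(self, x: int) -> None:
        if x in self:
            return
        if self.n + 1 > self.m // 2:            # keep each table at most half full
            self._rehash(2 * self.m, [])
        leftover = self._place(x)
        if leftover is not None:
            self._rehash(self.m, [leftover])
        self.n += 1
```

Lookups never search: a key is either at its slot in the first table or at its slot in the second. All the work, including the occasional full rebuild, happens on insertion.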

16 |
A new universal class of hash functions and dynamic hashing in real time
- Dietzfelbinger, Meyer auf der Heide
- 1990
Citation Context ...ger, due to the $C_t^2$ size of the second hash table. Thus, we will still have O(1) deterministic query, but additionally we will have O(1) expected update. A result due to Dietzfelbinger and Meyer auf der Heide in **[14]** allows Dynamic FKS to be performed w.h.p. with O(1) expected update. 5 Linear probing. Linear probing is perhaps one of the first algorithms for handling hash collisions that you learn when initially ...

14 | Linear probing with constant independence
- Pagh, Pagh, et al.
- 2007
Citation Context ... independent hash functions also resulted in constant expected time per operation [7]. The breakthrough result in 2007 was that we in fact only needed 5-independence to get constant expected time, in **[8]** (updated in 2009). This was a heavily practical paper, emphasizing machine implementation, and it resulted in a large focus on k-independence in the case that k = 5. At this time it was also shown th...

13 |
The power of simple tabulation hashing
- Patrascu, Thorup
Citation Context ...sumption with either $\Theta(\lg n / \lg\lg n)$-wise independent hash functions (which is a lot of required independence!), as found by Schmidt, Siegel and Srinivasan (1995) [11], or simple tabulation hashing **[10]**. Thus, the bound serves as the motivation for moving on to perfect hashing, but in the meantime the outlook for basic chaining is not as bad as it first seems. The major problems of accessing a long c...

9 |
A reliable randomized algorithm for the closest-pair problem
- Dietzfelbinger, Hagerup, Katajainen, Penttonen
- 1997
Citation Context ...o multiply a by x and then right-shift the resulting word. By doing this, the hash function uses the lg m high bits of a · x. These results come from Dietzfelbinger, Hagerup, Katajainen, and Penttonen **[2]**. 2.3 k-Wise Independent. Definition 5. A family H of hash functions is k-wise independent if for every h ∈ H, and for all distinct $x_1, x_2, \ldots, x_k \in U$, $\Pr\{h(x_1) = t_1 \wedge \cdots \wedge h(x_k) = t_k\} = O(1/m^k)$...
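The multiply-shift scheme described in this context can be sketched in Python. We assume a w-bit word (w = 64 by default) and an odd random multiplier a, which is the usual choice but is not stated in this excerpt; Python integers are unbounded, so the mask emulates the wraparound of a machine multiply. `multiply_shift` and `random_multiplier` are hypothetical names.

```python
import random

def multiply_shift(a: int, x: int, lg_m: int, w: int = 64) -> int:
    """h_a(x) = ((a * x) mod 2^w) >> (w - lg_m): the top lg_m bits of the
    low w-bit word of a*x, i.e. a bucket in [0, 2^lg_m)."""
    return ((a * x) & ((1 << w) - 1)) >> (w - lg_m)

def random_multiplier(rng: random.Random, w: int = 64) -> int:
    """Random odd w-bit multiplier (odd a is our assumption here)."""
    return rng.randrange(1 << w) | 1

rng = random.Random(7)
a = random_multiplier(rng)
buckets = [multiply_shift(a, x, lg_m=10) for x in range(1, 6)]  # m = 1024
```

On real hardware the mask disappears (the multiply wraps modulo 2^w for free), which is why the scheme costs just one multiplication and one shift.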

9 |
Dynamic perfect hashing: Upper and lower bounds
- Dietzfelbinger, Karlin, et al.
- 1994
Citation Context ...pace, as we can see from the above construction. Updates, which would make the structure dynamic, are randomized. 4.1 Dynamic FKS – Dietzfelbinger, Karlin, Mehlhorn, Meyer auf der Heide, Rohnert, and Tarjan (1994) **[13]**. The translation to dynamic perfect hashing is smooth and obvious. To insert a key is essentially two-level hashing, unless we get a collision in the $C_t$ hash table, in which case we need to rebuild th...

8 | Key independent optimality
- Iacono
Citation Context ...ined as usual with respect to the optimal offline BST. Note that the offline optimum knows the bijection (alternatively, we take the optimum for each bijection). Theorem 1 (key independent optimality **[Iac02]**). A BST has the key-independent optimality property iff it has the working-set property. In particular, splay trees are key-independently optimal. Proof. (sketch) Take a uniformly random bijection b....

The geometry of binary search trees
- Demaine, Harmon, et al.
- 2009
Citation Context ...so far is the O(log log n) competitive ratio achieved by Tango Trees; we shall see them in the later part of the lecture. Another perspective is the recently proposed geometric view of the BST **[DHIKP09]**. In this approach, a correspondence between the BST model of computation and points in $\mathbb{R}^2$ is given. Informally, call a set P of points arborally satisfied if, for any two points a, b ∈ P not on a c...

6 |
Improved parallel integer sorting without concurrent writing
- Albers, Hagerup
- 1997
Citation Context ... the children of each node by chunk value in O(n) time. An inorder traversal of the resulting tree will give us the ordering of the leaves. 3 Packed Sorting. Packed sorting, due to Albers and Hagerup **[1]**, can sort n integers of b bits in O(n) time, given a word size of $w \ge 2(b+1)\lg n \lg\lg n$. We can therefore pack $\lg n \lg\lg n$ elements into one word in memory. We leave one zero bit between each integer, a...

6 |
Upper Bounds for Sorting
- Kirkpatrick, Reisch
- 1984
Citation Context ...s is O(n lg lg n). • Andersson, Hagerup, Nilsson, and Raman [2]: O(n) for $w = \Omega(\lg^{2+\varepsilon} n)$. Combined with the previous result for small w, this gives sorting in O(n lg lg n) time. • Kirkpatrick and Reisch **[7]**: $O(n \lg \frac{w}{\lg n})$. This is $o(n \lg\lg n)$ for $w = \lg^{1+o(1)} n$. You are asked to prove this result in problem 7. • Han [5]: O(n lg lg n) deterministic and on the AC0 RAM • Han and Thorup [6]: $O(n\sqrt{\lg\lg n})$...

6 |
Dynamic optimality — almost
- Demaine, Harmon, et al.
Citation Context ...the total cost. At the end, there is at most one marble left in any node, so at most n marbles in total. Then, the cost is at least the total number of interleaves minus n. 3 Tango Trees. Tango trees **[DHIP04]** are an O(lg lg n)-competitive BST. They represent an important step forward from the previous competitive ratio of O(lg n), which is achieved by standard balanced trees. The running time of Tango tre...

5 |
Deterministic Sorting in O(n log log n
- Han
- 2004
Citation Context ...sult for small w, this gives sorting in O(n lg lg n) time. • Kirkpatrick and Reisch [7]: $O(n \lg \frac{w}{\lg n})$. This is $o(n \lg\lg n)$ for $w = \lg^{1+o(1)} n$. You are asked to prove this result in problem 7. • Han **[5]**: O(n lg lg n) deterministic and on the AC0 RAM • Han and Thorup [6]: $O(n\sqrt{\lg\lg n})$ randomized. Actually, one can achieve $O(n\sqrt{\lg \frac{w}{\lg n}})$, improving the result of [7]. We will prove the result in [...

5 |
Dynamic perfect hashing: Upper and lower bounds
- Dietzfelbinger, Karlin, et al.
Citation Context ...ghtforward, and can be done by keeping track of how many elements we have inserted or deleted, and rebuilding (shrinking or growing) the entire hash table. This result is due to Dietzfelbinger et al. **[4]**. Suppose at time t there are $n^*$ elements. Construct a static perfect hash table for $2n^*$ elements. If we insert another $n^*$ elements, we rebuild again, this time a static perfect hash table for $4n^*$...

5 | On the k-independence required by linear probing and minwise independence
- Pǎtraşcu, Thorup
- 2010
Citation Context ...ase that k = 5. At this time it was also shown that 2-independent hash functions could only achieve a really bad lower bound of Ω(lg n) expected time per operation; this bound was improved in 2010 by **[9]**, showing that there existed some 4-independent hash functions that also had Ω(lg n) expected time (thus making the 5-independence bound tight!). The most recent result is [10] showing that simple tabul...

3 |
Bounds on the independence required for cuckoo hashing
- Cohen, Kane
- 2009
Citation Context ...96-wise independence is insufficient for Cuckoo hashing to get O(1) expected update, with a build failure probability of 1 − 1/n, which is quite bad. This result is shown by Cohen and Kane (2009) in **[16]**. With simple tabulation hashing, the build failure probability becomes $\Theta(1/n^{1/3})$, which can be found in [10]. Theorem 15 (Constant expected update, for totally random hash functions). Pr[Insert f...

1 |
A Sparse Table Implementation
- Itai, Konheim, et al.
- 1981
Citation Context ...ounded, scanning k consecutive elements takes $O(1 + k/B)$ memory transfers, even cache-obliviously; inserting and deleting take $O(\lg^2 N / B)$. The solution we present is due to Itai, Konheim, and Rodeh **[1]**. Worst-case bounds have also been achieved, originally by [2], and then simplified in [3]. Solution Overview. We divide the array into buckets of size Θ(lg N). The rough idea is that each time we upd...