## Self-adjusting Data Structures for External Memory String Access (2001)

Citations: | 1 - 0 self |

### BibTeX

@TECHREPORT{Ciriani01self-adjustingdata,

author = {V. Ciriani and P. Ferragina and F. Luccio and S. Muthukrishnan},

title = {Self-adjusting Data Structures for External Memory String Access},

institution = {},

year = {2001}

}

### Abstract

Data warehouses are increasingly storing and managing large-scale string data and handling large volumes of transactions that update and search string databases. Motivated by this context, we initiate the study of self-adjusting data structures for string dictionary operations, that is, data structures designed to be efficient over an entire sequence of operations rather than on individual string operations. Furthermore, we study this problem in the external memory model, where string data is too massive to be stored in main memory and has to reside on disk; each access to a disk page fetches $B$ items, and the cost of an operation is the number of pages accessed. We show that, given $n$ strings $S_1, \ldots, S_n$ of total length $\sum_{i} |S_i| = N$, a sequence of $m$ string searches $S_{i_1}, S_{i_2}, \ldots, S_{i_m}$ takes $O\left(\sum_{j=1}^{m} \frac{|S_{i_j}|}{B} + \sum_{i=1}^{n} n_i \log_B \frac{m}{n_i}\right)$ amortized expected I/Os, where $n_i$ is the number of times $S_i$ is queried. Inserting or deleting a string $S$ takes $O\left(\frac{|S|}{B} + \log_B n\right)$ amortized expected I/Os. This result is the analog of what is known as the Static Optimality Theorem [17], proved by Sleator and Tarjan in their classic splay trees paper for a dictionary of numerical values; here, it has been generalized, for the first time, to string data, to string operations, and to the external memory model. This performance is achieved not by traditional "splay" operations on search trees as in [17], but by designing a novel self-adjusting data structure based on the well-known skip lists. In addition, we introduce the paradigm of using the main memory (or a part thereof) persistently across operations, in the manner of a cache, to further improve the performance of our self-adjusting skip list. This is quite reasonable in a...
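To make the search bound in the abstract concrete, the following sketch evaluates the two terms inside the $O(\cdot)$ for a given query distribution. It is only an illustration of the formula, not code from the report: the function names (`scan_term`, `entropy_term`, `static_optimal_bound`) are hypothetical, constant factors are ignored, and a string that is never queried ($n_i = 0$) is assumed to contribute nothing to the entropy term.

```python
import math


def scan_term(lengths, counts, B):
    """Sum over all queries of |S_{i_j}| / B: the cost of reading the queried strings."""
    return sum(c * length / B for length, c in zip(lengths, counts))


def entropy_term(counts, B):
    """Sum over strings of n_i * log_B(m / n_i), where m is the total number of queries."""
    m = sum(counts)
    # Strings with n_i = 0 are skipped (assumed to contribute 0).
    return sum(c * math.log(m / c, B) for c in counts if c > 0)


def static_optimal_bound(lengths, counts, B):
    """Expression inside the O(.) of the search bound, constants ignored."""
    return scan_term(lengths, counts, B) + entropy_term(counts, B)


if __name__ == "__main__":
    B = 16
    lengths = [32, 32, 32, 32]   # |S_i| for n = 4 strings
    uniform = [4, 4, 4, 4]       # every string queried equally often (m = 16)
    skewed = [13, 1, 1, 1]       # one hot string (m = 16)
    print(entropy_term(uniform, B), entropy_term(skewed, B))
```

With uniform query counts the entropy term equals $m \log_B n$, so the bound gives no advantage over a worst-case per-operation structure; the term shrinks as the query distribution becomes more skewed.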

### Citations

644 | Suffix arrays: a new method for on-line string searches - Manber, Myers - 1990 |

371 |
Self-adjusting binary search trees
- Sleator, Tarjan
- 1985
Citation Context ... is the number of times $S_i$ is queried. Inserting or deleting a string $S$ takes $O\left(\frac{|S|}{B} + \log_B n\right)$ amortized expected I/Os. This result is the analog of what is known as the Static Optimality Theorem [17] proved by Sleator and Tarjan in their classic splay trees paper for a dictionary of numerical values; here, it has been generalized, for the first time to string data, to string operations, and to the ... |

322 | Skip lists: A probabilistic alternative to balanced trees
- Pugh
- 1989
Citation Context ...lization above. We instead design an alternative data structure for our problem. We start with the well-known Skip List proposed for solving the dictionary problem with numerical values in main memory [15]. Skip lists maintain the items of the set in levels, each level consisting of a randomly chosen half of the items from the previous level. It is easy to extend it to work in external memory by choosi... |

320 | External Memory Algorithms and Data Structures: Dealing with Massive Data - Vitter - 1981 |

286 |
Computational Geometry: An Introduction Through Randomized Algorithms
- Mulmuley
- 1994
Citation Context ...up items of S. Self-adjusting skip list should "artificially" promote the frequently accessed items to the highest levels of L thus facilitating their retrieval. This has been achieved in various ways [6, 14, 13], but either assuming that the frequency of accesses are known or under specific assumptions such as probability of accesses being non-increasing over time. A recent proposal in this direction has been... |

236 | Optimal Prefetching via Data Compression
- Vitter, Krishnan
- 1991
Citation Context ...is is not difficult since we can index all the suffixes of the strings (à la String B-tree), but the caching strategy is unclear. Some open problems remain concerning the use of main memory as a cache. In [19] optimal prefetching algorithms have been presented and analyzed, deriving them from compression algorithms. The analysis or algorithms there might probably be extended to achieve tighter I/O-bounds. ... |

154 | Fractional cascading: I. A data structuring technique
- Chazelle, Guibas
- 1986
Citation Context ...ata structure, and (2) the deterministic part of every item's column acts as an anchor between adjacent bands thus resulting useful to make incremental searches among them (à la fractional cascading [5, 12]). More precisely, we have. The number of items which are born at a band B i is forced to be B i = 2 2 i 1 (i.e. it grows doubly exponentially in the band-number). In the lowest band B b(n) we eventua... |

149 |
Information Theory and Coding
- Abramson
- 1963
Citation Context ... search tree since the access cost above matches the information-theoretic lower bound for the total access time of any fixed search tree including one that knows the sequence of accesses ahead of time [1, 17]; the term $\sum_{i=1}^{n} q(i) \log(m/q(i))$ relates to the entropy of the sequence of accesses. Proving a similar theorem of optimality against dynamic search trees (the so called Dynamic Optimality Theorem) ... |

133 | Concurrent cache-oblivious B-trees
- Bender, Fineman, et al.
- 2005
Citation Context ...een recent focus on designing "cache-oblivious" algorithms, that is, ones which work without knowledge of the parameters of the memory hierarchy (such as $B$). Extending the cache-oblivious B-trees of [2], we can obtain cache-oblivious algorithms for the string dictionary problem we have studied here. The details are not straightforward and leave its discussion for later. Finally, there are other form... |

120 | The string B-tree: a new data structure for string search in external memory and its applications
- Ferragina, Grossi
- 1999
Citation Context ...generalized to hold for external-memory string access. In contrast to this bound, using string data structures like String B-trees that give the best known performance per operation in the worst case [7], one would have achieved an I/O-bound of $O\left(\sum_{j=1}^{m} \frac{|S_{i_j}|}{B} + m \log_B n\right)$, which is not optimal with respect to the distribution of the strings in the query sequence. 2. We use the paradigm that... |

24 |
Dynamic fractional cascading. Algorithmica 5(2):215–241
- Mehlhorn, Näher
- 1990
Citation Context ...ata structure, and (2) the deterministic part of every item's column acts as an anchor between adjacent bands thus resulting useful to make incremental searches among them (à la fractional cascading [5, 12]). More precisely, we have. The number of items which are born at a band B i is forced to be B i = 2 2 i 1 (i.e. it grows doubly exponentially in the band-number). In the lowest band B b(n) we eventua... |

19 | Static optimality and dynamic search-optimality in lists and trees
- Blum, Chawla, et al.
Citation Context ...The details are not straightforward and leave its discussion for later. Finally, there are other forms of self-adjusting results that will be of interest: working set theorem [17], dynamic optimality [3, 17], and reference locality [6]. SASL allows us to also answer range queries on pairs of strings $(S, S')$: retrieve all the dictionary words which lexicographically lie between $S$ and $S'$. The algorithm... |

15 | Topology B-trees and their applications
- Callahan, Goodrich, et al.
- 1995
Citation Context .... The recomputation of these (r) MTF-ranks is done implicitly by means of an additional MTF-list. Consider generalizing the BSL for external memory operations. Say we adopt the bucketing strategy of [4] to make BSL work in external memory. BSL now consists of $O(\log_B n)$ levels and seems to take ($\log_B r$) expected amortized I/Os per operation. However this is not true because we need to update BSL 3... |

11 | Algorithm design and software libraries: Recent developments in the LEDA project
- Mehlhorn, Näher
- 1992
Citation Context ...up items of S. Self-adjusting skip list should "artificially" promote the frequently accessed items to the highest levels of L thus facilitating their retrieval. This has been achieved in various ways [6, 14, 13], but either assuming that the frequency of accesses are known or under specific assumptions such as probability of accesses being non-increasing over time. A recent proposal in this direction has been... |

9 | Self-adjusting k-ary search trees - Sherk - 1989 |

6 | Self-adjusting multi-way search trees - Martel - 1991 |

4 |
Data Structures and Algorithms: 1. Searching and Sorting
- Mehlhorn
- 1984
Citation Context ...e to compare the searched strings against the dictionary strings; the $\sum_{i=1}^{n} n_i \log_B \frac{m}{n_i}$ term is related to the entropy of the query sequence, and is a standard information-theoretic lower bound [11]. The B-way branching provided by the page size of a disk access is fully utilized, that is, $\log_2$ and $|S_i|$ are replaced by $\log_B$ and $|S_i|/B$ respectively. The total bound is analogous to that in S... |

3 |
Biased dictionaries with fast insert/deletes
- Ergun, Sahinalp, et al.
- 2001
Citation Context ...up items of S. Self-adjusting skip list should "artificially" promote the frequently accessed items to the highest levels of L thus facilitating their retrieval. This has been achieved in various ways [6, 14, 13], but either assuming that the frequency of accesses are known or under specific assumptions such as probability of accesses being non-increasing over time. A recent proposal in this direction has been... |

3 | Efficient techniques for maintaining multidimensional keys in linked data structures (extended abstract) - Grossi, Italiano - 1999 |
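
The citation context for Pugh's skip lists above describes levels in which each level holds a randomly chosen half of the items of the previous one. A minimal in-memory sketch of that plain (non-self-adjusting) skip list follows, using string keys compared lexicographically. It is not the self-adjusting structure of the report: the class and method names are hypothetical, and the promotion probability `p` is left as a parameter because the excerpt is cut off before stating the choice made for external memory.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[l] is the successor on level l


class SkipList:
    def __init__(self, p=0.5, max_height=32, seed=None):
        self.p = p                    # probability of promoting an item one level up
        self.max_height = max_height
        self.rng = random.Random(seed)
        self.head = Node(None, max_height)  # sentinel present on every level
        self.height = 1               # number of levels currently in use

    def _random_height(self):
        h = 1
        while h < self.max_height and self.rng.random() < self.p:
            h += 1
        return h

    def _predecessors(self, key):
        """For each level, the last node whose key is strictly smaller than `key`."""
        preds = [self.head] * self.max_height
        node = self.head
        for lvl in range(self.height - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
            preds[lvl] = node
        return preds

    def contains(self, key):
        cand = self._predecessors(key)[0].next[0]
        return cand is not None and cand.key == key

    def insert(self, key):
        preds = self._predecessors(key)
        cand = preds[0].next[0]
        if cand is not None and cand.key == key:
            return False              # already present
        h = self._random_height()
        self.height = max(self.height, h)
        node = Node(key, h)
        for lvl in range(h):          # splice the new column into each of its levels
            node.next[lvl] = preds[lvl].next[lvl]
            preds[lvl].next[lvl] = node
        return True

    def delete(self, key):
        preds = self._predecessors(key)
        target = preds[0].next[0]
        if target is None or target.key != key:
            return False
        for lvl in range(len(target.next)):  # unlink the column on each of its levels
            preds[lvl].next[lvl] = target.next[lvl]
        return True

    def keys(self):
        node, out = self.head.next[0], []
        while node is not None:
            out.append(node.key)
            node = node.next[0]
        return out
```

In this sketch an item's height is fixed at insertion time, so frequently queried strings gain no advantage; promoting such items to higher levels is the part the self-adjusting variants discussed in the citation contexts address.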