Results 1 -
8 of
8
Making Data Structures Persistent
, 1989
"... This paper is a study of persistence in data structures. Ordinary data structures are ephemeral in the sense that a change to the structure destroys the old version, leaving only the new version available for use. In contrast, a persistent structure allows access to any version, old or new, at any t ..."
Abstract
-
Cited by 231 (6 self)
- Add to MetaCart
This paper is a study of persistence in data structures. Ordinary data structures are ephemeral in the sense that a change to the structure destroys the old version, leaving only the new version available for use. In contrast, a persistent structure allows access to any version, old or new, at any time. We develop simple, systematic, and effiient techniques for making linked data structures persistent. We use our techniques to devise persistent forms of binary search trees with logarithmic access, insertion, and deletion times and O(1) space bounds for insertion and deletion.
Purely Functional Representations of Catenable Sorted Lists.
- In Proceedings of the 28th Annual ACM Symposium on Theory of Computing
, 1996
"... The power of purely functional programming in the construction of data structures has received much attention, not only because functional languages have many desirable properties, but because structures built purely functionally are automatically fully persistent: any and all versions of a structur ..."
Abstract
-
Cited by 16 (5 self)
- Add to MetaCart
The power of purely functional programming in the construction of data structures has received much attention, not only because functional languages have many desirable properties, but because structures built purely functionally are automatically fully persistent: any and all versions of a structure can coexist indefinitely. Recent results illustrate the surprising power of pure functionality. One such result was the development of a representation of double-ended queues with catenation that supports all operations, including catenation, in worst-case constant time [19].
Confluently Persistent Deques via Data-Structural Bootstrapping
- J. of Algorithms
, 1993
"... We introduce data-structural bootstrapping, a technique to design data structures recursively, and use it to design confluently persistent deques. Our data structure requires O(log 3 k) worstcase time and space per deletion, where k is the total number of deque operations, and constant worst-case t ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
We introduce data-structural bootstrapping, a technique to design data structures recursively, and use it to design confluently persistent deques. Our data structure requires O(log 3 k) worstcase time and space per deletion, where k is the total number of deque operations, and constant worst-case time and space for other operations. Further, the data structure allows a purely functional implementation, with no side effects. This improves a previous result of Driscoll, Sleator, and Tarjan. 1 An extended abstract of this paper was presented at the 4th ACM-SIAM Symposium on Discrete Algorithms, 1993. 2 Supported by a Fannie and John Hertz Foundation fellowship, National Science Foundation Grant No. CCR-8920505, and the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) under NSF-STC88-09648. 3 Also affiliated with NEC Research Institute, 4 Independence Way, Princeton, NJ 08540. Research at Princeton University partially supported by the National Science Foundatio...
Purely Functional, Real-Time Deques with Catenation
- Journal of the ACM
, 1999
"... We describe an efficient, purely functional implementation of deques with catenation. In addition to being an intriguing problem in its own right, finding a purely functional implementation of catenable deques is required to add certain sophisticated programming constructs to functional programming ..."
Abstract
-
Cited by 12 (2 self)
- Add to MetaCart
We describe an efficient, purely functional implementation of deques with catenation. In addition to being an intriguing problem in its own right, finding a purely functional implementation of catenable deques is required to add certain sophisticated programming constructs to functional programming languages. Our solution has a worst-case running time of O(1) for each push, pop, inject, eject and catenation. The best previously known solution has an O(log k) time bound for the k deque operation. Our solution is not only faster but simpler. A key idea used in our result is an algorithmic technique related to the redundant digital representations used to avoid carry propagation in binary counting.
Persistent data structures
- IN HANDBOOK ON DATA STRUCTURES AND APPLICATIONS, CRC PRESS 2001, DINESH MEHTA AND SARTAJ SAHNI (EDITORS) BOROUJERDI, A., AND MORET, B.M.E., "PERSISTENCY IN COMPUTATIONAL GEOMETRY," PROC. 7TH CANADIAN CONF. COMP. GEOMETRY, QUEBEC
, 1995
"... ..."
Storage and Retrieval of Highly Repetitive Sequence Collections ∗
"... A repetitive sequence collection is a set of sequences which are small variations of each other. A prominent example are genome sequences of individuals of the same or close species, where the differences can be expressed by short lists of basic edit operations. Flexible and efficient data analysis ..."
Abstract
-
Cited by 6 (6 self)
- Add to MetaCart
A repetitive sequence collection is a set of sequences which are small variations of each other. A prominent example are genome sequences of individuals of the same or close species, where the differences can be expressed by short lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is plausible using suffix trees. However, the suffix tree occupies much space, which very soon inhibits in-memory analyses. Recent advances in full-text indexing reduce the space of the suffix tree to, essentially, that of the compressed sequences, while retaining its functionality with only a polylogarithmic slowdown. However, the underlying compression model considers only the predictability of the next sequence symbol given the k previous ones, where k is a small integer. This is unable to capture longer-term repetitiveness. For example, r identical copies of an incompressible sequence will be incompressible under this model. We develop new static and dynamic full-text indexes that are able of capturing the fact that a collection is highly repetitive, and require space basically proportional to the length of one typical sequence plus the total number of edit operations. The new indexes can be plugged into a recent dynamic fully-compressed suffix tree, achieving full functionality for sequence analysis, while retaining the reduced space and the polylogarithmic slowdown. Our experimental results confirm the practicality of our proposal.
Storage and Retrieval of Individual Genomes
"... Abstract. A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
Abstract. A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. Flexible and efficient data analysis on a such typically huge collection is plausible using suffix trees. However, suffix tree occupies O(N log N) bits, which very soon inhibits in-memory analyses. Recent advances in full-text self-indexing reduce the space of suffix tree to O(N log σ) bits, where σ is the alphabet size. In practice, the space reduction is more than 10-fold, for example on suffix tree of Human Genome. However, this reduction factor remains constant when more sequences are added to the collection. We develop a new family of self-indexes suited for the repetitive sequence collection setting. Their expected space requirement depends only on the length n of the base sequence and the number s of variations in its repeated copies. That is, the space reduction factor is no longer constant, but depends on N/n. We believe the structures developed in this work will provide a fundamental basis for storage and retrieval of individual genomes as they become available due to rapid progress in the sequencing technologies.
Storage and Retrieval of Individual Genomes (Extended Abstract)
"... A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be express ..."
Abstract
- Add to MetaCart
A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. Flexible and efficient data analysis on a such typically huge collection is plausible using suffix trees. However, suffix tree occupies O(N log N) bits, which very soon inhibits in-memory analyses. Recent advances in full-text self-indexing reduce the space of suffix tree to O(N log σ) bits, where σ is the alphabet size. In practice, the space reduction is more than 10-fold for example on suffix tree of Human Genome. However, this reduction remains a constant factor when more sequences are added to the collection We develop a new self-index suited for the repetitive sequence collection setting. Its expected space requirement depends only on the length n of the base sequence and the number s of variations in its repeated copies. That is, the space reduction is no longer constant, but depends on N/n. We believe the structure developed in this work will provide a fundamental basis for storage and retrieval of individual genomes as they become available due to rapid progress in the sequencing technologies. 1

