Results 1–10 of 16
Burst Tries: A Fast, Efficient Data Structure for String Keys
 ACM Transactions on Information Systems
, 2002
Abstract

Cited by 28 (10 self)
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information such as counters. We propose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it requires no more memory than a binary tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near-sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
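As a rough illustration of the bursting idea described in the abstract above, here is a minimal Python sketch. The threshold value, the dict-based containers, and all names are illustrative assumptions; the paper's containers are linked lists or small trees and are carefully tuned.

```python
# Minimal burst-trie sketch: trie nodes route on the leading character;
# small containers hold (suffix -> count) pairs until they grow past LIMIT,
# at which point they "burst" into a new trie node one level down.
LIMIT = 4  # hypothetical burst threshold; the paper tunes this empirically

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode or Container
        self.eow_count = 0   # occurrences of the string ending at this node

class Container(dict):       # suffix -> count
    pass

def insert(node, s):
    if not s:
        node.eow_count += 1
        return
    child = node.children.get(s[0])
    if child is None:
        child = node.children[s[0]] = Container()
    if isinstance(child, TrieNode):
        insert(child, s[1:])
        return
    suffix = s[1:]
    child[suffix] = child.get(suffix, 0) + 1
    if len(child) > LIMIT:   # burst: redistribute suffixes by next character
        burst = TrieNode()
        for suf, cnt in child.items():
            if suf:
                c = burst.children.setdefault(suf[0], Container())
                c[suf[1:]] = c.get(suf[1:], 0) + cnt
            else:
                burst.eow_count += cnt
        node.children[s[0]] = burst

def count(node, s):
    while True:
        if not s:
            return node.eow_count
        child = node.children.get(s[0])
        if child is None:
            return 0
        if isinstance(child, Container):
            return child.get(s[1:], 0)
        node, s = child, s[1:]

root = TrieNode()
for w in ["apple", "ant", "axe", "area", "atom", "ant"]:
    insert(root, w)
```

Sorted output then falls out of walking trie children in key order and sorting each small container locally, which is what gives the near-sorted property at low cost.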
Computational bounds on hierarchical data processing with applications to information security
 In Proc. Int. Colloquium on Automata, Languages and Programming (ICALP), volume 3580 of LNCS
, 2005
Abstract

Cited by 19 (11 self)
Motivated by the study of algorithmic problems in the domain of information security, in this paper we study the complexity of a new class of computations over a collection of values associated with a set of n elements. We introduce hierarchical data processing (HDP) problems, which involve the computation of a collection of output values from an input set of n elements, where the entire computation is fully described by a directed acyclic graph (DAG). That is, individual computations are performed and intermediate values are processed according to the hierarchy induced by the DAG. We present an Ω(log n) lower bound on various computational cost measures for HDP problems. Essential in our study is an analogy that we draw between the complexity of any HDP problem of size n and searching by comparison in an ordered set of n elements, which shows an interesting connection between the two problems. In view of the logarithmic lower bounds, we also develop a new randomized DAG scheme for HDP problems that achieves cost measures whose constant factors on the (logarithmic) leading asymptotic term are close to optimal. Our lower bounds are general and apply to all HDP problems; along with our new DAG construction, they provide a theoretical framework that is both interesting and useful in the analysis of algorithms. We apply our results to two information security problems, data authentication through cryptographic hashing and multicast key distribution using key graphs, and obtain a unified analysis and treatment of these problems. We show that both problems involve HDP and prove logarithmic lower bounds on their computational and communication costs. In particular, using our new DAG scheme, we present a new efficient authenticated dictionary with improved authentication overhead over previously known schemes.
Moreover, through the relation between HDP and searching by comparison, we present a new skip-list version where the expected number of comparisons in a search is 1.25 log_2 n + O(1).
Self-Adjusting Trees in Practice for Large Text Collections
 Software: Practice and Experience
, 2002
Abstract

Cited by 13 (4 self)
Splay and randomised search trees are self-balancing binary tree structures with little or no space overhead compared to a standard binary search tree. Both trees are intended for use in applications where node accesses are skewed, for example in gathering the distinct words in a large text collection for index construction. We investigate the efficiency of these trees for such vocabulary accumulation. Surprisingly, unmodified splaying and randomised search trees are on average around 25% slower than using a standard binary tree. We investigate heuristics to limit splay tree reorganisation costs and show their effectiveness in practice. In particular, a periodic rotation scheme improves the speed of splaying by 27%, while other proposed heuristics are less effective. We also report the performance of efficient bitwise hashing and red-black trees for comparison.
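The periodic scheme mentioned in the abstract above can be sketched as follows: restructure only on every k-th access, and use plain BST descent in between. Everything here is an illustrative assumption, not the paper's code; in particular the restructuring shown is simple rotate-to-root, not full zig-zig splaying, and the period value is a guess rather than the paper's tuned setting.

```python
# Vocabulary accumulation with splay costs limited by a periodic scheme:
# restructure the tree only on every `period`-th access.

class Node:
    __slots__ = ("key", "count", "left", "right")
    def __init__(self, key):
        self.key, self.count = key, 1
        self.left = self.right = None

def rotate_to_root(root, key):
    # recursively bring `key` to the root using single rotations
    if root is None or root.key == key:
        return root
    if key < root.key and root.left is not None:
        root.left = rotate_to_root(root.left, key)
        if root.left.key == key:        # right rotation
            top = root.left
            root.left = top.right
            top.right = root
            return top
    elif key > root.key and root.right is not None:
        root.right = rotate_to_root(root.right, key)
        if root.right.key == key:       # left rotation
            top = root.right
            root.right = top.left
            top.left = root
            return top
    return root

class PeriodicSplayVocab:
    def __init__(self, period=8):       # hypothetical restructuring period
        self.root = None
        self.period = period
        self.accesses = 0

    def add(self, word):
        self.accesses += 1
        if self.root is None:
            self.root = Node(word)
            return
        cur = self.root
        while True:                      # plain BST descent between restructurings
            if word == cur.key:
                cur.count += 1
                break
            nxt = cur.left if word < cur.key else cur.right
            if nxt is None:
                nxt = Node(word)
                if word < cur.key:
                    cur.left = nxt
                else:
                    cur.right = nxt
                break
            cur = nxt
        if self.accesses % self.period == 0:
            self.root = rotate_to_root(self.root, word)

    def count(self, word):
        cur = self.root
        while cur is not None:
            if word == cur.key:
                return cur.count
            cur = cur.left if word < cur.key else cur.right
        return 0

vocab = PeriodicSplayVocab(period=3)
for w in "the quick brown fox jumps over the lazy dog the".split():
    vocab.add(w)
```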
Randomized K-Dimensional Binary Search Trees
, 1998
Abstract

Cited by 12 (3 self)
This paper introduces randomized K-dimensional binary search trees (randomized Kd-trees), a variant of K-dimensional binary trees. This data structure allows the efficient maintenance of multidimensional records for any sequence of insertions and deletions, and thus is fully dynamic. We show that several types of associative queries are efficiently supported by randomized Kd-trees. In particular, a randomized Kd-tree with n records answers exact match queries in expected O(log n) time. Partial match queries are answered in expected O(n^{1-f(s/K)}) time when s out of K attributes are specified, with 0 < f(s/K) < 1 a real-valued function of s/K. Nearest neighbor queries are answered on-line in expected O(log n) time. Our randomized algorithms guarantee that their expected time bounds hold irrespective of the order and number of insertions and deletions. Keywords: Randomized Algorithms, Multidimensional Data Structures, Kd-trees, Associative Queries, Multidimensional Diction...
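For concreteness, a plain (non-randomized) Kd-tree insertion and exact-match search can be sketched as below. The paper's randomized variant additionally performs insertions and deletions at arbitrary tree positions with appropriate probabilities to keep the tree random; this sketch omits that machinery, and all names are assumptions.

```python
K = 2  # dimensionality of the records (illustrative)

class KdNode:
    def __init__(self, point, disc):
        self.point = point
        self.disc = disc          # discriminating coordinate, cycles with depth
        self.left = self.right = None

def insert(node, point, depth=0):
    # standard leaf insertion: compare on the node's discriminating coordinate
    if node is None:
        return KdNode(point, depth % K)
    if point[node.disc] < node.point[node.disc]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node

def exact_match(node, point):
    # follow one root-to-leaf path, deciding on one coordinate per level
    while node is not None:
        if node.point == point:
            return True
        if point[node.disc] < node.point[node.disc]:
            node = node.left
        else:
            node = node.right
    return False

root = None
for p in [(5, 3), (2, 7), (8, 1), (9, 6), (1, 2)]:
    root = insert(root, p)
```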
Martingales and large deviations for binary search trees
 Random Struct. Algorithms
, 2001
Abstract

Cited by 12 (1 self)
We establish an almost sure large deviations theorem for the depth of the external nodes of binary search trees (BST). To achieve this, a parametric family of martingales is introduced. This family also allows us to get asymptotic results on the number of external nodes at the deepest level.
Exploring the Duality Between Skip Lists and Binary Search Trees
, 2007
Abstract

Cited by 2 (0 self)
Although skip lists were introduced as an alternative to balanced binary search trees (BSTs), we show that the skip list can be interpreted as a type of randomly balanced BST whose simplicity and elegance is arguably on par with that of today’s most popular BST balancing mechanisms. In this paper, we provide a clear, concise description and analysis of the “BST” interpretation of the skip list, and compare it to similar randomized BST balancing mechanisms. In addition, we show that any rotation-based BST balancing mechanism can be implemented in a simple fashion using a skip list.
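For reference, a textbook skip list with geometrically distributed node heights looks as follows (a sketch, not the paper's code). In the duality the abstract describes, the tower of forward pointers at each node plays the role of that node's height in a randomly balanced BST.

```python
import random

MAX_LEVEL = 16
P = 0.5  # promotion probability; node heights are geometric with parameter P

class SkipNode:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height   # one forward pointer per level

class SkipList:
    def __init__(self):
        self.head = SkipNode(None, MAX_LEVEL)  # sentinel before all keys
        self.level = 1

    def _random_height(self):
        h = 1
        while random.random() < P and h < MAX_LEVEL:
            h += 1
        return h

    def search(self, key):
        x = self.head
        for i in range(self.level - 1, -1, -1):   # descend level by level
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
        x = x.forward[0]
        return x is not None and x.key == key

    def insert(self, key):
        update = [self.head] * MAX_LEVEL   # last node visited at each level
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x
        h = self._random_height()
        self.level = max(self.level, h)
        node = SkipNode(key, h)
        for i in range(h):                 # splice the new tower in
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
```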
Optimal space-time dictionaries over an unbounded universe with flat implicit trees
, 2003
Abstract

Cited by 2 (1 self)
In the classical dictionary problem, a set of n distinct keys over an unbounded and ordered universe is maintained under insertions and deletions of individual keys while supporting search operations. An implicit dictionary has the additional constraint of occupying only the space required to store the n keys, that is, exactly n contiguous words of space in total. All that is known is the starting position of the memory segment hosting the keys, as the rest of the information is implicitly encoded by a suitable permutation of the keys. This paper describes the flat implicit tree, the first implicit dictionary requiring O(log n) time per search and update operation.
Redesigning the String Hash Table, Burst Trie, and BST to Exploit Cache
, 2011
Abstract

Cited by 2 (1 self)
A key decision when developing in-memory computing applications is the choice of a mechanism to store and retrieve strings. The most efficient current data structures for this task are the hash table with move-to-front chains and the burst trie, both of which use linked lists as a substructure, and variants of binary search tree. These data structures are computationally efficient, but typical implementations use large numbers of nodes and pointers to manage strings, which is not efficient in use of cache. In this article, we explore two alternatives to the standard representation: the simple expedient of including the string in its node, and, for linked lists, the more drastic step of replacing each list of nodes by a contiguous array of characters. Our experiments show that, for large sets of strings, the improvement is dramatic. For hashing, in the best case the total space overhead is reduced to less than 1 bit per string. For the burst trie, over 300MB of strings can be stored in a total of under 200MB of memory with significantly improved search time. These results, on a variety of data sets, show that cache-friendly variants of fundamental data structures can yield remarkable gains in performance.
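The "contiguous array of characters" idea can be sketched in a few lines of Python. All names, the table size, the hash function, and the length-prefix encoding are illustrative assumptions; the article works with C-style character buffers and also stores per-string counters, which are omitted here.

```python
NUM_SLOTS = 1024  # hypothetical table size

class ArrayBucketHash:
    """Each slot is one contiguous bytes buffer of length-prefixed strings,
    scanned linearly, instead of a linked list of heap-allocated nodes."""

    def __init__(self):
        self.slots = [b""] * NUM_SLOTS

    def _slot(self, s: bytes) -> int:
        h = 5381
        for b in s:                        # djb2-style bitwise string hash
            h = ((h * 33) ^ b) & 0xFFFFFFFF
        return h % NUM_SLOTS

    def contains(self, s: bytes) -> bool:
        buf = self.slots[self._slot(s)]
        pos = 0
        while pos < len(buf):              # walk the length-prefixed records
            n = buf[pos]
            if buf[pos + 1:pos + 1 + n] == s:
                return True
            pos += 1 + n
        return False

    def insert(self, s: bytes):
        # append a 1-byte length prefix plus the string (strings < 256 bytes)
        if not self.contains(s):
            self.slots[self._slot(s)] += bytes([len(s)]) + s

table = ArrayBucketHash()
for w in [b"cat", b"dog", b"cat", b"bird"]:
    table.insert(w)
```

A lookup touches one contiguous buffer rather than chasing pointers, which is the source of the cache friendliness the abstract describes.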
Additive functionals on random search trees
 Department of Mathematical Sciences, The Johns Hopkins University
, 2003
Abstract

Cited by 1 (1 self)
Search trees are fundamental data structures in computer science. We study functionals on random search trees that satisfy recurrence relations of a simple additive form. Many important functionals including the space requirement, internal path length, and the so-called shape functional fall under this framework. Our goal is to derive asymptotics of moments and identify limiting distributions of these functionals under two commonly studied probability models: the random permutation model and the uniform model. For the random permutation model, our approach is based on establishing transfer theorems that link the order of growth of the input into a particular (deterministic) recurrence to the order of growth of the output. For the uniform model, our approach is based on the complex-analytic tool of singularity analysis. To facilitate a systematic analysis of these additive functionals we extend singularity analysis, a class of methods by which one can translate, on a term-by-term basis, an asymptotic expansion of a function around its dominant singularity into a corresponding expansion for the Taylor coefficients of the function. The most important extension is the determination of how singularities are composed under the operation of Hadamard product of analytic power series. The transfer theorems derived are used in conjunction with the method of moments to establish limit laws for m-ary search trees under the random permutation model. For the uniform model on binary search trees, the extended singularity analysis toolkit is employed to establish the asymptotic behavior of the moments of a wide class of functionals. These asymptotics are used, again in conjunction with the method of moments, to derive limit laws.
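For binary search trees under the random permutation model, the "simple additive form" is conventionally written as the distributional recurrence below; the notation is the standard one in this literature and is supplied here as an assumption, not quoted from the paper.

```latex
% X_n: the functional on a random BST with n keys; I_n is the size of the
% left subtree, uniform on {0, 1, ..., n-1}; t_n is the "toll" added at the root.
X_n \overset{d}{=} X_{I_n} + \tilde{X}_{\,n-1-I_n} + t_n,
\qquad I_n \sim \operatorname{Uniform}\{0, 1, \dots, n-1\},
```

where $\tilde{X}$ is an independent copy of $X$; for example, the toll $t_n = n - 1$ recovers the internal path length.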
Digital Access to Comparison-Based Tree Data
 Department of Mathematical Sciences, The Johns Hopkins University
Abstract
This paper presents a simple method to build tree data structures which achieve just O(log N) visited nodes and O(D) compared digits (bits or bytes) per search or update, where N is the number of keys and D is the length of the keys, irrespective of the order of the updates and of the digital representation of the keys. The additional space required by the method is asymptotically negligible compared to the space for keys and pointers, and is easily updated online. The method applies to fixed-length base-2 keys and to variable-length string keys as well, and permits saving space for common prefixes. The same ideas can be applied to the sorting problem, yielding algorithms with the best properties of quicksort/mergesort and radix sort together.