Burst Tries: A Fast, Efficient Data Structure for String Keys
ACM Transactions on Information Systems, 2002
Cited by 28 (10 self)
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information such as counters. We propose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it requires no more memory than a binary tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near-sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
On the Distribution for the Duration of a Randomized Leader Election Algorithm
Ann. Appl. Probab., 1996
Cited by 28 (10 self)
We investigate the duration of an elimination process for identifying a winner by coin tossing, or, equivalently, the height of a random incomplete trie. Applications of the process include the election of a leader in a computer network. Using direct probabilistic arguments we obtain exact expressions for the discrete distribution and the moments of the height. Elementary approximation techniques then yield asymptotics for the distribution. We show that no limiting distribution exists, as the asymptotic expressions exhibit periodic fluctuations. In many similar problems associated with digital trees, no such exact expressions can be derived. We therefore outline a powerful general approach, based on the analytic techniques of Mellin transforms, Poissonization, and de-Poissonization, from which distributional asymptotics for the height can also be derived. In fact, it was this complex variables approach that led to our original discovery of the exact distribution. Complex analysis metho...
Analysis of an Asymmetric Leader Election Algorithm
Electronic J. Combin., 1996
Cited by 27 (9 self)
We consider a leader election algorithm in which a set of distributed objects (people, computers, etc.) try to identify one object as their leader. The election process is randomized, that is, at every stage of the algorithm those objects that have survived so far flip a biased coin, and those that receive, say, a tail survive for the next round. The process continues until only one object remains. Our interest is in evaluating the limiting distribution and the first two moments of the number of rounds needed to select a leader. We establish precise asymptotics for the first two moments, and show that the asymptotic expression for the duration of the algorithm exhibits some periodic fluctuations and consequently no limiting distribution exists. These results are proved using analytical techniques from the precise analysis of algorithms, such as analytical poissonization and depoissonization, the Mellin transform, and complex analysis.
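The elimination process described in this abstract can be simulated directly. The following is a minimal sketch, not the authors' code: the names `election_rounds` and `p_tail` are hypothetical, and the rule that a round in which nobody throws a tail is void (so all current survivors flip again) is an assumption that the abstract does not state.

```python
import random


def election_rounds(n, p_tail=0.5, rng=None):
    """Simulate one election and return the number of rounds played.

    n      -- number of participants at the start (n >= 1)
    p_tail -- probability that a single coin flip shows a tail (0 < p_tail < 1)
    rng    -- optional random.Random instance, for reproducible runs
    """
    if n < 1:
        raise ValueError("need at least one participant")
    if not 0.0 < p_tail < 1.0:
        raise ValueError("p_tail must lie strictly between 0 and 1")
    rng = rng or random.Random()

    survivors = n
    rounds = 0
    while survivors > 1:
        rounds += 1
        # Every surviving participant flips the biased coin once.
        tails = sum(1 for _ in range(survivors) if rng.random() < p_tail)
        # ASSUMPTION: if nobody throws a tail, the round is void and the
        # same set of survivors flips again in the next round.
        if tails > 0:
            survivors = tails
    return rounds


if __name__ == "__main__":
    rng = random.Random(2024)
    samples = [election_rounds(1000, p_tail=0.5, rng=rng) for _ in range(2000)]
    print("empirical mean number of rounds:", sum(samples) / len(samples))
```

Averaging `election_rounds` over many runs for a range of `n` gives empirical moments that can be set against the asymptotic results the abstract refers to.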
Profile of Tries
2006
Cited by 18 (8 self)
Tries (from retrieval) are one of the most popular data structures on words. They are pertinent to the (internal) structure of stored words and to several splitting procedures used in diverse contexts. The profile of a trie is a parameter that represents the number of nodes (either internal or external) at the same distance from the root. It is a function of the number of strings stored in a trie and the distance from the root. Several, if not all, trie parameters such as height, size, depth, shortest path, and fill-up level can be uniformly analyzed through the (external and internal) profiles. Although profiles represent one of the most fundamental parameters of tries, they have hardly been studied in the past. The analysis of profiles is surprisingly arduous, but once it is carried out it reveals unusually intriguing and interesting behavior. We present a detailed study of the distribution of the profiles in a trie built over random strings generated by a memoryless source. We first derive recurrences satisfied by the expected profiles and solve them asymptotically for all possible ranges of the distance from the root. It appears that profiles of tries exhibit several fascinating phenomena. When moving from the root to the leaves of a trie, the growth of the expected profiles varies. Near the root, the external profiles tend to zero at an exponential rate, then the rate gradually rises to being logarithmic; the external profiles then abruptly tend to infinity, first logarithmically
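The definition of the profile given in this abstract can be made concrete with a small computation. The sketch below is illustrative only: the function name `trie_profiles` is hypothetical, and it assumes the input strings are distinct and that none is a prefix of another, so that every string ends up alone in its own external node.

```python
from collections import Counter


def trie_profiles(strings):
    """Return (external, internal) profiles as Counters keyed by depth.

    external[k] -- number of external nodes (each holding exactly one string)
                   at distance k from the root
    internal[k] -- number of internal (branching) nodes at distance k

    ASSUMPTION: the strings are distinct and prefix-free.
    """
    external, internal = Counter(), Counter()

    def visit(group, depth):
        if len(group) == 1:
            # A single remaining string is stored in an external node.
            external[depth] += 1
            return
        # Two or more strings share this prefix: an internal node.
        internal[depth] += 1
        children = {}
        for s in group:
            children.setdefault(s[depth], []).append(s)
        for child in children.values():
            visit(child, depth + 1)

    if strings:
        visit(list(strings), 0)
    return external, internal


if __name__ == "__main__":
    ext, inn = trie_profiles(["000", "001", "01", "1"])
    print(dict(ext))  # {1: 1, 2: 1, 3: 2}
    print(dict(inn))  # {0: 1, 1: 1, 2: 1}
```

For the four binary strings in the example, the external profile has one node at depth 1, one at depth 2, and two at depth 3, while each of depths 0, 1, and 2 holds one internal node. The height of this trie (3) is the largest depth with a nonzero external profile, which shows how a parameter such as height can be read off the profile.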
Laws of large numbers and tail inequalities for random tries and Patricia trees
Journal of Computational and Applied Mathematics, 2002
Cited by 15 (5 self)
We consider random tries and random patricia trees constructed from n independent strings of symbols drawn from any distribution on any discrete space. If H_n is the height of this tree, we show that H_n/E{H_n} tends to one in probability. Additional tail inequalities are given for the height, depth, size, and profile of these trees and ordinary tries that apply without any conditions on the string distributions; the strings need not even be identically distributed.
Asymptotic Behavior Of The Height In A Digital Search Tree And The Longest Phrase Of The Lempel-Ziv Scheme
SIAM J. Computing, 2000
Cited by 11 (5 self)
We study the height of a digital search tree (DST) built from n random strings generated by an unbiased memoryless source (i.e., all symbols are equally likely). We shall argue that the height of such a tree is equivalent to the length of the longest phrase in the Lempel-Ziv parsing scheme that partitions a random sequence into n phrases. We also analyze the longest phrase in the Lempel-Ziv scheme in which a string of fixed length m is parsed into a random number of phrases. In the course of our analysis, we shall identify four natural regions of the height distribution and characterize them asymptotically for large n. In particular, for the region where most of the probability mass is concentrated, the asymptotic distribution of the height exhibits an exponential of a Gaussian distribution (with an oscillating term) around the most probable value $k_1 = \lfloor \log_2 n + \sqrt{2 \log_2 n} - \log_2(\sqrt{2 \log_2 n}) + \frac{1}{\log 2} - \frac{1}{2} \rfloor + 1$. More precisely, we shall prove that the asymptoti...
Size and Path length of Patricia Tries: Dynamical Sources Context.
2001
Cited by 8 (1 self)
Digital trees, also known as tries, and Patricia tries are flexible data structures that occur in a variety of computer and communication algorithms including dynamic hashing, partial match retrieval, searching and sorting, conflict resolution algorithms for broadcast communication, data compression, and so forth. We consider here tries and Patricia tries built from $n$ words emitted by a probabilistic dynamical source. Such sources encompass classical models, such as memoryless sources and finite Markov chains, as well as many more. The probabilistic behavior of the main parameters, namely the size and path length, appears to be determined by some intrinsic characteristics of the source, namely the entropy and two other constants, themselves related in a natural way to spectral properties of specific transfer operators of Ruelle type. Keywords: Average-case analysis of data structures, Information Theory, Trie, Mellin analysis, Dynamical systems, Ruelle operator, Functional Analysis.
Universal Asymptotics for Random Tries and Patricia Trees
Algorithmica, 2004
Cited by 7 (2 self)
We consider random tries and random patricia trees constructed from n independent strings of symbols drawn from any distribution on any discrete space. We show that many parameters Z_n of these random structures are universally stable in the sense that Z_n/E{Z_n} tends to one in probability. This occurs, for example, when Z_n is the height, the size, the depth of the last node added, the number of nodes at a given depth (also called the profile), the search time for a partial match, the stack size, or the number of nodes with k children. These properties are valid without any conditions on the string distributions.
Limit Laws for Heights in Generalized Tries and PATRICIA Tries
1999
Cited by 6 (0 self)
We consider digital trees such as (generalized) tries and PATRICIA tries, built from n random strings generated by an unbiased memoryless source (i.e., all symbols are equally likely). We study limit laws of the height which is defined as the longest path in such trees. It turns out that this height also represents the number of random questions required to recognize n distinct objects. We shall identify three natural regions of the height distributions. For tries, in the region where most of the probability mass is concentrated, the asymptotic distribution is of extreme value type (i.e., double exponential distribution). Surprisingly enough, the height of the PATRICIA trie behaves quite differently in this region: It exhibits an exponential of a Gaussian distribution (with an oscillating term) around the most probable value $k_1 = \lfloor \log_2 n + \sqrt{2 \log_2 n} - \frac{3}{2} \rfloor + 1$. In fact, the asymptotic distribution of PATRICIA height concentrates on one or two points. For most n all the mass...
Asymptotic variance of random symmetric digital search trees
2009
Cited by 6 (5 self)
Asymptotics of the variances of many cost measures in random digital search trees are often notoriously messy and involved to obtain. A new approach is proposed to facilitate such an analysis for several shape parameters on random symmetric digital search trees. Our approach starts from a more careful normalization at the level of Poisson generating functions, which then provides an asymptotically equivalent approximation to the variance in question. Several new ingredients are also introduced such as a combined use of the Laplace and Mellin transforms and a simple, mechanical technique for justifying the analytic de-Poissonization procedures involved. The methodology we develop can be easily adapted to many other problems with an underlying binomial distribution. In particular, the less expected and somewhat surprising $n(\log n)^2$-variance for certain notions of total path-length is also clarified.