Results 1–10 of 57
Burst Tries: A Fast, Efficient Data Structure for String Keys
ACM Transactions on Information Systems, 2002
Abstract
Cited by 31 (10 self)
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information such as counters. We propose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it requires no more memory than a binary tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near-sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
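A minimal Python sketch of the burst idea (my own illustration, not the paper's tuned implementation; the threshold value, the use of plain lists as containers, and all names are assumptions):

```python
# Burst-trie sketch: leaves are plain lists ("containers"); when a container
# exceeds BURST_LIMIT strings it "bursts" into a new trie node whose children
# are smaller containers keyed by the next character.

BURST_LIMIT = 4  # illustrative threshold; the paper tunes this experimentally

class BurstTrie:
    def __init__(self):
        self.children = {}   # char -> BurstTrie (trie node) or list (container)
        self.eos = 0         # count of strings ending exactly at this node

    def insert(self, s, depth=0):
        if depth == len(s):
            self.eos += 1
            return
        c = s[depth]
        child = self.children.get(c)
        if child is None:
            self.children[c] = [s[depth + 1:]]          # new container
        elif isinstance(child, list):
            child.append(s[depth + 1:])
            if len(child) > BURST_LIMIT:                # burst: container -> node
                node = BurstTrie()
                for suffix in child:
                    node.insert(suffix, 0)
                self.children[c] = node
        else:
            child.insert(s, depth + 1)

    def walk(self, prefix=""):
        """Yield the stored strings in sorted order (containers sorted lazily)."""
        for _ in range(self.eos):
            yield prefix
        for c in sorted(self.children):
            child = self.children[c]
            if isinstance(child, list):
                for suffix in sorted(child):
                    yield prefix + c + suffix
            else:
                yield from child.walk(prefix + c)

t = BurstTrie()
for w in ["trie", "tree", "burst", "burst", "bur", "b", "banana", "band"]:
    t.insert(w)
print(list(t.walk()))  # → ['b', 'banana', 'band', 'bur', 'burst', 'burst', 'tree', 'trie']
```

Note how the traversal recovers sorted order without any global sort, which is the property the abstract highlights against hash tables.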
A probabilistic analysis of some tree algorithms
Annals of Applied Probability, 2005
Abstract
Cited by 20 (6 self)
In this paper a general class of tree algorithms is analyzed. It is shown that, by using an appropriate probabilistic representation of the quantities of interest, the asymptotic behavior of these algorithms can be obtained quite easily without resorting to the usual complex analysis techniques. This approach gives a unified probabilistic treatment of these questions. It simplifies and extends some of the results known in this domain.
Laws of large numbers and tail inequalities for random tries and Patricia trees
Journal of Computational and Applied Mathematics, 2002
Abstract
Cited by 19 (5 self)
We consider random tries and random Patricia trees constructed from n independent strings of symbols drawn from any distribution on any discrete space. If Hn is the height of this tree, we show that Hn/E{Hn} tends to one in probability. Additional tail inequalities are given for the height, depth, size, and profile of these trees and ordinary tries that apply without any conditions on the string distributions—they need not even be identically distributed.
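The concentration of Hn around its mean is easy to observe empirically. A small Python simulation (my own illustration; the symmetric Bernoulli source and the 64-bit truncation of the infinite strings are assumptions):

```python
import random

def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i

def trie_height(n, bits=64, rng=random):
    """Height of the trie on n random binary strings: 1 + the longest common
    prefix over all pairs.  After sorting, that maximum is always attained by
    some adjacent pair, so n - 1 comparisons suffice."""
    keys = sorted("".join(rng.choice("01") for _ in range(bits)) for _ in range(n))
    return 1 + max(lcp(keys[i], keys[i + 1]) for i in range(n - 1))

random.seed(0)
for n in (50, 200, 800):
    hs = [trie_height(n) for _ in range(25)]
    mean = sum(hs) / len(hs)
    spread = max(abs(h - mean) / mean for h in hs)
    print(f"n={n:4d}  mean height={mean:5.2f}  max relative deviation={spread:.2f}")
```

For the fair-bit source the mean grows like 2 log2 n, and the relative deviations shrink as n grows, consistent with Hn/E{Hn} tending to one.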
Profile of Tries
2006
Abstract
Cited by 17 (7 self)
Tries (from retrieval) are one of the most popular data structures on words. They are pertinent to the (internal) structure of stored words and to several splitting procedures used in diverse contexts. The profile of a trie is a parameter that represents the number of nodes (either internal or external) with the same distance from the root. It is a function of the number of strings stored in a trie and the distance from the root. Several, if not all, trie parameters such as height, size, depth, shortest path, and fill-up level can be uniformly analyzed through the (external and internal) profiles. Although profiles represent one of the most fundamental parameters of tries, they have hardly been studied in the past. The analysis of profiles is surprisingly arduous, but once it is carried out it reveals unusually intriguing and interesting behavior. We present a detailed study of the distribution of the profiles in a trie built over random strings generated by a memoryless source. We first derive recurrences satisfied by the expected profiles and solve them asymptotically for all possible ranges of the distance from the root. It appears that profiles of tries exhibit several fascinating phenomena. When moving from the root to the leaves of a trie, the growth of the expected profiles varies. Near the root, the external profiles tend to zero at an exponential rate; the rate then gradually rises to logarithmic; the external profiles then abruptly tend to infinity, first logarithmically ...
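The external profile is straightforward to compute on simulated data. A short Python sketch (my own; the Bernoulli parameter and the truncation length are arbitrary choices) builds a trie over strings from a memoryless source and tallies leaves by depth:

```python
import random
from collections import Counter

def external_profile(keys, depth=0, profile=None):
    """profile[k] = number of external nodes (leaves holding exactly one
    string) at distance k from the root of the trie on `keys`.
    Assumes the keys are distinct (true w.h.p. for long random strings)."""
    if profile is None:
        profile = Counter()
    if len(keys) <= 1:
        if keys:
            profile[depth] += 1
        return profile
    buckets = {}
    for s in keys:
        buckets.setdefault(s[depth], []).append(s)
    for group in buckets.values():
        external_profile(group, depth + 1, profile)
    return profile

random.seed(1)
n, length, p = 500, 64, 0.7   # memoryless source: P('0') = 0.7
keys = list({"".join("0" if random.random() < p else "1" for _ in range(length))
             for _ in range(n)})
prof = external_profile(keys)
for k in sorted(prof):
    print(f"depth {k:2d}: {prof[k]} external nodes")
```

Plotting the counts against depth makes the behavior described above visible: near-empty levels close to the root, then a sharp rise around the typical depth.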
The Complete Analysis of a Polynomial Factorization Algorithm Over Finite Fields
2001
Abstract
Cited by 16 (3 self)
This paper derives basic probabilistic properties of random polynomials over finite fields that are of interest in the study of polynomial factorization algorithms. We show that the main characteristics of random polynomials can be treated systematically by methods of "analytic combinatorics" based on the combined use of generating functions and of singularity analysis. Our object of study is the classical factorization chain, which is described in Fig. 1 and which, despite its simplicity, does not appear to have been fully analysed so far. In this paper, we provide a complete average-case analysis.
The ND-Tree: A Dynamic Indexing Technique for Multidimensional Non-ordered Discrete Data Spaces
In Proc. of VLDB, 2003
Abstract
Cited by 14 (8 self)
Similarity searches in multidimensional Non-ordered Discrete Data Spaces (NDDS) are becoming increasingly important for application areas such as genome sequence databases.
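As a baseline for what such an index accelerates, here is a brute-force Hamming range query in Python (my own sketch; the ND-tree itself prunes subtrees using bounding geometry in the discrete space, which this linear scan does not attempt):

```python
def hamming(a, b):
    """Hamming distance: number of positions where the vectors differ,
    a natural similarity measure in a non-ordered discrete data space."""
    return sum(x != y for x, y in zip(a, b))

def range_query(db, q, r):
    """Brute-force range query: all vectors within Hamming distance r of q.
    An NDDS index exists precisely to answer this without scanning db."""
    return [v for v in db if hamming(v, q) <= r]

db = ["ACGT", "ACGA", "TCGA", "GGGG"]
print(range_query(db, "ACGA", 1))   # → ['ACGT', 'ACGA', 'TCGA']
```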
Continued Fractions, Comparison Algorithms, and Fine Structure Constants
2000
Abstract
Cited by 11 (2 self)
There are known algorithms based on continued fractions for comparing fractions and for determining the sign of 2×2 determinants. The analysis of such extremely simple algorithms leads to an incursion into a surprising variety of domains. We take the reader through a light tour of dynamical systems (symbolic dynamics), number theory (continued fractions), special functions (multiple zeta values), functional analysis (transfer operators), numerical analysis (series acceleration), and complex analysis (the Riemann hypothesis). These domains all eventually contribute to a detailed characterization of the complexity of comparison and sorting algorithms, either on average or in probability.
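A fraction-comparison algorithm of this kind is essentially a synchronized Euclidean division. A Python sketch (my own guess at one natural variant, for positive integer inputs only):

```python
def compare_fractions(a, b, c, d):
    """Sign of a/b - c/d for positive integers, found by comparing the
    continued-fraction digits of the two fractions one at a time.  This is
    also the sign of the 2x2 determinant a*d - c*b, but no large products
    are ever formed.  The comparison flips direction at each level because
    x < y implies 1/x > 1/y for positive x, y."""
    sign = 1
    while True:
        qa, ra = divmod(a, b)
        qc, rc = divmod(c, d)
        if qa != qc:                 # continued-fraction digits differ
            return sign if qa > qc else -sign
        if ra == 0 and rc == 0:
            return 0
        if ra == 0:                  # a/b terminated first: it is smaller
            return -sign
        if rc == 0:
            return sign
        # compare ra/b with rc/d  <=>  compare b/ra with d/rc, reversed
        a, b, c, d, sign = b, ra, d, rc, -sign

print(compare_fractions(22, 7, 355, 113))   # → 1, since 22/7 > 355/113
```

Each loop iteration is one Euclidean division step on each operand, so the running time is governed by continued-fraction statistics, which is exactly where the dynamical-systems analysis above enters.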
Distributional convergence for the number of symbol comparisons used by QuickSort
2012
Abstract
Cited by 10 (3 self)
Most previous studies of the sorting algorithm QuickSort have used the number of key comparisons as a measure of the cost of executing the algorithm. Here we suppose that the n independent and identically distributed (iid) keys are each represented as a sequence of symbols from a probabilistic source and that QuickSort operates on individual symbols, and we measure the execution cost as the number of symbol comparisons. Assuming only a mild “tameness” condition on the source, we show that there is a limiting distribution for the number of symbol comparisons after normalization: first centering by the mean and then dividing by n. Additionally, under a condition that grows more restrictive as p increases, we have convergence of moments of orders p and smaller. In particular, we have convergence in distribution and convergence of moments of every order whenever the source is memoryless, i.e., whenever each key is generated as an infinite string of iid symbols. This is somewhat surprising: Even for the classical model that each key is an iid string of unbiased (“fair”) bits, the mean exhibits periodic fluctuations of order n.
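The cost measure is easy to instrument. A Python sketch (my own; finite random bit strings stand in for the paper's infinite keys, and the first element is used as pivot):

```python
import random

comparisons = 0   # global tally of symbol comparisons

def compare(x, y):
    """Lexicographic comparison performed symbol by symbol; every symbol
    inspected counts one unit toward the cost measure."""
    global comparisons
    for cx, cy in zip(x, y):
        comparisons += 1
        if cx != cy:
            return -1 if cx < cy else 1
    return (len(x) > len(y)) - (len(x) < len(y))

def quicksort(keys):
    """Plain QuickSort with the first element as pivot."""
    if len(keys) <= 1:
        return list(keys)
    pivot, less, geq = keys[0], [], []
    for k in keys[1:]:
        (less if compare(k, pivot) < 0 else geq).append(k)
    return quicksort(less) + [pivot] + quicksort(geq)

random.seed(3)
n = 400
keys = ["".join(random.choice("01") for _ in range(40)) for _ in range(n)]
out = quicksort(keys)
assert out == sorted(keys)
print(f"{n} keys sorted with {comparisons} symbol comparisons")
```

Comparing the tally against the familiar key-comparison count for the same run makes the extra per-comparison symbol cost concrete.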
Size and Path Length of Patricia Tries: Dynamical Sources Context
2001
Abstract
Cited by 10 (1 self)
Digital trees, also known as tries, and Patricia tries are flexible data structures that occur in a variety of computer and communication algorithms including dynamic hashing, partial match retrieval, searching and sorting, conflict resolution algorithms for broadcast communication, data compression, and so forth. We consider here tries and Patricia tries built from n words emitted by a probabilistic dynamical source. Such sources encompass classical models such as memoryless sources and finite Markov chains, and many more. The probabilistic behavior of the main parameters, namely the size and path length, appears to be determined by some intrinsic characteristics of the source, namely the entropy and two other constants, themselves related in a natural way to spectral properties of specific transfer operators of Ruelle type. Keywords: average-case analysis of data structures, information theory, trie, Mellin analysis, dynamical systems, Ruelle operator, functional analysis.
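Both parameters can be measured directly on simulated input. A Python sketch (my own; a symmetric binary memoryless source stands in for the general dynamical source) computes size and external path length for a trie and for its Patricia compaction:

```python
import random

def trie_stats(keys, depth=0):
    """(number of internal nodes, external path length) of the trie on
    distinct keys; a leaf's depth is where its subset becomes a singleton."""
    if len(keys) <= 1:
        return 0, depth * len(keys)
    buckets = {}
    for s in keys:
        buckets.setdefault(s[depth], []).append(s)
    size, epl = 1, 0
    for group in buckets.values():
        sz, pl = trie_stats(group, depth + 1)
        size, epl = size + sz, epl + pl
    return size, epl

def patricia_stats(keys, depth=0, pdepth=0):
    """Same statistics for the Patricia trie: unary internal nodes are
    collapsed, so only genuinely branching nodes count or add depth."""
    if len(keys) <= 1:
        return 0, pdepth * len(keys)
    buckets = {}
    for s in keys:
        buckets.setdefault(s[depth], []).append(s)
    if len(buckets) == 1:                    # unary node: path-compressed away
        (group,) = buckets.values()
        return patricia_stats(group, depth + 1, pdepth)
    size, epl = 1, 0
    for group in buckets.values():
        sz, pl = patricia_stats(group, depth + 1, pdepth + 1)
        size, epl = size + sz, epl + pl
    return size, epl

random.seed(7)
n = 300
keys = list({"".join(random.choice("01") for _ in range(64)) for _ in range(n)})
print("trie     (size, path length):", trie_stats(keys))
print("patricia (size, path length):", patricia_stats(keys))
```

Over a binary alphabet the Patricia trie on n distinct keys always has exactly n - 1 internal nodes, so only its path length carries source-dependent information, while the plain trie's size also fluctuates with the source.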
The Number of Symbol Comparisons in QuickSort and QuickSelect
Abstract
Cited by 9 (3 self)
We revisit the classical QuickSort and QuickSelect algorithms, under a complexity model that fully takes into account the elementary comparisons between symbols composing the records to be processed. Our probabilistic models belong to a broad category of information sources that encompasses memoryless (i.e., independent-symbols) and Markov sources, as well as many unbounded-correlation sources. We establish that, under our conditions, the average-case complexity of QuickSort is O(n log² n) [rather than O(n log n), classically], whereas that of QuickSelect remains O(n). Explicit expressions for the implied constants are provided by our combinatorial–analytic methods.