Results 1-10 of 60
Burst Tries: A Fast, Efficient Data Structure for String Keys
 ACM Transactions on Information Systems
, 2002
Cited by 39 (10 self)
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information such as counters. We propose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it requires no more memory than a binary tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near-sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
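The abstract does not spell out the mechanism, so the following is only a rough sketch of the general idea as it is commonly described: a trie whose leaves are small containers, where a container that grows past a limit is "burst" into a new trie node with smaller containers below it. The class names `BurstTrie` and `Container`, the `limit` parameter, and the use of plain dictionaries for containers are all assumptions made for illustration, not the paper's implementation or its tuned parameters.

```python
class Container(dict):
    """Leaf bucket mapping a remaining suffix to its occurrence count."""


class BurstTrie:
    """Illustrative burst trie that counts word occurrences.

    A trie node is a plain dict: single characters map to a child
    (another trie node or a Container), and the key "" holds the count
    of the word that ends exactly at this node.
    """

    def __init__(self, limit=4):
        self.limit = limit  # hypothetical burst threshold (distinct suffixes)
        self.root = {}

    def add(self, word):
        node, i = self.root, 0
        while True:
            if i == len(word):
                node[""] = node.get("", 0) + 1
                return
            ch = word[i]
            child = node.get(ch)
            if child is None:
                child = node[ch] = Container()
            if isinstance(child, Container):
                suffix = word[i + 1:]
                child[suffix] = child.get(suffix, 0) + 1
                if len(child) > self.limit:
                    node[ch] = self._burst(child)
                return
            node, i = child, i + 1

    def _burst(self, container):
        """Replace a container by a trie node with one container per lead character."""
        node = {}
        for suffix, count in container.items():
            if suffix == "":
                node[""] = count
            else:
                node.setdefault(suffix[0], Container())[suffix[1:]] = count
        return node

    def count(self, word):
        node = self.root
        for i, ch in enumerate(word):
            child = node.get(ch)
            if child is None:
                return 0
            if isinstance(child, Container):
                return child.get(word[i + 1:], 0)
            node = child
        return node.get("", 0)

    def items(self, node=None, prefix=""):
        """Yield (word, count) pairs in lexicographic (code point) order."""
        node = self.root if node is None else node
        if "" in node:
            yield prefix, node[""]
        for ch in sorted(k for k in node if k):
            child = node[ch]
            if isinstance(child, Container):
                for suffix in sorted(child):
                    yield prefix + ch + suffix, child[suffix]
            else:
                yield from self.items(child, prefix + ch)
```

In this sketch the containers are unsorted and only sorted when traversed, which is one way to read "sorted or near-sorted order": the trie levels fix the order of the leading characters, and each small container needs only a cheap local sort.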
Laws of large numbers and tail inequalities for random tries and Patricia trees
 Journal of Computational and Applied Mathematics
, 2002
Cited by 23 (6 self)
We consider random tries and random Patricia trees constructed from n independent strings of symbols drawn from any distribution on any discrete space. If H_n is the height of this tree, we show that H_n/E{H_n} tends to one in probability. Additional tail inequalities are given for the height, depth, size, and profile of these trees and ordinary tries that apply without any conditions on the string distributions; they need not even be identically distributed.
A probabilistic analysis of some tree algorithms
 ANNALS OF APPLIED PROBABILITY
, 2005
Cited by 21 (6 self)
In this paper a general class of tree algorithms is analyzed. It is shown that, by using an appropriate probabilistic representation of the quantities of interest, the asymptotic behavior of these algorithms can be obtained quite easily without resorting to the usual complex analysis techniques. This approach gives a unified probabilistic treatment of these questions. It simplifies and extends some of the results known in this domain.
Profile of Tries
, 2006
Cited by 21 (8 self)
Tries (from retrieval) are one of the most popular data structures on words. They are pertinent to (internal) structure of stored words and several splitting procedures used in diverse contexts. The profile of a trie is a parameter that represents the number of nodes (either internal or external) with the same distance from the root. It is a function of the number of strings stored in a trie and the distance from the root. Several, if not all, trie parameters such as height, size, depth, shortest path, and fill-up level can be uniformly analyzed through the (external and internal) profiles. Although profiles represent one of the most fundamental parameters of tries, they have hardly been studied in the past. The analysis of profiles is surprisingly arduous but once it is carried out it reveals unusually intriguing and interesting behavior. We present a detailed study of the distribution of the profiles in a trie built over random strings generated by a memoryless source. We first derive recurrences satisfied by the expected profiles and solve them asymptotically for all possible ranges of the distance from the root. It appears that profiles of tries exhibit several fascinating phenomena. When moving from the root to the leaves of a trie, the growth of the expected profiles varies. Near the root, the external profiles tend to zero at an exponential rate, then the rate gradually rises to being logarithmic; the external profiles then abruptly tend to infinity, first logarithmically
The Complete Analysis of a Polynomial Factorization Algorithm Over Finite Fields
, 2001
Cited by 20 (4 self)
This paper derives basic probabilistic properties of random polynomials over finite fields that are of interest in the study of polynomial factorization algorithms. We show that the main characteristics of random polynomials can be treated systematically by methods of "analytic combinatorics" based on the combined use of generating functions and of singularity analysis. Our object of study is the classical factorization chain which is described in Fig. 1 and which, despite its simplicity, does not appear to have been totally analysed so far. In this paper, we provide a complete average-case analysis.
Digital Trees and Memoryless Sources: from Arithmetics to Analysis
 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’10), Discrete Math. Theor. Comput. Sci. Proc
, 2010
Cited by 17 (1 self)
Digital trees, also known as “tries”, are fundamental to a number of algorithmic schemes, including radix-based searching and sorting, lossless text compression, dynamic hashing algorithms, communication protocols of the tree or stack type, distributed leader election, and so on. This extended abstract develops the asymptotic form of expectations of the main parameters of interest, such as tree size and path length. The analysis is conducted under the simplest of all probabilistic models; namely, the memoryless source, under which letters that data items are comprised of are drawn independently from a fixed (finite) probability distribution. The precise asymptotic structure of the parameters’ expectations is shown to depend on fine singular properties in the complex plane of a ubiquitous Dirichlet series. Consequences include the characterization of a broad range of asymptotic regimes for error terms associated with trie parameters, as well as a classification that depends on specific arithmetic properties, especially irrationality measures, of the sources under consideration.
The ND-Tree: A Dynamic Indexing Technique for Multidimensional Non-ordered Discrete Data Spaces
 In Proc. of VLDB
, 2003
Cited by 15 (8 self)
Similarity searches in multidimensional Non-ordered Discrete Data Spaces (NDDS) are becoming increasingly important for application areas such as genome sequence databases.
Continued Fractions, Comparison Algorithms, and Fine Structure Constants
, 2000
Cited by 14 (3 self)
There are known algorithms based on continued fractions for comparing fractions and for determining the sign of 2x2 determinants. The analysis of such extremely simple algorithms leads to an incursion into a surprising variety of domains. We take the reader through a light tour of dynamical systems (symbolic dynamics), number theory (continued fractions), special functions (multiple zeta values), functional analysis (transfer operators), numerical analysis (series acceleration), and complex analysis (the Riemann hypothesis). These domains all eventually contribute to a detailed characterization of the complexity of comparison and sorting algorithms, either on average or in probability.
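As an illustration of what comparing fractions with continued fractions can look like, here is a minimal sketch. It is not taken from the paper: the function name `compare_fractions` is hypothetical, and the code assumes nonnegative numerators and positive denominators. It expands both fractions in lockstep and stops at the first partial quotient that differs, flipping the sense of the comparison at each level because taking reciprocals reverses order.

```python
def compare_fractions(a, b, c, d):
    """Return -1, 0, or 1 as a/b is less than, equal to, or greater than c/d.

    Assumes a, c >= 0 and b, d > 0. Only integer division and remainders
    are used; the two fractions are never cross-multiplied.
    """
    sign = 1  # +1 while the current comparison has the same sense as the original
    while True:
        q1, r1 = divmod(a, b)
        q2, r2 = divmod(c, d)
        if q1 != q2:
            # First differing partial quotient decides the comparison.
            return sign if q1 > q2 else -sign
        if r1 == 0 and r2 == 0:
            return 0  # both expansions ended together: the fractions are equal
        if r1 == 0:
            return -sign  # a/b equals q1 exactly, c/d is strictly larger here
        if r2 == 0:
            return sign  # c/d equals q2 exactly, a/b is strictly larger here
        # Fractional parts r1/b and r2/d remain; compare their reciprocals
        # instead, which reverses the order.
        a, b, c, d = b, r1, d, r2
        sign = -sign
```

The loop performs the same divisions as the Euclidean algorithm on each fraction, so it terminates, and it typically stops early because two unrelated fractions tend to differ within the first few partial quotients.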
Distributional convergence for the number of symbol comparisons used by QuickSort
, 2012
Cited by 13 (4 self)
Most previous studies of the sorting algorithm QuickSort have used the number of key comparisons as a measure of the cost of executing the algorithm. Here we suppose that the n independent and identically distributed (iid) keys are each represented as a sequence of symbols from a probabilistic source and that QuickSort operates on individual symbols, and we measure the execution cost as the number of symbol comparisons. Assuming only a mild “tameness” condition on the source, we show that there is a limiting distribution for the number of symbol comparisons after normalization: first centering by the mean and then dividing by n. Additionally, under a condition that grows more restrictive as p increases, we have convergence of moments of orders p and smaller. In particular, we have convergence in distribution and convergence of moments of every order whenever the source is memoryless, i.e., whenever each key is generated as an infinite string of iid symbols. This is somewhat surprising: Even for the classical model that each key is an iid string of unbiased (“fair”) bits, the mean exhibits periodic fluctuations of order n.
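To make the cost measure concrete, the sketch below counts symbol comparisons rather than key comparisons in a simple QuickSort. It is only an illustration under stated assumptions: the names `compare_symbols` and `quicksort` are hypothetical, keys are finite strings rather than infinite symbol sequences, the first key is used as the pivot, and one symbol comparison is charged for every pair of symbols examined up to and including the first mismatch.

```python
def compare_symbols(x, y, counter):
    """Compare two strings symbol by symbol, adding to counter[0] per symbol pair."""
    for a, b in zip(x, y):
        counter[0] += 1
        if a != b:
            return -1 if a < b else 1
    # One string is a prefix of the other (or they are equal): shorter sorts first.
    return (len(x) > len(y)) - (len(x) < len(y))


def quicksort(keys, counter):
    """Return the keys in sorted order, accumulating the symbol cost in counter[0]."""
    if len(keys) <= 1:
        return list(keys)
    pivot, rest = keys[0], keys[1:]
    smaller, larger = [], []
    for key in rest:
        if compare_symbols(key, pivot, counter) < 0:
            smaller.append(key)
        else:
            larger.append(key)
    return quicksort(smaller, counter) + [pivot] + quicksort(larger, counter)


# Small usage example: three keys need 3 key comparisons but 4 symbol comparisons.
cost = [0]
result = quicksort(["ba", "ab", "aa"], cost)
print(result, cost[0])
```

Keys that share long prefixes make each key comparison more expensive, which is why the symbol count can behave differently from the classical key-comparison count.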
Universal Asymptotics for Random Tries and Patricia Trees
 Algorithmica
, 2004
Cited by 13 (3 self)
We consider random tries and random Patricia trees constructed from n independent strings of symbols drawn from any distribution on any discrete space. We show that many parameters Z_n of these random structures are universally stable in the sense that Z_n/E{Z_n} tends to one in probability. This occurs, for example, when Z_n is the height, the size, the depth of the last node added, the number of nodes at a given depth (also called the profile), the search time for a partial match, the stack size, or the number of nodes with k children. These properties are valid without any conditions on the string distributions.
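Several of the parameters named here (height, size, profile) are easy to measure empirically. The following is a minimal sketch, not drawn from the paper: `trie_stats` is a hypothetical helper, keys are assumed to be distinct strings none of which is a prefix of another, size is taken to mean the number of internal nodes, and the profile counts external nodes (leaves) per depth. The small experiment at the end uses fair random bits, which is only one of the many distributions the result covers.

```python
import random
from collections import Counter


def trie_stats(keys, depth=0, stats=None):
    """Recursively split keys on the symbol at `depth` and record trie parameters.

    Returns a dict with "size" (internal nodes), "height" (deepest leaf),
    and "profile" (Counter mapping depth to number of leaves at that depth).
    """
    if stats is None:
        stats = {"size": 0, "height": 0, "profile": Counter()}
    if len(keys) <= 1:
        if keys:
            # A single key is stored in an external node at this depth.
            stats["profile"][depth] += 1
            stats["height"] = max(stats["height"], depth)
        return stats
    stats["size"] += 1  # two or more keys share this prefix: internal node
    groups = {}
    for key in keys:
        groups.setdefault(key[depth], []).append(key)
    for group in groups.values():
        trie_stats(group, depth + 1, stats)
    return stats


if __name__ == "__main__":
    random.seed(1)
    heights = []
    for _ in range(20):
        # Distinct 64-bit keys written as bit strings (fair-bit source, an assumption).
        keys = list({format(random.getrandbits(64), "064b") for _ in range(2000)})
        heights.append(trie_stats(keys)["height"])
    mean = sum(heights) / len(heights)
    # Ratios of each observed height to the sample mean.
    print([round(h / mean, 2) for h in heights])
```

The printed ratios give a rough, informal view of how tightly one parameter clusters around its mean for a single choice of n and source; they do not substitute for the limit statement itself.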