An analysis of the height of tries with random weights on the edges
 Combinatorics, Probability and Computing
Abstract

Cited by 2 (2 self)
We analyze the weighted height of random tries built from independent strings of i.i.d. symbols on the finite alphabet {1,..., d}. The edges receive random weights whose distribution depends upon the number of strings that visit that edge. Such a model covers the hybrid tries of de la Briandais (1959) and the TST of Bentley and Sedgewick (1997), where the search time for a string can be decomposed as a sum of processing times for each symbol in the string. Our weighted trie model also permits one to study maximal path imbalance. In all cases, the weighted height is shown to be asymptotic to c log n in probability, where c is determined by the behavior of the core of the trie (the part where all nodes have a full set of children) and the fringe of the trie (the part of the trie where nodes have only one child and form spaghetti-like trees). It can be found by maximizing a function that is related to the Cramér exponent of the distribution of the edge weights.
(Un)Expected Behavior of Digital Search Tree Profile
Abstract

Cited by 2 (0 self)
A digital search tree (DST), one of the most fundamental data structures on words, is a digital tree in which keys (strings, words) are stored directly in (internal) nodes. Such trees find myriad applications, from the popular Lempel-Ziv'78 data compression scheme to distributed hash tables. The profile of a DST measures the number of nodes at the same distance from the root; it is a function of the number of stored strings and the distance from the root. Most parameters of a DST (e.g., height, fill-up level) can be expressed in terms of the profile. However, from the inception of DSTs, the analysis of the profile has been elusive, and it has become a prominent open problem in the area of analysis of algorithms. We take here a first but decisive step towards solving this problem. We present a precise analysis of the average profile when stored strings are generated by a biased memoryless source. The main technical difficulty of analyzing the profile lies in solving a sophisticated recurrence equation. We present such a solution for the Poissonized version of the problem (i.e., when the number of stored strings is generated by a Poisson distribution) in the Mellin transform domain. To accomplish it, we introduce a novel functional operator that allows
On unary nodes in tries
 In 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’10), Discrete Math. Theor. Comput. Sci. Proc., AM. Assoc. Discrete
, 2010
Abstract

Cited by 2 (0 self)
The difference between ordinary tries and Patricia tries lies in the fact that all unary nodes are removed in the latter. Their average number is thus easily determined from earlier results on the size of tries/Patricia tries. In a well-known contention resolution algorithm, whose probabilistic model is essentially equivalent to tries, unary nodes correspond to repetitions, i.e., steps in the algorithm that do not resolve anything at all. In this paper, we take an individual's view on such repetitions: we consider the distribution of the number of repetitions a certain contender encounters in the course of the algorithm, which is equivalent to the number of unary nodes on the path from the root to a random string in a trie. We encounter an example of a sequence of distributions that does not actually converge to a limit distribution, but rather oscillates around a (discrete) limit distribution.
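As a rough illustration of the quantity studied in this abstract, the following sketch counts the unary nodes on the path from the root to one chosen string in a trie. The function name and the input conventions are assumptions made for this example, not taken from the paper: the strings are assumed to be pairwise distinct and long enough to be separated from one another, and the trie is never built explicitly (only the group of strings sharing the current prefix is tracked).

```python
def unary_nodes_on_path(strings, target_index):
    """Count unary internal nodes on the root-to-leaf path of one string.

    Hypothetical helper: `strings` are assumed pairwise distinct and long
    enough that the target is eventually separated from all others.
    """
    target = strings[target_index]
    group = list(strings)  # strings sharing the current prefix of the target
    depth = 0
    unary = 0
    # An internal node exists as long as at least two strings share the prefix.
    while len(group) > 1:
        same = [s for s in group if s[depth] == target[depth]]
        # If every string continues with the same symbol, the node has a
        # single child: nothing is resolved at this step (a "repetition").
        if len(same) == len(group):
            unary += 1
        group = same
        depth += 1
    return unary
```

For the strings "000", "001", "100", the path to "000" passes through one unary node (after the first symbol, both remaining strings continue with "0"), while the path to "100" passes through none.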
On the Average Profile of Symmetric Digital Search Trees, preprint
, 2008
Abstract

Cited by 1 (1 self)
Digital Search Trees (DST) are one of the most popular data structures storing keys, usually represented by strings. The profile of a digital search tree is a parameter that counts the number of nodes at the same distance from the root. It is a function of the number of nodes and the distance from the root. Several, if not all, tree parameters such as height, size, depth, shortest path, and fill-up level can be uniformly analyzed through the profile. In this note we analyze asymptotically the average profile for symmetric digital search trees in which keys are generated by an unbiased memoryless source.
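To make the notion of a profile concrete, here is a minimal sketch that inserts binary keys into a digital search tree and returns the number of nodes at each level. The function name, the list-based node layout, and the use of '0'/'1' strings as keys are assumptions made for this example; keys are assumed long enough that insertion never runs out of bits.

```python
def dst_profile(keys):
    """Return profile[j] = number of DST nodes at distance j from the root.

    Hypothetical helper: each key is a string over {'0', '1'}, assumed long
    enough for its insertion path. A node is a list [key, left, right].
    """
    root = None
    profile = []
    for key in keys:
        depth = 0
        if root is None:
            root = [key, None, None]  # the first key is stored at the root
        else:
            node = root
            while True:
                # The bit at the current depth selects the left or right child.
                branch = 1 if key[depth] == "0" else 2
                depth += 1
                if node[branch] is None:
                    node[branch] = [key, None, None]  # store key in the free slot
                    break
                node = node[branch]
        if depth == len(profile):
            profile.append(0)
        profile[depth] += 1
    return profile
```

Inserting "00", "01", "10", "11" in that order places one key at the root, two at level 1, and one at level 2. Averaging such profiles over many random key sets (for instance, with bits drawn uniformly at random for the unbiased case) would give an empirical counterpart of the average profile discussed above.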
On Correlation Polynomials and Subword Complexity
Abstract

Cited by 1 (0 self)
We consider words with letters from a q-ary alphabet A. The kth subword complexity of a word w ∈ A* is the number of distinct subwords of length k that appear as contiguous subwords of w. We analyze subword complexity from both combinatorial and probabilistic viewpoints. Our first main result is a precise analysis of the expected kth subword complexity of a randomly chosen word w ∈ A^n. Our other main result describes, for w ∈ A*, the degree to which one understands the set of all subwords of w, provided that one knows only the set of all subwords of some particular length k. Our methods rely upon a precise characterization of overlaps between words of length k. We use three kinds of correlation polynomials of words of length k: unweighted correlation polynomials; correlation polynomials associated to a Bernoulli source; and generalized multivariate correlation polynomials. We survey previously known results about such polynomials, and we also present some new results concerning correlation polynomials.
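The definition of the kth subword complexity given in this abstract can be sketched directly. The function name below is an assumption made for illustration, and the approach is plain enumeration rather than anything taken from the paper.

```python
def subword_complexity(w: str, k: int) -> int:
    """Number of distinct contiguous subwords of length k occurring in w."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if k > len(w):
        return 0  # no subword of length k fits inside w
    # Collect every window of length k into a set to discard repeats.
    return len({w[i:i + k] for i in range(len(w) - k + 1)})
```

For example, the word "abab" has two distinct subwords of length 2 ("ab" and "ba"), while "abcab" has three ("ab", "bc", "ca").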
THE DEGREE PROFILE OF PÓLYA TREES
Abstract
We investigate the profile of random Pólya trees of size n when only nodes of degree d are counted in each level. It is shown that, as in the case where all nodes contribute to the profile, the suitably normalized profile process converges weakly to a Brownian excursion local time. Moreover, we investigate the joint distribution of the number of nodes of degree d1 and d2 in the levels of the tree.
On a Recurrence Arising in Graph Compression
, 2011
Abstract
In a recently proposed graphical compression algorithm [1], the following tree arose in the course of the analysis. The root contains n balls that are consequently distributed between two subtrees according to a simple rule: In each step, all balls independently move down to the left subtree (say with probability p) or the right subtree (with probability 1−p). A new node is created as long as there is at least one ball in that node. Furthermore, a nonnegative integer d is given, and at level d or greater one ball is removed from the leftmost node before the balls move down to the next level. These steps are repeated until all balls are removed (i.e., after n + d steps). Observe that when d = ∞ the above tree can be modeled as a trie that stores n independent sequences generated by a memoryless source with parameter p. Therefore, we coin the name (n,d)-tries for the tree just described, to which we often refer simply as d-tries. Parameters of such a tree (e.g., path length, depth, size) are determined by an interesting two-dimensional recurrence (in terms of n and d) that, to the best of our knowledge, was not analyzed before. We study it, and show how much parameters of such a d-trie differ from the corresponding parameters of regular tries. We use methods of analytic algorithmics, from Mellin transforms to analytic poissonization.
Classification of Markov Sources Through Joint String Complexity: Theory and Experiments
Abstract
We propose a classification test to discriminate Markov sources [19] based on the joint string complexity. String complexity is defined as the cardinality of a set of all distinct words (factors) of a given string. For two strings, we define joint string complexity as the set of words that are common to both strings. In this paper we analyze the average joint complexity when both strings are generated by a Markov source and provide fast converging asymptotic expansions. We also present some experimental results showing its usefulness for text discrimination.
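The two definitions in this abstract can be illustrated with a brute-force sketch. The function names are assumptions made for this example, only non-empty factors are counted (also an assumption), and the joint complexity is taken here as the number of common factors. The enumeration is quadratic in the string length, so it suits short strings only; the paper's own methods are analytic and are not reproduced here.

```python
def factors(s: str) -> set:
    """Set of all distinct non-empty contiguous substrings (factors) of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def string_complexity(s: str) -> int:
    """Number of distinct non-empty factors of s."""
    return len(factors(s))


def joint_string_complexity(a: str, b: str) -> int:
    """Number of distinct non-empty factors common to both a and b."""
    return len(factors(a) & factors(b))
```

For instance, "aba" has five distinct factors ("a", "b", "ab", "ba", "aba"), and it shares four of them ("a", "b", "ab", "ba") with "bab".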