The oscillatory distribution of distances in random tries
 ANNALS OF APPLIED PROBABILITY
, 2005
Cited by 9 (3 self)
We investigate $\Delta_n$, the distance between randomly selected pairs of nodes among n keys in a random trie, which is a kind of digital tree. Analytical techniques, such as the Mellin transform and an excursion between poissonization and depoissonization, capture small fluctuations in the mean and variance of these random distances. The mean increases logarithmically in the number of keys, but curiously enough the variance remains $O(1)$ as $n \to \infty$. It is demonstrated that the centered random variable $\Delta_n^* = \Delta_n - \lfloor 2 \log_2 n \rfloor$ does not have a limit distribution, but rather oscillates between two distributions.
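The quantity studied in this abstract can be explored empirically. The following is a minimal simulation sketch, not the authors' analysis. It assumes an unbiased binary trie, keys that are random bit strings of equal length, and that the two selected nodes are the leaves holding two distinct keys. The function names are illustrative only.

```python
import random

def lcp(a, b):
    """Length of the longest common prefix of two bit strings."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k

def leaf_depth(keys, i):
    """Depth of the leaf for keys[i]: length of its shortest unique prefix."""
    return 1 + max(lcp(keys[i], k) for j, k in enumerate(keys) if j != i)

def pair_distance(keys, i, j):
    """Edges on the path between two leaves; their common ancestor
    sits at depth lcp(keys[i], keys[j])."""
    return leaf_depth(keys, i) + leaf_depth(keys, j) - 2 * lcp(keys[i], keys[j])

def random_trie_distance(n, bits=64, rng=random):
    """Draw n >= 2 distinct random keys and return the distance
    between two of them chosen uniformly at random."""
    keys = set()
    while len(keys) < n:
        keys.add(format(rng.getrandbits(bits), f"0{bits}b"))
    keys = sorted(keys)
    i, j = rng.sample(range(n), 2)
    return pair_distance(keys, i, j)
```

Averaging `random_trie_distance(n)` over many runs for several values of n gives a rough picture of the logarithmic growth of the mean described above.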
Red-Black Trie Hashing
, 1995
Cited by 4 (0 self)
Trie hashing is a scheme, proposed by Litwin, for indexing records with very long alphanumeric keys. The records are grouped into buckets of capacity b and maintained on secondary storage. To retrieve a record, the memory-resident trie is traversed from the root to a leaf node where the address of the target bucket is found. Using the address found, the data bucket is read into memory and searched to determine the presence or absence of the record. The scheme, for all practical purposes, locates a record in one or two disk accesses. Unlike a trie, the scheme proposed suffers from potential degeneracy when the keys inserted are ordered and has an expensive reconstruction cost if a system failure occurs during a session. We present a new approach to implementing Trie Hashing that resolves the degeneracy problem. Our approach combines the basic trie hashing algorithm with the balancing techniques of the Red-Black Binary Search Tree, to produce a relatively balanced trie hashing scheme. As...
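The retrieval path described in this abstract (walk an in-memory trie to a leaf, read the bucket it points to, search the bucket) can be illustrated with a small sketch. This is a simplified stand-in, not Litwin's actual node format or comparison rule: the `Inner` and `Leaf` classes, the single-character split test, and the dictionary used as a simulated disk are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    bucket_addr: int      # address of a bucket on the (simulated) disk

@dataclass
class Inner:
    pos: int              # which character of the key to inspect
    split: str            # keys with key[pos] <= split go left
    left: "Node"
    right: "Node"

Node = Union[Leaf, Inner]

def find_bucket(root, key):
    """Walk the in-memory trie from the root to a leaf; no disk access."""
    node = root
    while isinstance(node, Inner):
        # A key shorter than pos is treated as sorting before any character.
        ch = key[node.pos] if node.pos < len(key) else ""
        node = node.left if ch <= node.split else node.right
    return node.bucket_addr

def lookup(root, disk, key):
    """Return True if key is stored, using one simulated bucket read."""
    addr = find_bucket(root, key)
    bucket = disk[addr]   # the only access to secondary storage
    return key in bucket
```

For example, with `disk = {0: ["apple", "kiwi"], 1: ["melon"], 2: ["pear", "zebra"]}` and `root = Inner(0, "l", Leaf(0), Inner(0, "o", Leaf(1), Leaf(2)))`, the call `lookup(root, disk, "melon")` walks right, then left, and reads bucket 1.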
An analysis of the height of tries with random weights on the edges
 Combinatorics, Probability and Computing
Cited by 2 (2 self)
We analyze the weighted height of random tries built from independent strings of i.i.d. symbols on the finite alphabet {1, ..., d}. The edges receive random weights whose distribution depends upon the number of strings that visit that edge. Such a model covers the hybrid tries of de la Briandais (1959) and the TST of Bentley and Sedgewick (1997), where the search time for a string can be decomposed as a sum of processing times for each symbol in the string. Our weighted trie model also permits one to study maximal path imbalance. In all cases, the weighted height is shown to be asymptotic to c log n in probability, where c is determined by the behavior of the core of the trie (the part where all nodes have a full set of children) and the fringe of the trie (the part of the trie where nodes have only one child and form spaghetti-like trees). It can be found by maximizing a function that is related to the Cramér exponent of the distribution of the edge weights.
Distances in random digital search trees
, 2006
Cited by 2 (1 self)
Distances between nodes in random trees is a popular topic, and several classes of trees have recently been investigated. We look into this matter in digital search trees. By analytic techniques, such as the Mellin Transform and poissonization, we describe a program to determine the moments of these distances. The program is illustrated on the mean and variance. One encounters delayed Mellin transform equations, which we solve by inspection. In addition to various asymptotics, we give an exact expression for the mean and for the variance in the unbiased case. Interestingly, the unbiased case gives a bounded variance, whereas the biased case gives a variance growing with the number of keys. It is therefore possible in the biased case to show that an appropriately normalized version of the distance converges to a limit. The complexity of moment calculation increases substantially with each higher moment; it is prudent to seek a shortcut to the limit via a method that avoids the computation of all moments. Toward this end, we utilize the contraction method to show that in biased digital search trees the distribution of a suitably normalized version of the distances approaches a limit that is the fixed-point solution of a distributional equation (distances being measured in the Wasserstein metric space). An explicit solution to the fixed-point equation is readily demonstrated to be Gaussian.
Distribution of internode distances in digital trees
 in 2005 International Conference on Analysis of Algorithms, C. Martínez (ed.), Discrete Mathematics and Theoretical Computer Science, Proceedings AD
, 2005
Cited by 1 (0 self)
We investigate distances between pairs of nodes in digital trees (digital search trees (DST), and tries). By analytic techniques, such as the Mellin Transform and poissonization, we describe a program to determine the moments of these distances. The program is illustrated on the mean and variance. One encounters delayed Mellin transform equations, which we solve by inspection. Interestingly, the unbiased case gives a bounded variance, whereas the biased case gives a variance growing with the number of keys. It is therefore possible in the biased case to show that an appropriately normalized version of the distance converges to a limit. The complexity of moment calculation increases substantially with each higher moment; a shortcut to the limit is needed via a method that avoids the computation of all moments. Toward this end, we utilize the contraction method to show that in biased digital search trees the distribution of a suitably normalized version of the distances approaches a limit that is the fixed-point solution (in the Wasserstein space) of a distributional equation. An explicit solution to the fixed-point equation is readily demonstrated to be Gaussian.
The total path length of split trees
, 2011
Cited by 1 (0 self)
We consider the model of random trees introduced by Devroye [SIAM J Comput 28, 409–432, 1998]. The model encompasses many important randomized algorithms and data structures. The pieces of data (items) are stored in a randomized fashion in the nodes of a tree. The total path length (sum of depths of the items) is a natural measure of the efficiency of the algorithm/data structure. Using renewal theory, we prove convergence in distribution of the total path length towards a distribution characterized uniquely by a fixed point equation. Our result covers, using a unified approach, many data structures such as binary search trees, m-ary search trees, quad trees, median-of-(2k + 1) trees, and simplex trees.
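As a concrete instance of the total path length defined in this abstract, the sketch below computes it for a plain binary search tree, one of the structures the abstract lists. It illustrates the definition only, not the paper's renewal-theory argument. The function name and the dictionary-based tree representation are assumptions, and the keys are assumed to be distinct.

```python
import random

def bst_total_path_length(keys):
    """Insert keys in the given order into an unbalanced binary search
    tree and return the sum of the depths of all items (root depth 0)."""
    left, right = {}, {}          # child links, indexed by parent key
    root = None
    total = 0
    for key in keys:
        if root is None:
            root = key            # the first key becomes the root
            continue
        node, depth = root, 0
        while key != node:        # a repeated key is ignored
            child = left if key < node else right
            depth += 1
            if node in child:
                node = child[node]        # descend one level
            else:
                child[node] = key         # attach the new item here
                total += depth
                break
    return total

# Usage: total path length of a BST built from a random permutation.
perm = list(range(1000))
random.shuffle(perm)
length = bst_total_path_length(perm)
```

Inserting `[2, 1, 3]` gives depths 0, 1, 1 and a total of 2, while the sorted order `[1, 2, 3]` degenerates into a path with total 3.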
Red-Black Balanced Trie Hashing
, 1995
Trie hashing is a scheme, proposed by Litwin, for indexing records with very long alphanumeric keys. The records are grouped into buckets of capacity b records per bucket and maintained on secondary storage. To retrieve a record, the memory-resident trie is traversed from the root to a leaf node where the address of the target bucket is found. Using the address found, the data bucket is read into memory and searched to determine the presence or absence of the record. The scheme, for all practical purposes, locates a record in one or two disk accesses. Unlike a trie, the scheme suffers from: i) potential degeneracy when the keys inserted are ordered, ii) expensive reconstruction cost if a system failure occurs during a session. We present a new approach to implementing Trie Hashing that resolves the problem of potential degeneracy. Our approach combines the basic trie hashing algorithm with the balancing techniques of the Red-Black Binary Search Tree, to produce a relatively balanced tr...
Efficient Discovery of Common Substructures in Macromolecules
 In IEEE Intl. Conference on Data Mining ’02
, 2002
Biological macromolecules play a fundamental role in disease; therefore, they are of great interest to fields such as pharmacology and chemical genomics. Yet due to macromolecules' complexity, development of effective techniques for elucidating structure-function macromolecular relationships has been ill explored. Previous techniques have either focused on sequence analysis, which only approximates structure-function relationships, or on small coordinate datasets, which does not scale to large datasets or handle noise. We present a novel scalable approach to efficiently discover macromolecule substructures based on three-dimensional coordinate data, without domain-specific knowledge. The approach combines structure-based frequent pattern discovery with search space reduction and coordinate noise handling. We analyze computational performance compared to traditional approaches, validate that our approach can discover meaningful substructures in noisy macromolecule data by automated discovery of primary and secondary protein structures, and show that our technique is superior to sequence-based approaches at determining structural, and thus functional, similarity between proteins.
The Height and Size of Random Hash Trees and Random Pebbled Hash Trees
, 1999
The random hash tree and the N-tree were introduced by Ehrlich in 1981. In the random hash tree, n data points are hashed to values $X_1, \ldots, X_n$, independently and identically distributed random variables taking values that are uniformly distributed on [0, 1]. Place the $X_i$'s in n equal-sized buckets as in hashing with chaining. For each bucket with at least two points, repeat the same process, keeping the branch factor always equal to the number of bucketed points. If $H_n$ is the height of the tree obtained in this manner, we show that $H_n / \log_2 n \to 1$ in probability. In the random pebbled hash tree, we remove one point randomly and place it in the present node (as with the digital search tree modification of a trie) and perform the bucketing step as above on the remaining points (if any). With this simple modification, $H_n$ in probability. We also show that the expected number of nodes in the random hash tree and random pebbled hash tree is asymptotic to 2.3020238...n and 1.4183342...n, respectively.
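The construction of the random hash tree described in this abstract lends itself to a short simulation. The sketch below is a hedged illustration, not the paper's method. It assumes that a node holding fewer than two points has height 0, and it redraws bucket indices uniformly at each level, which matches the uniform-hashing assumption because points falling in one bucket are again uniform within that bucket. The function name is hypothetical.

```python
import random
from collections import Counter

def hash_tree_height(n, rng=random):
    """Height of a random hash tree on n points: throw the n points
    into n equal-sized buckets, then recurse on every bucket, using a
    branch factor equal to the number of points in that bucket."""
    if n < 2:
        return 0                  # assumed convention: no further split
    counts = Counter(rng.randrange(n) for _ in range(n))
    # Buckets holding a single point contribute height 0.
    return 1 + max(hash_tree_height(k, rng) for k in counts.values())

# Usage: compare simulated heights with log2(n) for growing n.
heights = [hash_tree_height(10_000) for _ in range(20)]
```

Because the stated limit is only asymptotic, simulated values of the height divided by log2(n) at moderate n should not be expected to sit close to the limiting constant.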