Results 1–10 of 23
Efficient tree layout in a multilevel memory hierarchy, arXiv:cs.DS/0211010
, 2003
Abstract

Cited by 31 (7 self)
We consider the problem of laying out a tree with fixed parent/child structure in hierarchical memory. The goal is to minimize the expected number of block transfers performed during a search along a root-to-leaf path, subject to a given probability distribution on the leaves. This problem was previously considered by Gil and Itai, who developed optimal but slow algorithms when the block-transfer size B is known. We present faster but approximate algorithms for the same problem; the fastest such algorithm runs in linear time and produces a solution that is within an additive constant of optimal. In addition, we show how to extend any approximately optimal algorithm to the cache-oblivious setting in which the block-transfer size is unknown to the algorithm. The query performance of the cache-oblivious layout is within a constant factor of the query performance of the optimal known-block-size layout. Computing the cache-oblivious layout requires only logarithmically many calls to the layout algorithm for known block size; in particular, the cache-oblivious layout can be computed in O(N lg N) time, where N is the number of nodes. Finally, we analyze two greedy strategies, and show that they have a performance ratio between Ω(lg B / lg lg B) and O(lg B) when compared to the optimal layout.
Asymptotic approximation of the move-to-front search cost distribution and least-recently-used caching fault probabilities
, 1999
Abstract

Cited by 23 (8 self)
Consider a finite list of items n = 1, 2, …, N, that are requested according to an i.i.d. process. Each time an item is requested it is moved to the front of the list. The associated search cost C_N for accessing an item is equal to its position before being moved. If the request distribution converges to a proper distribution as N → ∞, then the stationary search cost C_N converges in distribution to a limiting search cost C. We show that, when the (limiting) request distribution has a heavy tail (e.g., generalized Zipf's law), P(R = n) ∼ c/n^α as n → ∞, α > 1, then the limiting stationary search cost distribution P(C > n), or, equivalently, the least-recently-used (LRU) caching fault probability, satisfies lim_{n→∞} P(C > n)/P(R > n) = …
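The move-to-front rule described above is simple to simulate. The following is a minimal sketch; the list size, request count, and Zipf exponent α = 1.5 are illustrative choices, not parameters from the paper.

```python
import random

def mtf_search_cost(request_seq, n_items):
    """Simulate the move-to-front rule and record search costs.

    The list starts as [1, 2, ..., n_items]; each requested item is
    looked up (its 1-based position is the search cost) and then
    moved to the front of the list.
    """
    lst = list(range(1, n_items + 1))
    costs = []
    for item in request_seq:
        pos = lst.index(item) + 1   # search cost: position before the move
        costs.append(pos)
        lst.pop(pos - 1)
        lst.insert(0, item)         # move-to-front
    return costs

# Zipf-like i.i.d. requests: P(R = n) proportional to 1/n**alpha
random.seed(0)
alpha, n_items = 1.5, 50
weights = [1.0 / n ** alpha for n in range(1, n_items + 1)]
requests = random.choices(range(1, n_items + 1), weights=weights, k=10_000)
costs = mtf_search_cost(requests, n_items)
```

Under such heavy-tailed requests, large search costs occur with the polynomial frequency the abstract's limit relates to the request tail.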
Perfect hashing for strings: Formalization and Algorithms
 In Proc. 7th CPM
, 1996
Abstract

Cited by 10 (2 self)
Numbers and strings are two objects manipulated by most programs. Hashing has been well-studied for numbers and it has been effective in practice. In contrast, basic hashing issues for strings remain largely unexplored. In this paper, we identify and formulate the core hashing problem for strings that we call substring hashing. Our main technical results are highly efficient sequential/parallel (CRCW PRAM) Las Vegas type algorithms that determine a perfect hash function for substring hashing. For example, given a binary string of length n, one of our algorithms finds a perfect hash function in O(log n) time, O(n) work, and O(n) space; the hash value for any substring can then be computed in O(log log n) time using a single processor. Our approach relies on a novel use of the suffix tree of a string. In implementing our approach, we design optimal parallel algorithms for the problem of determining weighted ancestors on an edge-weighted tree that may be of independent interest.
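To illustrate the substring-hashing interface the paper formalizes, here is a sketch using a standard polynomial rolling hash. This is not the paper's suffix-tree-based perfect hashing: it is a simpler randomized scheme (collisions are possible) that shows the same contract of O(n) preprocessing followed by constant-time substring hashes; the base and modulus are conventional illustrative choices.

```python
class SubstringHasher:
    """Polynomial rolling hash with precomputed prefix hashes.

    After O(n) preprocessing, the hash of any substring s[i:j] is
    computed in O(1) time. Unlike the perfect hash of the paper,
    distinct substrings may (rarely) collide.
    """
    def __init__(self, s, base=257, mod=(1 << 61) - 1):
        self.mod = mod
        n = len(s)
        self.prefix = [0] * (n + 1)   # prefix[k] = hash of s[:k]
        self.power = [1] * (n + 1)    # power[k] = base**k mod mod
        for k, ch in enumerate(s):
            self.prefix[k + 1] = (self.prefix[k] * base + ord(ch)) % mod
            self.power[k + 1] = (self.power[k] * base) % mod

    def hash(self, i, j):
        """Hash of the substring s[i:j], computed in O(1)."""
        return (self.prefix[j] - self.prefix[i] * self.power[j - i]) % self.mod

h = SubstringHasher("abracadabra")
# equal substrings hash equally: "abra" occurs at s[0:4] and s[7:11]
```

Equal substrings always receive equal hashes, which is the property substring hashing exploits; the paper's contribution is to make the map collision-free (perfect).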
The oscillatory distribution of distances in random tries
 The Annals of Applied Probability
, 2005
Abstract

Cited by 9 (3 self)
We investigate ∆n, the distance between randomly selected pairs of nodes among n keys in a random trie, which is a kind of digital tree. Analytical techniques, such as the Mellin transform and an excursion between poissonization and depoissonization, capture small fluctuations in the mean and variance of these random distances. The mean increases logarithmically in the number of keys, but curiously enough the variance remains O(1), as n → ∞. It is demonstrated that the centered random variable ∆n* = ∆n − ⌊2 log₂ n⌋ does not have a limit distribution, but rather oscillates between two distributions.
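The random distance ∆n can be sampled directly without building the trie explicitly, using two standard facts: a key's leaf depth is one more than its longest common prefix (LCP) with any other key, and two leaves' paths diverge exactly at their LCP. A minimal simulation, assuming 64-bit random keys as a stand-in for infinite random bit strings:

```python
import random

def lcp(a, b):
    """Length of the longest common prefix of two bit strings."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k

def trie_distance_sample(n, bits=64, seed=1):
    """Sample the distance between two random keys in a random trie.

    Leaf depth of key u is 1 + max LCP with any other key, and the
    paths of u and v diverge at depth LCP(u, v), so
    distance(u, v) = depth_u + depth_v - 2 * LCP(u, v).
    """
    rng = random.Random(seed)
    keys = [''.join(rng.choice('01') for _ in range(bits)) for _ in range(n)]
    depth = [1 + max(lcp(keys[i], keys[j]) for j in range(n) if j != i)
             for i in range(n)]
    u, v = rng.sample(range(n), 2)
    return depth[u] + depth[v] - 2 * lcp(keys[u], keys[v])

d = trie_distance_sample(32)
```

Averaging many such samples over different seeds would exhibit the logarithmic growth of the mean noted in the abstract.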
The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance
 The Annals of Applied Probability
, 2006
Abstract

Cited by 8 (0 self)
For two decades, the Colless index has been the most frequently used statistic for assessing the balance of phylogenetic trees. In this article, this statistic is studied under the Yule and uniform models of phylogenetic trees. The main tool of analysis is a coupling argument with another well-known index called the Sackin statistic. Asymptotics for the mean, variance and covariance of these two statistics are obtained, as well as their limiting joint distribution for large phylogenies. Under the Yule model, the limiting distribution arises as a solution of a functional fixed point equation. Under the uniform model, the limiting distribution is the Airy distribution. The cornerstone of this study is the fact that the probabilistic models for phylogenetic trees are strongly related to the random permutation and the Catalan models for binary search trees.
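Both statistics have short recursive definitions, which a small sketch makes concrete. Here a binary tree is represented as nested pairs (a hypothetical encoding chosen for brevity, not one from the paper): a leaf is any non-tuple value, an internal node a 2-tuple of subtrees.

```python
def leaf_count_and_colless(tree):
    """Return (number of leaves, Colless index).

    The Colless index sums, over all internal nodes, the absolute
    difference between the leaf counts of the two subtrees.
    """
    if not isinstance(tree, tuple):
        return 1, 0
    nl, cl = leaf_count_and_colless(tree[0])
    nr, cr = leaf_count_and_colless(tree[1])
    return nl + nr, cl + cr + abs(nl - nr)

def sackin(tree, depth=0):
    """Sackin index: the sum of the depths of all leaves."""
    if not isinstance(tree, tuple):
        return depth
    return sackin(tree[0], depth + 1) + sackin(tree[1], depth + 1)

# maximally unbalanced ("caterpillar") tree on 4 leaves
cat = ((('a', 'b'), 'c'), 'd')
# Colless: |3-1| + |2-1| + |1-1| = 3; Sackin: 3 + 3 + 2 + 1 = 9
```

Both indices are largest on caterpillar trees and smallest on complete balanced trees, which is why they serve as balance statistics.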
A "Linear Logic" Quicksort
, 1994
Abstract

Cited by 7 (1 self)
Linear logic [Girard87] [Lafont88] [Abramsky93] has been proposed as the basis for a "linear" computer language which preserves the cleanliness of functional programming, yet allows efficient "update-in-place" array operations, no tracing garbage collection and no synchronization. However, some early measurements on linear languages have been disappointing. Wakeling [Wakeling91] complains about the inefficiencies of his version of "linear" ML, especially on list and array variants of the Quicksort sorting algorithm, as well as about the stilted linear programming style. We have programmed several versions of Quicksort in a linear fragment of Common Lisp, and, contrary to the conclusions of [Wakeling91], find that Linear Lisp produces a very fast Quicksort routine. In fact, a linear Quicksort routine for lists (which doesn't require garbage collection) is considerably faster than a nonlinear Quicksort routine for lists (which does require garbage collection).
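The contrast the abstract draws can be sketched in Python rather than Linear Lisp: a copying quicksort that allocates fresh lists at every level (the "nonlinear" style, creating garbage) versus a destructive one that reuses the caller's storage, approximating the single-owner discipline of a linear language. This is an illustration of the distinction, not the paper's code.

```python
def quicksort_copying(xs):
    """Non-destructive quicksort: allocates new lists at every level,
    leaving xs intact (the 'nonlinear' style, which generates garbage)."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    return (quicksort_copying([x for x in rest if x < pivot])
            + [pivot]
            + quicksort_copying([x for x in rest if x >= pivot]))

def quicksort_inplace(xs, lo=0, hi=None):
    """Destructive quicksort: the caller's list is permuted in place
    with no intermediate allocation, mimicking the linear, update-in-
    place style (each cell has a single live reference)."""
    if hi is None:
        hi = len(xs) - 1
    if lo >= hi:
        return xs
    pivot, i = xs[hi], lo
    for j in range(lo, hi):
        if xs[j] < pivot:
            xs[i], xs[j] = xs[j], xs[i]
            i += 1
    xs[i], xs[hi] = xs[hi], xs[i]   # pivot into its final position
    quicksort_inplace(xs, lo, i - 1)
    quicksort_inplace(xs, i + 1, hi)
    return xs
```

In a language with true linear types, the compiler can verify that the destructive version is safe, which is the point the paper argues for Linear Lisp.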
Parallel comparison algorithms for approximation problems
 Proc. 29th IEEE FOCS, Yorktown Heights, NY, 1988
Abstract

Cited by 7 (2 self)
Suppose we have n elements from a totally ordered domain, and we are allowed to perform p parallel comparisons in each time unit (a "round"). In this paper we determine, up to a constant factor, the time complexity of several approximation problems in the common parallel comparison tree model of Valiant, for all admissible values of n, p and ε, where ε is an accuracy parameter determining the quality of the required approximation. The problems considered include the approximate maximum problem, approximate sorting and approximate merging. Our results imply as special cases all the known results about the time complexity of parallel sorting, parallel merging and parallel selection of the maximum (in the comparison model), up to a constant factor. We mention one very special but representative result concerning the approximate maximum problem: suppose we wish to find, among the given n elements, one that belongs to the largest n/2, where in each round we are allowed to ask n binary comparisons. We show that log* n + O(1) rounds are both necessary and sufficient in the best algorithm for this problem.
Analytic Variations on Bucket Selection and Sorting
, 1998
Abstract

Cited by 6 (2 self)
We provide complete average-case as well as probabilistic analysis of the cost of bucket selection and sorting algorithms. Two variations of bucketing (and flavors therein) are considered: distributive bucketing (large number of buckets) and radix bucketing (recursive with a small number of buckets, suitable for digital computation). For distributive selection a compound Poisson limit is established. For all other flavors of bucket selection and sorting, central limit theorems underlying the cost are derived by asymptotic techniques involving perturbation of Rice's integral and contour integration (saddle point methods). In the case of radix bucketing, periodic fluctuations appear in the moments of both the selection and sorting algorithms.
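Distributive bucketing, the first variation above, can be sketched in a few lines; the choice of one bucket per key and keys drawn from [0, 1) are the usual illustrative assumptions, under which the expected total cost is linear.

```python
import random

def bucket_sort(xs, n_buckets=None):
    """Distributive bucketing: scatter keys in [0, 1) into n buckets,
    then sort each (expected-constant-size) bucket and concatenate."""
    n = len(xs)
    if n_buckets is None:
        n_buckets = n or 1          # default: one bucket per key
    buckets = [[] for _ in range(n_buckets)]
    for x in xs:
        # clamp so x == 1.0 (or float rounding) cannot overflow the index
        buckets[min(int(x * n_buckets), n_buckets - 1)].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))       # tiny buckets, so any sort is cheap
    return out

rng = random.Random(2)
data = [rng.random() for _ in range(1000)]
```

Radix bucketing differs in that it recurses on each bucket with a small, fixed number of buckets (e.g., per digit), which is the source of the periodic fluctuations the analysis exhibits.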
Renewals for exponentially increasing lifetimes, with an application to digital search
, 2007
Abstract

Cited by 5 (1 self)
We show that the number of renewals up to time t exhibits distributional fluctuations as t → ∞ if the underlying lifetimes increase at an exponential rate in a distributional sense. This provides a probabilistic explanation for the asymptotics of insertion depth in random trees generated by a bit-comparison strategy from uniform input; we also obtain a representation for the resulting family of limit laws along subsequences. Our approach can also be used to obtain rates of convergence.
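A renewal process with exponentially growing lifetimes is easy to simulate. The sketch below assumes, purely for illustration, that the k-th lifetime is growth**k times an independent uniform variable; under such growth the renewal count up to time t grows only logarithmically in t, which is the regime the paper connects to insertion depth in digital trees.

```python
import random

def renewal_count(t, growth=2.0, seed=0):
    """Number of renewals completed by time t when the k-th lifetime
    is growth**k * U_k for i.i.d. uniform U_k (illustrative choice of
    lifetimes increasing at an exponential rate in distribution)."""
    rng = random.Random(seed)
    clock, k = 0.0, 0
    while True:
        clock += growth ** k * rng.random()   # k-th lifetime
        if clock > t:
            return k                          # k renewals finished by t
        k += 1
```

With a fixed seed the lifetime sequence is fixed, so the count is non-decreasing in t; varying the seed and plotting the counts against log t would display the distributional fluctuations the abstract describes.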