Loglog Counting of Large Cardinalities
In ESA, 2003
Cited by 73 (3 self)
Using an auxiliary memory smaller than the size of this abstract, the LogLog algorithm makes it possible to estimate in a single pass and within a few percent the number of different words in the whole of Shakespeare's works. In general, the LogLog algorithm makes use of m "small bytes" of auxiliary memory in order to estimate in a single pass the number of distinct elements (the "cardinality") in a file, and it does so with an accuracy that is of the order of 1/√m. The "small bytes" to be used in order to count cardinalities up to Nmax comprise about log log Nmax bits, so that cardinalities well in the range of billions can be determined using one or two kilobytes of memory only. The basic version of the LogLog algorithm is validated by a complete analysis. An optimized version, super-LogLog, is also engineered and tested on real-life data. The algorithm parallelizes optimally.
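The estimation scheme summarized in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the hash function (BLAKE2b truncated to 64 bits), the default of m = 2^10 registers, and the function names are assumptions made for illustration, and the constant 0.39701 is the asymptotic bias correction usually quoted for the basic LogLog estimator.

```python
import hashlib

ALPHA_INF = 0.39701  # asymptotic bias-correction constant (assumed valid for large m)


def hash64(item: str) -> int:
    """Map an item to a 64-bit integer; BLAKE2b is an illustrative choice."""
    return int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")


def loglog_estimate(items, k: int = 10) -> float:
    """Estimate the number of distinct items with m = 2**k small registers."""
    m = 1 << k
    rest_bits = 64 - k
    registers = [0] * m
    for item in items:
        h = hash64(item)
        bucket = h >> rest_bits                # first k bits select a register
        w = h & ((1 << rest_bits) - 1)         # remaining bits of the hash
        rank = rest_bits - w.bit_length() + 1  # position of the leftmost 1-bit in w
        registers[bucket] = max(registers[bucket], rank)
    # Arithmetic mean of the registers in the exponent, scaled by alpha * m.
    return ALPHA_INF * m * 2 ** (sum(registers) / m)
```

Each register only stores a value of order log2 of its bucket's load, which is why about log log Nmax bits per register suffice. Duplicates never change a register, so the estimate depends only on the set of distinct items.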
Dynamical Sources in Information Theory: A General Analysis of Trie Structures
Algorithmica, 1999
Cited by 50 (7 self)
Digital trees, also known as tries, are a general-purpose flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list-tries, and bst-tries ("ternary search tries"). The size and the search costs of the corresponding representations are analysed precisely in the average case, while a complete distributional analysis of the height of tries is given. The unifying data model used is that of dynamical sources and it encompasses classical models like those of memoryless sources with independent symbols, of finite Markov chains, and of non-uniform densities. The probabilistic behaviour of the main parameters, namely size, path length, or height, appears to be determined by two intrinsic characteristics of the source: the entropy and the probability of letter coincidence. These characteristics are themselves related in a natural way to spectral properties of specific transfer operators of the Ruelle type.
Asymptotic laws for compositions derived from transformed subordinators
Ann. Probab., 2006
Cited by 23 (10 self)
A random composition of n appears when the points of a random closed set R̃ ⊂ [0, 1] are used to separate into blocks n points sampled from the uniform distribution. We study the number of parts K_n of this composition and other related functionals under the assumption that R̃ = φ(S•), where (S_t, t ≥ 0) is a subordinator and φ: [0, ∞] → [0, 1] is a diffeomorphism. We derive the asymptotics of K_n when the Lévy measure of the subordinator is regularly varying at 0 with positive index. Specialising to the case of the exponential function φ(x) = 1 − e^(−x), we establish a connection between the asymptotics of K_n and the exponential functional of the subordinator.
Information Propagation Speed in Mobile and Delay Tolerant Networks
2009
Cited by 22 (10 self)
The goal of this paper is to increase our understanding of the fundamental performance limits of mobile and Delay Tolerant Networks (DTNs), where end-to-end multi-hop paths may not exist and communication routes may only be available through time and mobility. We use analytical tools to derive generic theoretical upper bounds for the information propagation speed in large-scale mobile and intermittently connected networks. In other words, we upper-bound the optimal performance, in terms of delay, that can be achieved using any routing algorithm. We then show how our analysis can be applied to specific mobility and graph models to obtain specific analytical estimates. In particular, when nodes move at speed v and their density ν is small (the network is sparse and surely disconnected), we prove that the information propagation speed is upper bounded by (1 + O(ν²))v in the random waypoint model, while it is upper bounded by O(√(νv) v) for other mobility models (random walk, Brownian motion). We also present simulations that confirm the validity of the bounds in these scenarios.
Profile of Tries
2006
Cited by 18 (8 self)
Tries (from retrieval) are one of the most popular data structures on words. They are pertinent to the (internal) structure of stored words and several splitting procedures used in diverse contexts. The profile of a trie is a parameter that represents the number of nodes (either internal or external) with the same distance from the root. It is a function of the number of strings stored in a trie and the distance from the root. Several, if not all, trie parameters such as height, size, depth, shortest path, and fill-up level can be uniformly analyzed through the (external and internal) profiles. Although profiles represent one of the most fundamental parameters of tries, they have hardly been studied in the past. The analysis of profiles is surprisingly arduous, but once it is carried out it reveals unusually intriguing and interesting behavior. We present a detailed study of the distribution of the profiles in a trie built over random strings generated by a memoryless source. We first derive recurrences satisfied by the expected profiles and solve them asymptotically for all possible ranges of the distance from the root. It appears that profiles of tries exhibit several fascinating phenomena. When moving from the root to the leaves of a trie, the growth of the expected profiles varies. Near the root, the external profiles tend to zero at an exponential rate, then the rate gradually rises to being logarithmic; the external profiles then abruptly tend to infinity, first logarithmically
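The notion of a profile defined in this abstract can be made concrete with a small Python sketch. It builds a binary trie over distinct fixed-width integer keys and counts external nodes (leaves holding one key) and internal nodes at each depth. The function name, the 32-bit key width, and the use of integers as bit strings are assumptions for illustration; uniformly random bits correspond to the symmetric memoryless source only.

```python
import random
from collections import Counter


def trie_profiles(keys, bits: int = 32):
    """Return (external, internal) profiles of the binary trie on distinct keys < 2**bits."""
    external, internal = Counter(), Counter()
    stack = [(sorted(set(keys)), 0)]
    while stack:
        group, depth = stack.pop()
        if not group:
            continue
        if len(group) == 1:
            external[depth] += 1      # a single key is stored in a leaf
            continue
        internal[depth] += 1          # two or more keys: branch on the next bit
        shift = bits - 1 - depth
        zeros = [x for x in group if not (x >> shift) & 1]
        ones = [x for x in group if (x >> shift) & 1]
        for child in (zeros, ones):
            if child:
                stack.append((child, depth + 1))
    return external, internal


rng = random.Random(1)
keys = {rng.getrandbits(32) for _ in range(1000)}
ext, inn = trie_profiles(keys)
height = max(ext)  # the deepest leaf gives the height of the trie
```

Summing the external profile over all depths recovers the number of stored keys, and the largest depth with a nonzero external profile is the height, which is how parameters such as size and height can be read off the profiles.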
Local limit theorems for finite and infinite urn models
Ann. Probab., 2007
Cited by 17 (2 self)
Local limit theorems are derived for the number of occupied urns in general finite and infinite urn models under the minimum condition that the variance tends to infinity. Our results represent an optimal improvement over previous ones for normal approximation.
1. Introduction. A classical theorem of Rényi [29] for the number of empty boxes, denoted by μ0(n, M), in a sequence of n random allocations of indistinguishable balls into M boxes with equal probability 1/M, can be stated as follows: If the variance of μ0(n, M) tends to infinity with n, then μ0(n, M) is asymptotically normally distributed. This result, seldom stated in this form in the literature,
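The quantity μ0(n, M) in the equiprobable model can be explored numerically. The Python sketch below computes its exact mean and variance from the standard inclusion probabilities (one given box is empty with probability (1 − 1/M)^n, two given boxes with probability (1 − 2/M)^n) and compares them with a simulation. The function names and the parameter values are illustrative assumptions, not taken from the paper.

```python
import random
from statistics import mean


def empty_box_moments(n: int, M: int):
    """Exact mean and variance of mu_0(n, M), the number of empty boxes."""
    p1 = (1 - 1 / M) ** n   # probability that one given box is empty
    p2 = (1 - 2 / M) ** n   # probability that two given boxes are both empty
    mu = M * p1
    var = M * p1 + M * (M - 1) * p2 - mu * mu
    return mu, var


def simulate_empty_boxes(n: int, M: int, rng: random.Random) -> int:
    """Throw n balls uniformly into M boxes and count the boxes left empty."""
    occupied = {rng.randrange(M) for _ in range(n)}
    return M - len(occupied)


rng = random.Random(0)
samples = [simulate_empty_boxes(200, 100, rng) for _ in range(2000)]
mu, var = empty_box_moments(200, 100)
```

With n = 200 and M = 100 the exact variance is about 8, so the hypothesis "the variance tends to infinity" only becomes meaningful along sequences where n and M grow together.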
HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm
In AofA '07: Proceedings of the 2007 International Conference on Analysis of Algorithms, 2007
Cited by 17 (0 self)
This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, "short bytes"), HYPERLOGLOG performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about 1.04/√m. This improves on the best previously known cardinality estimator, LOGLOG, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond 10^9 with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.
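A hedged Python sketch of the estimator described here follows. It uses the harmonic mean of 2^(−register) in place of LogLog's arithmetic mean, and shows why the algorithm parallelizes: sketches of two sub-streams merge by a register-wise maximum. The hash function, the default m = 2^10, and all function names are assumptions for illustration; the small- and large-range corrections are omitted, so the sketch is only meaningful when the cardinality is well above m.

```python
import hashlib


def hash64(item: str) -> int:
    """Map an item to a 64-bit integer; BLAKE2b is an illustrative choice."""
    return int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")


def hll_registers(items, k: int = 10) -> list:
    """Single pass over the data: fill m = 2**k registers."""
    rest_bits = 64 - k
    regs = [0] * (1 << k)
    for item in items:
        h = hash64(item)
        bucket = h >> rest_bits
        w = h & ((1 << rest_bits) - 1)
        rank = rest_bits - w.bit_length() + 1  # position of the leftmost 1-bit
        regs[bucket] = max(regs[bucket], rank)
    return regs


def hll_estimate(regs) -> float:
    """Raw estimate: alpha_m * m^2 / sum(2^-M_j), without range corrections."""
    m = len(regs)
    alpha = 0.7213 / (1 + 1.079 / m)  # commonly quoted constant for m >= 128
    return alpha * m * m / sum(2.0 ** -r for r in regs)


def merge(a, b) -> list:
    """Combine sketches of two streams into the sketch of their union."""
    return [max(x, y) for x, y in zip(a, b)]
```

For example, with m = 1024 the expected standard error is about 1.04/√1024 ≈ 3.3%, and merging the registers of two overlapping sub-streams yields exactly the registers of the combined stream.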
Gap-free compositions and gap-free samples of geometric random variables
Discrete Math., 2005
Cited by 10 (3 self)
We study the asymptotic probability that a random composition of an integer n is gap-free, that is, that the sizes of parts in the composition form an interval. We show that this problem is closely related to the study of the probability that a sample of independent, identically distributed random variables with a geometric distribution is likewise gap-free.
1. Introduction. A composition of a natural number n is said to be gap-free if the part sizes occurring in it form an interval. In addition, if the interval starts at 1, the composition is said to be complete. Example: Of the 32 compositions of n = 6, there are 21 gap-free compositions arising from permuting the order of the parts of the partitions
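The counts quoted in the example (32 compositions of 6, of which 21 are gap-free) can be checked by brute force. The following Python sketch enumerates all compositions and applies the two definitions given in the text; the function names are illustrative.

```python
def compositions(n: int):
    """Yield every composition of n (ordered tuples of positive integers summing to n)."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def is_gap_free(parts) -> bool:
    """True if the distinct part sizes form an interval of consecutive integers."""
    sizes = set(parts)
    return len(sizes) == max(sizes) - min(sizes) + 1


def is_complete(parts) -> bool:
    """True if the composition is gap-free and its smallest part is 1."""
    return is_gap_free(parts) and min(parts) == 1


comps = list(compositions(6))
gap_free = [c for c in comps if is_gap_free(c)]
complete = [c for c in comps if is_complete(c)]
print(len(comps), len(gap_free), len(complete))  # prints: 32 21 18
```

The three gap-free compositions that are not complete are (6), (3, 3) and (2, 2, 2), whose part sizes form an interval that does not start at 1.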
The number of distinct values of some multiplicity in sequences of geometrically distributed . . .
The oscillatory distribution of distances in random tries
Annals of Applied Probability, 2005
Cited by 9 (3 self)
We investigate Δ_n, the distance between randomly selected pairs of nodes among n keys in a random trie, which is a kind of digital tree. Analytical techniques, such as the Mellin transform and an excursion between poissonization and depoissonization, capture small fluctuations in the mean and variance of these random distances. The mean increases logarithmically in the number of keys, but curiously enough the variance remains O(1) as n → ∞. It is demonstrated that the centered random variable Δ*_n = Δ_n − ⌊2 log_2 n⌋ does not have a limit distribution, but rather oscillates between two distributions.