Results 1-10 of 55
Loglog Counting of Large Cardinalities
In ESA, 2003
Cited by 85 (3 self)
Using an auxiliary memory smaller than the size of this abstract, the LogLog algorithm makes it possible to estimate in a single pass and within a few percent the number of different words in the whole of Shakespeare's works. In general the LogLog algorithm makes use of m "small bytes" of auxiliary memory in order to estimate in a single pass the number of distinct elements (the "cardinality") in a file, and it does so with an accuracy that is of the order of 1/√m. The "small bytes" to be used in order to count cardinalities up to N_max comprise about log log N_max bits, so that cardinalities well in the range of billions can be determined using one or two kilobytes of memory only. The basic version of the LogLog algorithm is validated by a complete analysis. An optimized version, super-LogLog, is also engineered and tested on real-life data. The algorithm parallelizes optimally.
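The single-pass estimation described in this abstract can be illustrated with a minimal sketch of the basic LogLog estimator. It is not the authors' code: the hash function (SHA-1 truncated to 64 bits), the function names, and the default of m = 2^10 registers are assumptions made for illustration, and the bias-correction constant 0.39701 is the asymptotic value reported in the LogLog paper rather than anything stated in the abstract.

```python
import hashlib

# Asymptotic bias-correction constant from the LogLog analysis (assumption:
# the finite-m correction is ignored, which is reasonable for large m).
ALPHA_INF = 0.39701


def _hash64(item: str) -> int:
    """Map an item to a pseudo-random 64-bit integer (illustrative choice)."""
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


def _rank(word: int, bits: int) -> int:
    """1-based position of the leftmost 1-bit in a `bits`-wide word."""
    return bits - word.bit_length() + 1


def loglog_estimate(items, k: int = 10) -> float:
    """Estimate the number of distinct items using m = 2**k small registers."""
    m = 1 << k
    rest_bits = 64 - k
    registers = [0] * m
    for item in items:
        h = _hash64(item)
        bucket = h >> rest_bits                # first k bits select a register
        rest = h & ((1 << rest_bits) - 1)      # remaining bits feed the rank
        registers[bucket] = max(registers[bucket], _rank(rest, rest_bits))
    # Each register behaves like log2(n/m); the arithmetic mean is exponentiated.
    return ALPHA_INF * m * 2 ** (sum(registers) / m)
```

Each register only ever holds a small integer (a rank), which is why the per-register storage is on the order of log log of the maximum cardinality. Repeated items hash to the same register and rank, so duplicates do not change the estimate.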
Dynamical Sources in Information Theory: A General Analysis of Trie Structures
Algorithmica, 1999
Cited by 62 (7 self)
Digital trees, also known as tries, are a general-purpose flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list-tries, and bst-tries ("ternary search tries"). The size and the search costs of the corresponding representations are analysed precisely in the average case, while a complete distributional analysis of the height of tries is given. The unifying data model used is that of dynamical sources and it encompasses classical models like those of memoryless sources with independent symbols, of finite Markov chains, and of nonuniform densities. The probabilistic behaviour of the main parameters, namely size, path length, or height, appears to be determined by two intrinsic characteristics of the source: the entropy and the probability of letter coincidence. These characteristics are themselves related in a natural way to spectral properties of specific transfer operators of the Ruelle type.
Information Propagation Speed in Mobile and Delay Tolerant Networks
2009
Cited by 52 (14 self)
The goal of this paper is to increase our understanding of the fundamental performance limits of mobile and Delay Tolerant Networks (DTNs), where end-to-end multihop paths may not exist and communication routes may only be available through time and mobility. We use analytical tools to derive generic theoretical upper bounds for the information propagation speed in large-scale mobile and intermittently connected networks. In other words, we upper-bound the optimal performance, in terms of delay, that can be achieved using any routing algorithm. We then show how our analysis can be applied to specific mobility and graph models to obtain specific analytical estimates. In particular, when nodes move at speed v and their density ν is small (the network is sparse and surely disconnected), we prove that the information propagation speed is upper bounded by (1 + O(ν²))v in the random waypoint model, while it is upper bounded by O(√(νv) v) for other mobility models (random walk, Brownian motion). We also present simulations that confirm the validity of the bounds in these scenarios.
HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm
In AofA '07: Proceedings of the 2007 International Conference on Analysis of Algorithms, 2007
Cited by 46 (1 self)
This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, “short bytes”), HYPERLOGLOG performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about 1.04/√m. This improves on the best previously known cardinality estimator, LOGLOG, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond 10^9 with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.
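The 64% memory figure in this abstract can be checked with a short calculation. The sketch below assumes that the LOGLOG standard error is about 1.30/√m; that constant comes from the LogLog paper, not from this abstract. The function and constant names are illustrative only.

```python
import math

HLL_CONSTANT = 1.04     # standard error constant quoted in the abstract
LOGLOG_CONSTANT = 1.30  # assumption: constant reported in the LogLog paper


def standard_error(constant: float, m: int) -> float:
    """Relative standard error of an estimator with error constant/sqrt(m)."""
    return constant / math.sqrt(m)


def registers_needed(constant: float, target_error: float) -> float:
    """Number of registers m solving constant/sqrt(m) = target_error."""
    return (constant / target_error) ** 2


# Equal accuracy requires 1.04/sqrt(m_hll) = 1.30/sqrt(m_loglog), so the
# register ratio m_hll/m_loglog is (1.04/1.30)**2, which is about 0.64.
memory_ratio = (HLL_CONSTANT / LOGLOG_CONSTANT) ** 2

# With m = 2048 registers the HyperLogLog standard error is roughly 2.3%.
error_2048 = standard_error(HLL_CONSTANT, 2048)
```

Under the same formula, a standard error of exactly 2% corresponds to roughly 2700 registers, so the abstract's "typical accuracy of 2%" should be read as an order of magnitude rather than an exact figure.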
Asymptotic laws for compositions derived from transformed subordinators
Ann. Probab., 2006
Cited by 30 (12 self)
A random composition of n appears when the points of a random closed set R̃ ⊂ [0, 1] are used to separate into blocks n points sampled from the uniform distribution. We study the number of parts K_n of this composition and other related functionals under the assumption that R̃ = φ(S•) where (S_t, t ≥ 0) is a subordinator and φ: [0, ∞] → [0, 1] is a diffeomorphism. We derive the asymptotics of K_n when the Lévy measure of the subordinator is regularly varying at 0 with positive index. Specialising to the case of the exponential function φ(x) = 1 − e^(−x), we establish a connection between the asymptotics of K_n and the exponential functional of the subordinator.
Local limit theorems for finite and infinite urn models
Ann. Probab., 2008
Cited by 23 (2 self)
Local limit theorems are derived for the number of occupied urns in general finite and infinite urn models under the minimum condition that the variance tends to infinity. Our results represent an optimal improvement over previous ones for normal approximation.
Profile of Tries
2006
Cited by 21 (8 self)
Tries (from retrieval) are one of the most popular data structures on words. They are pertinent to the (internal) structure of stored words and several splitting procedures used in diverse contexts. The profile of a trie is a parameter that represents the number of nodes (either internal or external) with the same distance from the root. It is a function of the number of strings stored in a trie and the distance from the root. Several, if not all, trie parameters such as height, size, depth, shortest path, and fill-up level can be uniformly analyzed through the (external and internal) profiles. Although profiles represent one of the most fundamental parameters of tries, they have hardly been studied in the past. The analysis of profiles is surprisingly arduous but once it is carried out it reveals unusually intriguing and interesting behavior. We present a detailed study of the distribution of the profiles in a trie built over random strings generated by a memoryless source. We first derive recurrences satisfied by the expected profiles and solve them asymptotically for all possible ranges of the distance from the root. It appears that profiles of tries exhibit several fascinating phenomena. When moving from the root to the leaves of a trie, the growth of the expected profiles varies. Near the root, the external profiles tend to zero at an exponential rate, then the rate gradually rises to being logarithmic; the external profiles then abruptly tend to infinity, first logarithmically
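The definition of the profile given in this abstract can be made concrete with a small sketch. It assumes a binary alphabet, distinct input strings, and strings long enough to be separated from one another; the function name `trie_profiles` and the sample strings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def trie_profiles(strings, depth=0, external=None, internal=None):
    """Count external and internal trie nodes at each distance from the root.

    Assumes distinct binary strings; a subtree holding a single string is
    stored as one external node, and empty subtrees are not represented.
    """
    if external is None:
        external, internal = Counter(), Counter()
    if len(strings) == 1:
        external[depth] += 1                  # leaf holding one string
    elif len(strings) > 1:
        internal[depth] += 1                  # branching node
        for symbol in "01":
            subset = [s for s in strings if s[depth] == symbol]
            trie_profiles(subset, depth + 1, external, internal)
    return external, internal


external, internal = trie_profiles(["0010", "0011", "0100", "1000", "1011"])
height = max(external)          # largest depth of an external node
size = sum(internal.values())   # number of internal nodes
```

For these five strings the external profile has one node at depth 2 and two nodes at each of depths 3 and 4, which illustrates how height and size can be read off the profiles.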
Pólya urn models and connections to random trees: A review
Journal of the Iranian Statistical Society
Cited by 17 (0 self)
This paper reviews Pólya urn models and their connection to random trees. Basic results are presented, together with proofs that underlie the historical evolution of the accompanying thought process. Extensions and generalizations are given according to chronology:
• Pólya-Eggenberger’s urn
• Bernard Friedman’s urn
• Generalized Pólya urns
• Extended urn schemes
• Invertible urn schemes
Connections to random trees are surveyed. Numerous applications to trees common in computer science are discussed, including:
The number of distinct values of some multiplicity in sequences of geometrically distributed . . .
Gap-free compositions and gap-free samples of geometric random variables
Discrete Math., 2005
Cited by 11 (3 self)
We study the asymptotic probability that a random composition of an integer n is gap-free, that is, that the sizes of parts in the composition form an interval. We show that this problem is closely related to the study of the probability that a sample of independent, identically distributed random variables with a geometric distribution is likewise gap-free.
1. Introduction
A composition of a natural number n is said to be gap-free if the part sizes occurring in it form an interval. In addition, if the interval starts at 1, the composition is said to be complete.
Example. Of the 32 compositions of n = 6, there are 21 gap-free compositions arising from permuting the order of the parts of the partitions
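The counts quoted in the example (32 compositions of 6, of which 21 are gap-free) can be reproduced by brute-force enumeration. The sketch below is only an illustration of the definitions; the function names are hypothetical and the approach is not the asymptotic method of the paper.

```python
def compositions(n):
    """Yield every composition of n as a tuple of positive parts."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def is_gap_free(parts):
    """True if the distinct part sizes form an interval of integers."""
    sizes = set(parts)
    return max(sizes) - min(sizes) + 1 == len(sizes)


def is_complete(parts):
    """True if the composition is gap-free and its smallest part is 1."""
    return is_gap_free(parts) and min(parts) == 1


all_six = list(compositions(6))
gap_free_six = [c for c in all_six if is_gap_free(c)]
complete_six = [c for c in all_six if is_complete(c)]
print(len(all_six), len(gap_free_six))  # prints "32 21"
```

Enumeration is exponential in n (there are 2^(n-1) compositions), so this is only practical for small n.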