Results 1  10
of
141
Balanced Allocations
 SIAM Journal on Computing
, 1994
"... Suppose that we sequentially place n balls into n boxes by putting each ball into a randomly chosen box. It is well known that when we are done, the fullest box has with high probability (1 + o(1)) ln n/ ln ln n balls in it. Suppose instead that for each ball we choose two boxes at random and place ..."
Abstract

Cited by 332 (8 self)
 Add to MetaCart
(Show Context)
Suppose that we sequentially place n balls into n boxes by putting each ball into a randomly chosen box. It is well known that when we are done, the fullest box has with high probability (1 + o(1)) ln n/ ln ln n balls in it. Suppose instead that for each ball we choose two boxes at random and place the ball into the one which is less full at the time of placement. We show that with high probability, the fullest box contains only ln ln n/ ln 2 +O(1) balls  exponentially less than before. Furthermore, we show that a similar gap exists in the infinite process, where at each step one ball, chosen uniformly at random, is deleted, and one ball is added in the manner above. We discuss consequences of this and related theorems for dynamic resource allocation, hashing, and online load balancing.
Minwise Independent Permutations
 Journal of Computer and System Sciences
, 1998
"... We define and study the notion of minwise independent families of permutations. We say that F ⊆ Sn is minwise independent if for any set X ⊆ [n] and any x ∈ X, when π is chosen at random in F we have Pr(min{π(X)} = π(x)) = 1 X . In other words we require that all the elements of any fixed set ..."
Abstract

Cited by 269 (13 self)
 Add to MetaCart
(Show Context)
We define and study the notion of minwise independent families of permutations. We say that F ⊆ Sn is minwise independent if for any set X ⊆ [n] and any x ∈ X, when π is chosen at random in F we have Pr(min{π(X)} = π(x)) = 1 X . In other words we require that all the elements of any fixed set X have an equal chance to become the minimum element of the image of X under π. Our research was motivated by the fact that such a family (under some relaxations) is essential to the algorithm used in practice by the AltaVista web index software to detect and filter nearduplicate documents. However, in the course of our investigation we have discovered interesting and challenging theoretical questions related to this concept – we present the solutions to some of them and we list the rest as open problems.
Faster algorithms for the shortest path problem
, 1990
"... Efficient implementations of Dijkstra's shortest path algorithm are investigated. A new data structure, called the radix heap, is proposed for use in this algorithm. On a network with n vertices, mn edges, and nonnegative integer arc costs bounded by C, a onelevel form of radix heap gives a t ..."
Abstract

Cited by 127 (10 self)
 Add to MetaCart
(Show Context)
Efficient implementations of Dijkstra's shortest path algorithm are investigated. A new data structure, called the radix heap, is proposed for use in this algorithm. On a network with n vertices, mn edges, and nonnegative integer arc costs bounded by C, a onelevel form of radix heap gives a time bound for Dijkstra's algorithm of O(m + n log C). A twolevel form of radix heap gives a bound of O(m + n log C/log log C). A combination of a radix heap and a previously known data structure called a Fibonacci heap gives a bound of O(m + n /log C). The best previously known bounds are O(m + n log n) using Fibonacci heaps alone and O(m log log C) using the priority queue structure of Van Emde Boas et al. [17].
Let Sleeping Files Lie: Pattern Matching in Zcompressed Files
, 1994
"... The current explosion of stored information necessitates a new model of pattern matching, that of compressed matching. In this model one tries to find all occurrences of a pattern in a compressed text in time proportional to the compressed text size, i.e., without decompressing the text. The most ef ..."
Abstract

Cited by 112 (2 self)
 Add to MetaCart
The current explosion of stored information necessitates a new model of pattern matching, that of compressed matching. In this model one tries to find all occurrences of a pattern in a compressed text in time proportional to the compressed text size, i.e., without decompressing the text. The most effective general purpose compression algorithms are adaptive, in that the text represented by each compression symbol is determined dynamically by the data. As a result, the encoding of a substring depends on its location. Thus the same substring may "look different" every time it appears in the compressed text. In this paper we consider pattern matching without decompression in the UNIX Zcompression. This is a variant of the LempelZiv adaptive compression scheme. If n is the length of the compressed text and m is the length of the pattern, our algorithms find the first pattern occurrence in time O(n+m 2 ) or O(n log m+m). We also introduce a new criterion to measure compressed matching ...
ClosestPoint Problems in Computational Geometry
, 1997
"... This is the preliminary version of a chapter that will appear in the Handbook on Computational Geometry, edited by J.R. Sack and J. Urrutia. A comprehensive overview is given of algorithms and data structures for proximity problems on point sets in IR D . In particular, the closest pair problem, th ..."
Abstract

Cited by 74 (14 self)
 Add to MetaCart
This is the preliminary version of a chapter that will appear in the Handbook on Computational Geometry, edited by J.R. Sack and J. Urrutia. A comprehensive overview is given of algorithms and data structures for proximity problems on point sets in IR D . In particular, the closest pair problem, the exact and approximate postoffice problem, and the problem of constructing spanners are discussed in detail. Contents 1 Introduction 1 2 The static closest pair problem 4 2.1 Preliminary remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 Algorithms that are optimal in the algebraic computation tree model . 5 2.2.1 An algorithm based on the Voronoi diagram . . . . . . . . . . . 5 2.2.2 A divideandconquer algorithm . . . . . . . . . . . . . . . . . . 5 2.2.3 A plane sweep algorithm . . . . . . . . . . . . . . . . . . . . . . 6 2.3 A deterministic algorithm that uses indirect addressing . . . . . . . . . 7 2.3.1 The degraded grid . . . . . . . . . . . . . . . . . . ...
Membership in constant time and almostminimum space
 SIAM Journal on Computing
, 1999
"... Abstract. This paper deals with the problem of storing a subset of elements from the bounded universeM = {0,...,M−1} so that membership queries can be performed efficiently. In particular, we introduce a data structure to represent a subset of N elements of M in a number of bits close to the informa ..."
Abstract

Cited by 73 (2 self)
 Add to MetaCart
(Show Context)
Abstract. This paper deals with the problem of storing a subset of elements from the bounded universeM = {0,...,M−1} so that membership queries can be performed efficiently. In particular, we introduce a data structure to represent a subset of N elements of M in a number of bits close to the informationtheoretic minimum, B = lg
Space Efficient Hash Tables With Worst Case Constant Access Time
 In STACS
, 2003
"... We generalize Cuckoo Hashing [23] to dary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes ..."
Abstract

Cited by 61 (4 self)
 Add to MetaCart
(Show Context)
We generalize Cuckoo Hashing [23] to dary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes and the expected amortized insertion time is constant. This is the first dictionary that has worst case constant access time and expected constant update time, works with (1 + ffl) n space, and supports satellite information. Experiments indicate that d = 4 choices suffice for ffl 0:03. We also describe variants of the data structure that allow the use of hash functions that can be evaluted in constant time.
Efficient, Transparent and Comprehensive Runtime Code Manipulation
, 2004
"... This thesis addresses the challenges of building a software system for generalpurpose runtime code manipulation. Modern applications, with dynamicallyloaded modules and dynamicallygenerated code, are assembled at runtime. While it was once feasible at compile time to observe and manipulate every i ..."
Abstract

Cited by 57 (1 self)
 Add to MetaCart
This thesis addresses the challenges of building a software system for generalpurpose runtime code manipulation. Modern applications, with dynamicallyloaded modules and dynamicallygenerated code, are assembled at runtime. While it was once feasible at compile time to observe and manipulate every instruction — which is critical for program analysis, instrumentation, trace gathering, optimization, and similar tools — it can now only be done at runtime. Existing runtime tools are successful at inserting instrumentation calls, but no general framework has been developed for finegrained and comprehensive code observation and modification without high overheads. This thesis demonstrates the feasibility of building such a system in software. We present DynamoRIO, a fullyimplemented runtime code manipulation system that supports code transformations on any part of a program, while it executes. DynamoRIO uses code caching technology to provide efficient, transparent, and comprehensive manipulation of an unmodified application running on a stock operating system and commodity hardware. DynamoRIO executes large, complex, modern applications with dynamicallyloaded, generated, or even modified code. Despite the
Minwise independent permutations (extended abstract
 In STOC ’98: Proceedings of the thirtieth annual ACM symposium on Theory of computing
, 1998
"... We define and study the notion of minwise independent families of permutations. We say that F⊆Sn is minwise independent if for any set X ⊆ [n] and any x ∈ X, when π is chosen at random in F we have Pr ( min{π(X)} = π(x) ) = 1 X . In other words we require that all the elements of any fixed set ..."
Abstract

Cited by 56 (1 self)
 Add to MetaCart
(Show Context)
We define and study the notion of minwise independent families of permutations. We say that F⊆Sn is minwise independent if for any set X ⊆ [n] and any x ∈ X, when π is chosen at random in F we have Pr ( min{π(X)} = π(x) ) = 1 X . In other words we require that all the elements of any fixed set X have an equal chance to become the minimum element of the image of X under π. Our research was motivated by the fact that such a family (under some relaxations) is essential to the algorithm used in practice by the AltaVista web index software to detect and filter nearduplicate documents. However, in the course of
A Perfect Hash Function Generator
"... gperf is a "softwaretool generatingtool" designed to automate the generation of perfect hash functions. This paper describes the features, algorithms, and objectoriented design and implementation strategies incorporated in gperf.Italso presents the results from an empirical comparison ..."
Abstract

Cited by 55 (36 self)
 Add to MetaCart
gperf is a "softwaretool generatingtool" designed to automate the generation of perfect hash functions. This paper describes the features, algorithms, and objectoriented design and implementation strategies incorporated in gperf.Italso presents the results from an empirical comparison between gperfgenerated recognizers and other popular techniques for reserved word lookup. gperf is distributed with the GNU libg++ library and is used to generate the keyword recognizers for the GNU C and GNU C++ compilers. 1 Introduction Perfect hash functions are a time and space efficient implementation of static search sets, which are ADTs with operations like initialize, insert,andretrieve. Static search sets are common in system software applications. Typical static search sets include compiler and interpreter reserved words, assembler instruction mnemonics, and shell interpreter builtin commands. Search set elements are called keywords.Key words are inserted into the set once, usually at c...