Results 1-10 of 12
Fast Crash Recovery in RAMCloud
In Proc. of SOSP’11, 2011
Cited by 37 (1 self)
RAMCloud is a DRAM-based storage system that provides inexpensive durability and availability by recovering quickly after crashes, rather than storing replicas in DRAM. RAMCloud scatters backup data across hundreds or thousands of disks, and it harnesses hundreds of servers in parallel to reconstruct lost data. The system uses a log-structured approach for all its data, in DRAM as well as on disk; this provides high performance both during normal operation and during recovery. RAMCloud employs randomized techniques to manage the system in a scalable and decentralized fashion. In a 60-node cluster, RAMCloud recovers 35 GB of data from a failed server in 1.6 seconds. Our measurements suggest that the approach will scale to recover larger memory sizes (64 GB or more) in less time with larger clusters.
The natural work-stealing algorithm is stable
In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS), 2001
Cited by 26 (1 self)
In this paper we analyse a very simple dynamic work-stealing algorithm. In the work-generation model, there are n (work) generators. A generator-allocation function is simply a function from the n generators to the n processors. We consider a fixed, but arbitrary, distribution D over generator-allocation functions. During each time step of our process, a generator-allocation function h is chosen from D, and the generators are allocated to the processors according to h. Each generator may then generate a unit-time task which it inserts into the queue of its host processor. It generates such a task independently with probability λ. After the new tasks are generated, each processor removes one task from its queue and services it. For many choices of D, the work-generation model allows the load to become arbitrarily imbalanced, even when λ < 1. For example, D could be the point distribution containing a single function h which allocates all of the generators to just one processor. For this choice of D, the chosen processor receives around λn units of work at each step and services one. The natural work-stealing algorithm that we analyse is widely used in practical applications and works as follows. During each time step, each empty
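The imbalance described in this abstract can be illustrated with a small simulation of the work-generation model alone, without any stealing (the stealing rule is cut off in the listing, so it is deliberately not modelled here). This is a minimal sketch: the function names `simulate`, `point_distribution`, and `identity_distribution` are hypothetical and chosen only for this example.

```python
import random


def simulate(n, lam, steps, choose_allocation, seed=0):
    """Simulate the work-generation model WITHOUT work stealing.

    choose_allocation(rng) plays the role of drawing h from D: it returns
    a list h where h[g] is the processor hosting generator g in this step.
    Returns the queue length of every processor after `steps` time steps.
    """
    rng = random.Random(seed)
    queues = [0] * n
    for _ in range(steps):
        h = choose_allocation(rng)
        # Each generator creates a unit-time task with probability lam.
        for g in range(n):
            if rng.random() < lam:
                queues[h[g]] += 1
        # Each processor then services one task from its own queue, if any.
        for p in range(n):
            if queues[p] > 0:
                queues[p] -= 1
    return queues


def point_distribution(n):
    """D contains a single h sending every generator to processor 0."""
    return lambda rng: [0] * n


def identity_distribution(n):
    """D contains a single h sending generator g to processor g."""
    return lambda rng: list(range(n))


if __name__ == "__main__":
    n, lam, steps = 50, 0.5, 200
    skewed = simulate(n, lam, steps, point_distribution(n))
    even = simulate(n, lam, steps, identity_distribution(n))
    print("point distribution, queue of processor 0:", skewed[0])
    print("identity allocation, longest queue:", max(even))
```

Under the point distribution, processor 0 gains roughly λn − 1 tasks per step while every other queue stays empty; under the identity allocation each processor receives at most one task per step and services it immediately.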
Cuckoo hashing: Further analysis
2003
Cited by 17 (1 self)
We consider cuckoo hashing as proposed by Pagh and Rodler in 2001. We show that the expected construction time of the hash table is O(n) as long as the two open addressing tables are each of size at least (1 + ε)n, where ε > 0 and n is the number of data points. Slightly improved bounds are obtained for various probabilities and constraints. The analysis rests on simple properties of branching processes.
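For readers unfamiliar with the data structure, the following is a minimal sketch of cuckoo hashing with two tables of size ⌈(1 + ε)n⌉ each. It is not the construction analysed in the paper: the salted use of Python's built-in `hash`, the eviction limit `max_kicks`, and the rebuild-on-failure policy are all assumptions made for illustration.

```python
import math
import random


class CuckooTable:
    """Two-table cuckoo hashing sketch; keys must be hashable and not None."""

    def __init__(self, n, eps=0.5, seed=0):
        self.size = math.ceil((1 + eps) * n)      # size of EACH table
        self.rng = random.Random(seed)
        self.max_kicks = max(16, 4 * math.ceil(math.log2(self.size + 1)))
        self._reset()

    def _reset(self):
        # Fresh empty tables and fresh salts, i.e. two new hash functions.
        self.tables = [[None] * self.size, [None] * self.size]
        self.salts = [self.rng.getrandbits(64), self.rng.getrandbits(64)]

    def _pos(self, which, key):
        return hash((self.salts[which], key)) % self.size

    def contains(self, key):
        # A key can only live in one of its two candidate slots.
        return any(self.tables[w][self._pos(w, key)] == key for w in (0, 1))

    def insert(self, key):
        if self.contains(key):
            return
        cur, which = key, 0
        for _ in range(self.max_kicks):
            pos = self._pos(which, cur)
            evicted = self.tables[which][pos]
            self.tables[which][pos] = cur         # cur takes the slot
            if evicted is None:
                return
            cur, which = evicted, 1 - which       # evicted key tries the other table
        self._rebuild(cur)

    def _rebuild(self, pending):
        # Too many evictions: pick new hash functions and reinsert everything.
        keys = [k for t in self.tables for k in t if k is not None]
        keys.append(pending)
        self._reset()
        for k in keys:
            self.insert(k)


if __name__ == "__main__":
    t = CuckooTable(n=1000)
    for k in range(1000):
        t.insert(k)
    print(t.contains(123), t.contains(5000))
```

Lookups inspect at most two slots; all the cost of collisions is paid at insertion time, which is the quantity the abstract bounds in expectation.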
Allocating Weighted Jobs in Parallel
1997
Cited by 12 (4 self)
It is well known that after placing m ≥ n balls independently and uniformly at random (i.u.r.) into n bins, the fullest bin contains Θ(log n / log log n + m/n) balls, with high probability. It is also known (see [Ste96]) that a maximum load of O(m/n) can be obtained for all m ≥ n if a ball is allocated in one (suitably chosen) of two (i.u.r.) bins. Stemann ([Ste96]) shows that r communication rounds suffice to guarantee a maximum load of max{(log n)^(1/r), O(m/n)}, with high probability. Adler et al. have shown in [ACMR95] that Stemann's protocol is optimal for constant r. In this paper we extend the above results in two directions: We generalize the lower bound to arbitrary r ≤ log log n. This implies that the result of Stemann's protocol is optimal for all r. Our main result is a generalization of Stemann's upper bound to weighted jobs: Let W_A (W_M) denote the average (maximum) weight of the balls. Further let Δ = W_A / W_M. Note that...
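The gap between one random choice and two random choices can be observed empirically. The sketch below uses the sequential greedy rule (place each ball in the less loaded of two i.u.r. bins); this is an assumption made for illustration and is not Stemann's parallel protocol or the weighted algorithm of the paper. The function names are hypothetical.

```python
import random


def max_load_one_choice(m, n, rng):
    """Throw m balls into n bins, each ball into one i.u.r. bin."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return max(loads)


def max_load_two_choices(m, n, rng):
    """Throw m balls into n bins; each ball picks two i.u.r. bins and
    goes to the one that currently holds fewer balls (ties: first bin)."""
    loads = [0] * n
    for _ in range(m):
        a, b = rng.randrange(n), rng.randrange(n)
        target = a if loads[a] <= loads[b] else b
        loads[target] += 1
    return max(loads)


if __name__ == "__main__":
    rng = random.Random(1)
    n = m = 20000
    print("one choice  :", max_load_one_choice(m, n, rng))
    print("two choices :", max_load_two_choices(m, n, rng))
```

For m = n the one-choice maximum grows like log n / log log n, while the two-choice maximum stays within a small additive term of m/n, so the second number printed is typically much smaller than the first.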
Parallel Continuous Randomized Load Balancing (Extended Abstract)
In Proceedings of the Tenth ACM Symposium on Parallel Algorithms and Architectures, 1998
Cited by 8 (2 self)
Petra Berenbrink (Department of Mathematics and Computer Science, Paderborn University, Germany; pebe@uni-paderborn.de), Tom Friedetzky and Ernst W. Mayr (Institut für Informatik, Technische Universität München, Germany; (friedetz|mayr)@informatik.tu-muenchen.de).
Recently, the subject of allocating tasks to servers has attracted much attention. There are several ways of distinguishing load balancing problems. There are sequential and parallel strategies, that is, placing the tasks one after the other or all of them in parallel. Another approach divides load balancing problems into continuous and static ones. In the continuous case new tasks are generated and consumed as time proceeds; in the static case the number of tasks is fixed. We present and analyze a parallel randomized continuous load balancing algorithm in a scenario where n processors continuously generate and consume tasks according to some given probability distribution. Each processor initiates l...
Performance, scalability, and semantics of concurrent FIFO queues
2011
Cited by 5 (2 self)
We introduce the notion of a k-FIFO queue which may dequeue elements out of FIFO order up to a constant k ≥ 0. Retrieving the oldest element from the queue may require up to k + 1 dequeue operations (bounded fairness), which may return elements not younger than the k + 1 oldest elements in the queue (bounded age) or nothing even if there are elements in the queue. A k-FIFO queue is starvation-free for finite k, where k + 1 is what we call the worst-case semantical deviation (WCSD) of the queue from a regular FIFO queue. The WCSD bounds the actual semantical deviation (ASD) of a k-FIFO queue from a regular FIFO queue when applied to a given workload. Intuitively, the ASD keeps track of the number of dequeue operations necessary to return oldest elements and the age of dequeued elements. We show that a number of existing concurrent algorithms implement k-FIFO queues whose WCSD are determined by configurable constants independent from any workload. We then introduce so-called Scal queues, which implement k-FIFO queues with generally larger, workload-dependent as well as unbounded WCSD. Since ASD cannot be obtained without prohibitive overhead, we have developed a tool that computes lower bounds on ASD from time-stamped runs. Our micro- and macrobenchmarks on a state-of-the-art 40-core multiprocessor machine show that Scal queues, as an immediate consequence of their weaker WCSD, outperform and outscale existing implementations at the expense of moderately increased lower bounds on ASD.
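The relaxed semantics can be made concrete with a sequential model. The sketch below is not one of the concurrent algorithms or Scal queues from the paper; it is a hypothetical single-threaded model (class name `KFifoModel` is an assumption) that serves elements in segments of k + 1, in arbitrary order within a segment. It never returns "nothing" while elements are present, so it models only the bounded-age and bounded-fairness parts of the definition.

```python
import random
from collections import deque


class KFifoModel:
    """Sequential model of k-FIFO semantics (not a concurrent queue).

    The k + 1 oldest elements form the current segment and are dequeued
    in random order; a new segment is loaded only when the current one
    is empty. Every dequeued element is therefore among the k + 1 oldest,
    and the oldest element is returned after at most k + 1 dequeues.
    """

    def __init__(self, k, seed=None):
        if k < 0:
            raise ValueError("k must be >= 0")
        self.k = k
        self._pending = deque()   # younger elements, in FIFO order
        self._segment = []        # up to k + 1 oldest elements
        self._rng = random.Random(seed)

    def enqueue(self, item):
        self._pending.append(item)

    def dequeue(self):
        if not self._segment:
            # Load the next k + 1 oldest elements as the new segment.
            while self._pending and len(self._segment) < self.k + 1:
                self._segment.append(self._pending.popleft())
        if not self._segment:
            return None           # queue is empty
        index = self._rng.randrange(len(self._segment))
        return self._segment.pop(index)


if __name__ == "__main__":
    q = KFifoModel(k=2, seed=7)
    for i in range(9):
        q.enqueue(i)
    print([q.dequeue() for _ in range(9)])
```

With k = 0 the model degenerates to a strict FIFO queue; with k = 2 each consecutive block of three dequeued values is some permutation of the corresponding block of three enqueued values.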
On power-of-choice in downlink transmission scheduling
Inform. Theory and Applicat. Workshop, 2008
Cited by 4 (0 self)
A low-complexity guiding principle is considered for transmission scheduling from n homogeneous queues whose channel states fluctuate independently. The scheduler transmits from a longest queue within d randomly chosen queues with eligible channel states. A Markovian model is studied where mean packet transmission time is n⁻¹ and packet arrival rate is λ < 1 per queue. Equilibrium distribution of queue occupancy is obtained in the limit as n → ∞ and it is shown to have tails that decay as Θ((λ/d)^k). If transmissions are scheduled from a longest eligible queue in the entire system, then almost all queues are empty in equilibrium; the number of queues with one packet is Θ(1) and the number of queues with more than one packet is o(1) as n → ∞. Equilibrium distribution of the total number of packets in the system is also characterized in this latter case.
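The selection rule in this abstract can be sketched in a few lines. The sketch assumes that the d candidates are sampled without replacement from the eligible queues and that ties are broken by sample order; neither detail is stated in the abstract, and the function name `pick_queue` is hypothetical. The channel model and the Markovian dynamics are not reproduced.

```python
import random


def pick_queue(lengths, eligible, d, rng):
    """Return the index of a longest queue among d randomly chosen
    eligible queues, or None if no queue is eligible.

    lengths  -- list of current queue lengths
    eligible -- indices of queues whose channel state allows transmission
    d        -- number of queues to sample (capped at len(eligible))
    """
    if not eligible:
        return None
    sample = rng.sample(list(eligible), min(d, len(eligible)))
    return max(sample, key=lambda q: lengths[q])


if __name__ == "__main__":
    rng = random.Random(0)
    lengths = [3, 0, 7, 5]
    eligible = [0, 1, 3]          # queue 2 is long but not eligible
    print(pick_queue(lengths, eligible, d=2, rng=rng))
    print(pick_queue(lengths, eligible, d=3, rng=rng))
```

When d is at least the number of eligible queues, the rule coincides with the second policy in the abstract, which transmits from a longest eligible queue in the entire system.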
Simple Competitive Request Scheduling Strategies
In 11th ACM Symposium on Parallel Architectures and Algorithms, 1999
Cited by 2 (0 self)
In this paper we study the problem of scheduling real-time requests in distributed data servers. We assume the time to be divided into time steps of equal length called rounds. During every round a set of requests arrives at the system, and every resource is able to fulfill one request per round. Every request specifies two (distinct) resources and requires access to one of them. Furthermore, every request has a deadline of d, i.e. a request that arrives in round t has to be fulfilled during round t + d − 1 at the latest. The number of requests which arrive during some round and the two alternative resources of every request are selected by an adversary. The goal is to maximize the number of requests that are fulfilled before their deadlines expire. We examine the scheduling problem in an online setting, i.e. new requests continuously arrive at the system, and we have to determine online an assignment of the requests to the resources in such a way that every resource has to fulfil...
On Worst Case Robin-Hood Hashing
SIAM J. Computing, 2004
We consider open addressing hashing and implement it by using the Robin Hood strategy; that is, in case of collision, the element that has traveled the farthest can stay in the slot. We hash ∼ αn elements into a table of size n where each probe is independent and uniformly distributed over the table, and α < 1 is a constant. Let M_n be the maximum search time for any of the elements in the table. We show that with probability tending to one, M_n ∈ [log_2 log n + σ, log_2 log n + τ] for some constants σ, τ depending upon α only. This is an exponential improvement over the maximum search time in case of the standard FCFS (first come first served) collision strategy and virtually matches the performance of multiple-choice hash methods.
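The Robin Hood rule can be sketched as follows. The abstract assumes independent, uniformly distributed probes; the sketch approximates this with a deterministic pseudo-random probe sequence derived from Python's built-in `hash`, which is an assumption made only for illustration. The function names `probe`, `insert`, and `search` are hypothetical, keys are assumed distinct, and the table must not be full.

```python
def probe(key, i, n):
    """Slot visited by `key` on its i-th probe (pseudo-random probing)."""
    return hash((key, i)) % n


def insert(table, key):
    """Insert `key` using the Robin Hood rule.

    Each slot stores (key, i), where i is the probe index at which the
    key settled. On a collision the element that has traveled farther
    keeps the slot; the other one continues with its next probe.
    """
    n = len(table)
    cur, i = key, 0
    while True:
        slot = probe(cur, i, n)
        resident = table[slot]
        if resident is None:
            table[slot] = (cur, i)
            return
        r_key, r_i = resident
        if r_i < i:
            # cur has traveled farther: it takes the slot, resident moves on.
            table[slot] = (cur, i)
            cur, i = r_key, r_i
        i += 1


def search(table, key):
    """Return the number of probes used to find `key`, or None if absent."""
    n = len(table)
    i = 0
    while True:
        resident = table[probe(key, i, n)]
        if resident is None:
            return None
        if resident[0] == key:
            return i + 1
        if resident[1] < i:
            return None           # key would have displaced this resident
        i += 1


if __name__ == "__main__":
    n = 1000
    table = [None] * n
    for k in range(700):          # load factor alpha = 0.7
        insert(table, k)
    print("maximum search time:", max(search(table, k) for k in range(700)))
```

The printed value is an empirical counterpart of M_n for one table; the abstract's result says that it concentrates around log_2 log n plus a constant depending on α.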
The Height and Size of Random Hash Trees and Random Pebbled Hash Trees
1999
The random hash tree and the N-tree were introduced by Ehrlich in 1981. In the random hash tree, n data points are hashed to values X_1, ..., X_n, independently and identically distributed random variables taking values that are uniformly distributed on [0, 1]. Place the X_i's in n equal-sized buckets as in hashing with chaining. For each bucket with at least two points, repeat the same process, keeping the branch factor always equal to the number of bucketed points. If H_n is the height of the tree obtained in this manner, we show that H_n / log_2 n → 1 in probability. In the random pebbled hash tree, we remove one point randomly and place it in the present node (as with the digital search tree modification of a trie) and perform the bucketing step as above on the remaining points (if any). With this simple modification, Hn in probability. We also show that the expected number of nodes in the random hash tree and random pebbled hash tree is asymptotic to 2.3020238...n and 1.4183342...n, respectively.