Fast Crash Recovery in RAMCloud
In Proc. of SOSP’11, 2011
Abstract

Cited by 38 (1 self)
RAMCloud is a DRAM-based storage system that provides inexpensive durability and availability by recovering quickly after crashes, rather than storing replicas in DRAM. RAMCloud scatters backup data across hundreds or thousands of disks, and it harnesses hundreds of servers in parallel to reconstruct lost data. The system uses a log-structured approach for all its data, in DRAM as well as on disk; this provides high performance both during normal operation and during recovery. RAMCloud employs randomized techniques to manage the system in a scalable and decentralized fashion. In a 60-node cluster, RAMCloud recovers 35 GB of data from a failed server in 1.6 seconds. Our measurements suggest that the approach will scale to recover larger memory sizes (64 GB or more) in less time with larger clusters.
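A rough back-of-envelope model (our own, not from the paper) shows why scattering backup data and replaying it in parallel makes second-scale recovery plausible. All bandwidth figures below are illustrative assumptions, not measurements:

```python
def recovery_time_estimate(data_gb=35.0, backup_disks=1000, disk_mb_s=100.0,
                           recovery_masters=60, master_gb_s=0.6):
    """Estimate crash-recovery time when backup log segments are scattered
    across many disks and replayed by many recovery masters in parallel.
    The disk and per-master ingest bandwidths are illustrative guesses."""
    # Read the scattered segments from all backup disks in parallel.
    read_s = data_gb * 1024.0 / (backup_disks * disk_mb_s)
    # Recovery masters replay their partitions in parallel, each limited
    # by its own ingest bandwidth.
    replay_s = data_gb / (recovery_masters * master_gb_s)
    # Reading and replay are pipelined, so the slower phase dominates.
    return max(read_s, replay_s)
```

With these assumed numbers the estimate lands around one second, the same order of magnitude as the 1.6 s the abstract reports for a 60-node cluster; both terms shrink as disks and recovery masters are added.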
The natural work-stealing algorithm is stable
In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS), 2001
Abstract

Cited by 27 (1 self)
In this paper we analyse a very simple dynamic work-stealing algorithm. In the work-generation model, there are n (work) generators. A generator-allocation function is simply a function from the n generators to the n processors. We consider a fixed, but arbitrary, distribution D over generator-allocation functions. During each time-step of our process, a generator-allocation function h is chosen from D, and the generators are allocated to the processors according to h. Each generator may then generate a unit-time task which it inserts into the queue of its host processor. It generates such a task independently with probability λ. After the new tasks are generated, each processor removes one task from its queue and services it. For many choices of D, the work-generation model allows the load to become arbitrarily imbalanced, even when λ < 1. For example, D could be the point distribution containing a single function h which allocates all of the generators to just one processor. For this choice of D, the chosen processor receives around λn units of work at each step and services one. The natural work-stealing algorithm that we analyse is widely used in practical applications and works as follows. During each time step, each empty
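The model is easy to sketch in simulation. The point distribution D below is the adversarial example from the abstract (all generators hosted by processor 0), and the steal-half rule stands in for the natural work-stealing algorithm; the exact stealing rule is our simplification, not the paper's precise protocol:

```python
import random

def simulate(n=8, lam=0.9, steps=2000, seed=0):
    """Work-generation model with a simple steal-half rule. D is the
    worst-case point distribution mapping every generator to processor 0;
    an empty processor steals half the queue of one random victim."""
    rng = random.Random(seed)
    queues = [0] * n
    for _ in range(steps):
        # All n generators sit on processor 0; each generates a task w.p. lam.
        queues[0] += sum(rng.random() < lam for _ in range(n))
        # Each processor services one task if it has any.
        for i in range(n):
            if queues[i] > 0:
                queues[i] -= 1
        # Each empty processor steals half of a random victim's queue.
        for i in range(n):
            if queues[i] == 0:
                v = rng.randrange(n)
                stolen = queues[v] // 2
                queues[v] -= stolen
                queues[i] += stolen
    return max(queues)
```

Without stealing, processor 0's queue would grow by roughly (λn − 1) per step; with stealing the maximum queue length stays bounded, which is the stability the paper proves.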
Cuckoo hashing: Further analysis
2003
Abstract

Cited by 17 (1 self)
We consider cuckoo hashing as proposed by Pagh and Rodler in 2001. We show that the expected construction time of the hash table is O(n) as long as the two open addressing tables are each of size at least (1 + ε)n, where ε > 0 and n is the number of data points. Slightly improved bounds are obtained for various probabilities and constraints. The analysis rests on simple properties of branching processes.
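For readers unfamiliar with the scheme, a minimal sequential sketch of cuckoo hashing follows. The table size, eviction limit, and seeded hash functions are our own illustrative choices, not those analyzed in the paper:

```python
import random

class CuckooHash:
    """Cuckoo hashing sketch: two open addressing tables, one candidate
    slot per table per key; a blocked insertion evicts the occupant,
    which is then re-inserted, possibly causing a chain of evictions."""

    MAX_KICKS = 32  # give up and rehash after this many evictions

    def __init__(self, size=11):
        self.size = size
        self.tables = [[None] * size, [None] * size]
        self._new_seeds()

    def _new_seeds(self):
        self.seeds = (random.randrange(1 << 30), random.randrange(1 << 30))

    def _pos(self, t, key):
        return hash((self.seeds[t], key)) % self.size

    def contains(self, key):
        return any(self.tables[t][self._pos(t, key)] == key for t in (0, 1))

    def insert(self, key):
        if self.contains(key):
            return
        for _ in range(self.MAX_KICKS):
            for t in (0, 1):
                p = self._pos(t, key)
                if self.tables[t][p] is None:
                    self.tables[t][p] = key
                    return
            # Both slots occupied: evict from table 0 and retry with victim.
            p = self._pos(0, key)
            key, self.tables[0][p] = self.tables[0][p], key
        # Eviction chain too long (likely a cycle): rehash everything.
        old = [k for tab in self.tables for k in tab if k is not None]
        self.tables = [[None] * self.size, [None] * self.size]
        self._new_seeds()
        for k in old:
            self.insert(k)
        self.insert(key)
```

The expected O(n) construction time from the abstract corresponds to eviction chains being short on average when both tables have size at least (1 + ε)n.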
Allocating Weighted Jobs in Parallel
1997
Abstract

Cited by 13 (4 self)
It is well known that after placing m ≥ n balls independently and uniformly at random (i.u.r.) into n bins, the fullest bin contains Θ(log n / log log n + m/n) balls, with high probability. It is also known (see [Ste96]) that a maximum load of O(m/n) can be obtained for all m ≥ n if a ball is allocated in one (suitably chosen) of two (i.u.r.) bins. Stemann ([Ste96]) shows that r communication rounds suffice to guarantee a maximum load of max{(log n)^(1/r), O(m/n)}, with high probability. Adler et al. have shown in [ACMR95] that Stemann's protocol is optimal for constant r. In this paper we extend the above results in two directions: We generalize the lower bound to arbitrary r ≤ log log n. This implies that the result of Stemann's protocol is optimal for all r. Our main result is a generalization of Stemann's upper bound to weighted jobs: Let W_A (W_M) denote the average (maximum) weight of the balls. Further let Δ = W_A/W_M. Note that...
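The two-choice effect in the first two sentences is easy to reproduce empirically. This small simulation is our own illustration of the unweighted case, not the paper's weighted setting:

```python
import random

def max_load(n, m, choices, seed=0):
    """Throw m balls into n bins; each ball goes into the least loaded of
    `choices` bins drawn independently and uniformly at random."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(m):
        candidates = [rng.randrange(n) for _ in range(choices)]
        bins[min(candidates, key=bins.__getitem__)] += 1
    return max(bins)

# For m = n, one choice gives a maximum load of Theta(log n / log log n),
# while picking the lesser loaded of two bins brings it down sharply.
```

Comparing `max_load(10**4, 10**4, 1)` against `max_load(10**4, 10**4, 2)` shows the gap directly.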
Parallel Continuous Randomized Load Balancing (Extended Abstract)
In Proceedings of the Tenth ACM Symposium on Parallel Algorithms and Architectures, 1998
Abstract

Cited by 8 (2 self)
Petra Berenbrink, Department of Mathematics and Computer Science, Paderborn University, Germany (pebe@uni-paderborn.de); Tom Friedetzky and Ernst W. Mayr, Institut für Informatik, Technische Universität München, Germany ({friedetz,mayr}@informatik.tu-muenchen.de). Recently, the subject of allocating tasks to servers has attracted much attention. There are several ways of distinguishing load balancing problems. There are sequential and parallel strategies, that is, placing the tasks one after the other or all of them in parallel. Another approach divides load balancing problems into continuous and static ones. In the continuous case new tasks are generated and consumed as time proceeds; in the static case the number of tasks is fixed. We present and analyze a parallel randomized continuous load balancing algorithm in a scenario where n processors continuously generate and consume tasks according to some given probability distribution. Each processor initiates l...
Performance, scalability, and semantics of concurrent FIFO queues
2011
Abstract

Cited by 5 (2 self)
We introduce the notion of a k-FIFO queue which may dequeue elements out of FIFO order up to a constant k ≥ 0. Retrieving the oldest element from the queue may require up to k + 1 dequeue operations (bounded fairness), which may return elements not younger than the k + 1 oldest elements in the queue (bounded age) or nothing even if there are elements in the queue. A k-FIFO queue is starvation-free for finite k, where k + 1 is what we call the worst-case semantical deviation (WCSD) of the queue from a regular FIFO queue. The WCSD bounds the actual semantical deviation (ASD) of a k-FIFO queue from a regular FIFO queue when applied to a given workload. Intuitively, the ASD keeps track of the number of dequeue operations necessary to return oldest elements and the age of dequeued elements. We show that a number of existing concurrent algorithms implement k-FIFO queues whose WCSD are determined by configurable constants independent from any workload. We then introduce so-called Scal queues, which implement k-FIFO queues with generally larger, workload-dependent as well as unbounded WCSD. Since ASD cannot be obtained without prohibitive overhead, we have developed a tool that computes lower bounds on ASD from timestamped runs. Our micro- and macro-benchmarks on a state-of-the-art 40-core multiprocessor machine show that Scal queues, as an immediate consequence of their weaker WCSD, outperform and outscale existing implementations at the expense of moderately increased lower bounds on ASD.
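A sequential toy model of the bounded-age behaviour (a segment-based construction similar in spirit to, but not identical with, the concurrent algorithms the paper studies): elements are grouped into segments of k + 1, and a dequeue removes an arbitrary element of the oldest non-empty segment, so every dequeued element is among the k + 1 oldest:

```python
import random
from collections import deque

class KFIFOQueue:
    """Toy sequential k-FIFO queue: dequeue returns some element among the
    (at most) k + 1 oldest ones, so no element is returned more than k
    positions out of FIFO order."""

    def __init__(self, k, seed=0):
        self.k = k
        self.rng = random.Random(seed)
        self.segments = deque()  # each segment holds at most k + 1 elements

    def enqueue(self, item):
        if not self.segments or len(self.segments[-1]) == self.k + 1:
            self.segments.append([])
        self.segments[-1].append(item)

    def dequeue(self):
        if not self.segments:
            return None  # empty queue
        head = self.segments[0]
        item = head.pop(self.rng.randrange(len(head)))
        if not head:
            self.segments.popleft()
        return item
```

Setting k = 0 degenerates to a strict FIFO queue; larger k trades FIFO order for the contention reduction the paper measures in its Scal queues.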
On power-of-choice in downlink transmission scheduling
In Inform. Theory and Applicat. Workshop, 2008
Abstract

Cited by 4 (0 self)
A low-complexity guiding principle is considered for transmission scheduling from n homogeneous queues whose channel states fluctuate independently. The scheduler transmits from a longest queue within d randomly chosen queues with eligible channel states. A Markovian model is studied where the mean packet transmission time is 1/n and the packet arrival rate is λ < 1 per queue. The equilibrium distribution of queue occupancy is obtained in the limit as n → ∞ and it is shown to have tails that decay as Θ((λ/d)^k). If transmissions are scheduled from a longest eligible queue in the entire system, then almost all queues are empty in equilibrium; the number of queues with one packet is Θ(1) and the number of queues with more than one packet is o(1) as n → ∞. The equilibrium distribution of the total number of packets in the system is also characterized in this latter case.
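A discrete-time caricature of the model can illustrate the power-of-d tail decay. Channel eligibility is ignored and the continuous-time dynamics are collapsed to one arrival attempt and one service per slot, so this is qualitative only:

```python
import random

def occupancy_tail(n=100, lam=0.7, d=2, steps=100000, seed=0):
    """Simulate n queues: each slot, one packet arrives at a uniformly
    random queue with probability lam, and the scheduler serves a longest
    queue among d uniformly sampled queues. Returns the fractions of
    queues holding at least 1 and at least 2 packets at the end."""
    rng = random.Random(seed)
    q = [0] * n
    for _ in range(steps):
        if rng.random() < lam:
            q[rng.randrange(n)] += 1
        sample = [rng.randrange(n) for _ in range(d)]
        longest = max(sample, key=q.__getitem__)
        if q[longest] > 0:
            q[longest] -= 1
    return (sum(x >= 1 for x in q) / n, sum(x >= 2 for x in q) / n)
```

Under the Θ((λ/d)^k) tail, the fraction of queues with at least 2 packets should be roughly a factor λ/d smaller than the fraction with at least 1.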
ScalA: Non-Linearizable Computing Breaks The Scalability Barrier
2010
Abstract

Cited by 2 (1 self)
We propose a relaxed version of linearizability and a set of load balancing algorithms for trading off adherence to concurrent data structure semantics and scalability. We consider data structures that store elements in a given order, such as stacks and queues. Intuitively, a concurrent stack, for example, is linearizable if the effect of push and pop operations on the stack always occurs instantaneously. A linearizable stack guarantees that pop operations return the youngest stack elements first, i.e., the elements in the reverse order in which the operations that pushed them onto the stack took effect. Linearizability allows concurrent (but not sequential) operations to be reordered arbitrarily. We relax linearizability to k-linearizability with k > 0 to also allow sequences of up to k − 1 sequential operations to be reordered arbitrarily and thus execute concurrently. With a k-linearizable stack, for example, a pop operation may not return the youngest but the k-th youngest element on the stack. It turns out that k-linearizability may be tolerated by concurrent applications such as process schedulers and web servers that already use it implicitly. Moreover, k-linearizability does provide positive scalability in some cases because more operations may be executed concurrently, but it may still be too restrictive under high contention. We therefore propose a set of load balancing algorithms, which significantly improve scalability by approximating k-linearizability probabilistically. We introduce Scal, an open-source framework for implementing k-linearizable approximations of concurrent data structures, and show in multiple benchmarks that Scal provides positive scalability for concurrent data structures that typically do not scale under high contention.
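As a toy illustration of the relaxed semantics (ours, not Scal's actual implementation), here is a sequential stack whose pop may return any of the k youngest elements:

```python
import random

class KRelaxedStack:
    """Sequential sketch of a k-linearizability-style relaxation: pop may
    return any of the k youngest elements rather than strictly the top.
    A strictly linearizable stack is the special case k = 1."""

    def __init__(self, k, seed=0):
        self.k = k
        self.rng = random.Random(seed)
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        if not self.items:
            return None
        window = min(self.k, len(self.items))
        # Choose uniformly among the `window` youngest elements.
        index = len(self.items) - 1 - self.rng.randrange(window)
        return self.items.pop(index)
```

In a concurrent setting this slack is what lets k contending pops proceed on disjoint elements instead of serializing on a single top-of-stack location.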
Simple Competitive Request Scheduling Strategies
In 11th ACM Symposium on Parallel Architectures and Algorithms, 1999
Abstract

Cited by 2 (0 self)
In this paper we study the problem of scheduling real-time requests in distributed data servers. We assume time to be divided into time steps of equal length called rounds. During every round a set of requests arrives at the system, and every resource is able to fulfill one request per round. Every request specifies two (distinct) resources and requires access to one of them. Furthermore, every request has a deadline of d, i.e., a request that arrives in round t has to be fulfilled during round t + d − 1 at the latest. The number of requests which arrive during some round and the two alternative resources of every request are selected by an adversary. The goal is to maximize the number of requests that are fulfilled before their deadlines expire. We examine the scheduling problem in an online setting, i.e., new requests continuously arrive at the system, and we have to determine online an assignment of the requests to the resources in such a way that every resource has to fulfil...
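The round structure is easy to prototype. The greedy rule below (queue each request at the currently shorter of its two resources, reject when the deadline can no longer be met) is our own baseline, not the competitive strategy of the paper, and arrivals are random rather than adversarial:

```python
import random
from collections import deque

def greedy_rounds(n=20, d=5, rounds=1000, max_arrivals=16, seed=0):
    """Each round: requests arrive, each naming two distinct resources.
    A request joins the shorter of its two queues if its position there
    still allows service by round t + d - 1, else it is rejected. Then
    every resource serves one queued request. Returns (fulfilled,
    rejected)."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(n)]
    fulfilled = rejected = 0
    for t in range(rounds):
        for _ in range(rng.randrange(max_arrivals + 1)):
            a, b = rng.sample(range(n), 2)
            r = a if len(queues[a]) <= len(queues[b]) else b
            # Queue position p is served in round t + p, so p <= d - 1
            # (i.e. queue length < d) still meets the deadline t + d - 1.
            if len(queues[r]) < d:
                queues[r].append(t)
            else:
                rejected += 1
        for q in queues:
            if q:
                q.popleft()
                fulfilled += 1
    return fulfilled, rejected
```

Because every queued request sits at depth at most d − 1 and each resource serves one request per round, nothing admitted by this rule ever misses its deadline; the competitive question is how many requests an online rule must reject compared to an offline optimum.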
Internet Routing and Internet Service Provision
2009
Cited by 1 (0 self)