Results 1–10 of 30
Balanced Allocations: The Heavily Loaded Case
, 2006
Abstract

Cited by 75 (9 self)
We investigate balls-into-bins processes allocating m balls into n bins based on the multiple-choice paradigm. In the classical single-choice variant each ball is placed into a bin selected uniformly at random. In a multiple-choice process each ball can be placed into one out of d ≥ 2 randomly selected bins. It is known that in many scenarios having more than one choice for each ball can improve the load balance significantly. Formal analyses of this phenomenon prior to this work considered mostly the lightly loaded case, that is, when m ≈ n. In this paper we present the first tight analysis in the heavily loaded case, that is, when m ≫ n rather than m ≈ n. The best previously known results for the multiple-choice processes in the heavily loaded case were obtained using majorization by the single-choice process. This yields an upper bound on the maximum load of bins of m/n + O(√(m ln n / n)) with high probability. We show, however, that the multiple-choice processes are fundamentally different from the single-choice variant in that they have “short memory.” The great consequence of this property is that the deviation of the multiple-choice processes from the optimal allocation (that is, the allocation in which each bin has either ⌊m/n⌋ or ⌈m/n⌉ balls) does not increase with the number of balls, as in the case of the single-choice process. In particular, we investigate the allocation obtained by two different multiple-choice allocation schemes …
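The multiple-choice process described in this abstract is easy to simulate. The sketch below (an illustration under my own assumptions, not the authors' code) contrasts single-choice and two-choice allocation in the heavily loaded regime; the function name and parameters are hypothetical:

```python
import random

def max_load(m, n, d, seed=None):
    """Throw m balls into n bins; each ball goes to the least loaded
    of d bins chosen uniformly at random (d = 1 is single-choice)."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(m):
        candidates = [rng.randrange(n) for _ in range(d)]
        target = min(candidates, key=lambda i: bins[i])
        bins[target] += 1
    return max(bins)

# Heavily loaded case (m >> n): the gap above the average m/n grows
# with m for d = 1, but stays small, independent of m, for d = 2.
m, n = 100_000, 100
print("single-choice gap:", max_load(m, n, 1, seed=1) - m // n)
print("two-choice gap:   ", max_load(m, n, 2, seed=1) - m // n)
```

Even at this modest scale the two-choice gap is typically a small constant while the single-choice gap is on the order of √(m ln n / n), matching the bounds quoted above.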
Comparing Random Data Allocation and Data Striping in Multimedia Servers
 In ACM SIGMETRICS
, 2000
Abstract

Cited by 69 (1 self)
We compare performance of the RIO (Randomized I/O) Multimedia Storage Server, which is based on random data allocation and block replication, with traditional data striping techniques. We compare both approaches in terms of maximum supported data rate and stream cost. Data striping techniques in multimedia servers are often designed for restricted workloads, e.g., sequential access patterns with CBR (constant bit rate) requirements. On the other hand, RIO is designed to support virtually any type of multimedia application, including VBR (variable bit rate) video or audio, and interactive applications with unpredictable access patterns, such as 3D interactive virtual worlds, interactive scientific visualizations, etc. Surprisingly, our results show that system performance with random data allocation is competitive with, and sometimes even better than, data striping techniques for the workloads for which data striping is designed to work best, i.e., streams with sequential access patterns and CBR...
Fast Concurrent Access to Parallel Disks
Abstract

Cited by 62 (12 self)
High performance applications involving large data sets require the efficient and flexible use of multiple disks. In an external memory machine with D parallel, independent disks, only one block can be accessed on each disk in one I/O step. This restriction leads to a load balancing problem that is perhaps the main inhibitor for the efficient adaptation of single-disk external memory algorithms to multiple disks. We solve this problem for arbitrary access patterns by randomly mapping blocks of a logical address space to the disks. We show that a shared buffer of O(D) blocks suffices to support efficient writing. The analysis uses the properties of negative association to handle dependencies between the random variables involved. This approach might be of independent interest for probabilistic analysis in general. If two randomly allocated copies of each block exist, N arbitrary blocks can be read within ⌈N/D⌉ + 1 I/O steps with high probability. The redundancy can be further reduced from 2 to 1 + 1/r for any integer r without a big impact on reading efficiency. From the point of view of external memory models, these results rehabilitate Aggarwal and Vitter's "single-disk multi-head" model [1] that allows access to D arbitrary blocks in each I/O step. This powerful model can be emulated on the physically more realistic independent-disk model [2] with small constant overhead factors. Parallel disk external memory algorithms can therefore be developed in the multi-head model first. The emulation result can then be applied directly or further refinements can be added.
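The ⌈N/D⌉ + 1 reading result can be illustrated with a small greedy simulation (a sketch of the underlying idea, not the paper's actual algorithm): each of N blocks has two randomly placed copies, and each block is read from whichever of its two disks currently has fewer scheduled reads. All names here are hypothetical:

```python
import random

def read_steps(block_disks, D):
    """Greedy copy selection: read each block from whichever of its two
    candidate disks currently has fewer scheduled reads. The number of
    parallel I/O steps equals the maximum reads on any single disk."""
    load = [0] * D
    for a, b in block_disks:
        target = a if load[a] <= load[b] else b
        load[target] += 1
    return max(load)

rng = random.Random(42)
D, N = 8, 1000
# two distinct random disks hold a copy of each block
blocks = [tuple(rng.sample(range(D), 2)) for _ in range(N)]
print("I/O steps:", read_steps(blocks, D), " lower bound:", -(-N // D))
```

With duplicate copies the greedy schedule lands very close to the trivial lower bound ⌈N/D⌉, in line with the high-probability bound quoted in the abstract.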
Push-to-peer video-on-demand system: Design and evaluation
 In UMass Computer Science Technical Report 2006–59
, 2006
"... Number: CRPRL2006110001 ..."
Performance Analysis of the RIO Multimedia Storage System with Heterogeneous Disk Configurations
 In ACM Multimedia Conference
, 1998
Abstract

Cited by 48 (0 self)
RIO is a multimedia object server which manages a set of parallel disks and supports real-time data delivery with statistical delay guarantees. RIO uses random data allocation on disks combined with partial replication to achieve load balance and high performance. In this paper we analyze the performance of RIO when the set of disks used to store data blocks is not homogeneous, having both different bandwidths and different storage capacities. The basic problem to be addressed for heterogeneous configurations is that, on average, the fraction of the load directed to each disk is proportional to the amount of data stored on it, which may not be proportional to the disk bandwidth. This may cause some disks to be overloaded, with long queues and delays, even though bandwidth is available on other disks. This reduces the system throughput or increases the delay bound that can be guaranteed. This problem arises whenever the bandwidth-to-space ratio (BSR) is not uniform across all disks. …
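The BSR imbalance the abstract describes is easy to quantify: under random placement a disk's load share is proportional to the data it stores, so its utilization is that share divided by its bandwidth. A minimal sketch with made-up numbers (the function and units are my own assumptions, not RIO's model):

```python
def disk_utilizations(capacities, bandwidths, total_load):
    """Random placement directs load to each disk in proportion to the
    data it stores; a disk's utilization is its load share divided by
    its bandwidth. Units are arbitrary but must be consistent."""
    total_cap = sum(capacities)
    return [total_load * (c / total_cap) / bw
            for c, bw in zip(capacities, bandwidths)]

# One large disk with the same bandwidth as the others saturates first,
# even though aggregate bandwidth (150) exceeds the offered load (120):
# the big disk attracts 2/3 of the load but has only 1/3 of the bandwidth.
util = disk_utilizations([100, 100, 400], [50, 50, 50], 120)
print([round(u, 2) for u in util])
```

The third disk's utilization exceeds 1.0 while the others sit below half, which is exactly the non-uniform-BSR bottleneck discussed above.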
Quality of Service Support for Real-time Storage Systems
, 2003
Abstract

Cited by 25 (2 self)
The performance and capacity of commodity computer systems have improved drastically in recent years. However, these systems still lack support for real-time data access, which is required by an increasing number of emerging applications. In this paper we first present several important storage-bound real-time applications and classify their Quality of Service (QoS) requirements. We then survey the representative work on disk management in the areas of I/O scheduling, admission control, and data placement. Finally, we present our approach for providing disk QoS in commodity systems and present key empirical results from the microbenchmark-based evaluation of our QoS-enhanced Linux kernel.
Perfectly balanced allocation
 in Proceedings of the 7th International Workshop on Randomization and Approximation Techniques in Computer Science, Princeton, NJ, 2003, Lecture Notes in Comput. Sci. 2764
, 2003
Abstract

Cited by 23 (1 self)
We investigate randomized processes underlying load balancing based on the multiple-choice paradigm: m balls have to be placed in n bins, and each ball can be placed into one out of 2 randomly selected bins. The aim is to distribute the balls as evenly as possible among the bins. Previously, it was known that a simple process that places the balls one by one in the least loaded bin can achieve a maximum load of m/n + Θ(log log n) with high probability. Furthermore, it was known that it is possible to achieve (with high probability) a maximum load of at most ⌈m/n⌉ + 1 using maximum flow computations. In this paper, we extend these results in several aspects. First of all, we show that if m ≥ cn log n for some sufficiently large c, then a perfect distribution of balls among the bins can be achieved (i.e., the maximum load is ⌈m/n⌉) with high probability. The bound for m is essentially optimal, because it is known that if m ≤ c′n log n for some sufficiently small constant c′, the best possible maximum load that can be achieved is ⌈m/n⌉ + 1 with high probability. Next, we analyze a simple, randomized load balancing process based on a local search paradigm. Our first result here is that this process always converges to a best possible load distribution. Then, we study the convergence speed of the process. We show that if m is sufficiently large compared to n, then no matter with which ball distribution the system starts, if the imbalance is ∆, then the process needs only ∆ · n^O(1) steps to reach a perfect distribution, with high probability. We also prove a similar result for m ≈ n, and show that if m = O(n log n / log log n), then an optimal load distribution (which has the maximum load of ⌈m/n⌉ + 1) is reached by the random process after a polynomial number of steps, with high probability.
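The local-search process can be sketched as follows (a simplified illustration under my own assumptions, not the paper's exact process): start from an arbitrary assignment of each ball to one of its two candidate bins, then repeatedly pick a random ball and move it to its alternative bin whenever that strictly reduces the imbalance:

```python
import random

def local_search(choices, n, iters, seed=0):
    """choices[i] holds the two candidate bins of ball i. Start with the
    first choice for every ball, then repeatedly pick a random ball and
    move it to its alternative bin when that strictly improves balance
    (source load exceeds alternative load by more than one)."""
    rng = random.Random(seed)
    assignment = [a for a, _ in choices]
    load = [0] * n
    for b in assignment:
        load[b] += 1
    for _ in range(iters):
        i = rng.randrange(len(choices))
        a, b = choices[i]
        cur = assignment[i]
        alt = b if cur == a else a
        if load[cur] > load[alt] + 1:   # moving strictly reduces imbalance
            load[cur] -= 1
            load[alt] += 1
            assignment[i] = alt
    return max(load)

rng = random.Random(1)
n, m = 50, 2000                # m >> n, average load m/n = 40
choices = [tuple(rng.sample(range(n), 2)) for _ in range(m)]
print("max load after local search:", local_search(choices, n, 10**5))
```

Starting from a first-choice-only assignment (maximum load well above the average), the random moves quickly bring the maximum load down to ⌈m/n⌉ or ⌈m/n⌉ + 1, matching the convergence behavior described above.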
Reconciling Simplicity and Realism in Parallel Disk Models
 Parallel Computing
, 2001
Abstract

Cited by 16 (4 self)
For the design and analysis of algorithms that process huge data sets, a machine model is needed that handles parallel disks. There seems to be a dilemma between simple, flexible use of such a model and accurate modelling of hardware details. This paper explains how many aspects of this problem can be resolved. The programming model implements one large logical disk allowing concurrent access to arbitrary sets of variable-size blocks. This model can be implemented efficiently on multiple independent disks even if zones with different speeds, communication bottlenecks, and failed disks are allowed. These results not only provide useful algorithmic tools but also imply a theoretical justification for studying external memory algorithms using simple abstract models.
Asynchronous scheduling of redundant disk arrays
, 2000
Abstract

Cited by 13 (5 self)
Random redundant allocation of data to parallel disk arrays can be exploited to achieve low access delays. New algorithms are proposed which improve the previously known shortest-queue algorithm by systematically exploiting the fact that scheduling decisions can be deferred until a block access is actually started on a disk. These algorithms are also generalized for coding schemes with low redundancy. Using extensive experiments, practically important quantities are measured which have so far eluded an analytical treatment: the delay distribution when a stream of requests approaches the limit of the system capacity, the system efficiency for parallel disk applications with bounded prefetching buffers, and the combination of both for mixed traffic. A further step towards practice is taken by outlining the system design for α, an automatically load-balanced parallel hard-disk array.