Results 1 - 10 of 111
An Adaptive Data Replication Algorithm
ACM Transactions on Database Systems, 1997
Cited by 146 (0 self)
This paper addresses the performance of distributed database systems. Specifically, we present an algorithm for dynamic replication of an object in distributed systems. The algorithm is adaptive in the sense that it changes the replication scheme of the object (i.e., the set of processors at which the object is replicated) as changes occur in the read-write pattern of the object (i.e., the number of reads and writes issued by each processor). The algorithm continuously moves the replication scheme towards an optimal one. We show that the algorithm can be combined with the concurrency control and recovery mechanisms of a distributed database management system. The performance of the algorithm is analyzed theoretically and experimentally. Along the way, we provide a lower bound on the performance of any dynamic replication algorithm.
Hippodrome: Running Circles around Storage Administration
In Proceedings of the Conference on File and Storage Technologies, 2002
Cited by 118 (9 self)
Enterprise-scale computer storage systems are extremely difficult to manage due to their size and complexity. It is difficult to generate a good storage system design for a given workload and to correctly implement the selected design. Traditionally, initial system configuration is performed by administrators who are guided by rules of thumb. Unfortunately, this process involves trial and error, and as a result is tedious and error-prone. In this paper, we introduce Hippodrome, an approach to automating initial system configuration. Hippodrome is an iterative loop that analyzes an existing system to determine its requirements, creates a new storage system design to better meet these requirements, and migrates the existing system to the new design. In this paper, we show how Hippodrome automates initial system configuration.
Minerva: an automated resource provisioning tool for large-scale storage systems
ACM Transactions on Computer Systems, 2001
Cited by 103 (24 self)
Enterprise-scale storage systems, which can contain hundreds of host computers and storage devices and up to tens of thousands of disks and logical volumes, are difficult to design. The volume of choices that need to be made is massive, and many choices have unforeseen interactions. Storage system design is tedious and complicated to do by hand, usually leading to solutions that are grossly overprovisioned, substantially under-performing or, in the worst case, both. To solve the configuration nightmare, we present MINERVA: a suite of tools for designing storage systems automatically. MINERVA uses declarative specifications of application requirements and device capabilities; constraint-based formulations of the various subproblems; and optimization techniques to explore the search space of possible solutions. This paper also explores and evaluates the design decisions that went into MINERVA, using specialized micro and macro-benchmarks. We show that MINERVA can successfully handle a workload with substantial complexity (a decision-support database benchmark). MINERVA created a 16-disk design in only a few minutes that achieved the same performance as a 30-disk system manually designed by human experts. Of equal importance, MINERVA was able to predict the resulting system's performance before it was built.
Competitive Algorithms for Distributed Data Management
In Proceedings of the 24th Annual ACM Symposium on Theory of Computing
Cited by 100 (8 self)
We deal with the competitive analysis of algorithms for managing data in a distributed environment. We deal with the file allocation problem ([DF], [ML]), where copies of a file may be stored in the local storage of some subset of processors. Copies may be replicated and discarded over time so as to optimize communication costs, but multiple copies must be kept consistent and at least one copy must be stored somewhere in the network at all times. We deal with competitive algorithms for minimizing communication costs, over arbitrary sequences of reads and writes, and arbitrary network topologies. We define the constrained file allocation problem to be the solution of many individual file allocation problems simultaneously, subject to the constraints of local memory size. We give competitive algorithms for this problem on the uniform network topology. We then introduce distributed competitive algorithms for on-line data tracking (a generalization of mobile user tracking [AP1...
Competitive Distributed File Allocation
1993
Cited by 99 (12 self)
This paper deals with the file allocation problem [BFR92] concerning the dynamic optimization of communication costs to access data in a distributed environment. We develop a dynamic file re-allocation strategy that adapts on-line to a sequence of read and write requests whose location and relative frequencies are completely unpredictable. This is achieved by replicating the file in response to read requests and migrating the file in response to write requests while paying the associated communication costs, so as to be closer to processors that access it frequently. We develop the first explicit deterministic on-line strategy assuming the existence of global information about the state of the network; previous (deterministic) solutions were complicated and more expensive. Our solution has an (optimal) logarithmic competitive ratio. The paper also contains the first explicit deterministic data migration [BS89] algorithm achieving the best known competitive ratio for this problem. Using somewhat ...
Locating Objects in Mobile Computing
2001
Cited by 80 (6 self)
In current distributed systems, the notion of mobility is emerging in many forms and applications.
Data Replication for Mobile Computers
1994
Cited by 70 (5 self)
Users of mobile computers will soon have online access to a large number of databases via wireless networks. Because of limited bandwidth, wireless communication is more expensive than wired communication. In this paper we present and analyze various static and dynamic data allocation methods. The objective is to optimize the communication cost between a mobile computer and the stationary computer that stores the online database. Analysis is performed in two cost models. One is connection (or time) based, as in cellular telephones, where the user is charged per minute of connection. The other is message based, as in packet radio networks, where the user is charged per message. Our analysis addresses both the average case and the worst case for determining the best allocation method.
1 Introduction
Users of mobile computers, such as palmtops, notebook computers and personal communication systems, will soon have online access to a large number of databases via wireless networks. The ...
Distributed Paging for General Networks
1996
Cited by 55 (5 self)
Distributed paging [BFR92, ABF93b, AK95] deals with the dynamic allocation of copies of files in a distributed network so as to minimize the total communication cost over a sequence of read and write requests. Most previous work deals with the file allocation problem [BS89, West91, CLRW93, ABF93a, WY93, Koga93, AK94, LRWY94] where infinite nodal memory capacity is assumed. In contrast, the distributed paging problem makes the more realistic assumption that nodal memory capacity is limited. Earlier work on distributed paging deals with the problem only in the case of a uniform network topology. This paper gives the first distributed paging algorithm for general networks. The algorithm is competitive in storage and communication. The competitive ratios are poly-logarithmic in the total number of network nodes and the diameter of the network.
Johns Hopkins University and Lab. for Computer Science, MIT. Supported by Air Force Contract TNDGAFOSR-86-0078, ARO contract DAAL03-86-K-0171, NSF contract 9114440-CCR, DARPA contract N00014J -92-1799, and a special grant from IBM. E-mail: baruch@theory.lcs.mit.edu.
Department of Computer Science, School of Mathematics, Tel-Aviv University, Tel-Aviv 69978, Israel. Supported by a grant from the Israeli Academy of Sciences. E-mail: yairb@math.tau.ac.il, fiat@math.tau.ac.il
Data Partitioning and Load Balancing in Parallel Disk Systems
1994
Cited by 54 (8 self)
Parallel disk systems provide opportunities for exploiting I/O parallelism in two possible ways, namely via inter-request and intra-request parallelism. In this paper we discuss the main issues in performance tuning of such systems, namely striping and load balancing, and show their relationship to response time and throughput. We outline the main components of an intelligent file system that optimizes striping by taking into account the requirements of the applications, and performs load balancing by judicious file allocation and dynamic redistribution of the data when access patterns change. Our system uses simple but effective heuristics that incur little overhead. We present performance experiments based on synthetic workloads and real-life traces.
Approximation Algorithms for Data Placement in Arbitrary Networks
In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms, 2001
Cited by 47 (1 self)
We develop approximation algorithms for the problem of placing replicated data in arbitrary networks, where the nodes may both issue requests for data objects and have capacity for storing data objects, so as to minimize the average data-access cost. We introduce the data placement problem to model this problem. We have a set of caches $F$, a set of clients $D$, and a set of data objects $O$. Each cache $i$ can store at most $u_i$ data objects. Each client $j \in D$ has demand $d_j$ for a specific data object $o(j) \in O$ and has to be assigned to a cache that stores that object. Storing an object $o$ in cache $i$ incurs a storage cost of $f_i^o$, and assigning client $j$ to cache $i$ incurs an access cost of $d_j c_{ij}$. The goal is to find a placement of the data objects to caches respecting the capacity constraints, and an assignment of clients
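The cost model in the abstract above can be illustrated with a short Python sketch. It only evaluates a given solution; it is not the paper's approximation algorithm. The function name `placement_cost` and all parameter names are hypothetical, and the sketch assumes the objective is the sum of storage and access costs, which the truncated abstract does not state in full.

```python
from typing import Dict, Hashable, Set, Tuple


def placement_cost(
    placement: Dict[Hashable, Set[Hashable]],        # cache i -> objects stored in i
    assignment: Dict[Hashable, Hashable],            # client j -> cache serving j
    capacity: Dict[Hashable, int],                   # u_i: max objects per cache
    demand: Dict[Hashable, float],                   # d_j: demand of client j
    wanted: Dict[Hashable, Hashable],                # o(j): object requested by j
    storage_cost: Dict[Tuple[Hashable, Hashable], float],  # (i, o) -> storage cost
    access_cost: Dict[Tuple[Hashable, Hashable], float],   # (j, i) -> c_ij
) -> float:
    """Return storage cost plus access cost of a feasible solution.

    Assumption: the objective is the plain sum of both cost terms.
    Raises ValueError if a capacity or assignment constraint is violated.
    """
    # Capacity constraint: cache i stores at most u_i objects.
    for cache, objects in placement.items():
        if len(objects) > capacity[cache]:
            raise ValueError(f"cache {cache!r} exceeds its capacity")

    total = 0.0

    # Storage cost for every (cache, object) pair in the placement.
    for cache, objects in placement.items():
        for obj in objects:
            total += storage_cost[(cache, obj)]

    # Access cost d_j * c_ij; each client must be served by a cache
    # that actually stores the object it requests.
    for client, cache in assignment.items():
        if wanted[client] not in placement.get(cache, set()):
            raise ValueError(f"cache {cache!r} does not store the object of client {client!r}")
        total += demand[client] * access_cost[(client, cache)]

    return total
```

A solver would search over `placement` and `assignment`; this helper only checks feasibility and scores one candidate, which is handy for comparing heuristics on small instances.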

