Competitive Algorithms for Distributed Data Management
- In Proceedings of the 24th Annual ACM Symposium on Theory of Computing
Cited by 100 (8 self)
We deal with the competitive analysis of algorithms for managing data in a distributed environment. We deal with the file allocation problem ([DF], [ML]), where copies of a file may be stored in the local storage of some subset of processors. Copies may be replicated and discarded over time so as to optimize communication costs, but multiple copies must be kept consistent and at least one copy must be stored somewhere in the network at all times. We deal with competitive algorithms for minimizing communication costs, over arbitrary sequences of reads and writes, and arbitrary network topologies. We define the constrained file allocation problem to be the solution of many individual file allocation problems simultaneously, subject to the constraints of local memory size. We give competitive algorithms for this problem on the uniform network topology. We then introduce distributed competitive algorithms for on-line data tracking (a generalization of mobile user tracking [AP1...
Data Placement in Bubba
- Proceedings of the ACM-SIGMOD International Conference on Management of Data
, 1988
Cited by 96 (0 self)
This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. “Highly-parallel” implies that load balancing is a critical performance issue. “Data-intensive” means data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue. In general, determining the optimal placement of data across processing nodes for performance is a difficult problem. We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued the benefits of declustering (i.e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communications, startup and termination overhead. We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.
Peer-to-Peer Data Trading to Preserve Information
- ACM Transactions on Information Systems
Cited by 31 (7 self)
Data archiving systems rely on replication to preserve information. This paper discusses how a network of autonomous archiving sites can trade data to achieve the most reliable replication. A series of binary trades among sites produces a peer-to-peer archiving network. Two trading algorithms are examined, one based on trading collections (even if they are different sizes) and another based on trading equal-sized blocks of space (which can then store collections). The concept of deeds is introduced; deeds track the blocks of space owned by one site at another. Policies for tuning these algorithms to provide the highest reliability, for example by changing the order in which sites are contacted and offered trades, are discussed. Finally, simulation results are presented that reveal which policies are best. The experiments indicate that a digital archive can achieve the best reliability by trading blocks of space (deeds), and that following certain policies will allow that site to maximize its reliability. Categories and Subject Descriptors: H.3.7 [Information storage and retrieval]: Digital libraries --- systems issues; E.5 [Files]: Backup/recovery. General Terms: Design, reliability. Additional Key Words and Phrases: data replication, fault tolerance, digital archiving, digital library, resource negotiation.
Design and Evaluation of Data Allocation Algorithms for Distributed Multimedia Database Systems
- IEEE Journal on Selected Areas in Communications
, 1996
Cited by 19 (11 self)
Given a distributed multimedia database system and a set of queries as well as their frequencies from each site, the objective of a data allocation algorithm is to locate the multimedia data objects (MDOs) at different sites so as to minimize the total data transfer cost incurred in executing the queries. The data allocation problem, however, is NP-complete, and thus requires fast heuristics to generate efficient solutions. In this paper we propose three data allocation algorithms which are based on a genetic technique, an evolutionary process, and neural networks. We have implemented and evaluated these algorithms on our distributed multimedia database system test-bed. A comparison of the algorithms reveals trade-offs between their solution quality and time-complexity.
The Application of Microeconomics to the Design of Resource Allocation and Control Algorithms
, 1989
Cited by 19 (4 self)
In this thesis, we present a new methodology for resource sharing algorithms in distributed systems. We propose that a distributed computing system should be composed of a decentralized community of microeconomic agents. We show that this approach decreases complexity and can substantially improve performance. We compare the performance, generality and complexity of our algorithms with non-economic algorithms. To validate the usefulness of our approach, we present economies that solve three distinct resource management problems encountered in large, distributed systems. The first economy performs CPU load balancing and demonstrates how our approach limits complexity and effectively allocates resources when compared to non-economic algorithms. We show that the economy achieves better performance than a representative non-economic algorithm. The load balancing economy spa...
Static and adaptive data replication algorithms for fast information access in large distributed systems
- IEEE International Conference on Distributed Computing Systems
, 2000
Cited by 17 (5 self)
Creating replicas of frequently accessed objects across a read-intensive network can result in large bandwidth savings which, in turn, can lead to reduction in user response time. On the contrary, data replication in the presence of writes incurs extra cost due to multiple updates. The set of sites at which an object is replicated constitutes its replication scheme. Finding an optimal replication scheme that minimizes the amount of network traffic, given read and write frequencies for various objects, is NP-complete in general. We propose two heuristics to deal with this problem for static read and write patterns. The first is a simple and fast greedy heuristic that yields good solutions when the system is predominantly read-oriented. The second is a genetic algorithm that through an efficient exploration of the solution space provides better solutions for cases where the greedy heuristic does not perform well. We also propose an extended genetic algorithm that rapidly adapts to the dynamically changing characteristics such as the frequency of reads and writes for particular objects.
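The trade-off this abstract describes (reads get cheaper with more replicas, writes get more expensive) can be illustrated with a small greedy sketch. This is not the paper's algorithm: the cost model (each read goes to the nearest replica, each write is sent to every replica), the single-object setting, the absence of storage limits, and the names `traffic_cost` and `greedy_replication` are all assumptions made for illustration.

```python
def traffic_cost(replicas, dist, reads, writes):
    """Network traffic for one object under an ASSUMED cost model."""
    # Assumption: every read is served by the nearest replica.
    read_cost = sum(r * min(dist[i][j] for j in replicas)
                    for i, r in enumerate(reads))
    # Assumption: every write is propagated to all replicas.
    write_cost = sum(w * sum(dist[i][j] for j in replicas)
                     for i, w in enumerate(writes))
    return read_cost + write_cost


def greedy_replication(dist, reads, writes, primary):
    """Add one replica at a time while total traffic strictly decreases."""
    replicas = {primary}  # at least one copy must always exist
    best = traffic_cost(replicas, dist, reads, writes)
    while True:
        # Score every site that does not yet hold a replica.
        scored = [(traffic_cost(replicas | {s}, dist, reads, writes), s)
                  for s in range(len(dist)) if s not in replicas]
        if not scored:
            break  # every site already holds a copy
        cost, site = min(scored)
        if cost >= best:
            break  # no candidate improves the scheme
        replicas.add(site)
        best = cost
    return replicas, best


if __name__ == "__main__":
    # Three sites on a line; hypothetical distances and request rates.
    dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    replicas, cost = greedy_replication(dist, [10, 0, 10], [0, 1, 0], primary=1)
    print(sorted(replicas), cost)
```

With read-heavy demand the sketch replicates to every site; if the write rate at the primary dominates instead, it keeps only the primary copy, which matches the abstract's remark that a greedy approach suits predominantly read-oriented systems.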
Creating Trading Networks of Digital Archives
- In Proc. 1st Joint ACM/IEEE Conference on Digital Libraries (JCDL)
, 2001
Cited by 15 (9 self)
Digital archives can best survive failures if they have made several copies of their collections at remote sites. In this paper, we discuss how autonomous sites can cooperate to provide preservation by trading data. We examine the decisions that an archive must make when forming trading networks, such as the amount of storage space to provide and the best number of partner sites. We also deal with the fact that some sites may be more reliable than others. Experimental results from a data trading simulator illustrate which policies are most reliable. Our techniques focus on preserving the "bits" of digital collections; other services that focus on other archiving concerns (such as preserving meaningful metadata) can be built on top of the system we describe here.
Static and adaptive distributed data replication using genetic algorithms
, 2004
Cited by 12 (4 self)
Fast dissemination and access of information in large distributed systems, such as the Internet, has become a norm of our daily life. However, undesired long delays experienced by end-users, especially during the peak hours, continue to be a common problem. Replicating some of the objects at multiple sites is one possible solution for decreasing network traffic. The decision of what to replicate where requires solving a constraint optimization problem which is NP-complete in general. Such problems are known to stretch the capacity of a Genetic Algorithm (GA) to its limits. Nevertheless, we propose a GA to solve the problem when the read/write demands remain static and experimentally prove the superior solution quality obtained compared to an intuitive greedy method. Unfortunately, the static GA approach involves high running time and may not be useful when read/write demands continuously change, as is the case with breaking news. To tackle such a case we propose a hybrid GA that takes as input the current replica distribution and computes a new one using knowledge about the network attributes and the changes that have occurred. Keeping in view more pragmatic scenarios in today’s distributed information environments, we evaluate these algorithms with respect to the storage capacity constraint of each site as well as variations in the popularity of objects, and also examine the trade-off between running time and solution quality.
An Overview of Data Replication on the Internet
- In Proc. of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)
, 2002
Cited by 12 (3 self)
The proliferation of the Internet is leading to high expectation on the fast turnaround time. Clients abandoning their connections due to excessive downloading delays translates directly to profit losses. Hence, minimizing the latency perceived by end-users has become the primary performance objective compared to more traditional issues, such as server utilization. The two promising techniques to improve the Internet responsiveness are caching and replication. In this paper we present an overview of recent research in replication. We begin by arguing on the important role of replication in decreasing client perceived response time and proceed by illustrating the main topics that affect its successful deployment on the Internet. We analyze and characterize existing research, providing taxonomies and classifications whenever possible. Our discussion reveals several open problems and research directions.
Distributed File Allocation with Consistency Constraints
- In Proceedings of the International Conference on Distributed Computing Systems
, 1992
Cited by 11 (1 self)
We consider the resource allocation problem in distributed computing systems that have strict mutual consistency requirements. Our model incorporates the behavior of consistency control algorithms, which ensure that mutual consistency of replicated data is preserved even when communication links of the computer network and/or computers on which the files reside fail. The problem of resource allocation in these networks is significant in terms of the efficiency of operations and the reliability of the network. The constrained resource allocation problem is formulated as a mixed nonlinear integer program. An efficient algorithm is proposed to solve this problem. The performance of the algorithm is evaluated in terms of the algorithm's accuracy, efficiency and execution times, using a representative problem set. 1 Introduction Consider a distributed computing system (DCS) that is made up of a set of sites (nodes) connected through communication links which transmit information from one s...

