Results 1 - 10
of
179
Astrolabe: A Robust and Scalable Technology for Distributed System Monitoring, Management, and Data Mining
- ACM Transactions on Computer Systems
, 2001
"... this paper, we describe a new information management service called Astrolabe. Astrolabe monitors the dynamically changing state of a collection of distributed resources, reporting summaries of this information to its users. Like DNS, Astrolabe organizes the resources into a hierarchy of domains, wh ..."
Abstract
-
Cited by 288 (16 self)
- Add to MetaCart
this paper, we describe a new information management service called Astrolabe. Astrolabe monitors the dynamically changing state of a collection of distributed resources, reporting summaries of this information to its users. Like DNS, Astrolabe organizes the resources into a hierarchy of domains, which we call zones to avoid confusion, and associates attributes with each zone. Unlike DNS, zones are not bound to specific servers, the attributes may be highly dynamic, and updates propagate quickly; typically, in tens of seconds
A Low-bandwidth Network File System
, 2001
"... This paper presents LBFS, a network file system designed for low bandwidth networks. LBFS exploits similarities between files or versions of the same file to save bandwidth. It avoids sending data over the network when the same data can already be found in the server's file system or the client's ca ..."
Abstract
-
Cited by 240 (3 self)
- Add to MetaCart
This paper presents LBFS, a network file system designed for low bandwidth networks. LBFS exploits similarities between files or versions of the same file to save bandwidth. It avoids sending data over the network when the same data can already be found in the server's file system or the client's cache. Using this technique, LBFS achieves up to two orders of magnitude reduction in bandwidth utilization on common workloads, compared to traditional network file systems
OpenDHT: A Public DHT Service and Its Uses
, 2005
"... Large-scale distributed systems are hard to deploy, and distributed hash tables (DHTs) are no exception. To lower the barriers facing DHT-based applications, we have created a public DHT service called OpenDHT. Designing a DHT that can be widely shared, both among mutually untrusting clients and amo ..."
Abstract
-
Cited by 159 (16 self)
- Add to MetaCart
Large-scale distributed systems are hard to deploy, and distributed hash tables (DHTs) are no exception. To lower the barriers facing DHT-based applications, we have created a public DHT service called OpenDHT. Designing a DHT that can be widely shared, both among mutually untrusting clients and among a variety of applications, poses two distinct challenges. First, there must be adequate control over storage allocation so that greedy or malicious clients do not use more than their fair share. Second, the interface to the DHT should make it easy to write simple clients, yet be sufficiently general to meet a broad spectrum of application requirements. In this paper we describe our solutions to these design challenges. We also report our early deployment experience with OpenDHT and describe the variety of applications already using the system.
Design and Evaluation of a Continuous Consistency Model for Replicated Services
, 2000
"... The tradeoffs between consistency, performance, and availability are well understood. Traditionally, however, designers of replicated systems have been forced to choose from either strong consistency guarantees or none at all. This paper explores the semantic space between traditional strong and opt ..."
Abstract
-
Cited by 145 (12 self)
- Add to MetaCart
The tradeoffs between consistency, performance, and availability are well understood. Traditionally, however, designers of replicated systems have been forced to choose from either strong consistency guarantees or none at all. This paper explores the semantic space between traditional strong and optimistic consistency models for replicated services. We argue that an important class of applications can tolerate relaxed consistency, but benefit from bounding the maximum rate of inconsistent access in an application-specific manner. Thus, we develop a set of metrics, Numerical Error, Order Error, and Staleness, to capture the consistency spectrum. We then present the design and implementation of TACT, a middleware layer that enforces arbitrary consistency bounds among replicas using these metrics. Finally, we show that three replicated applications demonstrate significant semantic and performance benefits from using our framework.
Epidemic routing for partially-connected ad hoc networks
, 2000
"... Mobile ad hoc routing protocols allow nodes with wireless adaptors to communicate with one another without any pre-existing network infrastructure. Existing ad hoc routing protocols, while robust to rapidly changing network topology, assume the presence of a connected path from source to destination ..."
Abstract
-
Cited by 143 (0 self)
- Add to MetaCart
Mobile ad hoc routing protocols allow nodes with wireless adaptors to communicate with one another without any pre-existing network infrastructure. Existing ad hoc routing protocols, while robust to rapidly changing network topology, assume the presence of a connected path from source to destination. Given power limitations, the advent of short-range wireless networks, and the wide physical conditions over which ad hoc networks must be deployed, in some scenarios it is likely that this assumption is invalid. In this work, we develop techniques to deliver messages in the case where there is never a connected path from source to destination or when a network partition exists at the time a message is originated. To this end, we introduce Epidemic Routing, where random pair-wise exchanges of messages among mobile hosts ensure eventual message delivery. The goals of Epidemic Routing are to: i) maximize message delivery rate, ii) minimize message latency, and iii) minimize the total resources consumed in message delivery. Through an implementation in the Monarch simulator, we show that Epidemic Routing achieves eventual delivery of 100 % of messages with reasonable aggregate resource consumption in a number of interesting scenarios. 1
A Scalable Distributed Information Management System
- In Proc SIGCOMM
, 2003
"... We present a Scalable Distributed Information Management System (SDIMS) that aggregates information about large-scale networked systems and that can serve as a basic building block for a broad range of large-scale distributed applications by providing detailed views of nearby information and summary ..."
Abstract
-
Cited by 127 (18 self)
- Add to MetaCart
We present a Scalable Distributed Information Management System (SDIMS) that aggregates information about large-scale networked systems and that can serve as a basic building block for a broad range of large-scale distributed applications by providing detailed views of nearby information and summary views of global information. To serve as a basic building block, a SDIMS should have four properties: scalability to many nodes and attributes, flexibility to accommodate a broad range of applications, administrative isolation for security and availability, and robustness to node and network failures. We design, implement and evaluate a SDIMS that (1) leverages Distributed Hash Tables (DHT) to create scalable aggregation trees, (2) provides flexibility through a simple API that lets applications control propagation of reads and writes, (3) provides administrative isolation through simple extensions to current DHT algorithms, and (4) achieves robustness to node and network reconfigurations through lazy reaggregation, on-demand reaggregation, and tunable spatial replication. Through extensive simulations and micro-benchmark experiments, we observe that our system is an order of magnitude more scalable than existing approaches, achieves isolation properties at the cost of modestly increased read latency in comparison to flat DHTs, and gracefully handles failures.
Giggle: A Framework for Constructing Scalable Replica Location Services
, 2002
"... In wide area computing systems, it is often desirable to create remote read-only copies (replicas) of files. Replication can be used to reduce access latency, improve data locality, and/or increase robustness, scalability and performance for distributed applications. We define a replica location ser ..."
Abstract
-
Cited by 122 (36 self)
- Add to MetaCart
In wide area computing systems, it is often desirable to create remote read-only copies (replicas) of files. Replication can be used to reduce access latency, improve data locality, and/or increase robustness, scalability and performance for distributed applications. We define a replica location service (RLS) as a system that maintains and provides access to information about the physical locations of copies. An RLS typically functions as one component of a data grid architecture. This paper makes the following contributions. First, we characterize RLS requirements. Next, we describe a parameterized architectural framework, which we name Giggle (for GIGa-scale Global Location Engine), within which a wide range of RLSs can be defined. We define several concrete instantiations of this framework with different performance characteristics. Finally, we present initial performance results for an RLS prototype, demonstrating that RLS systems can be constructed that meet performance goals.
Design and Evaluation of a Conit-Based Continuous Consistency Model for Replicated Services
- ACM TRANSACTIONS ON COMPUTER SYSTEMS
, 2002
"... ..."
The IceCube approach to the reconciliation of divergent replicas
, 2001
"... We describe a novel approach to log-based reconciliation called IceCube. It is general and is parameterised by application and object semantics. IceCube considers more flexible orderings and is designed to ease the burden of reconciliation on the application programmers. IceCube captures the static ..."
Abstract
-
Cited by 77 (10 self)
- Add to MetaCart
We describe a novel approach to log-based reconciliation called IceCube. It is general and is parameterised by application and object semantics. IceCube considers more flexible orderings and is designed to ease the burden of reconciliation on the application programmers. IceCube captures the static and dynamic reconciliation constraints between all pairs of actions, proposes schedules that satisfy the static constraints, and validates them against the dynamic constraints. Preliminary experience indicates that strong static constraints
The Costs and Limits of Availability for Replicated Services
, 2001
"... As raw system and network performance continues to improve at exponential rates, the utility of many services is increasingly limited by availability rather than performance. A key approach to improving availability involves replicating the service across multiple, wide-area sites. However, replicat ..."
Abstract
-
Cited by 77 (10 self)
- Add to MetaCart
As raw system and network performance continues to improve at exponential rates, the utility of many services is increasingly limited by availability rather than performance. A key approach to improving availability involves replicating the service across multiple, wide-area sites. However, replication introduces well-known tradeoffs between service consistency and availability. Thus, this paper explores the benefits of dynamically trading consistency for availability using a continuous consistency model. In this model, applications specify a maximum deviation from strong consistency on a per-replica basis. In this paper, we: i) evaluate availability of a prototype replication system running across the Internet as a function of consistency level, consistency protocol, and failure characteristics, ii) demonstrate that simple optimizations to existing consistency protocols result in significant availability improvements (more than an order of magnitude in some scenarios), iii) use our experience with these optimizations to prove tight upper bounds on the availability of services, and iv) show that maximizing availability typically entails remaining as close to strong consistency as possible during times of good connectivity, resulting in a communication versus availability tradeoff.

