Results 11 - 20 of 266
Experiences building PlanetLab
- In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006
"... Abstract. This paper reports our experiences building PlanetLab over the last four years. It identifies the requirements that shaped PlanetLab, explains the design decisions that resulted from resolving conflicts among these requirements, and reports our experience implementing and supporting the sy ..."
Cited by 90 (11 self)
Abstract. This paper reports our experiences building PlanetLab over the last four years. It identifies the requirements that shaped PlanetLab, explains the design decisions that resulted from resolving conflicts among these requirements, and reports our experience implementing and supporting the system. Due in large part to the nature of the “PlanetLab experiment,” the discussion focuses on synthesis rather than new techniques, balancing system-wide considerations rather than improving performance along a single dimension, and learning from feedback from a live system rather than controlled experiments using synthetic workloads.
Minimizing churn in distributed systems, 2006
"... A pervasive requirement of distributed systems is to deal with churn — change in the set of participating nodes due to joins, graceful leaves, and failures. A high churn rate can increase costs or decrease service quality. This paper studies how to reduce churn by selecting which subset of a set of ..."
Cited by 80 (3 self)
A pervasive requirement of distributed systems is to deal with churn — change in the set of participating nodes due to joins, graceful leaves, and failures. A high churn rate can increase costs or decrease service quality. This paper studies how to reduce churn by selecting which subset of a set of available nodes to use. First, we provide a comparison of the performance of a range of different node selection strategies in five real-world traces. Among our findings is that the simple strategy of picking a uniform-random replacement whenever a node fails performs surprisingly well. We explain its performance through analysis in a stochastic model. Second, we show that a class of strategies, which we call “Preference List” strategies, arise commonly as a result of optimizing for a metric other than churn, and produce high churn relative to more randomized strategies under realistic node failure patterns. Using this insight, we demonstrate and explain differences in performance for designs that incorporate varying degrees of randomization. We give examples from a variety of protocols, including anycast, overlay multicast, and distributed hash tables. In many cases, simply adding some randomization can go a long way towards reducing churn.
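The comparison described in this abstract lends itself to a small simulation. The Python sketch below is a toy illustration only, with made-up parameters and a synthetic failure process rather than the paper's traces or analysis; it contrasts uniform-random replacement on failure with a preference-list strategy that always uses the K highest-ranked live nodes, counting changes to the in-use set as churn.

```python
# Toy churn simulation: preference-list selection vs. random replacement on failure.
import random

N, K, STEPS, P_FLIP = 50, 10, 2000, 0.02   # candidates, working-set size, steps, flip prob. (assumed)

def make_trace(seed=1):
    """Precompute per-step up/down states so both strategies see the same failures."""
    rng = random.Random(seed)
    up = [True] * N
    trace = []
    for _ in range(STEPS):
        up = [(not u) if rng.random() < P_FLIP else u for u in up]
        trace.append(list(up))
    return trace

def run(strategy, trace, seed=2):
    rng = random.Random(seed)
    in_use = set(range(K))                  # start with the K most-preferred nodes (rank = index)
    churn = 0
    for up in trace:
        if strategy == "preference":
            # Always use the K most-preferred live nodes; a recovered preferred node
            # evicts one that is still alive, the extra churn the abstract attributes
            # to Preference List strategies.
            target = set([i for i in range(N) if up[i]][:K])
            churn += len(in_use ^ target)   # departures + joins
            in_use = target
        else:                               # "random": touch the set only when a member fails
            failed = [i for i in in_use if not up[i]]
            spares = [i for i in range(N) if up[i] and i not in in_use]
            rng.shuffle(spares)
            for f in failed:
                in_use.discard(f)
                churn += 1                  # departure
                if spares:
                    in_use.add(spares.pop())
                    churn += 1              # replacement join
    return churn

trace = make_trace()
print("preference-list churn:  ", run("preference", trace))
print("random-replacement churn:", run("random", trace))
```

In this toy setting the extra churn of the preference-list policy comes from exactly the mechanism the abstract names: re-ranking on recovery, not just reacting to failures.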
Autonomic Live Adaptation of Virtual Computational Environments in a Multi-Domain Infrastructure
- In IEEE International Conference on Autonomic Computing, 2006
"... A shared distributed infrastructure is formed by federating computation resources from multiple domains. Such shared infrastructures are increasing in popularity and are providing massive amounts of aggregated computation resources to large numbers of users. Meanwhile, virtualization technologies, a ..."
Cited by 68 (2 self)
A shared distributed infrastructure is formed by federating computation resources from multiple domains. Such shared infrastructures are increasing in popularity and are providing massive amounts of aggregated computation resources to large numbers of users. Meanwhile, virtualization technologies, at machine and network levels, are maturing and enabling mutually isolated virtual computation environments for executing arbitrary parallel/distributed applications on top of such a shared physical infrastructure. In this paper, we go one step further by supporting autonomic adaptation of virtual computation environments as active, integrated entities. More specifically, driven by both dynamic availability of infrastructure resources and dynamic application resource demand, a virtual computation environment is able to automatically relocate itself across the infrastructure and scale its share of infrastructural resources. Such autonomic adaptation is transparent to both users of virtual environments and administrators of infrastructures, maintaining the look and feel of a stable, dedicated environment for the user. As our proof-of-concept, we present the design, implementation, and evaluation of a system called VIOLIN, which is composed of a virtual network of virtual machines capable of live migration across a multi-domain physical infrastructure.
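As a rough sketch of the adaptation loop this abstract describes, the Python fragment below scales a virtual environment in place when its current host has headroom and otherwise migrates it to the host with the most free capacity. The Host, VirtualEnv, and adapt names and the capacity model are illustrative assumptions, not VIOLIN's actual interfaces or policy.

```python
# Toy adaptation policy: scale in place if possible, otherwise pick a better host.
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    capacity: float          # normalized capacity of the physical host
    used: float              # capacity already committed (including this env's share)

    def free(self) -> float:
        return self.capacity - self.used

@dataclass
class VirtualEnv:
    name: str
    host: Host
    allocation: float        # share currently granted on its host
    demand: float            # share the application currently needs

def adapt(env, hosts):
    """One round of a toy adaptation policy: scale in place if the current host
    has headroom, otherwise live-migrate to the host with the most free capacity."""
    if env.demand <= env.allocation:
        return "no action"
    shortfall = env.demand - env.allocation
    if env.host.free() >= shortfall:
        env.host.used += shortfall
        env.allocation = env.demand
        return f"scaled up in place on {env.host.name}"
    candidates = [h for h in hosts if h is not env.host and h.free() >= env.demand]
    if not candidates:
        return "no suitable host; keep current placement"
    target = max(candidates, key=Host.free)
    env.host.used -= env.allocation          # release the share on the old host
    target.used += env.demand                # reserve the larger share on the new host
    env.host, env.allocation = target, env.demand
    return f"live-migrated to {target.name}"

hosts = [Host("domainA-host1", 8.0, 7.0), Host("domainB-host1", 8.0, 2.0)]
env = VirtualEnv("venv-1", hosts[0], allocation=1.0, demand=3.0)
print(adapt(env, hosts))                     # -> live-migrated to domainB-host1
```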
Exploiting Availability Prediction in Distributed Systems, 2006
"... Loosely-coupled distributed systems have significant scale and cost advantages over more traditional architectures, but the availability of the nodes in these systems varies widely. Availability modeling is crucial for predicting per-machine resource burdens and understanding emergent, system-wide p ..."
Cited by 61 (2 self)
Loosely-coupled distributed systems have significant scale and cost advantages over more traditional architectures, but the availability of the nodes in these systems varies widely. Availability modeling is crucial for predicting per-machine resource burdens and understanding emergent, system-wide phenomena. We present new techniques for predicting availability and test them using traces taken from three distributed systems. We then describe three applications of availability prediction. The first, availability-guided replica placement, reduces object copying in a distributed data store while increasing data availability. The second shows how availability prediction can improve routing in delay-tolerant networks. The third combines availability prediction with virus modeling to improve forecasts of global infection dynamics.
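A minimal sketch of the general idea follows, assuming a simple moving-average predictor over recent up/down samples rather than the paper's actual predictors or traces; it also illustrates the first application mentioned, choosing replica holders by predicted availability. All node names and traces are hypothetical.

```python
# Toy availability prediction and availability-guided replica placement.
from collections import deque

class RecentHistoryPredictor:
    """Toy predictor: predicted availability is the fraction of the last
    `window` samples in which the node was up (1 = up, 0 = down)."""
    def __init__(self, window: int = 8):
        self.history = deque(maxlen=window)

    def observe(self, is_up: int) -> None:
        self.history.append(is_up)

    def predicted_up(self) -> float:
        if not self.history:
            return 0.5                       # no data yet: assume a coin flip
        return sum(self.history) / len(self.history)

def place_replicas(predictors, k):
    """Availability-guided placement: pick the k nodes with the highest
    predicted availability so fewer objects need re-copying after failures."""
    ranked = sorted(predictors, key=lambda n: predictors[n].predicted_up(), reverse=True)
    return ranked[:k]

# Hypothetical hourly traces: 1 = node was up that hour, 0 = down.
traces = {
    "node-a": [1, 1, 1, 0, 1, 1, 1, 1],
    "node-b": [1, 0, 0, 1, 0, 0, 1, 0],
    "node-c": [1, 1, 1, 1, 1, 1, 1, 1],
}
predictors = {}
for name, trace in traces.items():
    p = RecentHistoryPredictor()
    for sample in trace:
        p.observe(sample)
    predictors[name] = p

print(place_replicas(predictors, k=2))       # -> ['node-c', 'node-a']
```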
Octant: a comprehensive framework for the geolocalization of Internet hosts
- In Proc. 4th USENIX NSDI, 2007
"... Determining the physical location of Internet hosts is a critical enabler for many new location-aware services. In this paper, we present Octant, a novel, comprehen-sive framework for determining the location of Internet hosts in the real world based solely on network mea-surements. The key insight ..."
Cited by 60 (4 self)
Determining the physical location of Internet hosts is a critical enabler for many new location-aware services. In this paper, we present Octant, a novel, comprehensive framework for determining the location of Internet hosts in the real world based solely on network measurements. The key insight behind this framework is to pose the geolocalization problem formally as one of error-minimizing constraint satisfaction, to create a system of constraints by deriving them aggressively from network measurements, and to solve the system geometrically to yield the estimated region in which the target resides. This approach gains its accuracy and precision by taking advantage of both positive and negative constraints, that is, constraints on where the node can and cannot be, respectively. The constraints are represented using regions bounded by Bézier curves, allowing precise constraint representation and low-cost geometric operations. The framework can reason in the presence of uncertainty, enabling it to gracefully cope with aggressively derived constraints that may contain errors. An evaluation of Octant using PlanetLab nodes and public traceroute servers shows that Octant can localize the median node to within 22 mi., a factor of three better than other evaluated approaches.
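The constraint-satisfaction formulation can be illustrated with a much simpler stand-in: intersect positive and negative distance constraints over a grid of candidate points in a flat plane. Octant itself uses Bézier-bounded regions, weighted constraints, and real network measurements; the geometry and numbers below are assumptions made purely for illustration.

```python
# Toy positive/negative constraint intersection for geolocalization.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# (landmark position, radius, kind): "pos" = target lies inside the disc,
# "neg" = target lies outside it. Radii would come from latency measurements.
constraints = [
    ((0.0, 0.0), 60.0, "pos"),
    ((100.0, 0.0), 70.0, "pos"),
    ((50.0, 80.0), 30.0, "neg"),
]

def feasible(point):
    for center, radius, kind in constraints:
        inside = dist(point, center) <= radius
        if (kind == "pos" and not inside) or (kind == "neg" and inside):
            return False
    return True

# Evaluate a coarse grid of candidate locations and keep the feasible region.
region = [(x, y) for x in range(0, 101, 5) for y in range(0, 101, 5)
          if feasible((float(x), float(y)))]

if region:
    cx = sum(p[0] for p in region) / len(region)
    cy = sum(p[1] for p in region) / len(region)
    print(f"{len(region)} feasible grid points, centroid estimate ({cx:.1f}, {cy:.1f})")
else:
    print("constraints are mutually inconsistent; a real system would relax or weight them")
```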
Distributed resource discovery on PlanetLab with SWORD
- In WORLDS, 2004
"... Large-scale distributed services such as content distribution networks, peer-to-peer storage, distributed games, and scientific applications, have recently received substantial interest from both researchers and industry. At ..."
Cited by 58 (0 self)
Large-scale distributed services such as content distribution networks, peer-to-peer storage, distributed games, and scientific applications, have recently received substantial interest from both researchers and industry. At
Corona: A High Performance Publish-Subscribe System for the World Wide Web
- In NSDI, 2006
"... Despite the abundance of frequently changing information, the Web lacks a publish-subscribe interface for delivering updates to clients. The use of naïve polling for detecting updates leads to poor performance and limited scalability as clients do not detect updates quickly and servers face high loa ..."
Cited by 57 (5 self)
Despite the abundance of frequently changing information, the Web lacks a publish-subscribe interface for delivering updates to clients. The use of naïve polling for detecting updates leads to poor performance and limited scalability as clients do not detect updates quickly and servers face high loads imposed by active polling. This paper describes a novel publish-subscribe system for the Web called Corona, which provides high performance and scalability through optimal resource allocation. Users register interest in Web pages through existing instant messaging services. Corona monitors the subscribed Web pages, detects updates efficiently by allocating polling load among cooperating peers, and disseminates updates quickly to users. Allocation of resources for polling is driven by a distributed optimization engine that achieves the best update performance without exceeding load limits on content servers. Large-scale simulations and measurements from a PlanetLab deployment demonstrate that Corona achieves orders of magnitude improvement in update performance at a modest cost.
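The allocation problem sketched in this abstract can be approximated greedily: give each subscribed page one poller, then spend the remaining poller budget where it buys the largest subscriber-weighted reduction in expected detection delay, skipping pages whose origin server is already at its polling cap. The delay model, caps, and page names below are assumptions for illustration; Corona's actual engine is a distributed optimizer, not this centralized loop.

```python
# Greedy toy allocation of cooperative pollers under per-server load caps.
import heapq

POLL_INTERVAL = 60.0     # seconds between polls by a single peer (assumed)
SERVER_CAP = 0.2         # max polls/second a content server will tolerate (assumed)
PEER_BUDGET = 20         # total pollers available across all pages (assumed)

# page -> (origin server, number of subscribers); names are made up.
pages = {
    "example.com/news": ("example.com", 500),
    "example.com/blog": ("example.com", 50),
    "other.org/status": ("other.org", 200),
}

def delay(k):
    """Expected update-detection delay with k evenly staggered pollers (toy model)."""
    return POLL_INTERVAL / (2 * k)

def server_load(server, pollers):
    """Polls per second this allocation imposes on one content server."""
    return sum(k for p, k in pollers.items() if pages[p][0] == server) / POLL_INTERVAL

# Every subscribed page gets one poller; the remaining budget goes, step by step,
# to the page with the largest subscriber-weighted delay reduction, as long as
# its origin server stays under the polling cap.
pollers = {p: 1 for p in pages}
remaining = PEER_BUDGET - len(pages)
heap = [(-(subs * (delay(1) - delay(2))), p) for p, (_, subs) in pages.items()]
heapq.heapify(heap)

while remaining > 0 and heap:
    _, p = heapq.heappop(heap)
    server, subs = pages[p]
    if server_load(server, pollers) + 1.0 / POLL_INTERVAL > SERVER_CAP:
        continue                             # this server is saturated; stop growing the page
    pollers[p] += 1
    remaining -= 1
    k = pollers[p]
    heapq.heappush(heap, (-(subs * (delay(k) - delay(k + 1))), p))

for p, k in sorted(pollers.items()):
    print(f"{p}: {k} pollers, expected detection delay {delay(k):.1f}s")
```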
Solaris Zones: Operating System Support for Consolidating Commercial Workloads
- In 18th Large Installation System Administration Conference, 2004
"... Server consolidation, which allows multiple workloads to run on the same system, has become increasingly important as a way to improve the utilization of computing resources and reduce costs. Consolidation is common in mainframe environments, where technology to support running multiple workloads an ..."
Cited by 56 (0 self)
Server consolidation, which allows multiple workloads to run on the same system, has become increasingly important as a way to improve the utilization of computing resources and reduce costs. Consolidation is common in mainframe environments, where technology to support running multiple workloads and even multiple operating systems on the same hardware has been evolving since the late 1960’s. This technology is now becoming an important differentiator in the UNIX and Linux server market as well, both at the low end (virtual web hosting) and high end (traditional data center server consolidation). This paper introduces Solaris Zones (zones), a fully realized solution for server consolidation projects in a commercial UNIX operating system. By creating virtualized application execution environments within a single instance of the operating system, the facility strikes a unique balance between competing requirements. On the one hand, a system with multiple workloads needs to run those workloads in isolation, to ensure that applications can neither observe data from other applications nor affect their operation. It must also prevent applications from over-consuming system resources. On the other hand, the system as a whole has to be flexible, manageable, and observable, in order to reduce administrative costs and increase efficiency. By focusing on the support of multiple application environments rather than multiple operating system instances, zones meets isolation requirements without sacrificing manageability.
Friday: Global comprehension for distributed replay
- In Proceedings of the Fourth Symposium on Networked Systems Design and Implementation (NSDI ’07), 2007
"... Debugging and profiling large-scale distributed applications is a daunting task. We present Friday, a system for debugging distributed applications that combines deterministic replay of components with the power of symbolic, low-level debugging and a simple language for expressing higher-level distr ..."
Cited by 56 (5 self)
Debugging and profiling large-scale distributed applications is a daunting task. We present Friday, a system for debugging distributed applications that combines deterministic replay of components with the power of symbolic, low-level debugging and a simple language for expressing higher-level distributed conditions and actions. Friday allows the programmer to understand the collective state and dynamics of a distributed collection of coordinated application components. To evaluate Friday, we consider several distributed problems, including routing consistency in overlay networks, and temporal state abnormalities caused by route flaps. We show via micro-benchmarks and larger-scale application measurement that Friday can be used interactively to debug large distributed applications under replay on common hardware.
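The idea of evaluating a distributed condition over replayed state can be shown with a tiny stand-alone example: a predicate checked against every node's snapshot at each replay step, flagging the first step where overlay routing consistency breaks. The trace format and predicate below are invented for illustration and are not Friday's watchpoint language or its GDB-based replay machinery.

```python
# Toy "distributed watchpoint" over replayed per-node state snapshots.

# One replayed state trace per node: a snapshot of local state after each step.
replayed = {
    "n1": [{"succ": "n2"}, {"succ": "n2"}, {"succ": "n2"}],   # n1 never repairs its pointer
    "n2": [{"succ": "n3"}, {"succ": "n3"}, {"succ": "n3"}],
    "n3": [{"succ": "n1"}, {"succ": "n1"}, {"succ": "n1"}],
}
live = [                      # which nodes are up at each replay step
    {"n1", "n2", "n3"},
    {"n1", "n2", "n3"},
    {"n1", "n3"},             # n2 has left the overlay by step 2
]

def consistent(step):
    """Distributed predicate: every live node's successor pointer names a live node."""
    for node in live[step]:
        if replayed[node][step]["succ"] not in live[step]:
            return False
    return True

for step in range(len(live)):
    if not consistent(step):
        print(f"watchpoint fired at replay step {step}: stale successor pointer")
        break
else:
    print("predicate held for the entire replay")
```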