Design and Implementation Tradeoffs for Wide-Area Resource Discovery
- In Proceedings of 14th IEEE Symposium on High Performance, Research Triangle Park
, 2005
Cited by 51 (11 self)
We describe the design and implementation of SWORD, a scalable resource discovery service for wide-area distributed systems. In contrast to previous systems, SWORD allows users to describe desired resources as a topology of interconnected groups with required intra-group, inter-group, and per-node characteristics, along with the utility that the application derives from specified ranges of metric values. This design gives users the flexibility to find geographically distributed resources for applications that are sensitive to both node and network characteristics, and allows the system to rank acceptable configurations based on their quality for that application. Rather than evaluating a single implementation of SWORD, we explore a variety of architectural designs that deliver the required functionality in a scalable and highly-available manner. We discuss the tradeoffs of using a centralized architecture as compared to a fully decentralized design to perform wide-area resource discovery. To summarize our results, we found that a centralized architecture based on 4-node server cluster sites at network peering facilities outperforms a decentralized DHT-based resource discovery infrastructure with respect to query latency for all but the smallest number of sites. However, although a centralized architecture shows significant promise in stable environments, we find that our decentralized implementation has acceptable performance and also benefits from the DHT’s self-healing properties in more volatile environments. We evaluate the advantages and disadvantages of centralized and distributed resource discovery architectures on 1000 hosts in emulation and on approximately 200 PlanetLab nodes spread across the Internet.
Usher: An Extensible Framework for Managing Clusters of Virtual Machines
- Proceedings of the USENIX Large Installation System Administration Conference (LISA
, 2007
Cited by 24 (0 self)
Usher is a virtual machine management system designed to impose few constraints upon the computing environment under its management. Usher enables administrators to choose how their virtual machine environment will be configured and the policies under which they will be managed. The modular design of Usher allows for alternate implementations for authentication, authorization, infrastructure handling, logging, and virtual machine scheduling. The design philosophy of Usher is to provide an interface whereby users and administrators can request virtual machine operations while delegating administrative tasks for these operations to modular plugins. Usher’s implementation allows for arbitrary action to be taken for nearly any event in the system. Since July 2006, Usher has been used to manage virtual clusters at two locations under very different settings, demonstrating the flexibility of Usher to meet different virtual machine management requirements.
An Experimentation Workbench for Replayable Networking Research
- In Proceedings of the Symposium on Networked System Design and Implementation
, 2007
Cited by 19 (2 self)
The networked and distributed systems research communities have an increasing need for “replayable” research, but our current experimentation resources fall short of satisfying this need. Replayable activities are those that can be re-executed, either as-is or in modified form, yielding new results that can be compared to previous ones. Replayability requires complete records of experiment processes and data, of course, but it also requires facilities that allow those processes to actually be examined, repeated, modified, and reused. We are now evolving Emulab, our popular network testbed management system, to be the basis of a new experimentation workbench in support of realistic, large-scale, replayable research. We have implemented a new model of testbed-based experiments that allows people to move forward and backward through their experimentation processes. Integrated tools help researchers manage their activities (both planned and unplanned), software artifacts, data, and analyses. We present the workbench, describe its implementation, and report how it has been used by early adopters. Our initial case studies highlight both the utility of the current workbench and additional usability challenges that must be addressed.
Harnessing Virtual Machine Resource Control for Job Management
- Proceedings of the First International Workshop on Virtualization Technology in Distributed Computing (VTDC
, 2007
Cited by 15 (6 self)
Virtual machine technology promises important benefits for grid computing and cluster batch job systems, including improved isolation, customizable workspaces, and support for checkpointing and migration. One way to gain these benefits is to “drill holes” in existing batch computing systems; however, we believe these new capabilities warrant a rethinking of the architectures of existing systems. We propose separating resource control for VMs into a new foundational layer that focuses narrowly on resource management. We present JAWS, a new batch computing service that is built as a thin layer above a resource control plane that enables it to share a common pool of networked cluster resources with other cluster services. JAWS executes jobs within isolated virtual machine workspaces. We discuss how exposing resource control allows JAWS to leverage VM-based resource isolation as a means to learn models of application behavior, and use those models to guide scheduling policies for efficient resource sharing.
The Flexlab approach to realistic evaluation of networked systems
- in Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI’07
, 2007
Cited by 11 (1 self)
Networked systems are often evaluated on overlay testbeds such as PlanetLab and emulation testbeds such as Emulab. Emulation testbeds give users great control over the host and network environments and offer easy reproducibility, but only artificial network conditions. Overlay testbeds provide real network conditions, but are not repeatable environments and provide less control over the experiment. We describe the motivation, design, and implementation of Flexlab, a new testbed with the strengths of both overlay and emulation testbeds. It enhances an emulation testbed by providing the ability to integrate a wide variety of network models, including those obtained from an overlay network. We present three models that demonstrate its usefulness, including “application-centric Internet modeling” that we specifically developed for Flexlab. Its key idea is to run the application within the emulation testbed and use its offered load to measure the overlay network. These measurements are used to shape the emulated network. Results indicate that for evaluation of applications running over Internet paths, Flexlab with this model can yield far more realistic results than either PlanetLab without resource reservations, or Emulab without topological information.
DOME: A Diverse Outdoor Mobile Testbed
Cited by 7 (3 self)
A series of complex dependencies conspire to make it difficult to model mobile networks, including mobility, channel and radio characteristics, and power consumption. To address these challenges, we have designed and built a testbed for large-scale mobile experimentation, called the Diverse Outdoor Mobile Environment. DOME consists of computer-equipped buses, battery-powered nomadic nodes, organic WiFi APs, and a municipal WiFi mesh network. While the construction of a testbed such as DOME presents a significant engineering challenge, this paper describes a concrete set of scientific results derived from this experience. We argue that a broad range of mobility experiments could be performed in a testbed which provides the properties of temporal, technological, and spatial diversity. We demonstrate these properties in our testbed through analysis of data collected from DOME over a period of four years. Finally, we use DOME to provide insight into several open problems in mobile systems research.
High-Bandwidth Data Dissemination for Large-Scale Distributed Systems
, 2008
Cited by 6 (3 self)
This article focuses on the multireceiver data dissemination problem. Initially, IP multicast formed the basis for efficiently supporting such distribution. More recently, overlay networks have emerged to support point-to-multipoint communication. Both techniques focus on constructing trees rooted at the source to distribute content among all interested receivers. We argue, however, that trees have two fundamental limitations for data dissemination. First, since all data comes from a single parent, participants must often continuously probe in search of a parent with an acceptable level of bandwidth. Second, due to packet losses and failures, available bandwidth is monotonically decreasing down the tree. To address these limitations, we present Bullet, a data dissemination mesh that takes advantage of the computational and storage capabilities of end hosts to create a distribution structure where a node receives data in parallel from multiple peers. For the mesh to deliver improved bandwidth and reliability, we need to solve several key problems: (i) disseminating disjoint data over the mesh, (ii) locating missing content, (iii) finding who to peer with (peering strategy), (iv) retrieving data at the
Lessons from resource allocators for large-scale multiuser testbeds
- SIGOPS Oper. Syst. Rev
, 2006
Cited by 5 (0 self)
Resource allocation is a key aspect of shared testbed infrastructures such as PlanetLab and Emulab. Despite their many differences, both types of testbed have many resource allocation issues in common. In this paper we explore issues related to designing a general resource allocation interface that is sufficient for a wide variety of testbeds, current and future. Our explorations are informed by our experience developing and running Emulab’s “assign” resource allocator and the “SWORD” resource discoverer, our experience with the PlanetLab and Emulab testbeds, and our projection of future testbed needs.
Remote Control: Distributed Application Configuration, Management, and Visualization with Plush
Cited by 5 (3 self)
Support for distributed application management in large-scale networked environments remains in its early stages. Although a number of solutions exist for subtasks of application deployment, monitoring, maintenance, and visualization in distributed environments, few tools provide a unified framework for application management. Many of the existing tools address the management needs of a single type of application or service that runs in a specific environment, and these tools are not adaptable enough to be used for other applications or platforms. In this paper, we present the design and implementation of Plush, a fully configurable application management infrastructure designed to meet the general requirements of several different classes of distributed applications and execution environments. Plush allows developers to specifically define the flow of control needed by their computations using application building blocks. Through an extensible resource management interface, Plush supports execution in a variety of environments, including both live deployment platforms and emulated clusters. To gain an understanding of how Plush manages different classes of distributed applications, we take a closer look at specific applications and evaluate how Plush provides support for each.
Integrated scientific workflow management for the Emulab network testbed
- In Proc. USENIX
, 2006
Cited by 4 (2 self)
The main forces that shaped current network testbeds were the needs for realism and scale. Now that several testbeds support large and complex experiments, management of experimentation processes and results has become more difficult and a barrier to high-quality systems research. The popularity of network testbeds means that new tools for managing experiment workflows, addressing the ready-made base of testbed users, can have important and significant impacts. We are now evolving Emulab, our large and popular network testbed, to support experiments that are organized around scientific workflows. This paper summarizes the opportunities in this area, the new approaches we are taking, our implementation in progress, and the challenges in adapting scientific workflow concepts for testbed-based research. With our system, we expect to demonstrate that a network testbed with integrated scientific workflow management can be an important tool to aid research in networking and distributed systems.

