Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems (2001)

by A C Arpaci-Dusseau
Venue:ACM Trans. Computer Systems
Results 1 - 10 of 31

Resource overbooking and application profiling in shared hosting platforms

by Bhuvan Urgaonkar, Prashant Shenoy, Timothy Roscoe, 2002
Cited by 133 (16 self)
Abstract not found

Information and Control in Gray-Box Systems

by Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau - SOSP'01, Banff, Canada, 2001
Cited by 98 (21 self)
In modern systems, developers are often unable to modify the underlying operating system. To build services in such an environment, we advocate the use of gray-box techniques. When treating ...

Paired Gang Scheduling

by Yair Wiseman, Dror G. Feitelson - IEEE Transactions on Parallel and Distributed Systems, 2003
Cited by 31 (10 self)
Conventional gang scheduling has the disadvantage that when processes perform I/O or blocking communication, their processors remain idle, because alternative processes cannot be run independently of their own gangs. To alleviate this problem we suggest a slight relaxation of this rule: match gangs that make heavy use of the CPU with gangs that make light use of the CPU (presumably due to I/O or communication activity), and schedule such pairs together, allowing the local scheduler on each node to select either of the two processes at any instant. As I/O-intensive gangs make light use of the CPU, this only causes a minor degradation in the service to compute-bound jobs. This degradation is more than offset by the overall improvement in system performance due to the better utilization of the resources.
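The matching idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Gang` type, the `cpu_util` field, the greedy heaviest-with-lightest order, and the `max_combined` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gang:
    name: str
    cpu_util: float  # hypothetical measured CPU utilization in [0, 1]


def pair_gangs(gangs, max_combined=1.0):
    """Greedily pair the heaviest CPU user with the lightest one.

    Two gangs share a time slot only if their combined utilization does
    not exceed max_combined (an assumed threshold); otherwise the heavy
    gang gets a slot to itself.
    """
    ordered = sorted(gangs, key=lambda g: g.cpu_util)
    lo, hi = 0, len(ordered) - 1
    slots = []
    while lo <= hi:
        light, heavy = ordered[lo], ordered[hi]
        if lo < hi and light.cpu_util + heavy.cpu_util <= max_combined:
            slots.append((heavy, light))  # scheduled together as a pair
            lo += 1
            hi -= 1
        else:
            slots.append((heavy,))  # no suitable partner: runs alone
            hi -= 1
    return slots


gangs = [Gang("A", 0.9), Gang("B", 0.05), Gang("C", 0.8), Gang("D", 0.3)]
slots = pair_gangs(gangs)
slot_names = [[g.name for g in slot] for slot in slots]
```

In this sketch the local scheduler on each node would then be free to run either member of a paired slot at any instant, as the abstract describes.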

Xen and Co.: Communication-aware CPU Scheduling for Consolidated Xen-based Hosting Platforms

by Sriram Govindan, Arjun R. Nath, Amitayu Das, Bhuvan Urgaonkar, Anand Sivasubramaniam, 2007
Cited by 25 (1 self)
Recent advances in software and architectural support for server virtualization have created interest in using this technology in the design of consolidated hosting platforms. Since virtualization enables easier and faster application migration as well as secure co-location of antagonistic applications, higher degrees of server consolidation are likely to result in such virtualization-based hosting platforms (VHPs). We identify a key shortcoming in existing virtual machine monitors (VMMs) that proves to be an obstacle in operating hosting platforms, such as Internet data centers, under conditions of such high consolidation: CPU schedulers that are agnostic to the communication behavior of modern, multi-tier applications. We develop a new communication-aware CPU scheduling algorithm to alleviate this problem. We implement our algorithm in the Xen VMM and build a prototype VHP on a cluster of servers. Our experimental evaluation with realistic Internet server applications and benchmarks demonstrates the performance/cost benefits and the wide applicability of our algorithms. For example, the TPC-W benchmark exhibited improvements in average response times of up to 35% for a variety of consolidation scenarios. A streaming media server hosted on our prototype VHP was able to satisfactorily service up to 3.5 times as many clients as one running on the default Xen.

PLFS: A checkpoint filesystem for parallel applications

by John Bent, Ben McClelland, Garth Gibson, Paul Nowoczynski, Gary Grider, James Nunez, Milo Polte, Meghan Wingate, 2009
Cited by 23 (6 self)
Parallel applications running across thousands of processors must protect themselves from inevitable system failures. Many applications insulate themselves from failures by checkpointing. For many applications, checkpointing into a shared single file is most convenient. With such an approach, writes are often small and not aligned with file system boundaries. Unfortunately for these applications, this preferred data layout results in pathologically poor performance from the underlying file system, which is optimized for large, aligned writes to non-shared files. To address this fundamental mismatch, we have developed a virtual parallel log-structured file system, PLFS. PLFS remaps an application’s preferred data layout into one which is optimized for the underlying file system. Through testing on PanFS, Lustre, and GPFS, we have seen that this layer of indirection and reorganization can reduce checkpoint time by an order of magnitude for several important benchmarks and real applications without any application modification.
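A toy, in-memory sketch of the kind of layout remapping this abstract describes follows. The abstract does not give implementation details, so the per-writer append-only logs, the index tuple layout, and the `LogStructuredFile` class name are illustrative assumptions rather than the actual PLFS design.

```python
class LogStructuredFile:
    """One logical shared file backed by per-writer append-only logs."""

    def __init__(self):
        self.logs = {}   # writer id -> bytearray holding that writer's log
        self.index = []  # (logical_offset, length, writer, log_offset), in write order

    def write(self, writer, logical_offset, data):
        # Every write is a sequential append to the writer's own log,
        # no matter how small or unaligned the logical offset is.
        log = self.logs.setdefault(writer, bytearray())
        self.index.append((logical_offset, len(data), writer, len(log)))
        log.extend(data)

    def read(self, offset, length):
        # Rebuild the logical view from the index; unwritten bytes read
        # as zeros, and later writes override earlier ones.
        out = bytearray(length)
        for lofs, n, writer, pofs in self.index:
            start = max(offset, lofs)
            end = min(offset + length, lofs + n)
            if start < end:
                src = pofs + (start - lofs)
                out[start - offset:end - offset] = self.logs[writer][src:src + (end - start)]
        return bytes(out)


f = LogStructuredFile()
f.write(1, 4, b"BBBB")  # writer 1 fills logical bytes 4..7
f.write(0, 0, b"AAAA")  # writer 0 fills logical bytes 0..3
first_view = f.read(0, 8)
f.write(0, 2, b"xx")    # a small overlapping write from writer 0
second_view = f.read(0, 8)
```

The point of the indirection in this sketch is that writers never contend on a shared region: each only appends to its own log, and the logical file is reassembled at read time.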

STORM: Lightning-Fast Resource Management

by Eitan Frachtenberg, Fabrizio Petrini, Juan Fernandez, Scott Pakin, Salvador Coll - In Supercomputing 2002, 2002
Cited by 20 (9 self)
Although workstation clusters are a common platform for high-performance computing (HPC), they remain more difficult to manage than sequential systems or even symmetric multiprocessors. Furthermore, as cluster sizes increase, the quality of the resource-management subsystem (essentially, all of the code that runs on a cluster other than the applications) increasingly impacts application efficiency. In this paper, we present STORM, a resource-management framework designed for scalability and performance. The key innovation behind STORM is a software architecture that enables resource management to exploit low-level network features. As a result of this HPC-application-like design, STORM is orders of magnitude faster than the best reported results in the literature on two sample resource-management functions: job launching and process scheduling.

Flexible CoScheduling: Mitigating Load Imbalance and Improving Utilization of Heterogeneous Resources

by Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernandez - Proc. Int. Parallel and Distributed Processing Symposium (IPDPS'03), 2002
Cited by 18 (6 self)
Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically achieved by space slicing with variable partitioning, wherein nodes are dedicated for the duration of the run, or by gang scheduling, wherein time slicing is coordinated across processors. Both schemes suffer from fragmentation, where processors are left idle because jobs cannot be packed with 100% efficiency. Naturally, this leads to reduced utilization and sub-optimal performance. Flexible coscheduling (FCS) solves this problem by monitoring each job's granularity and communication activity, and using gang scheduling only for those jobs for which it is appropriate. Processes from other jobs, which can be scheduled without any constraints, are used as filler to reduce fragmentation. In addition, inefficiencies due to load imbalance and hardware heterogeneity are also reduced because the classification is done on a per-process basis. FCS has been fully implemented as part of the STORM resource manager, and shown to be competitive with gang scheduling and implicit coscheduling.

Performance Availability for Networks of Workstations

by Remzi H. Arpaci-Dusseau, 1999
Cited by 17 (5 self)
Software systems for large-scale distributed and parallel machines are difficult to build. When run in dynamic, production environments, not only must such systems perform correctly, but they must also operate with high performance. Much of the previous work in distributed computing has addressed the design of large-scale systems that function correctly, in spite of correctness faults of individual components [18, 49, 82, 86]. However, there has been little development of techniques to tolerate performance faults, that is, unexpected performance fluctuations from the components that comprise the system. Due to this shortcoming, many systems are overly sensitive to performance variations, in that global performance is high if and only if all system components perform exactly as expected. In this dissertation, we address this deficiency by formalizing the concept of performance availability. Our hypothesis is ...

Adaptive scheduling under memory pressure on multiprogrammed clusters

by Dimitrios S. Nikolopoulos, Constantine D. Polychronopoulos - In Proc. of the 2nd IEEE/ACM International Conference on Cluster Computing and the Grid (ccGrid’02), 2002
Cited by 10 (5 self)
We present a simple scheduling strategy that copes with the adverse effects of paging on multiprogrammed SMPs. We consider open, multiuser SMP servers, typically found in academic or industrial environments. Our strategy incorporates four uniquely combined features. It is adaptive, in the sense that the programs themselves take scheduling actions upon detecting memory pressure; it is dynamic, since programs detect the likelihood of paging at runtime by communicating with the operating system through a lightweight interface; it is preventive, because it takes scheduling actions before paging occurs; and it is non-intrusive, because the local scheduling actions taken by a program do not adversely affect other programs sharing the system but instead act to their benefit. We present an efficient implementation of our strategy in Linux and show with a realistic production workload that it can improve the response time of the Linux kernel under memory pressure by up to a factor of eight and the throughput by up to a factor of four.

Pitfalls in parallel job scheduling evaluation

by Eitan Frachtenberg, Dror G. Feitelson - 11th Workshop on Job Scheduling Strategies for Parallel Processing, 2005
Cited by 9 (2 self)
There are many choices to make when evaluating the performance of a complex system. In the context of parallel job scheduling, one must decide what workload to use and what measurements to take. These decisions sometimes have subtle implications that are easy to overlook. In this paper we document numerous pitfalls one may fall into, with the hope of providing at least some help in avoiding them. Along the way, we also identify topics that could benefit from additional research.
Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University