Results 1 - 10 of 14
Swift/T: Large-scale Application Composition via Distributed-memory Dataflow Processing
"... Many scientific applications are conceptually built up from independent component tasks as a parameter study, optimization, or other search. Large batches of these tasks may be executed on high-end computing systems; however, the coordination of the independent processes, their data, and their data ..."
Abstract
-
Cited by 14 (9 self)
- Add to MetaCart
Many scientific applications are conceptually built up from independent component tasks as a parameter study, optimization, or other search. Large batches of these tasks may be executed on high-end computing systems; however, the coordination of the independent processes, their data, and their data dependencies is a significant scalability challenge. Many problems must be addressed, including load balancing, data distribution, notifications, concurrent programming, and linking to existing codes. In this work, we present Swift/T, a programming language and runtime that enables the rapid development of highly concurrent, task-parallel applications. Swift/T is composed of several enabling technologies to address scalability challenges, offers a high-level optimizing compiler for user programming and debugging, and provides tools for binding user code in C/C++/Fortran into a logical script. In this work, we describe the Swift/T solution and present scaling results from the IBM Blue Gene/P and Blue Gene/Q.
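To make the task-parallel pattern concrete: the sketch below is a rough Python analogue of the parameter-study composition Swift/T automates, not Swift/T's actual syntax; the simulate function is a hypothetical stand-in for user code bound from C/C++/Fortran.

    # Hypothetical many-task parameter study: independent leaf tasks
    # dispatched concurrently, results collected without explicit messaging.
    from concurrent.futures import ProcessPoolExecutor

    def simulate(param: float) -> float:
        # Placeholder leaf task; in Swift/T this would be an external
        # program or a compiled function wrapped as a leaf function.
        return param * param

    if __name__ == "__main__":
        params = [0.1 * i for i in range(100)]  # the parameter sweep
        with ProcessPoolExecutor() as pool:
            # tasks run as workers become free
            results = list(pool.map(simulate, params))
        print(min(results), max(results))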
Turbine: A Distributed-memory Dataflow Engine for High Performance Many-task Applications
2013
"... Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely coupled coarse-grained tasks, each comprising a tightly-coupled parallel function or pr ..."
Abstract
-
Cited by 13 (8 self)
- Add to MetaCart
Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely coupled coarse-grained tasks, each comprising a tightly-coupled parallel function or program. “Many-task” programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly coupled parallelism at the lower level through multithreading, message passing, and/or partitioned global address spaces. At large scales, however, the management of task distribution, data dependencies, and intertask data movement is a significant performance challenge. In this work, we describe Turbine, a new highly scalable and distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution and is scalable to multi-petaflop infrastructures. We present here the architecture of Turbine and its performance on highly concurrent systems.
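The core rule of such a dataflow engine is that a task becomes eligible once all of its inputs are ready. A minimal Python illustration of that readiness rule follows; it mirrors the description above, not Turbine's actual intermediate representation or API.

    # Illustrative dataflow readiness rule: run a task once all of its
    # inputs exist; its output may in turn release further tasks.
    tasks = {
        "a": ([], lambda: 1),                  # no inputs: ready now
        "b": ([], lambda: 2),
        "c": (["a", "b"], lambda a, b: a + b)  # waits on a and b
    }

    done = {}
    while len(done) < len(tasks):
        for name, (deps, fn) in tasks.items():
            if name not in done and all(d in done for d in deps):
                done[name] = fn(*(done[d] for d in deps))  # eligible: run it
    print(done["c"])  # 3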
Parallelizing the execution of sequential scripts
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE, 2013
"... Scripting is often used in science to create applications via the composition of existing programs. Parallel scripting sys-tems allow the creation of such applications, but each sys-tem introduces the need to adopt a somewhat specialized programming model. We present an alternative scripting approac ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
(Show Context)
Scripting is often used in science to create applications via the composition of existing programs. Parallel scripting systems allow the creation of such applications, but each system introduces the need to adopt a somewhat specialized programming model. We present an alternative scripting approach, AMFS Shell, that lets programmers express parallel scripting applications via minor extensions to existing sequential scripting languages, such as Bash, and then execute them in-memory on large-scale computers. We define a small set of commands between the scripts and a parallel scripting runtime system, so that programmers can compose their scripts in a familiar scripting language. The underlying AMFS implements both collective (fast file movement) and functional (transformation based on content) file management. Tasks are handled by AMFS's built-in execution engine. AMFS Shell is expressive enough for a wide range of applications, and the framework can run such applications efficiently on large-scale computers.
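As a rough analogue of the approach (the helper below is hypothetical, not AMFS Shell's actual command set): a familiar sequential script hands a batch of shell-style commands to a runtime that executes them concurrently.

    # Hypothetical parallel-scripting analogue: shell-style commands
    # from a sequential script fanned out across a worker pool.
    import shlex
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    def run_many(commands):
        """Run shell-style commands in parallel; return their stdout."""
        def run_one(cmd):
            return subprocess.run(shlex.split(cmd), capture_output=True,
                                  text=True).stdout
        with ThreadPoolExecutor() as pool:
            return list(pool.map(run_one, commands))

    if __name__ == "__main__":
        outputs = run_many([f"echo chunk-{i}" for i in range(8)])
        print("".join(outputs))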
Stochastic tail-phase optimization for bag-of-tasks execution in clouds
In UCC, 2012
"... Abstract—Elastic applications like bags of tasks benefit greatly from Infrastructure as a Service (IaaS) clouds that let users allocate compute resources on demand, charging based on reserved time intervals. Users, however, still need guidance for mapping their applications onto multiple IaaS offeri ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Elastic applications like bags of tasks benefit greatly from Infrastructure as a Service (IaaS) clouds that let users allocate compute resources on demand, charging based on reserved time intervals. Users, however, still need guidance for mapping their applications onto multiple IaaS offerings, both minimizing execution time and respecting budget limitations. For budget-controlled execution of bags of tasks, we built BaTS, a scheduler that estimates possible budget and makespan combinations using a tiny task sample, and then executes a bag within the user's budget constraints. Previous work has shown the efficacy of this approach. There remains, however, the risk of outlier tasks causing the execution to exceed the predicted makespan. In this work, we present a stochastic optimization of the tail phase for BaTS' execution. The main idea is to use the otherwise idling machines up until the end of their (already paid-for) allocation time. Using the task completion time information acquired during the execution, BaTS decides which tasks to replicate onto idle machines in the tail phase, reducing the makespan and improving the tolerance to outlier tasks. Our evaluation results show that this effect is robust w.r.t. the quality of runtime predictions and is strongest with more expensive schedules in which many fast machines are available. Index Terms: Scheduling, Budget, Task Replication
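The tail-phase heuristic can be sketched directly: while already-paid-for machines sit idle, replicate the running tasks expected to finish last. The sketch below is illustrative Python under that assumption (names are made up), not BaTS's actual scheduler.

    # Tail-phase replication sketch: pair likely stragglers (longest
    # predicted remaining runtime) with idle, still-allocated machines.
    def pick_replicas(running, idle_machines, predicted_runtime):
        """running: ids of tasks still executing;
        predicted_runtime: task id -> predicted seconds.
        Returns (task, machine) replication pairs, longest tasks first."""
        longest_first = sorted(running, key=predicted_runtime.get,
                               reverse=True)
        return list(zip(longest_first, idle_machines))

    running = ["t7", "t3", "t9"]
    predicted = {"t7": 120.0, "t3": 480.0, "t9": 300.0}
    print(pick_replicas(running, ["m1", "m2"], predicted))
    # [('t3', 'm1'), ('t9', 'm2')] -> likely stragglers get replicas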
Exploring Scientific Discovery with Large-Scale Parallel Scripting
"... Scientists and the organizations that fund scientific research frequently face difficult questions about how to allocate scarce resources. Should they pursue safe avenues of investigation that incrementally extend current knowledge? Or should they pursue ideas that are far off the beaten track, whic ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Scientists and the organizations that fund scientific research frequently face difficult questions about how to allocate scarce resources. Should they pursue safe avenues of investigation that incrementally extend current knowledge? Or should they pursue ideas that are far off the beaten track, which are less likely to bear fruit but more likely to provide revolutionary insights? One group at the University of Chicago [4] is trying to provide some insight by developing a model of scientific discovery and exploring what parameters match real-world data, and what parameters maximize knowledge production. This demo will describe the methods we used to take a sequential simulation of scientific discovery and build an optimization algorithm that scales to thousands of cores. We use a simulated annealing algorithm to optimize parameters for the model. The objective function being evaluated is a measure of scientific innovation that is determined by running an ensemble of randomized simulations and averaging the results. At each step of the algorithm a parameter is perturbed randomly, and the model is rerun to determine the new value of the objective function, as shown in Figure 1a. The new parameter value may be accepted or rejected based on the change in the objective and the current annealing temperature.
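The loop described above is standard simulated annealing; a minimal Python sketch under the stated setup follows, with a stand-in objective in place of the ensemble-averaged discovery model.

    # Minimal simulated-annealing loop: perturb one parameter, re-evaluate,
    # accept improvements always and worse moves with temperature-dependent
    # probability (Metropolis rule). The objective here is a stand-in.
    import math
    import random

    def objective(params):
        return -sum((p - 1.0) ** 2 for p in params)  # peak at all ones

    params = [random.uniform(-2, 2) for _ in range(4)]
    best = objective(params)
    temperature = 1.0
    for step in range(5000):
        i = random.randrange(len(params))
        candidate = params[:]
        candidate[i] += random.gauss(0, 0.1)   # perturb one parameter
        value = objective(candidate)
        if value > best or random.random() < math.exp((value - best) / temperature):
            params, best = candidate, value    # accept the move
        temperature *= 0.999                   # cool slowly
    print(params, best)

In the demo's setting, each objective evaluation is itself an ensemble of independent simulations, which is where thousands of cores come into play.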
The Impact of a Fault Tolerant MPI on Scalable Systems Services and Applications
"... Abstract—Exascale targeted scientific applications must be prepared for a highly concurrent computing environment where failure will be a regular event during execution. Natural and algorithm-based fault tolerance (ABFT) techniques can of-ten manage failures more efficiently than traditional check-p ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Exascale-targeted scientific applications must be prepared for a highly concurrent computing environment where failure will be a regular event during execution. Natural and algorithm-based fault tolerance (ABFT) techniques can often manage failures more efficiently than traditional checkpoint/restart techniques alone. Central to many petascale applications is an MPI standard that lacks support for ABFT. The Run-Through Stabilization (RTS) proposal, under consideration for MPI 3, allows an application to continue execution when processes fail. The requirements of scalable, fault-tolerant MPI implementations and applications will stress the capabilities of many system services. System services must evolve to efficiently support such applications and libraries in the presence of system component failures. This paper discusses how the RTS proposal impacts system services, highlighting specific requirements. Early experimentation results from Cray systems at ORNL using prototype MPI and runtime implementations are presented. Additionally, this paper outlines fault tolerance techniques targeted at leadership class applications.
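Without presuming the proposed MPI calls, the run-through idea can be illustrated generically: when a process's work is lost, the survivors requeue it and continue rather than aborting the job. The Python sketch below is an analogue of that recovery pattern (failure is simulated with an exception), not the RTS API.

    # Recover-and-continue analogue of run-through stabilization:
    # a failed task is requeued on surviving workers; the job never aborts.
    import random
    from concurrent.futures import ProcessPoolExecutor

    def flaky_task(i: int) -> int:
        if random.random() < 0.1:          # simulated process failure
            raise RuntimeError(f"task {i} lost")
        return i * i

    if __name__ == "__main__":
        pending = list(range(32))
        results = {}
        with ProcessPoolExecutor(max_workers=4) as pool:
            while pending:
                futures = [(i, pool.submit(flaky_task, i)) for i in pending]
                retry = []
                for i, fut in futures:
                    try:
                        results[i] = fut.result()
                    except RuntimeError:
                        retry.append(i)    # run through: requeue, don't abort
                pending = retry
        print(len(results), "tasks completed")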
Many-task computing and Blue Waters
2012
"... This report discusses many-task computing (MTC), both generically and in the context of the proposed Blue Waters systems. Blue Waters is planned to be the largest supercomputer funded by NSF when it begins production use in 2011–2012 at NCSA. The aim of this report is to inform the Blue Waters proje ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
This report discusses many-task computing (MTC), both generically and in the context of the proposed Blue Waters systems. Blue Waters is planned to be the largest supercomputer funded by NSF when it begins production use in 2011–2012 at NCSA. The aim of this report is to inform the Blue Waters project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects for middleware and policies on Blue Waters. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, MTC applications are, by definition, structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications.
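The graph structure mentioned here is easy to make concrete; the sketch below (illustrative task names, no particular middleware) represents an MTC application as tasks with explicit dependencies and derives a valid execution order.

    # MTC-style task graph: nodes are discrete tasks, edges are explicit
    # input/output dependencies; a topological order respects the edges.
    from graphlib import TopologicalSorter  # Python 3.9+

    graph = {  # task -> tasks whose outputs it consumes
        "preprocess": set(),
        "simulate_a": {"preprocess"},
        "simulate_b": {"preprocess"},
        "analyze":    {"simulate_a", "simulate_b"},
    }

    order = list(TopologicalSorter(graph).static_order())
    print(order)  # e.g. ['preprocess', 'simulate_a', 'simulate_b', 'analyze']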
Acknowledgments
"... ter verkrijging van de graad van doctor aan de Vrije Universiteit Amsterdam, op gezag van de rector magnificus prof.dr. L.M. Bouter, in het openbaar te verdedigen ten overstaan van de promotiecommissie van de Faculteit der Exacte Wetenschappen op donderdag 28 maart 2013 om 13.45 uur in de Aula van d ..."
Abstract
- Add to MetaCart
(Show Context)
for the award of the degree of Doctor at the Vrije Universiteit Amsterdam, by authority of the rector magnificus prof.dr. L.M. Bouter, to be defended in public before the doctoral committee of the Faculteit der Exacte Wetenschappen on Thursday, 28 March 2013, at 13:45, in the Aula of the university,