Results 1 - 10
of
51
Dynamic File-Access Characteristics of a Production Parallel Scientific Workload
, 1994
"... Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to treme ..."
Abstract
-
Cited by 76 (12 self)
- Add to MetaCart
Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor le systems. This paper presents the results of a three week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation di ers from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.
PASSION: Parallel And Scalable Software for Input-Output
, 1994
"... \We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output which provides software support for high performance parallel I/O. PASSION provides support at the language, compiler, runtime as well as file system level. PASSION provides runtime procedures for pa ..."
Abstract
-
Cited by 72 (35 self)
- Add to MetaCart
\We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output which provides software support for high performance parallel I/O. PASSION provides support at the language, compiler, runtime as well as file system level. PASSION provides runtime procedures for parallel access to files (read/write), as well as for out-of-core computations. These routines can either be used together with a compiler to translate out-of-core data parallel programs written in a language like HPF, or used directly by application programmers. A number of optimizations such as Two-Phase Access, Data Sieving, Data Prefetching and Data Reuse have been incorporated in the PASSION Runtime Library for improved performance. PASSION also provides an initial framework for runtime support for out-of-core irregular problems. The goal of the PASSION compiler is to automatically translate out- of-core data parallel programs to node programs for distributed memory machines, with calls to the PASSION Runtime Library. At the language level, PASSION suggests extensions to HPF for out-of-core programs. At the file system level, PASSION provides support for buffering and prefetching data from disks. A portable parallel file system is also being developed as part of this project, which can be used across homogeneous or heterogeneous networks of workstations. PASSION also provides support for integrating data and task parallelism using parallel I/O techniques. We have used PASSION to implement a number of out-of-core applications such as a Laplace's equation solver, 2D FFT, Matrix Multiplication, LU Decomposition, image processing applications as well as unstructured mesh kernels in molecular dynamics and computational fluid dynamics. We are currently in the process of using PASSION in applications in CFD (3D turbulent flows), molecular structure calculations, seismic computations, and earth and space science applications such as Four-Dimensional Data Assimilation. PASSION is currently available on the Intel Paragon, Touchstone Delta and iPSC/860. Efforts are underway to port it to the IBM SP-1 and SP-2 using the Vesta Parallel File System.
An extended two-phase method for accessing sections of out-of-core arrays
- SCIENTI C PROGRAMMING, 5(4):301{317, WINTER
, 1996
"... A number of applications on parallel computers deal with very large data sets that cannot fit in the main memory. In such applications, data must be stored in files on disks and fetched into memory during program execution. Parallel programs with large out-of-core arrays stored in files must read/wr ..."
Abstract
-
Cited by 56 (29 self)
- Add to MetaCart
A number of applications on parallel computers deal with very large data sets that cannot fit in the main memory. In such applications, data must be stored in files on disks and fetched into memory during program execution. Parallel programs with large out-of-core arrays stored in files must read/write smaller sections of the arrays from/to files. In this paper, we describe a method for accessing sections of out-of-core arrays efficiently. Our method, the extended two phase method, uses collective I/O: Processors cooperate to combine several I/O requests into fewer larger granularity requests, reorder requests so that the file is accessed in proper sequence, and eliminate simultaneous I/O requests for the same data. In addition, the I/O workload is divided among processors dynamically, depending on the access requests. We present performance results obtained from two real out-of-core parallel applications – matrix multiplication and a Laplace’s equation solver – and several synthetic access patterns, all on the Intel Touchstone Delta. These results indicate that the extended two-phase method significantly outperformed a direct (non-collective) method for accessing out-of-core array sections.
A Compilation System That Integrates High Performance Fortran and Fortran M
- In Proceeding of 1994 Scalable High Performance Computing Conference (Knoxville, TN
, 1994
"... Task parallelism and data parallelism are often seen as mutually exclusive approaches to parallel programming. Yet there are important classes of application, for example in multidisciplinary simulation and command and control, that would benefit from an integration of the two approaches. In this pa ..."
Abstract
-
Cited by 55 (12 self)
- Add to MetaCart
Task parallelism and data parallelism are often seen as mutually exclusive approaches to parallel programming. Yet there are important classes of application, for example in multidisciplinary simulation and command and control, that would benefit from an integration of the two approaches. In this paper, we describe a programming system that we are developing to explore this sort of integration. This system builds on previous work on task-parallel and data-parallel Fortran compilers to provide an environment in which the task-parallel language Fortran M can be used to coordinate data-parallel High Performance Fortran tasks. We use an image-processing problem to illustrate the issues that arise when building an integrated compilation system of this sort. 1 Introduction In data-parallel programming, programs apply a sequence of operations identically to all or most elements of a large data structure; in task-parallel programming, programs consist of a set of (potentially dissimilar) para...
Characterizing parallel file-access patterns on a large-scale multiprocessor
- IN PROCEEDINGS OF THE NINTH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM
, 1995
"... Rapid increases in the computational speeds of multiprocessors have not been matched by correspond-ing performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwi ..."
Abstract
-
Cited by 41 (4 self)
- Add to MetaCart
Rapid increases in the computational speeds of multiprocessors have not been matched by correspond-ing performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-v01ume data transfer between tth I/O subsystem and thousands of processors. Design of such high-performance parallel file systems depends on a thorough grasp of the expected " workload. So far there have been no-comprehensive usage studies of multiprocessor file systems. Our _. CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at.. _i ",-' NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First, the file system should support efficient concurrent access to many files, and UO requests from many jobs
PASSION Runtime Library for Parallel I/O
- In Proceedings of the Scalable Parallel Libraries Conference
, 1994
"... We are developing a compiler and runtime support system called PASSION: Parallel And Scalable Software for Input-Output. PASSION provides software support for I/O intensive out-of-core loosely synchronous problems. This paper gives an overview of the PASSION Runtime Library and describes two of the ..."
Abstract
-
Cited by 40 (13 self)
- Add to MetaCart
We are developing a compiler and runtime support system called PASSION: Parallel And Scalable Software for Input-Output. PASSION provides software support for I/O intensive out-of-core loosely synchronous problems. This paper gives an overview of the PASSION Runtime Library and describes two of the optimizations incorporated in it, namely Data Prefetching and Data Sieving. Performance improvements provided by these optimizations on the Intel Touchstone Delta are discussed, together with an outof -core Median Filtering application. 1 Introduction There are a number of applications which deal with very large quantities of data. These applications exist in diverse areas such as large scale scientific computations, database applications, hypertext and multimedia systems, information retrieval and many other applications of the Information Age. The number of such applications and their data requirements keep increasing day by day. Consequently, it has become apparent that I/O performance ...
Applications-Driven Parallel I/O
- In Proceedings of Supercomputing '93
"... We investigate the needs of some massively parallel applications running on distributed-memory parallel computers at Argonne National Laboratory and identify some common parallel I/O operations. For these operations, routines were developed that hide the details of the actual implementation (such as ..."
Abstract
-
Cited by 38 (3 self)
- Add to MetaCart
We investigate the needs of some massively parallel applications running on distributed-memory parallel computers at Argonne National Laboratory and identify some common parallel I/O operations. For these operations, routines were developed that hide the details of the actual implementation (such as the number of parallel disks) from the application, while providing good performance. An important feature is the ability for the application programmer to specify that a file be accessed either as a high-performance parallel file or as a conventional Unix file, simply by changing the value of a parameter on the file open call. These routines are examples of a parallel I/O abstraction that can enhance development, portability, and performance of I/O operations in applications. Some of the specific issues in their design and implementation in a distributed-memory toolset are discussed. 1 Introduction In order to run Grand Challenge computational science applications on massively parallel pr...
Implications of I/O for Gang Scheduled Workloads
, 1997
"... The job workloads of general-purpose multiprocessors usually include both compute-bound parallel jobs, which often require gang scheduling, as well as I/O-bound jobs, which require high CPU priority for the individual gang members of the job in order to achieve interactive response times. Our re ..."
Abstract
-
Cited by 27 (2 self)
- Add to MetaCart
The job workloads of general-purpose multiprocessors usually include both compute-bound parallel jobs, which often require gang scheduling, as well as I/O-bound jobs, which require high CPU priority for the individual gang members of the job in order to achieve interactive response times. Our results indicate that an effective interactive multiprocessor scheduler must be flexible and tailor the priority, time quantum, and extent of gang scheduling to the individual needs of each job. Flexible gang scheduling is required because of several weaknesses of traditional gang scheduling. In particular, we show that the response time of I/O-bound jobs suffers under traditional gang scheduling. In addition, we show that not all applications benefit equally from gang scheduling; most real applications can tolerate at least a small amount of scheduling skew without major performance degradation. Finally, we show that messaging statistics contain information about whether applications require gang scheduling. Taken together these results provide evidence that flexible gang scheduling is both necessary and feasible.
An Experimental Evaluation of the Parallel I/O Systems of the IBM SP and Intel Paragon Using a Production Application
, 1996
"... We present the results of an experimental evaluation of the parallel I/O systems of the IBM SP and Intel Paragon using a real three-dimensional parallel application code. This application, developed by scientists at the University of Chicago, simulates the gravitational collapse of self-gravita ..."
Abstract
-
Cited by 24 (12 self)
- Add to MetaCart
We present the results of an experimental evaluation of the parallel I/O systems of the IBM SP and Intel Paragon using a real three-dimensional parallel application code. This application, developed by scientists at the University of Chicago, simulates the gravitational collapse of self-gravitating gaseous clouds. It performs parallel I/O by using library routines that we developed and optimized separately for the SP and Paragon. The I/O routines perform two-phase I/O and use the parallel file systems PIOFS on the SP and PFS on the Paragon. We studied the I/O performance for two different sizes of the application. In the small case, we found that I/O was much faster on the SP. In the large case, open, close, and read operations were only slightly faster, and seeks were significantly faster, on the SP; whereas, writes were slightly faster on the Paragon. The communication required within our I/O routines was faster on the Paragon in both cases. The highest read bandwidth ...
Noncontiguous I/O Through PVFS
- In Proceedings of the IEEE International Conference on Cluster Computing
, 2002
"... With the tremendous advances in processor and memory technology, I/O has risen to become the bottleneck in high-performance computing for many applications. The development of parallel file systems has helped to ease the performance gap, but I/O still remains an area needing significant performance ..."
Abstract
-
Cited by 22 (6 self)
- Add to MetaCart
With the tremendous advances in processor and memory technology, I/O has risen to become the bottleneck in high-performance computing for many applications. The development of parallel file systems has helped to ease the performance gap, but I/O still remains an area needing significant performance improvement. Research has found that noncontiguous I/O access patterns in scientific applications combined with current file system methods to perform these accesses lead to unacceptable performance for large data sets. To enhance performance of noncontiguous I/O, we have created list I/O, a native version of noncontiguous I/O. We have used the Parallel Virtual File System (PVFS) to implement our ideas. Our research and experimentation shows that list I/O outperforms current noncontiguous I/O access methods in most I/O situations and can substantially enhance the performance of real-world scientific applications. 1.

