Disk-directed I/O for MIMD Multiprocessors, 1994
Cited by 217 (18 self)
Many scientific applications that run on today’s multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and enhanced file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, to allow the disk servers to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, obtained up to 93% of peak disk bandwidth, and was as much as 16 times faster than traditional parallel file systems.
File-Access Characteristics of Parallel Scientific Workloads - IEEE Transactions on Parallel and Distributed Systems, 1996
Cited by 92 (10 self)
Phenomenal improvements in the computational performance of multiprocessors have not been matched by comparable gains in I/O system performance. This imbalance has resulted in I/O becoming a significant bottleneck for many scientific applications. One key to overcoming this bottleneck is improving the performance of parallel file systems. The design of a high-performance parallel file system requires a comprehensive understanding of the expected workload. Unfortunately, until recently, no general workload studies of parallel file systems have been conducted. The goal of the CHARISMA project was to remedy this problem by characterizing the behavior of several production workloads, on different machines, at the level of individual reads and writes. The first set of results from the CHARISMA project describes the workloads observed on an Intel iPSC/860 and a Thinking Machines CM-5. This paper is intended to compare and contrast these two workloads for an understanding of their essential similarities and differences, isolating common trends and platform-dependent variances. Using this comparison, we are able to gain more insight into the general principles that should guide parallel file-system design.
Dynamic File-Access Characteristics of a Production Parallel Scientific Workload, 1994
Cited by 76 (12 self)
Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three-week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.
PASSION: Parallel And Scalable Software for Input-Output, 1994
Cited by 72 (35 self)
We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output which provides software support for high performance parallel I/O. PASSION provides support at the language, compiler, runtime as well as file system level. PASSION provides runtime procedures for parallel access to files (read/write), as well as for out-of-core computations. These routines can either be used together with a compiler to translate out-of-core data parallel programs written in a language like HPF, or used directly by application programmers. A number of optimizations such as Two-Phase Access, Data Sieving, Data Prefetching and Data Reuse have been incorporated in the PASSION Runtime Library for improved performance. PASSION also provides an initial framework for runtime support for out-of-core irregular problems. The goal of the PASSION compiler is to automatically translate out-of-core data parallel programs to node programs for distributed memory machines, with calls to the PASSION Runtime Library. At the language level, PASSION suggests extensions to HPF for out-of-core programs. At the file system level, PASSION provides support for buffering and prefetching data from disks. A portable parallel file system is also being developed as part of this project, which can be used across homogeneous or heterogeneous networks of workstations. PASSION also provides support for integrating data and task parallelism using parallel I/O techniques. We have used PASSION to implement a number of out-of-core applications such as a Laplace's equation solver, 2D FFT, Matrix Multiplication, LU Decomposition, image processing applications as well as unstructured mesh kernels in molecular dynamics and computational fluid dynamics. We are currently in the process of using PASSION in applications in CFD (3D turbulent flows), molecular structure calculations, seismic computations, and earth and space science applications such as Four-Dimensional Data Assimilation.
PASSION is currently available on the Intel Paragon, Touchstone Delta and iPSC/860. Efforts are underway to port it to the IBM SP-1 and SP-2 using the Vesta Parallel File System.
PIOUS: A Scalable Parallel I/O System for Distributed Computing Environments - In Proceedings of the Scalable High-Performance Computing Conference, 1994
Cited by 57 (1 self)
PIOUS is a parallel file system architecture that provides cost-effective, scalable bandwidth in a network computing environment. PIOUS employs data declustering, to exploit the combined file I/O and buffer cache capacities of networked computing resources, and transaction-based concurrency control, to guarantee access consistency without explicit synchronization. This paper presents preliminary results from a prototype PIOUS implementation.
1 Introduction
Parallel programming environments that exploit networked computing resources offer a cost-effective alternative to traditional parallel machines. Environments such as PVM [14] and Linda [1], among others [15], enable parallel-distributed application development by providing mechanisms for interprocess communication, synchronization and concurrency control, fault tolerance, and process management. However, many parallel applications require or could benefit from a unified parallel I/O system that such environments generally lack. PIO...
An extended two-phase method for accessing sections of out-of-core arrays - Scientific Programming, 5(4):301-317, Winter, 1996
Cited by 56 (29 self)
A number of applications on parallel computers deal with very large data sets that cannot fit in the main memory. In such applications, data must be stored in files on disks and fetched into memory during program execution. Parallel programs with large out-of-core arrays stored in files must read/write smaller sections of the arrays from/to files. In this paper, we describe a method for accessing sections of out-of-core arrays efficiently. Our method, the extended two-phase method, uses collective I/O: Processors cooperate to combine several I/O requests into fewer, larger-granularity requests, reorder requests so that the file is accessed in proper sequence, and eliminate simultaneous I/O requests for the same data. In addition, the I/O workload is divided among processors dynamically, depending on the access requests. We present performance results obtained from two real out-of-core parallel applications – matrix multiplication and a Laplace’s equation solver – and several synthetic access patterns, all on the Intel Touchstone Delta. These results indicate that the extended two-phase method significantly outperformed a direct (non-collective) method for accessing out-of-core array sections.
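The request-combining step described in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the function name `coalesce_requests` and the `(offset, length)` request format are assumptions made for illustration, and the dynamic division of the I/O workload among processors is not modeled. The sketch only shows how requests gathered from several processors might be sorted into file order, merged into fewer and larger extents, and stripped of duplicate coverage.

```python
def coalesce_requests(requests):
    """Merge (offset, length) byte-range requests into sorted, non-overlapping extents.

    Hypothetical helper: 'requests' is assumed to be the union of the
    byte ranges requested by all cooperating processors.
    """
    extents = []  # each entry is [start, end), kept in ascending file order
    for offset, length in sorted(requests):
        end = offset + length
        if extents and offset <= extents[-1][1]:
            # Overlaps or touches the previous extent: grow it, so bytes
            # requested by several processors are read only once.
            extents[-1][1] = max(extents[-1][1], end)
        else:
            extents.append([offset, end])
    return [(start, end - start) for start, end in extents]


# Five small requests from different processors collapse into two large ones.
requests = [(40, 10), (0, 10), (10, 10), (5, 10), (45, 15)]
merged = coalesce_requests(requests)
print(merged)
```

After the merged extents have been read, a real collective scheme would still need a redistribution phase that delivers each processor its own section; that phase is omitted here.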
Multiprocessor file system interfaces - In Proceedings of the Second International Conference on Parallel and Distributed Information Systems, 1993
Cited by 50 (5 self)
Increasingly, file systems for multiprocessors are designed with parallel access to multiple disks, to keep I/O from becoming a serious bottleneck for parallel applications. Although file system software can transparently provide high-performance access to parallel disks, a new file system interface is needed to facilitate parallel access to a file from a parallel application. We describe the difficulties faced when using the conventional (Unix-like) interface in parallel applications, and then outline ways to extend the conventional interface to provide convenient access to the file for parallel programs, while retaining the traditional interface for programs that have no need for explicitly parallel file access. Our interface includes a single naming scheme, a multiopen operation, local and global file pointers, mapped file pointers, logical records, multifiles, and logical coercion for backward compatibility.
Practical prefetching techniques for multiprocessor file systems - Journal of Distributed and Parallel Databases, 1993
Cited by 45 (6 self)
Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of disk hardware. Parallel disk I/O subsystems have been proposed as one way to close the gap between processor and disk speeds. In a previous paper we showed that prefetching and caching have the potential to deliver the performance benefits of parallel file systems to parallel applications. In this paper we describe experiments with practical prefetching policies that base decisions only on on-line reference history, and that can be implemented efficiently. We also test the ability of those policies across a range of architectural parameters. Keywords: multiprocessor file systems, parallel I/O, file caching, prefetching
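To make "decisions based only on on-line reference history" concrete, here is a minimal sketch of one simple history-based policy: prefetch ahead once two consecutive block numbers have been observed. This is an illustrative assumption, not one of the policies evaluated in the paper; the class name `SequentialPrefetcher` and the `depth` parameter are hypothetical.

```python
class SequentialPrefetcher:
    """Hypothetical policy: prefetch ahead once a sequential run is detected."""

    def __init__(self, depth=2):
        self.depth = depth        # how many blocks to prefetch ahead (assumed knob)
        self.last_block = None    # most recently referenced block number
        self.run_length = 0       # consecutive sequential references seen so far

    def access(self, block):
        """Record a reference and return the block numbers to prefetch."""
        if self.last_block is not None and block == self.last_block + 1:
            self.run_length += 1
        else:
            self.run_length = 0   # pattern broken: stop predicting
        self.last_block = block
        if self.run_length >= 1:
            return [block + i for i in range(1, self.depth + 1)]
        return []


prefetcher = SequentialPrefetcher(depth=2)
for block in (7, 8, 20):
    print(block, prefetcher.access(block))
```

The policy uses no knowledge of future accesses and keeps constant state per file, which is the kind of property the abstract refers to when it asks that policies be efficiently implementable.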
Characterizing parallel file-access patterns on a large-scale multiprocessor - In Proceedings of the Ninth International Parallel Processing Symposium, 1995
Cited by 41 (4 self)
Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors. Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First, the file system should support efficient concurrent access to many files, and I/O requests from many jobs

