Disk-directed I/O for MIMD Multiprocessors
, 1994
Cited by 217 (18 self)
Many scientific applications that run on today’s multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and enhanced file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, to allow the disk servers to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, obtained up to 93% of peak disk bandwidth, and was as much as 16 times faster than traditional parallel file systems.
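The following is a minimal sketch of the idea under assumptions of our own. The file is a list of records grouped into fixed-size blocks. Each disk server knows the physical position of the blocks it holds. A collective request is described by an `owner` function that maps each record to a compute node and a slot in that node's buffer. `disk_directed_read` and its parameters are hypothetical, not the paper's interface. The point it illustrates is that each disk server, rather than the compute nodes, picks the read order (physical order here) and pushes every record to its destination.

```python
from collections import defaultdict


def disk_directed_read(file_records, records_per_block, disk_layout, owner):
    """Simulate one collective read under a disk-directed I/O model.

    file_records:      the file's contents as a list of records
    records_per_block: number of records in each file block
    disk_layout:       {disk_id: [(physical_position, logical_block), ...]}
    owner:             maps a record index to (node_id, local_index)

    Returns ({node_id: {local_index: record}}, {disk_id: [blocks in read order]}).
    """
    memory = defaultdict(dict)  # each compute node's destination buffer
    read_order = {}
    for disk_id, blocks in disk_layout.items():
        # The disk server chooses the schedule: visit its blocks in
        # physical order so the disk head sweeps instead of seeking.
        schedule = [block for _, block in sorted(blocks)]
        read_order[disk_id] = schedule
        for block in schedule:
            start = block * records_per_block
            end = min(start + records_per_block, len(file_records))
            for i in range(start, end):
                node, local = owner(i)
                memory[node][local] = file_records[i]  # "send" to the owner
    return dict(memory), read_order
```

In this example, two disks store their blocks out of logical order. Each server still reads in physical order, and each node still ends up with its cyclically distributed records.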
Input/Output Characteristics of Scalable Parallel Applications
- In Proceedings of Supercomputing ’95
, 1995
Cited by 100 (2 self)
Rapid increases in computing and communication performance are exacerbating the long-standing problem of performance-limited input/output. Indeed, for many otherwise scalable parallel applications, input/output is emerging as a major performance bottleneck. The design of scalable input/output systems depends critically on the input/output requirements and access patterns for this emerging class of large-scale parallel applications. However, hard data on the behavior of such applications is only now becoming available. In this paper, we describe the input/output requirements of three scalable parallel applications (electron scattering, terrain rendering, and quantum chemistry) on the Intel Paragon XP/S. As part of an ongoing parallel input/output characterization effort, we used instrumented versions of the application codes to capture
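The excerpt above is cut off before it says what the instrumented codes captured. The following is therefore only a sketch of the kind of summary an I/O characterization might compute, assuming a hypothetical trace of `Request(node, op, offset, size)` records. It reports three things:

- bytes moved per operation type;
- a histogram of request sizes;
- the fraction of requests that continue sequentially from the same node's previous request of the same type.

```python
from collections import Counter, namedtuple

# Hypothetical trace record; real instrumentation formats will differ.
Request = namedtuple("Request", "node op offset size")


def characterize(trace):
    """Summarize a list of Request records."""
    bytes_by_op = Counter()
    size_histogram = Counter()
    last_end = {}  # (node, op) -> end offset of that node's previous request
    sequential = 0
    for r in trace:
        bytes_by_op[r.op] += r.size
        size_histogram[r.size] += 1
        # A request is "sequential" if it starts where the previous one ended.
        if last_end.get((r.node, r.op)) == r.offset:
            sequential += 1
        last_end[(r.node, r.op)] = r.offset + r.size
    return {
        "bytes_by_op": dict(bytes_by_op),
        "size_histogram": dict(size_histogram),
        "sequential_fraction": sequential / len(trace) if trace else 0.0,
    }
```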
PASSION: Parallel And Scalable Software for Input-Output
, 1994
Cited by 72 (35 self)
We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output, which provides software support for high-performance parallel I/O. PASSION provides support at the language, compiler, runtime, and file-system levels. PASSION provides runtime procedures for parallel access to files (read/write), as well as for out-of-core computations. These routines can either be used together with a compiler to translate out-of-core data-parallel programs written in a language like HPF, or used directly by application programmers. A number of optimizations such as Two-Phase Access, Data Sieving, Data Prefetching and Data Reuse have been incorporated in the PASSION Runtime Library for improved performance. PASSION also provides an initial framework for runtime support for out-of-core irregular problems. The goal of the PASSION compiler is to automatically translate out-of-core data-parallel programs into node programs for distributed-memory machines, with calls to the PASSION Runtime Library. At the language level, PASSION suggests extensions to HPF for out-of-core programs. At the file-system level, PASSION provides support for buffering and prefetching data from disks. A portable parallel file system that can be used across homogeneous or heterogeneous networks of workstations is also being developed as part of this project. PASSION also provides support for integrating data and task parallelism using parallel I/O techniques. We have used PASSION to implement a number of out-of-core applications such as a Laplace's equation solver, 2D FFT, matrix multiplication, LU decomposition, image-processing applications, and unstructured-mesh kernels in molecular dynamics and computational fluid dynamics. We are currently using PASSION in applications in CFD (3D turbulent flows), molecular structure calculations, seismic computations, and earth and space science applications such as Four-Dimensional Data Assimilation.
PASSION is currently available on the Intel Paragon, Touchstone Delta and iPSC/860. Efforts are underway to port it to the IBM SP-1 and SP-2 using the Vesta Parallel File System.
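Data prefetching is one of the optimizations named above. It overlaps reading the next piece of an out-of-core array with computation on the current one. The following is a minimal sketch, assuming the array is processed slab by slab through a hypothetical `read_slab(k)` function. It is not the PASSION Runtime Library's interface. A single background I/O thread keeps exactly one read in flight while the caller computes.

```python
from concurrent.futures import ThreadPoolExecutor


def process_out_of_core(read_slab, num_slabs, compute):
    """Apply compute() to each slab in order, prefetching slab k+1 during slab k."""
    results = []
    if num_slabs == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_slab, 0)  # start the first read
        for k in range(num_slabs):
            slab = pending.result()  # wait until slab k is in memory
            if k + 1 < num_slabs:
                pending = io.submit(read_slab, k + 1)  # prefetch the next slab
            results.append(compute(slab))  # runs while the prefetch proceeds
    return results
```

The benefit depends on compute time per slab being comparable to read time. If reads dominate, prefetching hides only the computation, not the I/O.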
An extended two-phase method for accessing sections of out-of-core arrays
- Scientific Programming, 5(4):301–317, Winter
, 1996
Cited by 56 (29 self)
A number of applications on parallel computers deal with very large data sets that cannot fit in main memory. In such applications, data must be stored in files on disks and fetched into memory during program execution. Parallel programs with large out-of-core arrays stored in files must read/write smaller sections of the arrays from/to files. In this paper, we describe a method for accessing sections of out-of-core arrays efficiently. Our method, the extended two-phase method, uses collective I/O: processors cooperate to combine several I/O requests into fewer, larger-granularity requests, reorder requests so that the file is accessed in proper sequence, and eliminate simultaneous I/O requests for the same data. In addition, the I/O workload is divided among processors dynamically, depending on the access requests. We present performance results obtained from two real out-of-core parallel applications – matrix multiplication and a Laplace’s equation solver – and several synthetic access patterns, all on the Intel Touchstone Delta. These results indicate that the extended two-phase method significantly outperformed a direct (non-collective) method for accessing out-of-core array sections.
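The collective steps listed above can be sketched as follows: combining requests, reading in file order, and dropping duplicate reads. This is our illustration, not the paper's algorithm. Requests are byte ranges, and overlapping or adjacent ranges from all processors are merged into extents. The rule for dividing the I/O workload is an assumption standing in for the paper's dynamic division: each extent goes to the processor with the least I/O assigned so far. Phase 1 reads each extent once. Phase 2 redistributes the pieces each processor asked for.

```python
import bisect


def coalesce(requests):
    """Merge all processors' (offset, length) requests into sorted,
    non-overlapping extents; duplicate or overlapping ranges collapse."""
    merged = []
    for off, length in sorted(r for reqs in requests.values() for r in reqs):
        end = off + length
        if merged and off <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping or adjacent
        else:
            merged.append([off, end])
    return [(start, end - start) for start, end in merged]


def two_phase_read(file_bytes, requests):
    """requests: {proc_id: [(offset, length), ...]}.
    Returns ({proc_id: [bytes, ...]}, {extent_offset: reading proc_id})."""
    extents = coalesce(requests)
    # Phase 1: read each extent once, in file order, assigning it to the
    # processor with the least I/O so far (assumed load-balancing rule).
    load = {p: 0 for p in requests}
    reader_of, staged = {}, []
    for off, length in extents:
        reader = min(load, key=load.get)
        load[reader] += length
        reader_of[off] = reader
        staged.append((off, file_bytes[off:off + length]))
    starts = [off for off, _ in staged]
    # Phase 2: redistribute; copy each piece out of the extent covering it.
    result = {}
    for p, reqs in requests.items():
        pieces = []
        for off, length in reqs:
            ext_off, data = staged[bisect.bisect_right(starts, off) - 1]
            pieces.append(data[off - ext_off:off - ext_off + length])
        result[p] = pieces
    return result, reader_of
```

In this example, five requests (one duplicated) become two file reads.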
Integrating Theory and Practice in Parallel File Systems
- Proceedings of the 1993 DAGS/PC Symposium (The Dartmouth Institute for Advanced Graduate Studies)
, 1993
Cited by 48 (11 self)
Several algorithms for parallel disk systems have appeared in the literature recently, and they are asymptotically optimal in terms of the number of disk accesses. Scalable systems with parallel disks must be able to run these algorithms. We present for the first time a list of capabilities that must be provided by the system to support these optimal algorithms: control over declustering, querying about the configuration, independent I/O, and turning off parity, file caching, and prefetching. We summarize recent theoretical and empirical work that justifies the need for these capabilities. In addition, we sketch an organization for a parallel file interface with low-level primitives and higher-level operations.
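Algorithms that are optimal in the number of parallel disk accesses need to know, and control, which disk each block lives on. The following is a minimal sketch of what "querying about the configuration" could return, assuming simple round-robin declustering with a configurable stripe unit. `StripedLayout` and `locate` are hypothetical names. With a stripe unit of one block, any `num_disks` consecutive blocks fall on distinct disks and can be fetched in a single parallel I/O step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripedLayout:
    """Round-robin declustering of a file's blocks across disks (a sketch)."""

    num_disks: int
    stripe_unit: int  # consecutive logical blocks placed on the same disk

    def locate(self, block):
        """Return (disk, block offset on that disk) for a logical block."""
        unit, within = divmod(block, self.stripe_unit)
        disk = unit % self.num_disks
        row = unit // self.num_disks  # number of full stripes before this unit
        return disk, row * self.stripe_unit + within
```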
PASSION Runtime Library for Parallel I/O
- In Proceedings of the Scalable Parallel Libraries Conference
, 1994
Cited by 40 (13 self)
We are developing a compiler and runtime support system called PASSION: Parallel And Scalable Software for Input-Output. PASSION provides software support for I/O-intensive, out-of-core, loosely synchronous problems. This paper gives an overview of the PASSION Runtime Library and describes two of the optimizations incorporated in it, namely Data Prefetching and Data Sieving. Performance improvements provided by these optimizations on the Intel Touchstone Delta are discussed, together with an out-of-core Median Filtering application.
1 Introduction
There are a number of applications that deal with very large quantities of data. These applications exist in diverse areas such as large-scale scientific computations, database applications, hypertext and multimedia systems, information retrieval, and many other applications of the Information Age. The number of such applications and their data requirements keeps increasing day by day. Consequently, it has become apparent that I/O performance ...
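Data sieving is the second optimization named above. It replaces many small noncontiguous reads with a few large contiguous ones and then extracts (sieves) the requested pieces in memory. The following is a minimal sketch under three assumptions:

- a hypothetical `read_range(offset, length)` performs one contiguous read;
- the requested pieces are sorted by offset;
- no piece is larger than the sieve buffer `max_buffer`.

This is not the PASSION API. The cost of sieving is reading the unwanted gaps, which is why the window is capped.

```python
def data_sieving_read(read_range, pieces, max_buffer):
    """Serve sorted (offset, length) pieces with as few large reads as possible."""
    out, i = [], 0
    while i < len(pieces):
        start = pieces[i][0]
        j = i
        # Grow the window while the next piece still ends inside the buffer.
        while (j + 1 < len(pieces)
               and pieces[j + 1][0] + pieces[j + 1][1] - start <= max_buffer):
            j += 1
        end = max(off + ln for off, ln in pieces[i:j + 1])
        buf = read_range(start, end - start)  # one large read, gaps included
        for off, ln in pieces[i:j + 1]:
            out.append(buf[off - start:off - start + ln])  # keep wanted bytes
        i = j + 1
    return out
```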
Compiler Support for Out-of-Core Arrays on Parallel Machines
- In Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation
, 1995
Cited by 29 (0 self)
Many computational methods are currently limited by the size of physical memory, the latency of disk storage, and the difficulty of writing an efficient out-of-core version of the application. We are investigating a compiler-based approach to this problem. In general, our compiler techniques attempt to choreograph I/O for an application based on high-level programmer annotations similar to Fortran D's DECOMPOSITION, ALIGN, and DISTRIBUTE statements. The central problem is to generate "deferred routines" that delay computations until all the data they require have been read into main memory. We present results for two applications, LU factorization and red-black relaxation, on 1 to 32 nodes of an Intel Paragon after hand application of these compiler techniques.
1 Introduction
Improvements in processor performance have outpaced developments in both memory and disk I/O speed. As a result, out-of-core applications, which require significantly more data than will fit into RAM,...
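The compiler output is not shown here, but the runtime idea behind "deferred routines" can be sketched. Each computation is registered with the set of data tiles it needs and runs only once all of them are resident in memory. `DeferredScheduler`, `defer`, and `tile_loaded` are hypothetical names. A real system would also evict tiles and choose the order in which tiles are read.

```python
class DeferredScheduler:
    """Run each registered routine once every tile it needs is in memory."""

    def __init__(self):
        self.resident = set()  # tiles currently in main memory
        self.waiting = []      # (frozenset of needed tiles, routine)

    def defer(self, needed, routine):
        self.waiting.append((frozenset(needed), routine))
        self._run_ready()

    def tile_loaded(self, tile):
        self.resident.add(tile)
        self._run_ready()

    def _run_ready(self):
        # Take a snapshot so routines may safely call defer() themselves.
        pending, self.waiting = self.waiting, []
        for needed, routine in pending:
            if needed <= self.resident:
                routine()
            else:
                self.waiting.append((needed, routine))
```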
Expanding the potential for disk-directed I/O
- In Proceedings of the 1995 IEEE Symposium on Parallel and Distributed Processing
, 1995
Cited by 22 (6 self)
As parallel computers are increasingly used to run scientific applications with large data sets, and as processor speeds continue to increase, it becomes more important to provide fast, effective parallel file systems for data storage and for temporary files. In an earlier work we demonstrated that a technique we call disk-directed I/O has the potential to provide consistent high performance for large, collective, structured I/O requests. In this paper we expand on this potential by demonstrating the ability of a disk-directed I/O system to read irregular subsets of data from a file, and to filter and distribute incoming data according to data-dependent functions.
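The following is a minimal sketch of the two extensions, assuming one disk server holding blocks of records at known physical positions. Only the requested (possibly irregular) subset of blocks is read, still in physical order. Each record passes through a filter predicate before a data-dependent function picks its destination node. `serve_irregular_read`, `keep`, and `destination` are hypothetical names, not the paper's interface.

```python
from collections import defaultdict


def serve_irregular_read(disk_blocks, wanted, keep, destination):
    """One disk server's part of a filtered, data-dependent collective read.

    disk_blocks: {logical_block: (physical_position, [records])} on this disk
    wanted:      set of logical blocks requested (any irregular subset)
    keep:        predicate deciding whether a record is sent at all
    destination: maps a record to the compute node that should receive it
    """
    outboxes = defaultdict(list)
    # Ignore requested blocks stored elsewhere; read ours in physical order.
    ours = wanted.intersection(disk_blocks)
    for block in sorted(ours, key=lambda b: disk_blocks[b][0]):
        for record in disk_blocks[block][1]:
            if keep(record):
                outboxes[destination(record)].append(record)
    return dict(outboxes)
```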
HFS: A flexible file system for shared-memory multiprocessors
, 1994
Cited by 21 (3 self)
The HURRICANE File System (HFS) is designed for large-scale, shared-memory multiprocessors. Its architecture is based on the principle that a file system must support a wide variety of file structures, file system policies and I/O interfaces to maximize performance for a wide variety of applications. HFS uses a novel, object-oriented building-block approach to provide the flexibility needed to support this variety of file structures, policies, and I/O interfaces. File structures can be defined in HFS that optimize for sequential or random access, read-only, write-only or read/write access, sparse or dense data, large or small file sizes, and different degrees of application concurrency. Policies that can be defined on a per-file or per-open instance basis include locking policies, prefetching policies, compression/decompression policies and file cache management policies. In contrast, most existing file systems have been designed to support a single file structure and a small set of po...
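The building-block idea can be sketched as composing an open-file object from interchangeable policy objects, so that two opens of the same file can use different policies. The sketch below shows only a prefetching policy. The class names are hypothetical and do not reflect HFS's actual building blocks or interfaces.

```python
class NoPrefetch:
    """Policy: fetch only the block that was asked for."""

    def blocks_to_prefetch(self, block):
        return []


class SequentialPrefetch:
    """Policy: also fetch the next `depth` blocks, for sequential access."""

    def __init__(self, depth):
        self.depth = depth

    def blocks_to_prefetch(self, block):
        return list(range(block + 1, block + 1 + self.depth))


class OpenFile:
    """A per-open instance whose behavior comes from the policy it was given."""

    def __init__(self, storage, prefetch_policy):
        self.storage = storage           # {block number: data}, stands in for disk
        self.prefetch = prefetch_policy  # swappable per open instance
        self.cache = {}

    def read_block(self, block):
        for b in [block] + self.prefetch.blocks_to_prefetch(block):
            if b in self.storage and b not in self.cache:
                self.cache[b] = self.storage[b]  # fill the cache from "disk"
        return self.cache[block]
```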
Disk-directed I/O for an Out-of-core Computation
- In Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing
, 1995
Cited by 18 (3 self)
New file systems are critical to obtaining good I/O performance on large multiprocessors. Several researchers have suggested the use of collective file-system operations, in which all processes in an application cooperate in each I/O request. Others have suggested that the traditional low-level interface (read, write, seek) be augmented with various higher-level requests (e.g., read matrix), allowing the programmer to express a complex transfer in a single (perhaps collective) request. Collective, high-level requests permit techniques like two-phase I/O and disk-directed I/O to significantly improve performance over traditional file systems and interfaces. Neither of these techniques has been tested on anything other than simple benchmarks that read or write matrices. Many applications, however, intersperse computation and I/O to work with data sets that cannot fit in main memory. In this paper, we present the results of experiments with an “out-of-core” LU-decomposition program, comparing a traditional interface and file system with a system that has a high-level, collective interface and disk-directed I/O. We found that a collective interface was awkward in some places, and forced additional synchronization. Nonetheless, disk-directed I/O was able to obtain much better performance than the traditional system.

