On Implementing MPI-IO Portably and with High Performance
- In Proceedings of the 6th Workshop on I/O in Parallel and Distributed Systems, 1999
Cited by 137 (21 self)
We discuss the issues involved in implementing MPI-IO portably on multiple machines and file systems and also achieving high performance. One way to implement MPI-IO portably is to implement it on top of the basic Unix I/O functions (open, lseek, read, write, and close), which are themselves portable. We argue that this approach has limitations in both functionality and performance. We instead advocate an implementation approach that combines a large portion of portable code and a small portion of code that is optimized separately for different machines and file systems. We have used such an approach to develop a high-performance, portable MPI-IO implementation, called ROMIO. In addition to basic I/O functionality, we consider the issues of supporting other MPI-IO features, such as 64-bit file sizes, noncontiguous accesses, collective I/O, asynchronous I/O, consistency and atomicity semantics, user-supplied hints, shared file pointers, portable data representation, and file preallocati...
High-level buffering for hiding periodic output cost in scientific simulations
- IEEE Transactions on Parallel and Distributed Systems, 2006
Cited by 7 (1 self)
Scientific applications often need to write out large arrays and associated metadata periodically for visualization or restart purposes. In this paper, we present active buffering, a high-level transparent buffering scheme for collective I/O, in which processors actively organize their idle memory into a hierarchy of buffers for periodic output data. It utilizes idle memory on the processors, yet makes no assumption regarding runtime memory availability. Active buffering can perform background I/O while the computation is going on, is extensible to remote I/O for more efficient data migration, and can be implemented in a portable style in today’s parallel I/O libraries. It can also mask performance problems of scientific data formats used by many scientists. Performance experiments with both synthetic benchmarks and real simulation codes on multiple platforms show that active buffering can greatly reduce the visible I/O cost from the application’s point of view. Index Terms: Parallel I/O library design, performance optimization, experimentation.
An Efficient, Nonintrusive, Log-Based I/O Mechanism for Scientific Simulations on Clusters
Cited by 1 (0 self)
Scientific simulations are often very I/O intensive, requiring high I/O bandwidth to store the data generated by the simulation. Traditional supercomputers have specialized I/O systems with multiple I/O nodes and specialized interconnects to handle such high I/O loads. However, with the increased availability of inexpensive clusters of workstations, more and more simulations are now run on clusters. Unfortunately, cluster supercomputers are usually not very well equipped for I/O, making I/O a serious bottleneck for such applications. To address this problem, we propose Log-Based I/O (LBIO), an approach that can substantially increase the I/O performance of simulations on clusters by utilizing free space on the cluster’s local disks to stage data on its way to remote storage. LBIO uses local disks to create a log of all I/O calls, and uses a background thread to replay the log at the rate that best utilizes the server and network resources. LBIO is implemented as an easy-to-use, non-intrusive library: a user can turn on LBIO by adding a single initialization call to the simulation code. LBIO also works with existing scientific I/O libraries like HDF, as well as collective libraries like ROMIO. Our performance studies on microbenchmarks and a real-world scientific simulation code show that LBIO can provide up to 35% improvement in I/O performance for raw I/O and over 50% for I/O through libraries like ROMIO or HDF.
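The log-and-replay idea in this abstract can be illustrated with a minimal Python sketch. It is not LBIO's code or API: the class name `LogBasedWriter`, its methods, and the file layout are hypothetical, and an ordinary local file stands in for the remote storage target. Each write is appended to a local log and returns immediately, while a background thread replays the logged writes to the target.

```python
import queue
import threading


class LogBasedWriter:
    """Hypothetical sketch: log writes locally, replay them in the background."""

    def __init__(self, log_path, target_path):
        self._log_path = log_path
        self._target_path = target_path
        self._log = open(log_path, "wb")   # local staging log
        self._pending = queue.Queue()      # index of logged records to replay
        self._thread = threading.Thread(target=self._replay)
        self._thread.start()

    def write(self, offset, data):
        # Append the payload to the local log, then return to the caller
        # without waiting for the (slow) target to be written.
        pos = self._log.tell()
        self._log.write(data)
        self._log.flush()
        self._pending.put((pos, offset, len(data)))

    def close(self):
        # Signal end of log and wait until every record has been replayed.
        self._pending.put(None)
        self._thread.join()
        self._log.close()

    def _replay(self):
        # Background thread: read each record back from the log and apply
        # it to the target at the offset the caller originally requested.
        with open(self._log_path, "rb") as log, \
                open(self._target_path, "wb") as target:
            while True:
                record = self._pending.get()
                if record is None:
                    break
                pos, offset, length = record
                log.seek(pos)
                target.seek(offset)
                target.write(log.read(length))
```

A caller would create the writer once, issue `write(offset, data)` calls from the simulation loop, and call `close()` at the end. The sketch replays as fast as the target allows; pacing the replay to match server and network capacity, as the abstract describes, is left out.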
Achieving High Performance with MPI-IO
The I/O access patterns of many parallel applications consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access noncontiguous data with a single I/O function call, unlike in Unix I/O. In this paper, we explain how critical this feature of MPI-IO is for high performance and how it enables implementations to perform optimizations. An application can be written in many different ways with MPI-IO. We classify the different ways of expressing an application's I/O access pattern in MPI-IO into four levels, called level 0 through level 3. We demonstrate that, for applications with noncontiguous access patterns, the I/O performance improves significantly if users write the application such that it makes level-3 MPI-IO requests (noncontiguous, collective) rather than level-0 requests (Unix s...
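The difference between many small requests and one request that describes the whole noncontiguous pattern can be illustrated with a minimal Python sketch. It does not use MPI-IO; the function names are hypothetical, and reading one covering extent and slicing the pieces out is only one optimization an implementation could apply once it sees the full pattern.

```python
import io


def read_many_small(f, offsets, size):
    """Issue one seek and one read per piece (Unix-style access)."""
    pieces = []
    for off in offsets:
        f.seek(off)
        pieces.append(f.read(size))
    return pieces, len(offsets)   # data, number of read calls


def read_single_request(f, offsets, size):
    """Given the whole pattern up front, read the covering extent once."""
    start = min(offsets)
    end = max(offsets) + size
    f.seek(start)
    block = f.read(end - start)   # one large contiguous read
    pieces = [block[o - start:o - start + size] for o in offsets]
    return pieces, 1              # data, number of read calls


if __name__ == "__main__":
    f = io.BytesIO(bytes(range(256)))
    offsets = [0, 16, 32, 48]     # four 4-byte pieces, 16 bytes apart
    small, n_small = read_many_small(f, offsets, 4)
    single, n_single = read_single_request(f, offsets, 4)
    print(small == single, n_small, n_single)
```

Both functions return the same pieces. The second reads some unwanted bytes between pieces in exchange for far fewer I/O calls, a trade-off that depends on how sparse the pattern is.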
Improving MPI-IO Output Performance with Active Buffering
- In Proceedings of the International Parallel and Distributed Processing Symposium, 2003
Efficient collective output of intermediate results to secondary storage becomes more and more important for scientific simulations as the gap between processing power/interconnection bandwidth and the I/O system bandwidth enlarges. Dedicated servers can offload I/O from compute processors and shorten the execution time, but it is not always possible or easy for an application to use them. We propose the use of active buffering with threads (ABT) for overlapping I/O with computation efficiently and flexibly without dedicated I/O servers. We show that the implementation of ABT in ROMIO, a popular implementation of MPI-IO, greatly reduces the application-visible cost of ROMIO's collective write calls, and improves an application's overall performance by hiding I/O cost and saving implicit synchronization overhead from collective write operations. Further, ABT is high-level, platform-independent, and transparent to users, giving users the benefit of overlapping I/O with other processing tasks even when the file system or parallel I/O library does not support asynchronous I/O.

