On Implementing MPI-IO Portably and with High Performance
In Proceedings of the 6th Workshop on I/O in Parallel and Distributed Systems, 1999
Cited by 137 (21 self)
We discuss the issues involved in implementing MPI-IO portably on multiple machines and file systems and also achieving high performance. One way to implement MPI-IO portably is to implement it on top of the basic Unix I/O functions (open, lseek, read, write, and close), which are themselves portable. We argue that this approach has limitations in both functionality and performance. We instead advocate an implementation approach that combines a large portion of portable code and a small portion of code that is optimized separately for different machines and file systems. We have used such an approach to develop a high-performance, portable MPI-IO implementation, called ROMIO. In addition to basic I/O functionality, we consider the issues of supporting other MPI-IO features, such as 64-bit file sizes, noncontiguous accesses, collective I/O, asynchronous I/O, consistency and atomicity semantics, user-supplied hints, shared file pointers, portable data representation, and file preallocati...
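To make the performance limitation of a pure Unix I/O layer concrete, here is a minimal Python sketch (an illustration only, not ROMIO code) of a noncontiguous, strided write built solely from lseek and write. The helper name `write_strided_posix` and the block/stride layout are hypothetical. Each contiguous piece costs one lseek plus at least one write, so the system-call count grows with the number of pieces rather than with the amount of data.

```python
import os
import tempfile


def write_strided_posix(fd, blocks, offset, stride):
    """Write blocks[i] at byte offset + i * stride using only lseek() and write().

    Hypothetical helper for illustration. Returns the number of system calls
    issued, showing that cost scales with the number of noncontiguous pieces.
    """
    calls = 0
    for i, block in enumerate(blocks):
        os.lseek(fd, offset + i * stride, os.SEEK_SET)
        calls += 1
        view = memoryview(block)
        # write() may transfer fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
            calls += 1
    return calls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "strided.dat")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            n = write_strided_posix(fd, [b"AA", b"BB", b"CC"], 0, 4)
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            print(n, f.read())
```

On a local file system the demo issues six calls for three 2-byte pieces, and the gaps between pieces read back as zero bytes. An optimized implementation would instead try to coalesce such pieces into fewer, larger requests.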
Input/Output Access Pattern Classification Using Hidden Markov Models
In Proceedings of the Fifth Workshop on Input/Output in Parallel and Distributed Systems, 1997
Cited by 48 (4 self)
Input/output performance on current parallel file systems is sensitive to a good match of application access pattern to file system capabilities. Automatic input/output access classification can determine application access patterns at execution time, guiding adaptive file system policies. In this paper we examine a new method for access pattern classification that uses hidden Markov models, trained on access patterns from previous executions, to create a probabilistic model of input/output accesses. We compare this approach to a neural network classification framework, presenting performance results from parallel and sequential benchmarks and applications.
1 Introduction
Input/output is a critical bottleneck for many important scientific applications. One reason is that performance of extant parallel file systems is particularly sensitive to file access patterns. Often the application programmer must match application input/output requirements to the capabilities of the file system....
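The classification step can be sketched as follows, assuming each access-pattern class has already been trained into its own small hidden Markov model: discretize the request stream into observation symbols, score it with the forward algorithm under each model, and pick the most likely model. The two-symbol alphabet (contiguous vs. non-contiguous request), the hand-set parameters, and all names are hypothetical placeholders, not the paper's trained models or feature set.

```python
import math

# Observation symbols (an assumption for this sketch):
# 0 = request begins where the previous one ended, 1 = it does not.
MODELS = {
    "sequential": {
        "start": [0.9, 0.1],
        "trans": [[0.95, 0.05], [0.5, 0.5]],
        "emit": [[0.95, 0.05], [0.5, 0.5]],
    },
    "random": {
        "start": [0.5, 0.5],
        "trans": [[0.5, 0.5], [0.5, 0.5]],
        "emit": [[0.2, 0.8], [0.2, 0.8]],
    },
}


def discretize(requests):
    """Turn (offset, size) requests into 0/1 contiguity symbols."""
    return [
        0 if off == prev_off + prev_size else 1
        for (prev_off, prev_size), (off, _) in zip(requests, requests[1:])
    ]


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | model)."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = sum(alpha)
    log_p = math.log(total)
    alpha = [a / total for a in alpha]  # rescale to avoid underflow
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
        total = sum(alpha)
        log_p += math.log(total)
        alpha = [a / total for a in alpha]
    return log_p


def classify(requests, models=MODELS):
    """Return the name of the model that best explains the request stream."""
    obs = discretize(requests)
    if not obs:
        raise ValueError("need at least two requests to classify")
    return max(models, key=lambda name: log_likelihood(obs, **models[name]))
```

In a real system the parameters would come from training on traces of earlier executions (for example with Baum-Welch), as the abstract describes, rather than being set by hand.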
Workload Characterization of Input/Output Intensive Parallel Applications
In Proceedings of the 9th International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, 1997
Cited by 40 (9 self)
The broadening disparity in the performance of input/output (I/O) devices and the performance of processors and communication links on parallel systems is a major obstacle to achieving high performance for a wide range of parallel applications. I/O hardware and file system parallelism are the keys to bridging this performance gap. A prerequisite to the development of efficient parallel file systems is detailed characterization of the I/O demands of parallel applications. In this paper, we present a comparative study of the I/O access patterns commonly found in I/O intensive parallel applications. Using the Pablo performance analysis environment and its I/O extensions, we captured application I/O access patterns and analyzed their interactions with current parallel I/O systems. This analysis has proven instrumental in guiding the development of new application programming interfaces (APIs) for parallel file systems and in developing effective file system policies that can adaptively re...
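As a rough illustration of the kind of summary such a characterization produces, the sketch below reduces a per-file trace of (operation, offset, size) records to operation counts, mean request size, and the fraction of requests that continue sequentially from the previous one. The record format and the function name are assumptions; this is not the Pablo I/O extension interface.

```python
from collections import Counter


def summarize_trace(trace):
    """Summarize a per-file I/O trace.

    trace: non-empty list of (op, offset, size) tuples in issue order,
    e.g. ("read", 0, 4096). The record format is hypothetical.
    """
    if not trace:
        raise ValueError("empty trace")
    ops = Counter(op for op, _, _ in trace)
    sizes = [size for _, _, size in trace]
    # A request counts as sequential if it starts where the previous one ended.
    sequential = sum(
        1
        for (_, prev_off, prev_size), (_, off, _) in zip(trace, trace[1:])
        if off == prev_off + prev_size
    )
    pairs = len(trace) - 1
    return {
        "ops": dict(ops),
        "mean_size": sum(sizes) / len(sizes),
        "sequential_fraction": sequential / pairs if pairs else 1.0,
    }
```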
Lessons from Characterizing Input/Output Behavior of Parallel Scientific Applications
International Journal, 1998
Cited by 26 (4 self)
Because processor and interprocessor communication hardware are evolving rapidly while file system performance in parallel systems improves only moderately, it is becoming increasingly difficult to provide sufficient input/output (I/O) performance to parallel applications. I/O hardware and file system parallelism are the key to bridging this performance gap. A prerequisite to the development of efficient parallel file systems is detailed characterization of the I/O demands of parallel applications. In this paper, we present a comparative study of the parallel I/O access patterns commonly found in I/O intensive scientific applications. The Pablo performance analysis tool and its I/O extensions are valuable resources for capturing and analyzing I/O access attributes and their interactions with extant parallel I/O systems. This analysis is instrumental in guiding the development of new application programming interfaces (APIs) for parallel file systems and effective file system polici...
Automatic Classification of Input/Output Access Patterns
1997
Cited by 9 (2 self)
Despite continued innovations in disk design, input/output performance has not kept pace with concurrent increases in processor speeds. Much research has focused on developing algorithms to avoid input/output or hide input/output latency in an attempt to redress this widening gap. Many studies have shown that with advance knowledge of access patterns, file systems can improve input/output performance by selecting policies appropriate for the resource demands. Unfortunately, access patterns may be complex or data dependent, and therefore unknown a priori. Our thesis is that the file system can automatically detect qualitative file access patterns both locally (per parallel program thread) and globally (per parallel program) and use this information to dynamically choose appropriate file system policies. We propose two complementary methods for automatic classification, based on neural networks and hidden Markov models, respectively. Global classifications are created from a combination...
Reactive Scheduling For Parallel I/O Systems
2000
Cited by 9 (1 self)
Parallel computing is integral to high performance computing, but it is not sufficient on its own. Adopting parallel computing requires additional supporting technologies. Parallel I/O is one such supporting technology, providing high-speed data storage in parallel computing environments. Parallel I/O systems have emerged and are beginning to see mainstream use; however, optimizing these systems is still an open research area. In particular, techniques for optimizing parallel I/O have focused on disk performance, even though other resources might have equal or greater impact on overall performance. Other work has looked at adaptive optimization techniques for these systems but has focused only on caching and prefetching.
Informed prefetching of collective input/output requests
In Proceedings of SC99, 1999
Cited by 5 (0 self)
Optimizing collective input/output (I/O) is important for improving throughput of parallel scientific applications. Current research suggests that a specialized collective application programming interface, coupled with system-level optimizations, is necessary to obtain good I/O performance. Unfortunately, collective interfaces require an application to disclose its entire access pattern to fully reorder I/O requests, and cannot flexibly utilize additional memory to improve performance. In this paper we propose and analyze a method of optimizing collective access patterns using informed prefetching that is capable of exploiting any amount of available memory to overlap I/O with computation. We compare this approach to disk-directed I/O, an efficient implementation of a collective I/O interface. Moreover, we prove that under certain conditions, a per-processor prefetch depth equal to the number of drives can guarantee sequential disk accesses for any collectively accessed file. In empirical studies, a prefetch horizon of one to two times the number of disks per processor is sufficient to match the performance of disk-directed I/O for sequentially allocated files. Finally, we develop accurate analytical models to predict the throughput of informed prefetching for collective reads as a function of the per-processor prefetch depth.
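The core mechanism of informed prefetching can be sketched as a bounded window over the disclosed access sequence: keep up to `depth` prefetches outstanding and issue the next hinted block each time the application consumes one. This is a simplified, single-processor Python sketch with hypothetical names; it does not model disks, collective access, or the paper's analytical throughput model.

```python
from collections import deque

_DONE = object()  # sentinel marking the end of the hint list


def prefetch_schedule(hints, depth):
    """Return the event order for informed prefetching with a fixed depth.

    hints: block numbers the application has disclosed, in access order.
    depth: maximum number of prefetched-but-not-yet-consumed blocks.
    Events are ("prefetch", block) or ("consume", block).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    events = []
    outstanding = deque()  # prefetched blocks not yet consumed, in order
    pending = iter(hints)

    def fill():
        # Issue prefetches until the window holds `depth` blocks or hints run out.
        while len(outstanding) < depth:
            block = next(pending, _DONE)
            if block is _DONE:
                return
            outstanding.append(block)
            events.append(("prefetch", block))

    fill()
    while outstanding:
        block = outstanding.popleft()
        events.append(("consume", block))
        fill()
    return events
```

With depth 2 and hints 0 through 5, the schedule starts with two prefetches, then alternates consume and prefetch, and never has more than two blocks in flight. A larger depth simply lets more I/O overlap with computation, which is the knob the abstract relates to the number of drives.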
Adaptive Disk Striping for Parallel Input/Output
1999
Cited by 4 (0 self)
As disk capacities continue to rise more rapidly than transfer rates, adaptive, redundant striping smoothly trades capacity for higher performance. We developed a fuzzy logic rule base for adaptive, redundant striping of files across multiple disks. This rule base is built on a queuing model of disk contention that includes file request sizes and disk hardware parameters. At low loads, the rule base stripes aggressively to minimize response time. As loads rise, it stripes less aggressively to maximize aggregate throughput. This adaptive striping rule base is incorporated into our second-generation Portable Parallel File System (PPFS II). Experimental results showed that the analytical models of disk striping accurately predict file system behavior. They also show that, depending on the access pattern, adaptive striping can double input/output performance compared to striping with fixed distribution parameters.
1 Introduction
As new high-performance computing sys...
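The low-load/high-load trade-off can be reproduced with a much cruder model than the paper's. The sketch below is an assumption-laden stand-in, not the PPFS II fuzzy rule base: it treats each disk as an M/M/1 queue, splits each request evenly over k of N disks, ignores fork-join synchronization, and picks the k with the lowest modeled response time. All parameter values are illustrative.

```python
def stripe_response_time(k, load, n_disks, size, rate, t_pos):
    """Approximate response time (s) when each request is striped over k disks.

    load: request arrival rate (requests/s), spread uniformly over n_disks.
    size: request size (bytes); rate: per-disk transfer rate (bytes/s);
    t_pos: per-access positioning overhead (s). Returns inf if saturated.
    """
    service = t_pos + size / (k * rate)          # time for one disk's piece
    utilization = load * k * service / n_disks   # each request touches k disks
    if utilization >= 1.0:
        return float("inf")
    return service / (1.0 - utilization)         # M/M/1 mean response time


def choose_stripe_width(load, n_disks=8, size=1_000_000,
                        rate=10_000_000, t_pos=0.01):
    """Pick the stripe width with the lowest modeled response time."""
    return min(
        range(1, n_disks + 1),
        key=lambda k: stripe_response_time(k, load, n_disks, size, rate, t_pos),
    )
```

With these illustrative defaults, a load of 1 request/s selects all 8 disks, while 60 requests/s selects a width of 2: wider stripes shorten each transfer but multiply the positioning overhead, which dominates once the disks are busy.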
I/O in Parallel and Distributed Systems
Cited by 4 (0 self)
One is scientific computing with massive datasets, such as those found in seismic processing, climate modeling, and so forth [dC94]. The second is databases [DG92]. The I/O bottleneck continues to be a serious concern for scientific computing, particularly Grand Challenge problems, where it is now commonly recognized as an obstacle. Many scientific applications generate 1 GB of I/O per run [dC94], and applications performing an order of magnitude more are not uncommon: applications in computational physics and fluid dynamics are projected to require I/O on the order of 1 TB [dC94]. It seems clear that these total I/O requirements will keep increasing as scientists continue to study phenomena at larger space and time scales, and at finer space and time resolutions. Since the response time that humans can tolerate for obtaining computational results, no matter how comprehensive and detailed, is always bounded, the required I/O rates will also continue to increase. Thus while curre...
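The argument that bounded response time forces I/O rates upward is simple arithmetic: the sustained rate R needed is the total I/O volume V divided by the tolerable time T. As a worked example, using the 1 TB figure quoted above and an assumed, purely illustrative, one-hour budget for I/O (decimal units):

```latex
R = \frac{V}{T}
  = \frac{10^{12}\,\mathrm{B}}{3600\,\mathrm{s}}
  \approx 2.8 \times 10^{8}\,\mathrm{B/s}
  \approx 278\,\mathrm{MB/s}
```

Because T stays bounded while V grows, R must grow in proportion: under the same one-hour budget, the 1 GB per run figure needs only about 0.28 MB/s.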
An application-aware data storage model
In Proc. USENIX Conf., 1999
Cited by 3 (0 self)

