An Experimental Evaluation of the Parallel I/O Systems of the IBM SP and Intel Paragon Using a Production Application (1996)

by Rajeev Thakur, William Gropp, Ewing Lusk
Results 1 - 10 of 21

On Implementing MPI-IO Portably and with High Performance

by Rajeev Thakur, William Gropp, Ewing Lusk - In Proceedings of the 6th Workshop on I/O in Parallel and Distributed Systems, 1999
Cited by 137 (21 self)
We discuss the issues involved in implementing MPI-IO portably on multiple machines and file systems and also achieving high performance. One way to implement MPI-IO portably is to implement it on top of the basic Unix I/O functions (open, lseek, read, write, and close), which are themselves portable. We argue that this approach has limitations in both functionality and performance. We instead advocate an implementation approach that combines a large portion of portable code and a small portion of code that is optimized separately for different machines and file systems. We have used such an approach to develop a high-performance, portable MPI-IO implementation, called ROMIO. In addition to basic I/O functionality, we consider the issues of supporting other MPI-IO features, such as 64-bit file sizes, noncontiguous accesses, collective I/O, asynchronous I/O, consistency and atomicity semantics, user-supplied hints, shared file pointers, portable data representation, and file preallocati...

An Abstract-Device Interface for Implementing Portable Parallel-I/O Interfaces

by Rajeev Thakur, William Gropp, Ewing Lusk - In Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation, 1996
Cited by 65 (13 self)
In this paper, we propose a strategy for implementing parallel-I/O interfaces portably and efficiently. We have defined an abstract-device interface for parallel I/O, called ADIO. Any parallel-I/O API can be implemented on multiple file systems by implementing the API portably on top of ADIO, and implementing only ADIO on different file systems. This approach simplifies the task of implementing an API and yet exploits the specific high-performance features of individual file systems. We have used ADIO to implement the Intel PFS interface and subsets of MPI-IO and IBM PIOFS interfaces on PFS, PIOFS, Unix, and NFS file systems. Our performance studies indicate that the overhead of using ADIO as an implementation strategy...

Integrating Parallel File I/O and Database Support for High-Performance Scientific Data Management

by Jaechun No, Rajeev Thakur, Alok Choudhary - In Proc. of SC2000: High Performance Networking and Computing, 2000
Cited by 16 (3 self)
Many scientific applications have large I/O requirements, in terms of both the size of data and the number of files or data sets. Management, storage, efficient access, and analysis of this data present an extremely challenging task. Traditionally, two different solutions are used for this problem: file I/O or databases. File I/O can provide high performance but is tedious to use with large numbers of files and large and complex data sets. Databases can be convenient, flexible, and powerful but do not perform and scale well for parallel supercomputing applications. We have developed a software system, called Scientific Data Manager (SDM), that aims to combine the good features of both file I/O and databases. SDM provides a high-level API to the user and, internally, uses a parallel file system to store real data and a database to store application-related metadata. SDM takes advantage of various I/O optimizations available in MPI-IO, such as collective I/O and noncontiguous requests, in a manner that is transparent to the user. As a result, users can write and retrieve data with the performance of parallel file I/O, without having to bother with the details of actually performing file I/O. In this paper, we describe the design and implementation of SDM. With the help of two parallel application templates, ASTRO3D and an Euler solver, we illustrate how some of the design criteria affect performance.

I/O in Parallel Applications: The Weakest Link

by Rajeev Thakur, Ewing Lusk, William Gropp - International Journal of High Performance Computing Applications, 1998
Cited by 14 (3 self)
Parallel computers are increasingly being used to run large-scale applications that also have huge I/O requirements. However, many applications obtain poor I/O performance on modern parallel machines. This special issue of IJSA contains papers that describe the I/O requirements and the techniques used to perform I/O in real parallel applications. We first explain how the I/O application program interface (API) plays a critical role in enabling such applications to achieve high I/O performance. We describe how the commonly used Unix I/O interface is inappropriate for parallel I/O and how an explicitly parallel API with support for collective I/O can help the underlying I/O hardware and software perform I/O efficiently. We then describe MPI-IO, a recently defined, standard, portable API specifically designed for high-performance parallel I/O. We conclude with an overview of the papers in this special issue.

ChemIO: High-Performance Parallel I/O for Computational Chemistry Applications

by Jarek Nieplocha, Ian Foster, Rick A. Kendall - Intl. J. Supercomp. Apps. High Perf. Comp. 12, 1998
Cited by 12 (3 self)
Recent developments in I/O systems on scalable parallel computers have sparked renewed interest in out-of-core methods for computational chemistry. These methods can improve execution time significantly relative to "direct" methods, which perform many redundant computations. However, the widespread use of such out-of-core methods requires efficient and portable implementations of often complex I/O patterns. The ChemIO project has addressed this problem by defining an I/O interface that captures the I/O patterns found in important computational chemistry applications and by providing high-performance implementations of this interface on multiple platforms. This development not only broadens the user community for parallel I/O techniques but also provides new insights into the functionality required in general-purpose scalable I/O libraries and the techniques required to achieve high-performance I/O on scalable parallel computers.

Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems

by A. Choudhary, J. No, G. Memik, X. Shen, W. Liao, H. Nagesh, S. More, V. Taylor, R. Thakur, R. Stevens - In Proc. of the Eighth IEEE Int’l Symposium on High Performance Distributed Computing, 1999
Cited by 11 (4 self)
With the increasing number of scientific applications manipulating huge amounts of data, effective high-level data management is an increasingly important problem. Unfortunately, so far the solutions to the high-level data management problem either require deep understanding of specific storage architectures and file layouts (as in high-performance file storage systems) or produce unsatisfactory I/O performance in exchange for ease-of-use and portability (as in relational DBMSs). In this paper we present a novel application development environment which is built around an active meta-data management system (MDMS) to handle high-level data in an effective manner. The key components of our three-tiered architecture are user application, the MDMS, and a hierarchical storage system (HSS). Our environment overcomes the performance problems of pure database-oriented solutions, while maintaining their advantages in terms of ease-of-use and portability. The high levels of performance are achieved by the MDMS, with the aid of user-specified, performance-oriented directives. Our environment supports a simple, easy-to-use yet powerful user interface, leaving the task of choosing appropriate I/O techniques for the application at hand to the MDMS. We discuss the importance of an active MDMS and show how the three components of our environment, namely application, the MDMS, and the HSS, fit together. We also report performance numbers from our ongoing implementation and illustrate that significant improvements are made possible without undue programming effort.

Applications of parallel I/O

by David Kotz, 1996
Cited by 11 (2 self)
Scientific applications are increasingly being implemented on massively parallel supercomputers. Many of these applications have intense I/O demands, as well as massive computational requirements. This paper is essentially an annotated bibliography of papers and other sources of information about scientific applications using parallel I/O. It will be updated periodically.

A Novel Application Development Environment for Large-Scale Scientific Computations

by X. Shen, W. Liao, A. Choudhary, G. Memik, M. Kandemir, S. More, G. Thiruvathukal, A. Singh, 2000
Cited by 11 (8 self)
Effective high-level data management is becoming an important issue with more and more scientific applications manipulating huge amounts of secondary-storage and tertiary-storage data using parallel processors. A major problem facing the current solutions to this data management problem is that these solutions either require a deep understanding of specific data storage architectures and file layouts to obtain the best performance (as in high-performance storage management systems and parallel file systems) or they sacrifice significant performance in exchange for ease-of-use and portability (as in traditional database management systems). While the success of these approaches varies depending on the specific system and applications, the trend in scientific computing towards processing large-scale datasets demands both high-performance and ease-of-use.

Performance implications of architectural and software techniques on I/O-intensive applications

by Meenakshi A. Kandaswamy, Mahmut Kandemir, Alok Choudhary, David E. Bernholdt - In Proc. of the International Conference on Parallel Processing (ICPP'98), 1998
Cited by 4 (3 self)
Many large scale applications have significant I/O requirements as well as computational and memory requirements. Unfortunately, limited number of I/O nodes provided by the contemporary message-passing distributed-memory architectures such as Intel Paragon and IBM SP-2 limits the I/O performance of these applications severely. In this paper, we examine some software optimization techniques and architectural scalability and evaluate the effect of them in five I/O intensive applications from both small and large application domains. Our goals in this study are twofold: First, we want to understand the behavior of large-scale data intensive applications and the impact of I/O subsystem on their performance and vice-versa. Second, and more importantly, we strive to determine the solutions for improving the applications' performance by a mix of architectural and software solutions. Our results reveal that the different applications can benefit from different optimizations. For example, we found that some applications benefit from file layout optimizations whereas some others benefit from collective I/O. A combination of architectural and software solutions is normally needed to obtain good I/O performance. For example, we show that with limited number of I/O resources, it is possible to obtain good performance by using appropriate software optimizations. We also show that beyond a certain level, imbalance in the architecture results in performance degradation even when using optimized software, thereby indicating the necessity of increase in I/O resources.

PI/OT, Parallel I/O Templates

by Ian Parsons, Ron Unrau, Jonathan Schaeffer, Duane Szafron, 1997
Cited by 4 (2 self)
This paper presents a novel, top-down, high-level approach to parallelizing file I/O. Each parallel file descriptor is annotated with a high-level specification, or template, of the expected parallel behaviour. The annotations are external to and independent of the source code. At run-time, all I/O using a parallel file descriptor adheres to the semantics of the selected template. By separating the parallel I/O specifications from the code, a user can quickly change the I/O behaviour without rewriting code. Templates can be composed hierarchically to construct more complex access patterns. Two sample parallel programs using these templates are compared against versions implemented in an existing parallel I/O system (PIOUS). The sample programs show that the use of parallel I/O templates is beneficial from both the performance and software engineering points of view.