Results 1 - 10
of
49
Disk-directed I/O for MIMD Multiprocessors
, 1994
"... Many scientific applications that run on today’s multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth ..."
Abstract
-
Cited by 217 (18 self)
- Add to MetaCart
Many scientific applications that run on today’s multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and enhanced file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, to allow the disk servers to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, obtained up to 93 % of peak disk bandwidth, and was as much as 16 times faster than traditional parallel file systems.
Beowulf: Harnessing the Power of Parallelism in a Pile-of-PCs
- Proceedings, IEEE Aerospace
, 1997
"... The rapid increase in performance of mass market commodity microprocessors and significant disparity in pricing between PCs and scientific workstations has provided an opportunity for substantial gains in performance to cost by harnessing PC technology in parallel ensembles to provide high end capab ..."
Abstract
-
Cited by 82 (1 self)
- Add to MetaCart
The rapid increase in performance of mass market commodity microprocessors and significant disparity in pricing between PCs and scientific workstations has provided an opportunity for substantial gains in performance to cost by harnessing PC technology in parallel ensembles to provide high end capability for scientific and engineering applications. The Beowulf project is a NASA initiative sponsored by the HPCC program to explore the potential of Pileof -PCs and to develop the necessary methodologies to apply these low cost system configurations to NASA computational requirements in the Earth and space sciences. Recently, a 16 processor Beowulf costing less than $50,000 sustained 1.25 Gigaflops on a gravitational Nbody simulation of 10 million particles with a Tree code algorithm using standard commodity hardware and software components. This paper describes the technologies and methodologies employed to achieve this breakthrough. Both opportunities afforded by this approach and the cha...
Tuning the Performance of I/O-Intensive Parallel Applications
, 1996
"... Getting good I/O performance from parallel programs is a critical problem for many application domains. In this paper, we report our experience tuning the I/O performance of four application programs from the areas of satellite-data processing and linear algebra. After tuning, three of the four appl ..."
Abstract
-
Cited by 70 (24 self)
- Add to MetaCart
Getting good I/O performance from parallel programs is a critical problem for many application domains. In this paper, we report our experience tuning the I/O performance of four application programs from the areas of satellite-data processing and linear algebra. After tuning, three of the four applications achieve application-level I/O rates of over 100 MB/s on 16 processors. The total volume of I/O required by the programs ranged from about 75 MB to over 200 GB. We report the lessons learned in achieving high I/O performance from these applications, including the need for code restructuring, local disks on every node and knowledge of future I/O requests. We also report our experience on achieving high performance on peer-to-peer configurations. Finally, we comment on the necessity of complex I/O interfaces like collective I/O and strided requests to achieve high performance. 1 Introduction I/O has been identified as one of the major obstacles to achieving high performance from para...
An Abstract-Device Interface for Implementing Portable Parallel-I/O Interfaces
- IN PROCEEDINGS OF THE 6TH SYMPOSIUM ON THE FRONTIERS OF MASSIVELY PARALLEL COMPUTATION
, 1996
"... In this paper, we propose a strategy for implementing parallel-I/O interfaces portably and efficiently. We have defined an abstract-device interface for parallel I/O, called ADIO. Any parallel-I/O API can be implemented on multiple file systems by implementing the API portably on top of ADIO, and im ..."
Abstract
-
Cited by 65 (13 self)
- Add to MetaCart
In this paper, we propose a strategy for implementing parallel-I/O interfaces portably and efficiently. We have defined an abstract-device interface for parallel I/O, called ADIO. Any parallel-I/O API can be implemented on multiple file systems by implementing the API portably on top of ADIO, and implementing only ADIO on different file systems. This approach simplifies the task of implementing an API and yet exploits the specific high-performance features of individual file systems. We have used ADIO to implement the Intel PFS interface and subsets of MPI-IO and IBM PIOFS interfaces on PFS, PIOFS, Unix, and NFS file systems. Our performance studies indicate that the overhead of using ADIO as an implementation strategy...
Remote I/O: Fast Access to Distant Storage
- In Proceedings of the Fifth Workshop on Input/Output in Parallel and Distributed Systems
, 1997
"... As high-speed networks make it easier to use distributed resources, it becomes increasingly common that applications and their data are not colocated. Users have traditionally addressed this problem by manually staging data to and from remote computers. We argue instead for a new remote I/O paradigm ..."
Abstract
-
Cited by 53 (7 self)
- Add to MetaCart
As high-speed networks make it easier to use distributed resources, it becomes increasingly common that applications and their data are not colocated. Users have traditionally addressed this problem by manually staging data to and from remote computers. We argue instead for a new remote I/O paradigm in which programs use familiar parallel I/O interfaces to access remote filesystems. In addition to simplifying remote execution, remote I/O can improve performance relative to staging by overlapping computation and data transfer or by reducing communication requirements. However, remote I/O also introduces new technical challenges in the areas of portability, performance, and integration with distributed computing systems. We propose techniques designed to address these challenges and describe a remote I/O library called RIO that we have developed to evaluate the effectiveness of these techniques. RIO addresses issues of portability by adopting the quasi-standard MPI-IO interface and by de...
Workload Characterization of Input/Output Intensive Parallel Applications
- in Proceedings of the 9th International Conference on Modelling Techniques and Tools for Computer Performance Evaluation
, 1997
"... . The broadening disparity in the performance of input/output (I/O) devices and the performance of processors and communication links on parallel systems is a major obstacle to achieving high performance for a wide range of parallel applications. I/O hardware and file system parallelism are the keys ..."
Abstract
-
Cited by 40 (9 self)
- Add to MetaCart
. The broadening disparity in the performance of input/output (I/O) devices and the performance of processors and communication links on parallel systems is a major obstacle to achieving high performance for a wide range of parallel applications. I/O hardware and file system parallelism are the keys to bridging this performance gap. A prerequisite to the development of efficient parallel file systems is detailed characterization of the I/O demands of parallel applications. In this paper, we present a comparative study of the I/O access patterns commonly found in I/O intensive parallel applications. Using the Pablo performance analysis environment and its I/O extensions we captured application I/O access patterns and analyzed their interactions with current parallel I/O systems. This analysis has proven instrumental in guiding the development of new application programming interfaces (APIs) for parallel file systems and in developing effective file system policies that can adaptively re...
Efficient Run-time Support for Irregular Block-Structured Applications
, 1998
"... Parallel implementations of scientific applications often rely on elaborate dynamic data structures with complicated communication patterns. We describe a set of intuitive geometric programming abstractions that simplify coordination of irregular block-structured scientific calculations without sacr ..."
Abstract
-
Cited by 38 (14 self)
- Add to MetaCart
Parallel implementations of scientific applications often rely on elaborate dynamic data structures with complicated communication patterns. We describe a set of intuitive geometric programming abstractions that simplify coordination of irregular block-structured scientific calculations without sacrificing performance. We have implemented these abstractions in KeLP, a C++ run-time library. KeLP's abstractions enable the programmer to express complicated communication patterns for dynamic applications, and to tune communication activity with a high-level, abstract interface. We show that KeLP's flexible communication model effectively manages elaborate data motion patterns arising in structured adaptive mesh refinement, and achieves performance comparable to hand-coded message-passing on several structured numerical kernels. to appear in J. Parallel and Distributed Computing 1 Introduction Many scientific numerical methods employ structured irregular representations to improve accura...
Flexible Communication Mechanisms for Dynamic Structured Applications
, 1996
"... . Irregular scientific applications are often difficult to parallelize due to elaborate dynamic data structures with complicated communication patterns. We describe flexible data orchestration abstractions that enable the programmer to express customized communication patterns arising in an impo ..."
Abstract
-
Cited by 35 (9 self)
- Add to MetaCart
. Irregular scientific applications are often difficult to parallelize due to elaborate dynamic data structures with complicated communication patterns. We describe flexible data orchestration abstractions that enable the programmer to express customized communication patterns arising in an important class of irregular computations---adaptive finite difference methods for partial differential equations. These abstractions are supported by KeLP, a C++ run-time library. KeLP enables the programmer to manage spatial data dependence patterns and express data motion handlers as first-class mutable objects. Using two finite difference applications, we show that KeLP's flexible communication model effectively manages elaborate data motion arising in semi-structured adaptive methods. 1 Introduction Many scientific numerical methods employ structured irregular representations to improve accuracy. Irregular representations can resolve fine physical structures in complicated geometri...
Lessons from Characterizing Input/Output Bahavior of Parallel Scientific Applications
- INTERNATIONAL JOURNAL
, 1998
"... Because both processor and interprocessor communication hardware is evolving rapidly with only moderate improvements to file system performance in parallel systems, it is becoming increasingly difficult to provide sufficient input/output (I/O) performance to parallel applications. I/O hardware and f ..."
Abstract
-
Cited by 26 (4 self)
- Add to MetaCart
Because both processor and interprocessor communication hardware is evolving rapidly with only moderate improvements to file system performance in parallel systems, it is becoming increasingly difficult to provide sufficient input/output (I/O) performance to parallel applications. I/O hardware and file system parallelism are the key to bridging this performance gap. Prerequisite to the development of efficient parallel file system is detailed characterization of the I/O demands of parallel applications. In the paper, we present a comparative study of parallel I/O access patterns, commonly found in I/O intensive scientific applications. The Pablo performance analysis tool and its I/O extensions is a valuable resource in capturing and analyzing the I/O access attributes and their interactions with extant parallel I/O systems. This analysis is instrumental in guiding the development of new application programming interfaces (APIs) for parallel file systems and effective file system polici...
Querying Very Large Multi-dimensional Datasets in ADR
, 1999
"... Applications that make use of very large scientific datasets have become an increasingly important subset of scientific applications. In these applications, datasets are often multi-dimensional, i.e., data items are associated with points in a multi-dimensional attribute space, and access to data ..."
Abstract
-
Cited by 25 (9 self)
- Add to MetaCart
Applications that make use of very large scientific datasets have become an increasingly important subset of scientific applications. In these applications, datasets are often multi-dimensional, i.e., data items are associated with points in a multi-dimensional attribute space, and access to data items is described by range queries. The basic processing involves mapping input data items to output data items, and some form of aggregation of all the input data items that project to the each output data item. We have developed an infrastructure, called the Active Data Repository (ADR), that integrates storage, retrieval and processing of multi-dimensional datasets on distributed-memory parallel architectures with multiple disks attached to each node. In this paper we address efficient execution of range queries on distributed memory parallel machines within ADR framework. We present three potential strategies, and evaluate them under different application scenarios and machine co...

