Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit
- Intern. J. High Perf. Comp. Applications, 2005
Cited by 13 (8 self)
This paper describes capabilities, evolution, performance, and applications of the Global Arrays (GA) toolkit. GA was created to provide application programmers with an interface that allows them to distribute data while maintaining the type of global index space and programming syntax similar to that available when programming on a single processor. The goal of GA is to free the programmer from the low level management of communication and allow them to deal with their problems at the level at which they were originally formulated. At the same time, compatibility of GA with MPI enables the programmer to take advantage of the existing MPI software/libraries when available and appropriate. The variety of applications that have been implemented using Global Arrays attests to the
ChemIO: High-Performance Parallel I/O for Computational Chemistry Applications
- Intl. J. Supercomp. Apps. High Perf. Comp. 12, 1998
Cited by 12 (3 self)
Recent developments in I/O systems on scalable parallel computers have sparked renewed interest in out-of-core methods for computational chemistry. These methods can improve execution time significantly relative to "direct" methods, which perform many redundant computations. However, the widespread use of such out-of-core methods requires efficient and portable implementations of often complex I/O patterns. The ChemIO project has addressed this problem by defining an I/O interface that captures the I/O patterns found in important computational chemistry applications and by providing high-performance implementations of this interface on multiple platforms. This development not only broadens the user community for parallel I/O techniques but also provides new insights into the functionality required in general-purpose scalable I/O libraries and the techniques required to achieve high-performance I/O on scalable parallel computers.
Automatic Classification Of Input/Output Access Patterns
- 1997
Cited by 9 (2 self)
Despite continued innovations in disk design, input/output performance has not kept pace with concurrent increases in processor speeds. Much research has focused on developing algorithms to avoid input/output or hide input/output latency in an attempt to redress this widening gap. Many studies have shown that with advance knowledge of access patterns, file systems can improve input/output performance by selecting policies appropriate for the resource demands. Unfortunately, access patterns may be complex or data dependent, and therefore unknown a priori. Our thesis is that the file system can automatically detect qualitative file access patterns both locally (per parallel program thread) and globally (per parallel program) and use this information to dynamically choose appropriate file system policies. We propose two complementary methods for automatic classification, based on neural networks and hidden Markov models, respectively. Global classifications are created from a combination...
Automatic Parallel I/O Performance Optimization in Panda
- In Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures, 1998
Cited by 9 (2 self)
Parallel I/O systems typically consist of individual processors, communication networks, and a large number of disks. Managing and utilizing these resources to meet the performance, portability, and usability goals of applications has become a significant challenge. We believe that a parallel I/O system that automatically selects efficient I/O plans for user applications is a solution to this problem. In this paper, we present such an automatic performance optimization approach for scientific applications performing collective I/O requests on multidimensional arrays. Under our approach, an optimization engine in a parallel I/O system selects optimal I/O plans automatically, without human intervention, based on a description of the application I/O requests and the system configuration. To validate our hypothesis, we have built an optimizer that uses rule-based and randomized search-based algorithms to select optimal parameter settings in Panda, a parallel I/O library for multidimensional arr...
Distant I/O: One-Sided Access to Secondary Storage on Remote Processors
Cited by 7 (2 self)
We propose a new parallel, noncollective I/O strategy called Distant I/O that targets clustered computer systems in which disks are attached to compute nodes. Distant I/O allows one-sided access to remote secondary storage without installing server processes or daemons on remote compute nodes. We implemented this model using Active Messages and demonstrated its performance advantages over the PIOFS parallel filesystem for an I/O-intensive parallel application on the IBM SP.
1 Introduction
Recent advances in low-latency, high-speed network technology, coupled with inexpensive commodity processors, have greatly improved the cost-effectiveness of parallel computing. Disk technology has also been advancing rapidly, especially with respect to storage density, capacity, and bandwidth. I/O, on the other hand, remains a major bottleneck in many parallel applications, such as climate modeling, computational chemistry, and computational fluid dynamics. We argue that more flexible communication pro...
Informed prefetching of collective input/output requests
- Proceedings of SC99, 1999
Cited by 5 (0 self)
Optimizing collective input/output (I/O) is important for improving throughput of parallel scientific applications. Current research suggests that a specialized collective application programming interface, coupled with system-level optimizations, is necessary to obtain good I/O performance. Unfortunately, collective interfaces require an application to disclose its entire access pattern to fully reorder I/O requests, and cannot flexibly utilize additional memory to improve performance. In this paper we propose and analyze a method of optimizing collective access patterns using informed prefetching that is capable of exploiting any amount of available memory to overlap I/O with computation. We compare this approach to disk-directed I/O, an efficient implementation of a collective I/O interface. Moreover, we prove that under certain conditions, a per-processor prefetch depth equal to the number of drives can guarantee sequential disk accesses for any collectively accessed file. In empirical studies, a prefetch horizon of one to two times the number of disks per processor is sufficient to match the performance of disk-directed I/O for sequentially allocated files. Finally, we develop accurate analytical models to predict the throughput of informed prefetching for collective reads as a function of the per-processor prefetch depth.
Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit
- In Proceedings of the Workshop on Performance Optimization via High-Level Languages and Libraries (POHLL-02), 2002
Cited by 1 (0 self)
This paper describes the characteristics of the Global Arrays programming model and the capabilities of the toolkit, and discusses its evolution.
Efficient layout transformation for disk-based multidimensional arrays
- In HiPC, pp. 386–398, 2004
Cited by 1 (0 self)
I/O libraries such as PANDA and DRA use blocked layouts for efficient access to disk-resident multi-dimensional arrays, with the shape of the blocks being chosen to match the expected access pattern of the array. Sometimes, different applications, or different phases of the same application, have very different access patterns for an array. In such situations, an array’s blocked layout representation must be transformed for efficient access. In this paper, we describe a new approach to solve the layout transformation problem and demonstrate its effectiveness in the context of the Disk Resident Arrays (DRA) library. The approach handles re-blocking and permutation of dimensions. Results are provided that demonstrate the performance benefit as compared to currently available mechanisms.
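To make the terms "blocked layout", "re-blocking", and "permutation of dimensions" in the abstract above concrete, here is a minimal in-memory sketch for a small 2-D array. It is not the paper's algorithm, which targets disk-resident arrays too large for memory, and it does not use the DRA or PANDA APIs. The function names `blocked_order`, `to_blocked`, and `reblock` are hypothetical and chosen only for illustration.

```python
from itertools import product


def blocked_order(shape, block):
    """Yield (row, col) coordinates of a 2-D array in blocked (tiled) order.

    Blocks are visited in row-major order, and the elements inside each
    block are also visited in row-major order. Edge blocks may be smaller.
    """
    rows, cols = shape
    brows, bcols = block
    for r0, c0 in product(range(0, rows, brows), range(0, cols, bcols)):
        for r in range(r0, min(r0 + brows, rows)):
            for c in range(c0, min(c0 + bcols, cols)):
                yield r, c


def to_blocked(matrix, block):
    """Linearize a list-of-lists matrix into a flat blocked layout."""
    shape = (len(matrix), len(matrix[0]))
    return [matrix[r][c] for r, c in blocked_order(shape, block)]


def reblock(flat, shape, old_block, new_block, transpose=False):
    """Convert a flat blocked layout to a layout with a new block shape.

    With transpose=True the two dimensions are also permuted, so the
    result is the blocked layout of the transposed array.
    """
    rows, cols = shape
    # Recover the logical 2-D array from the old blocked layout.
    logical = [[None] * cols for _ in range(rows)]
    for value, (r, c) in zip(flat, blocked_order(shape, old_block)):
        logical[r][c] = value
    if transpose:
        logical = [list(column) for column in zip(*logical)]
    return to_blocked(logical, new_block)


# A 4x4 array holding 0..15 in row-major order, stored as 2x2 blocks.
matrix = [[4 * r + c for c in range(4)] for r in range(4)]
tiles = to_blocked(matrix, (2, 2))
# Re-block into full rows (1x4 blocks), which equals plain row-major order.
row_major = reblock(tiles, (4, 4), (2, 2), (1, 4))
```

The sketch rebuilds the whole array in memory before writing the new layout, which is precisely what an out-of-core method has to avoid. It shows only what the transformation computes, not how to do it efficiently on disk.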
Approved for the Major Department
- 2003
committee and by majority vote has been found to be satisfactory. Chair: Steven G. Parker

