Input/Output Characteristics of Scalable Parallel Applications
- In Proceedings of the Supercomputing ’95
, 1995
Cited by 100 (2 self)
Rapid increases in computing and communication performance are exacerbating the long-standing problem of performance-limited input/output. Indeed, for many otherwise scalable parallel applications, input/output is emerging as a major performance bottleneck. The design of scalable input/output systems depends critically on the input/output requirements and access patterns for this emerging class of large-scale parallel applications. However, hard data on the behavior of such applications is only now becoming available. In this paper, we describe the input/output requirements of three scalable parallel applications (electron scattering, terrain rendering, and quantum chemistry) on the Intel Paragon XP/S. As part of an ongoing parallel input/output characterization effort, we used instrumented versions of the application codes to capture
Long Term Distributed File Reference Tracing: Implementation and Experience
, 1994
Cited by 82 (3 self)
DFSTrace is a system to collect and analyze long-term file reference data in a distributed UNIX workstation environment. The design of DFSTrace is unique in that it pays particular attention to efficiency, extensibility, and the logistics of long-term trace data collection in a distributed environment. The components of DFSTrace are a set of kernel hooks, a kernel buffer mechanism, a data extraction agent, a set of collection servers, and post-processing tools. Our experience with DFSTrace has been highly positive. Tracing has been virtually unnoticeable, degrading performance 3-7%, depending on the level of detail of tracing. We have collected file reference traces from approximately 30 workstations continuously for over two years. We have implemented a post-processing library to provide a convenient programmer interface to the traces, and have created an on-line database of results from a suite of analysis programs to aid trace selection. Our data has been used for a wide variety of purposes, including file system studies, performance measurement and tuning, and debugging. Extensions of DFSTrace have enabled its use in applications such as field reliability testing and determining disk geometry. This paper presents the design, implementation, and evaluation of DFSTrace and associated tools, and describes how they have been used.
Characterizing parallel file-access patterns on a large-scale multiprocessor
- IN PROCEEDINGS OF THE NINTH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM
, 1995
Cited by 41 (4 self)
Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors. Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First, the file system should support efficient concurrent access to many files, and I/O requests from many jobs
RAMA: An easy-to-use, high-performance parallel file system
- PARALLEL COMPUTING
, 1997
Cited by 23 (7 self)
Modern massively parallel file systems provide high bandwidth file access by striping files across arrays of disks attached to a few specialized I/O nodes. However, these file systems are hard to use and difficult to integrate with workstations and tertiary storage. RAMA addresses these problems by providing a high-performance massively parallel file system with a simple interface. RAMA uses hashing to pseudo-randomly distribute data to all of its disks, insuring high bandwidth regardless of access pattern and eliminating bottlenecks in file block accesses. This flexibility does not cause a large loss of performance: RAMA's simulated performance is within 10-15% of the optimum performance of a similarly-sized striped file system, and is a factor of 4 or more better than a striped file system with poorly laid out data.
I/O Characterization of a Portable Astrophysics Application on the IBM SP and Intel Paragon
, 1995
Cited by 14 (1 self)
Many large-scale applications on parallel machines are bottlenecked by the I/O performance rather than the CPU or communication performance of the system. To improve the I/O performance, it is first necessary for system designers to understand the I/O requirements of various applications. This paper presents the results of a study of the I/O characteristics and performance of a real, I/O-intensive, portable, parallel application in astrophysics, on two different parallel machines: the IBM SP and the Intel Paragon. We instrumented the source code to record all I/O activity, and analyzed the resulting trace files. Our results show that, for this application, the I/O consists of fairly large writes, and writing data to files is faster on the Paragon, whereas opening and closing files are faster on the SP. We also discuss how the I/O performance of this application could be improved; particularly, we believe that this application would benefit from using collective I/O.
A Comparison of Logical and Physical Parallel I/O Patterns
- International Journal of High Performance Computing Applications
, 1998
Cited by 13 (2 self)
Although there are several extant studies of parallel scientific application request patterns, there is little experimental data on the correlation of physical input/output patterns with application input/output stimuli. To understand these correlations, we have instrumented the SCSI device drivers of the Intel Paragon OSF/1 operating system to record key physical input/output activities and have correlated this data with the input/output patterns of scientific applications captured via the Pablo analysis toolkit. Our analysis shows that disk hardware features profoundly affect the distribution of request delays and that current parallel file systems respond to parallel application input/output patterns in non-scalable ways.
1 Introduction
Input/output for scalable parallel systems continues to be the major performance bottleneck for many large-scale scientific applications [2, 14]. Market forces are increasing the disparity between processor and disk system performance, exacerbating ...
Long-Term File Activity Patterns in a UNIX Workstation Environment
- FIFTEENTH IEEE SYMPOSIUM ON MASS STORAGE SYSTEMS
, 1998
Cited by 9 (3 self)
As mass storage technology becomes more affordable for sites smaller than supercomputer centers, understanding their file access patterns becomes crucial for developing systems to store rarely used data on tertiary storage devices such as tapes and optical disks. This paper presents a new way to collect and analyze file system statistics for UNIX-based file systems. The collection system runs in user-space and requires no modification of the operating system kernel. The statistics package provides details about file system operations at the file level: creations, deletions, modifications, etc. The paper
RAMA: Easy Access to a High-Bandwidth Massively Parallel File System
- USENIX 95
, 1995
Cited by 8 (0 self)
Massively parallel file systems must provide high bandwidth file access to programs running on their machines. Most accomplish this goal by striping files across arrays of disks attached to a few specialized I/O nodes in the massively parallel processor (MPP). This arrangement requires programmers to give the file system many hints on how their data is to be laid out on disk if they want to achieve good performance. Additionally, the custom interface makes massively parallel file systems hard for programmers to use and difficult to seamlessly integrate into an environment with workstations and tertiary storage. The RAMA file system addresses these problems by providing a massively parallel file system that does not need user hints to provide good performance. RAMA takes advantage of the recent decrease in physical disk size by assuming that each processor in an MPP has one or more disks attached to it. Hashing is then used to pseudo-randomly distribute data to all of these disks, insuring high bandwidth regardless of access pattern. Since MPP programs often have many nodes accessing a single file in parallel, the file system must allow access to different parts of the file without relying on a particular node. In RAMA, a file request involves only two nodes: the node making the request and the node on whose disk the data is stored. Thus, RAMA scales well to hundreds of processors. Since RAMA needs no layout hints from applications, it fits well into systems where users cannot (or will not) provide such hints. Fortunately, this flexibility does not cause a large loss of performance. RAMA's simulated performance is within 10-15% of the optimum performance of a similarly-sized striped file system, and is a factor of 4 or more better than a striped file system with poorly laid out data.
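The hashed placement described in this abstract can be illustrated with a minimal sketch. The abstract does not say which hash function RAMA uses or what it hashes, so everything here is an assumption made for illustration: the key is taken to be a (file identifier, block number) pair, SHA-256 stands in for the real hash, and the function names `block_to_node` and `placement_histogram` are hypothetical. The point it shows is that any requesting node can compute a block's home node locally, so a request touches only the requester and the owner.

```python
import hashlib


def block_to_node(file_id: int, block_number: int, num_nodes: int) -> int:
    """Return the index of the node assumed to store this block.

    Hypothetical scheme: hash the (file_id, block_number) pair and reduce
    it modulo the node count. SHA-256 is a stand-in, not RAMA's own hash.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    key = f"{file_id}:{block_number}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_nodes


def placement_histogram(file_id: int, num_blocks: int, num_nodes: int) -> list:
    """Count how many blocks of one file land on each node."""
    counts = [0] * num_nodes
    for block in range(num_blocks):
        counts[block_to_node(file_id, block, num_nodes)] += 1
    return counts


if __name__ == "__main__":
    # Sequential blocks of one file are scattered across all nodes,
    # without any layout hint from the application.
    print(placement_histogram(file_id=42, num_blocks=10_000, num_nodes=16))
```

Because the mapping depends only on the key and the node count, no central directory is consulted in this sketch. A real system would also have to handle changes in the number of nodes, which this simple modulo reduction does not address.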
Long-term File Activity and Inter-Reference Patterns
, 1998
Cited by 7 (1 self)
This paper is organized into nine sections. We begin by reviewing previous disk activity studies in Section 2. In Section 3, we briefly discuss our data collection and analysis tools, which differ significantly from those used in earlier studies. We describe the different types of computing environments from which we collected data in Section 4. The software written for this paper analyzes the collected data and generates statistics. The simplest analysis mode provides information about daily activity. This is shown in Section 5. Analysis of long-term trends is shown in Section 6. An interesting product from this research is a comparison of the same file system's activity from either the file name view, or from the operating system's underlying numeric index. This comparison is done in Section 7. We summarize our findings in Section 8 and briefly discuss our future research in Section 9.

