Results 1 - 10
of
32
Disk-directed I/O for MIMD Multiprocessors
, 1994
"... Many scientific applications that run on today’s multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth ..."
Abstract
-
Cited by 217 (18 self)
- Add to MetaCart
Many scientific applications that run on today’s multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and enhanced file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, to allow the disk servers to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, obtained up to 93 % of peak disk bandwidth, and was as much as 16 times faster than traditional parallel file systems.
The Galley parallel file system
- Parallel Computing
, 1996
"... Most current multiprocessor le systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scienti c applications. Many multiprocessor le systems provide applications with a conventional Unix-like interface, allowing the ..."
Abstract
-
Cited by 127 (8 self)
- Add to MetaCart
Most current multiprocessor le systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scienti c applications. Many multiprocessor le systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. Thisinterface conceals the parallelism within the le system, increasing the ease of programmability, but making it di cult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insu cient interface, most current multiprocessor le systems are optimized for a di erent workload than they are being asked to support. We introduce Galley, a new parallel le system that is intended to e ciently support realistic scienti c multiprocessor workloads. We discuss Galley's le structure and application interface, as well as the performance advantages o ered by that interface. 1
Infrastructure for Building Parallel Database Systems for Multi-dimensional Data
, 1999
"... As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Our study of a large set of scientific applications over the past three years indicates that the proc ..."
Abstract
-
Cited by 39 (26 self)
- Add to MetaCart
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset as well as the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data r...
Object-relational Queries into Multidimensional Databases with the Active Data Repository
, 1999
"... As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important role in many domains of scientific research. Scientific applications that make use of very large scientific datasets have several important charac ..."
Abstract
-
Cited by 22 (7 self)
- Add to MetaCart
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important role in many domains of scientific research. Scientific applications that make use of very large scientific datasets have several important characteristics: datasets consist of complex data and are usually multi-dimensional; applications usually retrieve a subset of all the data available in the dataset; various applicationspecific operations are performed on the data items retrieved. Such applications can be supported by object-relational database management systems (OR-DBMSs). In addition to providing functionality to define new complex datatypes and user-defined functions, an OR-DBMS for scientific datasets should contain runtime support that will provide optimized storage for very large datasets and an execution environment for user-defined functions involving expensive operations. In this paper we describe an infrastructure, the ...
Optimizing Retrieval and Processing of Multi-dimensional Scientific Datasets
, 2000
"... Exploring and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We have been developing the Active Data Repository (ADR), an infrastructure that integrates storage, retrieval, and processing of large multi-dimensional scientific datasets ..."
Abstract
-
Cited by 16 (9 self)
- Add to MetaCart
Exploring and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We have been developing the Active Data Repository (ADR), an infrastructure that integrates storage, retrieval, and processing of large multi-dimensional scientific datasets on distributed memory parallel machines with multiple disks attached to each node. In earlier work, we proposed three strategies for processing range queries within the ADR framework. Our experimental results show that the relative performance of the strategies changes under varying application characteristics and machine configurations. In this work we investigate approaches to guide and automate the selection of the best strategy for a given application and machine configuration. We describe analytical models to predict the relative performance of the strategies when input data elements are uniformly distributed in the attribute space of the output dataset, restricting the output da...
Lightweight I/O for scientific applications
, 2006
"... Today’s high-end massively parallel processing (MPP) machines have thousands to tens of thousands of processors, with next-generation systems planned to have in excess of one hundred thousand processors. For systems of such scale, efficient I/O is a significant challenge that cannot be solved using ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
Today’s high-end massively parallel processing (MPP) machines have thousands to tens of thousands of processors, with next-generation systems planned to have in excess of one hundred thousand processors. For systems of such scale, efficient I/O is a significant challenge that cannot be solved using traditional approaches. In particular, general purpose parallel file systems that limit applications to standard interfaces and access policies do not scale and will likely be a performance bottleneck for many scientific applications. In this paper, we investigate the use of a “lightweight” approach to I/O that requires the application or I/O-library developer to extend a core set of critical I/O functionality with the minimum set of features and services required by its target applications. We argue that this approach allows the development of I/O libraries that are both scalable and secure. We support our claims with preliminary results for a lightweight checkpoint operation on a development cluster at Sandia. 1
Database support for data-driven scientific applications
- in the grid. Parallel Processing Letters
, 2003
"... krishnan,kurc,umit,jsaltz¢ In this paper we describe a services oriented software system to provide basic database support for efficient execution of applications that make use of scientific datasets in the Grid. This system supports two core operations: efficient selection of the data of interest f ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
krishnan,kurc,umit,jsaltz¢ In this paper we describe a services oriented software system to provide basic database support for efficient execution of applications that make use of scientific datasets in the Grid. This system supports two core operations: efficient selection of the data of interest from distributed databases and efficient transfer of data from storage nodes to compute nodes for processing. We present its overall architecture and main components and describe preliminary experimental results. 1
Optimizing Collective I/O Performance on Parallel Computers: A Multisystem Study
- In Proceedings of the 11th ACM International Conference on Supercomputing
, 1997
"... While individual parallel I/O systems can incorporate sophisticated techniques and achieve impressive performance in particular situations, researchers as yet have only limited understanding of the impact of various design decisions or of the techniques required for performance robustness. One remed ..."
Abstract
-
Cited by 12 (8 self)
- Add to MetaCart
While individual parallel I/O systems can incorporate sophisticated techniques and achieve impressive performance in particular situations, researchers as yet have only limited understanding of the impact of various design decisions or of the techniques required for performance robustness. One remedy is to perform detailed comparative studies of different I/O libraries. In this paper, we describe such a study for the Disk Resident Array and Panda libraries, both designed to support high-performance I/O for arrays. While the two systems have many similarities, their designs and implementations are based on different assumptions and target different applications. We base our study on two I/O structures commonly encountered in scientific applications: the collective read/write of an entire array and the collective read/write of an arbitrary array section. Experiments are performed on two parallel file systems (IBM PIOFS and Intel PFS) and one commodity Unix file system (AIX JFS). Our resu...
Supporting Efficient Noncontiguous Access in PVFS over InfiniBand
- In Proceedings of Cluster Computing ’03, Hong Kong
, 2003
"... Noncontiguous I/O access is the main access pattern in many scientific applications. Noncontiguity exists both in access to files and in access to target memory regions on the client. This characteristic imposes a requirement of native noncontiguous I/O access support in cluster file systems for hig ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
Noncontiguous I/O access is the main access pattern in many scientific applications. Noncontiguity exists both in access to files and in access to target memory regions on the client. This characteristic imposes a requirement of native noncontiguous I/O access support in cluster file systems for high performance. In this paper, we address two main issues on supporting efficient noncontiguous I/O access in cluster file systems over a high performance network. One is noncontiguous data transmission between the client and the I/O server. The second is noncontiguous disk access on the I/O server itself.
Improving collective I/O performance using threads
- In Proceedings of the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing
, 1999
"... Abstract Massively parallel computers are increasingly being used to solve large, I/O intensive applications in many different fields. For such applications, the I/O requirements quite often present a significant obstacle in the way of achieving good performance, and an important area of current res ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
Abstract Massively parallel computers are increasingly being used to solve large, I/O intensive applications in many different fields. For such applications, the I/O requirements quite often present a significant obstacle in the way of achieving good performance, and an important area of current research is the development of techniques by which these costs can be reduced. One such approach is collective I/O, where the processors cooperatively develop an I/O strategy that reduces the number, and increases the size, of I/O requests, making a much better use of the I/O subsystem. Collective I/O has been shown to significantly reduce the cost of performing I/O in many large, parallel applications, and for this reason serves as an important base upon which we can explore other mechanisms which can further reduce these costs. One promising approach is to use threads to perform the collective I/O in the background while the main thread continues with other computation in the foreground.

