CiteSeerX
A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing 22(6):789–828 (1996)

by W Gropp, E Lusk, N Doss, A Skjellum
Results 1–10 of 890 citing documents:

Globus: A Metacomputing Infrastructure Toolkit

by Ian Foster, Carl Kesselman - International Journal of Supercomputer Applications , 1996
Cited by 1929 (51 self)
Emerging high-performance applications require the ability to exploit diverse, geographically distributed resources. These applications use high-speed networks to integrate supercomputers, large databases, archival storage devices, advanced visualization devices, and/or scientific instruments to form networked virtual supercomputers or metacomputers. While the physical infrastructure to build such systems is becoming widespread, the heterogeneous and dynamic nature of the metacomputing environment poses new challenges for developers of system software, parallel tools, and applications. In this article, we introduce Globus, a system that we are developing to address these challenges. The Globus system is intended to achieve a vertically integrated treatment of application, middleware, and network. A low-level toolkit provides basic mechanisms such as communication, authentication, network information, and data access. These mechanisms are used to construct various higher-level metacomp...

PVFS: A Parallel File System For Linux Clusters

by Philip H. Carns, Walter B. Ligon, III, Robert B. Ross, Rajeev Thakur - IN PROCEEDINGS OF THE 4TH ANNUAL LINUX SHOWCASE AND CONFERENCE , 2000
Cited by 425 (34 self)
As Linux clusters have matured as platforms for low-cost, high-performance parallel computing, software packages to provide many key services have emerged, especially in areas such as message passing and networking. One area devoid of support, however, has been parallel file systems, which are critical for high-performance I/O on such clusters. We have developed a parallel file system for Linux clusters, called the Parallel Virtual File System (PVFS). PVFS is intended both as a high-performance parallel file system that anyone can download and use and as a tool for pursuing further research in parallel I/O and parallel file systems for Linux clusters. In this paper, we describe the design and implementation of PVFS and present performance results on the Chiba City cluster at Argonne. We provide performance results for a workload of concurrent reads and writes for various numbers of compute nodes, I/O nodes, and I/O request sizes. We also present performance results for MPI-IO on PVFS, both for a concurrent read/write workload and for the BTIO benchmark. We compare the I/O performance when using a Myrinet network versus a fast-ethernet network for I/O-related communication in PVFS. We obtained read and write bandwidths as high as 700 Mbytes/sec with Myrinet and 225 Mbytes/sec with fast ethernet.
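The core idea behind striping a file across I/O nodes, as parallel file systems like PVFS do, can be illustrated with a short sketch. The stripe size, node count, and function name below are illustrative assumptions, not PVFS defaults or API names:

```python
# Sketch of round-robin file striping across I/O nodes, the layout idea
# used by parallel file systems such as PVFS. STRIPE_SIZE, NUM_IO_NODES,
# and locate() are illustrative assumptions, not PVFS's actual values.

STRIPE_SIZE = 64 * 1024   # bytes per stripe unit (assumed)
NUM_IO_NODES = 4          # number of I/O servers (assumed)

def locate(offset: int) -> tuple[int, int]:
    """Map a logical file offset to (I/O node index, offset within
    that node's local file)."""
    stripe = offset // STRIPE_SIZE           # which stripe unit
    node = stripe % NUM_IO_NODES             # round-robin placement
    local_stripe = stripe // NUM_IO_NODES    # stripe index on that node
    return node, local_stripe * STRIPE_SIZE + offset % STRIPE_SIZE
```

The first stripe unit lands on node 0, the next on node 1, and so on; after NUM_IO_NODES units the pattern wraps around, which is what lets concurrent reads and writes spread across all I/O servers.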

The Globus Project: A Status Report

by Ian Foster, Carl Kesselman , 1998
Cited by 343 (20 self)
The Globus project is a multi-institutional research effort that seeks to enable the construction of computational grids providing pervasive, dependable, and consistent access to high-performance computational resources, despite geographical distribution of both resources and users. Computational grid technology is being viewed as a critical element of future high-performance computing environments that will enable entirely new classes of computation-oriented applications, much as the World Wide Web fostered the development of new classes of information-oriented applications. In this paper, we report on the status of the Globus project as of early 1998. We describe the progress that has been achieved to date in the development of the Globus toolkit, a set of core services for constructing grid tools and applications. We also discuss the Globus Ubiquitous Supercomputing Testbed (GUSTO) that we have constructed to enable large-scale evaluation of Globus technologies, and review early exp...

Citation Context

...kets, while preserving for the programmer a high degree of control over how and when communication occurs. Globus services have been used to develop a grid-enabled MPI [10] based on the MPICH library [20], with Nexus used for communication, GRAM services for resource allocation, and GSI services for authentication. The result is a system that allows programmers to use simple, standard commands to run ...

MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface

by Nicholas T. Karonis, Brian Toonen, Ian Foster , 2002
Cited by 304 (14 self). Abstract not available.

Citation Context

...s, we have developed MPICH-G2, a complete implementation of the MPI-1 standard [42] that uses services provided by the Globus Toolkit™ [17] to extend the popular Argonne MPICH implementation of MPI [27] for Grid execution. MPICH-G2 passes the MPICH test suite and represents a complete redesign and reimplementation of the earlier MPICH-G system [15] that increases performance significantly and incorp...

A Directory Service for Configuring High-Performance Distributed Computations

by Steven Fitzgerald, Ian Foster, Carl Kesselman, Gregor von Laszewski, Warren Smith, Steven Tuecke , 1997
Cited by 282 (55 self)
High-performance execution in distributed computing environments often requires careful selection and configuration not only of computers, networks, and other resources but also of the protocols and algorithms used by applications. Selection and configuration in turn require access to accurate, up-to-date information on the structure and state of available resources. Unfortunately, no standard mechanism exists for organizing or accessing such information. Consequently, different tools and applications adopt ad hoc mechanisms, or they compromise their portability and performance by using default configurations. We propose a solution to this problem: a Metacomputing Directory Service that provides efficient and scalable access to diverse, dynamic, and distributed information about resource structure and state. We define an extensible data model to represent the information required for distributed computing, and we present a scalable, high-performance, distributed implementation. The dat...

Citation Context

... other systems that support computing in distributed environments, such as Legion [12], NEOS [5], NetSolve [4], Condor [16], Nimrod [1], PRM [18], AppLeS [2], and heterogeneous implementations of MPI [13]. The principal contributions of this article are a new architecture for high-performance distributed computing systems, based upon an information service called the Metacomputing Directory Service; a...

The Nexus approach to integrating multithreading and communication

by Ian Foster, Carl Kesselman, Steven Tuecke - Journal of Parallel and Distributed Computing , 1996
Cited by 225 (33 self)
Lightweight threads have an important role to play in parallel systems: they can be used to exploit shared-memory parallelism, to mask communication and I/O latencies, to implement remote memory access, and to support task-parallel and irregular applications. In this paper, we address the question of how to integrate threads and communication in high-performance distributed-memory systems. We propose an approach based on global pointer and remote service request mechanisms, and explain how these mechanisms support dynamic communication structures, asynchronous messaging, dynamic thread creation and destruction, and a global memory model via interprocessor references. We also explain how these mechanisms can be implemented in various environments. Our global pointer and remote service request mechanisms have been incorporated in a runtime system called Nexus that is used as a compiler target for parallel languages and as a substrate for higher-level communication libraries. We report the results of performance studies conducted using a Nexus implementation; these results indicate that Nexus mechanisms can be implemented efficiently on commodity hardware and software systems.
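The global pointer idea described in this abstract — a reference that names both an owning context and an address within it, dereferenced by running a handler in that context — can be sketched minimally. The class, field, and function names below are illustrative, not the actual Nexus API:

```python
from dataclasses import dataclass

# Minimal sketch of a Nexus-style global pointer: it pairs a context
# (a process or node) with a local address, so any thread can name
# memory anywhere in the system. All names here are illustrative
# assumptions, not Nexus identifiers.

@dataclass(frozen=True)
class GlobalPointer:
    context: int        # which process/node owns the data
    local_address: int  # address (here: a dict key) within that context

# Toy per-context memory, standing in for separate address spaces.
memory = {0: {0x10: "hello"}, 1: {0x20: "world"}}

def remote_service_request(gp: GlobalPointer) -> str:
    """A remote service request dereferences a global pointer by
    executing a handler in the pointer's owning context; here both
    'contexts' live in one process, so it is a plain lookup."""
    return memory[gp.context][gp.local_address]
```

In a real distributed setting the lookup would instead send a message to `gp.context` and run the handler there, possibly in a freshly created thread, which is what gives the mechanism its asynchronous-messaging flavor.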

Citation Context

...ervice request mechanisms have been incorporated into a multithreaded communication library called Nexus [26], which we and others have used to build a variety of higher-level communication libraries [30, 27, 40] and to implement several parallel languages [10, 39, 25]. We use a Nexus implementation to perform detailed performance studies of our proposed communication mechanisms on several parallel platforms....

Wide-coverage efficient statistical parsing with CCG and log-linear models

by Stephen Clark, James R. Curran - COMPUTATIONAL LINGUISTICS , 2007
Cited by 218 (43 self)
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (over 20 GB), which is satisfied using a parallel implementation of the BFGS optimisation algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours. A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly,

Citation Context

...reduction in estimation time: using 18 nodes allows our best-performing model to be estimated in less than three hours. We use the Message Passing Interface (MPI) standard for the implementation (Gropp et al. 1996). The parallel implementation is a straightforward extension of the BFGS algorithm. Each machine in the cluster deals with a subset of the training data, holding the packed charts for that subset in ...
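The data-parallel pattern this snippet describes — each node computes statistics over its own shard of the training data, and the partial results are combined — can be sketched without an MPI installation. The plain sum over shards below stands in for an MPI reduction such as MPI_Allreduce, and the toy gradient function is an assumption for illustration only:

```python
# Sketch of the data-parallel estimation pattern from the citation
# context: each "node" computes a partial gradient over its shard, and
# the partials are summed, standing in for MPI_Allreduce with MPI_SUM.
# partial_gradient() is a toy stand-in, not the parser's real objective.

def partial_gradient(shard: list[float]) -> float:
    # Stand-in for the expensive per-shard work (e.g. expected feature
    # counts over the packed charts held on that node).
    return sum(x * 2.0 for x in shard)

def allreduce_sum(partials: list[float]) -> float:
    # In a real MPI program this would be MPI_Allreduce; here all the
    # shards live in one process, so a plain sum suffices.
    return sum(partials)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
shards = [data[0::3], data[1::3], data[2::3]]  # three "nodes"
total = allreduce_sum([partial_gradient(s) for s in shards])
assert total == partial_gradient(data)  # matches the single-node result
```

Because the gradient decomposes as a sum over training examples, the reduced value is identical to what one machine would compute over the full data set, which is what makes BFGS straightforward to parallelize this way.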

On Implementing MPI-IO Portably and with High Performance

by Rajeev Thakur, William Gropp, Ewing Lusk - In Proceedings of the 6th Workshop on I/O in Parallel and Distributed Systems , 1999
Cited by 196 (20 self)
We discuss the issues involved in implementing MPI-IO portably on multiple machines and file systems and also achieving high performance. One way to implement MPI-IO portably is to implement it on top of the basic Unix I/O functions (open, lseek, read, write, and close), which are themselves portable. We argue that this approach has limitations in both functionality and performance. We instead advocate an implementation approach that combines a large portion of portable code and a small portion of code that is optimized separately for different machines and file systems. We have used such an approach to develop a high-performance, portable MPI-IO implementation, called ROMIO. In addition to basic I/O functionality, we consider the issues of supporting other MPI-IO features, such as 64-bit file sizes, noncontiguous accesses, collective I/O, asynchronous I/O, consistency and atomicity semantics, user-supplied hints, shared file pointers, portable data representation, and file preallocati...
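The layered approach the abstract advocates — a large portable layer written once against a small file-system-specific interface — can be sketched as an abstract driver plus a portable routine built on it. This mirrors the spirit of ROMIO's internal layering; the class and method names are illustrative, not ROMIO's actual interface:

```python
from abc import ABC, abstractmethod

# Sketch of the approach the abstract advocates: a thin per-file-system
# layer behind a common interface, with the bulk of the MPI-IO logic
# written once on top of it. Names are illustrative assumptions, not
# ROMIO's real API.

class IODriver(ABC):
    """The small file-system-specific layer: each file system
    implements only a few primitive operations."""
    @abstractmethod
    def read_contig(self, offset: int, nbytes: int) -> bytes: ...

class UnixDriver(IODriver):
    """Fallback driver modeling portable Unix-style I/O; here the
    'file' is an in-memory byte string for the sake of the sketch."""
    def __init__(self, data: bytes):
        self.data = data
    def read_contig(self, offset: int, nbytes: int) -> bytes:
        return self.data[offset:offset + nbytes]

def read_noncontig(driver: IODriver, chunks: list[tuple[int, int]]) -> bytes:
    """The large portable layer: a noncontiguous read implemented once,
    in terms of any driver's contiguous read."""
    return b"".join(driver.read_contig(off, n) for off, n in chunks)
```

A file system with native support for noncontiguous or collective access would supply its own driver with optimized primitives, while the portable layer above it stays unchanged; that split is what keeps the optimized portion small.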

BIP: a new protocol designed for high performance networking on Myrinet

by Loic Prylli, Bernard Tourancheau - In Workshop PC-NOW, IPPS/SPDP98 , 1998
Cited by 182 (10 self)
Abstract. High-speed networks now provide incredible performance. Software evolution is slow, and the old protocol stacks are no longer adequate for these kinds of communication speeds. When bandwidth increases, latency should decrease as much in order to keep the system balanced. With current network technology, the main bottleneck is most of the time the software that forms the interface between the hardware and the user. We designed and implemented new transmission protocols, targeted at parallel computing, that squeeze the most out of the high-speed Myrinet network, without wasting time in system calls or memory copies, giving all the speed to the applications. This design is presented here, as well as experimental results that achieve real Gigabit/s throughput and less than 5 µs latency on a cluster of PC workstations, with this affordable network hardware. Moreover, our networking results compare favorably with expensive parallel computers or ATM LANs.

Citation Context

Latency, bandwidth, and half-performance message size N_1/2 from the citation context:

Machine (library)                     Latency (µs)   Bandwidth (MB/s)   N_1/2 (bytes)
ATM155/sparc5 (AAL5) [PT97,Pry96]     500            10                 10000
iPSC/860 (NX) [DD95]                  65             3                  340
TMC CM-5 (CMMD) [DD95]                95             9                  962
Intel Paragon (NX) [DD95]             29             154                7236
Intel Paragon (MPI) [GLDS96]          40             70                 7000
Meiko CS2 [DD95]                      83             43                 3559
IBM SP-2 (MPI) [DD95]                 35             35                 3262
T3D (shmem) [DD95]                    3              128                363
T3D (MPI) [GLDS96]                    21             100                7000
SGI Power Challenge (MPI) [GLDS96]    47             55                 5000
Myrinet/Ppro200 (BIP)                 4...

GASS: A Data Movement and Access Service for Wide Area Computing Systems

by Joseph Bester, Ian Foster, Carl Kesselman, Jean Tedesco, Steven Tuecke - PROCEEDINGS OF THE SIXTH WORKSHOP ON I/O IN PARALLEL AND DISTRIBUTED SYSTEMS , 1999
Cited by 181 (11 self)
In wide area computing, programs frequently execute at sites that are distant from their data. Data access mechanisms are required that place limited functionality demands on an application or host system yet permit high-performance implementations. To address these requirements, we propose a data movement and access service called Global Access to Secondary Storage (GASS). This service defines a global name space via Uniform Resource Locators and allows applications to access remote files via standard I/O interfaces. High performance is achieved by incorporating default data movement strategies that are specialized for I/O patterns common in wide area applications and by providing support for programmer management of data movement. GASS forms part of the Globus toolkit, a set of services for high-performance distributed computing. GASS itself makes use of Globus services for security and communication, and other Globus components use GASS services for executable staging and real-time remote monitoring. Application experiences demonstrate that the library has practical utility.

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2016 The Pennsylvania State University