The J-Machine Multicomputer: An Architectural Evaluation
In Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993
Cited by 132 (4 self)
The MIT J-Machine multicomputer has been constructed to study the role of a set of primitive mechanisms in providing efficient support for parallel computing. Each J-Machine node consists of an integrated multicomputer component, the Message-Driven Processor (MDP), and 1 MByte of DRAM. The MDP provides mechanisms to support efficient communication, synchronization, and naming. A 512-node J-Machine is operational and is due to be expanded to 1024 nodes in March 1993. In this paper, we discuss the design of the J-Machine and evaluate the effectiveness of the mechanisms incorporated into the MDP. We measure the performance of the communication and synchronization mechanisms directly and investigate the behavior of four complete applications. 1 Introduction Over the past 40 years, sequential von Neumann processors have evolved a set of mechanisms appropriate for supporting most sequential programming models. It is clear, however, from efforts to build concurrent machines by connecting man...
Automatic Data Layout for High-Performance Fortran
In Proceedings of Supercomputing '95, 1994
Cited by 66 (3 self)
High Performance Fortran (HPF) is rapidly gaining acceptance as a language for parallel programming. The goal of HPF is to provide a simple yet efficient machine-independent parallel programming model. Besides the algorithm selection, the data layout choice is the key intellectual step in writing an efficient HPF program. The developers of HPF did not believe that data layouts can be determined automatically in all cases. Therefore, HPF requires the user to specify the data layout. It is the task of the HPF compiler to generate efficient code for the user-supplied data layout. The choice
Automatic Data Layout Using 0-1 Integer Programming
In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT94), 1994
Cited by 59 (5 self)
The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a simple yet efficient machine-independent parallel programming model. By shifting much of the burden of machine-dependent optimization to the compiler, the programmer is able to write data-parallel programs that can be compiled and executed with good performance on many different architectures. However, the choice of a good data layout is still left to the programmer. Even the most sophisticated compiler may not be able to compensate for a poorly chosen data layout since many compiler decisions are driven by the data layout specified in the program. The choice of a good data layout depends on many factors, including the target machine architecture, the compilation system, the problem size, and the number of processors available. The option of remapping arrays at specific points in the program makes the choice even harder. Current programming tools provide little or no support for this difficult sele...
Optimal Evaluation of Array Expressions on Massively Parallel Machines
ACM Trans. Prog. Lang. Syst., 1992
Automatic Data Layout for Distributed Memory Machines
1995
Cited by 35 (5 self)
The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a simple yet efficient machine-independent parallel programming model. Besides the algorithm selection, the data layout choice is the key intellectual challenge in writing an efficient program in such languages. The performance of a data layout depends on the target compilation system, the target machine, the problem size, and the number of available processors. This makes the choice of a good layout extremely difficult for most users of such languages. This thesis discusses the design and implementation of a data layout selection tool that generates Fortran D or HPF style data layout specifications automatically. Because the tool is not embedded in the target compiler and will be run only a few times during the tuning phase of an application, it can use techniques that may be considered too computationally expensive for inclusion in today's compilers. The proposed framework for automatic data layout s...
Solving Alignment using Elementary Linear Algebra
In Proceedings of the 7th Annual Workshop on Languages and Compilers for Parallel Computers, 1994
Cited by 30 (4 self)
Data and computation alignment is an important part of compiling sequential programs to architectures with non-uniform memory access times. In this paper, we show that elementary matrix methods can be used to determine communication-free alignment of code and data. We also solve the problem of replicating read-only data to eliminate communication. Our matrix-based approach leads to algorithms which are simpler and faster than existing algorithms for the alignment problem. 1 Introduction: A key problem in generating code for non-uniform memory access (NUMA) parallel machines is data and computation placement --- that is, determining what work each processor must do, and what data must reside in each local memory. The goal of placement is to exploit parallelism by spreading the work across the processors, and to exploit locality by spreading data so that memory accesses are local whenever possible. The problem of determining a good placement for a program is usually solved in two phases called alignment and distribution.
Mobile and Replicated Alignment of Arrays in Data-Parallel Programs
1993
Cited by 27 (6 self)
When a data-parallel language like Fortran 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors.
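The two-stage mapping described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the affine alignment (`stride`, `offset`), the block distribution, and all function names are assumptions chosen for illustration. The sketch shows how two arrays aligned to the same template at different offsets end up with different owning processors for some elements, which is where residual communication would arise.

```python
# Hypothetical sketch of a two-stage mapping: alignment, then distribution.
# All names and the affine/block choices are illustrative assumptions.

def align(i, stride=1, offset=0):
    """Alignment: map array index i to a cell of the abstract template."""
    return stride * i + offset

def distribute_block(t, template_size, num_procs):
    """Distribution: map template cell t to a processor (block layout)."""
    block = -(-template_size // num_procs)  # ceiling division
    return t // block

def owner(i, stride, offset, template_size, num_procs):
    """Compose both stages to find the processor that owns element i."""
    return distribute_block(align(i, stride, offset), template_size, num_procs)

TEMPLATE_SIZE = 16
NUM_PROCS = 4

# A(i) is aligned with template cell i; B(i) with template cell i + 1.
owners_a = [owner(i, 1, 0, TEMPLATE_SIZE, NUM_PROCS) for i in range(8)]
owners_b = [owner(i, 1, 1, TEMPLATE_SIZE, NUM_PROCS) for i in range(8)]

# An element-wise A(i) + B(i) needs communication wherever the owners differ.
mismatches = [i for i in range(8) if owners_a[i] != owners_b[i]]
print(mismatches)
```

With these assumed parameters, only the elements at block boundaries have mismatched owners; choosing the same offset for both arrays would remove the mismatches entirely.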
Automatic Data Layout for Distributed-Memory Machines in the D Programming Environment
1993
Cited by 25 (4 self)
Although distributed-memory message-passing parallel computers are among the most cost-effective high-performance machines available, scientists find them extremely difficult to program. Most programmers feel uncomfortable working with a distributed-memory programming model that requires explicit management of local name spaces. To address this problem, researchers have proposed using languages based on a global name space annotated with directives specifying how the data should be mapped onto a distributed-memory machine. Using these annotations, a sophisticated compiler can automatically transform a code into a message-passing program suitable for execution on a distributed-memory machine. The Fortran77D and Fortran90D languages support this programming style. Given a Fortran D program, the compiler uses data layout directives to automatically generate a single-program, multiple-data (SPMD) node program for a given distributed-memory target machine.
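To make the SPMD idea in this abstract concrete, here is a minimal sketch under stated assumptions. It is not output of the Fortran D compiler: the block layout, the rule that each node updates only the elements it owns, and the names `local_range` and `node_program` are all hypothetical. Every node runs the same function, and its rank selects the slice of the global index space it works on.

```python
# Hypothetical sketch of an SPMD node program for a block-distributed array.
# The layout and the "each node updates what it owns" rule are assumptions.

def local_range(rank, n, num_procs):
    """Return the half-open global index range [lo, hi) owned by this rank."""
    block = -(-n // num_procs)  # ceiling division
    lo = min(rank * block, n)
    hi = min(lo + block, n)
    return lo, hi

def node_program(rank, n, num_procs):
    """Same code on every node; only the owned slice is computed."""
    lo, hi = local_range(rank, n, num_procs)
    # Stand-in for the global loop "A(i) = 2 * i", restricted to owned i.
    return [2 * i for i in range(lo, hi)]

N, P = 10, 4
pieces = [node_program(rank, N, P) for rank in range(P)]

# Concatenating the per-node results reproduces the sequential loop.
combined = [x for piece in pieces for x in piece]
print(combined)
```

Here the nodes are simulated by a sequential loop over ranks; on a real machine each rank would run in its own process and exchange messages for any non-local operands.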
Aligning parallel arrays to reduce communication
In Frontiers '95: The 5th Symp. on the Frontiers of Massively Parallel Computation, 1995
Cited by 16 (1 self)
Axis and stride alignment is an important optimization in compiling data-parallel programs for distributed-memory machines. We previously developed an optimal algorithm for aligning array expressions. Here, we examine alignment for more general program graphs. We show that optimal alignment is NP-complete in this setting, so we study heuristic methods. This paper makes two contributions. First, we show how local graph transformations can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. Second, we give a heuristic that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. Our algorithms have been implemented; we present experimental results showing their effect on the performance of some example programs running on the CM-5.
Automatic Alignment of Array Data and Processes To Reduce Communication Time on DMPPs
1995
Cited by 11 (0 self)
This paper investigates the problem of aligning data and processes in a distributed-memory implementation. We present complete algorithms for compile-time analysis, the necessary program restructuring, and subsequent code generation, and discuss their complexity. Finally, we evaluate their practical usefulness by quantitative experimentation. The technique presented analyzes complete programs, including branches, loops, and nested parallelism. Alignment is determined with respect to offset, stride, and general axis relations. The placements of both data and processes are computed in a unifying framework based on an extended preference graph and its analysis. Furthermore, dynamic redistribution and replication are considered in the same technique. The experimental results are very encouraging. The optimization algorithms implemented in the Modula-2* compiler improved the execution times of the programs by over 40% on a MasPar MP-1 with 16384 processors. This paper appeared in: Proceedings of th...

