## An Approach to Machine-Independent Parallel Programming (1994)

Venue: | In LNCS 854, Parallel Processing: CONPAR'94--VAPP VI |

Citations: | 15 - 7 self |

### BibTeX

@INPROCEEDINGS{Zimmermann94anapproach,

author = {Wolf Zimmermann and Welf Löwe},

title = {An Approach to Machine-Independent Parallel Programming},

booktitle = {In LNCS 854, Parallel Processing: CONPAR'94--VAPP VI},

year = {1994},

pages = {277--288},

publisher = {Springer}

}

### Abstract

Currently, many parallel algorithms are defined for shared-memory architectures. The preferred machine model for designing these algorithms is the PRAM. However, this model does not take into account properties of existing architectures. Recently, Culler et al. defined the LogP machine model, which better reflects the behaviour of massively parallel computers. We discuss an important class of programs for shared-memory architectures and show how they can be mapped to the LogP machine. We define this class and show how to compute the mapping at compile time. For this mapping a constant factor delay with respect to the optimal LogP execution time can be guaranteed.

1 Introduction

The PRAM model consists of a shared memory and a number of processors which have a local memory. Processors only communicate via their shared memory. The computation steps are performed in a synchronous lock-step manner. Memory access to different memory locations can be performed at the same time. The PRAMs are d...
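The LogP model named in the abstract charges explicitly for communication, which the PRAM does not. The following minimal Python sketch illustrates that cost accounting. It assumes the usual reading of the four LogP parameters from Culler et al. (latency L, overhead o, gap g, processor count P); the abstract itself does not define them. The class and function names are hypothetical, chosen only for illustration, and the sketch is not the mapping the paper constructs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """LogP parameters (assumed standard meaning, not taken from the abstract)."""
    L: int  # latency: upper bound on the network delay of one message
    o: int  # overhead: processor time spent sending or receiving one message
    g: int  # gap: minimum interval between consecutive sends on one processor
    P: int  # number of processors


def point_to_point_time(m: LogP) -> int:
    """Time from the start of a send until the receiver holds the message."""
    # Sender overhead, then network latency, then receiver overhead.
    return m.o + m.L + m.o


def pipelined_send_time(m: LogP, k: int) -> int:
    """Time until the last of k back-to-back messages has been received."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Consecutive sends are spaced by the gap, but never closer than the overhead.
    spacing = max(m.g, m.o)
    return (k - 1) * spacing + point_to_point_time(m)


if __name__ == "__main__":
    machine = LogP(L=6, o=2, g=4, P=8)  # hypothetical parameter values
    print(point_to_point_time(machine))     # 2 + 6 + 2 = 10
    print(pipelined_send_time(machine, 3))  # 2 * 4 + 10 = 18
```

On a PRAM both quantities would be a single synchronous step, which is why a PRAM program that is efficient on paper can be costly once L, o, and g are nonzero.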

### Citations

497 | LogP: Towards a realistic model of parallel computation - Culler, Karp, et al. - 1993 |

425 |
Supercompilers for Parallel and Vector Computers
- Zima, Chapman
- 1990
Citation Context ...here remains one loop or one recursive procedure call, the compiler classifies the program as non-oblivious. With the exception of recursion elimination all the transformations are standard, see e.g. [Zim91]. For nonrecursive procedures and functions, procedure inlining is no problem. Hence, when all recursions can be eliminated, then procedure inlining is no problem. Sometimes it is not possible to stat... |

202 |
General Purpose Parallel Architectures, Chapter 18 of Handbook of Theoretical
- Valiant
- 1990
Citation Context ...though theoretically optimal results exist [Val90]. The reason is the expensive synchronization, communication latency, communication overhead, and network bandwidth. In the LogP machine [CKP+93], these communication costs are taken into account. Ho... (Footnote 2: It is possible to extend our results to CRCW-PRAM.) |

136 |
Towards an architecture-independent analysis of parallel algorithms
- Papadimitriou, Yannakakis
- 1990
Citation Context ...dered. Therefore we can effectively implement any oblivious PRAM program as an optimal LogP program w.r.t. the same communication structure. However, the transformations themselves are exponential. In [PY90] Papadimitriou and Yannakakis showed that finding an optimal clustering is NP-hard, even if o = g = 0 and P = 1. We can therefore not expect to find an efficient and optimal transformation. They also s... |

103 |
Parallel algorithms for shared memory machines
- Karp, Ramachandran
- 1990
Citation Context ...is allowed in a CRCW-PRAM (concurrent read, concurrent write). In our paper we exclude concurrent writes (see footnote 2). Most parallel algorithms are designed for flavors of the PRAM models. For an overview, see [KR90]. The model has been successfully applied, because it allows one to focus on the potential parallelism of the problem at hand. In particular, there is no need to consider a network topology and a memory d... |

98 | On the granularity and clustering of directed acyclic task graphs
- Gerasoulis, Yang
- 1993
Citation Context ...hat correspond to vertices of a path in the communication structure are clustered, in a non-linear clustering at least two tasks that correspond to vertices of different paths are in one cluster. In [GY93] the granularity of a communication structure is defined as the quotient of the minimal computation of a vertex and the latency. It is proven that linear clustering leads to a solution with TIME clust... |

38 |
The expected advantage of asynchrony
- Cole, Zajicek
- 1990
Citation Context ...ams. However, for correct execution, the PRAM model has to be simulated which is too expensive because of the large number of synchronizations. Some of these algorithms can be executed asynchronously [CZ90], but there is no general technique to prove the correctness of the asynchronous execution of non-oblivious parallel programs. One issue for further research should therefore be to identify non-oblivi... |

15 | The Automatic Complexity Analysis of Divide-and-Conquer Programs - Zimmermann, Zimmermann - 1989 |

9 | The average case analysis of algorithms - Flajolet, Sedgewick - 1994 |

7 | Automatische Komplexitätsanalyse funktionaler Programme. PhD thesis, Fakultät für Informatik der Universität Karlsruhe - Zimmermann - 1990 |

6 | The automatic worst case analysis of parallel programs: Simple parallel sorting and algorithms on graphs - Zimmermann - 1991 |

4 | On the implementation of virtual shared memory
- Zimmermann, Kumm
- 1993
Citation Context ...easons the model is often chosen to design parallel algorithms and programs. On the other hand, almost all parallel computers and local area networks are distributed memory architectures. As shown in [ZK93] implementations of the PRAM model on real parallel machines are practically expensive, although... |