Results 1–10 of 90
Object-Oriented Parallel Processing with Mentat
 IEEE Computer
, 1996
Abstract

Cited by 103 (5 self)
this paper is to provide the reader with a solid introduction to Mentat and to provide intuition as to the performance that can be expected from Mentat applications. This will be accomplished by first examining the Mentat philosophy of parallel computing and reviewing the Mentat programming language basics. We will then move on to application performance. For each of several applications we will address two questions: 1) What is the shape of the Mentat solution? 2) How did the implementation perform? We conclude with a retrospective and a look forward to our next system, Legion.
Models of Machines and Computation for Mapping in Multicomputers
, 1993
Abstract

Cited by 79 (1 self)
It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonly accepted framework whereby results in the field can be compared. Nor is it always easy to assess the relevance of a new result to a particular problem. Furthermore, changes in parallel computing technology have made some of the earlier work of less relevance to current multiprocessor systems. Versions of the mapping problem are classified, and research in the field is considered in terms of its relevance to the problem of programming currently available hardware in the form of a distributed memory multiple instruction stream multiple data stream computer: a multicomputer.
A Framework for Unifying Reordering Transformations
, 1993
Abstract

Cited by 72 (10 self)
We present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a schedule that maps the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. As part of the framework, we provide algorithms to assist in the building and use of schedules. In particular, we provide algorithms to test the legality of schedules, to align schedules and to generate optimized code for schedules. This work is supported by an NSF PYI grant CCR-9157384 and by a Packard Fellowship. 1 Introduction Optimizing compilers reorder iterations of statements to improve instruction scheduling, register use, and cache utilization, and to expose parallelism. Many different reordering transformations have been developed and studied, su...
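The schedule idea described above can be sketched in a few lines of plain Python (all names here are illustrative, not the paper's notation): loop interchange is expressed as a mapping from the original iteration space to a new one, and the new execution order is simply lexicographic order of the mapped coordinates.

```python
# Hypothetical sketch: a reordering transformation represented as a
# schedule that maps each original iteration (i, j) to a new position.

def interchange_schedule(i, j):
    """Schedule T(i, j) = (j, i): the classic loop-interchange mapping."""
    return (j, i)

# Enumerate a 3x2 iteration space in its original order, then execute
# iterations sorted by the schedule to obtain the interchanged order.
original = [(i, j) for i in range(3) for j in range(2)]
reordered = sorted(original, key=lambda it: interchange_schedule(*it))

print(original)   # i varies slowest in the original loop nest
print(reordered)  # j varies slowest after the interchange
```

Other transformations in the abstract's list (skewing, tiling, statement reordering) fit the same pattern by swapping in a different schedule function.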
Customized Dynamic Load Balancing for a Network of Workstations
, 1997
Abstract

Cited by 71 (0 self)
this paper we show that different load balancing schemes are best for different applications under varying program and system parameters. Therefore, application-driven customized dynamic load balancing becomes essential for good performance. We present a hybrid compile-time and runtime modeling and decision process which selects (customizes) the best scheme, along with automatic generation of parallel code with calls to a runtime library for load balancing.
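A minimal sketch of the kind of choice such a selector faces (these are not the paper's actual schemes, just a hypothetical comparison): a speed-oblivious static split versus a speed-proportional one on processors of different speeds.

```python
# Hypothetical sketch: two static partitioning schemes for distributing
# loop iterations over workers of known (relative) speeds.

def static_block(n_iters, speeds):
    """Split n_iters iterations across workers in proportion to speed."""
    total = sum(speeds)
    counts = [int(n_iters * s // total) for s in speeds]
    counts[-1] += n_iters - sum(counts)  # remainder goes to the last worker
    return counts

def makespan(counts, speeds):
    """Parallel time when each worker processes its share at its own speed."""
    return max(c / s for c, s in zip(counts, speeds))

speeds = [1.0, 2.0, 4.0]            # heterogeneous processor speeds
even = static_block(96, [1, 1, 1])  # speed-oblivious: 32 iterations each
prop = static_block(96, speeds)     # speed-proportional split
print(makespan(even, speeds))  # the slowest worker dominates
print(makespan(prop, speeds))  # much closer to the ideal 96 / 7
```

Which scheme wins depends on the program and system parameters, which is exactly why the abstract argues for customized selection.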
Scan Primitives for Vector Computers
 In Proceedings Supercomputing '90
, 1990
Abstract

Cited by 38 (9 self)
This paper describes an optimized implementation of a set of scan (also called all-prefix-sums) primitives on a single processor of a CRAY Y-MP, and demonstrates that their use leads to greatly improved performance for several applications that cannot be vectorized with existing compiler technology. The algorithm used to implement the scans is based on an algorithm for parallel computers and is applicable with minor modifications to any register-based vector computer. On the CRAY Y-MP, the asymptotic running time of the plus-scan is about 2.25 times that of a vector add, and is within 20% of optimal. An important aspect of our implementation is that a set of segmented versions of these scans are only marginally more expensive than the unsegmented versions. These segmented versions can be used to execute a scan on multiple data sets without having to pay the vector startup cost (n_{1/2}) for each set. The paper describes a radix sorting routine based on the scans that is 13 times faster ...
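The scan primitive itself is simple to state; here is a reference definition in plain Python (sequential, not the vectorized implementation the paper describes), including the segmented variant that restarts at segment boundaries.

```python
# Reference definitions of the scan ("all-prefix-sums") primitive and
# its segmented variant; sequential Python, not vector code.

def plus_scan(xs):
    """Exclusive plus-scan: out[i] = sum of xs[0..i-1]."""
    out, acc = [], 0
    for x in xs:
        out.append(acc)
        acc += x
    return out

def segmented_plus_scan(xs, flags):
    """Exclusive plus-scan restarted wherever flags[i] marks a segment start."""
    out, acc = [], 0
    for x, f in zip(xs, flags):
        if f:
            acc = 0
        out.append(acc)
        acc += x
    return out

print(plus_scan([1, 2, 3, 4]))                          # [0, 1, 3, 6]
print(segmented_plus_scan([1, 2, 3, 4], [1, 0, 1, 0]))  # [0, 1, 0, 3]
```

The segmented form is what lets one scan call process many independent data sets, which is the source of the startup-cost savings the abstract mentions.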
Compile-time Scheduling Algorithms for Heterogeneous Network of Workstations
 THE COMPUTER JOURNAL
, 1997
Abstract

Cited by 36 (1 self)
In this paper, we study the problem of scheduling parallel loops at compile-time for a heterogeneous network of workstations. We consider heterogeneity in various aspects of parallel programming: program, processor, memory and network. A heterogeneous program has parallel loops with a different amount of work in each iteration; heterogeneous processors have different speeds; heterogeneous memory refers to the different amounts of user-available memory on the machines; and a heterogeneous network has different costs of communication between processors. We propose a simple yet comprehensive model for use in compiling for a network of processors, and develop compiler algorithms for generating optimal and
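A greatly simplified sketch of compile-time scheduling under two of these forms of heterogeneity (program and processor; memory and network are ignored here, and all names are illustrative): greedily assign each iteration, whose cost is known at compile time, to the processor that would finish it earliest.

```python
# Hedged sketch, not the paper's algorithm: earliest-finish-time
# assignment of loop iterations with known costs to processors of
# different speeds.

def schedule_loop(iter_costs, speeds):
    """Assign each iteration to the processor that finishes it earliest."""
    finish = [0.0] * len(speeds)
    assignment = []
    for cost in iter_costs:
        p = min(range(len(speeds)), key=lambda q: finish[q] + cost / speeds[q])
        finish[p] += cost / speeds[p]
        assignment.append(p)
    return assignment, max(finish)

# A heterogeneous loop (varying iteration costs) on two processors,
# the second twice as fast as the first.
assignment, t = schedule_loop([4, 4, 4, 4, 2, 2, 2, 2], [1.0, 2.0])
print(assignment, t)  # total work 24 on total speed 3: ideal time is 8
```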
Linear Scheduling Is Nearly Optimal
, 1991
Abstract

Cited by 35 (11 self)
This paper deals with the problem of finding optimal schedulings for uniform dependence algorithms. Given a convex domain, let T_f be the total time needed to execute all computations using the free (greedy) schedule and let T_l be the total time needed to execute all computations using the optimal linear schedule. Our main result is to bound T_l / T_f and T_l - T_f for sufficiently "fat" domains. Keywords: Uniform dependence algorithms; Convex domain; Free schedule; Linear schedule; Optimal schedule; Path packing. 1. Introduction The pioneering work of Karp, Miller and Winograd [2] has considered a special class of algorithms characterized by uniform data dependencies and unit-time computations. This special class of algorithms, termed uniform dependence algorithms by Shang and Fortes [6], has proven of paramount importance in various fields of applications, such as systolic array design and parallel compiler optimization. This paper deals with the problem of finding optimal s...
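The two schedules can be computed directly on a small example (a sketch under simplifying assumptions: a square n x n domain, unit-time computations, dependences (1,0) and (0,1)); here the linear schedule sigma(i, j) = i + j matches the free schedule exactly.

```python
# Illustrative sketch: free (greedy) schedule time T_f vs. linear
# schedule time T_l on an n x n domain with uniform dependences.

def free_schedule_time(n, deps):
    """Earliest step of each point is 1 + max over its predecessors; T_f is the max."""
    t = {}
    for i in range(n):
        for j in range(n):
            preds = [t[(i - di, j - dj)] for di, dj in deps
                     if (i - di, j - dj) in t]
            t[(i, j)] = 1 + max(preds, default=0)
    return max(t.values())

def linear_schedule_time(n, pi):
    """Number of distinct steps used by the linear schedule sigma(p) = pi . p."""
    steps = {pi[0] * i + pi[1] * j for i in range(n) for j in range(n)}
    return len(steps)

deps = [(1, 0), (0, 1)]
print(free_schedule_time(8, deps))      # T_f on an 8 x 8 domain
print(linear_schedule_time(8, (1, 1)))  # T_l with sigma(i, j) = i + j
```

On this (very fat) square domain T_l = T_f = 15; the paper's contribution is bounding how far apart the two can drift on general convex domains.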
Finding Legal Reordering Transformations using Mappings
 In Seventh International Workshop on Languages and Compilers for Parallel Computing
Abstract

Cited by 30 (3 self)
Traditionally, optimizing compilers attempt to improve the performance of programs by applying source to source transformations, such as loop interchange, loop skewing and loop distribution. Each of these transformations has its own special legality checks and transformation rules which make it hard to analyze or predict the effects of compositions of these transformations. To overcome these problems we have developed a framework for unifying iteration reordering transformations. The framework is based on the idea that all reordering transformations can be represented as a mapping from the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. An optimizing compiler would use our framework by finding a mapping that both corresponds to a legal transformation and produces efficient code. We present the mapping selection problem as a search problem by decomposing it into a sequence of smal...
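The uniform legality test this framework replaces the per-transformation checks with can be sketched as follows (a toy version with enumerated dependence pairs; real compilers reason symbolically): a mapping is legal when every dependent pair still executes source before target in the new lexicographic order.

```python
# Hedged sketch of a uniform legality check: a mapping is legal if it
# preserves source-before-target order for every dependence pair.

def legal(mapping, dep_pairs):
    """dep_pairs: (src, dst) iteration tuples where src must run before dst."""
    return all(mapping(*src) < mapping(*dst) for src, dst in dep_pairs)

# Dependence (i, j) -> (i, j + 1): interchange is legal for it,
# reversing the j loop is not.
deps = [((i, j), (i, j + 1)) for i in range(3) for j in range(2)]
print(legal(lambda i, j: (j, i), deps))   # interchange: True
print(legal(lambda i, j: (i, -j), deps))  # j-reversal: False
```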
Clustering Task Graphs for Message Passing Architectures
 Proceedings of ACM International Conference on Supercomputing
, 1990
Abstract

Cited by 26 (6 self)
Clustering is a mapping of the nodes of a task graph onto labeled clusters. We present a unified framework for clustering of directed acyclic graphs (DAGs). Several clustering algorithms from the literature are compared using this framework. For coarse grain DAGs two interesting properties are presented. For every nonlinear clustering there exists a linear clustering whose parallel time is less than the nonlinear one. Furthermore, the parallel time of any linear clustering is within a factor of two of the optimal. Two clustering algorithms are presented with near linear time complexity for coarse grain DAGs. The conclusion is that linear clustering is an efficient and accurate operation. 1 Introduction Identification of parallelism, partitioning, clustering and scheduling are some of the major problems in parallel processing. Partitioning is used as a first step to scheduling and is defined as a mapping of the nodes of a data dependence graph (DDG) onto labeled tasks. The definition o...
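The linear-versus-nonlinear contrast can be seen on a tiny fork-join DAG under a simplified model (assumptions: unit node weights, unit inter-cluster communication, zero intra-cluster communication, nodes within a cluster execute sequentially; all names are illustrative):

```python
# Toy sketch of clustered-DAG parallel time under a simplified model:
# an edge costs `comm` only when it crosses clusters, and the nodes of
# one cluster run sequentially.

def parallel_time(nodes, edges, weight, cluster, comm=1):
    """Parallel time of a clustering; `nodes` must be topologically sorted."""
    finish, cluster_free = {}, {}
    for v in nodes:
        ready = 0
        for u, x in edges:
            if x == v:
                cost = 0 if cluster[u] == cluster[v] else comm
                ready = max(ready, finish[u] + cost)
        start = max(ready, cluster_free.get(cluster[v], 0))
        finish[v] = start + weight[v]
        cluster_free[cluster[v]] = finish[v]
    return max(finish.values())

# Fork-join DAG with unit weights: a -> {b, c} -> d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
w = dict.fromkeys(nodes, 1)
linear = {"a": 0, "b": 0, "c": 1, "d": 0}     # cluster 0 is the path a-b-d
nonlinear = {"a": 0, "b": 1, "c": 1, "d": 0}  # independent b and c share a cluster
print(parallel_time(nodes, edges, w, linear))     # linear clustering
print(parallel_time(nodes, edges, w, nonlinear))  # co-clustering b, c serializes them
```

Here the linear clustering finishes in 5 steps while the nonlinear one needs 6, illustrating why, for coarse grain DAGs, a linear clustering can always do at least as well.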