Results 11–20 of 78
The Mentat Computation Model: Data-Driven Support for Dynamic Object-Oriented Parallel Processing
, 1993
Abstract

Cited by 23 (5 self)
Mentat is an object-oriented parallel processing system developed at the University of Virginia which has been ported to a variety of MIMD architectures. The computation model employed by Mentat is macro dataflow (MDF), a medium-grain, scalable, data-driven computation model that supports both high degrees of parallelism and the object-oriented paradigm. A key aspect of the model is that it can be efficiently implemented. Inspired by dataflow, MDF retains the graph-based, data-driven, self-synchronizing aspects of dataflow. MDF addresses the shortcomings that dataflow exhibits when applied to distributed-memory MIMD architectures by extending dataflow in three ways: (1) it is medium-grain: actors are of sufficient computational complexity to amortize overhead costs; (2) program graphs are dynamically constructed at runtime, which permits dynamic function binding as required by the object-oriented paradigm and increases the average computation granularity; and (3) actors may maintai...
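The data-driven firing rule this abstract describes can be sketched in a few lines: an actor fires only once all of its input tokens have arrived, so the graph self-synchronizes. A minimal illustration (the class and function names are our own, not Mentat's actual API):

```python
# Minimal sketch of data-driven (macro dataflow) execution.
# Illustrative names only; this is not Mentat's API.

class Actor:
    def __init__(self, name, n_inputs, fn):
        self.name = name
        self.fn = fn
        self.inputs = [None] * n_inputs
        self.received = 0
        self.successors = []          # (downstream actor, input slot) pairs

    def connect(self, successor, slot):
        self.successors.append((successor, slot))

    def receive(self, slot, token, ready):
        self.inputs[slot] = token
        self.received += 1
        if self.received == len(self.inputs):
            ready.append(self)        # all tokens present: actor is runnable

def run(sources):
    """Data-driven execution: fire any ready actor and forward its
    result token to its successors."""
    ready = list(sources)
    results = {}
    while ready:
        actor = ready.pop()
        out = actor.fn(*actor.inputs)
        results[actor.name] = out
        for succ, slot in actor.successors:
            succ.receive(slot, out, ready)
    return results

# Tiny graph: two constant actors feeding an addition actor.
a = Actor("a", 0, lambda: 2)
b = Actor("b", 0, lambda: 3)
add = Actor("add", 2, lambda x, y: x + y)
a.connect(add, 0)
b.connect(add, 1)
results = run([a, b])   # "add" fires only after both tokens arrive
```

Dynamic graph construction, as in MDF, would correspond to creating and connecting actors while `run` is in progress.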
Empirical Analysis of Overheads in Cluster Environments
 CONCURRENCY: PRACTICE AND EXPERIENCE
, 1995
Abstract

Cited by 23 (5 self)
In concurrent computing environments that are based on heterogeneous processing elements interconnected by general-purpose networks, several classes of overheads contribute to lowered performance. In an attempt to gain a deeper insight into the exact nature of these overheads, and to develop strategies to alleviate them, we have conducted empirical studies of selected applications representing different classes of concurrent programs. These analyses have identified load imbalance, the parallelism model adopted, communication delay and throughput, and system factors as the primary factors affecting performance in cluster environments. Based on the degree to which these factors affect specific classes of applications, we propose a combination of model selection criteria, partitioning strategies, and software system heuristics to reduce overheads and enhance performance in network-based environments. We demonstrate that agenda parallelism and load balancing strategies contribu...
Exploiting Parallelism In Functional Languages: A "Paradigm-Oriented" Approach
 BOOK CHAPTER
, 1995
Abstract

Cited by 23 (6 self)
Deriving parallelism automatically from functional programs is simple in theory, but very few practical implementations have been realised. Programs may contain too little or too much parallelism, causing a degradation in performance. Such parallelism could be controlled more efficiently if parallel algorithmic structures (or skeletons) are used in the design of algorithms. A structure captures the behaviour of a parallel programming paradigm and acts as a template in the design of an algorithm. This paper presents some important parallel programming paradigms and defines a structure for each of these paradigms. The iterative transformation paradigm (or geometric parallelism) is discussed in detail, and a framework under which programs can be developed and transformed into efficient and portable implementations is presented.
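The iterative transformation (geometric) paradigm mentioned above can be illustrated as a skeleton: the skeleton fixes the control structure (iterate a local update over a grid until convergence) and the user supplies only the update function. A hedged sketch, with names of our own invention:

```python
# Sketch of an "iterative transformation" (geometric) skeleton.
# The skeleton owns the iteration; the user supplies `update`.
# Function names here are illustrative, not from the paper.

def iterate_transform(grid, update, tol=1e-6, max_iter=10_000):
    """Apply `update` over the whole grid until the largest
    per-cell change falls below `tol`."""
    for _ in range(max_iter):
        new = [update(grid, i) for i in range(len(grid))]
        delta = max(abs(a - b) for a, b in zip(new, grid))
        grid = new
        if delta < tol:
            break
    return grid

# Example user code: 1-D Jacobi relaxation with fixed boundaries.
def jacobi(grid, i):
    if i == 0 or i == len(grid) - 1:
        return grid[i]                        # boundaries stay fixed
    return 0.5 * (grid[i - 1] + grid[i + 1])  # average of neighbours

result = iterate_transform([0.0, 0.0, 0.0, 0.0, 1.0], jacobi)
# converges toward the linear ramp 0, 0.25, 0.5, 0.75, 1
```

In a parallel implementation the skeleton, not the user, would partition the grid geometrically across processors and exchange boundary cells each iteration.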
Parallel Processing of Discrete Optimization Problems
 IN ENCYCLOPEDIA OF MICROCOMPUTERS
, 1993
Abstract

Cited by 19 (6 self)
Discrete optimization problems (DOPs) arise in various applications such as planning, scheduling, computer-aided design, robotics, game playing and constraint-directed reasoning. Often, a DOP is formulated in terms of finding a (minimum cost) solution path in a graph from an initial node to a goal node and solved by graph/tree search methods such as branch-and-bound and dynamic programming. Availability of parallel computers has created substantial interest in exploring the use of parallel processing for solving discrete optimization problems. This article provides an overview of parallel search algorithms for solving discrete optimization problems.
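The graph-search formulation described above can be sketched as a serial best-first branch-and-bound: expand the cheapest frontier node, and prune any subproblem whose cost already meets or exceeds the best goal found so far. The graph and names below are illustrative; positive edge costs are assumed so pruning terminates cycles.

```python
# Sketch of best-first branch-and-bound for a minimum-cost path
# from an initial node to a goal node. Illustrative example only;
# assumes positive edge costs.
import heapq

def branch_and_bound(graph, start, goal):
    """Return the cost of the cheapest start->goal path.
    `graph` maps node -> list of (neighbour, edge_cost)."""
    best = float("inf")                 # incumbent: cheapest goal so far
    frontier = [(0, start)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if cost >= best:                # bound: prune dominated subproblems
            continue
        if node == goal:
            best = cost                 # found a better incumbent
            continue
        for nbr, w in graph.get(node, []):
            heapq.heappush(frontier, (cost + w, nbr))
    return best

g = {"s": [("a", 1), ("b", 4)], "a": [("b", 1), ("g", 5)], "b": [("g", 1)]}
cost = branch_and_bound(g, "s", "g")   # 3, via s -> a -> b -> g
```

Parallel formulations, as surveyed in the article, typically distribute the frontier across processors and broadcast improvements to the incumbent bound.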
A Unified Infrastructure for Parallel Out-of-Core Isosurface Extraction and Volume Rendering of Unstructured Grids
 Proc. IEEE Symposium on Parallel and Large-Data Visualization and Graphics
, 2001
Abstract

Cited by 19 (3 self)
In this paper, we present a unified infrastructure for parallel out-of-core isosurface extraction and volume rendering of large unstructured grids on distributed-memory parallel machines. We parallelize the out-of-core isosurface extraction algorithm of [9] and the out-of-core ZSweep technique [17] for direct volume rendering, using the metacell technique as a unified underlying building block.
Scalability of Parallel Sorting on Mesh Multicomputers
, 1991
Abstract

Cited by 18 (12 self)
This paper presents two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyzes their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers. The isoefficiency of QSP1 is also fairly close to optimal. Lang et al. and Schnorr et al. have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) runtime complexity for the one-element-per-processor case. Both QSP1 and QSP2 have worse performance than these algorithms for the one-element-per-processor case. But QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). As a result, our new parallel formulations are better than these scaled-down variants in terms of speedup w.r.t. the best sequential algorithms. We also present a dif...
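The isoefficiency metric used in this abstract can be made concrete numerically: with efficiency E = W / (W + To(W, p)), where W is the serial work and To the total parallel overhead, the isoefficiency function asks how fast W must grow with p to hold E fixed. The overhead formula below is an illustrative placeholder (a p^1.5 communication term, plausible for a mesh), not the paper's actual analysis:

```python
# Numerical sketch of the isoefficiency metric.
# The overhead model To is an assumed placeholder, not from the paper.
import math

def efficiency(W, p, overhead):
    return W / (W + overhead(W, p))

def isoefficiency_W(p, E, overhead, hi=2**60):
    """Smallest W (found by bisection) keeping efficiency >= E
    on p processors; efficiency is monotone increasing in W here."""
    lo = 1.0
    while hi - lo > 1.0:
        mid = (lo + hi) / 2
        if efficiency(mid, p, overhead) >= E:
            hi = mid
        else:
            lo = mid
    return hi

# Assumed mesh-like overhead with a p*sqrt(p) communication term:
To = lambda W, p: 50.0 * p * math.sqrt(p)

w16 = isoefficiency_W(16, 0.8, To)
w64 = isoefficiency_W(64, 0.8, To)
# W must grow roughly as p**1.5 to hold E = 0.8:
# w64 / w16 is about (64 / 16) ** 1.5 = 8
```

An algorithm "matches the lower bound on the isoefficiency function" when its required growth rate of W with p is as slow as the architecture permits.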
Advanced Compiler Optimizations for Sparse Computations
 Journal of Parallel and Distributed Computing
, 1995
Abstract

Cited by 16 (3 self)
Regular data dependence checking on sparse codes usually results in very conservative estimates of actual dependences that will occur at runtime. Clearly, this is caused by the usage of compact data structures that are necessary to exploit sparsity in order to reduce storage requirements and computational time. However, if the compiler is presented with dense code and automatically converts it into code that operates on sparse data structures, then the dependence information obtained by analysis on the original code can be used to exploit potential concurrency in the generated code. In this paper we present synchronization generating and manipulating techniques that are based on this concept.
1 Introduction
Nowadays compiler support usually fails to optimize sparse codes because compact storage formats are used for sparse matrices in order to exploit sparsity with respect to storage requirements and computational time. This exploitation results in complicated code in which, for exam...
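The dense-to-sparse conversion idea can be sketched concretely: the programmer writes a dense loop nest, and the compiler emits equivalent code over a compact storage format. Below, a matrix-vector product in both forms, using CSR as the compact format (our choice for illustration; the paper does not prescribe a specific format here):

```python
# Sketch of dense code vs. compiler-generated sparse code.
# CSR is used as an illustrative compact storage format.

def spmv_dense(A, x):
    """What the programmer writes: dense matrix-vector product."""
    n = len(A)
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def to_csr(A):
    """Compress: keep only nonzeros, with column indices and row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def spmv_csr(values, col_idx, row_ptr, x):
    """What the compiler generates: same result, touching nonzeros only."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

A = [[4, 0, 0], [0, 0, 2], [1, 0, 3]]
x = [1, 2, 3]
# spmv_dense(A, x) and spmv_csr(*to_csr(A), x) both give [4, 6, 10]
```

Dependence analysis on `spmv_dense` is straightforward; the same information carries over to the generated `spmv_csr`, whose indirect indexing (`x[col_idx[k]]`) would otherwise defeat the analysis.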
Towards the Classification of Algorithmic Skeletons
, 1996
Abstract

Cited by 16 (3 self)
Algorithmic skeletons are seen as being high-level, parallel programming language constructs encapsulating the expression of parallelism, communication, synchronisation, embedding, and costing. This report examines the classification of algorithmic skeletons, proposing one classification, and examining others which have been devised. Various algorithmic skeletons are examined, and these are categorised to form a core of algorithmic skeletons suitable for a general classification which is based on practical experience in the use of such skeletons. This categorisation is compared with others which have been proposed. Similarly, other skeleton-like approaches are briefly examined.
1 Introduction
1.1 Algorithmic Skeletons
Algorithmic skeletons are envisaged as high-level, parallel programming language constructs encapsulating the expression of parallelism, communication, synchronisation and embedding, and having an associated cost complexity. Skeletons are to parallel threads as sequenti...
Logarithmic time cost optimal parallel sorting is not yet fast in practice
 Dept. of Computer Science, Brown University, August
, 1990
Abstract

Cited by 14 (3 self)
When looking for new and faster parallel sorting algorithms for use in massively parallel systems, it is tempting to investigate promising alternatives from the large body of research done on parallel sorting in the field of theoretical computer science. Such "theoretical" algorithms are mainly described for the PRAM (Parallel Random Access Machine) model of computation [13, 26]. This paper shows how this kind of investigation can be done on a simple but versatile environment for programming and measuring of PRAM algorithms [18, 19]. The practical value of Cole's Parallel Merge Sort algorithm [10, 11] has been investigated by comparing it with Batcher's bitonic sorting [5]. The O(log n) time consumption of Cole's algorithm implies that it must be faster than bitonic sorting, which is O(log^2 n) time, if n is large enough. However, we have found that bitonic sorting is faster as long as n is less than 1.2 × 10^21, i.e. more than 1 Giga-Tera items! Consequently, Cole's logarithmic time algorithm is not fast in practice.
1 Introduction and Motivation
The work reported in this paper is an attempt to lessen the gap between theory and practice within the field of parallel computing. Within theoretical computer science, parallel algorithms are mainly compared by using asymptotical analysis (O-notation). This paper gives an example on how the analysis of implemented algorithms on finite problems provides new and more practically oriented results than those traditionally obtained by asymptotical analysis.
Parallel Complexity Theory: A Rich Source for
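Batcher's bitonic sorting, the O(log^2 n) network that this study found faster in practice, is compact enough to sketch here as a serial simulation of the comparator network (the standard iterative formulation; n must be a power of two):

```python
# Serial simulation of Batcher's bitonic sorting network,
# the standard iterative formulation. Requires n = power of two.

def bitonic_sort(a):
    a = list(a)
    n = len(a)
    k = 2
    while k <= n:                 # stage: merge bitonic runs of length k
        j = k // 2
        while j >= 1:             # sub-stage: compare-exchange at distance j
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # direction alternates between runs of length k
                    if (i & k) == 0:
                        if a[i] > a[partner]:
                            a[i], a[partner] = a[partner], a[i]
                    elif a[i] < a[partner]:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

bitonic_sort([7, 3, 1, 8, 5, 2, 6, 4])   # [1, 2, 3, 4, 5, 6, 7, 8]
```

Each stage performs log k sub-stages of n/2 independent compare-exchanges, giving the O(log^2 n) depth compared in the paper; on a parallel machine the inner `for i` loop is what runs concurrently.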