Collection-Oriented Languages
Proceedings of the IEEE, 1991
Cited by 49 (5 self)
Several programming languages arising from widely diverse practical and theoretical considerations share a common high-level feature: their basic data type is an aggregate of other more primitive data types and their primitive functions operate on these aggregates. Examples of such languages (and the collections they support) are FORTRAN 90 (arrays), APL (arrays), Connection Machine LISP (xectors), PARALATION LISP (paralations), and SETL (sets). Acting on large collections of data with a single operation is the hallmark of data-parallel programming and massively parallel computers. These languages, which we call collection-oriented, are thus ideal for use with massively parallel machines, even though many of them were developed before parallelism and associated considerations became important. This paper examines collections and the operations that can be performed on them in a language-independent manner. It also critically reviews and compares a variety of collection-oriented languages...
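As a rough illustration of the collection-oriented style this abstract describes, the sketch below applies single operations to whole collections (an elementwise product, a reduction, and a selection). It uses plain Python lists rather than any of the languages named above, and the function names `elementwise_product`, `total`, and `select_above` are hypothetical, chosen only for this example.

```python
from functools import reduce
from operator import add, mul
from typing import List


def elementwise_product(xs: List[int], ys: List[int]) -> List[int]:
    """Apply one operation (multiplication) across two whole collections."""
    return list(map(mul, xs, ys))


def total(xs: List[int]) -> int:
    """Reduce a whole collection to a single value with one operation."""
    return reduce(add, xs, 0)


def select_above(xs: List[int], threshold: int) -> List[int]:
    """Keep only the elements greater than the threshold."""
    return [x for x in xs if x > threshold]


prices = [4, 10, 6]
quantities = [2, 1, 3]
line_totals = elementwise_product(prices, quantities)
print(line_totals, total(line_totals), select_above(line_totals, 9))
```

No explicit index loop appears in the calling code; each step names one operation over an aggregate, which is the property that makes such programs natural candidates for data-parallel execution.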
Communication-Efficient Parallel Algorithms for Distributed Random-Access Machines
Algorithmica, 1988
Cited by 34 (1 self)
This paper introduces a model for parallel computation, called the distributed random-access machine (DRAM), in which the communication requirements of parallel algorithms can be evaluated. A DRAM is an abstraction of a parallel computer in which memory accesses are implemented by routing messages through a communication network. A DRAM explicitly models the congestion of messages across cuts of the network. We introduce the notion of a conservative algorithm as one whose communication requirements at each step can be bounded by the congestion of pointers of the input data structure across cuts of a DRAM. We give a simple lemma that shows how to "shortcut" pointers in a data structure so that remote processors can communicate without causing undue congestion. We give O(lg n)-step, linear-processor, linear-space, conservative algorithms for a variety of problems on n-node trees, such as computing treewalk numberings, finding the separator of a tree, and evaluating all subexpressions ...
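To give a feel for what shortcutting pointers means, the sketch below shows generic pointer jumping on a linked list, where every node replaces its pointer with its successor's pointer in each synchronous round. This is the textbook technique, not the paper's lemma: it makes no attempt to bound congestion across network cuts, so it should not be read as a conservative algorithm in the abstract's sense. The function name `list_rank` and the convention that the tail points to itself are assumptions made for this example.

```python
from typing import List, Tuple


def list_rank(succ: List[int]) -> Tuple[List[int], int]:
    """Rank every node of a linked list by pointer jumping.

    succ[i] is the index of node i's successor; the tail points to itself
    (an assumed convention). Returns (rank, rounds), where rank[i] is the
    number of links from node i to the tail and rounds counts the
    synchronous jumping steps that were needed.
    """
    n = len(succ)
    nxt = list(succ)
    # Invariant: rank[i] is the number of links between i and nxt[i].
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    # Continue until every pointer leads directly to the tail.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Build new arrays from the old ones to mimic one synchronous
        # parallel step in which all nodes read before any node writes.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds


ranks, rounds = list_rank([1, 2, 3, 4, 4])
print(ranks, rounds)
```

Each round doubles the distance a pointer spans, so the number of rounds grows logarithmically with the list length, which matches the O(lg n) step counts quoted in the abstract in spirit only.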
Computational Structure of the N-body Problem
1989
Cited by 34 (0 self)
the number of particles is on the order of a million. The main concern of this work is the organization and performance of these computations on parallel computers.
Scan Primitives for Vector Computers
In Proceedings Supercomputing '90, 1990
Cited by 34 (9 self)
This paper describes an optimized implementation of a set of scan (also called all-prefix-sums) primitives on a single processor of a CRAY Y-MP, and demonstrates that their use leads to greatly improved performance for several applications that cannot be vectorized with existing compiler technology. The algorithm used to implement the scans is based on an algorithm for parallel computers and is applicable with minor modifications to any register-based vector computer. On the CRAY Y-MP, the asymptotic running time of the plus-scan is about 2.25 times that of a vector add, and is within 20% of optimal. An important aspect of our implementation is that a set of segmented versions of these scans is only marginally more expensive than the unsegmented versions. These segmented versions can be used to execute a scan on multiple data sets without having to pay the vector startup cost ($n_{1/2}$) for each set. The paper describes a radix sorting routine based on the scans that is 13 times faster ...
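For readers unfamiliar with the primitives named in this abstract, the sketch below gives a serial reference definition of a plus-scan and its segmented variant. It only pins down what the operations compute; it is not the vectorized CRAY Y-MP implementation the paper describes. The exclusive convention (each output excludes its own input) and the use of boolean flags to mark segment starts are assumptions made for this example.

```python
from typing import List


def plus_scan(values: List[int]) -> List[int]:
    """Exclusive plus-scan: out[i] is the sum of values[0] .. values[i-1]."""
    out = []
    running = 0
    for v in values:
        out.append(running)
        running += v
    return out


def segmented_plus_scan(values: List[int], flags: List[bool]) -> List[int]:
    """Exclusive plus-scan restarted wherever flags[i] is True.

    flags[i] marks the first element of a segment (an assumed encoding),
    so several independent data sets packed into one vector are scanned
    in a single pass.
    """
    out = []
    running = 0
    for v, starts_segment in zip(values, flags):
        if starts_segment:
            running = 0
        out.append(running)
        running += v
    return out


data = [3, 1, 4, 1, 5]
print(plus_scan(data))
print(segmented_plus_scan(data, [True, False, True, False, False]))
```

Packing many short data sets into one flagged vector is what lets a single long operation replace many short ones, which is why the abstract stresses avoiding a per-set startup cost.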
Massively Parallel Genetic Programming
1996
Cited by 29 (5 self)
Introduction. The idea of simulating a MIMD machine using a SIMD architecture is not new ([8, 15]). One of the original ideas for the Connection Machine ([8]) was that it could simulate other parallel architectures. Indeed, in the extreme, each processor on a SIMD architecture can simulate a universal Turing machine (TM). With different Turing machine specifications stored in each local memory, each processor would simply have its own tape, tape head, state table and state pointer, and the simulation would be performed by repeating the basic TM operations simultaneously. Of course, such a simulation would be very inefficient and difficult to program, but it would have the advantage of being truly MIMD: no SIMD processor would be idle until its simulated machine halts. Now let us consider an alternative idea, that each SIMD processor would simulate an individual stored program computer using a simple instruction set. For each step of the simulation, the SIMD syste
Mixed Programming Metaphors in a Shared Dataspace Model of Concurrency
IEEE Transactions on Software Engineering, 2003
Cited by 25 (9 self)
The term shared dataspace refers to the general class of models and languages in which the principal means of communication is a common, content-addressable data structure called a dataspace. Swarm is a simple language we have used as a vehicle for the investigation of the shared dataspace approach to concurrent computation. It is the first shared dataspace language to have an associated assertional-style proof system. An important feature of Swarm is its ability to bring a variety of programming paradigms under a single, unified model. In a series of related examples we explore Swarm's capacity to express shared-variable, message-passing, and rule-based computations; to specify synchronous and asynchronous processing modes; and to accommodate highly dynamic program and data structures. Several illustrations make use of a programming construct unique to Swarm, the synchrony relation, and explain how this feature can be used to construct dynamically structured, partially synchronous computations. The paper has three parts: an overview of the Swarm programming notation, an examination of Swarm programming strategies via a series of related example programs, and a discussion of the distinctive features of the shared dataspace model. A formal operational model for Swarm is presented in an appendix.
Parallel Implementation of Algorithms for Finding Connected Components in Graphs
1997
Cited by 22 (1 self)
In this paper, we describe our implementation of several parallel graph algorithms for finding connected components. Our implementation, with virtual processing, is on a 16,384-processor MasPar MP-1 using the language MPL. We present extensive test data on our code. In our previous projects [21, 22, 23], we reported the implementation of an extensible parallel graph algorithms library. We developed general implementation and fine-tuning techniques without expending too much effort on optimizing each individual routine. We also handled the issue of implementing virtual processing. In this paper, we describe several algorithms and fine-tuning techniques that we developed for the problem of finding connected components in parallel; many of the fine-tuning techniques are of general interest, and should be applicable to code for other problems. We present data on the execution time and memory usage of our various implementations.
Data-Parallel Load Balancing Strategies
Parallel Computing, 1996
Cited by 16 (0 self)
Programming irregular and dynamic data-parallel algorithms requires taking data distribution into account. Implementing a load balancing algorithm is a difficult task for the programmer. However, a load balancing strategy may be developed independently of the application. The integration of such a strategy in the data-parallel algorithm may be relevant to a library or a data-parallel compiler run-time. We propose load distribution data-parallel algorithms for a class of irregular data-parallel algorithms called stack algorithms. Our algorithms allow the use of regular and/or irregular communication patterns to exchange work between processors. The results of theoretical analysis of these algorithms are presented. They allow a comparison of the different load balancing algorithms and the identification of criteria for the choice of a load balancing algorithm.

