Explicit Multi-Threading (XMT) Bridging Models for Instruction Parallelism
- Proc. 10th ACM Symposium on Parallel Algorithms and Architectures (SPAA)
, 1998
Cited by 24 (11 self)
The paper envisions an extension to a standard instruction set that efficiently implements PRAM algorithms using explicit multi-threaded instruction-level parallelism (ILP). That is, it introduces Explicit Multi-Threading (XMT), a fine-grained computational paradigm covering the spectrum from algorithms through architecture to implementation; new elements are added where needed. The more detailed presentation is by way of a bridging model. Among other things, a bridging model provides a design space for algorithm designers and programmers, as well as a design space for computer architects. It is convenient to describe our wider vision regarding "parallel-computing-on-a-chip" as a two-stage development, and therefore two bridging models are presented: Spawn-based multi-threading (Spawn-MT) and Elastic multi-threading (EMT). The case for Spawn-MT (or, alternatively, EMT) as a bridging model relies on the following evidence. (1) Spawn-MT comprises an "instruction set level", wh...
Can Parallel Algorithms Enhance Serial Implementation? (Extended Abstract)
, 1996
Cited by 14 (4 self)
The broad thesis presented in this paper suggests that the serial emulation of a parallel algorithm has the potential advantage of running on a serial machine faster than a standard serial algorithm for the same problem. It is too early to reach definite conclusions regarding the significance of this thesis. However, using some imagination, validity of the thesis and some arguments supporting it may lead to several far-reaching outcomes: (1) Reliance on "predictability of reference" in the design of computer systems will increase. (2) Parallel algorithms will be taught as part of the standard computer science and engineering undergraduate curriculum irrespective of whether (or when) parallel processing will become ubiquitous in the general-purpose computing world. (3) A strategic agenda for high-performance parallel computing: A multi-stage agenda, which in no stage compromises user-friendliness of the programmer's...
Communicable Memory and Lazy Barriers for Bulk Synchronous Parallelism in BSPk
, 1996
Cited by 12 (1 self)
Communication and synchronization stand as the dual bottlenecks in the performance of parallel systems, and especially those that attempt to alleviate the programming burden by incurring overhead in these two domains. We formulate the notions of communicable memory and lazy barriers to help achieve efficient communication and synchronization. These concepts are developed in the context of BSPk, a toolkit library for programming networks of workstations (and other distributed memory architectures in general) based on the Bulk Synchronous Parallel (BSP) model. BSPk emphasizes efficiency in communication by minimizing local memory-to-memory copying, and in barrier synchronization by not forcing a process to wait unless it needs remote data. Both the message passing (MP) and distributed shared memory (DSM) programming styles are supported in BSPk. MP helps processes efficiently exchange short-lived unnamed data values, when the identity of either the sender or receiver is known to the other party. By contrast, DSM supports communication between processes that may be mutually anonymous, so long as they can agree on variable names in which to store shared temporary or long-lived data.
Stages and Transformations in Parallel Programming
- Abstract Machine Models for Parallel and Distributed Computing
, 1996
Cited by 8 (3 self)
An approach, called SAT (Stages And Transformations), is introduced to support the derivation of parallel distributed-memory programs. During the design, a program is viewed as a single thread of stages, with parallelism concentrated within stages; the target program is of the SPMD format. The design process is based on the transformation rules of the Bird-Meertens formalism of higher-order functions over lists. The approach is illustrated by three case studies, which include: a systematic method of constructing list homomorphisms, a scalable, load-balanced implementation of divide-and-conquer based on a specialized topology, and a formal derivation of a time and cost optimal parallel algorithm for straightforward polynomial multiplication.
1 Introduction
The main problem with parallel and distributed systems today seems not to be how to build them, but how to make them work efficiently. The enormous diversity of architectures, together with the specific problems of parallelism, not e...
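The list homomorphisms mentioned in the abstract above can be illustrated with a small sketch. A function h on lists is a homomorphism when h(xs ++ ys) = h(xs) ⊙ h(ys) for some associative operator ⊙ with an identity element, which means h can be computed as a map followed by a reduce, and the reduce can be split into independent per-chunk stages. The Python below is only an illustration of that general property, not the SAT method or the paper's derivation; the names `homomorphism`, `staged_homomorphism`, and `parts` are hypothetical and chosen for this example.

```python
from functools import reduce
from operator import add


def homomorphism(f, combine, identity, xs):
    """Evaluate h(xs) sequentially as a map (f) followed by a reduce (combine)."""
    return reduce(combine, (f(x) for x in xs), identity)


def staged_homomorphism(f, combine, identity, xs, parts):
    """Evaluate the same h in two stages over at most `parts` contiguous chunks.

    Assumes `combine` is associative and `identity` is its identity element;
    `parts` must be at least 1.
    """
    size = max(1, -(-len(xs) // parts))  # ceiling division, at least 1
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    # Stage 1: chunks are independent, so this step could run in parallel.
    partials = [homomorphism(f, combine, identity, c) for c in chunks]
    # Stage 2: combine the per-chunk results, preserving chunk order.
    return reduce(combine, partials, identity)


# Sum of squares is a homomorphism: f = square, combine = +, identity = 0.
sequential = homomorphism(lambda x: x * x, add, 0, [1, 2, 3, 4, 5])
staged = staged_homomorphism(lambda x: x * x, add, 0, [1, 2, 3, 4, 5], parts=2)
```

Both calls give the same value because addition is associative. Chunk order is preserved in the second stage, so the operator does not need to be commutative; string concatenation works equally well.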
BSPk: Low Overhead Communication Constructs and Logical Barriers for Bulk Synchronous Parallel Programming (Extended Abstract)
, 1996
Cited by 4 (2 self)
Communication and synchronization stand as the dual bottlenecks in the performance of parallel systems, and especially those that attempt to alleviate the programming burden by incurring overhead in these two domains. We formulate the notions of communicable memory and lazy barriers to help achieve efficient communication and synchronization. These concepts are developed in the context of BSPk, a toolkit library for programming networks of workstations (and other distributed memory architectures in general) based on the Bulk Synchronous Parallel (BSP) model. BSPk, whose design is the subject of this paper, emphasizes efficiency in communication by minimizing local memory-to-memory copying, and in barrier synchronization by not forcing a process to wait unless it needs remote data. Both the message passing ...
Parallel Algorithms for Database Operations and a Database Operation for Parallel Algorithms
, 1995
Cited by 4 (0 self)
This paper establishes some significant links between two areas: (i) relational parallel database systems; and (ii) the design and analysis of parallel algorithms. The paper begins with a fundamental but very simple observation: implementing a Join operation in the context of relational parallel database systems is at least as expensive as implementing an arbitrary PRAM computation. Thus, the efficiency with which a given parallel computer can support a parallel relational database where Joins are fairly frequent is strongly related to the efficiency with which that computer can support the PRAM as one of its programmer's models. The main technical contribution is an efficient parallel algorithm for the Join operation on a model where, in order to use the available bandwidth effectively, communication has to be performed in large blocks.
1 Introduction
A key performance bottleneck for various database applications on serial computers has been high latency and low bandwidth while ...
Modeling Parallel Shared Memory Computations
, 1998
190 pages. ISSN 1238-6944, ISBN 951-708-693-8.
Keywords: parallel computing, shared memory, modeling, F-PRAM
Interprocessor communication is the most difficult part of parallel computation on current parallel computers. Programmers find it difficult to correctly and reliably distribute and maintain the data of a parallel program. Most efficiency problems are due to excessive or inefficient communication. Parallel computer manufacturers find it difficult and expensive to build interprocessor communication networks that would keep up with fast processors. In this thesis we shall present a new model of parallel computing, the F-PRAM model. The model characterizes parallel computers with a set of parameters, most of which model the limitations of the shared memory access of the processors, i.e., the communication. For the programmer, the new model offers a convenient abstraction of shared memory, but duly charges the machine-dependent costs of the use of the shared memory. For
This is a draft report that is under revision. Please send comments to
, 2006
The increasing prominence of the Internet, the Web, and large data-networks in general has profoundly affected social and commercial activity. It has also wrought one of the most profound shifts in Computer Science since its inception. Traditionally, Computer-Science research focused primarily on understanding how best to design, build, analyze, and program computers. Research focus has now shifted to the question of how best to design, build, analyze, and operate networks. How can one ensure that a network created and used by many autonomous organizations and individuals functions properly, respects the rights of users, and exploits its vast shared resources fully and fairly? The Theory of Computation (ToC) community can help address the full spectrum of research questions implicit in this grand challenge by developing a Theory of Networked Computation (ToNC), encompassing both positive and negative results. ToC research has already evolved with and influenced the growth of the Web, producing interesting results and techniques in diverse problem domains, including search and information retrieval, network protocols, error correction, Internet-based auctions, and security. A more general Theory of Networked Computation could influence the development of new
Towards a Theory of Networked Computation
, 2009
The increasing prominence of the Internet, the Web, and large data networks in general has profoundly affected social and commercial activity. It has also wrought one of the most profound changes in Computer Science since its inception. Traditionally, Computer-Science research has focused primarily on understanding how best to design, build, analyze, and program computers. The research agenda has now expanded to include the question of how best to design, build, analyze, and operate networks. How can one ensure that a network created and used by many autonomous organizations and individuals functions properly, respects the rights of users, and exploits its vast shared resources fully and fairly? The Theory of Computation (ToC) community can help address the full spectrum of research questions implicit in this grand challenge by developing a Theory of Networked Computation (ToNC), encompassing both positive and negative results. ToC research has already evolved with and influenced the growth of the Web, producing interesting results and techniques in diverse problem domains, including search and information retrieval, network protocols, error correction, Internet-based auctions, and security.
Resource analyses for parallel and distributed coordination
- CONCURRENCY COMPUTAT.: PRACT. EXPER. (2011)
, 2011
Predicting the resources that are consumed by a program component is crucial for many parallel or distributed systems. In this context, the main resources of interest are execution time, space and communication/synchronisation costs. There has recently been significant progress in resource analysis technology, notably in type-based analyses and abstract interpretation. At the same time, parallel and distributed computing are becoming increasingly important. This paper synthesises progress in both areas to survey the state-of-the-art in resource analysis for parallel and distributed computing. We articulate a general model of resource analysis and describe parallel/distributed resource analysis together with the relationship to sequential analysis. We use three parallel or distributed resource analyses as examples and provide a critical evaluation of the analyses. We investigate why the chosen analysis is effective for each application and identify general principles governing

