Results 1  10
of
228
Cilk: An Efficient Multithreaded Runtime System
 JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1995
"... Cilk (pronounced "silk") is a Cbased runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk workstealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "criticalpath length" of a C ..."
Abstract

Cited by 534 (39 self)
 Add to MetaCart
Cilk (pronounced "silk") is a Cbased runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk workstealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "criticalpath length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and criticalpath length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (wellstructured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk
Designing Programs That Check Their Work
, 1989
"... A program correctness checker is an algorithm for checking the output of a computation. That is, given a program and an instance on which the program is run, the checker certifies whether the output of the program on that instance is correct. This paper defines the concept of a program checker. It d ..."
Abstract

Cited by 307 (17 self)
 Add to MetaCart
A program correctness checker is an algorithm for checking the output of a computation. That is, given a program and an instance on which the program is run, the checker certifies whether the output of the program on that instance is correct. This paper defines the concept of a program checker. It designs program checkers for a few specific and carefully chosen problems in the class FP of functions computable in polynomial time. Problems in FP for which checkers are presented in this paper include Sorting, Matrix Rank and GCD. It also applies methods of modern cryptography, especially the idea of a probabilistic interactive proof, to the design of program checkers for group theoretic computations. Two strucural theorems are proven here. One is a characterization of problems that can be checked. The other theorem establishes equivalence classes of problems such that whenever one problem in a class is checkable, all problems in the class are checkable.
LogGP: Incorporating Long Messages into the LogP Model  One step closer towards a realistic model for parallel computation
, 1995
"... We present a new model of parallel computationthe LogGP modeland use it to analyze a number of algorithms, most notably, the single node scatter (onetoall personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP + 93] which abstracts the comm ..."
Abstract

Cited by 236 (1 self)
 Add to MetaCart
We present a new model of parallel computationthe LogGP modeland use it to analyze a number of algorithms, most notably, the single node scatter (onetoall personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP + 93] which abstracts the communication of fixedsized short messages through the use of four parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P ). As evidenced by experimental data, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM5) [CKP + 93, CDMS94]. However, many existing parallel machines have special support for long messages and achieve a much higher bandwidth for long messages compared to short messages (e.g., IBM SP2, Paragon, Meiko CS2, Ncube/2). We extend the basic LogP model with a linear model for long messages. This combination, which we call the LogGP model of parallel computation, has o...
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract

Cited by 193 (9 self)
 Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
Models and Languages for Parallel Computation
 ACM COMPUTING SURVEYS
, 1998
"... We survey parallel programming models and languages using 6 criteria [:] should be easy to program, have a software development methodology, be architectureindependent, be easy to understand, guranatee performance, and provide info about the cost of programs. ... We consider programming models in ..."
Abstract

Cited by 134 (4 self)
 Add to MetaCart
We survey parallel programming models and languages using 6 criteria [:] should be easy to program, have a software development methodology, be architectureindependent, be easy to understand, guranatee performance, and provide info about the cost of programs. ... We consider programming models in 6 categories, depending on the level of abstraction they provide.
A Randomized LinearTime Algorithm to Find Minimum Spanning Trees
, 1994
"... We present a randomized lineartime algorithm to find a minimum spanning tree in a connected graph with edge weights. The algorithm uses random sampling in combination with a recently discovered lineartime algorithm for verifying a minimum spanning tree. Our computational model is a unitcost ra ..."
Abstract

Cited by 115 (7 self)
 Add to MetaCart
We present a randomized lineartime algorithm to find a minimum spanning tree in a connected graph with edge weights. The algorithm uses random sampling in combination with a recently discovered lineartime algorithm for verifying a minimum spanning tree. Our computational model is a unitcost randomaccess machine with the restriction that the only operations allowed on edge weights are binary comparisons.
The Power of Reconfiguration
, 1998
"... This paper concerns the computational aspects of the reconfigurable network model. The computational power of the model is investigated under several network topologies and assuming several variants of the model. In particular, it is shown that there are reconfigurable machines based on simple netwo ..."
Abstract

Cited by 84 (7 self)
 Add to MetaCart
This paper concerns the computational aspects of the reconfigurable network model. The computational power of the model is investigated under several network topologies and assuming several variants of the model. In particular, it is shown that there are reconfigurable machines based on simple network topologies, that are capable of solving large classes of problems in constant time. These classes depend on the kinds of switches assumed for the network nodes. Reconfigurable networks are also compared with various other models of parallel computation, like PRAM's and Branching Programs. Part of this work is to be presented at the 18th International Colloquium on Automata, Languages, and Programming (ICALP), July 1991, Madrid. y Department of Computer Science, The Hebrew University, Jerusalem 91904, Israel. Email: yosi@humus.huji.ac.il, Supported by Eshcol Fellowship. z Department of Applied Mathematics and Computer Science, The Weizmann Institute, Rehovot 76100, Israel. Email: p...
MULTIPROCESSOR SCHEDULING TO ACCOUNT FOR INTERPROCESSOR COMMUNICATION
, 1991
"... Interprocessor communication (PC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essenti ..."
Abstract

Cited by 67 (11 self)
 Add to MetaCart
Interprocessor communication (PC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essential for attaining efficient hardware utilization. This thesis introduces two new compiletime heuristics for scheduling precedence graphs onto multiprocessor architectures, which account for interprocessor communication overheads and interconnection constraints in the architecture. These algorithms perform scheduling and routing simultaneously to account for irregular interprocessor interconnections, and schedule all communications as well as all computations to eliminate shared resource contention. The first technique, called dynamiclevel scheduling, modifies the classical HLFET list scheduling strategy to account for IPC and synchronization overheads. By using dynamically changing priorities to match nodes and processors at each step, this technique attains an equitable tradeoff between load balancing and interprocessor communication cost. This method is fast, flexible, widely targetable, and displays promising perforrnance. The second technique, called declustering, establishes a parallelism hierarchy upon the precedence graph using graphanalysis techniques which explicitly address the tradeoff between exploiting parallelism and incurring communication cost. By systematically decomposing this hierarchy, the declustering process exposes parallelism instances in order of importance, assuring efficient use of the available processing resources. In contrast with traditional clustering schemes, this technique can adjust the level of cluster granularity to suit the characteristics of the specified architecture, leading to a more effective solution.
Abstract state machines capture parallel algorithms
 ACM Transactions on Computational Logic
, 2003
"... Abstract We give an axiomatic description of parallel, synchronous algorithms. Our main result is that every such algorithm can be simulated, step for step, by an abstract state machine with a background that provides for multisets. \Lambda ..."
Abstract

Cited by 58 (23 self)
 Add to MetaCart
Abstract We give an axiomatic description of parallel, synchronous algorithms. Our main result is that every such algorithm can be simulated, step for step, by an abstract state machine with a background that provides for multisets. \Lambda