Results 1 - 10 of 312
Cilk: An Efficient Multithreaded Runtime System
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1995
Cited by 534 (39 self)
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk...
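As a rough illustration of the two measures named in this abstract, the sketch below computes the work (total cost of all nodes) and the critical-path length (cost of the longest dependency chain) of a small task DAG. The function names, the dictionary representation, and the estimate `work / P + span` with all constants omitted are assumptions made for the example; the abstract does not state the exact form of the model.

```python
from functools import lru_cache

def work_and_span(cost, succ):
    """Return (work, span) for a DAG.

    cost: dict mapping node -> execution time
    succ: dict mapping node -> list of successor nodes
    """
    work = sum(cost.values())

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Cost of the most expensive path that starts at `node`.
        return cost[node] + max((longest_from(s) for s in succ.get(node, [])), default=0)

    span = max(longest_from(n) for n in cost)
    return work, span

def estimated_time(work, span, processors):
    # Assumed simple model: T_P is roughly T_1 / P + T_inf (constants omitted).
    return work / processors + span

# Diamond-shaped computation: a -> {b, c} -> d
cost = {"a": 1, "b": 4, "c": 2, "d": 1}
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
work, span = work_and_span(cost, succ)
print(work, span)                     # 8 6
print(estimated_time(work, span, 2))  # 10.0
```

Under this simplified model, reducing either the work or the critical-path length lowers the estimate, which is the sense in which a programmer can reason about performance without considering the scheduler.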
Scheduling Multithreaded Computations by Work Stealing
Cited by 398 (38 self)
This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically,...
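The basic idea of work stealing can be sketched with a toy simulation. Everything here is an assumption made for illustration: tasks are independent and take one time step, all tasks start on worker 0, and an idle worker picks one random victim per step. The scheduler analyzed in the paper also handles spawning and dependencies, which this sketch leaves out.

```python
import random
from collections import deque

def simulate_work_stealing(tasks, num_workers, seed=0):
    """Run unit-time tasks to completion; return (steps, steals, done)."""
    rng = random.Random(seed)
    deques = [deque() for _ in range(num_workers)]
    deques[0].extend(tasks)  # assumption: all work starts on worker 0
    done, steps, steals = [], 0, 0
    while len(done) < len(tasks):
        steps += 1
        for w in range(num_workers):
            if deques[w]:
                # A busy worker executes the task at the bottom of its own deque.
                done.append(deques[w].pop())
            else:
                # An idle worker spends this step trying to steal from the
                # top of a randomly chosen victim's deque.
                victim = rng.randrange(num_workers)
                if victim != w and deques[victim]:
                    deques[w].append(deques[victim].popleft())
                    steals += 1
    return steps, steals, done

steps, steals, done = simulate_work_stealing(list(range(20)), num_workers=4)
print(steps, steals, sorted(done) == list(range(20)))
```

The owner and the thieves operate on opposite ends of the deque, which is the usual arrangement in work-stealing schedulers.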
The Implementation of the Cilk-5 Multithreaded Language
In Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, 1998
Cited by 320 (25 self)
The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear strategy that arose from a theoretical analysis of the scheduling algorithm: concentrate on minimizing overheads that contribute to the work, even at the expense of overheads that contribute to the critical path. Although it may seem counterintuitive to move overheads onto the critical path, this "work-first" principle has led to a portable Cilk-5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared to equivalent C programs. This paper describes how the work-first principle was exploited in the design...
The Merge/Purge Problem for Large Databases
In Proceedings of the 1995 ACM SIGMOD, 1995
Cited by 300 (3 self)
Many commercial organizations routinely gather large numbers of databases for various marketing and business analysis functions. The task is to correlate information from different databases by identifying distinct individuals that appear in a number of different databases, typically in an inconsistent and often incorrect fashion. The problem we study here is the task of merging data from multiple sources in as efficient a manner as possible, while maximizing the accuracy of the result. We call this the merge/purge problem. In this paper we detail the sorted neighborhood method that is used by some to solve merge/purge and present experimental results that demonstrate this approach may work well in practice but at great expense. An alternative method based upon clustering is also presented with a comparative evaluation to the sorted neighborhood method. We show a means of improving the accuracy of the results based upon a multi-pass approach that succeeds by computing the Transitive Clos...
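A minimal sketch of the sorted neighborhood idea, assuming the usual formulation: sort the records on a key, compare each record only with the few records just before it in sorted order, and merge matches with a union-find structure so that the result is transitively closed. The record layout, the key, the matching rule, and the window size are all hypothetical choices for the example, not taken from the paper.

```python
def sorted_neighborhood(records, key, is_match, window=3):
    """Group records that match within a sliding window over the sorted order."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for pos, i in enumerate(order):
        # Compare only with the previous (window - 1) records in sorted order.
        for j in order[max(0, pos - window + 1):pos]:
            if is_match(records[i], records[j]):
                parent[find(i)] = find(j)  # union keeps groups transitively closed

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(records[i])
    return sorted(groups.values())

records = [("Jon Smith", "10027"), ("John Smith", "10027"), ("Mary Jones", "94110"),
           ("J. Smith", "10027"), ("Mary Jones", "94110")]
last_name_zip = lambda r: (r[0].split()[-1].lower(), r[1])   # hypothetical sort key
same_person = lambda a, b: last_name_zip(a) == last_name_zip(b)  # hypothetical rule
for group in sorted_neighborhood(records, last_name_zip, same_person, window=2):
    print(group)
```

With a window of 2, "Jon Smith" and "J. Smith" are never compared directly; they end up in one group only through the transitive step, which is why a small window can still recover larger groups.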
APPROXIMATION ALGORITHMS FOR SCHEDULING UNRELATED PARALLEL MACHINES
1990
Cited by 206 (6 self)
We consider the following scheduling problem. There are m parallel machines and n independent jobs. Each job is to be assigned to one of the machines. The processing of job j on machine i requires time p_ij. The objective is to find a schedule that minimizes the makespan. Our main result is a polynomial algorithm which constructs a schedule that is guaranteed to be no longer than twice the optimum. We also present a polynomial approximation scheme for the case that the number of machines is fixed. Both approximation results are corollaries of a theorem about the relationship of a class of integer programming problems and their linear programming relaxations. In particular, we give a polynomial method to round the fractional extreme points of the linear program to integral points that nearly satisfy the constraints. In contrast to our main result, we prove that no polynomial algorithm can achieve a worst-case ratio less than 3/2 unless P = NP. We finally obtain a complexity classification for all special cases with a fixed number of processing times.
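To make the objective concrete, the sketch below evaluates the makespan of an assignment and finds the optimum of a tiny instance by exhaustive search. This only illustrates the problem definition; it is exponential in the number of jobs and is not the paper's linear-programming rounding algorithm. The matrix `p` and the function names are made up for the example.

```python
from itertools import product

def makespan(p, assignment):
    """p[i][j] is the time of job j on machine i; assignment[j] is job j's machine."""
    loads = [0] * len(p)
    for j, i in enumerate(assignment):
        loads[i] += p[i][j]
    return max(loads)

def optimal_makespan(p):
    # Try every assignment of n jobs to m machines (m**n candidates).
    m, n = len(p), len(p[0])
    return min(makespan(p, a) for a in product(range(m), repeat=n))

p = [[2, 3, 8],   # processing times on machine 0
     [4, 1, 3]]   # processing times on machine 1
print(optimal_makespan(p))  # 4
```

A 2-approximation in the sense of the abstract would be guaranteed to return a schedule of length at most 8 on this instance.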
The Structure and Complexity of Nash Equilibria for a Selfish Routing Game
2002
Cited by 101 (22 self)
In this work, we study the combinatorial structure and the computational complexity of Nash equilibria for a certain game that models selfish routing over a network consisting of m parallel links. We assume a collection of n users, each employing a mixed strategy, which is a probability distribution over links, to control the routing of its own assigned traffic. In a Nash equilibrium, each user selfishly routes its traffic on those links that minimize its expected latency cost, given the network congestion caused by the other users. The social cost of a Nash equilibrium is the expectation, over all random choices of the users, of the maximum, over all links, latency through a link.
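The social cost defined in the last sentence can be computed directly for small instances by enumerating every combination of link choices. In this sketch the latency of a link is assumed to equal the total traffic routed on it (identical links); the abstract does not specify the latency function, and the function name and arguments are illustrative.

```python
from itertools import product

def social_cost(weights, strategies):
    """Expected maximum link load under independent mixed strategies.

    weights[u]: traffic of user u
    strategies[u][l]: probability that user u routes on link l
    """
    n, m = len(weights), len(strategies[0])
    total = 0.0
    for choice in product(range(m), repeat=n):
        prob = 1.0
        loads = [0.0] * m
        for u, link in enumerate(choice):
            prob *= strategies[u][link]
            loads[link] += weights[u]
        # Assumed latency model: a link's latency is its total traffic.
        total += prob * max(loads)
    return total

# Two unit-traffic users, two links, each user choosing a link uniformly at random.
print(social_cost([1, 1], [[0.5, 0.5], [0.5, 0.5]]))  # 1.5
```

The enumeration visits m**n outcomes, so it is only practical for very small n and m.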
Online Load Balancing
Theoretical Computer Science, 1992
Cited by 100 (15 self)
We survey online load balancing on various models. The machine load balancing problem is defined as follows: There are n parallel machines and a number of independent tasks (jobs); the tasks arrive at arbitrary times, where each task has an associated load vector and duration. A task has to be assigned immediately to exactly one of the machines, thereby increasing the load on this machine by the amount specified by the corresponding coordinate of the load vector for the duration of the task. All tasks must be assigned, i.e., no admission control is allowed. The goal is usually to minimize the maximum load, but we also consider other goal functions. We mainly consider nonpreemptive load balancing, but in some cases we may allow preemption, i.e., reassignment of tasks. All the decisions are made by a centralized controller. The online load balancing problem naturally arises in many applications involving allocation of resources. As a simple concrete example,...
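The problem statement can be illustrated with a simple greedy rule: assign each arriving task to the machine whose load after the assignment would be smallest. This is a sketch under simplifying assumptions (tasks never depart, so durations are ignored, and ties go to the lowest-numbered machine); it is one natural strategy and is not presented here as the survey's algorithm.

```python
def greedy_assign(num_machines, tasks):
    """tasks: load vectors (one entry per machine) in arrival order."""
    loads = [0.0] * num_machines
    assignment = []
    for vec in tasks:
        # Each task must be placed immediately, without knowledge of later tasks.
        best = min(range(num_machines), key=lambda i: loads[i] + vec[i])
        loads[best] += vec[best]
        assignment.append(best)
    return assignment, max(loads)

tasks = [(2, 3), (2, 1), (4, 4), (1, 5)]
print(greedy_assign(2, tasks))  # ([0, 1, 1, 0], 5.0)
```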
Practical Skew Handling in Parallel Joins
IN PROCEEDINGS OF THE 18TH VLDB CONFERENCE, 1992
Cited by 99 (8 self)
We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data. The main idea is to use multiple algorithms, each specialized for a different degree of skew, and to use a small sample of the relations being joined to determine which algorithm is appropriate. We developed, implemented, and experimented with four new skew-handling parallel join algorithms; one, which we call virtual processor range partitioning, was the clear winner in high skew cases, while traditional hybrid hash join was the clear winner in lower skew or no skew cases. We present experimental results from an implementation of all four algorithms on the Gamma parallel database machine. To our knowledge, these are the first reported skew-handling numbers from an actual implementation.
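The sampling idea can be sketched as follows: draw a small random sample of join-attribute values, estimate how dominant the most frequent value is, and pick an algorithm accordingly. The skew measure, the threshold of 0.2, and the sample size are hypothetical; the abstract does not say how the sample is turned into a decision. The input is assumed to be a non-empty list.

```python
import random
from collections import Counter

def choose_join_algorithm(join_keys, sample_size=100, skew_threshold=0.2, seed=0):
    """Pick a join strategy from a random sample of the join-attribute values."""
    rng = random.Random(seed)
    sample = rng.sample(join_keys, min(sample_size, len(join_keys)))
    # Fraction of the sample taken up by the single most frequent value.
    top_fraction = Counter(sample).most_common(1)[0][1] / len(sample)
    if top_fraction >= skew_threshold:  # hypothetical cutoff
        return "virtual processor range partitioning"
    return "hybrid hash join"

uniform_keys = list(range(1000))
skewed_keys = [7] * 900 + list(range(100))
print(choose_join_algorithm(uniform_keys))
print(choose_join_algorithm(skewed_keys))
```

Because only the sample is inspected, the cost of the decision is independent of the size of the relations.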
On The Granularity And Clustering Of Directed Acyclic Task Graphs
IEEE Transactions on Parallel and Distributed Systems, 1990
Cited by 98 (20 self)
Clustering has been used as a compile-time preprocessing step in the scheduling of task graphs on parallel architectures. A special case of the clustering problem arises in scheduling an unbounded number of completely connected processors. Using a generalization of Stone's granularity definition, the impact of the granularity on clustering strategies is analyzed. A clustering is called linear if every cluster is one simple directed path in the task graph; otherwise it is called nonlinear. For coarse grain directed acyclic task graphs (DAGs), a completely connected architecture with an unbounded number of processors, and under the assumption that task duplication is not allowed, the following property is shown: For every nonlinear clustering there exists a linear clustering with less than or equal parallel time. This property, along with a performance bound for linear clustering algorithms, shows that linear clustering is the best choice for coarse grain DAGs. It provides a theoretical justificati...
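The definition of a linear clustering can be checked mechanically. The sketch below treats a cluster as linear when the longest directed path that stays inside the cluster visits every task in it; in a DAG this is equivalent to the cluster's tasks forming one simple directed path. The edge-set representation and the function name are assumptions for the example.

```python
from functools import lru_cache

def is_linear_clustering(edges, clusters):
    """True if every cluster's tasks lie on one simple directed path of the DAG.

    edges: set of (u, v) pairs of the task graph
    clusters: list of lists of tasks
    """
    for cluster in clusters:
        members = set(cluster)
        if not members:
            continue
        succ = {u: [v for (a, v) in edges if a == u and v in members] for u in members}

        @lru_cache(maxsize=None)
        def longest(u):
            # Number of tasks on the longest path from u that stays in the cluster.
            return 1 + max((longest(v) for v in succ[u]), default=0)

        if max(longest(u) for u in members) != len(members):
            return False
    return True

# Diamond-shaped task graph: a -> {b, c} -> d
edges = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
print(is_linear_clustering(edges, [["a", "b", "d"], ["c"]]))  # True
print(is_linear_clustering(edges, [["a", "b", "c"], ["d"]]))  # False
```

The second clustering is nonlinear because b and c are independent tasks placed in the same cluster.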
Provably efficient scheduling for languages with fine-grained parallelism
IN PROC. SYMPOSIUM ON PARALLEL ALGORITHMS AND ARCHITECTURES, 1995
Cited by 82 (25 self)
Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or larger more space than a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any...