Results 1 - 10
of
259
Cilk: An Efficient Multithreaded Runtime System
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1995
"... Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a C ..."
Abstract
-
Cited by 430 (34 self)
- Add to MetaCart
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk
Scheduling Multithreaded Computations by Work Stealing
"... This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computa ..."
Abstract
-
Cited by 316 (32 self)
- Add to MetaCart
This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically,
The Merge/Purge Problem for Large Databases
- In Proceedings of the 1995 ACM SIGMOD
, 1995
"... Many commercial organizations routinely gather large numbers of databases for various marketing and business analysis functions. The task is to correlate information from different databases by identifying distinct individuals that appear in a number of different databases typically in an inconsiste ..."
Abstract
-
Cited by 254 (3 self)
- Add to MetaCart
Many commercial organizations routinely gather large numbers of databases for various marketing and business analysis functions. The task is to correlate information from different databases by identifying distinct individuals that appear in a number of different databases typically in an inconsistent and often incorrect fashion. The problem we study here is the task of merging data from multiple sources in as efficient manner as possible, while maximizing the accuracy of the result. We call this the merge/purge problem. In this paper we detail the sorted neighborhood method that is used by some to solve merge/purge and present experimental results that demonstrates this approach may work well in practice but at great expense. An alternative method based upon clustering is also presented with a comparative evaluation to the sorted neighborhood method. We show a means of improving the accuracy of the results based upon a multi-pass approach that succeeds by computing the Transitive Clos...
The Implementation of the Cilk-5 Multithreaded Language
- In Proceedings of the SIGPLAN '98 Conference on Program Language Design and Implementation
, 1998
"... The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear st ..."
Abstract
-
Cited by 248 (20 self)
- Add to MetaCart
The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear strategy that arose from a theoretical analysis of the scheduling algorithm: concentrate on minimizing overheads that contribute to the work, even at the expense of overheads that contribute to the critical path. Although it may seem counterintuitive to move overheads onto the critical path, this "work-first" principle has led to a portable Cilk-5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared to equivalent C programs. This paper describes how the work-first principle was exploited in the design...
APPROXIMATION ALGORITHMS FOR SCHEDULING UNRELATED PARALLEL MACHINES
, 1990
"... We consider the following scheduling problem. There are m parallel machines and n independent.jobs. Each job is to be assigned to one of the machines. The processing of.job j on machine i requires time Pip The objective is to lind a schedule that minimizes the makespan. Our main result is a polynomi ..."
Abstract
-
Cited by 178 (6 self)
- Add to MetaCart
We consider the following scheduling problem. There are m parallel machines and n independent.jobs. Each job is to be assigned to one of the machines. The processing of.job j on machine i requires time Pip The objective is to lind a schedule that minimizes the makespan. Our main result is a polynomial algorithm which constructs a schedule that is guaranteed to be no longer than twice the optimum. We also present a polynomial approximation scheme for the case that the number of machines is fixed. Both approximation results are corollaries of a theorem about the relationship of a class of integer programming problems and their linear programming relaxations. In particular, we give a polynomial method to round the fractional extreme points of the linear program to integral points that nearly satisfy the constraints. In contrast to our main result, we prove that no polynomial algorithm can achieve a worst-case ratio less than ~ unless P = NP. We finally obtain a complexity classification for all special cases with a fixed number of processing times.
Scheduling algorithms and operating systems support for real-time systems
- Proceedings of the IEEE
, 1994
"... This paper summarizes the state of the real-time field in the areas of scheduling and operating system kernels. Given the vast amount of work that has been done by both the operations research and computer science communities in the scheduling area, we discuss four paradigms underlying the schedulin ..."
Abstract
-
Cited by 101 (1 self)
- Add to MetaCart
This paper summarizes the state of the real-time field in the areas of scheduling and operating system kernels. Given the vast amount of work that has been done by both the operations research and computer science communities in the scheduling area, we discuss four paradigms underlying the scheduling approaches and present several exemplars of each. The four paradigms are: static table-driven scheduling, static priority preemptive scheduling, dynamic planning-based scheduling, and dynamic best efSort scheduling. In the operating system context, we argue that most of the proprietary commercial kernels as well as real-time extensions to time-sharing operating system kernels do not fit the needs of predictable real-time systems. We discuss several research kernels that are currently being built to explicitly meet the needs of real-time applications. I.
On-line Load Balancing
- Theoretical Computer Science
, 1992
"... . We survey on-line load balancing on various models. 1 Introduction General: The machine load balancing problem is defined as follows: There are n parallel machines and a number of independent tasks (jobs); the tasks arrive at arbitrary times, where each task has an associated load vector and dur ..."
Abstract
-
Cited by 96 (16 self)
- Add to MetaCart
. We survey on-line load balancing on various models. 1 Introduction General: The machine load balancing problem is defined as follows: There are n parallel machines and a number of independent tasks (jobs); the tasks arrive at arbitrary times, where each task has an associated load vector and duration. A task has to be assigned immediately to exactly one of the machines, thereby increasing the load on this machine by the amount specified by the corresponding coordinate of the load vector for the duration of the task. All tasks must be assigned, i.e., no admission control is allowed. The goal is usually to minimize the maximumload, but we also consider other goal functions. We mainly consider non-preemptive load balancing, but in some cases we may allow preemption i.e., reassignments of tasks. All the decisions are made by a centralized controller. The online load balancing problem naturally arises in many applications involving allocation of resources. As a simple concrete example,...
On The Granularity And Clustering Of Directed Acyclic Task Graphs
- IEEE Transactions on Parallel and Distributed Systems
, 1990
"... Clustering has been used as a compile time pre-processing step in the scheduling of task graphs on parallel architectures. A special case of the clustering problem arises in scheduling an unbounded number of completely connected processors. Using a generalization of Stone's granularity definition, t ..."
Abstract
-
Cited by 92 (20 self)
- Add to MetaCart
Clustering has been used as a compile time pre-processing step in the scheduling of task graphs on parallel architectures. A special case of the clustering problem arises in scheduling an unbounded number of completely connected processors. Using a generalization of Stone's granularity definition, the impact of the granularity on clustering strategies is analyzed. A clustering is called linear if every cluster is one simple directed path in the task graph; otherwise is called nonlinear. For coarse grain directed acyclic task graphs (DAGs), a completely connected architecture with unbounded number of processors and under the assumption that task duplication is not allowed, the following property is shown: For every nonlinear clustering there exists a linear clustering with less or equal parallel time. This property, along with a performance bound for linear clustering algorithms, shows that linear clustering is the best choice for coarse grain DAGs. It provides a theoretical justificati...
Practical Skew Handling in Parallel Joins
- IN PROCEEDINGS OF THE 18TH VLDB CONFERENCE
, 1992
"... We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data. The main idea is to use multiple algorithms, each ..."
Abstract
-
Cited by 85 (8 self)
- Add to MetaCart
We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data. The main idea is to use multiple algorithms, each specialized for a di erent degree of skew, and to use a small sample of the relations being joined to determine which algorithm is appropriate. We developed, implemented, and experimented with four new skew-handling parallel join algorithms; one, which wecall virtual processor range partitioning, was the clear winner in high skew cases, while traditional hybrid hash join was the clear winner in lower skew or no skew cases. We present experimental results from an implementation of all four algorithms on the Gamma parallel database machine. To our knowledge, these are the rst reported skew-handling numbers from an actual implementation.
Models of Machines and Computation for Mapping in Multicomputers
, 1993
"... It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonly-accepted framework ..."
Abstract
-
Cited by 76 (1 self)
- Add to MetaCart
It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonly-accepted framework whereby results in the field can be compared. Nor is it always easy to assess the relevance of a new result to a particular problem. Furthermore, changes in parallel computing technology have made some of the earlier work of less relevance to current multiprocessor systems. Versions of the mapping problem are classified, and research in the field is considered in terms of its relevance to the problem of programming currently available hardware in the form of a distributed memory multiple instruction stream multiple data stream computer: a multicomputer.

