Results 1-10 of 352
Introduction to Algorithms
, 2009
Cited by 9061 (56 self)
We present a technique for analyzing the number of cache misses incurred by multithreaded cache-oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache-oblivious matrix multiplication incurs $O(n^3/\sqrt{Z} + (Pn)^{1/3} n^2)$ cache misses when executed by the Cilk scheduler on a machine with $P$ processors, each with a cache of size $Z$, with high probability. This bound is tighter than previously published bounds. We also present a new multithreaded cache-oblivious algorithm for 1D stencil computations, which incurs $O(n^2/Z + n + \sqrt{P n^{3+\epsilon}})$ cache misses with high probability.
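As a rough illustration of what "cache-oblivious matrix multiplication" means, the sketch below uses the standard divide-and-conquer recursion: each product is split into eight half-size products, and the code never refers to the cache size $Z$. This is a serial sketch under stated assumptions, not the multithreaded Cilk algorithm analyzed in the abstract; the function names and the power-of-two restriction are choices made here for brevity.

```python
def matmul_add(C, A, B, ci, cj, ai, aj, bi, bj, n):
    """Add the n x n product of two sub-blocks of A and B into a sub-block of C.

    (ci, cj), (ai, aj), (bi, bj) are the top-left corners of the sub-blocks.
    """
    if n == 1:
        C[ci][cj] += A[ai][aj] * B[bi][bj]
        return
    h = n // 2
    # Eight half-size products. No tuning parameter depends on the cache size.
    for di in (0, h):
        for dj in (0, h):
            for dk in (0, h):
                matmul_add(C, A, B,
                           ci + di, cj + dj,
                           ai + di, aj + dk,
                           bi + dk, bj + dj, h)


def cache_oblivious_matmul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    # Assumption of this sketch: n is a power of two.
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    C = [[0] * n for _ in range(n)]
    matmul_add(C, A, B, 0, 0, 0, 0, 0, 0, n)
    return C
```

A practical version would stop the recursion at a larger base case and fall back to a plain triple loop; the base case of 1 is kept here only to make the recursion easy to read.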
Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment
, 1973
Cited by 3236 (2 self)
The problem of multiprogram scheduling on a single processor is studied from the viewpoint...
Proof verification and hardness of approximation problems
 IN PROC. 33RD ANN. IEEE SYMP. ON FOUND. OF COMP. SCI
, 1992
Cited by 723 (45 self)
We show that every language in NP has a probabilistic verifier that checks membership proofs for it using a logarithmic number of random bits and by examining a constant number of bits in the proof. If a string is in the language, then there exists a proof such that the verifier accepts with probability 1 (i.e., for every choice of its random string). For strings not in the language, the verifier rejects every provided "proof" with probability at least 1/2. Our result builds upon and improves a recent result of Arora and Safra [6] whose verifiers examine a nonconstant number of bits in the proof (though this number is a very slowly growing function of the input length). As a consequence we prove that no MAX SNP-hard problem has a polynomial time approximation scheme, unless NP = P. The class MAX SNP was defined by Papadimitriou and Yannakakis [82] and hard problems for this class include vertex cover, maximum satisfiability, maximum cut, metric TSP, Steiner trees and shortest superstring. We also improve upon the clique hardness results of Feige, Goldwasser, Lovász, Safra and Szegedy [42], and Arora and Safra [6] and show that there exists a positive $\epsilon$ such that approximating the maximum clique size in an $N$-vertex graph to within a factor of $N^{\epsilon}$ is NP-hard.
Cilk: An Efficient Multithreaded Runtime System
 JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1995
Cited by 582 (38 self)
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk
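To make "work" and "critical-path length" concrete, the sketch below computes both for a small task graph: work is the total cost of all tasks, and the critical-path length is the cost of the longest dependency chain. The helper names are hypothetical, and the estimate `work / processors + span` is an assumed simple form of such a performance model, not a formula quoted from the abstract.

```python
from functools import lru_cache


def work_and_span(cost, deps):
    """Return (work, critical-path length) of a task DAG.

    cost: dict mapping each task to its running time.
    deps: dict mapping a task to the list of tasks it must wait for.
    """
    work = sum(cost.values())

    @lru_cache(maxsize=None)
    def finish(task):
        # Earliest finish time of `task` given unlimited processors.
        return cost[task] + max((finish(p) for p in deps.get(task, ())), default=0)

    span = max(finish(task) for task in cost)
    return work, span


def estimate_time(work, span, processors):
    # Assumed model for illustration only: perfectly shared work plus the span.
    return work / processors + span
```

For example, a diamond-shaped graph `a -> {b, c} -> d` with costs 1, 2, 3, 1 has work 7 and critical-path length 5, so adding processors cannot bring the running time below 5.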
Scheduling Multithreaded Computations by Work Stealing
Cited by 429 (39 self)
This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically,
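The idea of work stealing can be sketched with a small round-based simulation: each worker keeps its own deque of tasks, works from one end of it, and when idle tries to take a task from the other end of another worker's deque. The code below is a sequential toy simulation, not a parallel scheduler and not the scheduler analyzed in the paper; the random choice of victim, the `grain` cutoff, and all names are assumptions made for illustration.

```python
import random
from collections import deque


def work_stealing_sum(data, workers=4, grain=4, seed=0):
    """Sum `data` by recursive splitting, simulating work stealing in rounds.

    Returns (total, number_of_successful_steals).
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(workers)]
    deques[0].append((0, len(data)))  # all work starts on worker 0
    total = 0
    steals = 0
    while any(deques):
        for w in range(workers):
            if deques[w]:
                # The owner takes the most recently pushed task.
                lo, hi = deques[w].pop()
                if hi - lo <= grain:
                    total += sum(data[lo:hi])
                else:
                    mid = (lo + hi) // 2
                    deques[w].append((lo, mid))
                    deques[w].append((mid, hi))
            else:
                # An idle worker picks a victim at random (assumed policy)
                # and steals the oldest task from the opposite end.
                victim = rng.randrange(workers)
                if victim != w and deques[victim]:
                    deques[w].append(deques[victim].popleft())
                    steals += 1
    return total, steals
```

Stealing from the end opposite to the owner tends to move the largest remaining pieces of work, which is why the two ends of the deque are treated differently here.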
Polynomial time approximation schemes for Euclidean TSP and other geometric problems
 In Proceedings of the 37th IEEE Symposium on Foundations of Computer Science (FOCS’96)
, 1996
Cited by 336 (3 self)
We present a polynomial time approximation scheme for Euclidean TSP in fixed dimensions. For every fixed $c > 1$ and given any $n$ nodes in $\mathbb{R}^2$, a randomized version of the scheme finds a $(1 + 1/c)$-approximation to the optimum traveling salesman tour in $O(n (\log n)^{O(c)})$ time. When the nodes are in $\mathbb{R}^d$, the running time increases to $O(n (\log n)^{(O(\sqrt{d}\,c))^{d-1}})$. For every fixed $c, d$ the running time is $n \cdot \mathrm{poly}(\log n)$, that is, nearly linear in $n$. The algorithm can be derandomized, but this increases the running time by a factor $O(n^d)$. The previous best approximation algorithm for the problem (due to Christofides) achieves a 3/2-approximation in polynomial time. We also give similar approximation schemes for some other NP-hard Euclidean problems: Minimum Steiner Tree, $k$-TSP, and $k$-MST. (The running times of the algorithm for $k$-TSP and $k$-MST involve an additional multiplicative factor $k$.) The previous best approximation algorithms for all these problems achieved a constant-factor approximation. We also give efficient approximation schemes for Euclidean Min-Cost Matching, a problem that can be solved exactly in polynomial time. All our algorithms also work, with almost no modification, when distance is measured using any geometric norm (such as $\ell_p$ for $p \ge 1$ or other Minkowski norms). They also have simple parallel (i.e., NC) implementations.
Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors
, 1999
Cited by 236 (4 self)
Devices]: Modes of Computation - Parallelism and concurrency. General Terms: Algorithms, Design, Performance, Theory. Additional Key Words and Phrases: Automatic parallelization, DAG, multiprocessors, parallel processing, software tools, static scheduling, task graphs. This research was supported by the Hong Kong Research Grants Council under contract numbers HKUST 734/96E, HKUST 6076/97E, and HKU 7124/99E. Authors' addresses: Y.K. Kwok, Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong; email: ykwok@eee.hku.hk; I. Ahmad, Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. Permission to make digital / hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and / or a fee. 2000 ACM 0360-0300/99/1200-0406 $5.00. ACM Computing Surveys, Vol. 31, No. 4, December 1999.
Scheduling to Minimize Average Completion Time: Offline and Online Algorithms
, 1996
Cited by 204 (27 self)
Time-indexed linear programming formulations have recently received a great deal of attention for their practical effectiveness in solving a number of single-machine scheduling problems. We show that these formulations are also an important tool in the design of approximation algorithms with good worst-case performance guarantees. We give simple new rounding techniques to convert an optimal fractional solution into a feasible schedule for which we can prove a constant-factor performance guarantee, thereby giving the first theoretical evidence of the strength of these relaxations. Specifically, we consider the problem of minimizing the total weighted job completion time on a single machine subject to precedence constraints, and give a polynomial-time $(4 + \epsilon)$-approximation algorithm, for any $\epsilon > 0$; the best previously known guarantee for this problem was superlogarithmic. With somewhat larger constants, we also show how to extend this result to the case with release date constraints, ...
New Algorithms for an Ancient Scheduling Problem
, 1992
Cited by 92 (4 self)
We consider the online version of the original $m$-machine scheduling problem: given $m$ machines and $n$ positive real jobs, schedule the $n$ jobs on the $m$ machines so as to minimize the makespan, the completion time of the last job. In the online version, as soon as job $j$ arrives, it must be assigned immediately to one of the $m$ machines. We present two main results. The first is a $(2 - \epsilon)$-competitive deterministic algorithm for all $m$. The competitive ratio of all previous algorithms approaches 2 as $m \to \infty$. Indeed, the problem of improving the competitive ratio for large $m$ had been open since 1966, when the first algorithm for this problem appeared. The second result is an optimal randomized algorithm for the case $m = 2$. To the best of our knowledge, our 4/3-competitive algorithm is the first specifically randomized algorithm for the original, $m$-machine, online scheduling problem.
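The online setting described above can be illustrated with the classical greedy rule, usually known as list scheduling: each job, on arrival, goes to the machine with the smallest current load. This is a baseline sketch only, not the $(2 - \epsilon)$-competitive algorithm or the randomized algorithm from the paper; the function name and the tie-breaking by machine index are assumptions made here.

```python
import heapq


def list_schedule(jobs, m):
    """Assign each job, in arrival order, to the currently least-loaded machine.

    jobs: iterable of positive processing times, in arrival order.
    m: number of machines.
    Returns (assignment, makespan), where assignment[j] is the machine of job j.
    """
    # Min-heap of (current load, machine index); ties go to the lower index.
    loads = [(0.0, i) for i in range(m)]
    heapq.heapify(loads)
    assignment = []
    for p in jobs:
        load, i = heapq.heappop(loads)
        assignment.append(i)
        heapq.heappush(loads, (load + p, i))
    makespan = max(load for load, _ in loads)
    return assignment, makespan
```

With jobs `[1, 1, 2]` on two machines, the greedy rule places the long job on top of a short one and finishes at time 3, whereas an offline schedule that puts the two short jobs together finishes at time 2.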