Provably efficient scheduling for languages with fine-grained parallelism
 IN PROC. SYMPOSIUM ON PARALLEL ALGORITHMS AND ARCHITECTURES
, 1995
Cited by 82 (25 self)
Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or larger more space than a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any
An Optimal Parallel Algorithm for Formula Evaluation
, 1992
Cited by 43 (6 self)
A new approach to Buss’s NC¹ algorithm [Proc. 19th ACM Symposium on Theory of Computing, Association for Computing Machinery, New York, 1987, pp. 123-131] for evaluation of Boolean formulas is presented. This problem is shown to be complete for NC¹ over AC⁰ reductions. This approach is then used to solve the more general problem of evaluating arithmetic formulas by using arithmetic circuits.
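To make the formula evaluation problem in the abstract above concrete, here is a minimal Python sketch of a plain sequential evaluator. It is not the NC¹ algorithm the paper describes; the names `Var`, `Not`, `And`, `Or`, `evaluate`, and `depth` are hypothetical and chosen only for illustration. The `depth` helper shows why the naive approach is slow in parallel: it needs as many dependent steps as the formula is deep, which can be linear in the formula size.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


# Illustrative formula type: a tree in which every gate feeds exactly one parent.
Formula = Union[Var, Not, And, Or]


def evaluate(f: Formula, env: Dict[str, bool]) -> bool:
    """Evaluate a Boolean formula by straightforward recursion (sequential baseline)."""
    if isinstance(f, Var):
        return env[f.name]
    if isinstance(f, Not):
        return not evaluate(f.arg, env)
    if isinstance(f, And):
        return evaluate(f.left, env) and evaluate(f.right, env)
    if isinstance(f, Or):
        return evaluate(f.left, env) or evaluate(f.right, env)
    raise TypeError(f"unknown formula node: {f!r}")


def depth(f: Formula) -> int:
    """Length of the longest gate chain; a level-by-level evaluation needs this many rounds."""
    if isinstance(f, Var):
        return 0
    if isinstance(f, Not):
        return 1 + depth(f.arg)
    return 1 + max(depth(f.left), depth(f.right))


# Example formula: (x AND NOT y) OR z
example = Or(And(Var("x"), Not(Var("y"))), Var("z"))
```

Calling `evaluate(example, {"x": True, "y": False, "z": False})` evaluates the example formula under one assignment, and `depth(example)` reports how many dependent rounds a level-by-level evaluation would take.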
A Compendium of Problems Complete for P
, 1991
Cited by 7 (1 self)
This paper serves two purposes. Firstly, it is an elementary introduction to the theory of P-completeness, the branch of complexity theory that focuses on identifying the problems in the class P that are "hardest," in the sense that they appear to lack highly parallel solutions. That is, they do not have parallel solutions using time polynomial in the logarithm of the problem size and a polynomial number of processors unless all problems in P have such solutions, or equivalently, unless P = NC. Secondly, this paper is a reference work of P-complete problems. We present a compilation of the known P-complete problems, including several unpublished or new P-completeness results, and many open problems. This is a preliminary version, mainly containing the problem list. The latest version of this document is available in electronic form by anonymous ftp from thorhild.cs.ualberta.ca (129.128.4.53) as either a compressed dvi file (TR9111.dvi.Z) or as a compressed postscript fi...
On Separators, Segregators and Time versus Space
Cited by 6 (0 self)
We give the first extension of the result due to Paul, Pippenger, Szemeredi and Trotter [24] that deterministic linear time is distinct from nondeterministic linear time. We show that NTIME(n
The Size and Depth of Layered Boolean Circuits
Cited by 3 (2 self)
We consider the relationship between size and depth for layered Boolean circuits, synchronous circuits and planar circuits as well as classes of circuits with small separators. In particular, we show that every layered Boolean circuit of size s can be simulated by a layered Boolean circuit of depth O(√s log s). For planar circuits and synchronous circuits of size s, we obtain simulations of depth O(√s). The best known result so far was by Paterson and Valiant [16], and Dymond and Tompa [6], which holds for general Boolean circuits and states that D(f) = O(C(f) / log C(f)), where C(f) and D(f) are the minimum size and depth, respectively, of Boolean circuits computing f. The proof of our main result uses an adaptive strategy based on the two-person pebble game introduced by Dymond and Tompa [6]. Improving any of our results by polylog factors would immediately improve the bounds for general circuits. Key words: Boolean circuits, circuit size, circuit depth, pebble games
Parallelism Always Helps
 SIAM J. Comput
, 1997
Cited by 2 (0 self)
It is shown that every unit-cost random-access machine (RAM) that runs in time T can be simulated by a concurrent-read exclusive-write parallel random-access machine (CREW PRAM) in time O(T^(1/2) log T). The proof is constructive; thus it gives a mechanical way to translate any sequential algorithm designed to run on a unit-cost RAM into a parallel algorithm that runs on a CREW PRAM and obtain a nearly quadratic speedup. One implication is that there does not exist any recursive function that is "inherently not parallelizable." Key words. computational complexity, time complexity, random-access machine, parallel random-access machine, simulation, speedup AMS subject classifications. 68Q05, 68Q10, 68Q15, 03D10, 03D15 PII. S0097539794265402 1. Introduction. 1.1. Motivation. For some problems, the direct parallelization of a sequential algorithm gives a faster parallel algorithm. An example is matrix multiplication. The brute-force sequential algorithm for matrix multiplication runs ...
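As a rough numerical illustration of the "nearly quadratic speedup" claimed in the abstract above, the following Python sketch tabulates T^(1/2) log T against T. It assumes the hidden constant in the O(·) bound is 1 and uses a base-2 logarithm; both are assumptions made only for illustration, and the function name `crew_pram_time_bound` is hypothetical. The abstract does not state the constants or the processor count, so the numbers show the shape of the bound, not real running times.

```python
import math


def crew_pram_time_bound(T: int) -> float:
    """Return T^(1/2) * log2(T), i.e. the stated bound with the constant assumed to be 1."""
    if T < 1:
        raise ValueError("T must be a positive number of sequential steps")
    return math.sqrt(T) * math.log2(T)


if __name__ == "__main__":
    # Compare sequential time T with the simulated parallel time bound.
    for T in (10**4, 10**6, 10**8):
        bound = crew_pram_time_bound(T)
        print(f"T = {T:>11,}  bound ~ {bound:>12,.0f}  ratio T/bound ~ {T / bound:,.1f}")
```

The ratio T / bound grows roughly like T^(1/2) / log T, which is why the speedup is described as nearly, rather than exactly, quadratic.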
Automated proofs of time lower bounds
, 2007
Cited by 2 (1 self)
A fertile area of recent research has demonstrated concrete polynomial time lower bounds for solving natural hard problems on restricted computational models. Among these problems are Satisfiability, Vertex Cover, Hamilton Path, MOD6-SAT, Majority-of-Majority-SAT, and Tautologies, to name a few. These lower bound proofs all follow a certain diagonalization-based proof-by-contradiction strategy. A pressing open problem has been to determine how powerful such proofs can possibly be. We propose an automated theorem-proving methodology for studying these lower bound problems. In particular, we prove that the search for better lower bounds can often be turned into a problem of solving a large series of linear programming instances. We describe an implementation of a small-scale theorem prover and discover surprising experimental results. In some settings, our program provides strong evidence that the best known lower bound proofs are already optimal for the current framework, contradicting the consensus intuition; in others, the program guides us to improved lower bounds where none had been known for years.
Non-Linear Time Lower Bound for (Succinct) Quantified Boolean Formulas
Cited by 2 (2 self)
We give a reduction from arbitrary languages in alternating time t(n) to quantified Boolean formulas (QBF) describable in O(t(n)) bits. The reduction works for a reasonable succinct encoding of Boolean formulas and for several reasonable machine models, including multitape Turing machines and logarithmic-cost RAMs. By a simple diagonalization, it follows that our succinct QBF problem requires superlinear time on those models. To our knowledge this is the first known instance of a nonlinear time lower bound (with no space restriction) for solving a natural linear space problem on a variety of computational models.
A Generalization of Spira’s Theorem and Circuits with Small Segregators or Separators
Cited by 1 (1 self)
Spira [28] showed that any Boolean formula of size s can be simulated in depth O(log s). We generalize Spira’s theorem and show that any Boolean circuit of size s with segregators of size f(s) can be simulated in depth O(f(s) log s). If the segregator size is at least s^ε for some constant ε > 0, then we can obtain a simulation of depth O(f(s)). This improves and generalizes a simulation of polynomial-size Boolean circuits of constant treewidth k in depth O(k² log n) by Jansen and Sarma [17]. Since the existence of small balanced separators in a directed acyclic graph implies that the graph also has small segregators, our results also apply to circuits with small separators. Our results imply that the class of languages computed by nonuniform families of polynomial-size circuits that have constant size segregators equals nonuniform NC¹. Considering space bounded Turing machines to generate the circuits, for f(s) log² s-space uniform families of Boolean circuits our small-depth simulations are also f(s) log² s-space uniform. As a corollary, we show that the Boolean Circuit Value problem for circuits with constant size segregators (or separators) is in deterministic SPACE(log² n). Our results also imply that the Planar Circuit Value problem, which is known to be P-complete [16], can be solved in deterministic SPACE(√n log n). Key words: Boolean circuits, circuit size, circuit depth, Spira’s theorem, Turing machines, space complexity
Parallelizing Time With Polynomial Circuits
, 2005
Cited by 1 (0 self)
We study the relatively old problem of asymptotically reducing the runtime of serial computations with polynomial size Boolean circuits. To the best of our knowledge, no progress on this problem has been formally reported in the literature for general computational models, although we observe that early work of Chandra, Stockmeyer, and Vishkin implies the existence of nonuniform unbounded fan-in circuits of t^O(1) size and O(t / log log n) depth, for time t Turing machines. We give an algorithmic size-depth tradeoff for parallelizing time t random access Turing machines, a model at least as powerful as logarithmic cost RAMs. Our parallel simulation yields logspace-uniform t^O(1) size, O(t / log t) depth Boolean circuits having semi-unbounded fan-in gates. In fact, for appropriate d, uniform t^O(1) · 2^O(t/d) size circuits of depth O(d) can simulate time t. One corollary is that any log-cost time t RAM can be simulated by a log-cost CRCW PRAM using t^O(1) processors and O(t / log t) time. This is a major improvement over previous parallel speedups, which could only guarantee an Ω(log t) speedup with an exponential number of processors.