Results 1  10
of
32
Cilk: An Efficient Multithreaded Runtime System
 JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1995
"... Cilk (pronounced "silk") is a Cbased runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk workstealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "cri ..."
Abstract

Cited by 687 (38 self)
 Add to MetaCart
(Show Context)
Cilk (pronounced "silk") is a Cbased runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk workstealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "criticalpath length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and criticalpath length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (wellstructured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract

Cited by 231 (10 self)
 Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
A new approach to the minimum cut problem
 Journal of the ACM
, 1996
"... Abstract. This paper presents a new approach to finding minimum cuts in undirected graphs. The fundamental principle is simple: the edges in a graph’s minimum cut form an extremely small fraction of the graph’s edges. Using this idea, we give a randomized, strongly polynomial algorithm that finds th ..."
Abstract

Cited by 112 (9 self)
 Add to MetaCart
(Show Context)
Abstract. This paper presents a new approach to finding minimum cuts in undirected graphs. The fundamental principle is simple: the edges in a graph’s minimum cut form an extremely small fraction of the graph’s edges. Using this idea, we give a randomized, strongly polynomial algorithm that finds the minimum cut in an arbitrarily weighted undirected graph with high probability. The algorithm runs in O(n 2 log 3 n) time, a significant improvement over the previous Õ(mn) time bounds based on maximum flows. It is simple and intuitive and uses no complex data structures. Our algorithm can be parallelized to run in �� � with n 2 processors; this gives the first proof that the minimum cut problem can be solved in ���. The algorithm does more than find a single minimum cut; it finds all of them. With minor modifications, our algorithm solves two other problems of interest. Our algorithm finds all cuts with value within a multiplicative factor of � of the minimum cut’s in expected Õ(n 2 � ) time, or in �� � with n 2 � processors. The problem of finding a minimum multiway cut of a graph into r pieces is solved in expected Õ(n 2(r�1) ) time, or in �� � with n 2(r�1) processors. The “trace ” of the algorithm’s execution on these two problems forms a new compact data structure for representing all small cuts and all multiway cuts in a graph. This data structure can be efficiently transformed into the
An analysis of dagconsistent distributed sharedmemory algorithms
 in Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA
, 1996
"... In this paper, we analyze the performance of parallel multithreaded algorithms that use dagconsistent distributed shared memory. Specifically, we analyze execution time, page faults, and space requirements for multithreaded algorithms executed by a FP(C) workstealing thread scheduler and the BACKE ..."
Abstract

Cited by 101 (19 self)
 Add to MetaCart
(Show Context)
In this paper, we analyze the performance of parallel multithreaded algorithms that use dagconsistent distributed shared memory. Specifically, we analyze execution time, page faults, and space requirements for multithreaded algorithms executed by a FP(C) workstealing thread scheduler and the BACKER algorithm for maintaining dag consistency. We prove that if the accesses to the backing store are random and independent (the BACKER algorithm actually uses hashing), the expected execution time TP(C)of a “fully strict” multithreaded computation on P processors, each with a LRU cache of C pages, is O(T1(C)=P+mCT∞), where T1(C)is the total work of the computation including page faults, T ∞ is its criticalpath length excluding page faults, and m is the minimum page transfer time. As
Minimum Cuts in NearLinear Time
, 1999
"... We significantly improve known time bounds for solving the minimum cut problem on undirected graphs. We use a "semiduality" between minimum cuts and maximum spanning tree packings combined with our previously developed random sampling techniques. We give a randomized (Monte Carlo) algorit ..."
Abstract

Cited by 86 (12 self)
 Add to MetaCart
We significantly improve known time bounds for solving the minimum cut problem on undirected graphs. We use a "semiduality" between minimum cuts and maximum spanning tree packings combined with our previously developed random sampling techniques. We give a randomized (Monte Carlo) algorithm that finds a minimum cut in an medge, nvertex graph with high probability in O(m log³ n) time. We also give a simpler randomized algorithm that finds all minimum cuts with high probability in O(n² log n) time. This variant has an optimal RNC parallelization. Both variants improve on the previous best time bound of O(n² log³ n). Other applications of the treepacking approach are new, nearly tight bounds on the number of near minimum cuts a graph may have and a new data structure for representing them in a spaceefficient manner.
Powerlist: a structure for parallel recursion
 ACM Transactions on Programming Languages and Systems
, 1994
"... Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefixsum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic pro ..."
Abstract

Cited by 62 (2 self)
 Add to MetaCart
(Show Context)
Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefixsum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic properties of this data structure can be exploited to derive properties of these algorithms and establish equivalence of different algorithms that solve the same problem.
Abstract state machines capture parallel algorithms
 ACM Transactions on Computational Logic
, 2003
"... Abstract We give an axiomatic description of parallel, synchronous algorithms. Our main result is that every such algorithm can be simulated, step for step, by an abstract state machine with a background that provides for multisets. \Lambda ..."
Abstract

Cited by 61 (24 self)
 Add to MetaCart
Abstract We give an axiomatic description of parallel, synchronous algorithms. Our main result is that every such algorithm can be simulated, step for step, by an abstract state machine with a background that provides for multisets. \Lambda
Logic Programs and Connectionist Networks
 Journal of Applied Logic
, 2004
"... One facet of the question of integration of Logic and Connectionist Systems, and how these can complement each other, concerns the points of contact, in terms of semantics, between neural networks and logic programs. In this paper, we show that certain semantic operators for propositional logic p ..."
Abstract

Cited by 59 (19 self)
 Add to MetaCart
(Show Context)
One facet of the question of integration of Logic and Connectionist Systems, and how these can complement each other, concerns the points of contact, in terms of semantics, between neural networks and logic programs. In this paper, we show that certain semantic operators for propositional logic programs can be computed by feedforward connectionist networks, and that the same semantic operators for firstorder normal logic programs can be approximated by feedforward connectionist networks. Turning the networks into recurrent ones allows one also to approximate the models associated with the semantic operators. Our methods depend on a wellknown theorem of Funahashi, and necessitate the study of when Funahasi's theorem can be applied, and also the study of what means of approximation are appropriate and significant.
Efficient Inference of Object Types
, 1995
"... Abadi and Cardelli have recently investigated a calculus of objects [2]. The calculus supports a key feature of objectoriented languages: an object can be emulated by another object that has more refined methods. Abadi and Cardelli presented four firstorder type systems for the calculus. The simpl ..."
Abstract

Cited by 59 (6 self)
 Add to MetaCart
(Show Context)
Abadi and Cardelli have recently investigated a calculus of objects [2]. The calculus supports a key feature of objectoriented languages: an object can be emulated by another object that has more refined methods. Abadi and Cardelli presented four firstorder type systems for the calculus. The simplest one is based on finite types and no subtyping, and the most powerful one has both recursive types and subtyping. Open until now is the question of type inference, and in the presence of subtyping "the absence of minimum typings poses practical problems for type inference" [2]. In this paper...