Results 1–10 of 12
Implementing an Irregular Application on a Distributed Memory Multiprocessor
 In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, 1993
Abstract

Cited by 44 (8 self)
Parallelism with irregular patterns of data, communication and computation is hard to manage efficiently. In this paper we present a case study of the Gröbner basis problem, a symbolic algebra application. We developed an efficient parallel implementation using the following techniques. First, a sequential algorithm was rewritten in a transition axiom style, in which computation proceeds by nondeterministic invocations of guarded statements at multiple processors. Next, the algebraic properties of the problem were studied to modify the algorithm to ensure correctness in spite of locally inconsistent views of the shared data structures. This was used to design data structures with very little overhead for maintaining consistency. Finally, an application-specific scheduler was designed and tuned to get good performance. Our distributed memory implementation achieves impressive speedups.
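The transition-axiom style described in this abstract can be sketched as a loop that repeatedly fires one enabled guarded statement, chosen nondeterministically; the random choice models the arbitrary interleaving of processors. This is a minimal illustration of the style under assumed toy state and rules, not the paper's implementation:

```python
import random

def run_transition_program(state, rules):
    """Fire one enabled guarded rule at a time, chosen nondeterministically,
    until no guard holds (quiescence). Each rule is a (guard, action) pair."""
    while True:
        enabled = [action for guard, action in rules if guard(state)]
        if not enabled:
            return state               # quiescent: no guard is enabled
        random.choice(enabled)(state)  # nondeterministic choice models parallelism

# Toy example (hypothetical): drain a work set, processing items in arbitrary order.
state = {"pending": {3, 1, 2}, "done": []}
rules = [
    (lambda s: bool(s["pending"]),
     lambda s: s["done"].append(s["pending"].pop())),
]
final = run_transition_program(state, rules)
```

Because the program is correct for every nondeterministic schedule, the same reasoning carries over when the guarded statements run on multiple processors.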
Distributing Equational Theorem Proving
, 1993
Abstract

Cited by 22 (6 self)
In this paper we show that distributing the theorem proving task to several experts is a promising idea. We describe the team work method, which allows the experts to compete for a while and then to cooperate. In the cooperation phase the best results derived in the competition phase are collected and the less important results are forgotten. We describe some useful experts and explain in detail how they work together. We establish fairness criteria and thereby prove the distributed system to be both complete and correct. We have implemented our system and show by nontrivial examples that drastic speedups are possible for a cooperating team of experts compared to the time needed by the best expert in the team.
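The alternation of competition and cooperation phases can be sketched roughly as follows; the numeric scoring of results and the `make_expert` helper are hypothetical stand-ins for real inference procedures and importance measures, not the team work method's actual machinery:

```python
def team_work(problem, experts, rounds, phase_len):
    """Alternate a competition phase (experts derive results independently)
    with a cooperation phase (keep only the best results, forget the rest)."""
    shared = []  # the best results so far, visible to the whole team
    for _ in range(rounds):
        # Competition phase: each expert works from the shared pool.
        derived = [expert(problem, shared, phase_len) for expert in experts]
        # Cooperation phase: collect all results, keep the highest-scoring
        # ones, and forget the less important ones.
        pool = shared + [r for rs in derived for r in rs]
        shared = sorted(pool, key=lambda r: r[1], reverse=True)[:10]
    return shared

def make_expert(name, quality):
    # Toy expert: emits `budget` scored results at a fixed quality level.
    def expert(problem, shared, budget):
        return [(f"{name}-{i}", quality * (i + 1)) for i in range(budget)]
    return expert

best = team_work("prove-it",
                 [make_expert("fast", 1.0), make_expert("deep", 2.0)],
                 rounds=3, phase_len=5)
```

The fairness criteria mentioned in the abstract would correspond here to guaranteeing that no expert's useful results are forgotten forever across rounds.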
The Parallel Computing Laboratory at U.C. Berkeley: A Research Agenda Based on the Berkeley View
, 2008
Abstract

Cited by 18 (5 self)
Copyright © 2008, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
A Taxonomy of Parallel Strategies for Deduction
 Annals of Mathematics and Artificial Intelligence
, 1999
Abstract

Cited by 17 (1 self)
This paper presents a taxonomy of parallel theorem-proving methods based on the control of search (e.g., master-slave versus peer processes), the granularity of parallelism (e.g., fine, medium and coarse grain) and the nature of the method (e.g., ordering-based versus subgoal-reduction). We analyze how the different approaches to parallelization affect the control of search: while fine- and medium-grain methods, as well as master-slave methods, generally do not modify the sequential search plan, parallel-search methods may combine sequential search plans (multi-search) or extend the search plan with the capability of subdividing the search space (distributed search). Precisely because the search plan is modified, the latter methods may produce radically different searches than their sequential base, as exemplified by the first distributed proof of the Robbins theorem generated by the Modified Clause-Diffusion prover Peers-mcd. An overview of the state of the field and directions...
Multipol: A Distributed Data Structure Library
 in Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, 1995
Abstract

Cited by 15 (2 self)
Applications with dynamic data structures, unpredictable computational costs, and irregular data access patterns require substantial effort to parallelize. Much of their programming complexity comes from the implementation of distributed data structures. We describe a library of such data structures, Multipol, which includes parallel versions of classic data structures such as trees, sets, lists, graphs, and queues. The library is built on a portable runtime layer that provides basic communication, synchronization, and caching. The data structures address the classic tradeoff between locality and load balance through a combination of replication, partitioning, and dynamic caching. To tolerate remote communication latencies, some of the operations are split into separate initiation and completion phases, allowing for computation and communication overlap at the library interface level. This leads to a form of relaxed consistency semantics for the data types. In this paper we give an o...
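The split-phase idea can be illustrated with a toy set whose insert is divided into an initiation call that returns a handle and a completion call that waits on it; the `SplitPhaseSet` class and its thread-pool latency simulation are assumptions for illustration, not Multipol's actual interface:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class SplitPhaseSet:
    """A set whose insert is split into initiation and completion phases,
    so callers can overlap local computation with (simulated) remote latency."""
    def __init__(self):
        self._data = set()
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=4)

    def insert_begin(self, item):
        # Initiation phase: launch the (possibly remote) operation and
        # return a handle immediately, without waiting for it to take effect.
        return self._pool.submit(self._remote_insert, item)

    def _remote_insert(self, item):
        time.sleep(0.01)          # stand-in for communication latency
        with self._lock:
            self._data.add(item)
        return item

    @staticmethod
    def insert_end(handle):
        # Completion phase: block until the operation has taken effect.
        return handle.result()

s = SplitPhaseSet()
handles = [s.insert_begin(i) for i in range(8)]  # initiate all inserts
overlap = sum(i * i for i in range(1000))        # useful work in the meantime
done = [SplitPhaseSet.insert_end(h) for h in handles]
```

Between `insert_begin` and `insert_end` the set's contents are not globally agreed upon, which is one concrete sense in which the consistency semantics are relaxed.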
Distributed data structures and algorithms for Gröbner basis computation
 Lisp and Symbolic Computation
, 1994
Abstract

Cited by 11 (4 self)
We present the design and implementation of a parallel algorithm for computing Gröbner bases on distributed memory multiprocessors. The parallel algorithm is irregular both in space and time: the data structures are dynamic pointer-based structures and the computations on the structures have unpredictable duration. The algorithm is presented as a series of refinements on a transition rule program, in which computation proceeds by nondeterministic invocations of guarded commands. Two key data structures, a set and a priority queue, are distributed across processors in the parallel algorithm. The data structures are designed for high throughput and latency tolerance, as appropriate for distributed memory machines. The programming style represents a compromise between shared-memory and message-passing models. The distributed nature of the data structures shows through their interface in that the semantics are weaker than with shared atomic objects, but they still provide a shared abstraction that can be used for reasoning about program correctness. In the data structure design there is a classic tradeoff between locality and load balance. We argue that this is best solved by designing scheduling structures in tandem with the state data structures, since the decision to replicate or partition state affects the overhead of dynamically moving tasks.
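The weaker-than-atomic semantics of such a distributed priority queue can be sketched as per-processor heaps whose pop returns a locally best element, stealing from the fullest peer only when the local heap is empty; this toy model is an assumption for illustration, not the paper's design:

```python
import heapq

class RelaxedPriorityQueue:
    """Per-processor heaps. pop() returns the locally best element, which
    need not be the global minimum: weaker semantics than a shared atomic
    queue, traded for throughput and latency tolerance."""
    def __init__(self, nprocs):
        self.heaps = [[] for _ in range(nprocs)]

    def push(self, proc, priority, item):
        heapq.heappush(self.heaps[proc], (priority, item))

    def pop(self, proc):
        # Prefer local work (locality); otherwise steal from the fullest
        # peer (load balance). Returns None when all heaps are empty.
        if not self.heaps[proc]:
            proc = max(range(len(self.heaps)), key=lambda p: len(self.heaps[p]))
            if not self.heaps[proc]:
                return None
        return heapq.heappop(self.heaps[proc])

q = RelaxedPriorityQueue(3)
for i, pri in enumerate([5, 1, 4, 2, 3]):
    q.push(i % 3, pri, f"pair{pri}")
first = q.pop(0)   # (2, "pair2"): locally best on proc 0, not the global min 1
```

For Gröbner basis computation this relaxation is tolerable because correctness does not depend on processing critical pairs in strict priority order, only on eventually processing all of them.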
On the Correctness of a Distributed Memory Gröbner Basis Algorithm
 In Rewriting Techniques and Applications
, 1992
Abstract

Cited by 7 (4 self)
We present an asynchronous MIMD algorithm for Gröbner basis computation. The algorithm is based on the well-known sequential algorithm of Buchberger. Two factors make the correctness of our algorithm nontrivial: the nondeterminism that is inherent in asynchronous parallelism, and the distribution of data structures, which leads to inconsistent views of the global state of the system. We demonstrate that by describing the algorithm as a nondeterministic sequential algorithm, and presenting the optimized parallel algorithm through a series of refinements to that algorithm, the algorithm is easier to understand and the correctness proof becomes manageable. The proof does, however, rely on algebraic properties of the polynomials in the computation, and does not follow directly from the proof of Buchberger's algorithm.
DLP: A Paradigm for Parallel Interactive Theorem Proving
, 1996
Abstract

Cited by 4 (1 self)
A new paradigm for parallel interactive theorem proving is advocated using DLP, a distributed and parallel version of LP, the Larch Prover. The rewrite-rule-based parallel prover runs on a network of workstations. The amount and nature of parallelism are under explicit user control, unlike other parallel theorem provers in which parallelism is hidden from the user. The main objective is to exploit parallelism for enhancing user productivity in finding proofs of conjectures by induction and other first-order inference methods. The user is encouraged to try different combinations of high-level inference steps automatically and in parallel, leading to multiple proof attempts. While some parallel attempts compete, others cooperate by doing subparts of a problem. When no attempt leads to a proof, the user gets a global view of all attempts on the conjecture with the theorem prover generating useful feedback. A parallel interface provides mechanisms for managing multiple proof attempts. The...
Parallel Data Structures for Symbolic Computation
 In Workshop on Parallel Symbolic Languages and Systems
, 1995
Abstract

Cited by 4 (1 self)
Symbolic applications often require dynamic irregular data structures, such as linked lists, unbalanced trees, and graphs, and they exhibit unpredictable computational patterns that lead to asynchronous communication and load imbalance when parallelized. In this paper we describe several symbolic applications and their parallelizations. The main problem in parallelization of each application was to replace the primary data structures with parallel versions that allow for high-throughput, low-latency access. In each case there are two problems to be solved: load balancing the parallel computation and sharing information about the solution as it is being constructed. The first problem is typically solved using a scheduling data structure: a stack, queue, or priority queue in sequential programs. The difficulty in parallelizing these structures is the tradeoff between locality and load balancing: aggressive load balancing can lead to poor locality. The second problem of storing the soluti...
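One common way to realize the locality/load-balance tradeoff in a scheduling structure is a stealable work queue: the owner takes the newest task (likely still warm in its cache), while idle processors steal the oldest. This sketch is illustrative of the general technique, not drawn from the paper:

```python
from collections import deque

class StealableQueue:
    """Owner pops the newest task from one end (good locality: recently
    pushed work is likely related to what the owner just computed); idle
    processors steal the oldest task from the other end (load balance)."""
    def __init__(self):
        self.tasks = deque()

    def push(self, task):
        self.tasks.append(task)

    def pop_local(self):
        # Owner's fast path: LIFO access to its own work.
        return self.tasks.pop() if self.tasks else None

    def steal(self):
        # Thief's path: FIFO access from the far end.
        return self.tasks.popleft() if self.tasks else None

q = StealableQueue()
for task in (1, 2, 3):
    q.push(task)
stolen = q.steal()      # a thief takes the oldest task, 1
local = q.pop_local()   # the owner takes the newest remaining task, 3
```

Stealing from the far end keeps thieves away from the owner's hot tasks, which is why aggressive load balancing hurts locality less with this layout than with a single shared queue.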
Research Agenda Based on the Berkeley View
, 2008