Results 1 - 10
of
187
The SPLASH-2 programs: Characterization and methodological considerations
- INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE
, 1995
"... The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH-2 programs in terms of fundamental propertie ..."
Abstract
-
Cited by 962 (13 self)
- Add to MetaCart
The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well. The properties we study include the computational load balance, communication to computation ratio and traffic needs, important working set sizes, and issues related to spatial locality, as well as how these properties scale with problem size and the number of processors. The other, related goal is methodological: to assist people who will use the programs in architectural evaluations to prune the space of application and machine parameters in an informed and meaningful way. For example, by characterizing the working sets of the applications, we describe which operating points in terms of cache size and problem size are representative of realistic situations, which are not, and which re redundant. Using SPLASH-2 as an example, we hope to convey the importance of understanding the interplay of problem size, number of processors, and working sets in designing experiments and interpreting their results.
LogP: Towards a Realistic Model of Parallel Computation
, 1993
"... A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding developme ..."
Abstract
-
Cited by 471 (14 self)
- Add to MetaCart
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
A new approach to the maximum flow problem
- Journal of the ACM
, 1988
"... Abstract. All previously known efftcient maximum-flow algorithms work by finding augmenting paths, either one path at a time (as in the original Ford and Fulkerson algorithm) or all shortest-length augmenting paths at once (using the layered network approach of Dinic). An alternative method based on ..."
Abstract
-
Cited by 391 (27 self)
- Add to MetaCart
Abstract. All previously known efftcient maximum-flow algorithms work by finding augmenting paths, either one path at a time (as in the original Ford and Fulkerson algorithm) or all shortest-length augmenting paths at once (using the layered network approach of Dinic). An alternative method based on the preflow concept of Karzanov is introduced. A preflow is like a flow, except that the total amount flowing into a vertex is allowed to exceed the total amount flowing out. The method maintains a preflow in the original network and pushes local flow excess toward the sink along what are estimated to be shortest paths. The algorithm and its analysis are simple and intuitive, yet the algorithm runs as fast as any other known method on dense. graphs, achieving an O(n)) time bound on an n-vertex graph. By incorporating the dynamic tree data structure of Sleator and Tarjan, we obtain a version of the algorithm running in O(nm log(n’/m)) time on an n-vertex, m-edge graph. This is as fast as any known method for any graph density and faster on graphs of moderate density. The algorithm also admits efticient distributed and parallel implementations. A parallel implementation running in O(n’log n) time using n processors and O(m) space is obtained. This time bound matches that of the Shiloach-Vishkin algorithm, which also uses n processors but requires O(n’) space.
LogGP: Incorporating Long Messages into the LogP Model - One step closer towards a realistic model for parallel computation
, 1995
"... We present a new model of parallel computation---the LogGP model---and use it to analyze a number of algorithms, most notably, the single node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP + 93] which abstracts the comm ..."
Abstract
-
Cited by 204 (1 self)
- Add to MetaCart
We present a new model of parallel computation---the LogGP model---and use it to analyze a number of algorithms, most notably, the single node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP + 93] which abstracts the communication of fixed-sized short messages through the use of four parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P ). As evidenced by experimental data, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM-5) [CKP + 93, CDMS94]. However, many existing parallel machines have special support for long messages and achieve a much higher bandwidth for long messages compared to short messages (e.g., IBM SP-2, Paragon, Meiko CS-2, Ncube/2). We extend the basic LogP model with a linear model for long messages. This combination, which we call the LogGP model of parallel computation, has o...
The NP-completeness column: an ongoing guide
- Journal of Algorithms
, 1985
"... This is the nineteenth edition of a (usually) quarterly column that covers new developments in the theory of NP-completeness. The presentation is modeled on that used by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’ ’ W. H. Freeman & Co ..."
Abstract
-
Cited by 164 (0 self)
- Add to MetaCart
This is the nineteenth edition of a (usually) quarterly column that covers new developments in the theory of NP-completeness. The presentation is modeled on that used by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’ ’ W. H. Freeman & Co., New York, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed, and, when appropriate, cross-references will be given to that book and the list of problems (NP-complete and harder) presented there. Readers who have results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.) or open problems they would like publicized, should
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract
-
Cited by 163 (7 self)
- Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
Scans as Primitive Parallel Operations
- IEEE Transactions on Computers
, 1987
"... In most parallel random-access machine (P-RAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can executed in no more time than these parallel memory references. This paper outline an extensive study of ..."
Abstract
-
Cited by 143 (12 self)
- Add to MetaCart
In most parallel random-access machine (P-RAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can executed in no more time than these parallel memory references. This paper outline an extensive study of the effect of including in the P-RAM models, such scan operations as unit-time primitives. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(lg n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. We therefore argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimumspanning -tree algorithm, a line-drawing algorithm and a mergi...
Nesl: A Nested Data-Parallel Language
, 1990
"... This report describes Nesl, a strongly-typed, applicative, data-parallel language. Nesl is intended to be used as a portable interface for programming a variety of parallel and vector supercomputers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of dat ..."
Abstract
-
Cited by 127 (4 self)
- Add to MetaCart
This report describes Nesl, a strongly-typed, applicative, data-parallel language. Nesl is intended to be used as a portable interface for programming a variety of parallel and vector supercomputers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on vectors, including a mechanism for applying any function over the elements of a vector in parallel, and a broad set of parallel functions that manipulate vectors. Nesl fully supports nested vectors and nested parallelism---the ability to take a parallel function and then apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with complex and dynamically changing data structures, such as required in many graph or sparse matrix algorithms. Nesl also provides a mechanism for calculating the asymptotic running time for a program on various parallel machine models, including the parallel random access machine (PRAM...
NESL: A nested data-parallel language (version 2.6
, 1993
"... The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Wright Laboratory or the U. S. Government. Keywords: Data-parallel, parallel algorithms, supe ..."
Abstract
-
Cited by 87 (7 self)
- Add to MetaCart
The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Wright Laboratory or the U. S. Government. Keywords: Data-parallel, parallel algorithms, supercomputers, nested parallelism, This report describes Nesl, a strongly-typed, applicative, data-parallel language. Nesl is intended to be used as a portable interface for programming a variety of parallel and vector computers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on sequences, including a mechanism for applying any function over the elements of a sequence in parallel and a rich set of parallel functions that manipulate sequences. Nesl fully supports nested sequences and nested parallelism—the ability to take a parallel function and apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with irregular nested loops (where the inner loop lengths depend on the outer iteration) and for divide-and-conquer algorithms. Nesl also provides a performance model for calculating the asymptotic performance of a program on
Modelling Knowledge and Action in Distributed Systems
- Distributed Computing
, 1988
"... : We present a formal model that captures the subtle interaction between knowledge and action in distributed systems. We view a distributed system as a set of runs, where a run is a function from time to global states and a global state is a tuple consisting of an environment state and a local state ..."
Abstract
-
Cited by 82 (28 self)
- Add to MetaCart
: We present a formal model that captures the subtle interaction between knowledge and action in distributed systems. We view a distributed system as a set of runs, where a run is a function from time to global states and a global state is a tuple consisting of an environment state and a local state for each process in the system. This model is a generalization of those used in many previous papers. Actions in this model are associated with functions from global states to global states. A protocol is a function from local states to actions. We extend the standard notion of a protocol by defining knowledge-based protocols, ones in which a process' actions may depend explicitly on its knowledge. Knowledge-based protocols provide a natural way of describing how actions should take place in a distributed system. Finally, we show how the notion of one protocol implementing another can be captured in our model. Some material in this paper appeared in preliminary form in [HF85]. An abridge...

