Results 1-10 of 68
Applied Numerical Linear Algebra
Society for Industrial and Applied Mathematics, 1997
Cited by 532 (26 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.
The NP-completeness column: an ongoing guide
Journal of Algorithms, 1985
Cited by 188 (0 self)
This is the nineteenth edition of a (usually) quarterly column that covers new developments in the theory of NP-completeness. The presentation is modeled on that used by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., New York, 1979 (hereinafter referred to as "[G&J]"; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed, and, when appropriate, cross-references will be given to that book and the list of problems (NP-complete and harder) presented there. Readers who have results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.) or open problems they would like publicized, should
Models of Computation: Exploring the Power of Computing
Cited by 57 (7 self)
Theoretical computer science treats any computational subject for which a good model can be created. Research on formal models of computation was initiated in the 1930s and 1940s by Turing, Post, Kleene, Church, and others. In the 1950s and 1960s programming languages, language translators, and operating systems were under development and therefore became both the subject and basis for a great deal of theoretical work. The power of computers of this period was limited by slow processors and small amounts of memory, and thus theories (models, algorithms, and analysis) were developed to explore the efficient use of computers as well as the inherent complexity of problems. The former subject is known today as algorithms and data structures, the latter computational complexity. The focus of theoretical computer scientists in the 1960s on languages is reflected in the first textbook on the subject, Formal Languages and Their Relation to Automata by John Hopcroft and Jeffrey Ullman. This influential book led to the creation of many language-centered theoretical computer science courses; many introductory theory courses today continue to reflect the content of this book and the interests of theoreticians of the 1960s and early 1970s. Although
Communication Lower Bounds for Distributed-Memory Matrix Multiplication
2004
Cited by 46 (1 self)
this paper. More specifically, we use the definitions of [10]: $\Theta(g(n))$ is the set of functions $f(n)$ such that there exist positive constants $c_1$, $c_2$, and $n_0$ such that $0 \le c_1 g(n) \le f(n) \le c_2 g(n)$ for all $n \ge n_0$; $O(g(n))$ is defined similarly using the weaker condition $0 \le f(n) \le c_2 g(n)$; $\Omega(g(n))$ is defined with the condition $0 \le c_1 g(n) \le f(n)$. The set $o(g(n))$ consists of functions $f(n)$ such that for any $c_2 > 0$ there exists a constant $n_0 > 0$ such that $0 \le f(n) < c_2 g(n)$ for all $n \ge n_0$
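The two-sided bound in the Theta definition quoted in this entry can be spot-checked numerically. The sketch below is only an illustration: the function name `satisfies_theta_witness`, the sample functions, and the cutoff `n_max` are assumptions, not part of the cited paper. A finite check can refute a proposed choice of constants, but it cannot prove an asymptotic claim.

```python
def satisfies_theta_witness(f, g, c1, c2, n0, n_max):
    """Check 0 <= c1*g(n) <= f(n) <= c2*g(n) for every n in [n0, n_max].

    Hypothetical helper: it tests proposed witness constants (c1, c2, n0)
    on a finite range only, so a True result is evidence, not a proof.
    """
    return all(
        0 <= c1 * g(n) <= f(n) <= c2 * g(n)
        for n in range(n0, n_max + 1)
    )


def f(n):
    # Sample function chosen for illustration: 3n^2 + 5n.
    return 3 * n * n + 5 * n


def g(n):
    # Comparison function: n^2.
    return n * n


# 3n^2 <= 3n^2 + 5n <= 4n^2 holds exactly when n >= 5.
ok_from_5 = satisfies_theta_witness(f, g, c1=3, c2=4, n0=5, n_max=10_000)
ok_from_1 = satisfies_theta_witness(f, g, c1=3, c2=4, n0=1, n_max=10_000)
```

With these sample functions the constants work from n0 = 5 onward and fail if n0 is set to 1, which shows why the definition quantifies over a threshold n0 instead of all n.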
Optimal Communication Algorithms for Hypercubes
Journal of Parallel and Distributed Computing, 1991
Cited by 40 (2 self)
We consider the following basic communication problems in a hypercube network of processors: the problem of a single processor sending a different packet to each of the other processors, the problem of simultaneous broadcast of the same packet from every processor to all other processors, and the problem of simultaneous exchange of different packets between every pair of processors. The algorithms proposed for these problems are optimal in terms of execution time and communication resource requirements; that is, they require the minimum possible number of time steps and packet transmissions. In contrast, algorithms in the literature are optimal only within an additive or multiplicative factor. © 1991 Academic Press, Inc.
... to the coordinates of a node of the d-cube is referred to as the identity number of the node. We recall that a hypercube of any dimension can be constructed by connecting lower-dimensional cubes, starting with a 1-cube. In particular, we can start with two (d-1)-dimensional cubes and introduce a link connecting each pair of nodes with the same identity number (see, e.g., [1, Sect. 1.3]). This constructs a d-cube with the identity number of each node obtained by adding a leading 0 or a leading 1 to its previous identity, depending on whether the node belongs to the first (d-1)-dimensional cube or the second (see Fig. 1). When confusion cannot arise, we refer to a d-cube node interchangeably in terms of its identity number (a binary string of length d) and in terms of the decimal representation of its identity number. Thus,
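The recursive construction described in this entry (two copies of the smaller cube, plus a link between nodes that share an identity number) can be sketched as follows. The function names `hypercube_edges` and `identity` are hypothetical and chosen for illustration; nodes are stored by the decimal value of their identity number, as the text suggests.

```python
def hypercube_edges(d):
    """Return the set of links (u, v), u < v, of a d-cube built recursively.

    Illustrative sketch: nodes are integers 0 .. 2**d - 1, i.e. the decimal
    representation of each node's binary identity number.
    """
    if d < 1:
        raise ValueError("the construction starts with a 1-cube, so d >= 1")
    if d == 1:
        return {(0, 1)}  # a 1-cube: two nodes joined by one link

    lower = hypercube_edges(d - 1)
    offset = 1 << (d - 1)  # value of the new leading bit

    edges = set(lower)  # first copy: identities gain a leading 0
    # Second copy: identities gain a leading 1.
    edges |= {(u + offset, v + offset) for (u, v) in lower}
    # Link each pair of nodes that had the same identity in the two copies.
    edges |= {(u, u + offset) for u in range(offset)}
    return edges


def identity(node, d):
    """Binary identity number of a node, as a string of length d."""
    return format(node, f"0{d}b")


edges_3 = hypercube_edges(3)
```

In the resulting graph every link joins two nodes whose identity numbers differ in exactly one bit, and a d-cube has d * 2**(d-1) links, so `edges_3` contains 12 links.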
Programming a Hypercube Multicomputer
1988
Cited by 33 (4 self)
We describe those features of distributed memory MIMD hypercube multicomputers that are necessary to obtain efficient programs. Several examples are developed. These illustrate the effectiveness of different programming strategies.
Sparse Matrix Computations on Parallel Processor Arrays
SIAM J. Sci. Comput., 1992
Cited by 28 (0 self)
We investigate the balancing of distributed compressed storage of large sparse matrices on a massively parallel computer. For fast computation of matrix-vector and matrix-matrix products on a rectangular processor array with efficient communications along its rows and columns we require that the nonzero elements of each matrix row or column be distributed among the processors located within the same array row or column, respectively. We construct randomized packing algorithms with such properties, and we prove that with high probability they produce well-balanced storage for sufficiently large matrices with bounded number of nonzeros in each row and column, but no other restrictions on structure. Then we design basic matrix-vector multiplication routines with fully parallel interprocessor communications and intraprocessor gather and scatter operations. Their efficiency is demonstrated on the 16,384-processor MasPar computer.
Parallel Implementation of Algorithms for Finding Connected Components in Graphs
1997
Cited by 25 (1 self)
In this paper, we describe our implementation of several parallel graph algorithms for finding connected components. Our implementation, with virtual processing, is on a 16,384-processor MasPar MP-1 using the language MPL. We present extensive test data on our code. In our previous projects [21, 22, 23], we reported the implementation of an extensible parallel graph algorithms library. We developed general implementation and fine-tuning techniques without expending too much effort on optimizing each individual routine. We also handled the issue of implementing virtual processing. In this paper, we describe several algorithms and fine-tuning techniques that we developed for the problem of finding connected components in parallel; many of the fine-tuning techniques are of general interest, and should be applicable to code for other problems. We present data on the execution time and memory usage of our various implementations.
Efficient Parallel Algorithms for Computing All Pair Shortest Paths in Directed Graphs
1997
Cited by 25 (0 self)
We present parallel algorithms for computing all pair shortest paths in directed graphs. Our algorithm has time complexity $O(f(n)/p + I(n) \log n)$ on the PRAM using $p$ processors, where $I(n)$ is $\log n$ on the EREW PRAM, $\log \log n$ on the CRCW PRAM, and $f(n)$ is $o(n^3)$. On the randomized CRCW PRAM we are able to achieve time complexity $O(n^3/p + \log n)$ using $p$ processors.
Key Words. Analysis of algorithms, Design of algorithms, Parallel algorithms, Graph algorithms, Shortest path.
1. Introduction. A number of known algorithms compute the all pair shortest paths in graphs and digraphs with $n$ vertices by using $O(n^3)$ operations [D], [Fl], [J]. All these algorithms, however, use at least $n-1$ recursive steps in the worst case and thus require at least the order of $n$ time in their parallel implementation, even if the number of available processors is not bounded. $O(n)$ time and $n^2$ processor bounds can indeed be achieved, for instance, in the straightforward parallelization of th...
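For orientation, here is a minimal sequential sketch of a classical cubic-time all-pairs shortest paths method, the Floyd-Warshall triple loop. It is an assumption that this corresponds to one of the algorithms the introduction cites, and it is not the parallel algorithm of the paper. The function name `all_pairs_shortest_paths` and the sample graph are illustrative; the input is assumed to be a square weight matrix with `math.inf` for missing edges and no negative cycles.

```python
import math


def all_pairs_shortest_paths(weights):
    """Floyd-Warshall on an n x n weight matrix; returns a new distance matrix.

    weights[i][j] is the edge weight from i to j, math.inf if there is no
    edge, and 0 on the diagonal. Assumes the graph has no negative cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):  # sequential stage: allow vertex k as an intermediate
        for i in range(n):  # these two loops are independent within a stage
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


INF = math.inf
sample = [
    [0, 3, INF],
    [INF, 0, 1],
    [2, INF, 0],
]
result = all_pairs_shortest_paths(sample)
```

The outer loop over `k` has to run in order, while the `n * n` updates inside one stage do not depend on each other. That structure is one way to see how a direct parallelization can reach time of order n with about n^2 processors, in line with the bounds mentioned in the introduction.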
Multiplication of Matrices of Arbitrary Shape on a Data Parallel Computer
Parallel Computing, 1992
Cited by 24 (12 self)
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made on the shape or size of the operands. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS. On the Connection Machine system CM-200, blocking yields a performance improvement by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements compared to the nonsystolic block algorithms. We show that, in order to minimize the communication time, an algorithm that leaves the largest operand matrix stationary should be chosen for matrix-matrix multiplication. Furthermore, it is sh...
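The idea of blocking a matrix-matrix product for operands of arbitrary shape can be illustrated with a small sequential sketch. This is not the DBLAS implementation or the systolic algorithm from the paper; the function name `blocked_matmul` and the default block size are assumptions made for the example, and plain nested lists stand in for distributed arrays.

```python
def blocked_matmul(a, b, block=2):
    """Multiply an m x k matrix by a k x n matrix, one tile at a time.

    Illustrative sketch: the three outer loops walk over tiles of size
    `block`, and min() clips the last tile so any shape is accepted.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")

    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            for p0 in range(0, k, block):
                # Accumulate the contribution of one tile pair into C.
                for i in range(i0, min(i0 + block, m)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + block, n)):
                            c[i][j] += a_ip * b[p][j]
    return c


a = [[1, 2, 3],
     [4, 5, 6]]
b = [[7, 8],
     [9, 10],
     [11, 12]]
product = blocked_matmul(a, b)
```

Each tile pair is reused for many multiply-add operations once it has been fetched, which is the general reason blocked (level-3 style) formulations tend to move less data per operation than a sequence of matrix-vector (level-2 style) products.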