Parallel Numerical Linear Algebra
Society for Industrial and Applied Mathematics, 1997
Cited by 418 (24 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, the singular value decomposition, and generalizations of these to two matrices. We consider dense, band and sparse matrices.
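The abstract says the survey illustrates its principles by showing how one would implement matrix multiplication, but it does not say which algorithm. As one illustration, here is a sequential Python simulation of Cannon's algorithm, a classic 2D distributed matrix multiplication, on a hypothetical q x q processor grid where q divides n. The helper names (`cannon_matmul`, `split`, `join`) are our own, and this is a sketch, not the survey's presentation.

```python
# Sequential simulation of Cannon's algorithm on a q x q grid of "processors".
# Processor (i, j) owns one b x b block of A, B, and C, where b = n / q.

def block_matmul(X, Y):
    """Local computation: plain product of two square blocks."""
    b = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(b)) for j in range(b)]
            for i in range(b)]

def block_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M, q):
    """Cut an n x n matrix into a q x q grid of b x b blocks."""
    b = len(M) // q
    return [[[row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]
             for j in range(q)] for i in range(q)]

def join(blocks):
    """Reassemble a q x q grid of blocks into one matrix."""
    q = len(blocks)
    return [[x for j in range(q) for x in blocks[i][j][r]]
            for i in range(q) for r in range(len(blocks[i][0]))]

def cannon_matmul(A, B, q):
    n = len(A)
    assert n % q == 0, "this sketch assumes q divides n"
    b = n // q
    Ab, Bb = split(A, q), split(B, q)
    # Initial skew: block row i of A shifts left by i, block column j of B shifts up by j.
    Aloc = [[Ab[i][(i + j) % q] for j in range(q)] for i in range(q)]
    Bloc = [[Bb[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Every processor multiplies the blocks it currently holds (in parallel on a real machine).
        for i in range(q):
            for j in range(q):
                C[i][j] = block_add(C[i][j], block_matmul(Aloc[i][j], Bloc[i][j]))
        # Nearest-neighbour communication: A blocks move one step left, B blocks one step up.
        Aloc = [[Aloc[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        Bloc = [[Bloc[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return join(C)
```

Each of the q rounds costs one local block product per processor plus one nearest-neighbour shift of the A and B blocks, which is why this scheme maps naturally onto mesh and torus interconnects.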
The NP-completeness column: an ongoing guide
Journal of Algorithms, 1985
Cited by 164 (0 self)
This is the nineteenth edition of a (usually) quarterly column that covers new developments in the theory of NP-completeness. The presentation is modeled on that used by M. R. Garey and myself in our book “Computers and Intractability: A Guide to the Theory of NP-Completeness,” W. H. Freeman & Co., New York, 1979 (hereinafter referred to as “[G&J]”; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed, and, when appropriate, cross-references will be given to that book and the list of problems (NP-complete and harder) presented there. Readers who have results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time solvability, etc.) or open problems they would like publicized should ...
Models of Computation: Exploring the Power of Computing
Cited by 46 (3 self)
Theoretical computer science treats any computational subject for which a good model can be created. Research on formal models of computation was initiated in the 1930s and 1940s by Turing, Post, Kleene, Church, and others. In the 1950s and 1960s, programming languages, language translators, and operating systems were under development and therefore became both the subject and the basis for a great deal of theoretical work. The power of computers of this period was limited by slow processors and small amounts of memory, and thus theories (models, algorithms, and analysis) were developed to explore the efficient use of computers as well as the inherent complexity of problems. The former subject is known today as algorithms and data structures, the latter as computational complexity. The focus of theoretical computer scientists in the 1960s on languages is reflected in the first textbook on the subject, Formal Languages and Their Relation to Automata by John Hopcroft and Jeffrey Ullman. This influential book led to the creation of many language-centered theoretical computer science courses; many introductory theory courses today continue to reflect the content of this book and the interests of theoreticians of the 1960s and early 1970s. Although ...
Optimal Communication Algorithms for Hypercubes
Journal of Parallel and Distributed Computing, 1991
Cited by 38 (2 self)
We consider the following basic communication problems in a hypercube network of processors: the problem of a single processor sending a different packet to each of the other processors, the problem of simultaneous broadcast of the same packet from every processor to all other processors, and the problem of simultaneous exchange of different packets between every pair of processors. The algorithms proposed for these problems are optimal in terms of execution time and communication resource requirements; that is, they require the minimum possible number of time steps and packet transmissions. In contrast, algorithms in the literature are optimal only within an additive or multiplicative factor. © 1991 Academic Press, Inc.

[...] to the coordinates of a node of the d-cube is referred to as the identity number of the node. We recall that a hypercube of any dimension can be constructed by connecting lower-dimensional cubes, starting with a 1-cube. In particular, we can start with two (d-1)-dimensional cubes and introduce a link connecting each pair of nodes with the same identity number (see, e.g., [1, Sect. 1.3]). This constructs a d-cube with the identity number of each node obtained by adding a leading 0 or a leading 1 to its previous identity, depending on whether the node belongs to the first (d-1)-dimensional cube or the second (see Fig. 1). When confusion cannot arise, we refer to a d-cube node interchangeably in terms of its identity number (a binary string of length d) and in terms of the decimal representation of its identity number. Thus, ...
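The recursive construction described above (two (d-1)-cubes joined by links between nodes with equal identity numbers, then prefixed with 0 or 1) can be checked with a short Python sketch. `build_hypercube` and `neighbors` are hypothetical helper names; the sketch only illustrates the topology, not the paper's optimal communication algorithms.

```python
def build_hypercube(d):
    """Build a d-cube by starting from a 1-cube and repeatedly joining two copies.

    Returns (nodes, links): nodes are identity numbers (binary strings of
    length d) and links are pairs of identity numbers.
    """
    if d < 1:
        raise ValueError("d must be at least 1")
    nodes, links = ["0", "1"], [("0", "1")]          # the 1-cube
    for _ in range(d - 1):
        first = ["0" + x for x in nodes]              # first copy: leading 0
        second = ["1" + x for x in nodes]             # second copy: leading 1
        links = ([("0" + u, "0" + v) for u, v in links]
                 + [("1" + u, "1" + v) for u, v in links]
                 # link each pair of nodes that had the same identity number
                 + [("0" + x, "1" + x) for x in nodes])
        nodes = first + second
    return nodes, links

def neighbors(node, d):
    """Neighbours of a node given in decimal form: flip exactly one of its d bits."""
    return sorted(node ^ (1 << k) for k in range(d))
```

The check confirms the familiar property that two nodes are linked exactly when their identity numbers differ in one bit, so in decimal form a node's neighbours are obtained by XOR with the powers of two below 2^d.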
Communication Lower Bounds for Distributed-Memory Matrix Multiplication
2004
Cited by 31 (0 self)
... this paper. More specifically, we use the definitions of [10]: $\Theta(g(n))$ is the set of functions $f(n)$ such that there exist positive constants $c_1$, $c_2$, and $n_0$ such that $0 \le c_1 g(n) \le f(n) \le c_2 g(n)$ for all $n \ge n_0$; $O(g(n))$ is defined similarly using the weaker condition $0 \le f(n) \le c_2 g(n)$; $\Omega(g(n))$ is defined with the condition $0 \le c_1 g(n) \le f(n)$. The set $o(g(n))$ consists of functions $f(n)$ such that for any $c_2 > 0$ there exists a constant $n_0 > 0$ such that $0 \le f(n) \le c_2 g(n)$ for all $n \ge n_0$ ...
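As a worked instance of the $\Theta$ and $o$ definitions quoted above (the example function is ours, not the paper's), take $f(n) = 3n^2 + 2n$ and $g(n) = n^2$:

```latex
c_1 = 3,\; c_2 = 5,\; n_0 = 1:\qquad
0 \le 3n^2 \le 3n^2 + 2n \le 3n^2 + 2n^2 = 5n^2 \quad \text{for all } n \ge 1,
\qquad\text{so } f(n) \in \Theta(n^2).
```

The same $f$ is not in $o(n^2)$: for $c_2 = 1$ we have $3n^2 + 2n > n^2$ for every $n \ge 1$, so no $n_0$ satisfies the condition.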
Programming a Hypercube Multicomputer
1988
Cited by 30 (4 self)
We describe those features of distributed-memory MIMD hypercube multicomputers that are necessary to obtain efficient programs. Several examples are developed to illustrate the effectiveness of different programming strategies.
Multiplication of Matrices of Arbitrary Shape on a Data Parallel Computer
Parallel Computing, 1992
Cited by 23 (12 self)
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made on the shape or size of the operands. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in place is described in detail. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS. On the Connection Machine system CM-200, blocking yields a performance improvement by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements compared to the nonsystolic block algorithms. We show that, in order to minimize the communication time, an algorithm that leaves the largest operand matrix stationary should be chosen for matrix-matrix multiplication. Furthermore, it is sh...
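The stationary-operand rule in the abstract can be expressed as a small selection function. This Python sketch assumes C = A·B with A of size m x k and B of size k x n; `choose_stationary` is a hypothetical name, and counting the elements of the two moving operands is only a crude proxy for communication volume, not the paper's cost model.

```python
def choose_stationary(m, k, n):
    """For C = A @ B (A: m x k, B: k x n, C: m x n), pick the operand to keep in place.

    Following the abstract's rule, the largest of A, B, C stays stationary and the
    two smaller matrices are the ones moved between processors. Ties go to the
    first of A, B, C in that order.
    """
    sizes = {"A": m * k, "B": k * n, "C": m * n}
    stationary = max(sizes, key=sizes.get)
    # Rough proxy: total number of elements in the operands that must move.
    moved = sum(size for name, size in sizes.items() if name != stationary)
    return stationary, moved
```

For an outer-product-like shape such as m = n = 1000, k = 10, the result matrix C dominates, so the sketch keeps C stationary and moves only the two thin operands.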
Sparse Matrix Computations on Parallel Processor Arrays
SIAM J. Sci. Comput., 1992
Cited by 23 (0 self)
We investigate the balancing of distributed compressed storage of large sparse matrices on a massively parallel computer. For fast computation of matrix-vector and matrix-matrix products on a rectangular processor array with efficient communications along its rows and columns, we require that the nonzero elements of each matrix row or column be distributed among the processors located within the same array row or column, respectively. We construct randomized packing algorithms with such properties, and we prove that with high probability they produce well-balanced storage for sufficiently large matrices with a bounded number of nonzeros in each row and column, but no other restrictions on structure. Then we design basic matrix-vector multiplication routines with fully parallel interprocessor communications and intraprocessor gather and scatter operations. Their efficiency is demonstrated on the 16,384-processor MasPar computer.
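The row/column constraint in the abstract can be met by mapping whole matrix rows to processor rows and whole matrix columns to processor columns. The Python sketch below does this with random permutations of the rows and columns on a hypothetical pr x pc processor array. It is a simple illustrative scheme under our own assumptions (`random_row_col_packing` and `load` are hypothetical names), not the randomized packing algorithms or the balance guarantees proved in the paper.

```python
import random
from collections import Counter

def random_row_col_packing(nonzeros, n_rows, n_cols, pr, pc, seed=0):
    """Map each nonzero (i, j) to a processor (r, c) on a pr x pc array.

    All nonzeros of matrix row i land in one processor row, and all nonzeros
    of matrix column j land in one processor column.
    """
    rng = random.Random(seed)
    row_perm = list(range(n_rows))
    col_perm = list(range(n_cols))
    rng.shuffle(row_perm)
    rng.shuffle(col_perm)
    # Matrix row i -> processor row; matrix column j -> processor column.
    proc_row = [row_perm[i] % pr for i in range(n_rows)]
    proc_col = [col_perm[j] % pc for j in range(n_cols)]
    return {(i, j): (proc_row[i], proc_col[j]) for i, j in nonzeros}

def load(placement):
    """Number of nonzeros stored on each processor."""
    return Counter(placement.values())
```

When pr divides the number of rows, each processor row receives exactly n_rows / pr matrix rows; the per-processor counts returned by `load` show how evenly the nonzeros themselves end up spread.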
Parallel Implementation of Algorithms for Finding Connected Components in Graphs
1997
Cited by 22 (1 self)
In this paper, we describe our implementation of several parallel graph algorithms for finding connected components. Our implementation, with virtual processing, is on a 16,384-processor MasPar MP-1 using the language MPL. We present extensive test data on our code. In our previous projects [21, 22, 23], we reported the implementation of an extensible parallel graph algorithms library. We developed general implementation and fine-tuning techniques without expending too much effort on optimizing each individual routine. We also handled the issue of implementing virtual processing. In this paper, we describe several algorithms and fine-tuning techniques that we developed for the problem of finding connected components in parallel; many of the fine-tuning techniques are of general interest, and should be applicable to code for other problems. We present data on the execution time and memory usage of our various implementations.
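For readers unfamiliar with the problem, here is one generic parallel-style connected-components method, minimum-label propagation with pointer jumping, simulated sequentially in Python. It is not claimed to be any of the algorithms the paper implements in MPL; `connected_components` is a hypothetical name, and each outer round stands for one synchronous step of a data-parallel machine.

```python
def connected_components(n, edges):
    """Label every vertex with the smallest vertex id in its component.

    Each round mimics one synchronous parallel step: every edge offers the
    smaller of its endpoints' old labels to both endpoints, then pointer
    jumping shortcuts chains of labels.
    """
    label = list(range(n))
    while True:
        new = label[:]
        # All edges read the previous round's labels, as on a synchronous machine.
        for u, v in edges:
            new[u] = min(new[u], label[v])
            new[v] = min(new[v], label[u])
        # Pointer jumping: new[x] <= x always holds, so each chain strictly decreases and ends.
        for x in range(n):
            while new[new[x]] != new[x]:
                new[x] = new[new[x]]
        if new == label:
            return label
        label = new
```

At termination every vertex carries the smallest vertex id in its component; the pointer-jumping step lets labels travel along long paths in fewer rounds than plain propagation would need.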
Efficient Parallel Algorithms for Computing All Pair Shortest Paths in Directed Graphs
1997
Cited by 21 (0 self)
We present parallel algorithms for computing all pair shortest paths in directed graphs. Our algorithm has time complexity $O(f(n)/p + I(n)\log n)$ on the PRAM using $p$ processors, where $I(n)$ is $\log n$ on the EREW PRAM and $\log\log n$ on the CRCW PRAM, and $f(n)$ is $o(n^3)$. On the randomized CRCW PRAM we are able to achieve time complexity $O(n^3/p + \log n)$ using $p$ processors.
Key Words: Analysis of algorithms, Design of algorithms, Parallel algorithms, Graph algorithms, Shortest path.
1. Introduction. A number of known algorithms compute the all pair shortest paths in graphs and digraphs with $n$ vertices by using $O(n^3)$ operations [D], [Fl], [J]. All these algorithms, however, use at least $n-1$ recursive steps in the worst case and thus require at least the order of $n$ time in their parallel implementation, even if the number of available processors is not bounded. $O(n)$ time and $n^2$-processor bounds can indeed be achieved, for instance, in the straightforward parallelization of th...
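The n-step behaviour described in the introduction can be seen in Floyd-Warshall, a well-known algorithm using O(n^3) operations (whether it is the one cited as [Fl] is not stated in the excerpt). In this Python sketch the loop over k is the sequential chain of n steps, while the n^2 updates inside each step are independent, which is what allows O(n) time with n^2 processors. It assumes no negative cycles; `floyd_warshall` is our own helper name.

```python
INF = float("inf")

def floyd_warshall(w):
    """All pair shortest path lengths from an n x n weight matrix w (INF = no edge).

    Assumes the graph has no negative cycles.
    """
    n = len(w)
    d = [row[:] for row in w]
    for i in range(n):
        d[i][i] = min(d[i][i], 0)
    for k in range(n):
        # One synchronous parallel step: every (i, j) entry reads only the previous d,
        # so all n^2 updates could be performed at once by n^2 processors.
        d = [[min(d[i][j], d[i][k] + d[k][j]) for j in range(n)] for i in range(n)]
    return d
```

The dependence of step k on the matrix produced by step k-1 is exactly the chain of recursive steps that keeps this approach at order-n parallel time, regardless of how many processors are available.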

