## Fast and Efficient Parallel Solution of Sparse Linear Systems (1993)

Citations: 14 (6 self)

### BibTeX

```bibtex
@MISC{Pan93fastand,
  author = {Victor Pan and John Reif},
  title  = {Fast and Efficient Parallel Solution of Sparse Linear Systems},
  year   = {1993}
}
```


### Citations

840 |
Matrix multiplication via arithmetic progressions
- Coppersmith, Winograd
- 1987
Citation Context: ...that k × k matrices can be multiplied in O(k^β) arithmetic operations and β < γ for some γ, then we may choose M(n) ≤ n^ω for some ω < γ and for all n. The current best upper bound on β and ω is 2.375 [CW1]; for surveys of the exciting history of the asymptotic acceleration of matrix multiplications, see also [Pan6], [Pan7], or the original works [Stra1] (the first and justly celebrated breakthrough in ...

819 |
The Symmetric Eigenvalue Problem
- Parlett
- 1997
Citation Context: ...ally stable. The lemma follows from the observation that A_{h+1}^{−1} is a principal submatrix of A_h^{−1} and from the interlacing property of the eigenvalues of a symmetric matrix (see [GoL] or [Par]). Lemma 6.2. The matrices A_{h+1} and X_h are symmetric positive definite if the matrix A_h is symmetric positive definite, and, furthermore, max{(cond A_h)², (cond X_h)²} ≤ (cond A)² for all h. In...
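
The two facts this context leans on — Cauchy eigenvalue interlacing for principal submatrices, and the consequent bound that a principal submatrix of a symmetric positive definite matrix cannot have a larger condition number — are easy to check numerically. A minimal sketch (an illustration, not the paper's construction):

```python
import numpy as np

# Build a symmetric positive definite matrix and take a principal submatrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)   # SPD by construction
B = A[:4, :4]             # leading principal submatrix

la = np.linalg.eigvalsh(A)  # eigenvalues of A, ascending
lb = np.linalg.eigvalsh(B)  # eigenvalues of B, ascending

# Cauchy interlacing: la[k] <= lb[k] <= la[k+1] for every k.
assert all(la[k] <= lb[k] <= la[k + 1] for k in range(4))

# Hence cond(B) <= cond(A) for SPD matrices -- the shape of Lemma 6.2's bound.
assert np.linalg.cond(B) <= np.linalg.cond(A) * (1 + 1e-12)
```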

402 |
A separator theorem for planar graphs
- Lipton, Tarjan
- 1979
Citation Context: ...k, so that its size is at most |R_{h,k}| × |R_{h,k}|). By Fact 2.1 such inversions can be performed in O(log² n) time with N_h M(s(n_h)) ≤ 2^{d−h} M(s(n_h)) processors. The next lemma is from [LT]. Its proof is simplified in [PR1] for the case in which α > 1/2, and we need only this case. Both proofs also show simple transformations of the respective s(n)-separator tree into s(n)-separator ...

392 |
Gaussian elimination is not optimal
- Strassen
- 1969
Citation Context: ...the number of processors from P to ⌈P/s⌉ by using s times as many parallel steps for any natural s ≤ P). By the upper bound of [Ch], obtained by the straightforward parallelization of the algorithm of [Stra1], we may choose M(n) ≤ n^{2.81}. In [PR1] and [Pan10] we show that if k × k matrices can be multiplied in O(k^β) arithmetic operations and β < γ for some γ, then we may choose M(n) ≤ n^ω for some ...
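
The [Stra1] recursion behind the M(n) ≤ n^{2.81} bound (2.81 ≈ log₂ 7) multiplies two n × n matrices using seven half-size products instead of eight. A sketch in NumPy for power-of-two n — illustrative only, not the paper's parallel formulation:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    # Strassen's 7-multiplication recursion; assumes n is a power of 2.
    n = A.shape[0]
    if n <= cutoff:
        return A @ B  # fall back to the classical product on small blocks
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty((n, n))
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

The recursion gives T(n) = 7 T(n/2) + O(n²) = O(n^{log₂ 7}) arithmetic operations, and each level consists of independent block operations, which is what the parallelization of [Ch] exploits.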

189 |
Generalized nested dissection
- Lipton, Rose, et al.
- 1979
Citation Context: ...the LINEAR-SOLVE instances for which our algorithm is effective. The nested dissection techniques were first proposed in [Ge] for grid graphs and were then extended to graphs with small separators in [LRT] (see also [R] and the excellent text [GeL] and an alternative version of the nested dissection algorithm in [GT]). Many applications to the sciences and engineering require the solution of such large...
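
The idea of nested dissection in [Ge]/[LRT] is to eliminate the vertices of a small separator last, so that fill-in stays confined to the separator blocks. For a grid graph the recursive ordering can be sketched as follows — a toy elimination ordering, not the paper's algorithm:

```python
def nested_dissection_order(rows, cols):
    # Recursive nested dissection ordering of a rows x cols grid:
    # order each half first, then the middle separator line last.
    if not rows or not cols:
        return []
    if len(rows) >= len(cols):
        mid = len(rows) // 2
        sep = [(rows[mid], c) for c in cols]           # horizontal separator
        left = nested_dissection_order(rows[:mid], cols)
        right = nested_dissection_order(rows[mid + 1:], cols)
    else:
        mid = len(cols) // 2
        sep = [(r, cols[mid]) for r in rows]           # vertical separator
        left = nested_dissection_order(rows, cols[:mid])
        right = nested_dissection_order(rows, cols[mid:][1:])
    return left + right + sep

# Ordering for the 7 x 7 grid used in the paper's Figs. 1-5.
order = nested_dissection_order(list(range(7)), list(range(7)))
```

The separator vertices come last in the returned ordering, so Gaussian elimination in this order only creates fill within separator-bordered blocks.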

186 |
Nested dissection of a regular finite element mesh
- George
- 1973
Citation Context: ...and list some applications. In particular, in §3 we comment on the classes of the LINEAR-SOLVE instances for which our algorithm is effective. The nested dissection techniques were first proposed in [Ge] for grid graphs and were then extended to graphs with small separators in [LRT] (see also [R] and the excellent text [GeL] and an alternative version of the nested dissection algorithm in [GT]). Many...

181 |
personal communication
- Miller
- 2006
Citation Context: ...TIME × PROCESSOR = O(M(s(n)) log² n), both for computing the whole recursive factorization (4.3), (4.4) and for its proper stage of inverting X_{d−1}. Remark 5.3. As was noted by Gazit and Miller [GM2], the recursive s(n)-factorization can be computed by using O(log² n log log n) time and M(s(n)) processors. Moreover, their approach can be extended to reach the bounds of O(log² n log(1/ε)) tim...

176 | Matching is as easy as matrix inversion
- Mulmuley, Vazirani, et al.
- 1987
Citation Context: ...other hand, theoretically, we may rely on the exact evaluation of the inverse of an n × n matrix over the rationals. This problem has interesting combinatorial applications (see [Lo], [GP1], [GP2], [MVV]). The known parallel algorithms for its solution use O(log² n) steps and n^α M(n) processors, where α varies from 1 in [Cs] to 1/2 in [PrS] and to slightly less than 1/2 in [GP3]; furthermore, α ...
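
The combinatorial application in [MVV] builds on the Tutte-matrix test: a graph has a perfect matching iff its skew-symmetric Tutte matrix is nonsingular, and substituting random values for the indeterminates preserves nonsingularity with high probability. A toy floating-point sketch (the actual algorithms work over finite fields):

```python
import numpy as np

def tutte_has_pm(edges, n, seed=1):
    # Build the Tutte matrix with random reals substituted for the
    # indeterminates x_{ij}; a nonzero determinant certifies a perfect
    # matching (with probability 1 for random real substitutions).
    rng = np.random.default_rng(seed)
    T = np.zeros((n, n))
    for i, j in edges:
        x = rng.uniform(1.0, 2.0)
        T[i, j], T[j, i] = x, -x   # skew-symmetric
    return abs(np.linalg.det(T)) > 1e-9

# A 4-cycle has a perfect matching; a path on three vertices cannot.
```

Since the test reduces matching to a determinant (and [MVV] to a matrix inverse), the parallel INVERT bounds quoted above translate directly into parallel matching algorithms.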

106 |
A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations
- ROSE
- 1972
Citation Context: ...instances for which our algorithm is effective. The nested dissection techniques were first proposed in [Ge] for grid graphs and were then extended to graphs with small separators in [LRT] (see also [R] and the excellent text [GeL] and an alternative version of the nested dissection algorithm in [GT]). Many applications to the sciences and engineering require the solution of such large linear system...

102 | The Computational Complexity of Algebraic and Numeric Problems - Borodin, Munro - 1975

94 |
Fast Algorithms for Solving Path Problems
- Tarjan
- 1981
Citation Context: ...eralizations of the nested dissection algorithms, including the recursive factorization techniques, are required in several important applications, particularly path-algebra computation in graphs (see [T1], [T2], [PR1], [PR4], and §3 below). Remark 1.1. Some readers may agree to sacrifice the generality of the results in order to simplify the graph techniques involved. Such readers may replace our Defi...

92 |
Efficient Parallel Solution of Linear Systems
- Pan, Reif
- 1985
Citation Context: ...on algorithms includes the papers [GHLN] and [ZG], which give a parallel time bound of O(√n) for grid graphs. In the proceedings version of our paper [PR1], nested dissection was applied for the first time to yield a numerically stable and processor-efficient parallel algorithm for sparse LINEAR-SOLVE with poly-log time bounds, thus reaching (within pol...

90 |
Fast parallel matrix inversion algorithms
- Csanky
- 1976
Citation Context: ...lem has interesting combinatorial applications (see [Lo], [GP1], [GP2], [MVV]). The known parallel algorithms for its solution use O(log² n) steps and n^α M(n) processors, where α varies from 1 in [Cs] to 1/2 in [PrS] and to slightly less than 1/2 in [GP3]; furthermore, α = 0 even for INVERT over the real matrices if we allow randomized Las Vegas algorithms, because of combining [KS] and [Pan11] (...
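
Csanky's reduction [Cs] computes the characteristic polynomial of A and reads the inverse off its coefficients. Sequentially the same coefficients arise from the Faddeev–LeVerrier recursion, sketched below; note this route is numerically unstable, which is exactly why the surrounding discussion pairs it with exact rational arithmetic:

```python
import numpy as np

def faddeev_leverrier_inverse(A):
    # Faddeev-LeVerrier recursion for the characteristic polynomial
    # lambda^n + c_1 lambda^{n-1} + ... + c_n of A:
    #   M_k = A M_{k-1} + c_{k-1} I,   c_k = -trace(A M_k) / k,
    # with M_0 = 0, c_0 = 1.  Cayley-Hamilton then gives A^{-1} = -M_n / c_n.
    n = A.shape[0]
    M = np.zeros((n, n))
    c = 1.0  # c_0
    for k in range(1, n + 1):
        M = A @ M + c * np.eye(n)
        c = -np.trace(A @ M) / k
    return -M / c  # valid when A is nonsingular (c_n = +/- det A != 0)
```

Each step is a matrix product plus a trace, so with fast parallel powering of A the whole computation fits in poly-log parallel time, which is the source of the O(log² n) bound quoted above.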

72 |
A Unified Approach to Path Problems
- Tarjan
- 1981
Citation Context: ...ations of the nested dissection algorithms, including the recursive factorization techniques, are required in several important applications, particularly path-algebra computation in graphs (see [T1], [T2], [PR1], [PR4], and §3 below). Remark 1.1. Some readers may agree to sacrifice the generality of the results in order to simplify the graph techniques involved. Such readers may replace our Definition...

57 |
Sparse Cholesky factorization on a local memory multiprocessor
- George, Heath, et al.
- 1988
Citation Context: ...parse linear systems and not a parallel algorithm for computing a dissection ordering. The subsequent literature on the parallel implementation of the nested dissection algorithms includes the papers [GHLN] and [ZG], which give a parallel time bound of O(√n) for grid graphs. In the proceedings version of our paper [PR1], nested dissection was applied for...

53 | Solution of Partial Differential Equations on Vector and Parallel Computers
- Ortega, Voigt
- 1985
Citation Context: ...ince many sets of separators must be eliminated in each parallel step. Linear-time parallel algorithms based on the nested dissection of grids were first described in [Li1] and [Ga]. The survey paper [OV] gives references to early attempts at parallelizing the LINEAR-SOLVE algorithms by nested dissection. Here and hereafter, by "parallel nested dissection" we mean a parallel algorithm for solving spars...

52 |
A new implementation of sparse Gaussian elimination
- Schreiber
- 1982
Citation Context: ...ldren that are the roots of the subtrees T_{G1}, T_{G2} of T_G, where G_j is the subgraph of G induced by the vertex set S ∪ V(j) for j = 1, 2. (Note that T_G is not equivalent to the elimination trees of [Schr], [Li2], and [GHLN] or to the separator trees of [GT], since the latter trees do not include the separator in the induced subgraphs.) The following definitions are equivalent to the usual ones, as, fo...

50 | Computer Solution of Large Sparse Positive Definite Systems - George, Liu - 1981

47 | A Survey of Parallel Algorithms for Shared Memory Machines - Karp, Ramachandran - 1988

46 |
On the asymptotic complexity of matrix multiplication
- Coppersmith, Winograd
- 1982
Citation Context: ...otic acceleration of matrix multiplications, see also [Pan6], [Pan7], or the original works [Stra1] (the first and justly celebrated breakthrough in this area), [Pan1]–[Pan5], [BCLR], [Bi], [Schö], [CW2], [Stra2]. In practice, however, even for matrices of reasonably large sizes, we should only count on M(n) = n³/log n or, at best, on M(n) = O(n^{2.78}) because of the considerable overhead of the k...

36 |
How can we speed up matrix multiplication
- Pan
- 1984
Citation Context: ...M(n) ≤ n^ω for some ω < γ and for all n. The current best upper bound on β and ω is 2.375 [CW1]; for surveys of the exciting history of the asymptotic acceleration of matrix multiplications, see also [Pan6], [Pan7], or the original works [Stra1] (the first and justly celebrated breakthrough in this area), [Pan1]–[Pan5], [BCLR], [Bi], [Schö], [CW2], [Stra2]. In practice, however, even for matrices of r...

34 |
The analysis of a nested dissection algorithm
- Gilbert, Tarjan
- 1987
Citation Context: ...sed in [Ge] for grid graphs and were then extended to graphs with small separators in [LRT] (see also [R] and the excellent text [GeL] and an alternative version of the nested dissection algorithm in [GT]). Many applications to the sciences and engineering require the solution of such large linear systems; such systems are frequently so large that parallel implementation of the (generalized) nested di...

30 | Parallel Algorithmic Techniques for Combinatorial Computation
- Eppstein, Galil
- 1988
Citation Context: ...Recently, it has become feasible to construct computer architectures with a large number of processors. We assume the parallel RAM machine model of [BGH] (see also [EG] and [KR]), where in each step each processor can perform a single addition, subtraction, multiplication, or division over the rationals. It is natural to study the efficient use of this parallelism f...

30 | Partial and total matrix multiplication - Schönhage - 1981

27 |
How to Multiply Matrices Faster
- Pan
- 1984
Citation Context: ...for some ω < γ and for all n. The current best upper bound on β and ω is 2.375 [CW1]; for surveys of the exciting history of the asymptotic acceleration of matrix multiplications, see also [Pan6], [Pan7], or the original works [Stra1] (the first and justly celebrated breakthrough in this area), [Pan1]–[Pan5], [BCLR], [Bi], [Schö], [CW2], [Stra2]. In practice, however, even for matrices of reasonabl...

27 |
Parametrization of Newton's iteration for computations with structured matrices and applications
- Pan
- 1992
Citation Context: ...rbitrary matrix filled with integers and such that log ‖A‖ = n^{O(1)} (see proceedings papers [GP1, Part 1] and [Pan9], which cite and (partly) reproduce [Pan8], and see also its extensions in [Pan10], [Pan11], and [KS]). In §3 we state our estimates for the complexity of sparse LINEAR-SOLVE by using the above estimates for the complexity of MULT and INVERT. Let us point out two alternatives. In the curren...

26 |
A parallel algorithm for finding a separator in planar graphs
- Gazit, Miller
- 1987
Citation Context: ...graphs (see Figs. 1–5); similarly, such computation is simple for many finite element graphs (see [Ge]). The recent O(log² n)-time, n^{1+ε}-processor (for any ε > 0) randomized parallel algorithm of [GM1] gives O(√n)-separator trees for all the planar graphs. Many very large sparse linear systems of algebraic equations found in practice, such as linear systems arising in the solution of two-dimensio...

25 | An Improved Newton Iteration for the Generalized Inverse of a Matrix, with Applications
- Pan, Schreiber
- 1991
Citation Context: ...= ‖W‖ ‖W^{−1}‖ for a fixed matrix norm (this definition is invariant in l for all l-norms of matrices). Now we may recall the following estimate from [PR5], based on the algorithm of [Be] (compare [PaS]): Fact 2.1. The problem INVERT for an n × n well-conditioned matrix A and for a positive ε < 1 such that log log(1/ε) = O(log n) can be solved within error bound ε by using O(log² n) parallel...

23 |
A compact row storage scheme for Cholesky factors using elimination trees
- Liu
- 1986
Citation Context: ...the vertex elimination construction of [GT] for a 7 × 7 grid graph in our Figs. 1–5 below (in this case, using the version of [GT], rather than ours, made our display simpler and more compact). [Li2] and [OR] describe two recent implementations of the parallel nested dissection algorithm on massively parallel SIMD machines. The first implementation is very general and applies to any s(n)-separata...

22 |
On the problem of partitioning planar graphs
- Djidjev
- 1982
Citation Context: ...r graphs have a √(8n)-separator family and that every n-vertex finite element graph with at most k boundary vertices in every element has a 4⌊k/2⌋√n-separator family. An improved construction due to [D] gives a √(6n)-separator family for planar graphs. (Similar small separator bounds have also been derived by Djidjev for bounded genus graphs and for several other classes of graphs.) Definition 3.2. Gi...

17 |
Relative bilinear complexity and matrix multiplication. Journal für die reine und angewandte Mathematik (Crelle's Journal)
- Strassen
- 1987
Citation Context: ...celeration of matrix multiplications, see also [Pan6], [Pan7], or the original works [Stra1] (the first and justly celebrated breakthrough in this area), [Pan1]–[Pan5], [BCLR], [Bi], [Schö], [CW2], [Stra2]. In practice, however, even for matrices of reasonably large sizes, we should only count on M(n) = n³/log n or, at best, on M(n) = O(n^{2.78}) because of the considerable overhead of the known asym...

14 |
An improved parallel processor bound in fast matrix inversion
- Preparata, Sarwate
- 1978
Citation Context: ...ting combinatorial applications (see [Lo], [GP1], [GP2], [MVV]). The known parallel algorithms for its solution use O(log² n) steps and n^α M(n) processors, where α varies from 1 in [Cs] to 1/2 in [PrS] and to slightly less than 1/2 in [GP3]; furthermore, α = 0 even for INVERT over the real matrices if we allow randomized Las Vegas algorithms, because of combining [KS] and [Pan11] (see also [KP], [...

13 | New fast algorithms for matrix operations - Pan - 1980

13 |
Fast and efficient solution of path algebra problems
- Pan, Reif
- 1989
Citation Context: ...present a version of the parallel nested dissection algorithm using O(s(n)) time and s(n)² processors (see the end of §3). In the papers [PR2]–[PR4], [PR6], [PR7] we extend our parallel nested dissection algorithm to the linear least-squares problem, to the linear programming problem, and to path-algebra computation in graphs; in all these papers the re...

12 |
Parallel evaluation of the determinant and of the inverse of a matrix. Inform. Process. Lett.
- Galil, Pan
Citation Context: ...o], [GP1], [GP2], [MVV]). The known parallel algorithms for its solution use O(log² n) steps and n^α M(n) processors, where α varies from 1 in [Cs] to 1/2 in [PrS] and to slightly less than 1/2 in [GP3]; furthermore, α = 0 even for INVERT over the real matrices if we allow randomized Las Vegas algorithms, because of combining [KS] and [Pan11] (see also [KP], [BP]), although the problem of numerical...

11 |
Fast parallel matrix and GCD computations
- Borodin, von zur Gathen, et al.
- 1982
Citation Context: ...Recently, it has become feasible to construct computer architectures with a large number of processors. We assume the parallel RAM machine model of [BGH] (see also [EG] and [KR]), where in each step each processor can perform a single addition, subtraction, multiplication, or division over the rationals. It is natural to study the efficient use of thi...

11 | Processor efficient parallel solution of linear systems over an abstract field - Kaltofen, Pan

10 |
Parallel solution of sparse simultaneous linear equations
- Calahan
- 1973
Citation Context: ...dissection algorithms is necessary in order to make the solution feasible. (We recall some examples of such problems in §3.) Work on parallel sparse matrix algorithms can be traced back, at least, to [Ca]. The extension of the idea of nested dissection from the sequential to the parallel case was not immediate since many sets of separators must be eliminated in each parallel step. Linear-time parallel...

10 |
Fast and efficient parallel algorithms for the exact inversion of integer matrices
- Pan
- 1985
Citation Context: ...sion (see [BM, p. 51] or [Pan7]), so that the processor bound, as well as the parallel and sequential time bounds attained this way, are optimal or nearly optimal (to within a factor of O(log n)). In [Pan8] the above results for dense matrices are extended to the exact evaluation of the inverse of A, of the determinant of A, and of all the coefficients of the characteristic polynomial of A in O(log² n)...

9 | Strassen's algorithm is not optimal; trilinear technique of aggregating, uniting and canceling for constructing fast algorithms for matrix operations - Pan - 1978

8 |
Improved processor bounds for combinatorial problems in RNC
- Galil, Pan
- 1988
Citation Context: ...On the other hand, theoretically, we may rely on the exact evaluation of the inverse of an n × n matrix over the rationals. This problem has interesting combinatorial applications (see [Lo], [GP1], [GP2], [MVV]). The known parallel algorithms for its solution use O(log² n) steps and n^α M(n) processors, where α varies from 1 in [Cs] to 1/2 in [PrS] and to slightly less than 1/2 in [GP3]; furthermo...

8 | Fast and Efficient Parallel Solution of Dense Linear Systems
- Pan, Reif
- 1989
Citation Context: ...tioned if log cond W = O(log n), where cond W = ‖W‖ ‖W^{−1}‖ for a fixed matrix norm (this definition is invariant in l for all l-norms of matrices). Now we may recall the following estimate from [PR5], based on the algorithm of [Be] (compare [PaS]): Fact 2.1. The problem INVERT for an n × n well-conditioned matrix A and for a positive ε < 1 such that log log(1/ε) = O(log n) can be solved wit...

8 |
The parallel computation of minimum cost paths in graphs by stream contraction
- Pan, Reif
- 1991
Citation Context: ...a version of the parallel nested dissection algorithm using O(s(n)) time and s(n)² processors (see the end of §3). In the papers [PR2]–[PR4], [PR6], [PR7] we extend our parallel nested dissection algorithm to the linear least-squares problem, to the linear programming problem, and to path-algebra computation in graphs; in all these papers the resulting...

7 |
A note on iterative method for generalized inversion of matrices
- Ben-Israel
- 1966
Citation Context: ...where cond W = ‖W‖ ‖W^{−1}‖ for a fixed matrix norm (this definition is invariant in l for all l-norms of matrices). Now we may recall the following estimate from [PR5], based on the algorithm of [Be] (compare [PaS]): Fact 2.1. The problem INVERT for an n × n well-conditioned matrix A and for a positive ε < 1 such that log log(1/ε) = O(log n) can be solved within error bound ε by using O(lo...
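
The iteration of [Be] underlying Fact 2.1 is the quadratically convergent Newton step X_{k+1} = X_k(2I − A X_k), which parallelizes well because each step is just a pair of matrix multiplications. A minimal sequential sketch (an illustration of the iteration, not the paper's error analysis):

```python
import numpy as np

def newton_invert(A, iters=60):
    # Newton (Ben-Israel) iteration: X_{k+1} = X_k (2I - A X_k).
    # The scaled start X_0 = A^T / (||A||_1 ||A||_inf) puts the spectrum of
    # A X_0 inside (0, 1], so for nonsingular A the residual I - A X_k
    # squares at every step (quadratic convergence).
    n = A.shape[0]
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(n)
    for _ in range(iters):
        X = X @ (2.0 * I - A @ X)
    return X
```

Because the residual squares each step, O(log n) iterations already reach an error bound ε with log(1/ε) = n^{O(1)}, which, with poly-log-time matrix multiplication, yields the O(log² n) parallel time of Fact 2.1.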

7 |
Complexity of parallel matrix computations. Theor. Comput. Sci.
- Pan
Citation Context: ...s times as many parallel steps for any natural s ≤ P). By the upper bound of [Ch], obtained by the straightforward parallelization of the algorithm of [Stra1], we may choose M(n) ≤ n^{2.81}. In [PR1] and [Pan10] we show that if k × k matrices can be multiplied in O(k^β) arithmetic operations and β < γ for some γ, then we may choose M(n) ≤ n^ω for some ω < γ and for all n. The current best upper boun...

6 |
The Solution of Mesh Equations on a Parallel Computer
- Liu
Citation Context: ...parallel case was not immediate since many sets of separators must be eliminated in each parallel step. Linear-time parallel algorithms based on the nested dissection of grids were first described in [Li1] and [Ga]. The survey paper [OV] gives references to early attempts at parallelizing the LINEAR-SOLVE algorithms by nested dissection. Here and hereafter, by "parallel nested dissection" we mean a para...

5 | Space and time efficient implementations of parallel nested dissection
- Armon, Reif
- 1992
Citation Context: ...g n) time and M(s(n)) processors. Moreover, their approach can be extended to reach the bounds of O(log² n log(1/ε)) time and simultaneously O(M(s(n))^{1+ε}) processors for a positive parameter ε [AR]. One may try to improve the processor efficiency of the latter estimates by applying a relatively minor super-effective slowdown of the computations [PP]. 6. Outline of the proof of the main theorem. W...