Results 1 - 10 of 63
Combinatorial preconditioners for sparse, symmetric, diagonally dominant linear systems
, 1996
Efficient sparse LU factorization with partial pivoting on distributed memory architectures
 IEEE Trans. Parallel and Distributed Systems
, 1998
Cited by 28 (9 self)
Abstract: A sparse LU factorization based on Gaussian elimination with partial pivoting (GEPP) is important to many scientific applications, but it is still an open problem to develop a high performance GEPP code on distributed memory machines. The main difficulty is that partial pivoting operations dynamically change computation and nonzero fill-in structures during the elimination process. This paper presents an approach called S* for parallelizing this problem on distributed memory machines. The S* approach adopts static symbolic factorization to avoid runtime control overhead, incorporates 2D L/U supernode partitioning and amalgamation strategies to improve caching performance, and exploits irregular task parallelism embedded in sparse LU using asynchronous computation scheduling. The paper discusses and compares the algorithms using 1D and 2D data mapping schemes, and presents experimental studies on Cray T3D and T3E. The performance results for a set of nonsymmetric benchmark matrices are very encouraging, and S* has achieved up to 6.878 GFLOPS on 128 T3E nodes. To the best of our knowledge, this is the highest performance ever achieved for this challenging problem and the previous record was 2.583 GFLOPS on shared memory machines [8].
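The difficulty named in the abstract above, that the pivot row is only known once numerical values are available, can be illustrated with a minimal dense sketch. This is plain textbook GEPP, not the S* algorithm; the function name `lu_partial_pivoting` and the small test matrix are assumptions made for illustration only.

```python
def lu_partial_pivoting(a):
    """Dense LU with partial pivoting (textbook GEPP, not the S* method).

    Returns (perm, lower, upper) such that row perm[i] of `a` equals
    row i of lower * upper.
    """
    n = len(a)
    upper = [[float(x) for x in row] for row in a]
    lower = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # The pivot row depends on the current numerical values, which is
        # why the elimination structure cannot be fixed purely in advance.
        p = max(range(k, n), key=lambda i: abs(upper[i][k]))
        if upper[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            upper[k], upper[p] = upper[p], upper[k]
            lower[k], lower[p] = lower[p], lower[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            m = upper[i][k] / upper[k][k]
            lower[i][k] = m
            for j in range(k, n):
                upper[i][j] -= m * upper[k][j]
    for i in range(n):
        lower[i][i] = 1.0
    return perm, lower, upper


# Hypothetical 3x3 example: the zero in position (0, 0) forces a row swap.
A = [[0, 2, 1], [1, 1, 0], [2, 1, 1]]
perm, L, U = lu_partial_pivoting(A)
```

In this example both elimination steps swap rows, so the rows that receive updates differ from what the original ordering would suggest.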
Automatic Selection of Dynamic Data Partitioning Schemes for Distributed-Memory Multicomputers
 in Proceedings of the 8th Workshop on Languages and Compilers for Parallel Computing
, 1995
Cited by 25 (3 self)
For distributed-memory multicomputers such as the Intel Paragon, the IBM SP1/SP2, the NCUBE/2, and the Thinking Machines CM-5, the quality of the data partitioning for a given application is crucial to obtaining high performance. This task has traditionally been the user's responsibility, but in recent years much effort has been directed to automating the selection of data partitioning schemes. Several researchers have proposed systems that are able to produce data distributions that remain in effect for the entire execution of an application. For complex programs, however, such static data distributions may be insufficient to obtain acceptable performance. The selection of distributions that dynamically change over the course of a program's execution adds another dimension to the data partitioning problem. In this paper, we present a technique that can be used to automatically determine which partitionings are most beneficial over specific sections of a program while taking into a...
Multigrid Acceleration Techniques and Applications to the Numerical Solution of Partial Differential Equations
, 1997
Domain Decomposition and Multi-Level Type Techniques for General Sparse Linear Systems
, 1998
Cited by 17 (16 self)
Domain-decomposition and multilevel techniques are often formulated for linear systems that arise from the solution of elliptic-type Partial Differential Equations. In this paper, generalizations of these techniques for irregularly structured sparse linear systems are considered. An interesting common approach used to derive successful preconditioners is to resort to Schur complements. In particular, we discuss a multilevel domain decomposition-type algorithm for iterative solution of large sparse linear systems based on independent subsets of nodes. We also discuss a Schur complement technique that utilizes incomplete LU factorizations of local matrices. Key words: Schur complement techniques; Incomplete LU factorization; Schwarz iterations; Multi-elimination; Multilevel ILU preconditioners; Krylov subspace methods. 1 Introduction A recent trend in parallel preconditioning techniques for general sparse linear systems is to exploit ideas from domain decomposition concepts an...
Techniques to Overlap Computation and Communication in Irregular Iterative Applications
 the Proceedings of the International Conference on Supercomputing
, 1994
Cited by 17 (7 self)
There are many applications in CFD and structural analysis that can be more accurately modeled using unstructured grids. Parallelization of implicit methods for unstructured grids is a difficult and important problem. This paper deals with coloring techniques to overlap computation and communication during the solution of implicit methods on message-passing distributed memory multicomputers. An evaluation of coloring techniques for partitioned unstructured grids is first presented. Results show the importance of using partitioning information during coloring. It is next shown that overlapping computation and communication can be formalized as a generalized coloring problem. Modified coloring algorithms are used for this purpose. The PARTI library [4] has been extended to support non-blocking gather-scatter operations and used in conjunction with these algorithms. Practicality issues are evaluated with experimental results on an Intel Paragon multicomputer. 1 Introduction There are man...
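As a rough illustration of the ingredients mentioned in the abstract above, the sketch below colors a partitioned graph greedily and identifies the nodes that have neighbors in another partition. This is the generic greedy heuristic, not the modified coloring algorithms of the paper; the names `greedy_coloring` and `boundary_nodes`, the visiting order, and the four-node example are all assumptions.

```python
def greedy_coloring(adjacency, order=None):
    """Assign each node the smallest color not used by an already colored neighbor."""
    colors = {}
    for v in (order if order is not None else sorted(adjacency)):
        used = {colors[w] for w in adjacency[v] if w in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def boundary_nodes(adjacency, part):
    """Nodes with at least one neighbor owned by a different partition.

    Only these nodes need off-processor data, so they are the ones whose
    updates interact with communication.
    """
    return {v for v, nbrs in adjacency.items()
            if any(part[w] != part[v] for w in nbrs)}


# Hypothetical path graph 0-1-2-3 split across two partitions.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
part = {0: 0, 1: 0, 2: 1, 3: 1}
boundary = boundary_nodes(adjacency, part)
# One possible use of partition information: color boundary nodes first.
order = sorted(boundary) + sorted(set(adjacency) - boundary)
colors = greedy_coloring(adjacency, order)
```

Nodes of the same color share no edge, so they can be updated independently within one step of an implicit solver.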
Computational and numerical methods for bioelectric field problems
 Critical Reviews in BioMedical Engineering
, 1997
Cited by 17 (6 self)
Fundamental problems in electrophysiology can be studied by computationally modeling and simulating the associated microscopic and macroscopic bioelectric fields. To study such fields computationally, researchers have developed a number of numerical and computational techniques. Advances in computer architectures have allowed researchers to model increasingly complex biophysical systems. Modeling such systems requires a researcher to apply a wide variety of computational and numerical methods to describe the underlying physics and physiology of the associated three-dimensional geometries. Issues naturally arise as to the accuracy and efficiency of such methods. In this paper we review computational and numerical methods for solving bioelectric field problems. The motivating applications represent a class of bioelectric field problems that arise in electrocardiography and
Parallel Incomplete Cholesky Preconditioners based on the Non-Overlapping Data Distribution
 Parallel Computing
, 1997
Cited by 15 (6 self)
The paper analyses various parallel incomplete factorizations based on the non-overlapping domain decomposition. The general framework is applied to the investigation of the preconditioning step in CG-like methods. Under certain conditions imposed on the finite element mesh, all matrix and vector types given by the special data distribution can be used in the matrix-by-vector multiplications. Not only the well-known domain decomposition preconditioners fit into the concept but also parallelized global incomplete factorizations are feasible. Additionally, those global incomplete factorizations can be used as smoothers in parallel multigrid methods. Numerical results on a parallel machine with distributed memory are presented. Keywords: Parallel Iterative Solvers, Incomplete Factorization, Preconditioning, Domain Decomposition, Distributed Memory, Finite Element Method. 1 Introduction A lot of parallel algorithms have been developed for matrix factorizations based on disjoint data distr...
Efficient Approximate Solution of Sparse Linear Systems
, 1998
Cited by 15 (0 self)
We consider the problem of approximate solution $\tilde{x}$ of a linear system $Ax = b$ over the reals, such that $\|A\tilde{x} - b\| \le \epsilon \|b\|$, for a given $\epsilon$, $0 < \epsilon < 1$. This is one of the most fundamental of all computational problems. Let $\kappa(A) = \|A\| \|A^{-1}\|$ be the condition number of the $n \times n$ input matrix $A$. Sparse, Diagonally Dominant (DD) linear systems appear very frequently in the solution of linear systems associated with PDEs and stochastic systems, and generally have polynomial condition number. While there is a vast literature on methods for approximate solution of sparse DD linear systems, most of the results are empirical, and to date there are no known proven linear bounds on the complexity of this problem. Using iterative algorithms, and building on the work of Vaidya [1] and Gremban et al. [24] we provide the best known sequential work bounds for the solution of a number of major classes of DD sparse linear systems. Let $r = \log(\kappa(A)/\epsilon)$. The sparsity graph of $A$ is a graph whose nodes are the indices and whose edges represent pairs of indices of $A$ with nonzero entries. The following results hold for a DD matrix $A$ with nonzero off-diagonal entries of bounded magnitude: (1) if $A$ has a sparsity graph which is a regular $d$-dimensional grid for constant $d$, then our work is $O(n r^2)$, (2) if $A$ is a stochastic matrix with fixed $s(n)$-separable graph as its sparsity graph, then our work is $O((n + s(n)^2) r)$. The following results hold for a DD matrix $A$ with entries of unbounded magnitude: (3) if $A$ is sparse (i.e., $O(n)$ nonzeros), our work is less than $O(n(r + \log n))^{1.5}$, (4) if $A$ has a sparsity graph in a family of graphs with constant size forbidden graph minors (e.g., planar graphs), then our work is bounded by $O(n(r + \log n)^{1+o(1)})$ in the case $\log n = o(\log r)$ and $O(n(r + \log n))^{1+o(1)}$ in the case log...
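The two structural notions used in the abstract above, diagonal dominance and the sparsity graph, can be sketched in a few lines. The row-dictionary storage, the function names, and the choice of weak dominance (greater than or equal) are assumptions made for illustration; the abstract does not fix these details.

```python
def is_diagonally_dominant(rows):
    """True if |a_ii| >= sum over j != i of |a_ij| for every row (weak dominance).

    `rows` is a list of dicts mapping column index to value, one per row.
    """
    return all(
        abs(row.get(i, 0.0)) >= sum(abs(v) for j, v in row.items() if j != i)
        for i, row in enumerate(rows)
    )


def sparsity_graph(rows):
    """Undirected edge set {(i, j) : i < j and a_ij or a_ji is nonzero}."""
    edges = set()
    for i, row in enumerate(rows):
        for j, v in row.items():
            if i != j and v != 0:
                edges.add((min(i, j), max(i, j)))
    return edges


# Hypothetical example: 1D Laplacian stencil (-1, 2, -1) on four unknowns.
laplacian = [
    {0: 2.0, 1: -1.0},
    {0: -1.0, 1: 2.0, 2: -1.0},
    {1: -1.0, 2: 2.0, 3: -1.0},
    {2: -1.0, 3: 2.0},
]
dd = is_diagonally_dominant(laplacian)
edges = sparsity_graph(laplacian)
```

For this matrix the sparsity graph is a path, which is a regular 1-dimensional grid in the sense of case (1).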
Acceleration of Five-point Red-Black Gauss-Seidel in Multigrid for Poisson Equation
, 1995
Cited by 13 (12 self)
A new relaxation analysis and two acceleration schemes are proposed for the five-point Red-Black Gauss-Seidel smoothing in multigrid for solving two dimensional Poisson equation. For a multigrid V cycle, we discovered that under-relaxation is applicable to restriction half cycle and over-relaxation is applicable to interpolation half cycle. Numerical experiments using modified multigrid V cycle algorithms show that our simple acceleration schemes accelerate the convergence rate by as much as 34% with negligible cost. This result is contrary to the existing belief that SOR is not suitable for using as a smoother in multigrid for Poisson equation, because the gain in computational savings would not pay for the cost of implementing it. More important is the idea of employing different parameters to accelerate the reduction of low and high frequency errors separately. Our discovery offers a new way for SOR smoothing in multigrid.
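A minimal sketch of one relaxed five-point Red-Black Gauss-Seidel sweep is given below, assuming the model problem $-\Delta u = f$ on a uniform square grid with fixed (Dirichlet) boundary values. The function name `red_black_sweep` and the parameter `omega` are assumptions; the abstract does not state the relaxation values used in the restriction and interpolation half cycles, so none are suggested here.

```python
def red_black_sweep(u, f, h, omega=1.0):
    """One relaxed red-black Gauss-Seidel sweep on a square grid, in place.

    u, f: n x n lists of lists; the outer ring of u holds boundary values.
    h: mesh width. omega < 1 under-relaxes, omega > 1 over-relaxes,
    and omega == 1 is plain Gauss-Seidel.
    """
    n = len(u)
    for color in (0, 1):  # red points have (i + j) even, black points odd
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 != color:
                    continue
                gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                             + u[i][j - 1] + u[i][j + 1]
                             + h * h * f[i][j])
                u[i][j] = (1.0 - omega) * u[i][j] + omega * gs
    return u


# Hypothetical check: with f = 0 and zero boundary values the exact
# solution is zero, so the iterate itself is the error.
n = 5
u = [[1.0 if 0 < i < n - 1 and 0 < j < n - 1 else 0.0 for j in range(n)]
     for i in range(n)]
f = [[0.0] * n for _ in range(n)]
for _ in range(50):
    red_black_sweep(u, f, h=1.0 / (n - 1), omega=1.0)
max_error = max(abs(x) for row in u for x in row)
```

In a multigrid V cycle, a sweep like this would be called as the smoother on each level, with `omega` chosen separately for the downward and upward halves of the cycle.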