Sparse Multifrontal Rank Revealing QR Factorization
 SIAM J. Matrix Anal. Appl
, 1995
Cited by 20 (0 self)
We describe an algorithm to compute a rank-revealing sparse QR factorization. We augment a basic sparse multifrontal QR factorization with an incremental condition estimator to provide an estimate of the least singular value and vector for each successive column of R. We remove a column from R as soon as the condition estimate exceeds a tolerance, using the approximate singular vector to select a suitable column. Removing columns, or pivoting, requires a dynamic data structure and necessarily degrades sparsity. But most of the additional work fits naturally into the multifrontal factorization's use of efficient dense vector kernels, minimizing overall cost. Further, pivoting as soon as possible reduces the cost of pivot selection and data access. We present a theoretical analysis that shows that our use of approximate singular vectors does not degrade the quality of our rank-revealing factorization; we achieve an exponential bound like methods that use exact singular vectors. We prov...
Parallel Ordering Using Edge Contraction
 PARALLEL COMPUTING
, 1995
Cited by 12 (1 self)
Computing a fill-reducing ordering of a sparse matrix is a central problem in the solution of sparse linear systems using direct methods. In recent years, there has been significant research in developing a sparse direct solver suitable for message-passing multiprocessors. However, computing the ordering step in parallel remains a challenge and there are very few methods available. This paper describes a new scheme called parallel contracted ordering which is a combination of a new parallel nested dissection heuristic and any serial ordering method. The new nested dissection heuristic called ShrinkSplit ND (SSND) is based on parallel graph contraction. For a system with N unknowns, the complexity of SSND is O((N/P) log P) using P processors in a hypercube; the overall complexity is O((N/P) log N) when the serial ordering method chosen is graph exploration based nested dissection. We provide extensive empirical results on the quality of the ordering. We also report on the parallel...
CIMGS: An incomplete orthogonal factorization preconditioner
 SIAM J. Sci. Comput
, 1997
Cited by 12 (0 self)
Abstract. A new preconditioner for symmetric positive definite systems is proposed, analyzed, and tested. The preconditioner, compressed incomplete modified Gram–Schmidt (CIMGS), is based on an incomplete orthogonal factorization. CIMGS is robust both theoretically and empirically, existing (in exact arithmetic) for any full rank matrix. Numerically it is more robust than an incomplete Cholesky factorization preconditioner (IC) and a complete Cholesky factorization of the normal equations. Theoretical results show that the CIMGS factorization has better backward error properties than complete Cholesky factorization. For symmetric positive definite M-matrices, CIMGS induces a regular splitting and better estimates the complete Cholesky factor as the set of dropped positions gets smaller. CIMGS lies between complete Cholesky factorization and incomplete Cholesky factorization in its approximation properties. These theoretical properties usually hold numerically, even when the matrix is not an M-matrix. When the drop set satisfies a mild and easily verified (or enforced) property, the upper triangular factor CIMGS generates is the same as that generated by incomplete Cholesky factorization. This allows the existence of the IC factorization to be guaranteed, based solely on the target sparsity pattern.
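The idea of an incomplete orthogonal factorization can be illustrated with a small dense sketch. The code below is a plain incomplete modified Gram–Schmidt on an explicitly stored matrix, not the compressed variant (CIMGS) that the abstract proposes; the function name `incomplete_mgs` and the representation of the drop set as a set of `(row, column)` index pairs are assumptions made for illustration only.

```python
import math


def incomplete_mgs(a, dropped=frozenset()):
    """Incomplete modified Gram-Schmidt on a dense m x n matrix (list of rows).

    Returns the n x n upper triangular factor R as a list of rows.
    Positions (k, j) listed in `dropped` are forced to zero in R, and the
    corresponding orthogonalization step is skipped.
    Assumes `a` has full column rank.
    """
    m, n = len(a), len(a[0])
    # Work on a column-wise copy so the input is left untouched.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in cols[k]))
        q = [x / r[k][k] for x in cols[k]]
        for j in range(k + 1, n):
            if (k, j) in dropped:
                continue  # dropped position: leave R[k][j] = 0
            r[k][j] = sum(x * y for x, y in zip(q, cols[j]))
            cols[j] = [y - r[k][j] * x for x, y in zip(q, cols[j])]
    return r
```

With an empty drop set this reduces to ordinary modified Gram–Schmidt, so R^T R reproduces A^T A up to rounding; adding positions to the drop set trades accuracy for sparsity in R.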
Task Scheduling in an Asynchronous Distributed Memory Multifrontal Solver
, 2002
Cited by 12 (7 self)
We describe the improvements to the task scheduling for MUMPS, an asynchronous distributed memory direct solver for sparse linear systems. In the new approach, we determine, during the analysis of the matrix, candidate processes for the tasks that will be dynamically scheduled during the subsequent factorization. This approach significantly improves the scalability of the solver in terms of execution time and storage. By comparison with the previous version of MUMPS, we demonstrate the efficiency and the scalability of the new algorithm on up to 512 processors. Our test cases include matrices from regular 3D grids and irregular ones from real-life applications.
Multiple-rank modifications of a sparse Cholesky factorization
 SIAM J. Matrix Anal. Appl
, 2001
Cited by 11 (7 self)
Abstract. Given a sparse symmetric positive definite matrix AA^T and an associated sparse Cholesky factorization LDL^T or LL^T, we develop sparse techniques for updating the factorization after either adding a collection of columns to A or deleting a collection of columns from A. Our techniques are based on an analysis and manipulation of the underlying graph structure, using the framework developed in an earlier paper on rank-1 modifications [T. A. Davis and W. W. Hager, SIAM J. Matrix Anal. Appl., 20 (1999), pp. 606–627]. Computationally, the multiple-rank update has better memory traffic and executes much faster than an equivalent series of rank-1 updates since the multiple-rank update makes one pass through L computing the new entries, while a series of rank-1 updates requires multiple passes through L.
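The baseline mentioned in the abstract, a series of rank-1 updates, can be sketched on a small dense matrix. This is the textbook dense rank-1 Cholesky update, not the sparse graph-based method of the paper, and it makes one full pass through L per added column, which is exactly the cost the multiple-rank update is said to avoid. The helper names `gram`, `cholesky`, and `rank1_update` are assumptions for illustration.

```python
import math


def gram(a):
    """Return A A^T for a dense matrix given as a list of rows."""
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in a] for ri in a]


def cholesky(s):
    """Dense Cholesky factor L (lower triangular) with L L^T = S."""
    n = len(s)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        l[j][j] = math.sqrt(s[j][j] - sum(l[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            acc = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (s[i][j] - acc) / l[j][j]
    return l


def rank1_update(l, w):
    """Return the Cholesky factor of L L^T + w w^T (one pass through L)."""
    n = len(l)
    l = [row[:] for row in l]  # copy so the caller's factor is unchanged
    w = list(w)
    for k in range(n):
        r = math.hypot(l[k][k], w[k])
        c = r / l[k][k]
        s = w[k] / l[k][k]
        l[k][k] = r
        for i in range(k + 1, n):
            l[i][k] = (l[i][k] + s * w[i]) / c
            w[i] = c * w[i] - s * l[i][k]
    return l
```

Appending the columns w_1, ..., w_r to A changes AA^T to AA^T + w_1 w_1^T + ... + w_r w_r^T, so calling `rank1_update` once per new column gives the same factor as refactoring the enlarged matrix from scratch, at the price of r passes through L.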
Current Trends in Stochastic Programming Computation and Applications
, 1995
Cited by 11 (0 self)
While decisions frequently have uncertain consequences, optimal decision models often replace those uncertainties with averages or best estimates. Limited computational capability may have motivated this practice in the past. Recent computational advances have, however, greatly expanded the range of stochastic programs, optimal decision models with explicit consideration of uncertainties. This paper describes basic methodology in stochastic programming, recent developments in computation, and some practical application examples.
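The contrast drawn in the abstract, replacing uncertainty with an average versus modeling it explicitly, can be seen in a toy single-period ordering problem. The problem, the prices, and the demand scenarios below are hypothetical and are not taken from the paper; the function names are likewise assumptions.

```python
from fractions import Fraction


def expected_profit(order, scenarios, price, cost):
    """Expected profit of ordering `order` units.

    `scenarios` is a list of (demand, probability) pairs; unsold units
    are assumed to be worthless.
    """
    return sum(
        prob * (price * min(order, demand) - cost * order)
        for demand, prob in scenarios
    )


def mean_value_order(scenarios):
    """Decision of the averaged model: order exactly the mean demand."""
    return sum(prob * demand for demand, prob in scenarios)


def stochastic_order(scenarios, price, cost):
    """Decision of the stochastic program: maximize expected profit.

    Expected profit is piecewise linear and concave in the order size,
    with breakpoints at the scenario demands, so (for price > cost) it is
    enough to compare those breakpoints.
    """
    candidates = [demand for demand, _ in scenarios]
    return max(
        candidates,
        key=lambda x: expected_profit(x, scenarios, price, cost),
    )
```

For example, with three equally likely demands of 10, 50, and 90 units, a selling price of 4, and a unit cost of 1, the averaged model orders 50 units while the stochastic model orders 90 and earns a higher expected profit.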
Distributed Sparse Gaussian Elimination And Orthogonal Factorization
 LAPACK WORKING NOTE 64 (UT CS-93-203)
, 1993
Cited by 8 (3 self)
We consider the solution of a linear system Ax = b on a distributed memory machine when the matrix A has full rank and is large, sparse and nonsymmetric. We use our Cartesian nested dissection algorithm to compute a fill-reducing column ordering of the matrix. We develop algorithms that use the associated separator tree to estimate the structure of the factor and to distribute and perform numeric computations. When the matrix is nonsymmetric but square, the numeric computations involve Gaussian elimination with row pivoting; when the matrix is overdetermined, row-oriented Householder transforms are applied to compute the triangular factor of an orthogonal factorization. We compare the fill incurred by our approach to that incurred by well-known sequential methods and report on the performance of our implementation on the Intel iPSC/860.
Adapting a parallel sparse direct solver to SMP architectures
, 2003
Cited by 7 (3 self)
We consider the direct solution of general sparse linear systems based on a multifrontal method. The approach combines partial static scheduling of the task dependency graph during the symbolic factorization and distributed dynamic scheduling during the numerical factorization to equilibrate work among the processes of a distributed memory computer. We show that to address SMP architectures, and more generally nonuniform memory access multiprocessors, our algorithms for both the static and the dynamic scheduling need to be revisited in order to take account of the nonuniform cost of communication. The performance analysis on an IBM SP with 16 processors per SMP node and up to 128 processors shows that we can significantly reduce both the amount of internode communication and the solution time.
Evaluating High Level Parallel Programming Support for Irregular Applications in ICC++
 in Proceedings of the International Scientific Computing in Object-Oriented Parallel Environments Conference
Cited by 4 (1 self)
Object-oriented techniques have been proffered as aids for managing complexity, enhancing reuse, and improving readability of irregular parallel applications. However, as performance is the major reason for employing parallelism, programmability and high performance must be delivered together. Using a suite of seven challenging irregular applications and the mature Illinois Concert system (a high-level concurrent object-oriented programming model backed by an aggressive implementation), we evaluate what programming efforts are required to achieve high performance. For all seven applications, we achieve performance comparable to the best reported for low-level programming means on large-scale parallel systems. In general, a high-level concurrent object-oriented programming model supported by aggressive implementation techniques can eliminate programmer management of many concerns: procedure and computation granularity, namespace management, and low-level concurrency management. Our st...