Numerical solution of saddle point problems
Acta Numerica, 2005
Cited by 179 (29 self)
Large linear systems of saddle point type arise in a wide variety of applications throughout computational science and engineering. Due to their indefiniteness and often poor spectral properties, such linear systems represent a significant challenge for solver developers. In recent years there has been a surge of interest in saddle point problems, and numerous solution techniques have been proposed for solving this type of system. The aim of this paper is to present and discuss a large selection of solution methods for linear systems in saddle point form, with an emphasis on iterative methods for large and sparse problems.
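As a rough illustration of what "saddle point form" means, the sketch below solves a tiny block system [[A, b^T], [b, 0]] [x; y] = [f; g] by Schur complement reduction. This is only one classical approach, not the iterative methods the abstract emphasizes. The function name `solve_saddle_diag`, the restriction to a diagonal A with a single constraint row, and the sample data are all assumptions made for the example.

```python
from fractions import Fraction


def solve_saddle_diag(a_diag, b_row, f, g):
    """Solve [[A, b^T], [b, 0]] [x; y] = [f; g] for A = diag(a_diag), one constraint row b."""
    # Schur complement S = b A^{-1} b^T (a scalar, since there is a single constraint).
    s = sum(bj * bj / aj for aj, bj in zip(a_diag, b_row))
    # Reduced equation for the multiplier: S y = b A^{-1} f - g.
    rhs = sum(bj * fj / aj for aj, bj, fj in zip(a_diag, b_row, f)) - g
    y = rhs / s
    # Back-substitution: x = A^{-1} (f - b^T y).
    x = [(fj - bj * y) / aj for aj, bj, fj in zip(a_diag, b_row, f)]
    return x, y


# Hypothetical data: A = diag(2, 4), b = [1, 1], f = [2, 4], g = 1.
a_diag = [Fraction(2), Fraction(4)]
b_row = [Fraction(1), Fraction(1)]
f = [Fraction(2), Fraction(4)]
g = Fraction(1)
x, y = solve_saddle_diag(a_diag, b_row, f, g)
```

Exact rational arithmetic is used so that the constraint b x = g can be checked without rounding error. For large sparse problems, forming the Schur complement explicitly is usually impractical.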
A column preordering strategy for the unsymmetric-pattern multifrontal method
ACM Transactions on Mathematical Software, 2004
Cited by 57 (4 self)
A new method for sparse LU factorization is presented that combines a column preordering strategy with a right-looking unsymmetric-pattern multifrontal numerical factorization. The column ordering is selected to give a good a priori upper bound on fill-in and then refined during numerical factorization (while preserving the bound). Pivot rows are selected to maintain numerical stability and to preserve sparsity. The method analyzes the matrix and automatically selects one of three preordering and pivoting strategies. The number of nonzeros in the LU factors computed by the method is typically less than or equal to those found by a wide range of unsymmetric sparse LU factorization methods, including left-looking methods and prior multifrontal methods.
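To show why the elimination order matters for fill-in, here is a toy sketch that counts the fill created by symbolic elimination on a symmetric "arrow" pattern. It is a deliberate simplification: it uses a symmetric elimination graph rather than the unsymmetric column preordering described in the abstract, and the function name `count_fill` and the example pattern are assumptions for illustration.

```python
def count_fill(n, edges, order):
    """Count fill-in edges created by symbolic elimination of vertices in `order`."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    fill = 0
    eliminated = set()
    for v in order:
        nbrs = [u for u in adj[v] if u not in eliminated]
        # Eliminating v makes its remaining neighbours pairwise adjacent;
        # every edge that has to be added for this is one fill-in entry.
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                p, q = nbrs[a], nbrs[b]
                if q not in adj[p]:
                    adj[p].add(q)
                    adj[q].add(p)
                    fill += 1
        eliminated.add(v)
    return fill


n = 6
arrow = [(0, k) for k in range(1, n)]  # vertex 0 is coupled to every other vertex
hub_first = count_fill(n, arrow, list(range(n)))          # eliminate the hub first
hub_last = count_fill(n, arrow, list(range(1, n)) + [0])  # eliminate the hub last
```

Eliminating the fully coupled vertex first turns the remaining five vertices into a clique, while eliminating it last creates no fill at all. Fill-reducing orderings aim to find the second kind of order before numerical factorization begins.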
A multifrontal QR factorization approach to distributed inference applied to multirobot localization and mapping
In Proceedings of the American Association for Artificial Intelligence, 2005
Cited by 22 (8 self)
QR factorization is most often used as a “black box” algorithm, but is in fact an elegant computation on a factor graph. By computing a rooted clique tree on this graph, the computation can be parallelized across subtrees, which forms the basis of so-called multifrontal QR methods. By judiciously choosing the order in which variables are eliminated in the clique tree computation, we show that one straightforwardly obtains a method for performing inference in distributed sensor networks. One obvious application is distributed localization and mapping with a team of robots. We phrase the problem as inference on a large-scale Gaussian Markov Random Field induced by the measurement factor graph, and show how multifrontal QR on this graph solves for the global map and all the robot poses in a distributed fashion. The method is illustrated using both small and large-scale simulations, and validated in practice through actual robot experiments.
Analysis and comparison of two general sparse solvers for distributed memory computers
ACM Transactions on Mathematical Software, 2001
Cited by 20 (7 self)
This paper provides a comprehensive study and comparison of two state-of-the-art direct solvers for large sparse sets of linear equations on large-scale distributed-memory computers. One is a multifrontal solver called MUMPS, the other is a supernodal solver called SuperLU. We describe the main algorithmic features of the two solvers and compare their performance characteristics with respect to uniprocessor speed, interprocessor communication, and memory requirements. For both solvers, preorderings for numerical stability and sparsity play an important role in achieving high parallel efficiency. We analyse the results with various ordering algorithms. Our performance analysis is based on data obtained from runs on a 512-processor Cray T3E using a set of matrices from real applications. We also use regular 3D grid problems to study the scalability of the two solvers.
Sparse Numerical Linear Algebra: Direct Methods and Preconditioning
1996
Cited by 17 (2 self)
Most of the current techniques for the direct solution of linear equations are based on supernodal or multifrontal approaches. An important feature of these methods is that arithmetic is performed on dense submatrices and Level 2 and Level 3 BLAS (matrix-vector and matrix-matrix kernels) can be used. Both sparse LU and QR factorizations can be implemented within this framework. Partitioning and ordering techniques have seen major activity in recent years. We discuss bisection and multisection techniques, extensions to orderings to block triangular form, and recent improvements and modifications to standard orderings such as minimum degree. We also study advances in the solution of indefinite systems and sparse least-squares problems. The desire to exploit parallelism has been responsible for many of the developments in direct methods for sparse matrices over the last ten years. We examine this aspect in some detail, illustrating how current techniques have been developed or ...
Multifrontal multithreaded rank-revealing sparse QR factorization
Cited by 10 (2 self)
SuiteSparseQR is a sparse QR factorization package based on the multifrontal method. Within each frontal matrix, LAPACK and the multithreaded BLAS enable the method to obtain high performance on multicore architectures. Parallelism across different frontal matrices is handled with Intel’s Threading Building Blocks library. The symbolic analysis and ordering phase pre-eliminates singletons by permuting the input matrix into the form [R11 R12; 0 A22] where R11 is upper triangular with diagonal entries above a given tolerance. Next, the fill-reducing ordering, column elimination tree, and frontal matrix structures are found without requiring the formation of the pattern of A^T A. Rank detection is performed within each frontal matrix using Heath’s method, which does not require column pivoting. The resulting sparse QR factorization obtains a substantial fraction of the theoretical peak performance of a multicore computer.
A Blocked Implementation of Level 3 BLAS for RISC Processors
1996
Cited by 8 (3 self)
We describe a version of the Level 3 BLAS which is designed to be efficient on RISC processors. This is an extension of previous studies by the same authors (see Amestoy, Daydé, Duff & Morère (1995), Daydé, Duff & Petitet (1994), and Daydé & Duff (1995)) where they describe a similar approach for efficient serial and parallel implementations of Level 3 BLAS on shared and virtual shared memory multiprocessors. All our codes are written in Fortran and use loop-unrolling, blocking, and copying to improve the performance. A blocking technique is used to express the BLAS in terms of operations involving triangular blocks and calls to the matrix-matrix multiplication kernel (GEMM). No manufacturer-supplied or assembler code is used. This blocked implementation uses the same blocking ideas as in Daydé et al. (1994) except that the ordering of loops is designed for efficient reuse of data held in cache and not necessarily for parallelization. A parameter which controls the bloc...
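The blocking idea mentioned in this abstract can be sketched as a tiled matrix-matrix product. The paper's codes are in Fortran and also use loop unrolling and copying; the Python sketch below illustrates only the tiling of the loop nest. The function names and the block size parameter `bs` are assumptions, and plain Python will not show the cache benefit.

```python
def matmul_naive(A, B):
    """Reference triple loop: C = A * B for list-of-lists matrices."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C


def matmul_blocked(A, B, bs=2):
    """Same product, computed tile by tile with tiles of at most bs x bs entries."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    # The three outer loops walk over tiles; the three inner loops stay inside
    # one tile of A, B and C, which is what lets a cache hold the working set.
    for ii in range(0, n, bs):
        for pp in range(0, k, bs):
            for jj in range(0, m, bs):
                for i in range(ii, min(ii + bs, n)):
                    for p in range(pp, min(pp + bs, k)):
                        a = A[i][p]
                        for j in range(jj, min(jj + bs, m)):
                            C[i][j] += a * B[p][j]
    return C


# Hypothetical test matrices whose sizes are not multiples of the block size.
A = [[i + j for j in range(5)] for i in range(4)]
B = [[i * j + 1 for j in range(3)] for i in range(5)]
C_blocked = matmul_blocked(A, B, bs=2)
```

The `min(...)` bounds handle edge tiles when the matrix dimensions are not multiples of `bs`. In a compiled language the block size would be tuned to the cache, which is presumably the role of the controlling parameter the abstract begins to describe.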
Developments and Trends in the Parallel Solution of Linear Systems
Parallel Computing, 1999
Cited by 5 (0 self)
In this review paper, we consider some important developments and trends in algorithm design for the solution of linear systems concentrating on aspects that involve the exploitation of parallelism. We briefly discuss the solution of dense linear systems, before studying the solution of sparse equations by direct and iterative methods. We consider preconditioning techniques for iterative solvers and discuss some of the present research issues in this field.
Keywords: linear systems, dense matrices, sparse matrices, tridiagonal systems, parallelism, direct methods, iterative methods, Krylov methods, preconditioning.
AMS(MOS) subject classifications: 65F05, 65F50.
1 Introduction. Solution methods for systems of linear equations Ax = b, (1) where A is a coefficient matrix of order n and x and b are n-vectors, are usually grouped into two distinct classes: direct methods and iterative methods. However, CCLRC - Rutherford Appleton Laboratory, Oxfordshire, England and CERFACS, Toulouse,...
The impact of high performance computing in the solution of linear systems: trends and problems
1999
Cited by 5 (0 self)
We review the influence of the advent of high performance computing on the solution of linear equations. We will concentrate on direct methods of solution and consider both the case when the coefficient matrix is dense and when it is sparse. We will examine the current performance of software in this area and speculate on what advances we might expect in the early years of the next century.
Keywords: sparse matrices, direct methods, parallelism, matrix factorization, multifrontal methods.
AMS(MOS) subject classifications: 65F05, 65F50.
1 Current reports available at http://www.cerfacs.fr/algor/algo reports.html. Also appeared as Technical Report RAL-TR-1999-072 from Rutherford Appleton Laboratory, Oxfordshire.
2 duff@cerfacs.fr. Also at Atlas Centre, RAL, Oxon OX11 0QX, England. Rutherford Appleton Laboratory.
Contents: 1 Introduction; 2 Building blocks; 3 Factorization of dense matrices; 4 Factorization of sparse matrices; 5 Parallel computation; 6 Current situation; 7 F...
Analysis, Tuning and Comparison of Two General Sparse Solvers for Distributed Memory Computers
2000
Cited by 5 (3 self)
We describe the work performed in the context of a Franco-Berkeley funded project between NERSC-LBNL located in Berkeley (USA) and CERFACS-ENSEEIHT located in Toulouse (France). We discuss both the tuning and performance analysis of two distributed memory sparse solvers (SuperLU from Berkeley and MUMPS from Toulouse) on the 512-processor Cray T3E from NERSC (Lawrence Berkeley National Laboratory). This project gave us the opportunity to improve the algorithms and add new features to the codes. We then quite extensively analyse and compare the two approaches on a set of large problems from real applications. We further explain the main differences in the behaviour of the approaches on artificial regular grid problems. As a conclusion to this activity report, we mention a set of parallel sparse solvers on which this type of study should be extended.
Keywords: sparse linear systems, distributed memory codes, multifrontal, supernodal, direct methods, comparison of codes.
AMS(MOS) subject classifications: 65F05, 65F50.
1 Current reports available at http://www.cerfacs.fr/algor/algo reports.html. The project was supported by the France-Berkeley Fund. This project also utilized resources of the National Energy Research Scientific Computing Center (NERSC) under contract number DE-AC03-76SF00098.
2 amestoy@enseeiht.fr. ENSEEIHT-IRIT, 2 rue Camichel, 31071 Toulouse, France. Much of the work done while a visitor at NERSC.
3 duff@cerfacs.fr. Also at Atlas Centre, RAL, Oxon OX11 0QX, England.
4 jeanyves@nag.co.uk. NAg Ltd, Wilkinson House, Oxford OX2 8DR, England.
5 xiaoye@nersc.gov. NERSC, Lawrence Berkeley National Lab, MS 50F, 1 Cyclotron Rd., Berkeley, CA 94720. The research of this author was supported in part by the National Science Foundation Cooperative Agreement...