Results 1–10 of 14
Highly scalable parallel algorithms for sparse matrix factorization
 IEEE Transactions on Parallel and Distributed Systems
, 1994
"... In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algo ..."
Abstract

Cited by 116 (29 self)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, ...
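A key fact behind all the sparse factorization work in this listing is that the sparsity of a Cholesky factor depends on the elimination order. The toy sketch below (my own illustration with numpy, not the paper's algorithm) uses an "arrowhead" matrix: eliminating the one dense variable first fills in the entire factor, while permuting it last preserves sparsity.

```python
import numpy as np

# Symmetric positive definite "arrowhead" matrix: dense first row/column.
n = 6
A = n * np.eye(n)
A[0, :] = 1.0
A[:, 0] = 1.0
A[0, 0] = n

# Eliminating the dense variable first: the factor fills in completely.
L_bad = np.linalg.cholesky(A)

# Reorder so the dense variable is eliminated last: no fill-in at all.
perm = list(range(1, n)) + [0]
P = np.eye(n)[perm]
L_good = np.linalg.cholesky(P @ A @ P.T)

nnz = lambda M: int(np.count_nonzero(np.abs(M) > 1e-12))
print(nnz(L_bad), nnz(L_good))  # the reordered factor is much sparser
```

This is why the ordering heuristics mentioned throughout these abstracts (nested dissection, minimum degree) matter so much for both the work and the parallelism of sparse direct solvers.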
Square Root SAM: Simultaneous localization and mapping via square root information smoothing
 International Journal of Robotics Research
, 2006
"... Solving the SLAM problem is one way to enable a robot to explore, map, and navigate in a previously unknown environment. We investigate smoothing approaches as a viable alternative to extended Kalman filterbased solutions to the problem. In particular, we look at approaches that factorize either th ..."
Abstract

Cited by 81 (25 self)
Solving the SLAM problem is one way to enable a robot to explore, map, and navigate in a previously unknown environment. We investigate smoothing approaches as a viable alternative to extended Kalman filter-based solutions to the problem. In particular, we look at approaches that factorize either the associated information matrix or the measurement Jacobian into square root form. Such techniques have several significant advantages over the EKF: they are faster yet exact; they can be used in either batch or incremental mode; they are better equipped to deal with nonlinear process and measurement models; and they yield the entire robot trajectory, at lower cost for a large class of SLAM problems. In addition, in an indirect but dramatic way, column ordering heuristics automatically exploit the locality inherent in the geographic nature of the SLAM problem. In this paper we present the theory underlying these methods, along with an interpretation of factorization in terms of the graphical model associated with the SLAM problem. We present both simulation results and actual SLAM experiments in large-scale environments that underscore the potential of these methods as an alternative to EKF-based approaches.
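The "square root" in square-root smoothing can be shown in a few lines. The sketch below (a minimal numpy illustration, with a made-up toy Jacobian rather than a real SLAM linearization) shows that for a least-squares problem with Jacobian A and residual b, the QR factor R satisfies R^T R = A^T A, i.e. R is a square root of the information matrix, and solving the triangular system R x = Q^T b matches the normal-equations solution without ever forming A^T A.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # toy Jacobian: 8 measurements, 3 states
b = rng.standard_normal(8)

# Thin QR of the measurement Jacobian: R is 3x3 upper triangular.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)            # back-substitution step

# Equivalent (but worse-conditioned) normal-equations solve.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

assert np.allclose(x_qr, x_ne)
assert np.allclose(R.T @ R, A.T @ A)          # R is a square-root information matrix
```

Working with R instead of A^T A roughly halves the condition number's exponent, which is one of the numerical arguments the abstract alludes to.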
A multifrontal QR factorization approach to distributed inference applied to multirobot localization and mapping
 in Proceedings of the American Association for Artificial Intelligence
, 2005
"... QR factorization is most often used as a “black box ” algorithm, but is in fact an elegant computation on a factor graph. By computing a rooted clique tree on this graph, the computation can be parallelized across subtrees, which forms the basis of socalled multifrontal QR methods. By judiciously c ..."
Abstract

Cited by 22 (8 self)
QR factorization is most often used as a “black box” algorithm, but is in fact an elegant computation on a factor graph. By computing a rooted clique tree on this graph, the computation can be parallelized across subtrees, which forms the basis of so-called multifrontal QR methods. By judiciously choosing the order in which variables are eliminated in the clique tree computation, we show that one straightforwardly obtains a method for performing inference in distributed sensor networks. One obvious application is distributed localization and mapping with a team of robots. We phrase the problem as inference on a large-scale Gaussian Markov Random Field induced by the measurement factor graph, and show how multifrontal QR on this graph solves for the global map and all the robot poses in a distributed fashion. The method is illustrated using both small- and large-scale simulations, and validated in practice through actual robot experiments.
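The divide-and-conquer structure that makes multifrontal elimination parallelizable can be seen in a small block example. In the sketch below (my own toy construction, not the paper's formulation), two robots' local variable blocks couple only through a small set of separator variables, so each block can be eliminated independently, in parallel, each contributing a small update to the separator system, which is then solved and back-substituted.

```python
import numpy as np

rng = np.random.default_rng(1)
def spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

# Two independent 3-variable blocks coupled only through a 2-variable separator.
A11, A22, C = spd(3), spd(3), spd(2)
B1, B2 = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))
b1, b2, c = rng.standard_normal(3), rng.standard_normal(3), rng.standard_normal(2)

# Full system, solved monolithically for reference.
A = np.block([[A11, np.zeros((3, 3)), B1],
              [np.zeros((3, 3)), A22, B2],
              [B1.T, B2.T, C]])
x_full = np.linalg.solve(A, np.concatenate([b1, b2, c]))

# Per-subtree eliminations: these two Schur-complement updates are
# independent and could run on different processors.
S = C - B1.T @ np.linalg.solve(A11, B1) - B2.T @ np.linalg.solve(A22, B2)
r = c - B1.T @ np.linalg.solve(A11, b1) - B2.T @ np.linalg.solve(A22, b2)

xs = np.linalg.solve(S, r)               # small separator solve at the root
x1 = np.linalg.solve(A11, b1 - B1 @ xs)  # back-substitute in each subtree
x2 = np.linalg.solve(A22, b2 - B2 @ xs)

assert np.allclose(np.concatenate([x1, x2, xs]), x_full)
```

The clique/elimination tree generalizes this two-block picture: any set of subtrees with disjoint separators can be eliminated concurrently.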
iSAM2: Incremental Smoothing and Mapping Using the Bayes Tree
"... We present a novel data structure, the Bayes tree, that provides an algorithmic foundation enabling a better understanding of existing graphical model inference algorithms and their connection to sparse matrix factorization methods. Similar to a clique tree, a Bayes tree encodes a factored probabili ..."
Abstract

Cited by 20 (10 self)
We present a novel data structure, the Bayes tree, that provides an algorithmic foundation enabling a better understanding of existing graphical model inference algorithms and their connection to sparse matrix factorization methods. Similar to a clique tree, a Bayes tree encodes a factored probability density, but unlike the clique tree it is directed and maps more naturally to the square root information matrix of the simultaneous localization and mapping (SLAM) problem. In this paper, we highlight three insights provided by our new data structure. First, the Bayes tree provides a better understanding of the matrix factorization in terms of probability densities. Second, we show how the fairly abstract updates to a matrix factorization translate to a simple editing of the Bayes tree and its conditional densities. Third, we apply the Bayes tree to obtain a completely novel algorithm for sparse nonlinear incremental optimization, named iSAM2, which achieves improvements in efficiency through incremental variable reordering and fluid relinearization, eliminating the need for periodic batch steps. We analyze various properties of iSAM2 in detail, and show on a range of real and simulated datasets that our algorithm compares favorably with other recent mapping algorithms in both quality and efficiency.
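The incremental-update idea underlying iSAM-style methods has a simple matrix counterpart, sketched below with numpy (a toy illustration of incremental QR, not the Bayes tree machinery itself): once a factor R summarizes all past measurements, a new measurement row can be folded in by re-triangularizing the small stacked system [R; a_new] instead of refactoring the full measurement matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))       # past measurements (10 rows, 4 states)
a_new = rng.standard_normal((1, 4))    # one new measurement row

_, R = np.linalg.qr(A)                             # existing factor
_, R_inc = np.linalg.qr(np.vstack([R, a_new]))     # cheap incremental update
_, R_batch = np.linalg.qr(np.vstack([A, a_new]))   # expensive batch refactor

# Both summarize the same information matrix (R is unique up to row signs).
assert np.allclose(R_inc.T @ R_inc, R_batch.T @ R_batch)
```

The incremental solve touches a (k+1)-row system instead of all n+1 measurement rows; the Bayes tree extends this by also identifying which *variables* the new measurement touches, so only the affected part of the tree is re-eliminated.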
A Parallel Formulation of Interior Point Algorithms
 Department of Computer Science, University of Minnesota
, 1994
"... In recent years, interior point algorithms have been used successfully for solving medium to largesize linear programming (LP) problems. In this paper we describe a highly parallel formulation of the interior point algorithm. A key component of the interior point algorithm is the solution of a s ..."
Abstract

Cited by 17 (9 self)
In recent years, interior point algorithms have been used successfully for solving medium- to large-size linear programming (LP) problems. In this paper we describe a highly parallel formulation of the interior point algorithm. A key component of the interior point algorithm is the solution of a sparse system of linear equations using Cholesky factorization. The performance of parallel Cholesky factorization is determined by (a) the communication overhead incurred by the algorithm, and (b) the load imbalance among the processors. In our parallel interior point algorithm, we use our recently developed parallel multifrontal algorithm, which has the smallest communication overhead of all parallel algorithms for Cholesky factorization developed to date. The computation imbalance depends on the shape of the elimination tree associated with the sparse system reordered for factorization. To balance the computation, we implemented and evaluated four different ordering algorithms. Among these algorithms, Kernighan-Lin and spectral nested dissection yield the most balanced elimination trees and greatly increase the amount of parallelism that can be exploited. Our preliminary implementation achieves a speedup as high as 108 on a 256-processor nCUBE 2 on moderate-size problems.
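The linear-algebra kernel inside an interior point iteration is worth making concrete. In a primal-dual method, each Newton step solves a system of the form (A D^2 A^T) dy = r, where A is the fixed LP constraint matrix and only the positive diagonal scaling D changes between iterations; the sparsity pattern of A D^2 A^T therefore stays the same, so the symbolic (ordering) phase of the Cholesky factorization can be done once and reused. The dense numpy sketch below (a toy with made-up random data, not an actual interior point solver) shows the repeated factor-and-solve step.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 6))    # toy LP constraint matrix (3 constraints, 6 vars)
r = rng.standard_normal(3)

for _ in range(2):                 # two "iterations" with fresh scalings
    d = rng.uniform(0.5, 2.0, size=6)     # D changes each iteration;
    M = A @ np.diag(d**2) @ A.T           # ...the pattern of A D^2 A^T does not
    L = np.linalg.cholesky(M)             # SPD, so Cholesky applies
    dy = np.linalg.solve(L.T, np.linalg.solve(L, r))  # forward/back substitution
    assert np.allclose(M @ dy, r)
```

In the sparse setting the reuse of the symbolic factorization across dozens of iterations is what makes Cholesky performance dominate the overall solver time, as the abstract notes.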
A high performance sparse Cholesky factorization algorithm for scalable parallel computers
 Department of Computer Science, University of Minnesota
, 1994
"... Abstract This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforesttosubcube mapping instead of the subtreetosubcube mapping of another recently introduced scheme by Gupta and Kumar [13]. Asymptotically, both formulations are equally scalable on a ..."
Abstract

Cited by 13 (1 self)
This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforest-to-subcube mapping instead of the subtree-to-subcube mapping of another recently introduced scheme by Gupta and Kumar [13]. Asymptotically, both formulations are equally scalable on a wide range of architectures and a wide variety of problems. But the subtree-to-subcube mapping of the earlier formulation causes significant load imbalance among processors, limiting overall efficiency and speedup. The new mapping largely eliminates the load imbalance among processors. Furthermore, the algorithm has a number of enhancements that improve overall performance substantially. This new algorithm achieves up to 6 GFlops on a 256-processor Cray T3D for moderately large problems. To our knowledge, this is the highest performance ever obtained on an MPP for sparse Cholesky factorization.
Analysis and Design of Scalable Parallel Algorithms for Scientific Computing
, 1995
"... This dissertation presents a methodology for understanding the performance and scalability of algorithms on parallel computers and the scalability analysis of a variety of numerical algorithms. We demonstrate the analytical power of this technique and show how it can guide the development of better ..."
Abstract

Cited by 8 (5 self)
This dissertation presents a methodology for understanding the performance and scalability of algorithms on parallel computers, along with the scalability analysis of a variety of numerical algorithms. We demonstrate the analytical power of this technique and show how it can guide the development of better parallel algorithms. We present some new highly scalable parallel algorithms for sparse matrix computations, which were widely considered poorly suited to large-scale parallel computers. We present some laws governing the performance and scalability properties that apply to all parallel systems. We show that our results generalize or extend a range of earlier research results concerning the performance of parallel systems. Our scalability analysis of algorithms such as the fast Fourier transform (FFT), dense matrix multiplication, sparse matrix-vector multiplication, and the preconditioned conjugate gradient (PCG) method provides many interesting insights into their behavior on parallel computers...
A Scalable Parallel Algorithm for Sparse Cholesky Factorization
 In SuperComputing '94
"... In this paper, we describe a scalable parallel algorithm for sparse Cholesky factorization, analyze its performance and scalability, and present experimental results of its implementation on a 1024processor nCUBE2 parallel computer. Through our analysis and experimental results, we demonstrate that ..."
Abstract

Cited by 4 (0 self)
In this paper, we describe a scalable parallel algorithm for sparse Cholesky factorization, analyze its performance and scalability, and present experimental results of its implementation on a 1024-processor nCUBE2 parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm improves the state of the art in parallel direct solution of sparse linear systems by an order of magnitude, both in terms of speedups and the number of processors that can be utilized effectively for a given problem size. This algorithm incurs strictly less communication overhead and is more scalable than any known parallel formulation of sparse matrix factorization. We show that our algorithm is optimally scalable on hypercube and mesh architectures and that its asymptotic scalability is the same as that of dense matrix factorization for a wide class of sparse linear systems, including those arising in all two- and three-dimensional finite element problems.
WSMP: A High-Performance Shared- and Distributed-Memory Parallel Sparse Linear Equation Solver
, 2001
"... The Watson Sparse Matrix Package, WSMP, is a highperformance, robust, and easy to use software package for solving large sparse systems of linear equations. It can be used as a serial package, or in a sharedmemory multiprocessor environment, or as a scalable parallel solver in a messagepassing en ..."
Abstract

Cited by 3 (1 self)
The Watson Sparse Matrix Package, WSMP, is a high-performance, robust, and easy-to-use software package for solving large sparse systems of linear equations. It can be used as a serial package, in a shared-memory multiprocessor environment, or as a scalable parallel solver in a message-passing environment, where each node can be either a uniprocessor or a shared-memory multiprocessor. A unique aspect of WSMP is that it exploits both SMP and MPP parallelism, using Pthreads and MPI respectively, while mostly shielding the user from the details of the architecture. Sparse symmetric factorization in WSMP has been clocked at up to 1.2 Gigaflops on RS6000 workstations with two 200 MHz Power3 CPUs, and in excess of 90 Gigaflops on a 128-node (256-processor) SP with two-way SMP 200 MHz Power3 nodes. This paper gives an overview of the algorithms, implementation aspects, performance results, and the user interface of WSMP for solving symmetric sparse systems of linear equations. Key words: parallel software, scientific computing, sparse linear systems, sparse matrix factorization, high-performance computing.
Chapter 1 Performance evaluation of the parallel multifrontal method in a distributed memory environment
"... We study, using analytic models and simulation, the performance of the multifrontal methods on distributed memory architectures. We focus on a particular strategy for partitioning, clustering, and mapping of task nodes to processors in order to minimize the overall parallel execution time and minimi ..."
Abstract
We study, using analytic models and simulation, the performance of multifrontal methods on distributed memory architectures. We focus on a particular strategy for partitioning, clustering, and mapping task nodes to processors in order to minimize the overall parallel execution time and communication costs. The performance model has been used to obtain estimates of the speedups of various engineering and scientific problems on several distributed architectures.

1 Problem Statement

There have been various efforts directed at solving large sparse systems using direct solvers on distributed memory architectures (see [3] for a survey). One of the difficulties involved in the distributed implementation of some direct solvers, such as the multifrontal method [2], is that the irregular sparse structure of the matrices makes it difficult to partition and map the sparse matrix to a distributed architecture in a way that minimizes communication costs and the total exe...