Results 1 - 10 of 526
Preconditioning techniques for large linear systems: A survey
- J. COMPUT. PHYS., 2002
"... This article surveys preconditioning techniques for the iterative solution of large linear systems, with a focus on algebraic methods suitable for general sparse matrices. Covered topics include progress in incomplete factorization methods, sparse approximate inverses, reorderings, parallelization i ..."
Abstract
-
Cited by 189 (5 self)
- Add to MetaCart
(Show Context)
This article surveys preconditioning techniques for the iterative solution of large linear systems, with a focus on algebraic methods suitable for general sparse matrices. Covered topics include progress in incomplete factorization methods, sparse approximate inverses, reorderings, parallelization issues, and block and multilevel extensions. Some of the challenges ahead are also discussed. An extensive bibliography completes the paper.
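For readers who want to try one of the surveyed techniques directly, the sketch below applies an incomplete LU factorization as a preconditioner for GMRES using SciPy. The test matrix, drop tolerance, and fill factor are illustrative assumptions of this sketch, not recommendations from the survey.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 1-D Poisson-style tridiagonal test matrix, standing in for a
# "large linear system"; any sparse nonsingular matrix works the same way.
n = 1000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Incomplete LU factorization used as a preconditioner; spilu wraps
# SuperLU's ILU. The parameters here are illustrative only.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator(A.shape, matvec=ilu.solve)  # applies M^{-1} v

x, info = spla.gmres(A, b, M=M)
assert info == 0  # info == 0 signals convergence
```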
Scan Primitives for GPU Computing
- GRAPHICS HARDWARE, 2007
"... The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API.Us ..."
Abstract
-
Cited by 170 (9 self)
- Add to MetaCart
The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API. Using the scan primitives, we show novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyze the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation using the scan framework for a tridiagonal matrix solver.
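As a concrete illustration of the primitive itself, here is a small CPU sketch of a segmented inclusive +-scan in NumPy. It mirrors the semantics of the GPU primitive (a prefix sum that restarts at every segment head) but is not the CUDA implementation from the paper.

```python
import numpy as np

def segmented_scan(values, flags):
    """Inclusive +-scan of values; flags[i] == 1 marks a segment head.
    flags[0] must be 1 so every element belongs to some segment."""
    values = np.asarray(values, dtype=float)
    flags = np.asarray(flags)
    totals = np.cumsum(values)            # ordinary inclusive scan
    excl = totals - values                # exclusive scan
    seg_id = np.cumsum(flags) - 1         # segment index of each element
    head_excl = excl[np.flatnonzero(flags)]
    return totals - head_excl[seg_id]     # restart the scan at each head

# Two segments: [1, 2] and [3, 4, 5]
print(segmented_scan([1, 2, 3, 4, 5], [1, 0, 1, 0, 0]))
# -> [ 1.  3.  3.  7. 12.]
```

In the paper's sparse matrix-vector multiply, each CSR row's partial products form one such segment, so the last element of every segment is that row's dot product.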
Weighted graph cuts without eigenvectors: A multilevel approach
- IEEE Trans. Pattern Anal. Mach. Intell., 2007
"... Abstract—A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods. In this paper, we discuss an equivalence between the objective functions used in these seemingly different method ..."
Abstract
-
Cited by 165 (22 self)
- Add to MetaCart
(Show Context)
A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods. In this paper, we discuss an equivalence between the objective functions used in these seemingly different methods—in particular, a general weighted kernel k-means objective is mathematically equivalent to a weighted graph clustering objective. We exploit this equivalence to develop a fast high-quality multilevel algorithm that directly optimizes various weighted graph clustering objectives, such as the popular ratio cut, normalized cut, and ratio association criteria. This eliminates the need for any eigenvector computation for graph clustering problems, which can be prohibitive for very large graphs. Previous multilevel graph partitioning methods such as Metis have suffered from the restriction of equal-sized clusters; our multilevel algorithm removes this restriction by using kernel k-means to optimize weighted graph cuts. Experimental results show that our multilevel algorithm outperforms a state-of-the-art spectral clustering algorithm in terms of speed, memory usage, and quality. We demonstrate that our algorithm is applicable to large-scale clustering tasks such as image segmentation, social network analysis, and gene network analysis. Index Terms—Clustering, data mining, segmentation, kernel k-means, spectral clustering, graph partitioning.
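The core iteration is compact enough to sketch. Below is a plain weighted kernel k-means step in NumPy; the multilevel coarsening and refinement machinery of the paper is omitted. Per the equivalence discussed above, choosing node-degree weights and a suitably normalized affinity kernel makes this same loop optimize graph cut objectives such as normalized cut.

```python
import numpy as np

def weighted_kernel_kmeans(K, w, labels, k, n_iter=20):
    """K: n-by-n kernel (Gram) matrix, w: point weights, labels: initial
    assignment in {0..k-1}. Empty clusters are not handled in this sketch."""
    for _ in range(n_iter):
        dist = np.full((K.shape[0], k), np.inf)
        for c in range(k):
            mask = labels == c
            if not mask.any():
                continue
            wc, sc = w[mask], w[mask].sum()
            # ||phi(x_i) - m_c||^2, dropping the constant K[i, i] term
            dist[:, c] = (wc @ K[np.ix_(mask, mask)] @ wc) / sc**2 \
                         - 2.0 * (K[:, mask] @ wc) / sc
        labels = dist.argmin(axis=1)
    return labels
```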
SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems
- ACM Trans. Mathematical Software, 2003
"... We present the main algorithmic features in the software package SuperLU DIST, a distributedmemory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with a focus on scalability issues, and demonstrate the software’s parallel performance and sc ..."
Abstract
-
Cited by 144 (18 self)
- Add to MetaCart
(Show Context)
We present the main algorithmic features in the software package SuperLU_DIST, a distributed-memory sparse direct solver for large sets of linear equations. We describe in detail our parallelization strategies, with a focus on scalability issues, and demonstrate the software’s parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of data structures and communication patterns, which lets us exploit techniques used in parallel sparse Cholesky algorithms to better parallelize both LU decomposition and triangular solution on large-scale distributed machines.
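SuperLU_DIST itself is MPI-based C code, but the factor-then-solve workflow can be previewed through SciPy, which wraps the sequential SuperLU. This is a stand-in for illustration, not the distributed solver described above.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 500
A = (sp.random(n, n, density=0.01, random_state=0)
     + sp.identity(n)).tocsc()       # diagonal shift to keep A well behaved

# diag_pivot_thresh=0.0 makes SuperLU prefer diagonal pivots, loosely
# analogous in spirit to the static pivoting strategy described above.
lu = spla.splu(A, diag_pivot_thresh=0.0)
x = lu.solve(np.ones(n))
assert np.allclose(A @ x, np.ones(n))
```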
Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate
- ACM Transactions on Mathematical Software, 2008
"... CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or A A T, updating/downdating a sparse Cholesky factorization, solving linear systems, updating/downdating the solution to the triangular system Lx = b, and many other sparse matrix functions for b ..."
Abstract
-
Cited by 109 (8 self)
- Add to MetaCart
CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or AAᵀ, updating/downdating a sparse Cholesky factorization, solving linear systems, updating/downdating the solution to the triangular system Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric matrices. Its supernodal Cholesky factorization relies on LAPACK and the Level-3 BLAS, and obtains a substantial fraction of the peak performance of the BLAS. Both real and complex matrices are supported. CHOLMOD is written in ANSI/ISO C, with both C and MATLAB interfaces. It appears in MATLAB 7.2 as x=A\b when A is sparse symmetric positive definite, as well as in several other sparse matrix functions.
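CHOLMOD ships C and MATLAB interfaces, as the abstract notes; for a quick experiment from Python, the scikit-sparse binding (sksparse.cholmod, an assumption of this sketch rather than part of the paper) exposes the same factorize-and-solve pattern, and also wraps the update/downdate routines.

```python
import numpy as np
import scipy.sparse as sp
from sksparse.cholmod import cholesky   # scikit-sparse binding to CHOLMOD

n = 200
# Symmetric positive definite tridiagonal test matrix in CSC form
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

factor = cholesky(A)          # supernodal LL^T / LDL^T factorization
x = factor(np.ones(n))        # equivalent in effect to MATLAB's x = A \ b
assert np.allclose(A @ x, np.ones(n))
```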
The design and use of algorithms for permuting large entries to the diagonal of sparse matrices
- SIAM J. MATRIX ANAL. APPL., 1999
"... ..."
A Two-Dimensional Data Distribution Method For Parallel Sparse Matrix-Vector Multiplication
- SIAM REVIEW
"... A new method is presented for distributing data in sparse matrix-vector multiplication. The method is two-dimensional, tries to minimise the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipar ..."
Abstract
-
Cited by 83 (9 self)
- Add to MetaCart
(Show Context)
A new method is presented for distributing data in sparse matrix-vector multiplication. The method is two-dimensional, tries to minimise the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipartitioning of the sparse matrix, each time splitting a rectangular matrix into two parts with a nearly equal number of nonzeros. The communication volume caused by the split is minimised. After the matrix partitioning, the input and output vectors are partitioned with the objective of minimising the maximum communication volume per processor. Experimental results of our implementation, Mondriaan, for a set of sparse test matrices show a reduction in communication compared to one-dimensional methods, and in general a good balance in the communication work.
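A toy version of the first stage conveys the idea. The sketch below greedily splits the columns of a sparse matrix into two parts with nearly equal nonzero counts; the real method also alternates split directions, recurses on each part, and minimises the communication volume of each split, all of which this sketch omits.

```python
import numpy as np
import scipy.sparse as sp

def bipartition_columns(A):
    """Split the columns of CSC matrix A into two sets of ~equal nnz."""
    nnz_per_col = np.diff(A.indptr)
    part = np.zeros(A.shape[1], dtype=int)
    loads = [0, 0]
    for j in np.argsort(nnz_per_col)[::-1]:   # heaviest columns first
        p = int(loads[1] < loads[0])          # assign to the lighter part
        part[j] = p
        loads[p] += nnz_per_col[j]
    return part

A = sp.random(8, 8, density=0.3, format="csc", random_state=1)
print(bipartition_columns(A))   # 0/1 part label per column
```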
Hypergraph-Partitioning Based Decomposition for Parallel Sparse-Matrix Vector Multiplication
- IEEE Trans. on Parallel and Distributed Systems
"... In this work, we show that the standard graph-partitioning based decomposition of sparse matrices does not reflect the actual communication volume requirement for parallel matrix-vector multiplication. We propose two computational hypergraph models which avoid this crucial deficiency of the graph mo ..."
Abstract
-
Cited by 72 (35 self)
- Add to MetaCart
In this work, we show that the standard graph-partitioning based decomposition of sparse matrices does not reflect the actual communication volume requirement for parallel matrix-vector multiplication. We propose two computational hypergraph models which avoid this crucial deficiency of the graph model. The proposed models reduce the decomposition problem to the well-known hypergraph partitioning problem. The recently proposed successful multilevel framework is exploited to develop a multilevel hypergraph partitioning tool PaToH for the experimental verification of our proposed hypergraph models. Experimental results on a wide range of realistic sparse test matrices confirm the validity of the proposed hypergraph models. In the decomposition of the test matrices, the hypergraph models using PaToH and hMeTiS result in up to 63% less communication volume (30%--38% less on the average) than the graph model using MeTiS, while PaToH is only 1.3--2.3 times slower than MeTiS on the average. ...
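To make the first of the two models concrete, the sketch below builds the column-net hypergraph for rowwise partitioning: each row of A becomes a vertex weighted by its nonzero count, and each column becomes a net connecting the rows that have a nonzero in that column. The resulting lists would then be handed to a partitioner such as PaToH or hMeTiS; the function name here is illustrative.

```python
import numpy as np
import scipy.sparse as sp

def column_net_hypergraph(A):
    """Vertices = rows of A (weight = nnz per row); net j = rows with a
    nonzero in column j. Minimising net connectivity across parts then
    models the true communication volume of rowwise parallel SpMV."""
    csc = A.tocsc()
    nets = [csc.indices[csc.indptr[j]:csc.indptr[j + 1]].tolist()
            for j in range(csc.shape[1])]
    vertex_weights = np.diff(A.tocsr().indptr)
    return nets, vertex_weights

A = sp.random(6, 6, density=0.3, format="csr", random_state=2)
nets, weights = column_net_hypergraph(A)
print(nets, weights)
```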
Optimizing the performance of sparse matrix-vector multiplication
, 2000
"... Copyright 2000 by Eun-Jin Im ..."
(Show Context)
Preconditioning highly indefinite and nonsymmetric matrices
- SIAM J. SCI. COMPUT., 2000
"... Standard preconditioners, like incomplete factorizations, perform well when the coefficient matrix is diagonally dominant, but often fail on general sparse matrices. We experiment with nonsymmetric permutationsand scalingsaimed at placing large entrieson the diagonal in the context of preconditionin ..."
Abstract
-
Cited by 55 (3 self)
- Add to MetaCart
Standard preconditioners, like incomplete factorizations, perform well when the coefficient matrix is diagonally dominant, but often fail on general sparse matrices. We experiment with nonsymmetric permutations and scalings aimed at placing large entries on the diagonal in the context of preconditioning for general sparse matrices. The permutations and scalings are those developed by Olschowka and Neumaier [Linear Algebra Appl., 240 (1996), pp. 131–151] and by Duff and Koster [SIAM J. Matrix Anal. Appl., 20 (1999), pp. 889–901].
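As a small, dense stand-in for the permutation step (the sparse MC64-style algorithms in the papers above are more elaborate), one can pick the column permutation maximising the product of diagonal magnitudes with the Hungarian method. The use of scipy.optimize.linear_sum_assignment is an assumption of this sketch, not the authors' code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))

# Maximising prod |a_{i, sigma(i)}| == minimising sum of -log |a_{i, sigma(i)}|
cost = -np.log(np.abs(A) + 1e-300)    # tiny epsilon guards against log(0)
_, cols = linear_sum_assignment(cost)

B = A[:, cols]                        # B[i, i] = A[i, cols[i]] is large
print(np.abs(np.diag(A)))             # before
print(np.abs(np.diag(B)))             # after: large entries on the diagonal
```

An incomplete factorization of the permuted (and, in the full method, also scaled) matrix is then typically far more robust than one computed from A itself.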