Results 1–10 of 90
Highly scalable parallel algorithms for sparse matrix factorization
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 116 (29 self)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is well known that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although this paper discusses Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, ...
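For orientation, the operation being parallelized is the factorization $A = LL^T$ of a symmetric positive definite matrix. The following is a minimal sequential, dense sketch (the function name `cholesky` and the list-of-rows input are illustrative assumptions); it is not the parallel sparse algorithm of the paper, and sparse codes additionally have to handle fill-in and distribute the work across processors.

```python
import math


def cholesky(a):
    """Return lower-triangular L with A = L L^T for a symmetric positive definite A.

    Dense, sequential, column-by-column formulation; `a` is a list of rows.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract contributions of the already computed columns.
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            t = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = t / L[j][j]
    return L
```

For the matrix `[[4, 2, 0], [2, 5, 1], [0, 1, 3]]` this gives `L[0][0] = 2`, `L[1][1] = 2` and `L[2][1] = 0.5`.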
How Good is Recursive Bisection?
SIAM J. Sci. Comput., 1995
Cited by 86 (4 self)
The most commonly used p-way partitioning method is recursive bisection (RB). It first divides a graph or a mesh into two equal-sized pieces by a "good" bisection algorithm, and then recursively divides the two pieces. Ideally, we would like to use an optimal bisection algorithm. Because the optimal bisection problem, which partitions a graph into two equal-sized subgraphs so as to minimize the number of edges cut, is NP-complete, practical RB algorithms use more efficient heuristics in place of an optimal bisection algorithm. Most such heuristics are designed to find the best possible bisection within the allowed time. We show that the recursive bisection method, even when an optimal bisection algorithm is assumed, may produce a p-way partition that is very far from the optimal one. Our negative result is complemented by two positive ones: first, we show that for some important classes of graphs that occur in practical applications, such as well-shaped finite element and finite difference...
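The recursive bisection scheme can be sketched as follows. As an assumption for illustration, the vertices are 2-D mesh points and the bisection step is a median split along alternating coordinates (coordinate bisection), a cheap heuristic rather than an optimal minimum-cut bisection; `recursive_bisection` is a hypothetical name, and p is assumed to be a power of two.

```python
def recursive_bisection(points, p, depth=0):
    """Partition `points` into p equal-sized parts by recursive bisection.

    Bisection step (an illustrative assumption): sort by x or y, alternating
    with recursion depth, and split at the median. p must be a power of two.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    if p == 1:
        return [list(points)]
    axis = depth % 2
    ordered = sorted(points, key=lambda pt: pt[axis])
    half = len(ordered) // 2
    # Recursively divide each half into p/2 parts.
    return (recursive_bisection(ordered[:half], p // 2, depth + 1)
            + recursive_bisection(ordered[half:], p // 2, depth + 1))
```

On a 4 × 4 grid with p = 4 this yields the four 2 × 2 quadrants. Each level only sees its own piece, which is why, as the abstract argues, the combined p-way partition can be far from optimal even when every individual bisection is optimal.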
Approximate Inverse Preconditioners via Sparse-Sparse Iterations
1998
Cited by 79 (17 self)
The standard incomplete LU (ILU) preconditioners often fail for general sparse indefinite matrices because they give rise to 'unstable' factors L and U. In such cases, it may be attractive to approximate the inverse of the matrix directly. This paper focuses on approximate inverse preconditioners based on minimizing $\|I - AM\|_F$, where $AM$ is the preconditioned matrix. An iterative descent-type method is used to approximate each column of the inverse. For this approach to be efficient, the iteration must be done in sparse mode, i.e., with 'sparse-matrix by sparse-vector' operations. Numerical dropping is applied to maintain sparsity; compared to previous methods, this is a natural way to determine the sparsity pattern of the approximate inverse. This paper describes Newton, 'global', and column-oriented algorithms, and discusses options for initial guesses, self-preconditioning, and dropping strategies. Some limited theoretical results on the properties and convergence of approxima...
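Because $\|I - AM\|_F^2 = \sum_j \|e_j - A m_j\|_2^2$, each column $m_j$ of $M$ can be computed independently. The sketch below applies a minimal-residual (MR) descent step per column, with sparse vectors stored as Python dicts so that every product is sparse-matrix by sparse-vector, and keeps only the `lfil` largest entries after each step. The MR step, the parameters `iters` and `lfil`, and all function names are assumptions for illustration, not necessarily the paper's exact column-oriented algorithm.

```python
def sparse_axpy(y, alpha, x):
    """In place: y += alpha * x, for sparse vectors stored as {index: value} dicts."""
    for i, v in x.items():
        y[i] = y.get(i, 0.0) + alpha * v
    return y


def matvec(cols, x):
    """Sparse-matrix by sparse-vector product A x, with A stored by columns {j: {i: a_ij}}."""
    out = {}
    for j, xj in x.items():
        sparse_axpy(out, xj, cols.get(j, {}))
    return out


def dot(x, y):
    """Inner product of two sparse vectors."""
    if len(x) > len(y):
        x, y = y, x
    return sum(v * y.get(i, 0.0) for i, v in x.items())


def mr_column(cols, j, iters=5, lfil=3):
    """Approximate column j of A^{-1} by minimal-residual steps on ||e_j - A m||_2.

    After each step only the `lfil` largest-magnitude entries of m are kept
    (numerical dropping), so the sparsity pattern is determined on the fly.
    """
    m = {}
    for _ in range(iters):
        r = sparse_axpy({j: 1.0}, -1.0, matvec(cols, m))  # r = e_j - A m
        ar = matvec(cols, r)
        denom = dot(ar, ar)
        if denom == 0.0:  # A r vanished: no further progress possible
            break
        alpha = dot(r, ar) / denom  # step length minimizing ||r - alpha A r||_2
        sparse_axpy(m, alpha, r)
        keep = sorted(m, key=lambda i: abs(m[i]), reverse=True)[:lfil]
        m = {i: m[i] for i in keep}
    return m
```

For $A = \mathrm{diag}(2, 4)$, `mr_column(A, 0)` returns `{0: 0.5}` after one step, which is the exact first column of $A^{-1}$.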
Reachability is harder for directed than for undirected finite graphs
Journal of Symbolic Logic, 1990
Cited by 70 (8 self)
Abstract. Although it is known that reachability in undirected finite graphs can be expressed by an existential monadic second-order sentence, our main result is that this is not the case for directed finite graphs (even in the presence of certain “built-in” relations, such as the successor relation). The proof makes use of Ehrenfeucht–Fraïssé games, along with probabilistic arguments. However, we show that for directed finite graphs with degree at most k, reachability is expressible by an existential monadic second-order sentence.
§1. Introduction. If s and t denote distinguished points in a directed (resp. undirected) graph, then we say that the graph is (s, t)-connected if there is a directed (undirected) path from s to t. We sometimes refer to the problem of deciding whether a given directed (undirected) graph with two given points s and t is (s, t)-connected as the directed (undirected) reachability problem.
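The paper's result concerns logical definability rather than algorithms, but the underlying decision problem is easy to state in code. This minimal breadth-first-search sketch of (s, t)-connectivity (the function name and edge-list input format are assumptions) shows that the directed and undirected versions differ only in whether an edge may be traversed in both directions.

```python
from collections import deque


def reachable(edges, s, t, directed=True):
    """Return True if the graph given by `edges` is (s, t)-connected.

    `edges` is an iterable of (u, v) pairs; when directed=False each edge
    can be traversed in both directions.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        if not directed:
            adj.setdefault(v, []).append(u)
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False
```

With edges 1 → 2 and 3 → 2, vertex 3 is not reachable from 1 in the directed graph but is reachable in the underlying undirected graph.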
Subquadratic-time factoring of polynomials over finite fields
Math. Comp., 1998
Cited by 67 (11 self)
Abstract. New probabilistic algorithms are presented for factoring univariate polynomials over finite fields. The algorithms factor a polynomial of degree $n$ over a finite field of constant cardinality in time $O(n^{1.815})$. Previous algorithms required time $\Theta(n^{2+o(1)})$. The new algorithms rely on fast matrix multiplication techniques. More generally, to factor a polynomial of degree $n$ over the finite field $\mathbb{F}_q$ with $q$ elements, the algorithms use $O(n^{1.815} \log q)$ arithmetic operations in $\mathbb{F}_q$. The new “baby step/giant step” techniques used in our algorithms also yield new fast practical algorithms at super-quadratic asymptotic running time, and subquadratic-time methods for manipulating normal bases of finite fields.
A New Polynomial Factorization Algorithm and its Implementation
Journal of Symbolic Computation, 1996
Cited by 63 (5 self)
We consider the problem of factoring univariate polynomials over a finite field. We demonstrate that the new baby step/giant step factoring method, recently developed by Kaltofen & Shoup, can be made into a very practical algorithm. We describe an implementation of this algorithm, and present the results of empirical tests comparing this new algorithm with others. When factoring polynomials modulo large primes, the algorithm makes it possible to factor much larger polynomials than before within a reasonable amount of time and space. For example, this new software has been used to factor a "generic" polynomial of degree 2048 modulo a 2048-bit prime in under 12 days on a Sun SPARCstation 10, using 68 MB of main memory.
1 Introduction
We consider the problem of factoring a univariate polynomial of degree $n$ over the field $\mathbb{F}_p$ of $p$ elements, where $p$ is prime. This problem has been well studied, and many algorithms for its solution have been proposed. In general, the running tim...
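Both factoring papers above build on the distinct-degree stage of factorization over $\mathbb{F}_p$, which uses the fact that $x^{p^d} - x$ is the product of all monic irreducible polynomials over $\mathbb{F}_p$ whose degree divides $d$. The sketch below is the classical distinct-degree factorization, shown only as a baseline; it is not the baby step/giant step method itself, which, roughly speaking, reorganizes these power and gcd computations to use fewer operations. Polynomials are coefficient lists with the lowest degree first, the input is assumed monic and squarefree, and all function names are illustrative.

```python
def trim(f):
    """Drop zero leading coefficients in place (lists are lowest degree first)."""
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_divmod(f, g, p):
    """Quotient and remainder of f / g over F_p (p prime, g trimmed and nonzero)."""
    r = trim([c % p for c in f])
    q = [0] * max(len(r) - len(g) + 1, 0)
    inv = pow(g[-1], p - 2, p)  # inverse of the leading coefficient
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coef = r[-1] * inv % p
        q[shift] = coef
        for i, c in enumerate(g):
            r[shift + i] = (r[shift + i] - coef * c) % p
        trim(r)
    return trim(q), r


def poly_mul(f, g, p):
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return trim(out)


def poly_sub(f, g, p):
    n = max(len(f), len(g))
    return trim([((f[i] if i < len(f) else 0) - (g[i] if i < len(g) else 0)) % p
                 for i in range(n)])


def poly_gcd(f, g, p):
    """Monic gcd of f and g over F_p (f nonzero)."""
    f, g = trim(f[:]), trim(g[:])
    while g:
        f, g = g, poly_divmod(f, g, p)[1]
    inv = pow(f[-1], p - 2, p)
    return [c * inv % p for c in f]


def poly_powmod(base, e, m, p):
    """base**e mod m over F_p by repeated squaring."""
    result = [1]
    base = poly_divmod(base, m, p)[1]
    while e:
        if e & 1:
            result = poly_divmod(poly_mul(result, base, p), m, p)[1]
        base = poly_divmod(poly_mul(base, base, p), m, p)[1]
        e >>= 1
    return result


def distinct_degree_factorization(f, p):
    """For monic squarefree f over F_p, return [(d, g_d)], where g_d is the
    product of all irreducible factors of f of degree d."""
    f = trim([c % p for c in f])
    result, h, d = [], [0, 1], 0  # h holds x^(p^d) mod f
    while len(f) - 1 >= 2 * (d + 1):
        d += 1
        h = poly_powmod(h, p, f, p)
        g = poly_gcd(f, poly_sub(h, [0, 1], p), p)  # gcd(f, x^(p^d) - x)
        if len(g) > 1:
            result.append((d, g))
            f = poly_divmod(f, g, p)[0]
            h = poly_divmod(h, f, p)[1]
    if len(f) > 1:  # any remaining factor is irreducible
        result.append((len(f) - 1, f))
    return result
```

For $x^4 + x$ over $\mathbb{F}_2$ it returns `[(1, [0, 1, 1]), (2, [1, 1, 1])]`, i.e. $x^2 + x$ (the product of the two linear factors) and the irreducible quadratic $x^2 + x + 1$. Splitting each $g_d$ into its individual degree-$d$ factors is a separate equal-degree stage not shown here.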
Nearly Optimal Algorithms For Canonical Matrix Forms
1993
Cited by 54 (11 self)
A Las Vegas type probabilistic algorithm is presented for finding the Frobenius canonical form of an $n \times n$ matrix $T$ over any field $K$. The algorithm requires $\tilde{O}(\mathrm{MM}(n)) = \mathrm{MM}(n)(\log n)^{O(1)}$ operations in $K$, where $O(\mathrm{MM}(n))$ operations in $K$ are sufficient to multiply two $n \times n$ matrices over $K$. This nearly matches the lower bound of $\Omega(\mathrm{MM}(n))$ operations in $K$ for this problem, and improves on the $O(n^4)$ operations in $K$ required by the previously best known algorithms. We also demonstrate a fast parallel implementation of our algorithm for the Frobenius form, which is processor-efficient on a PRAM. As an application we give an algorithm to evaluate a polynomial $g(x) \in K[x]$ at $T$ which requires only $\tilde{O}(\mathrm{MM}(n))$ operations in $K$ when $\deg g < n^2$. Other applications include sequential and parallel algorithms for computing the minimal and characteristic polynomials of a matrix, the rational Jordan form of a matrix, for testing whether two matrices are similar, and for matrix powering, which are substantially faster than those previously known.
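One application above is evaluating $g(T)$. For comparison, the straightforward approach is Horner's rule, sketched below over a prime field $\mathbb{F}_p$ (the choice $K = \mathbb{F}_p$, the list-of-rows matrix format, and the function names are assumptions). It performs one matrix product per coefficient, so roughly $n^2$ products when $\deg g$ is close to $n^2$, whereas the Frobenius-form approach needs only $\tilde{O}(\mathrm{MM}(n))$ operations.

```python
def mat_mul(a, b, p):
    """Product of two n x n matrices over F_p (lists of rows)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]


def eval_poly_at_matrix(g, T, p):
    """Evaluate g(T) over F_p by Horner's rule.

    g lists coefficients lowest degree first; uses len(g) matrix products.
    """
    n = len(T)
    result = [[0] * n for _ in range(n)]
    for c in reversed(g):
        result = mat_mul(result, T, p)  # result <- result * T
        for i in range(n):              # result <- result + c * I
            result[i][i] = (result[i][i] + c) % p
    return result
```

As a check, the characteristic polynomial $(x - 1)^2 = x^2 + 5x + 1$ over $\mathbb{F}_7$ annihilates `[[1, 1], [0, 1]]`, as the Cayley-Hamilton theorem requires.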
On-Line Learning of Linear Functions
Computational Complexity, 1991
Cited by 41 (18 self)
...this paper, we present near-optimal strategies for combining opinions in situations like this. In more abstract terms, we study the on-line learning of linear functions. We assume that learning proceeds in a sequence of trials. At trial number $t$ the learning algorithm (the advisor) is presented with an instance $\vec{x}_t \in [0, 1]$...
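The trial-by-trial setting can be made concrete with a standard baseline learner. Since the abstract is cut off, the sketch below assumes the usual on-line protocol: the advisor predicts $\hat{y}_t = w \cdot \vec{x}_t$, then learns the true outcome $y_t$, and updates its weights with the Widrow-Hoff (square-loss gradient) rule using a learning rate `eta`. This is a generic update, not necessarily one of the near-optimal strategies analyzed in the paper; `online_linear` is a hypothetical name.

```python
def online_linear(trials, n, eta=0.5):
    """Widrow-Hoff on-line learning of a linear function.

    `trials` is an iterable of (x, y) pairs with x a length-n list. Returns the
    predictions made (each before y was revealed) and the final weights.
    """
    w = [0.0] * n
    predictions = []
    for x, y in trials:
        y_hat = sum(wi * xi for wi, xi in zip(w, x))  # predict first
        predictions.append(y_hat)
        err = y_hat - y                               # then observe the outcome
        w = [wi - eta * err * xi for wi, xi in zip(w, x)]
    return predictions, w
```

When the outcomes are generated by a fixed linear function such as $y = 0.3x_1 + 0.7x_2$, repeated trials drive the weights toward `[0.3, 0.7]`.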
Sublinear-Time Parallel Algorithms for Matching and Related Problems
1988
Cited by 33 (6 self)
This paper presents the first sublinear-time deterministic parallel algorithms for bipartite matching and several related problems, including maximal node-disjoint paths, depth-first search, and flows in zero-one networks. Our results are based on a better understanding of the combinatorial structure of the above problems, which leads to new algorithmic techniques. In particular, we show how to use maximal matching to extend, in parallel, a current set of node-disjoint paths and how to take advantage of the parallelism that arises when a large number of nodes are "active" during an execution of a push-relabel network flow algorithm. We also show how to apply our techniques to design parallel algorithms for the weighted versions of the above problems. In particular, we present sublinear-time deterministic parallel algorithms for finding a minimum-weight bipartite matching and for finding a minimum-cost flow in a network with zero-one capacities, if the weights are polynomially ...
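For contrast with the parallel techniques described above, the sketch below is the standard sequential augmenting-path method (Kuhn's algorithm) for maximum bipartite matching; the input format and names are assumptions. The example graph also shows why a maximal matching is weaker than a maximum one: matching left vertex 0 to `"a"` first leaves vertex 1 with no free neighbor, and only the augmenting path 1-a-0-b repairs this. The paper's contribution is performing such extensions in parallel in sublinear time, which this sequential code does not attempt.

```python
def max_bipartite_matching(adj, n_left):
    """Maximum bipartite matching by repeated augmenting-path search.

    adj[u] lists the right-side vertices adjacent to left vertex u (u = 0..n_left-1).
    Returns a dict mapping each matched right vertex to its left partner.
    """
    match_right = {}

    def augment(u, visited):
        # Depth-first search for an augmenting path starting at left vertex u.
        for v in adj.get(u, []):
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-matched elsewhere.
            if v not in match_right or augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    for u in range(n_left):
        augment(u, set())
    return match_right
```

The recursion depth equals the augmenting-path length, so very large graphs would need an iterative search.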
Distributed Matrix-Free Solution of Large Sparse Linear Systems over Finite Fields
Algorithmica, 1996
Cited by 26 (6 self)
We describe a coarse-grain parallel software system for the homogeneous solution of linear systems. Our solutions are symbolic, i.e., exact rather than numerical approximations. Our implementation can be run on a network cluster of SPARC20 computers and on an SP2 multiprocessor. Detailed timings are presented for experiments with systems that arise in RSA challenge integer factoring efforts. For example, we can solve a $252{,}222 \times 252{,}222$ system with about 11.04 million nonzero entries over the Galois field with 2 elements using 4 processors of an SP2 multiprocessor, in about 26.5 hours of CPU time.
1 Introduction
The problem of solving large, unstructured, sparse linear systems using exact arithmetic arises in symbolic linear algebra and computational number theory. For example, the sieve-based factoring of large integers can lead to systems containing over 569,000 equations and variables and over 26.5 million nonzero entries that need to be solved over the Galois field of two...
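To make "homogeneous solution over the Galois field with 2 elements" concrete: the goal is a nonzero $x$ with $Ax = 0$ over GF(2). The paper's matrix-free approach only needs products of the sparse matrix with vectors; the sketch below instead swaps in plain dense Gauss-Jordan elimination on rows packed into Python integers (bit $j$ of a row is $A_{ij}$). It is for illustration only, and `gf2_nullspace_vector` is a hypothetical name.

```python
def gf2_nullspace_vector(rows, n):
    """Return a nonzero x (as an int bitmask) with A x = 0 over GF(2), or None.

    rows: list of ints, bit j of rows[i] is A[i][j]; n: number of columns.
    Dense Gauss-Jordan elimination; addition over GF(2) is XOR.
    """
    pivots = {}  # pivot column -> fully reduced row with that pivot
    for r in rows:
        for col, prow in pivots.items():
            if (r >> col) & 1:
                r ^= prow
        if r == 0:
            continue  # row is a combination of earlier rows
        col = r.bit_length() - 1  # highest set bit becomes the new pivot
        for c in list(pivots):    # keep the existing pivot rows fully reduced
            if (pivots[c] >> col) & 1:
                pivots[c] ^= r
        pivots[col] = r
    free = [c for c in range(n) if c not in pivots]
    if not free:
        return None  # full column rank: only the trivial solution
    f = free[0]
    x = 1 << f  # set one free variable to 1, the others to 0
    for col, prow in pivots.items():
        if (prow >> f) & 1:
            x |= 1 << col  # x_col = A'[col][f] in reduced form
    return x
```

For the system $x_0 + x_1 = 0$, $x_1 + x_2 = 0$ it returns `0b111`. Dense elimination like this quickly becomes costly in time and memory at the scale reported above, which is what motivates matrix-free methods.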