Results 1–10 of 52
High Resolution Forward and Inverse Earthquake Modeling on Terascale Computers
 In SC2003
, 2003
Abstract

Cited by 39 (17 self)
For earthquake simulations to play an important role in the reduction of seismic risk, they must be capable of high resolution and high fidelity. We have developed algorithms and tools for earthquake simulation based on multiresolution hexahedral meshes. We have used this capability to carry out 1 Hz simulations of the 1994 Northridge earthquake in the LA Basin using 100 million grid points. Our wave propagation solver sustains 1.21 teraflop/s for 4 hours on 3000 AlphaServer processors at 80% parallel efficiency. Because of uncertainties in characterizing earthquake source and basin material properties, a critical remaining challenge is to invert for source and material parameter fields for complex 3D basins from records of past earthquakes. Towards this end, we present results for material and source inversion of high-resolution models of basins undergoing antiplane motion using parallel scalable inversion algorithms that overcome many of the difficulties particular to inverse heterogeneous wave propagation problems.
A scalable parallel algorithm for incomplete factor preconditioning
 SIAM Journal on Scientific Computing
Abstract

Cited by 27 (2 self)
Abstract. We describe a parallel algorithm for computing incomplete factor (ILU) preconditioners. The algorithm attains a high degree of parallelism through graph partitioning and a two-level ordering strategy. Both the subdomains and the nodes within each subdomain are ordered to preserve concurrency. We show through an algorithmic analysis and through computational results that this algorithm is scalable. Experimental results include timings on three parallel platforms for problems with up to 20 million unknowns running on up to 216 processors. The resulting preconditioned Krylov solvers have the desirable property that the number of iterations required for convergence is insensitive to the number of processors.
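The two-level ordering idea — interior nodes of each subdomain first, grouped by subdomain, separator nodes last — can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code; the function name, the SciPy sparse usage, and the assumption that a partition vector is given are all mine:

```python
import numpy as np
import scipy.sparse as sp

def two_level_ordering(A, parts):
    """Return a permutation in which interior nodes come first, grouped
    by subdomain, and interface (separator) nodes come last.  `parts[i]`
    is the subdomain owning node i; the sparsity pattern of A serves as
    the adjacency graph."""
    A = sp.csr_matrix(A)
    parts = np.asarray(parts)
    n = A.shape[0]
    interface = np.zeros(n, dtype=bool)
    for i in range(n):
        # a node adjacent to another subdomain belongs to the separator
        for j in A.indices[A.indptr[i]:A.indptr[i + 1]]:
            if parts[j] != parts[i]:
                interface[i] = True
                break
    interior = np.where(~interface)[0]
    # interior nodes grouped by subdomain, separator appended last
    return np.concatenate([interior[np.argsort(parts[interior], kind="stable")],
                           np.where(interface)[0]])
```

Under such a permutation the leading diagonal blocks of the reordered matrix are decoupled, so their incomplete factors can be computed concurrently, one subdomain per processor, with the separator factored last.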
A fast solver for the Stokes equations with distributed forces in complex geometries
 J. Comput. Phys
Abstract

Cited by 20 (9 self)
We present a new method for the solution of the Stokes equations. The main features of our method are: (1) it can be applied to arbitrary geometries in a black-box fashion; (2) it is second-order accurate; and (3) it has optimal algorithmic complexity. Our approach, to which we refer as the Embedded Boundary Integral method, is based on Anita Mayo’s work for the Poisson equation: “The Fast Solution of Poisson’s and the Biharmonic Equations on Irregular Regions”, SIAM Journal on Numerical Analysis, 21 (1984), pp. 285–299. We embed the domain in a rectangular domain, for which fast solvers are available, and we impose the boundary conditions as interface (jump) conditions on the velocities and tractions. We use an indirect boundary integral formulation for the homogeneous Stokes equations to compute the jumps. The resulting equations are discretized by Nyström’s method. The rectangular domain problem is discretized by finite elements for a velocity-pressure formulation with equal-order interpolation bilinear elements (Q1–Q1). Stabilization is used to circumvent the inf-sup condition for the pressure space. For the integral equations, fast matrix-vector multiplications are achieved via an O(N log N) algorithm based on a block representation of the discrete integral operator, combined with (kernel-independent) singular value decomposition to sparsify low-rank blocks. The regular grid solver is a Krylov method (Conjugate Residuals) combined with an optimal two-level Schwarz preconditioner. For the integral equation we use GMRES. We have tested our algorithm on several numerical examples and we have observed optimal convergence rates. Key Words: Stokes equations, fast solvers, integral equations, double-layer potential, fast multipole methods, embedded domain methods, immersed interface methods, fictitious
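The SVD sparsification step mentioned in the abstract is easy to illustrate. The sketch below (my own minimal version, not the authors' implementation) truncates the SVD of an interaction block at a relative singular-value threshold, yielding a low-rank factorization that is cheap to store and apply:

```python
import numpy as np

def compress_block(K, tol=1e-8):
    """Truncated SVD of a far-field interaction block K: keep only the
    singular values above tol * sigma_max, returning factors U, V with
    K ~= U @ V.  An (m x n) block of numerical rank r then costs
    O((m + n) r) to apply instead of O(m n)."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    return U[:, :r] * s[:r], Vt[:r, :]
```

Applying the compressed block to a vector as `U @ (V @ x)` is where the savings appear; well-separated blocks of the discrete integral operator have rapidly decaying singular values, so r is small.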
A New Parallel Kernel-Independent Fast Multipole Method
 in SC2003
Abstract

Cited by 19 (9 self)
We present a new adaptive fast multipole algorithm and its parallel implementation. The algorithm is kernel-independent in the sense that the evaluation of pairwise interactions does not rely on any analytic expansions, but only utilizes kernel evaluations. The new method provides the enabling technology for many important problems in computational science and engineering. Examples include viscous flows, fracture mechanics and screened Coulombic interactions. Our MPI-based parallel implementation logically separates the computation and communication phases to avoid synchronization in the upward and downward computation passes, and thus allows us to fully exploit computation and communication overlapping. We measure isogranular and fixed-size scalability for a variety of kernels on the Pittsburgh Supercomputing Center's TCS1 AlphaServer on up to 3000 processors. We have solved viscous flow problems with up to 2.1 billion unknowns and we have achieved 1.6 Tflops/s peak performance and 1.13 Tflops/s sustained performance.
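The phrase "kernel-independent" can be made concrete with a small sketch. Assuming (my choice, for illustration) a 2-D Laplace kernel, the far field of a source cluster is compressed by fitting densities on an equivalent surface so that they reproduce the sources' potential on an enclosing check surface — using nothing but kernel evaluations, no analytic multipole expansions:

```python
import numpy as np

def laplace2d(X, Y):
    # 2-D Laplace kernel -log|x - y| / (2*pi), evaluated pairwise
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return -np.log(d) / (2 * np.pi)

def equivalent_densities(kernel, src_pts, src_den, equiv_pts, check_pts):
    """Fit densities on the 'equivalent' surface so that their potential
    matches that of the true sources on the 'check' surface.  Evaluating
    the far field through the (few) equivalent densities then replaces
    evaluating it through the (many) original sources."""
    rhs = kernel(check_pts, src_pts) @ src_den
    eq_den, *_ = np.linalg.lstsq(kernel(check_pts, equiv_pts), rhs, rcond=None)
    return eq_den
```

Here `laplace2d` and the surface choices are illustrative assumptions; the point is that swapping in any other kernel function requires no new mathematics, which is what makes the scheme kernel-independent.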
Numerical time integration for air pollution models
 SURVEYS ON MATHEMATICS FOR INDUSTRY
, 1998
PRIMME: PReconditioned Iterative Multimethod Eigensolver. http://www.cs.wm.edu/∼andreas/software
, 2006
Abstract

Cited by 12 (5 self)
This paper describes the PRIMME software package for solving large, sparse Hermitian standard eigenvalue problems. The difficulty and importance of these problems have increased over the years, necessitating the use of preconditioning and near-optimally converging iterative methods. However, the complexity of tuning or even using such methods has kept them outside the reach of many users. Responding to this problem, we have developed PRIMME, a comprehensive package that brings state-of-the-art methods from “bleeding edge” to production, with the best possible robustness, efficiency, and a flexible, yet highly usable interface that requires minimal or no tuning. We describe: (1) the PRIMME multimethod framework that implements a variety of algorithms, including the near-optimal methods GD+k and JDQMR; (2) a host of algorithmic innovations and implementation techniques that endow the software with its robustness and efficiency; (3) a multilayer interface that captures our experience and addresses the needs of both expert and end users.
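To illustrate what a preconditioned eigensolver of this family does, here is a toy generalized-Davidson iteration with a Jacobi (diagonal) preconditioner. It is a didactic sketch only — it does not use PRIMME's actual interface, and every name in it is mine:

```python
import numpy as np

def davidson_smallest(A, maxit=50, tol=1e-8):
    """Toy generalized-Davidson iteration for the smallest eigenpair of
    a symmetric matrix A: extract a Ritz pair from the current search
    space, then expand the space with the preconditioned residual."""
    n = A.shape[0]
    V = np.ones((n, 1)) / np.sqrt(n)          # initial search space
    for _ in range(maxit):
        theta, s = np.linalg.eigh(V.T @ A @ V)
        theta, s = theta[0], s[:, 0]          # smallest Ritz pair
        x = V @ s
        r = A @ x - theta * x                 # eigen-residual
        if np.linalg.norm(r) < tol:
            break
        d = np.diag(A) - theta                # Jacobi preconditioner
        d = np.where(np.abs(d) > 1e-10, d, 1e-10)
        t = r / d
        for _ in range(2):                    # reorthogonalize twice
            t = t - V @ (V.T @ t)
        nt = np.linalg.norm(t)
        if nt < 1e-12:                        # search space saturated
            break
        V = np.hstack([V, (t / nt)[:, None]])
    return theta, x
```

Production methods such as GD+k and JDQMR add restarting, better correction equations, and block strategies on top of this skeleton, which is precisely the tuning burden PRIMME is designed to hide.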
Parallel Computing on Semidefinite Programs
 In Proceedings of the 5th European Workshop on Natural Language Generation
, 2003
Abstract

Cited by 11 (3 self)
This paper demonstrates how interior-point methods can use multiple processors efficiently to solve large semidefinite programs that arise in VLSI design, control theory, and graph coloring. Previous implementations of these methods have been restricted to a single processor. By computing and solving the Schur complement matrix in parallel, multiple processors enable the faster solution of medium and large problems. The dual-scaling algorithm for semidefinite programming was adapted to a distributed-memory environment and used to solve medium and large problems faster than interior-point algorithms could previously solve them. Three criteria that influence the parallel scalability of the solver are identified. Numerical results show that on problems of appropriate size and structure, the implementation of an interior-point method exhibits good scalability on parallel architectures.
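The row-wise parallelism exploited here can be sketched as follows. Assuming a Schur complement of the common interior-point form M[i, j] = trace(S⁻¹ A_i S⁻¹ A_j), each row depends only on data every process holds, so disjoint row sets can be assigned to different processors and assembled afterwards. The function name and the serial NumPy form are mine, for illustration:

```python
import numpy as np

def schur_complement_rows(A_list, S_inv, rows):
    """Compute the requested rows of the Schur complement
    M[i, j] = trace(S^{-1} A_i S^{-1} A_j) for symmetric constraint
    matrices A_i.  Rows are independent, so a parallel solver can give
    each process a disjoint `rows` slice and gather the pieces."""
    m = len(A_list)
    M = np.zeros((len(rows), m))
    for r, i in enumerate(rows):
        W = S_inv @ A_list[i] @ S_inv              # shared per-row factor
        for j in range(m):
            M[r, j] = np.tensordot(W, A_list[j])   # = trace(W @ A_j)
    return M
```

Balancing the row blocks so each process does equal work is one of the scalability criteria such a solver must address.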
Parallel computation of high dimensional robust correlation and covariance matrices
 In KDD ’04: Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining
, 2004
Abstract

Cited by 9 (2 self)
The computation of covariance and correlation matrices is critical to many data mining applications and processes. Unfortunately, the classical covariance and correlation matrices are very sensitive to outliers. Robust methods, such as QC and the Maronna method, have been proposed. However, existing algorithms for QC only give acceptable performance when the dimensionality of the matrix is in the hundreds, and the Maronna method is rarely used in practice because of its high computational cost. In this paper, we develop parallel algorithms for both QC and the Maronna method. We evaluate these parallel algorithms using a real data set of the gene expression of over 6,000 genes, giving rise to a matrix of over 18 million entries. In our experimental evaluation, we explore scalability in dimensionality and in the number of processors, and the trade-offs between accuracy and computational efficiency. We also compare the parallel behaviours of the two methods. From a statistical standpoint, the Maronna method is more robust than QC. From a computational standpoint, while QC requires less computation, interestingly the Maronna method is much more parallelizable than QC. After a thorough experimentation, we conclude that for many data mining applications, both QC and Maronna are viable options. Less robust, but faster, QC is the recommended choice for small parallel platforms. On the other hand, the Maronna method is the recommended choice when a high degree of robustness is required, or when the parallel platform features a large number of processors (e.g., 32).
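QC, the quadrant-correlation estimator, is simple enough to sketch (my own minimal serial version, not the paper's parallel algorithm): centre each variable at its median, correlate the signs, and apply the sin(π/2 ·) transform that makes the estimator consistent for Gaussian data. Every entry of the resulting matrix is independent, which is what makes the computation trivially parallelizable:

```python
import numpy as np

def quadrant_correlation(X):
    """QC estimate of the correlation matrix of the columns of X:
    median-centre, take signs, average the sign products, then apply
    the sin(pi/2 * .) consistency transform.  Because only signs enter,
    arbitrarily large outliers have bounded influence."""
    S = np.sign(X - np.median(X, axis=0))
    Q = (S.T @ S) / X.shape[0]
    return np.sin(np.pi / 2 * Q)
```

The sign matrix makes the cost a single (parallelizable) matrix product, which is why QC scales to far higher dimensionality than iterative robust estimators like Maronna's.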
Balancing Neumann-Neumann Preconditioners for the Mixed Formulation of Almost-Incompressible Linear Elasticity
, 2003
Abstract

Cited by 8 (2 self)
Balancing Neumann-Neumann methods are extended to the equations arising from the mixed formulation of almost-incompressible linear elasticity problems discretized with discontinuous-pressure finite elements. This family of domain decomposition algorithms has previously been shown to be effective for large finite element approximations of positive definite elliptic problems. Our methods are proved to be scalable and to depend weakly on the size of the local problems. Our work is an extension of previous work by Pavarino and Widlund on BNN methods for the Stokes equations.
Inexactness issues in the Lagrange-Newton-Krylov-Schur method for PDE-constrained optimization
 Large-Scale PDE-Constrained Optimization, number 30 in Lecture
Abstract

Cited by 6 (0 self)
Abstract. In this article we present an outline of the Lagrange-Newton-Krylov-Schur (LNKS) method and we discuss how we can improve its work efficiency by carrying out certain computations inexactly, without compromising convergence. LNKS has been designed for PDE-constrained optimization problems. It solves the Karush-Kuhn-Tucker optimality conditions by a Newton-Krylov algorithm. Its key component is a preconditioner based on quasi-Newton reduced-space Sequential Quadratic Programming (QN-RSQP) variants. LNKS combines the fast convergence properties of a Newton method with the capability of preconditioned Krylov methods to solve very large linear systems. Nevertheless, even with good preconditioners, the solution of an optimization problem has a cost several times higher than the cost of the solution of the underlying PDE problem. To accelerate LNKS, its computational components are carried out inexactly: premature termination of iterative algorithms, inexact evaluation of gradients and Jacobians, and approximate line searches. Naturally, several issues arise with respect to the trade-offs between speed and robustness.
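The "premature termination of iterative algorithms" can be illustrated with a generic inexact-Newton sketch: each Newton system is solved by CG only to a fixed relative tolerance (the forcing term), mirroring how LNKS truncates its inner Krylov iterations. This is a minimal stand-in of my own, not the LNKS code, and assumes a symmetric positive definite Jacobian so that CG applies:

```python
import numpy as np

def cg(A_mul, b, rtol):
    """Conjugate gradients, truncated once the residual drops below
    a *relative* tolerance rtol -- the inexactness knob."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    stop = rtol * np.sqrt(rs)
    while np.sqrt(rs) > stop:
        Ap = A_mul(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def inexact_newton(F, J_mul, x, forcing=0.5, tol=1e-10, maxit=100):
    """Inexact Newton: each system J s = -F(x) is solved only to the
    relative tolerance `forcing`, so inner work stays proportional to
    the accuracy the outer iteration actually needs."""
    for _ in range(maxit):
        f = F(x)
        if np.linalg.norm(f) < tol:
            break
        x = x + cg(lambda v: J_mul(x, v), -f, forcing)
    return x
```

With a constant forcing term the outer iteration converges only linearly; choosing the forcing adaptively (tightening it as the residual shrinks) recovers faster convergence, which is exactly the speed-versus-robustness trade-off the article discusses.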