Results 1-10 of 79
High Resolution Forward and Inverse Earthquake Modeling on Terascale Computers
In SC2003, 2003
Cited by 73 (24 self)
For earthquake simulations to play an important role in the reduction of seismic risk, they must be capable of high resolution and high fidelity. We have developed algorithms and tools for earthquake simulation based on multiresolution hexahedral meshes. We have used this capability to carry out 1 Hz simulations of the 1994 Northridge earthquake in the LA Basin using 100 million grid points. Our wave propagation solver sustains 1.21 teraflop/s for 4 hours on 3000 AlphaServer processors at 80% parallel efficiency. Because of uncertainties in characterizing earthquake source and basin material properties, a critical remaining challenge is to invert for source and material parameter fields for complex 3D basins from records of past earthquakes. Towards this end, we present results for material and source inversion of high-resolution models of basins undergoing antiplane motion, using parallel scalable inversion algorithms that overcome many of the difficulties particular to inverse heterogeneous wave propagation problems.
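The inversion results above concern basins undergoing antiplane motion, where the single out-of-plane displacement u obeys rho * u_tt = div(mu * grad u). As a hedged illustration only (not the authors' solver, which works on multiresolution hexahedral meshes in 3D), the Python sketch below integrates a 1D version of that equation with a second-order leapfrog finite-difference scheme. The function name `simulate_antiplane_1d`, the harmonic averaging of mu between nodes, and the fixed-end boundary conditions are our own assumptions.

```python
import math


def simulate_antiplane_1d(rho, mu, u0, dx, dt, steps):
    """Leapfrog sketch of rho * u_tt = d/dx(mu * du/dx) with fixed ends.

    rho, mu and u0 are per-node lists (u0 should vanish at both ends); the
    medium starts at rest (u_t = 0). Returns the displacement after `steps`
    time steps of size dt (steps >= 1). Stability requires
    dt <= dx / max(sqrt(mu / rho)).
    """
    n = len(u0)
    # Harmonic average of mu at cell midpoints keeps the flux consistent across jumps.
    mu_half = [2.0 * mu[i] * mu[i + 1] / (mu[i] + mu[i + 1]) for i in range(n - 1)]

    def accel(u):
        a = [0.0] * n
        for i in range(1, n - 1):
            flux_right = mu_half[i] * (u[i + 1] - u[i]) / dx
            flux_left = mu_half[i - 1] * (u[i] - u[i - 1]) / dx
            a[i] = (flux_right - flux_left) / (dx * rho[i])
        return a

    u_prev = list(u0)
    a0 = accel(u_prev)
    # Second-order start from rest: u(dt) ~ u0 + dt^2 / 2 * u_tt(0).
    u_curr = [u_prev[i] + 0.5 * dt * dt * a0[i] for i in range(n)]
    for _ in range(steps - 1):
        a = accel(u_curr)
        u_next = [2.0 * u_curr[i] - u_prev[i] + dt * dt * a[i] for i in range(n)]
        u_next[0] = u_next[-1] = 0.0  # fixed (zero-displacement) ends
        u_prev, u_curr = u_curr, u_next
    return u_curr
```

With mu = rho = 1, a Gaussian pulse splits into two half-amplitude pulses travelling at unit speed; a jump in mu produces partial reflection and transmission at the interface, which is the kind of material contrast an inversion has to recover.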
A fast solver for the Stokes equations with distributed forces in complex geometries
J. Comput. Phys
Cited by 38 (10 self)
We present a new method for the solution of the Stokes equations. The main features of our method are: (1) it can be applied to arbitrary geometries in a black-box fashion; (2) it is second-order accurate; and (3) it has optimal algorithmic complexity. Our approach, which we refer to as the Embedded Boundary Integral method, is based on Anita Mayo’s work on Poisson’s equation: “The Fast Solution of Poisson’s and the Biharmonic Equations on Irregular Regions”, SIAM Journal on Numerical Analysis, 21 (1984), pp. 285–299. We embed the domain in a rectangular domain, for which fast solvers are available, and we impose the boundary conditions as interface (jump) conditions on the velocities and tractions. We use an indirect boundary integral formulation for the homogeneous Stokes equations to compute the jumps. The resulting equations are discretized by Nyström’s method. The rectangular domain problem is discretized by finite elements for a velocity-pressure formulation with equal-order interpolation bilinear elements ($Q_1$-$Q_1$). Stabilization is used to circumvent the inf-sup condition for the pressure space. For the integral equations, fast matrix-vector multiplications are achieved via an $O(N \log N)$ algorithm based on a block representation of the discrete integral operator, combined with (kernel-independent) singular value decomposition to sparsify low-rank blocks. The regular grid solver is a Krylov method (Conjugate Residuals) combined with an optimal two-level Schwarz preconditioner. For the integral equation we use GMRES. We have tested our algorithm on several numerical examples and have observed optimal convergence rates. Key Words: Stokes equations, fast solvers, integral equations, double-layer potential, fast multipole methods, embedded domain methods, immersed interface methods, fictitious
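The regular-grid solver described above is a Krylov method, Conjugate Residuals. A minimal, unpreconditioned Python sketch of that iteration for a symmetric matrix, supplied through a hypothetical `matvec` callback, follows. The paper pairs it with a two-level Schwarz preconditioner, which is omitted here, and the sketch is only exercised on a small symmetric positive definite system.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_residual(matvec, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned Conjugate Residual method for a symmetric matrix.

    matvec(v) must return A @ v. Starts from x = 0 and stops when the
    recursively updated residual satisfies ||r|| <= tol * ||b||.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                 # r0 = b - A * 0
    Ar = matvec(r)
    p, Ap = list(r), list(Ar)   # first search direction and its image under A
    rAr = dot(r, Ar)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            break
        alpha = rAr / dot(Ap, Ap)   # minimizes ||r - alpha * A p||
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        Ar = matvec(r)
        rAr_new = dot(r, Ar)
        beta = rAr_new / rAr
        rAr = rAr_new
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        Ap = [ari + beta * api for ari, api in zip(Ar, Ap)]  # A p by recurrence
    return x
```

Updating `Ap` by recurrence keeps the cost at one matrix-vector product per iteration, which matters when `matvec` is an expensive or matrix-free operator.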
A scalable parallel algorithm for incomplete factor preconditioning
SIAM Journal on Scientific Computing
Cited by 37 (3 self)
Abstract. We describe a parallel algorithm for computing incomplete factor (ILU) preconditioners. The algorithm attains a high degree of parallelism through graph partitioning and a two-level ordering strategy. Both the subdomains and the nodes within each subdomain are ordered to preserve concurrency. We show through an algorithmic analysis and through computational results that this algorithm is scalable. Experimental results include timings on three parallel platforms for problems with up to 20 million unknowns running on up to 216 processors. The resulting preconditioned Krylov solvers have the desirable property that the number of iterations required for convergence is insensitive to the number of processors.
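For reference, a sequential ILU(0) factorization (incomplete LU that keeps only the nonzero pattern of A) can be sketched in Python as below. This is a generic textbook variant, not the parallel algorithm of the paper: there is no graph partitioning or two-level ordering, and dense lists are used purely for readability. `ilu0`, `ilu_apply` and `product_of_factors` are illustrative names.

```python
def ilu0(A):
    """ILU(0): incomplete LU restricted to the nonzero pattern of A.

    Returns (LU, pattern): the strict lower part of LU holds L (unit
    diagonal implied) and the upper part holds U.
    """
    n = len(A)
    pattern = {(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0}
    LU = [list(map(float, row)) for row in A]
    for i in range(1, n):
        for k in range(i):
            if (i, k) not in pattern:
                continue
            LU[i][k] /= LU[k][k]                      # multiplier l_ik
            for j in range(k + 1, n):
                if (i, j) in pattern:                 # fill outside the pattern is dropped
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, pattern


def ilu_apply(LU, b):
    """Solve (L U) x = b, i.e. apply the preconditioner inside a Krylov solver."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                                # forward solve: L y = b
        y[i] = b[i] - sum(LU[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):                      # backward solve: U x = y
        x[i] = (y[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x


def product_of_factors(LU):
    """Form L @ U explicitly (for checking only)."""
    n = len(LU)
    # L[i][k] = LU[i][k] for k < i and 1 for k == i; U[k][j] = LU[k][j] for k <= j.
    return [[sum((1.0 if k == i else LU[i][k]) * LU[k][j] for k in range(min(i, j) + 1))
             for j in range(n)] for i in range(n)]
```

On the nonzero pattern, the product LU reproduces A exactly; the entries that differ are the dropped fill. For a tridiagonal matrix no fill occurs, so ILU(0) coincides with the exact LU factorization.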
A New Parallel Kernel-Independent Fast Multipole Method
In SC2003
Cited by 34 (12 self)
We present a new adaptive fast multipole algorithm and its parallel implementation. The algorithm is kernel-independent in the sense that the evaluation of pairwise interactions does not rely on any analytic expansions, but only utilizes kernel evaluations. The new method provides the enabling technology for many important problems in computational science and engineering. Examples include viscous flows, fracture mechanics and screened Coulombic interactions. Our MPI-based parallel implementation logically separates the computation and communication phases to avoid synchronization in the upward and downward computation passes, and thus allows us to fully exploit the overlap of computation and communication. We measure isogranular and fixed-size scalability for a variety of kernels on the Pittsburgh Supercomputing Center's TCS-1 AlphaServer on up to 3000 processors. We have solved viscous flow problems with up to 2.1 billion unknowns and have achieved 1.6 Tflop/s peak performance and 1.13 Tflop/s sustained performance.
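"Kernel-independent" means the method touches the kernel only through point evaluations K(t, s). The Python sketch below (it needs Python 3.8+ for `math.dist`) is not the FMM itself: it shows the direct O(NM) sum that an FMM approximates, and that is commonly used to check FMM accuracy, with the kernel passed in as a plain function. The Laplace and screened Coulomb (Yukawa) kernels are standard 3D forms chosen for illustration; the function names are hypothetical.

```python
import math


def direct_sum(targets, sources, densities, kernel):
    """phi(t) = sum over s of K(t, s) * q_s, using kernel evaluations only."""
    return [sum(kernel(t, s) * q for s, q in zip(sources, densities)) for t in targets]


def laplace_3d(t, s):
    """Free-space Laplace kernel 1 / (4 pi r); self-interaction is skipped."""
    r = math.dist(t, s)
    return 1.0 / (4.0 * math.pi * r) if r > 0.0 else 0.0


def yukawa_3d(lam):
    """Screened Coulomb (modified Helmholtz) kernel exp(-lam r) / (4 pi r)."""
    def kernel(t, s):
        r = math.dist(t, s)
        return math.exp(-lam * r) / (4.0 * math.pi * r) if r > 0.0 else 0.0
    return kernel
```

Swapping kernels requires no new expansion formulas here; the paper's contribution is achieving the same plug-in property while replacing the O(NM) loop with a fast, parallel hierarchical approximation.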
Numerical time integration for air pollution models
Surveys on Mathematics for Industry, 1998
PRIMME: PReconditioned Iterative Multimethod Eigensolver: Methods and Software Description
2006
Cited by 16 (6 self)
This paper describes the PRIMME software package for solving large, sparse Hermitian and real symmetric eigenvalue problems. The difficulty and importance of these problems have increased over the years, necessitating the use of preconditioning and near-optimally converging iterative methods. On the other hand, the complexity of tuning or even using such methods has kept them out of reach of many users. Responding to this problem, our goal was to develop general-purpose software that requires minimal or no tuning, yet provides the best possible robustness and efficiency. PRIMME is a comprehensive package that brings state-of-the-art methods from “bleeding edge” to production, with a flexible yet highly usable interface. We review the theory that gives rise to the near-optimal methods GD+k and JDQMR, and present the various algorithms that constitute the basis of PRIMME. We also describe the software implementation and interface, and provide some sample experimental results.
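GD+k and JDQMR are beyond a short example, but the basic ingredients of a preconditioned eigensolver (a residual, a preconditioned correction, and a Rayleigh-Ritz projection) fit in a few lines. The Python sketch below is preconditioned steepest descent for the smallest eigenvalue of a real symmetric matrix, using a Davidson-style diagonal preconditioner |D - theta|^(-1) whose absolute value keeps it positive definite. It is an illustration under these assumptions, not PRIMME's algorithm or interface; `smallest_eigenpair` is a hypothetical name.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def smallest_eigenpair(A, tol=1e-10, max_iter=500):
    """Preconditioned steepest descent with a 2D Rayleigh-Ritz step.

    A is a dense real symmetric matrix (list of rows). Returns (theta, x)
    with x of unit length and theta = x^T A x.
    """
    n = len(A)
    # Start from the unit vector at the smallest diagonal entry (Davidson-style guess).
    i0 = min(range(n), key=lambda i: A[i][i])
    x = [0.0] * n
    x[i0] = 1.0
    for _ in range(max_iter):
        Ax = matvec(A, x)
        theta = dot(x, Ax)
        r = [axi - theta * xi for axi, xi in zip(Ax, x)]
        if math.sqrt(dot(r, r)) < tol:
            break
        # Diagonal preconditioner |D - theta|^{-1}; abs() keeps it positive definite.
        w = [ri / max(abs(A[i][i] - theta), 1e-8) for i, ri in enumerate(r)]
        c = dot(w, x)
        w = [wi - c * xi for wi, xi in zip(w, x)]   # orthogonalize against x
        w_norm = math.sqrt(dot(w, w))
        if w_norm < 1e-14:
            break
        w = [wi / w_norm for wi in w]
        Aw = matvec(A, w)
        # Rayleigh-Ritz on span{x, w}: smallest eigenpair of [[a, b], [b, d]].
        a, b, d = theta, dot(x, Aw), dot(w, Aw)
        mu = 0.5 * (a + d) - math.sqrt(0.25 * (a - d) ** 2 + b * b)
        # In exact arithmetic b = r^T (preconditioned r) / w_norm > 0, so (b, mu - a) != 0.
        y1, y2 = b, mu - a
        norm = math.hypot(y1, y2)
        x = [(y1 * xi + y2 * wi) / norm for xi, wi in zip(x, w)]
    Ax = matvec(A, x)
    return dot(x, Ax), x
```

Because x always lies in the search space, each Rayleigh-Ritz step cannot increase theta. Davidson-type methods such as GD+k instead build, and periodically restart, a larger search subspace to converge faster.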
A Robust Framework for Soft Tissue Simulations with Application to Modeling Brain Tumor Mass Effect in 3D MR Images
Physics in Medicine and Biology 52, 2007
Cited by 15 (6 self)
Abstract. We present a framework for black-box and flexible simulation of soft tissue deformation for medical imaging and surgical planning applications. We use a regular grid approach in which we approximate coefficient discontinuities, distributed forces and boundary conditions. This approach circumvents the need for unstructured mesh generation, which is often a bottleneck in the modeling and simulation pipeline. When using discretizations that do not conform to the boundary, however, it becomes challenging to impose boundary conditions. Moreover, the resulting linear algebraic systems can require excessive memory storage and solution times. Our framework employs penalty approaches to impose boundary conditions and uses a matrix-free implementation coupled with a multigrid-accelerated Krylov solver. The overall scheme results in a scalable method with minimal storage requirements and optimal algorithmic complexity. We also describe an Eulerian formulation that allows for large deformations, with a level-set-based approach for evolving fronts. Finally, we illustrate the potential of our framework to simulate realistic brain tumor mass effects at reduced computational cost, to aid the registration process towards the construction of brain tumor atlases.
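The two ingredients named above, penalty boundary conditions and a matrix-free operator inside a Krylov solver, can be illustrated in 1D. In this Python sketch (an assumption-based toy, not the authors' 3D soft-tissue code), the Laplacian on a regular grid over [0, 1] is applied element by element without storing a matrix, the Dirichlet values u(0) = g0 and u(1) = g1 are imposed weakly through a hypothetical penalty parameter `beta`, and plain conjugate gradients stands in for the multigrid-accelerated solver. Here the boundary falls on grid nodes; the penalty approach is what lets the paper's framework handle boundaries that do not.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def penalty_operator(n_cells, beta):
    """Matrix-free apply of (stiffness + boundary penalty) for -u'' on [0, 1].

    The stiffness matrix is never stored: each element adds its gradient
    contribution on the fly. beta weakly enforces u(0) = g0 and u(1) = g1.
    """
    h = 1.0 / n_cells

    def apply(u):
        out = [0.0] * (n_cells + 1)
        for e in range(n_cells):
            d = (u[e + 1] - u[e]) / h   # element gradient
            out[e] -= d
            out[e + 1] += d
        out[0] += beta * u[0]
        out[-1] += beta * u[-1]
        return out

    return apply


def cg(apply, b, tol=1e-12, max_iter=500):
    """Conjugate gradients for a symmetric positive definite operator."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    stop = tol * math.sqrt(rr)
    for _ in range(max_iter):
        if math.sqrt(rr) <= stop:
            break
        Ap = apply(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x
```

With zero forcing the right-hand side is just beta * g at the two end nodes, and the discrete solution is linear, u_i = a + s * x_i with s = beta * (g1 - g0) / (2 + beta) and a = g0 + s / beta. The boundary values are therefore met only up to O(1/beta); a larger beta enforces them more tightly but worsens conditioning.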
Balancing Neumann-Neumann preconditioners for mixed approximations of heterogeneous problems in linear elasticity
Numer. Math
Cited by 14 (3 self)
Abstract. Balancing Neumann-Neumann methods are extended to mixed formulations of the linear elasticity system with discontinuous coefficients, discretized with mixed finite or spectral elements with discontinuous pressures. These domain decomposition methods implicitly eliminate the degrees of freedom associated with the interior of each subdomain and iteratively solve the resulting saddle-point Schur complement system using a hybrid preconditioner based on a coarse mixed elasticity problem and local mixed elasticity problems with natural and essential boundary conditions. A polylogarithmic bound in the local number of degrees of freedom is proven for the condition number of the preconditioned operator in the constant-coefficient case. Parallel and serial numerical experiments confirm the theoretical results, indicate that they still hold for systems with discontinuous coefficients, and show that our algorithm is scalable, parallel, and robust with respect to material heterogeneities. The results on heterogeneous general problems are also supported in part by our theory.
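The first step described above, eliminating the interior degrees of freedom to leave a system on the subdomain interfaces, produces the Schur complement S = A_GG - A_GI A_II^(-1) A_IG, where I indexes interior and G interface unknowns. The abstract describes an implicit elimination for the saddle-point (mixed) system together with a balancing preconditioner; this Python sketch instead forms S explicitly for a plain symmetric positive definite matrix and omits the preconditioner. `solve_dense` and `schur_complement` are hypothetical helpers.

```python
def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def schur_complement(A, interior, interface):
    """S = A_GG - A_GI A_II^{-1} A_IG, built one interface column at a time."""
    A_II = [[A[i][j] for j in interior] for i in interior]
    S = [[A[g][h] for h in interface] for g in interface]
    for col, h in enumerate(interface):
        z = solve_dense(A_II, [A[i][h] for i in interior])  # A_II^{-1} A_IG[:, h]
        for row, g in enumerate(interface):
            S[row][col] -= sum(A[g][i] * zi for i, zi in zip(interior, z))
    return S
```

For the 7-node 1D Laplacian tridiag(-1, 2, -1) with the middle node as the interface, the two interior blocks decouple and S = 0.5; solving the 1x1 interface system with the correspondingly condensed right-hand side reproduces the middle entry of the full solution.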
Parallel computation of high dimensional robust correlation and covariance matrices
In KDD ’04: Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, 2004
Cited by 10 (2 self)
The computation of covariance and correlation matrices is critical to many data mining applications and processes. Unfortunately, the classical covariance and correlation matrices are very sensitive to outliers. Robust methods, such as QC and the Maronna method, have been proposed. However, existing algorithms for QC only give acceptable performance when the dimensionality of the matrix is in the hundreds, and the Maronna method is rarely used in practice because of its high computational cost. In this paper, we develop parallel algorithms for both QC and the Maronna method. We evaluate these parallel algorithms using a real data set of the gene expression of over 6,000 genes, giving rise to a matrix of over 18 million entries. In our experimental evaluation, we explore scalability in dimensionality and in the number of processors, and the trade-offs between accuracy and computational efficiency. We also compare the parallel behaviours of the two methods. From a statistical standpoint, the Maronna method is more robust than QC. From a computational standpoint, while QC requires less computation, interestingly the Maronna method is much more parallelizable than QC. After thorough experimentation, we conclude that for many data mining applications, both QC and Maronna are viable options. QC, less robust but faster, is the recommended choice for small parallel platforms. On the other hand, the Maronna method is the recommended choice when a high degree of robustness is required, or when the parallel platform features a large number of processors (e.g., 32).
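Assuming QC refers to the quadrant correlation estimator (the abstract does not define it), each pairwise entry needs only medians and signs: q is the average of sgn(x_i - med x) * sgn(y_i - med y), and sin(pi * q / 2) makes it consistent with the Pearson correlation under bivariate normality. The Python sketch below is illustrative only; the Maronna M-estimator is not shown, any robust-scale or covariance step the paper uses is omitted, and the function names are hypothetical.

```python
import math
from statistics import median


def _sign(v):
    return (v > 0) - (v < 0)


def quadrant_correlation(x, y):
    """Quadrant correlation: signs about the medians, mapped through sin(pi q / 2)."""
    mx, my = median(x), median(y)
    q = sum(_sign(a - mx) * _sign(b - my) for a, b in zip(x, y)) / len(x)
    return math.sin(math.pi * q / 2.0)


def qc_matrix(columns):
    """All pairwise estimates; the p * (p - 1) / 2 pairs are independent work items."""
    p = len(columns)
    R = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            R[i][j] = R[j][i] = quadrant_correlation(columns[i], columns[j])
    return R
```

A single wild value moves the estimate only through the signs it changes, which is the source of QC's robustness; the quadratic number of independent pairs is what makes the high-dimensional case expensive and a natural target for parallelization.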
An efficient block variant of GMRES
SIAM J. Sci. Comput
Cited by 10 (2 self)
Abstract. We present an alternative to the standard restarted GMRES algorithm for solving a single right-hand side linear system Ax = b based on solving the block linear system AX = B. Additional starting vectors and right-hand sides are chosen to accelerate convergence. Algorithm performance, i.e., time to solution, is improved by using the matrix A in operations on groups of vectors, or “multivectors,” thereby reducing the movement of A through memory. The efficient implementation of our method depends on a fast matrix-multivector multiply routine. We present numerical results showing that the new method reaches a solution up to two and a half times faster than restarted GMRES on preconditioned problems. Furthermore, we demonstrate the impact of implementation choices on data movement and, as a result, on algorithm performance. Key words. GMRES, block GMRES, iterative methods, Krylov subspace techniques, restart, nonsymmetric linear systems, memory access costs. AMS subject classifications. 65F10
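The implementation point above, that a fast matrix-multivector multiply reduces the movement of A through memory, can be seen in a compressed sparse row (CSR) kernel: each stored nonzero a_ij is loaded once and applied to all k vectors, instead of once per vector. The Python sketch below shows only this access pattern (Python itself will not exhibit the memory-traffic savings), and the function names are illustrative rather than the paper's routine.

```python
def csr_from_dense(A):
    """Convert a dense matrix (list of rows) to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0.0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_times_multivector(values, col_idx, row_ptr, X):
    """Y = A @ X for a multivector X stored row-wise: X[j] holds row j of all k vectors."""
    n = len(row_ptr) - 1
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n)]
    for i in range(n):
        yi = Y[i]
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a, xj = values[p], X[col_idx[p]]   # a_ij is read once per row...
            for c in range(k):                 # ...and reused for all k vectors
                yi[c] += a * xj[c]
    return Y
```

Storing the multivector row-wise keeps the k entries touched by each nonzero contiguous, which is the layout a compiled implementation would exploit.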