Results 1 - 10 of 14
Pseudo-transient continuation and differential-algebraic equations
- SIAM J. Sci. Comp, 2003
Cited by 15 (4 self)
Abstract. Pseudo-transient continuation is a practical technique for globalizing the computation of steady-state solutions of nonlinear differential equations. The technique employs adaptive time-stepping to integrate an initial value problem derived from an underlying ODE or PDE boundary value problem until sufficient accuracy in the desired steady-state root is achieved to switch over to Newton’s method and gain rapid asymptotic convergence. The existing theory for pseudo-transient continuation includes a global convergence result for differential equations written in semidiscretized method-of-lines form. However, many problems are better formulated or can only sensibly be formulated as differential-algebraic equations (DAEs). These include systems in which some of the equations represent algebraic constraints, perhaps arising from the spatial discretization of a PDE constraint. Multirate systems, in particular, are often formulated as differential-algebraic systems to suppress fast time scales (acoustics, gravity waves, Alfvén waves, near-equilibrium chemical oscillations, etc.) that are irrelevant on the dynamical time scales of interest. In this paper we present a global convergence result for pseudo-transient continuation applied to DAEs of index 1, and we illustrate it with numerical experiments on model incompressible flow and reacting flow problems, in which a constraint is employed to step over acoustic waves.
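As a rough illustration of the idea described in this abstract, the sketch below applies pseudo-transient continuation to a scalar steady-state problem F(x) = 0. It is not the paper's DAE formulation. The function name psitc_scalar, the test function arctan(x), and the step-size rule (pseudo-time step grown in proportion to the drop in |F|, often called switched evolution relaxation) are assumptions chosen for illustration.

```python
import math


def psitc_scalar(f, df, x0, delta0=1.0, tol=1e-10, max_iter=200):
    """Hypothetical scalar pseudo-transient continuation sketch.

    Each step is one linearized implicit Euler step for x' = -f(x):
        x_new = x - f(x) / (1/delta + f'(x))
    As delta grows, the step approaches a plain Newton step.
    """
    x = x0
    delta = delta0
    fx = f(x)
    for _ in range(max_iter):
        if abs(fx) <= tol:
            return x
        x_new = x - fx / (1.0 / delta + df(x))
        f_new = f(x_new)
        if f_new == 0.0:
            return x_new
        # Assumed update rule: enlarge the pseudo-time step as the residual shrinks.
        delta *= abs(fx) / abs(f_new)
        x, fx = x_new, f_new
    raise RuntimeError("pseudo-transient continuation did not converge")


# Plain Newton on arctan(x) diverges from x0 = 10; the damped steps above do not.
root = psitc_scalar(math.atan, lambda x: 1.0 / (1.0 + x * x), 10.0)
```

Small pseudo-time steps early on keep the iteration close to the stable transient, and the growing step size recovers Newton-like convergence near the root.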
Balancing Neumann-Neumann methods for incompressible Stokes equations
- Comm. Pure Appl. Math, 2001
Cited by 15 (5 self)
Balancing Neumann-Neumann methods are introduced and studied for incompressible Stokes equations discretized with mixed finite or spectral elements with discontinuous pressures. After decomposing the original domain of the problem into nonoverlapping subdomains, the interior unknowns, which are the interior velocity component and all except the constant pressure component, of each subdomain problem are implicitly eliminated. The resulting saddle point Schur complement is solved with a Krylov space method with a balancing Neumann-Neumann preconditioner based on the solution of a coarse Stokes problem with a few degrees of freedom per subdomain and on the solution of local Stokes problems with natural and essential boundary conditions on the subdomains. This preconditioner is of hybrid form in which the coarse problem is treated multiplicatively while the local problems are treated additively. The condition number of the preconditioned operator is independent of the number of subdomains a...
A high-order discontinuous Galerkin multigrid solver for . . .
2004
Cited by 13 (2 self)
Results are presented from the development of a high-order discontinuous Galerkin finite element solver using p-multigrid with line Jacobi smoothing. The line smoothing algorithm is presented for unstructured meshes, and p-multigrid is outlined for the nonlinear Euler equations of gas dynamics. Analysis of 2-D advection shows the improved performance of line implicit versus block implicit relaxation. Through a mesh refinement study, the accuracy of the discretization is determined to be the optimal O(h^{p+1}) for smooth problems in 2-D and 3-D. The multigrid convergence rate is found to be independent of the interpolation order but weakly dependent on the grid size. Timing studies for each problem indicate that higher order is advantageous over grid refinement when high accuracy is required. Finally, parallel versions of the 2-D and 3-D solvers demonstrate close to ideal coarse-grain scalability.
NEWTON-GMRES PRECONDITIONING FOR DISCONTINUOUS GALERKIN DISCRETIZATIONS OF THE NAVIER-STOKES EQUATIONS
Cited by 13 (7 self)
Abstract. We study preconditioners for the iterative solution of the linear systems arising in the implicit time integration of the compressible Navier-Stokes equations. The spatial discretization is carried out using a Discontinuous Galerkin method with fourth order polynomial interpolations on triangular elements. The time integration is based on backward difference formulas resulting in a nonlinear system of equations which is solved at each timestep. This is accomplished using Newton’s method. The resulting linear systems are solved using a preconditioned GMRES iterative algorithm. We consider several existing preconditioners such as block-Jacobi and Gauss-Seidel combined with multi-level schemes which have been developed and tested for specific applications. While our results are consistent with the claims reported, we find that these preconditioners lack robustness when used in more challenging situations involving low Mach numbers, stretched grids or high Reynolds number turbulent flows. We propose a preconditioner based on a coarse scale correction with post-smoothing based on a block incomplete LU factorization with zero fill-in (ILU0) of the Jacobian matrix. The performance of the ILU0 smoother is found to depend critically on the element numbering. We propose a numbering strategy based on minimizing the discarded fill-in in a greedy fashion. The coarse scale correction scheme is found to be important for diffusion dominated
Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply
- In Proceedings of the International Conference on Parallel Processing, 2004
Cited by 10 (3 self)
We present optimizations for sparse matrix-vector multiply (SpMV) and its generalization to multiple vectors (SpMM) when the matrix is symmetric: (1) symmetric storage, (2) register blocking, and (3) vector blocking. Combined with register blocking, symmetry saves more than 50% in matrix storage. We also show performance speedups of 2.1× for SpMV and 2.6× for SpMM, when compared to the best non-symmetric register blocked implementation. We present an approach for the selection of tuning parameters, based on empirical modeling and search, that consists of three steps: (1) Off-line benchmark, (2) Runtime search, and (3) Heuristic performance model. This approach generally selects parameters to achieve performance within 85% of that achieved with exhaustive search. We evaluate our implementations with respect to upper bounds on performance. Our model bounds performance by considering only the cost of memory operations and using lower bounds on the number of cache misses. Our optimized codes are within 68% of the upper bounds.
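To make the "symmetric storage" optimization in this abstract concrete, the sketch below multiplies a symmetric matrix by a vector while storing only the upper triangle in compressed sparse row (CSR) form. The function name sym_spmv and the plain CSR layout are assumptions for illustration; the paper's register and vector blocking are not shown.

```python
def sym_spmv(n, row_ptr, col_idx, vals, x):
    """Compute y = A x for symmetric A stored as its upper triangle in CSR.

    Each stored off-diagonal entry a_ij (j > i) is used twice: once for
    row i and once for the mirrored entry a_ji in row j.
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = vals[k]
            y[i] += a * x[j]
            if j != i:
                y[j] += a * x[i]  # contribution of the mirrored entry
    return y


# Upper triangle of A = [[2, 1, 0], [1, 3, 4], [0, 4, 5]]
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
y = sym_spmv(3, row_ptr, col_idx, vals, [1.0, 2.0, 3.0])
```

Because every off-diagonal value is read once but used twice, roughly half of the matrix data has to be stored and streamed from memory.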
Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy
- ACM Trans. Math. Softw
Cited by 7 (1 self)
By using a combination of 32-bit and 64-bit floating point arithmetic the performance of many sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. These ideas can be applied to sparse multifrontal and supernodal direct techniques and sparse iterative techniques such as Krylov subspace methods. The approach presented here can apply not only to conventional processors but also to exotic technologies such as
Using GPUs to improve multigrid solver performance on a cluster
- J. OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2008
Cited by 5 (1 self)
This article explores the coupling of coarse and fine-grained parallelism for Finite Element simulations based on efficient parallel multigrid solvers. The focus lies on both system performance and a minimally invasive integration of hardware acceleration into an existing software package, requiring no changes to application code. We demonstrate the viability of our approach by using commodity graphics processors (GPUs), chosen for their excellent price-performance ratio, as efficient multigrid preconditioners. We address the issue of limited precision on GPUs by applying a mixed precision, iterative refinement technique. Other restrictions are also handled by a close interplay between the GPU and CPU. From a software perspective, we integrate the GPU solvers into the existing MPI-based Finite Element package by implementing the same interfaces as the CPU solvers, so that for the application programmer they are easily interchangeable. Our results show that we do not compromise any software functionality and gain speedups of two and more for large problems. Equipped with this additional option of hardware acceleration we compare different choices in increasing the performance of a conventional, commodity-based cluster by increasing the number
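The mixed precision, iterative refinement technique named in this abstract can be sketched as follows. This is an assumption-laden toy, not the article's GPU multigrid solver: the inner solver is dense Gaussian elimination with every intermediate rounded to 32-bit precision (emulated with the standard struct module), and the names to_f32, solve_f32, and mixed_precision_solve are hypothetical.

```python
import struct


def to_f32(v):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", v))[0]


def solve_f32(A, b):
    """Gaussian elimination with partial pivoting, emulating 32-bit arithmetic."""
    n = len(b)
    M = [[to_f32(v) for v in row] + [to_f32(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = to_f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = to_f32(M[i][j] - to_f32(m * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = to_f32(s - to_f32(M[i][j] * x[j]))
        x[i] = to_f32(s / M[i][i])
    return x


def mixed_precision_solve(A, b, tol=1e-14, max_iter=20):
    """Low-precision solves corrected by residuals computed in 64-bit."""
    x = solve_f32(A, b)
    scale = max(abs(v) for v in b)
    for _ in range(max_iter):
        # Residual in full 64-bit precision.
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        if max(abs(v) for v in r) <= tol * scale:
            break
        d = solve_f32(A, r)  # cheap low-precision correction
        x = [xi + di for xi, di in zip(x, d)]
    return x


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]  # exact solution is [1, 2, 3]
x = mixed_precision_solve(A, b)
```

The expensive solve runs entirely in low precision, while only the residual and the solution update use 64-bit arithmetic, which is what lets the final answer reach 64-bit accuracy for well-conditioned systems.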
Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes
Cited by 5 (4 self)
Abstract. The development of optimized codes is time-consuming and requires extensive architecture, compiler, and language expertise; therefore, computational scientists are often forced to choose between investing considerable time in tuning code or accepting lower performance. In this paper, we describe the first steps toward a fully automated system for the optimization of the matrix algebra kernels that are a foundational part of many scientific applications. To generate highly optimized code from a high-level MATLAB prototype, we define a three-step approach. To begin, we have developed a compiler that converts a MATLAB script into simple C code. We then use the polyhedral optimization system PLuTo to optimize that code for coarse-grained parallelism and locality simultaneously. Finally, we annotate the resulting code with performance-tuning directives and use the empirical performance-tuning system Orio to generate many tuned versions of the same operation using different optimization techniques, such as loop unrolling and memory alignment. Orio performs an automated empirical search to select the best among the multiple optimized code variants. We discuss performance results on two architectures. Key words: MATLAB, code generation, empirical performance tuning
Dual-Level Parallelism for Deterministic and Stochastic CFD Problems
- In Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002
Cited by 3 (0 self)
A hybrid two-level parallelism using MPI/OpenMP is implemented in the general-purpose spectral/hp element CFD code NekTar to take advantage of the hierarchical structures arising in deterministic and stochastic CFD problems. We take a coarse-grain approach to shared-memory parallelism with OpenMP and employ a workload-splitting scheme that can reduce the OpenMP synchronizations to the minimum. The hybrid implementation shows good scalability with respect to both the problem size and the number of processors in the case of a fixed problem size. With the same number of processors, the hybrid model with 2 (or 4) OpenMP threads per MPI process is observed to perform better than pure MPI and pure OpenMP on the NCSA SGI Origin 2000, while the pure MPI model performs the best on the IBM SP3 at SDSC and on the Compaq Alpha cluster at PSC. A key new result is that the use of threads effectively facilitates p-refinement, which is crucial to adaptive discretization using high-order methods.
Using Automatic Differentiation for Second-Order Matrix-free Methods in PDE-constrained Optimization
2000
Cited by 2 (0 self)
Classical methods of constrained optimization are often based on the assumptions that projection onto the constraint manifold is routine but accessing second-derivative information is not. Both assumptions need revision for the application of optimization to systems constrained by partial differential equations, in the contemporary limit of millions of state variables and in the parallel setting. Large-scale PDE solvers are complex pieces of software that exploit detailed knowledge of architecture and application and cannot easily be modified to fit the interface requirements of a blackbox optimizer. Furthermore, in view of the expense of PDE analyses, optimization methods not using second derivatives may require too many iterations to be practical. For general problems, automatic differentiation is likely to be the most convenient means of exploiting second derivatives. We delineate a role for automatic differentiation in matrix-free optimization formulations involving Newto...

