Results 1–10 of 27
Nonlinearly preconditioned inexact Newton algorithms
 SIAM J. Sci. Comput., 2000
Abstract

Cited by 35 (14 self)
Abstract. Inexact Newton algorithms are commonly used for solving large sparse nonlinear systems of equations F(u∗) = 0 arising, for example, from the discretization of partial differential equations. Even with global strategies such as line search or trust region, the methods often stagnate at local minima of ‖F‖, especially for problems with unbalanced nonlinearities, because the methods do not have built-in machinery to deal with the unbalanced nonlinearities. To find the same solution u∗, one may want to solve instead an equivalent nonlinearly preconditioned system F(u∗) = 0 whose nonlinearities are more balanced. In this paper, we propose and study a nonlinear additive Schwarz-based parallel nonlinear preconditioner and show numerically that the new method converges well even for some difficult problems, such as high Reynolds number flows, where a traditional inexact Newton method fails. Key words. nonlinear preconditioning, inexact Newton methods, Krylov subspace methods, nonlinear additive Schwarz, domain decomposition, nonlinear equations, parallel computing, incompressible
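The flavor of the nonlinear additive Schwarz idea can be illustrated in a drastically reduced setting: below, each "subdomain" is a single unknown with no overlap, the local nonlinear solves use scalar Newton, and a plain Richardson-style outer loop replaces the paper's outer inexact Newton iteration. The toy problem F(u) = Au + u³ − b is invented for illustration; this is a sketch of the idea, not the ASPIN algorithm itself.

```python
import numpy as np

# Toy nonlinear system F(u) = A u + u**3 - b = 0 (componentwise cube),
# with A diagonally dominant; problem and sizes are illustrative only.
n = 5
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
F = lambda u: A @ u + u**3 - b

u = np.zeros(n)
for sweep in range(100):                    # outer iteration
    corr = np.zeros(n)
    for i in range(n):                      # one "subdomain" per unknown
        # Local nonlinear solve: find v with A[i,i]*v + v**3 + (coupling
        # to the frozen neighbors) = b[i], via scalar Newton.
        c = A[i] @ u - A[i, i] * u[i] - b[i]
        v = u[i]
        for _ in range(50):
            f, df = A[i, i] * v + v**3 + c, A[i, i] + 3 * v**2
            v -= f / df
        corr[i] = v - u[i]
    u = u + corr                            # sum the local corrections
    if np.linalg.norm(F(u)) < 1e-10:
        break
```

With balanced, diagonally dominant nonlinearities like this one, even the degenerate pointwise scheme converges; the paper's contribution is making the outer Newton iteration act on the better-balanced preconditioned function instead.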
How Scalable is Domain Decomposition in Practice?
Abstract

Cited by 20 (9 self)
The convergence rates and, therefore, the overall parallel efficiencies of additive Schwarz methods are often dependent on subdomain granularity. Except when
Integrating Parallel File I/O and Database Support for High-Performance Scientific Data Management
 In Proc. of SC2000: High Performance Networking and Computing, 2000
Abstract

Cited by 18 (3 self)
Many scientific applications have large I/O requirements, in terms of both the size of data and the number of files or data sets. Management, storage, efficient access, and analysis of this data present an extremely challenging task. Traditionally, two different solutions are used for this problem: file I/O or databases. File I/O can provide high performance but is tedious to use with large numbers of files and large and complex data sets. Databases can be convenient, flexible, and powerful but do not perform and scale well for parallel supercomputing applications. We have developed a software system, called Scientific Data Manager (SDM), that aims to combine the good features of both file I/O and databases. SDM provides a high-level API to the user and, internally, uses a parallel file system to store real data and a database to store application-related metadata. SDM takes advantage of various I/O optimizations available in MPI-IO, such as collective I/O and noncontiguous requests, in a manner that is transparent to the user. As a result, users can write and retrieve data with the performance of parallel file I/O, without having to bother with the details of actually performing file I/O. In this paper, we describe the design and implementation of SDM. With the help of two parallel application templates, ASTRO3D and an Euler solver, we illustrate how some of the design criteria affect performance.
GPCG: A Case Study in the Performance and Scalability of Optimization Algorithms
 ACM Transactions on Mathematical Software, 1999
Abstract

Cited by 18 (7 self)
GPCG is an algorithm within the Toolkit for Advanced Optimization (TAO) for solving bound-constrained, convex quadratic problems. Originally developed by Moré and Toraldo [19], this algorithm was designed for large-scale problems but had been implemented only for a single processor. The TAO implementation is available for a wide range of high-performance architectures and has been tested on up to 64 processors to solve problems with over 2.5 million variables.
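The class of problem GPCG targets, minimizing q(x) = ½xᵀAx − bᵀx subject to lo ≤ x ≤ hi, combines gradient projection (to identify active bounds) with conjugate gradients on the free variables. The sketch below follows that two-phase structure, but a dense direct solve stands in for the CG inner iteration, so it is a simplified illustration under those assumptions, not the TAO implementation.

```python
import numpy as np

def gpcg_sketch(A, b, lo, hi, x0, iters=50, tol=1e-8):
    """Gradient projection + free-variable subspace minimization for
    min 0.5*x'Ax - b'x with lo <= x <= hi (A symmetric positive definite).
    A direct solve stands in for the CG inner iteration."""
    x = np.clip(x0, lo, hi).astype(float)
    for _ in range(iters):
        g = A @ x - b
        # One projected-gradient step to identify the active bounds.
        alpha = (g @ g) / (g @ (A @ g) + 1e-30)
        x = np.clip(x - alpha * g, lo, hi)
        g = A @ x - b
        # Projected gradient: zero where a bound blocks further descent.
        pg = np.where((x > lo) & (x < hi), g,
                      np.where(x <= lo, np.minimum(g, 0), np.maximum(g, 0)))
        if np.linalg.norm(pg) < tol:
            break
        free = (x > lo) & (x < hi)
        if free.any():
            # Minimize over the free variables with the active ones frozen.
            rhs = b[free] - A[np.ix_(free, ~free)] @ x[~free]
            d = np.linalg.solve(A[np.ix_(free, free)], rhs)
            x[free] = np.clip(d, lo[free], hi[free])
    return x
```

When no bounds are active this reduces to one subspace solve of the full system, which is why the method is attractive for problems whose solutions have relatively few active constraints.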
Performance modeling and tuning of an unstructured mesh CFD application
 In Proceedings of SC2000, 2000
Abstract

Cited by 16 (5 self)
This paper describes performance tuning experiences with a three-dimensional unstructured grid Euler flow code from NASA, which we have reimplemented in the PETSc framework and ported to several large-scale machines, including the ASCI Red and Blue Pacific machines, the SGI Origin, the Cray T3E, and Beowulf clusters. The code achieves a respectable level of performance for sparse problems, typical of scientific and engineering codes based on partial differential equations, and scales well up to thousands of processors. Since the gap between CPU speed and memory access rate is widening, the code is analyzed from a memory-centric perspective (in contrast to the traditional flop orientation) to understand its sequential and parallel performance. Performance tuning is approached on three fronts: data layouts to enhance locality of reference, algorithmic parameters, and the parallel programming model. This effort was guided partly by some simple performance models developed for the sparse matrix-vector product operation.
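A memory-centric model of the kind the abstract mentions can be sketched as a roofline-style bound for a CSR sparse matrix-vector product: count bytes moved per flop and take the minimum of the compute and bandwidth ceilings. The matrix sizes and machine numbers below are invented placeholders, not figures from the paper.

```python
# Roofline-style estimate for CSR sparse matrix-vector product (y = A x).
# All machine numbers are hypothetical placeholders.
nnz, nrows = 5_000_000, 100_000
flops = 2 * nnz                      # one multiply + one add per nonzero
# Per nonzero: 8 B matrix value + 4 B column index; per row: 4 B row
# pointer + 8 B to update y. Assume x stays cache-resident (optimistic).
bytes_moved = nnz * (8 + 4) + nrows * (4 + 8)
intensity = flops / bytes_moved      # flops per byte, roughly 1/6
peak_flops, bandwidth = 1e9, 1e9     # hypothetical: 1 Gflop/s, 1 GB/s
perf = min(peak_flops, bandwidth * intensity)
```

Because the intensity is far below one flop per byte, the estimate lands on the bandwidth ceiling, which is the quantitative form of the paper's point that flop-oriented analysis misleads for sparse PDE codes.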
High Performance Parallel Implicit CFD
 Parallel Computing, 2000
Abstract

Cited by 15 (1 self)
Fluid dynamical simulations based on finite discretizations on quasi-static grids scale well in parallel, but execute at a disappointing percentage of per-processor peak floating point operation rates without special attention to the layout and access ordering of data. We document both claims from our experience with an unstructured grid CFD code that is typical of the state of the practice at NASA. These basic performance characteristics of PDE-based codes can be understood with surprisingly simple models, for which we quote earlier work, presenting primarily experimental results herein. These performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. This snapshot of ongoing work updates our 1999 Bell Prize-winning simulation on ASCI computers. Key words: parallel implicit solvers, unstructured grids, computational fluid dynamics, high-performance computing 1991 MSC: 65H20, 65N5...
Highly parallel structured adaptive mesh refinement using parallel languagebased approaches
, 2001
Reducing power with performance constraints for parallel sparse applications
 In Proceedings of IPDPS 2005, the 19th IEEE International Parallel and Distributed Processing Symposium, Apr. 2005
Abstract

Cited by 8 (1 self)
Sparse and irregular computations constitute a large fraction of applications in the data-intensive scientific domain. While every effort is made to balance the computational workload in such computations across parallel processors, achieving sustained near machine-peak performance with a close-to-ideal load-balanced computation-to-processor mapping is inherently difficult. As a result, most of the time, the loads assigned to parallel processors can exhibit significant variations. While there have been numerous past efforts that study this imbalance from the performance viewpoint, to our knowledge, no prior study has considered exploiting the imbalance for reducing power consumption during execution. Power consumption in large-scale clusters of workstations is becoming a critical issue, as noted by several recent research papers from both industry and academia. Focusing on sparse matrix computations in which the underlying parallel computations and data dependencies can be represented by trees, this paper proposes and evaluates different schemes that save power through voltage/frequency scaling. Our goal is to reduce overall energy consumption by scaling the voltages/frequencies of those processors that are not on the critical path; i.e., our approach is oriented towards saving power without incurring performance penalties. The experiments with matrices extracted from real applications as well as with model matrices indicate that the proposed strategies are very effective in saving power, and the savings achieved come close to the optimal limits. Our results also show that the proposed approach can also be used to study power-performance tradeoffs in environments where a certain amount of performance degradation is tolerable.
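The core idea, slow down the processors that have slack relative to the critical path so nothing finishes late but everything off the critical path burns less energy, can be sketched in a few lines. The workloads and the cubic power model below are illustrative assumptions, not data or the exact model from the paper.

```python
# Per-processor workloads in seconds at full frequency; values are invented
# for illustration (the paper derives them from sparse-factorization trees).
loads = [10.0, 6.0, 8.0, 4.0]
t_crit = max(loads)                 # the critical-path processor sets the pace
# Run every other processor just fast enough to finish with the critical path:
freqs = [w / t_crit for w in loads]
# Assuming dynamic power ~ f**3 (voltage scaled with frequency) and
# time = w / f, energy per processor ~ f**2 * w; baseline has f = 1.
energy_ratio = sum(f * f * w for f, w in zip(freqs, loads)) / sum(loads)
```

Every processor finishes at exactly t_crit, so wall-clock time is unchanged while total dynamic energy drops in proportion to the squared frequency reductions, which is the "no performance penalty" regime the abstract describes.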
A Microkernel Design for Componentbased Parallel Numerical Software Systems
 In: Proceedings of the SIAM Workshop on Object Oriented Methods for Interoperable Scientific and Engineering Computing, SIAM, 1998
Abstract

Cited by 7 (1 self)
What is the minimal software infrastructure, and what type of conventions are needed, to simplify development of sophisticated parallel numerical application codes using a variety of software components that are not necessarily available as source code? We propose an opaque object-based model where the objects are dynamically loadable from the file system or network. The microkernel required to manage such a system needs to include, at most:
- a few basic services, namely
  - a mechanism for loading objects at run time via dynamic link libraries, and
  - consistent schemes for error handling and memory management; and
- selected methods that all objects share, to deal with
  - object life (destruction, reference counting, relationships), and
  - object observation (viewing, profiling, tracing).
We are experimenting with these ideas in the context of extensible numerical software within the ALICE (Advanced Large-scale Integrated Computational Environment) project, where we are build...
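The "object life" conventions in the list above, reference counting plus shared observation methods, can be sketched as a tiny base class. Run-time loading via dynamic link libraries is omitted, and the class and method names are hypothetical, not the ALICE or PETSc API.

```python
class KernelObject:
    """Sketch of the shared opaque-object conventions: reference-counted
    lifetime plus a shared 'view' observation method. Names hypothetical."""

    def __init__(self, kind):
        self.kind = kind        # e.g. a solver or matrix type loaded at run time
        self.refct = 1          # object stays alive while references remain
        self.destroyed = False

    def reference(self):        # hand out another owning reference
        self.refct += 1
        return self

    def destroy(self):          # release one reference; free at zero
        self.refct -= 1
        if self.refct == 0:
            self.destroyed = True

    def view(self):             # shared observation method
        return "%s(refs=%d)" % (self.kind, self.refct)
```

Because every component honors the same destroy/reference contract, binary-only components can share objects safely without knowing each other's internals, which is the point of keeping these methods in the microkernel.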
Parallel Simulation of Compressible Flow Using Automatic Differentiation and PETSc
Abstract

Cited by 6 (1 self)
Many aerospace applications require parallel implicit solution strategies and software. We consider the use of two computational tools, PETSc and ADIFOR, to implement a Newton-Krylov-Schwarz method with pseudo-transient continuation for a particular application, namely, a steady-state, fully implicit, three-dimensional compressible Euler model of flow over an M6 wing. We describe how automatic differentiation (AD) can be used within the PETSc framework to compute the required derivatives. We present performance data demonstrating the suitability of AD and PETSc for this problem. We conclude with a synopsis of our results and a description of opportunities for future work. Key words: Compressible Euler, PETSc, Nonlinear PDEs, Automatic Differentiation. 1 Introduction. Parallel implicit solution strategies are important in aerodynamic applications modeled by PDEs with disparate temporal and spatial scales. Within this family of techniques, Newton-Krylov methods have been shown to be wi...
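The Newton-iteration skeleton into which such derivatives plug can be sketched on a tiny system. Here column-by-column finite differences stand in for the ADIFOR-generated derivatives, and a dense direct solve stands in for the Krylov (GMRES) inner solver with Schwarz preconditioning; both substitutions are assumptions of this sketch.

```python
import numpy as np

def fd_jacobian(F, u, eps=1e-7):
    """Finite-difference Jacobian, standing in for AD-computed derivatives."""
    n = u.size
    J = np.empty((n, n))
    Fu = F(u)
    for j in range(n):
        du = np.zeros(n)
        du[j] = eps
        J[:, j] = (F(u + du) - Fu) / eps   # directional difference, column j
    return J

def newton(F, u0, iters=20, tol=1e-10):
    """Plain Newton iteration; a direct linear solve replaces the Krylov
    inner solver of the Newton-Krylov-Schwarz method described above."""
    u = u0.astype(float)
    for _ in range(iters):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break
        u = u - np.linalg.solve(fd_jacobian(F, u), r)
    return u
```

The appeal of AD over the finite differences used here is that the derivative columns come out free of truncation error, which matters once the outer Newton residual drops below the differencing accuracy.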