Newton-Krylov-Schwarz: An Implicit Solver for CFD
Cited by 22 (9 self)
Newton-Krylov methods and Krylov-Schwarz (domain decomposition) methods have begun to become established in computational fluid dynamics (CFD) over the past decade. The former employ a Krylov method inside of Newton's method in a Jacobian-free manner, through directional differencing. The latter employ an overlapping Schwarz domain decomposition to derive a preconditioner for the Krylov accelerator that relies primarily on local information, for data-parallel concurrency. They may be composed as Newton-Krylov-Schwarz (NKS) methods, which seem particularly well suited for solving nonlinear elliptic systems in high-latency, distributed-memory environments. We give a brief description of this family of algorithms, with an emphasis on domain decomposition iterative aspects. We then describe numerical simulations with Newton-Krylov-Schwarz methods on aerodynamics applications emphasizing comparisons with a standard defect-correction approach, subdomain preconditioner consistency, subdomain preconditioner quality, and the effect of a coarse grid.
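As an illustration of the "Jacobian-free ... directional differencing" idea named in the abstract above, the sketch below approximates a Jacobian-vector product J(u)v by the forward difference (F(u + eps*v) - F(u)) / eps, which is the only access to the Jacobian a Krylov method needs. This is a minimal sketch and not the paper's implementation: the function name `jacobian_vector_product`, the fixed step `eps`, and the 2x2 test system are all assumptions made here for illustration.

```python
def jacobian_vector_product(residual, u, v, eps=1e-7):
    """Approximate J(u) @ v with a forward directional difference.

    Only residual evaluations are used; the Jacobian is never formed.
    The fixed step eps is an assumption; practical codes scale it
    with the norms of u and v.
    """
    base = residual(u)
    shifted = residual([ui + eps * vi for ui, vi in zip(u, v)])
    return [(s - b) / eps for s, b in zip(shifted, base)]


def residual(u):
    # Hypothetical 2x2 nonlinear system, chosen only for illustration.
    x, y = u
    return [x * x + y - 3.0, x + y * y - 5.0]


# The exact Jacobian at (1, 2) is [[2, 1], [1, 4]], so J @ [1, 0] = [2, 1].
jv = jacobian_vector_product(residual, [1.0, 2.0], [1.0, 0.0])
```

A Krylov solver such as GMRES would call a routine like this once per iteration in place of an explicit matrix-vector product.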
D-Stampede: Distributed Programming System for Ubiquitous Computing
- In Proceedings of the 22nd International Conference on Distributed Computing Systems(ICDCS
, 2002
Cited by 22 (7 self)
We focus on an important problem in the space of ubiquitous computing, namely, programming support for the distributed heterogeneous computing elements that make up this environment. We address the interactive, dynamic, and stream-oriented nature of this application class and develop appropriate computational abstractions in the D-Stampede distributed programming system. The key features of D-Stampede include indexing data streams temporally, correlating different data streams temporally, performing automatic distributed garbage collection of unnecessary stream data, supporting high performance by exploiting hardware parallelism where available, supporting platform and language heterogeneity, and dealing with application-level dynamism. We discuss the features of D-Stampede, the programming ease it affords, and its performance.
Modeling the Communication Performance of the IBM SP2
- In Proc. of 10th IEEE Int. Parallel Processing Symp
, 1996
Cited by 21 (3 self)
The objective of this paper is to develop models that characterize the communication performance of a message-passing multicomputer by taking the IBM SP2 as a case study. The paper evaluates and models the three aspects of the communication performance: scheduling overhead, message-passing time, and synchronization overhead. Performance models are developed for the basic communication patterns, enabling the estimation of the communication times of a message-passing application. Such estimates facilitate activities such as application tuning, selection of the best available implementation technique, and performance comparisons among different multicomputers. 1. Introduction. A distributed-memory multicomputer consists of multiple processor nodes interconnected by a message-passing network. Each processor node is an autonomous computer consisting of a central processing unit (CPU), memory, communication adapter, and (for at least some nodes) mass storage and I/O devices. Figure 1 shows a...
Data movement and control substrate for parallel scientific computing
- of Lecture Notes in Computer Science
, 1997
Cited by 21 (10 self)
In this paper, we describe the design and implementation of a data movement and control substrate (DMCS) for network-based, homogeneous communication within a single multiprocessor. DMCS is an implementation of an API for communication and computation that has been proposed by the PORTS consortium. One of the goals of this consortium is to define an API that can support heterogeneous computing without undue performance penalties for homogeneous computing. Preliminary results in our implementation suggest that this is quite feasible. The DMCS implementation seeks to minimize the assumptions made about the homogeneous nature of its target architecture. Finally, we present some extensions to the API for PORTS that will improve the performance of sparse, adaptive, and irregular types of numeric computations.
Cloud Computing and Grid Computing 360-Degree Compared
Cited by 21 (3 self)
Cloud Computing has become another buzzword after Web 2.0. However, there are dozens of different definitions for Cloud Computing and there seems to be no consensus on what a Cloud is. On the other hand, Cloud Computing is not a completely new concept; it has intricate connection to the relatively new but thirteen-year established Grid Computing paradigm, and other relevant technologies such as utility computing, cluster computing, and distributed systems in general. This paper strives to compare and contrast Cloud Computing with Grid Computing from various angles and give insights into the essential characteristics of both.
Self Adapting Software for Numerical Linear Algebra and LAPACK for Clusters
- Parallel Computing
, 2003
Cited by 20 (15 self)
This article describes the context, design, and recent development of the LAPACK for Clusters (LFC) project. It has been developed in the framework of Self-Adapting Numerical Software (SANS) since we believe such an approach can deliver the convenience and ease of use of existing sequential environments bundled with the power and versatility of highly-tuned parallel codes that execute on clusters. Accomplishing this task is far from trivial, as we argue in the paper by presenting pertinent case studies and possible usage scenarios.
A Simple MPI Process Swapping Architecture for Iterative Applications
- The International Journal of High Performance Computing Applications
, 2004
Cited by 20 (5 self)
Parallel computing is now popular and mainstream, but performance and ease-of-use remain elusive to many end users. There exists a need for performance improvements that can be easily retrofitted to existing parallel applications. In this paper we present MPI process swapping, a simple performance-enhancing add-on to the MPI programming paradigm. MPI process swapping improves performance by dynamically choosing the best available resources throughout application execution, using MPI process over-allocation and real-time performance measurement. Swapping provides fully automated performance monitoring and process management, and a rich set of primitives to control execution behavior manually or through an external tool. Swapping, as defined in this implementation, can be added to iterative MPI applications and requires as few as three lines of source code change. We verify our design for a particle dynamics application on desktop resources within a production commercial environment.
Titanium performance and potential: an NPB experimental study
- In proceedings of the 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC
, 2005
Cited by 20 (11 self)
Titanium is an explicitly parallel dialect of Java™ designed for high-performance scientific programming. It offers object orientation, strong typing, and safe memory management in the context of a language that supports high performance and scalable parallelism. We present an overview of the language features and demonstrate their use in the context of the NAS Parallel Benchmarks, a standard benchmark suite of kernels that are common across many scientific applications. We argue that parallel languages like Titanium provide greater expressive power than conventional approaches, enabling much more concise and expressive code and minimizing time to solution without sacrificing parallel performance. Empirical results demonstrate our Titanium implementations of three of the NAS Parallel Benchmarks can match or even exceed the performance of the standard MPI/Fortran implementations at realistic problem sizes and processor scales, while still using far cleaner, shorter and more maintainable code.
Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations
- in 2nd Workshop on Hardware/Software Support for High Performance Scientific and Engineering Computing (SHPSEC-03
, 2003
Cited by 19 (3 self)
MPI support is nearly ubiquitous on high performance systems today, and is generally highly tuned for performance. It would thus seem to offer a convenient "portable network assembly language" to developers of parallel programming languages who wish to target different network architectures. Unfortunately, neither the traditional MPI 1.1 API nor the newer MPI 2.0 extensions for one-sided communication provide an adequate compilation target for global address space languages, and this is likely to be the case for many other parallel languages as well. Simulating one-sided communication under the MPI 1.1 API is too expensive, while the MPI 2.0 one-sided API imposes a number of restrictions that would need to be incorporated at the language level, as it is unlikely that a compiler could effectively hide them.
Hash-Storage Techniques for Adaptive Multilevel Solvers and Their Domain Decomposition Parallelization
- Domain decomposition methods 10. The 10th int. conf., Boulder, volume 218 of Contemp. Math
, 1998
Cited by 18 (6 self)
this article remain attractive even for such a code.

