Results 1–6 of 6
Efficient, Flexible, and Typed Group Communications in Java
, 2002
Abstract

Cited by 15 (5 self)
Group communication is a crucial feature for high-performance and Grid computing. While previous works and libraries provided such a feature (e.g. MPI, or object-oriented frameworks), the use of groups imposed specific constraints on programmers, for instance the use of dedicated interfaces to trigger group communications. We aim at a more flexible mechanism...
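As a rough sketch of the idea only (the class names below are assumptions for illustration, not the paper's actual API), a group proxy can fan a single method call out to every member, so neither caller nor callee needs a dedicated group-communication interface:

```python
# Illustrative sketch of transparent group communication: any method call
# on the Group proxy is broadcast to all members and the results collected.
# Group and Worker are hypothetical names, not taken from the paper.

class Group:
    def __init__(self, members):
        self._members = list(members)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so ordinary
        # attributes like _members are unaffected. Any other attribute
        # access becomes a broadcast call over the members.
        def broadcast(*args, **kwargs):
            return [getattr(m, name)(*args, **kwargs) for m in self._members]
        return broadcast

class Worker:
    def __init__(self, wid):
        self.wid = wid

    def compute(self, x):
        return self.wid * x

group = Group(Worker(i) for i in range(4))
print(group.compute(10))   # one call, four invocations → [0, 10, 20, 30]
```

A real implementation would dispatch the member calls asynchronously and in parallel; the sequential loop here only shows the calling convention.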
The Optimal Effectiveness Metric for Parallel Application Analysis
 In Special Issue on Parallel Models
, 1998
Abstract

Cited by 2 (1 self)
This paper discusses a scalability metric based on the cost effectiveness of parallel algorithms. Unlike other scalability measures, this metric can be used to compare different parallel algorithms and identify specific conditions of problem size and processor allocation that characterize "crossover" points and intervals where one algorithm becomes more cost effective than another. Finally, this paper presents a series of examples to illustrate the measurement methodology in practice.

1 Introduction

The measurement of parallel applications is of significant interest to the evaluation and categorization of various parallel algorithms. This paper argues that a useful metric for parallel algorithm analysis should be consistent, quantitative, predictive, and relevant. A metric is consistent if independent researchers analyzing the same algorithm on the same architecture will arrive at similar conclusions. A metric is quantitative if it can be used to quantify the benefit of disparate algo...
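The crossover idea can be illustrated with a toy model. The timing models and the work-per-cost definition below are assumptions for illustration only, not the paper's metric:

```python
# Two hypothetical parallel algorithms for the same problem of size n on
# p processors. Cost effectiveness is taken here as useful work (n) per
# unit of processor-time cost (p * T); the paper's metric may differ.

def time_a(n, p):
    # Model A: cheap per element, but a 2% serial fraction limits scaling.
    return n * (0.02 + 0.98 / p)

def time_b(n, p):
    # Model B: 1.5x the per-element work, but only a 0.1% serial fraction.
    return 1.5 * n * (0.001 + 0.999 / p)

def cost_effectiveness(time_fn, n, p):
    return n / (p * time_fn(n, p))

def crossover(n, max_p=1024):
    # Smallest processor count at which B becomes more cost effective
    # than A; this is the "crossover point" the abstract refers to.
    for p in range(1, max_p + 1):
        if cost_effectiveness(time_b, n, p) > cost_effectiveness(time_a, n, p):
            return p
    return None

print(crossover(1000))  # → 29: A wins on few processors, B beyond that
```

On one processor, A's lower per-element cost dominates; as p grows, A's serial fraction caps its cost effectiveness and B overtakes it, exactly the kind of interval boundary such a metric is meant to expose.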
A Metric for Parallel PolyAlgorithm Design
, 1997
Abstract

Cited by 1 (0 self)
This paper discusses a scalability metric based on the cost effectiveness of parallel algorithms. Unlike other scalability measures, this metric can be used to compare different parallel algorithms and identify specific conditions of problem size and processor allocation that characterize "crossover" points and intervals where one algorithm becomes more cost effective than another. Finally, this paper presents a series of examples to illustrate the measurement methodology in practice.

1 Introduction

Consider the development of an algorithm that multiplies matrices. Of the many algorithms that might be employed, two of the most popular methods are the naive algorithm and the Strassen algorithm. Asymptotically, the naive algorithm is O(n^3) while the Strassen algorithm is O(n^2.81). Although the Strassen algorithm is asymptotically better than the naive algorithm, the setup cost of the Strassen algorithm makes it inefficient for small matrices. An optimal algorithm might employ bo...
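A minimal sketch of such a polyalgorithm for square, power-of-two-sized matrices: Strassen recursion above a crossover size, the naive O(n^3) method below it. The threshold value 64 is an assumption for illustration, not a figure from the paper:

```python
import random

THRESHOLD = 64  # hypothetical crossover size; in practice it is measured

def naive_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M):
    # Quadrants of a square matrix with even dimension.
    n = len(M) // 2
    return ([r[:n] for r in M[:n]], [r[n:] for r in M[:n]],
            [r[:n] for r in M[n:]], [r[n:] for r in M[n:]])

def strassen(A, B):
    n = len(A)
    if n <= THRESHOLD:          # below the crossover, naive wins
        return naive_mul(A, B)
    a, b, c, d = split(A)
    e, f, g, h = split(B)
    # Strassen's seven half-size products.
    p1 = strassen(a, sub(f, h))
    p2 = strassen(add(a, b), h)
    p3 = strassen(add(c, d), e)
    p4 = strassen(d, sub(g, e))
    p5 = strassen(add(a, d), add(e, h))
    p6 = strassen(sub(b, d), add(g, h))
    p7 = strassen(sub(a, c), add(e, f))
    # Reassemble C11 | C12 (top) and C21 | C22 (bottom).
    top = [r1 + r2 for r1, r2 in zip(add(sub(add(p5, p4), p2), p6),
                                     add(p1, p2))]
    bot = [r1 + r2 for r1, r2 in zip(add(p3, p4),
                                     sub(sub(add(p1, p5), p3), p7))]
    return top + bot

n = 128
A = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
B = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
assert strassen(A, B) == naive_mul(A, B)
```

The recursion pays Strassen's extra additions only while the subproblems are large enough to amortize them, then switches to the naive kernel, which is the polyalgorithm pattern the paper's metric is meant to guide.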
The Parallel Mathematical Libraries Project (PMLP): Overview, Design Innovations, and Preliminary Results
 In Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing
, 1999
Abstract
In this paper, we present a new parallel mathematical library suite for sparse matrices. The Parallel Mathematical Libraries Project (PMLP), a joint effort of Intel, Lawrence Livermore National Laboratory, the Russian Federal Nuclear Laboratory (VNIIEF), and Mississippi State University (MSU), constitutes a concerted effort to create a supportable, comprehensive "Sparse Object-oriented Mathematical Library Suite." With overall design and software validation work at MSU, most software development and testing at VNIIEF, and logistics and other miscellaneous support provided by LLNL and Intel, this international collaboration brings object-oriented programming techniques and C++ to the task of providing linear and nonlinear algebraic-oriented algorithms for scientists and engineers. Language bindings for C, Fortran 77, and C++ are provided.
Component-Based Derivation of a Parallel Stiff ODE Solver Implemented in a Cluster of Computers
, 2000
Abstract
A component-based methodological approach to derive distributed implementations of parallel ODE solvers is proposed. The proposal is based on the incorporation of explicit constructs for performance polymorphism into a methodology to derive group parallel programs of numerical methods from SPMD modules. These constructs enable the structuring of the derivation process into clearly defined steps, each one associated with a different type of optimization. The approach makes it possible to obtain a flexible tuning of a parallel ODE solver for several execution contexts and applications. Following this methodological approach, a relevant parallel numerical scheme for solving stiff ODEs has been optimized and implemented on a PC cluster. This numerical scheme is obtained from a Radau IIA Implicit Runge–Kutta method and exhibits a high degree of potential parallelism. Several numerical experiments have been performed using several test problems with different structural characteristics. These experiments show satisfactory speedup results.

KEY WORDS: Component-based software development; numerical algorithms with multilevel parallelism; parallel linear algebra libraries; stiff ordinary differential equations; distributed memory machines.
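One way to picture performance polymorphism, as a hypothetical sketch only (the class and function names below are not the paper's constructs): a single solver interface backed by several implementation variants, with the variant bound to the execution context at a derivation step rather than fixed in the numerical method:

```python
# Hypothetical illustration of performance polymorphism: one interface,
# multiple variants, selection driven by the execution context.

class LinearSolver:
    def solve(self, system):
        raise NotImplementedError

class SequentialSolver(LinearSolver):
    def solve(self, system):
        return f"sequential solve of {system}"

class DistributedSolver(LinearSolver):
    def solve(self, system):
        return f"distributed solve of {system}"

def select_solver(num_nodes):
    # The derivation step binds the variant suited to the target context;
    # the numerical scheme itself only depends on the LinearSolver interface.
    return DistributedSolver() if num_nodes > 1 else SequentialSolver()

print(select_solver(1).solve("J*x = b"))   # sequential variant chosen
print(select_solver(8).solve("J*x = b"))   # distributed variant chosen
```

The point is that retuning the ODE solver for a new cluster means revisiting only the binding step, not the derivation of the method from its SPMD modules.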
On the Efficiency of Register File versus Broadcast Interconnect for Collective Communications in Data-Parallel Hardware Accelerators
Abstract
Reducing power consumption and increasing efficiency is a key concern for many applications. How to design highly efficient computing elements while maintaining enough flexibility within a domain of applications is a fundamental question. In this paper, we present how broadcast buses can eliminate the use of power-hungry multiported register files in the context of data-parallel hardware accelerators for linear algebra operations. We demonstrate an algorithm/architecture co-design for the mapping of different collective communication operations, which are crucial for achieving performance and efficiency in most linear algebra routines, such as GEMM, SYRK and matrix transposition. We compare a broadcast-bus-based architecture with conventional SIMD, 2D-SIMD and flat register file architectures for these operations in terms of area and energy efficiency. Results show that fast broadcast data movement abilities in a prototypical linear algebra core can achieve up to 75x better power efficiency and up to 10x better area efficiency compared to traditional SIMD architectures.