The Data-Distribution-Independent Approach to Scalable Parallel Libraries
, 1995
A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies
, 1995
Cited by 7 (0 self)
In this paper, we present several new and generalized parallel dense matrix multiplication algorithms of the form C = αAB + βC on two-dimensional process grid topologies. These algorithms can deal with rectangular matrices distributed on rectangular grids. We classify these algorithms coherently into three categories according to the communication primitives used, and thus offer a taxonomy for this family of related algorithms. All these algorithms are represented in the data-distribution-independent approach and thus do not require a specific data distribution for correctness. The algorithmic compatibility condition shown here ensures the correctness of the matrix multiplication. We define and extend the data distribution functions and introduce permutation compatibility and algorithmic compatibility. We also discuss a permutation-compatible data distribution (modified virtual 2D data distribution). We conclude that no single algorithm always achieves the best performance...
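As a rough illustration of the update C = αAB + βC on a rectangular process grid, the sketch below simulates a P × Q grid serially in plain Python. It is not one of the paper's algorithms: it assumes a simple contiguous block distribution and a panel-by-panel (outer-product style) update, and the names `split` and `grid_gemm` are hypothetical.

```python
def split(n, parts):
    """Divide range(n) into `parts` contiguous, nearly equal (start, stop) blocks."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def grid_gemm(alpha, A, B, beta, C, P, Q, kb=1):
    """Return alpha*A*B + beta*C, simulating a P x Q process grid serially.

    Assumed layout (an illustration only): process (p, q) owns the block of C
    with rows rows[p] and columns cols[q]. In each step, a panel of kb columns
    of A and kb rows of B is "broadcast", and every process updates its block.
    """
    m, k, n = len(A), len(B), len(B[0])
    rows, cols = split(m, P), split(n, Q)
    # Scale the existing C once, before accumulating the panel updates.
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    for k0 in range(0, k, kb):
        k1 = min(k0 + kb, k)
        for r0, r1 in rows:          # process row p
            for c0, c1 in cols:      # process column q
                for i in range(r0, r1):
                    for j in range(c0, c1):
                        s = 0
                        for t in range(k0, k1):
                            s += A[i][t] * B[t][j]
                        out[i][j] += alpha * s
    return out


A = [[1, 2, 3], [4, 5, 6]]          # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]     # 3 x 2
C = [[1, 1], [1, 1]]                # 2 x 2
result = grid_gemm(2, A, B, 1, C, P=2, Q=2)
```

The result does not depend on the grid shape or the panel width `kb`, only on which process would own which block. This loosely mirrors the abstract's point that correctness need not be tied to one specific data distribution.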
Communication in GLOBE: An Object-Based Worldwide Operating System
- In Proc. Fifth International Workshop on Object Orientation in Operating Systems
, 1996
Cited by 6 (0 self)
Current paradigms for interprocess communication are not sufficient to describe the exchange of information at an adequate level of abstraction. They are either too low-level, or their implementations cannot meet performance requirements. As an alternative, we propose distributed shared objects as a unifying concept. These objects offer user-defined operations on shared state, but allow for efficient implementations through replication and distribution of state. In contrast to other object-based models, these implementation aspects are completely hidden from applications. 1 Introduction In the 1960s and 1970s, the computing universe was dominated by mainframes and minicomputers that ran batch and timesharing operating systems. Typical examples of these systems were OS/360 and UNIX. These systems were primarily concerned with the efficient and secure sharing of the resources of a single machine among many competing users. In the 1980s, personal computers became popular. These machines h...
Parallel Application Software on High Performance Computers - Parallel Diagonalisation Routines.
, 1996
Cited by 6 (1 self)
In this report we list diagonalisation routines available for parallel computers. The methodology of each routine is outlined together with benchmark results on a typical matrix where available. Storage requirements and advantages and disadvantages of the method are also compared. The vast majority of these routines are available for real dense symmetric matrices only, although there is a known requirement for other data types, such as Hermitian or structured sparse matrices. We will report on new codes as they become available. This report is available from http://www.dl.ac.uk/TCSC/HPCI/ © 1996, Daresbury Laboratory. We do not accept any responsibility for loss or damage arising from the use of information contained in any of our reports or in any communication about our tests or investigations. Contents: 1 Summary; 1.1 Test Results; 1.2 Recommendations...
Distributed Shared Objects as a Communication Paradigm
- In Proc. of the Second Annual ASCI Conference
, 1996
Cited by 6 (0 self)
Current paradigms for interprocess communication are not sufficient to describe the exchange of information at an adequate level of abstraction. They are either too low-level, or their implementations cannot meet performance requirements. As an alternative, we propose distributed shared objects as a unifying concept. These objects offer user-defined operations on shared state, but allow for efficient implementations through replication and distribution of state. In contrast to other object-based models, these implementation aspects are completely hidden from applications. 1 Introduction Communication can be viewed at different levels of abstraction. At a high level, it appears as an exchange of information between processes. These processes are either contained in a single parallel or distributed application, or may otherwise belong to different applications that need to communicate. At a low level, communication appears as the mere transfer of bits from one address space to another...
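The idea of user-defined operations on shared state, with replication hidden from the application, can be sketched in a few lines of Python. This is only an in-process illustration under simplifying assumptions (every write is applied to all replicas, and reads are served from one replica). The class names `ReplicatedState` and `SharedCounter` are hypothetical and do not reflect the authors' actual interfaces.

```python
class ReplicatedState:
    """Hypothetical replication layer: keeps several copies of a small state dict."""

    def __init__(self, initial_state, replica_count):
        self._replicas = [dict(initial_state) for _ in range(replica_count)]

    def read(self, key, site=0):
        # Reads are served from the replica "local" to the calling site.
        return self._replicas[site % len(self._replicas)][key]

    def write(self, key, value):
        # Assumption: every write is propagated to all replicas immediately.
        for replica in self._replicas:
            replica[key] = value


class SharedCounter:
    """User-defined operations on shared state; callers never see the replicas."""

    def __init__(self, replica_count=3):
        self._state = ReplicatedState({"count": 0}, replica_count)

    def increment(self, site=0):
        self._state.write("count", self._state.read("count", site) + 1)

    def value(self, site=0):
        return self._state.read("count", site)


counter = SharedCounter()
counter.increment(site=0)
counter.increment(site=2)
total_seen_at_site_1 = counter.value(site=1)
```

An application uses only `increment` and `value`. How many replicas exist and how updates reach them could change without touching the calling code, which is the separation the abstract describes.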
CRPC Research into Linear Algebra Software for High Performance Computers
, 1994
Cited by 4 (2 self)
In this paper we look at a number of approaches being investigated in the Center for Research on Parallel Computation (CRPC) to develop linear algebra software for high-performance computers. These approaches are exemplified by the LAPACK, templates, and ARPACK projects. LAPACK is a software library for performing dense and banded linear algebra computations, and was designed to run efficiently on high-performance computers. We focus on the design of the distributed memory version of LAPACK, and on an object-oriented interface to LAPACK. The templates project aims at making the task of developing sparse linear algebra software simpler and easier. Reusable software templates are provided that the user can then customize to modify and optimize a particular algorithm, and hence build more complex applications. ARPACK is a software package for solving large-scale eigenvalue problems, and is based on an implicitly restarted variant of the Arnoldi scheme. The paper focuses on issues impact...
Design and Implementation of a Multi-purpose Cluster System Network Interface Unit
, 1999
Cited by 3 (2 self)
Today, the interface between a high-speed network and a high-performance computation node is the least mature hardware technology in scalable general-purpose cluster computing. Currently, the one-interface-fits-all philosophy prevails. This approach performs poorly in some cases because of the complexity of the modern memory hierarchy and the wide range of communication sizes and patterns. Today's message-passing NIUs are also unable to utilize the best data transfer and coordination mechanisms due to poor integration into the computation node's memory hierarchy. These shortcomings unnecessarily constrain the performance of cluster systems. Our thesis is that a cluster system NIU should support multiple communication interfaces layered on a virtual message queue substrate in order to streamline data movement both within each node and between nodes. The NIU should be tightly integrated into the computation node's memory hierarchy via the cache-coherent snoopy system bus so as to gain...
Early Applications in the Message-Passing Interface (MPI)
- The International Journal of Supercomputer Applications
, 1994
Cited by 1 (0 self)
We describe a number of early efforts to make use of the Message Passing Interface (MPI) standard in applications, based on an informal survey conducted in May-June, 1994. Rather than a definitive statement of all MPI development work, this paper addresses initial successes, progress, and impressions that application developers have with MPI, according to the responses received. We summarize the important aspects of each survey response, and draw conclusions about the spread of MPI into applications. An understanding of message passing and access to the MPI standard are prerequisites for appreciating this paper. Some background material is provided to ease this requirement. Skjellum et al., Early MPI... 1 Introduction In this paper, we describe a number of early efforts to make use of the Message Passing Interface (MPI) standard in real applications (Forum 1994a; Forum 1994b). An informal survey of efforts is reported here, together with our commentary. We summarize the respon...
Architecture Independent Parallel Design Tool: A Refinement-Based Methodology for the Design and Development of Parallel Software; A Design Methodology for Data-Parallel Applications
, 1996
Data-parallelism is a relatively well-understood form of parallel computation, yet developing simple applications can involve substantial efforts to express the problem in low-level data-parallel notations. We describe a process of software development for data-parallel applications starting from high-level specifications, generating repeated refinements of designs to match different architectural models and performance constraints, supporting a development activity with cost-benefit analysis. Primary issues are algorithm choice, correctness and efficiency, followed by data decomposition, load balancing and message-passing coordination. Development of a data-parallel multitarget tracking application is used as a case study, showing the progression from high to low-level refinements. We conclude by describing tool support for the process.
Parallel Application Software on High Performance Computers II. Linear Algebra Routines.
, 1996
This is a draft report. An abstract and further information will be provided at a later date. Apologies for the long list of references. © 1996, Daresbury Laboratory. We do not accept any responsibility for loss or damage arising from the use of information contained in any of our reports or in any communication about our tests or investigations. Chapter 1 LINEAR ALGEBRA 1.1 Linear Algebra and Mathematical Libraries R.J. Allan Many scientific and engineering applications perform numerically intensive calculations relying heavily on vector and/or matrix operations. As a result the Basic Linear Algebra Subprograms library (BLAS), the Engineering Scientific Subroutine Library (ESSL) and the FORTRAN-77 equivalent were benchmarked on a single node on the IBM SP2 as well as on the Cray T3D and Intel iPSC/860. One of the fundamental requirements in realising the potential of past and present generations of high performance computers is the availability of efficient implementations of...

