N-MAP: A Virtual Processor Discrete Event Simulation Tool for Performance Prediction in CAPSE
- In 28th Annual Hawaii International Conference on Systems Sciences
, 1995
Cited by 17 (6 self)
The CAPSE (Computer Aided Parallel Software Engineering) environment aims to support a performance-oriented approach to parallel program development by integrating tools for performance prediction in the design phase, analytical or simulation-based performance analysis in the detailed specification and coding phase, and finally monitoring in the testing and correction phase. In this work, the N-MAP tool, part of the CAPSE environment, is presented. N-MAP covers the crucial aspect of performance prediction, supporting a performance-oriented, incremental development process for parallel applications so that implementation design choices can be investigated well before the application is fully coded. Methodologically, N-MAP uses an automatic parse-and-translate step to generate a simulation program from a skeletal SPMD program, in which the programmer expresses only the constituent and performance-critical program parts, subject to incremental refinement. The simulated execution of the SPMD skeleton supports a variety of performance studies. We demonstrate the use and performance of the N-MAP tool by developing a linear system solver for the CM-5.
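To make the skeleton-simulation idea concrete, here is a minimal Python sketch of virtual processors executing an SPMD skeleton. It is not N-MAP's input language or simulator: the operation tuples ("compute", "send", "recv"), the function name `simulate`, and the linear message cost `alpha + beta * nbytes` are hypothetical assumptions for illustration. Each virtual processor keeps a local clock; a send stamps its message with an arrival time, and a receive advances the receiver's clock to that arrival time if it is later. Processors are advanced in sweeps rather than from a global event queue, which gives the same answer here because a processor's clock at each send depends only on its own earlier operations.

```python
from collections import deque


def simulate(skeleton, alpha=1.0, beta=0.01):
    """Predict the runtime of an SPMD skeleton on virtual processors (sketch).

    skeleton[p] is the operation list of virtual processor p; each op is
    ("compute", seconds), ("send", dest, nbytes) or ("recv", src).
    Message cost model (assumption): alpha + beta * nbytes.
    Returns (makespan, per-processor finish times).
    """
    n = len(skeleton)
    clock = [0.0] * n   # local virtual time of each processor
    pc = [0] * n        # index of the next operation of each processor
    channels = {}       # (src, dst) -> FIFO of message arrival times

    while True:
        progressed = False
        for p in range(n):
            while pc[p] < len(skeleton[p]):
                op = skeleton[p][pc[p]]
                if op[0] == "compute":
                    clock[p] += op[1]
                elif op[0] == "send":
                    _, dest, nbytes = op
                    # Non-blocking send: record when the message will arrive.
                    arrival = clock[p] + alpha + beta * nbytes
                    channels.setdefault((p, dest), deque()).append(arrival)
                elif op[0] == "recv":
                    queue = channels.get((op[1], p))
                    if not queue:
                        break  # blocked until the matching send is simulated
                    clock[p] = max(clock[p], queue.popleft())
                else:
                    raise ValueError(f"unknown operation {op!r}")
                pc[p] += 1
                progressed = True
        if all(pc[p] == len(skeleton[p]) for p in range(n)):
            return max(clock, default=0.0), clock
        if not progressed:
            raise RuntimeError("deadlock: every unfinished processor is blocked in recv")


if __name__ == "__main__":
    skeleton = [
        [("compute", 5.0), ("send", 1, 100)],
        [("compute", 2.0), ("recv", 0), ("compute", 1.0)],
    ]
    makespan, finish = simulate(skeleton)
    print(makespan)  # 8.0: the message arrives at 5 + 1 + 0.01 * 100 = 7
```

Refining the skeleton, for example by replacing one coarse "compute" with a finer sequence of operations, changes only the input list, which mirrors the incremental refinement the abstract describes.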
Performance Instrumentation Techniques for Parallel Systems
- Springer-Verlag Lecture Notes in Computer Science
, 1993
Cited by 17 (8 self)
Although the nascent state of parallel systems makes empirical performance measurement, analysis and tuning critical, rapid technological evolution, coupled with short product life cycles, has often made it difficult to isolate fundamental experimental principles from implementation artifacts. By definition, the apparatus for experimental performance analysis (i.e., instrumentation specification, data buffering, timestamp generation, and data extraction) is shaped by the intended experiment and the object of study. In some environments, certain experiments are not feasible. Balancing the volume of captured performance data against its accuracy and timeliness requires both appropriate tools and an understanding of instrumentation costs, implementation alternatives, and support infrastructure.
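The apparatus the abstract lists (data buffering, timestamp generation, data extraction) can be illustrated with a small Python sketch of a buffered event tracer. The class `TraceBuffer`, its capacity-triggered flush policy, and the use of `time.perf_counter_ns` for timestamps are assumptions for illustration, not techniques taken from the paper. The helper `per_event_cost_ns` estimates one instrumentation cost the abstract mentions: the average time spent per recorded event.

```python
import time


class TraceBuffer:
    """Buffered event tracer (illustrative sketch, not the paper's design)."""

    def __init__(self, capacity, sink):
        self.capacity = capacity  # records held before a flush
        self.sink = sink          # callable that extracts a batch of records
        self.records = []
        self.flushes = 0

    def record(self, event_id):
        # Timestamp generation: a monotonic nanosecond counter.
        self.records.append((time.perf_counter_ns(), event_id))
        if len(self.records) >= self.capacity:
            self.flush()

    def flush(self):
        # Data extraction: hand the buffered batch to the sink, then reset.
        if self.records:
            self.sink(self.records)
            self.records = []
            self.flushes += 1


def per_event_cost_ns(n=100_000, capacity=1024):
    """Average wall-clock cost of recording one event, including flushes."""
    buf = TraceBuffer(capacity, lambda batch: None)  # discard extracted data
    start = time.perf_counter_ns()
    for i in range(n):
        buf.record(i)
    buf.flush()
    return (time.perf_counter_ns() - start) / n
```

A larger capacity amortizes extraction over more events but holds data longer before it becomes available, a small instance of the volume-versus-timeliness balance described above.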
Algorithmic bombardment for the iterative solution of linear systems: a poly-iterative approach
- Journal of Computational and Applied Mathematics
, 1996
Cited by 16 (8 self)
Many algorithms employing short recurrences have been developed for iteratively solving linear systems. Yet when the matrix is nonsymmetric, indefinite, or both, it is difficult to predict which method will perform best, or indeed converge at all. Attempts have been made to classify the matrix properties for which a particular method will yield a satisfactory solution, but "luck" still plays a large role. This report describes the implementation of a poly-iterative solver. Here we apply three algorithms simultaneously to the system, in the hope that at least one will converge to the solution. While this approach has merit in a sequential computing environment, it is even more valuable in a parallel environment: by combining the global communications of the three methods, their global communication cost can be reduced to that of a single method.
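The communication saving comes from batching: each of these iterative methods needs global reductions such as inner products, and the reductions of several methods running side by side can be packed into one message. The Python sketch below simulates this with a hypothetical `SimulatedComm` class that stands in for a real communicator's all-reduce and simply counts how many global reductions are issued; the data layout and function names are illustrative assumptions, not the report's implementation.

```python
class SimulatedComm:
    """Hypothetical stand-in for a communicator; counts global reductions."""

    def __init__(self):
        self.reductions = 0

    def allreduce_sum(self, per_proc_values):
        # per_proc_values[p] is processor p's list of partial sums; one call
        # models one global reduction of a (possibly multi-element) message.
        self.reductions += 1
        return [sum(col) for col in zip(*per_proc_values)]


def local_dot(x, y):
    """Inner product of the locally owned pieces of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def dots_separately(comm, pairs_per_proc):
    """One global reduction per method (three methods -> three reductions)."""
    n_methods = len(pairs_per_proc[0])
    results = []
    for m in range(n_methods):
        partials = [[local_dot(*pairs[m])] for pairs in pairs_per_proc]
        results.append(comm.allreduce_sum(partials)[0])
    return results


def dots_fused(comm, pairs_per_proc):
    """All methods' partial inner products packed into a single reduction."""
    partials = [[local_dot(x, y) for x, y in pairs] for pairs in pairs_per_proc]
    return comm.allreduce_sum(partials)


if __name__ == "__main__":
    # pairs_per_proc[p][m] = (x_local, y_local) for method m on processor p
    procs = [
        [([1, 2], [3, 4]), ([1, 1], [1, 1]), ([0, 1], [2, 2])],
        [([5], [6]), ([2], [2]), ([3], [1])],
    ]
    c1, c2 = SimulatedComm(), SimulatedComm()
    print(dots_separately(c1, procs), c1.reductions)  # [41, 6, 5] 3
    print(dots_fused(c2, procs), c2.reductions)       # [41, 6, 5] 1
```

Both versions compute the same values; only the number of global synchronizations differs, which is what matters on machines where reduction latency dominates.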
Problem Solving Environments For Partial Differential Equation Based Applications
, 1994
Cited by 14 (8 self)
1. INTRODUCTION
1.1 Modeling with Partial Differential Equations
1.2 Evolution of PDE Solving Software
1.3 Problem Solving Environments
1.3.1 Properties of PSEs
1.3.2 PSEs vs. PSE Frameworks
1.4 PDE Based Applications and Application PSEs
1.4.1 The PDELab Prototype
1.5 Overview of the Thesis
2. THE ARCHITECTURE OF A SOFTWARE FRAMEWORK FOR BUILDING PROBLEM SOLVING ENVIRONMENTS FOR PDE BASED APPLICATIONS
2.1 Introduction ...
Computing Global Combine Operations in the Multi-Port Postal Model
, 1996
Cited by 13 (0 self)
Consider a message-passing system of n processors, in which each processor initially holds one piece of data. The goal is to compute an associative and commutative reduction function on the n distributed pieces of data and to make the result known to all n processors. This operation is frequently used in many message-passing systems and is typically referred to as global combine, census computation, or gossiping. This paper explores the problem of global combine in the multi-port postal model for message-passing systems. This model is characterized by three parameters: n, the number of processors; k, the number of ports per processor; and λ, the communication latency. In this model, in every round r, each processor can send k distinct messages to k other processors, and it can receive k messages that were sent out from k other processors λ − 1 rounds earlier. This paper provides an optimal algorithm for the global combine problem that requires the least number of comm...
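For intuition about how k ports shorten a global combine, here is a minimal Python sketch of a simple dissemination-style algorithm that uses all k ports in every step. It is not the paper's optimal algorithm: it ignores latency pipelining (each step simply waits for its messages), and it assumes n is a power of k + 1 so that every value is combined exactly once, which matters for non-idempotent operations such as sum. The function name and the power-of-(k + 1) restriction are assumptions of this sketch. After step r (counting from 0), processor i holds the combination of the values of processors i, i − 1, ..., i − (k + 1)^(r+1) + 1 (mod n), so log_{k+1} n steps suffice.

```python
def dissemination_combine(values, k, op=lambda a, b: a + b):
    """Global combine on n = (k+1)**d simulated processors (sketch).

    values[i] is processor i's initial datum; op must be associative and
    commutative. Returns (final value held by every processor, steps).
    """
    n = len(values)
    steps, size = 0, 1
    while size < n:
        size *= k + 1
        steps += 1
    if size != n:
        raise ValueError("this sketch requires n to be a power of k + 1")

    partial = list(values)
    stride = 1
    for _ in range(steps):
        new = list(partial)
        for i in range(n):
            # Processor i receives on its k ports from i - j*stride, j = 1..k
            # (equivalently, each processor sends k messages per step).
            for j in range(1, k + 1):
                new[i] = op(new[i], partial[(i - j * stride) % n])
        partial = new
        stride *= k + 1
    return partial, steps


if __name__ == "__main__":
    print(dissemination_combine(list(range(1, 10)), k=2))  # ([45]*9, 2)
```

With k = 2 ports, nine processors finish in two steps instead of the four that a single-port (k = 1) version would need for a comparable processor count.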
Early Experiences And Performance Of The Intel Paragon
, 1994
Cited by 12 (4 self)
Experiences and performance figures are reported from early tests of the 512-node Intel Paragon XPS35 at Oak Ridge National Laboratory. Computation performance of the 50 MHz i860XP processor, as well as communication performance of the 200 megabyte/second mesh, is reported and compared with other multiprocessors. Single- and multiple-hop communication bandwidths and latencies are measured. Concurrent communication speeds and speed under network load are also measured. File I/O performance of the mesh-attached Parallel File System is measured. Early experiences with the OSF/Mach and SUNMOS operating systems are reported, as well as results from porting various distributed-memory applications. This report also summarizes the second phase of a Cooperative Research and Development Agreement between Oak Ridge National Laboratory and Intel in evaluating a 66-node Intel Paragon XPS5.
1. Introduction
The Department of Energy selected Oak Ridge National Laboratory (ORNL) as one of its high perf...
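Latency and bandwidth figures like these are often summarized by fitting a linear cost model t(m) = α + m/β to ping-pong timings, where α is the latency and β the asymptotic bandwidth. The Python sketch below does an ordinary least-squares fit; the model and the function name are assumptions for illustration, not the report's method, and the timings in the usage line are synthetic, not Paragon measurements.

```python
def fit_latency_bandwidth(sizes, times):
    """Least-squares fit of t = latency + size / bandwidth (sketch).

    sizes: message sizes in bytes; times: one-way times in seconds.
    Returns (latency_seconds, bandwidth_bytes_per_second).
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("need at least two distinct message sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx                 # seconds per byte = 1 / bandwidth
    latency = mean_y - slope * mean_x  # intercept at zero-length message
    return latency, 1.0 / slope


if __name__ == "__main__":
    sizes = [0, 1024, 65536, 1048576]
    times = [50e-6 + s / 200e6 for s in sizes]  # synthetic timings
    print(fit_latency_bandwidth(sizes, times))  # approximately (5e-05, 2e8)
```

Separate fits for single-hop and multiple-hop paths, or for idle and loaded networks, make it easy to compare how each condition shifts latency versus bandwidth.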
A User's Guide to the BLACS v1.1
, 1997
Cited by 12 (5 self)
The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear-algebra-oriented message-passing interface that is implemented efficiently and uniformly across a large range of distributed-memory platforms. Because implementing efficient distributed-memory algorithms takes so long, it is impractical to rewrite programs for every new parallel machine. The BLACS exist to make linear algebra applications both easier to program and more portable. For this reason, the BLACS are used as the communication layer of the ScaLAPACK project, which involves implementing the LAPACK library on distributed-memory MIMD machines. This report describes the library that has arisen from this project. This work was supported in part by DARPA and ARO under contract number DAAL03-91-C-0047, and in part by the National Science Foundation Science and Technology Center Cooperative Agreement No. CCR-8809615...
Hypercube Clock Synchronization
, 1991
Cited by 11 (3 self)
Algorithms for synchronizing the times and frequencies of the clocks of Intel and Ncube hypercube multiprocessors are presented. Bounds for the error in estimating clock offsets and frequencies are formulated in terms of the clock read error and message transmission time. The clock and communication performance of the Ncube and Intel hypercubes is analyzed, and the performance of the synchronization algorithms is presented. Keywords: clock synchronization, hypercube communication.
1. Introduction
In distributed computing, there is a need for accurate clocks that give the same time on every computing element. A distributed computation may be implemented over a local area network of computing elements or on a parallel processor. Distributed computing systems that depend on such a common time include transaction-processing systems, real-time systems, and simulation systems. A common time is also needed for event traces and for synchronization such as timeouts or checkpoints. This report add...
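How an offset bound can depend on message transmission time is illustrated by the classic round-trip technique: processor A reads its clock (t0), asks processor B for B's clock reading (T), and reads its own clock again when the reply arrives (t1). If B's reading happened somewhere between t0 and t1 and the clocks do not drift noticeably during the round trip, the estimate T − (t0 + t1)/2 is off by at most (t1 − t0)/2 plus the clock read error; two offset estimates taken far apart then give a frequency (drift) estimate. This Python sketch is a generic round-trip scheme under those assumptions, not necessarily the algorithm in the report, and the function names are hypothetical.

```python
def estimate_offset(t0, remote_time, t1, read_error=0.0):
    """Remote-minus-local clock offset from one round trip (sketch).

    t0, t1: local clock readings when the request is sent and the reply
    arrives; remote_time: the remote clock reading carried by the reply.
    Returns (offset_estimate, error_bound).
    """
    if t1 < t0:
        raise ValueError("reply cannot precede request")
    midpoint = (t0 + t1) / 2
    offset = remote_time - midpoint
    # The remote reading was taken somewhere in [t0, t1], so the midpoint
    # is at most half the round-trip time away from the true instant.
    bound = (t1 - t0) / 2 + read_error
    return offset, bound


def estimate_drift(local_a, offset_a, local_b, offset_b):
    """Relative frequency error of the remote clock from two offset samples."""
    return (offset_b - offset_a) / (local_b - local_a)


if __name__ == "__main__":
    remote = lambda t: 1.0001 * t + 5.0  # synthetic remote clock
    oa, ba = estimate_offset(100.0, remote(100.5), 101.0)
    ob, bb = estimate_offset(1100.0, remote(1100.5), 1101.0)
    print(oa, ba)                                   # about 5.01005, 0.5
    print(estimate_drift(100.5, oa, 1100.5, ob))    # about 1e-4
```

The bound shrinks with faster messages, which is why communication latency figures directly into the achievable synchronization accuracy.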
The IBM External User Interface for Scalable Parallel Systems
- Parallel Computing
, 1994
Cited by 10 (4 self)
The IBM External User Interface (EUI) for scalable parallel systems is a parallel programming library designed for the IBM line of scalable parallel computers. The first computer in this line, the IBM 9076 SP1, was announced in February 1993. This paper examines several aspects of the design and development of the EUI.
1 Introduction
The IBM External User Interface (EUI) for scalable parallel systems is an application programming interface that was designed for the IBM line of scalable parallel computers. The first computer in this line, the IBM Scalable POWERparallel System 9076 SP1, was announced in February 1993. The design of the EUI is aimed at providing a scalable and efficient parallel programming environment over a wide range of parallel products from IBM. The EUI is a library of coordination and communication routines that can be invoked from within FORTRAN or C application programs. Over the past several years, a large number of programming environments and communication l...
Maritxu: Generic Visualisation of Highly Parallel Processing
Cited by 9 (2 self)
This paper presents Maritxu, a visualisation system developed to aid in the understanding and optimisation of highly parallel computer systems. Maritxu implements a new visualisation paradigm, combining the use of colour, algorithm animation and visual overload. Maritxu adopts an integral approach to parallel computing by emphasising the role of the processor (rather than the process) in performance optimisation. The visualisation is unconstrained: network size, topology, subnetworks, icon shapes, data mapping, statistics, etc. can all be defined interactively by the user. An example of performance enhancement in a practical transputer-based EMC application is included.
1 Introduction
In the last few years parallel processing has become a reality affordable to many companies and institutions. Parallel processing promises enormous power, but at a cost: understanding. Parallel computations involve highly complex and little understood behaviour. This lack of understanding prevents eff...

