Cranium: An Interface for Message Passing on Adaptive Packet Routing Networks
Proceedings of Parallel Computer Routing and Communication Workshop, 1994
Cited by 20 (2 self)
Cranium is a processor-network interface for an interconnection network based on adaptive packet routing. Adaptive networks relax the restriction that packet order is preserved; packets may be delivered to their destinations in an arbitrary sequence. Cranium uses two mechanisms: an automatic-receive interface for packet serialization and high performance, and a processor-initiated interface for flexibility. To minimize software overhead, Cranium is directly accessible by user-level programs. Protection for user-level message passing is implemented by mapping user-level handles into physical node identifiers and buffer addresses.
1 Introduction
Scalable multicomputer architectures have been converging on a standard organization with four elements: a workstation microprocessor, main memory based on dynamic RAM, a point-to-point interconnection network and a processor-network interface. Both the microprocessors and DRAM chips have become inexpensive and widely available. Multicomputer ...
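The Cranium abstract notes that adaptive networks may deliver packets in an arbitrary sequence, so the receiving side has to put them back in order. The abstract does not say how Cranium does this. The sketch below is only a generic illustration of offset-based reassembly, assuming each packet carries the byte offset of its payload within the message; the function name `reassemble` and the packet layout are hypothetical.

```python
def reassemble(packets, length):
    """Rebuild a message from (offset, payload) packets in any arrival order.

    Assumption (not from the Cranium paper): every packet is tagged with the
    byte offset of its payload, and payloads do not overlap.
    """
    buf = bytearray(length)
    received = 0
    for offset, payload in packets:
        if offset < 0 or offset + len(payload) > length:
            raise ValueError("packet falls outside the message buffer")
        # Write the payload at its final position, whatever the arrival order.
        buf[offset:offset + len(payload)] = payload
        received += len(payload)
    if received != length:
        raise ValueError("message incomplete")
    return bytes(buf)


# Packets arrive out of order but land in the right place.
message = reassemble([(4, b"o wo"), (0, b"hell"), (8, b"rld")], 11)
```

Because each payload is written directly to its final position, no sorting step or per-packet queue is needed on the receive path.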
High Performance Fortran: history, overview and current developments
1.4 TMC-261, Thinking Machines Corporation, 1996
Cited by 16 (0 self)
[Figure 1: HPF data mapping model. Labels: ALIGN, REALIGN, DISTRIBUTE, REDISTRIBUTE, processors as rectilinear mesh, physical processors]
The essential idea is that the relative alignment between two data objects may be defined, typically by relations involving array elements. A group of aligned objects (arrays) is then distributed onto an abstract rectilinear grid of processors. The mapping of abstract to physical processors is implementation dependent. In figure 2 the arrays A and B are aligned with respect to each other so that A(I+1,J+1) and B(I,J) are aligned. The ALIGN directive is used to specify the alignment between objects. In figure 3 we show some potential distributions of the arrays A and B onto a set of 4 (abstract) processors. We are displaying axis 1 vertically and axis 2 horizontally. For the distribution (BLOCK,*) axis 1
[Figure 2: Array Alignment. REAL A(16,16),B(14,14); !HPF$ ALIGN B(I,J) WITH A(I+1,J+1)]
[Figure residue: panels (BLOCK,*), (*,BLOCK), (BLOCK,BLOCK), each over processors P1 P2 P3 P4 ...]
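The alignment and distribution described in this abstract can be sketched as a small owner computation. This is an illustrative Python model, not HPF itself. It assumes the usual BLOCK rule (contiguous blocks of size ceil(extent / processors)), processor grids of 4x1, 1x4 and 2x2 for the three distributions, and row-major numbering P1..P4 on the 2x2 grid. The grid shapes, the numbering and the function names are assumptions, since the figures are not recoverable from the text.

```python
from math import ceil

N = 16  # extent of each axis of A, from REAL A(16,16)

# Assumed shapes of the abstract processor grid for 4 processors.
GRIDS = {"(BLOCK,*)": (4, 1), "(*,BLOCK)": (1, 4), "(BLOCK,BLOCK)": (2, 2)}


def block_coord(index, extent, nprocs):
    """0-based grid coordinate owning a 1-based index under BLOCK."""
    block = ceil(extent / nprocs)
    return (index - 1) // block


def owner_of_a(i, j, dist):
    """Processor number (1..4) that holds A(i,j) under the given distribution."""
    rows, cols = GRIDS[dist]
    r = block_coord(i, N, rows)  # a "*" axis has one processor, so r or c is 0
    c = block_coord(j, N, cols)
    return r * cols + c + 1      # row-major numbering (assumption)


def owner_of_b(i, j, dist):
    """B(I,J) is aligned WITH A(I+1,J+1), so it lives wherever that element does."""
    return owner_of_a(i + 1, j + 1, dist)


# Under (BLOCK,*), rows 1-4 of A sit on P1, so B(3,1), aligned with A(4,2), is on P1,
# while B(4,1), aligned with A(5,2), is on P2.
owners = (owner_of_b(3, 1, "(BLOCK,*)"), owner_of_b(4, 1, "(BLOCK,*)"))
```

The point of the sketch is that B never gets a distribution of its own: its placement follows entirely from the ALIGN relation and the distribution of A.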
User Transparency: A Fully Sequential Programming Model for Efficient Data Parallel Image Processing
Science, University of Amsterdam, The Netherlands, 2002
Cited by 15 (8 self)
Although many image processing applications are ideally suited for parallel implementation, most researchers in imaging do not benefit from high performance computing on a daily basis. Essentially, this is due to the fact that no parallelization tools exist that truly match the image processing researcher's frame of reference. As it is unrealistic to expect imaging researchers to become experts in parallel computing, tools must be provided to allow them to develop high performance applications in a highly familiar manner. In an attempt to provide such a tool, we have designed a software architecture that allows transparent (i.e., sequential) implementation of data parallel imaging applications for execution on homogeneous distributed memory MIMD-style multicomputers. This paper presents an extensive overview of the design rationale behind the software architecture, and gives an assessment of the architecture's effectiveness in providing significant performance gains. In particular, we describe the implementation and automatic parallelization of three well-known example applications that contain many fundamental imaging operations: (1) template matching, (2) multi-baseline stereo vision, and (3) line detection. Based on experimental results we conclude that our software architecture constitutes a powerful and user-friendly tool for obtaining high performance in many important image processing research areas.
P-3PC: A Point-to-Point Communication Model for Automatic and Optimal Decomposition of Regular Domain Problems
IEEE Transactions on Parallel and Distributed Systems, 2002
Cited by 8 (7 self)
One of the most fundamental problems automatic parallelization tools are confronted with is to find an optimal domain decomposition for a given application. For regular domain problems (such as simple matrix manipulations) this task may seem trivial. However, communication costs in message passing programs often significantly depend on the memory layout of data blocks to be transmitted. As a consequence, straightforward domain decompositions may be non-optimal. In this paper we introduce a new point-to-point communication model (called P-3PC, or the 'Parameterized model based on the Three Paths of Communication') that is specifically designed to overcome this problem. In comparison with related models (e.g., LogGP) P-3PC is similar in complexity, but more accurate in many situations. Although the model is aimed at MPI's standard point-to-point operations, it is applicable to similar message passing definitions as well.
Software and hardware requirements for some applications of parallel computing to industrial problems
1995
Cited by 8 (5 self)
We discuss the hardware and software requirements that appear relevant for a set of industrial applications of parallel computing. These are divided into 33 separate categories and come from a recent survey of industry in New York State. The software discussion includes data parallel languages, message passing, databases, and high-level integration systems. The analysis is based on a general classification of problem architectures originally developed for academic applications of parallel computing. Suitable hardware architectures are suggested for each application. The general discussion is crystallized with three case studies: computational chemistry; computational fluid dynamics, including manufacturing; and Monte Carlo methods.
An Application Perspective on High-Performance Computing and Communications
1996
Cited by 6 (5 self)
We review possible and probable industrial applications of HPCC, focusing on the software and hardware issues. Thirty-three separate categories are illustrated by detailed descriptions of five areas -- computational chemistry; Monte Carlo methods from physics to economics; manufacturing and computational fluid dynamics; command and control, or crisis management; and multimedia services to client computers and settop boxes. The hardware varies from tightly-coupled parallel supercomputers to heterogeneous distributed systems. The software models span HPF and data parallelism, to distributed information systems and object/dataflow parallelism on the Web. We find that in each case, it is reasonably clear that "HPCC works in principle," and postulate that this knowledge can be used in a new generation of software infrastructure based on the WebWindows approach, and discussed in an accompanying paper.
Specification Composition for the Verification of Message Passing Program Composition
1997
Cited by 4 (4 self)
We present a specification composition technique which supports the message passing composition of applications by the Ensemble methodology. In Ensemble, applications are built by composing reusable executable program components designed with scalable communication interfaces. We define reusable specifications of program components, using coloured Petri nets, which are then composed to obtain the specification of the application. The composition is controlled by the same script that is used to compose the application.
High Performance Distributed Computing
Syracuse University, 1995
Cited by 4 (4 self)
High Performance Distributed Computing (HPDC) is driven by the rapid advance of two related technologies -- those underlying computing and communications, respectively. These technology pushes are linked to application pulls, which vary from the use of a cluster of some 20 workstations simulating fluid flow around an aircraft, to the complex linkage of several hundred million advanced PCs around the globe to deliver and receive multimedia information. The review of base technologies and exemplar applications is followed by a brief discussion of software models for HPDC, which are illustrated by two extremes -- PVM and the conjectured future World Wide Web based WebWork concept. The narrative is supplemented by a glossary describing the diverse concepts used in HPDC.
COSY - An Operating System for Highly Parallel Computers
1996
Cited by 2 (1 self)
This paper is dedicated to Prof. Horst Wettstein on the occasion of the 25th anniversary of his appointment. 1 Motivation
Scalable Caching Techniques for a Weakly Coherent Memory
1995
Cited by 2 (2 self)
Machines Workshop'96. Abstract: There is a growing acceptance that general purpose parallel computers need to be based on a scalable shared memory computational model, with the ability to exploit data locality for good performance. Today, this is commonly achieved by mapping the model onto a distributed memory computer with a scalable interconnect (supporting linear increases in bisection bandwidth). Example machines are the Cray T3D, IBM SP2 and Intel Paragon, which can scale in performance to hundreds or thousands of processors. This results in a two-level memory hierarchy, in which data is either local or shared across the machine. The next few years will see a move towards cache coherent multiprocessors, using the techniques employed by machines such as the KSR (cache-only memory) and the DASH (distributed directories). An example is the forthcoming Silicon Graphics cache coherent multiprocessor. This will simplify the programming model by presenting a single level memory h...

