Results 1-10 of 10
Solving Sparse Triangular Systems on Distributed Memory Multicomputers
Euromicro PDP'98, IEEE Computer Society, 1997
Cited by 2 (2 self)
In this paper we describe and compare two different methods for solving sparse triangular systems in distributed memory multiprocessor architectures. The two methods involve some preprocessing overhead, so they are primarily of interest when solving many systems with the same coefficient matrix. Both algorithms start from the idea of the classical substitution method. The first algorithm we present introduces a concept of data-driven flow and makes use of non-blocking communications in order to dynamically extract the inherent parallelism of sparse systems [4]. The second algorithm uses a reordering technique for the unknowns, so that the final system can be grouped into blocks of variable size whose rows are independent and can be solved in parallel. This latter technique is called level scheduling because of the way it is represented in the adjacency graph. Although each method may be applied to any type of triangular system, our interest is centred on the triangular systems that arise f...
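The level scheduling idea in the abstract above can be illustrated with a small sequential sketch. It is not the authors' distributed implementation: the function names `level_schedule` and `solve_lower`, and the storage of each row as a dictionary of nonzeros, are assumptions made only for this example.

```python
def level_schedule(rows):
    """Group the rows of a sparse lower triangular matrix into levels.

    rows[i] is a dict {column: value} with the nonzeros of row i,
    including the diagonal entry (hypothetical storage format).
    A row belongs to level k when every unknown it depends on has
    been computed in levels 0..k-1.
    """
    n = len(rows)
    level = [0] * n
    for i in range(n):
        deps = [j for j in rows[i] if j < i]
        # One level above the deepest dependency; level 0 if none.
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def solve_lower(rows, b, levels):
    """Forward substitution, visiting the rows level by level."""
    x = [0.0] * len(b)
    for group in levels:
        # Rows inside one group do not depend on each other, so this
        # inner loop is the part that could be executed in parallel.
        for i in group:
            s = sum(v * x[j] for j, v in rows[i].items() if j != i)
            x[i] = (b[i] - s) / rows[i][i]
    return x
```

For the 4x4 system with rows `[{0: 2.0}, {1: 1.0}, {0: 1.0, 2: 1.0}, {1: 2.0, 2: 1.0, 3: 4.0}]`, the first two rows have no dependencies and form level 0, row 2 forms level 1, and row 3 forms level 2. The levels only need to be computed once, which is why the preprocessing pays off when many systems share the same coefficient matrix.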
Engineering a Parallel Compiler for Standard ML
In Proceedings of the 10th International Workshop on Implementations of Functional Language, 1998
Cited by 2 (0 self)
We present the design and partial implementation of an automated parallelising compiler for Standard ML using algorithmic skeletons. Source programs are parsed and elaborated using the ML Kit compiler, and a small set of higher-order functions (HOFs) is automatically detected and replaced with parallel equivalents. In the absence of performance predictions, the compiler simply replaces all instances of the known HOFs with parallel equivalents. The resulting SML program is then output as Objective Caml and compiled with the parallel skeleton harnesses, giving an executable which runs on networks of workstations. The parallel harnesses are implemented in C, with MPI providing the communications subsystem. A substantial parallel exemplar is presented: an implementation of the Canny edge tracking algorithm from computer vision, which is parallelised by implementing the map skeleton over a set of subimages. 1 Introduction We are investigating the development of a fully automatic parallel...
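The map skeleton over subimages mentioned in the abstract above can be sketched as follows. This is an illustration in Python rather than the SML, Objective Caml, and C/MPI toolchain the paper describes. The helper names (`split_rows`, `par_map`, `threshold`, `process`) are hypothetical, and a pointwise threshold stands in for edge detection so that the strips need no overlapping border rows.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(image, parts):
    """Split a list-of-rows image into at most `parts` horizontal strips."""
    step = max(1, -(-len(image) // parts))  # ceiling division
    return [image[k:k + step] for k in range(0, len(image), step)]


def par_map(f, items, workers=4):
    """Map skeleton: same result as list(map(f, items)), evaluated by a pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, items))


def threshold(strip, cutoff=128):
    """Stand-in per-subimage worker: binarise every pixel of the strip."""
    return [[255 if p >= cutoff else 0 for p in row] for row in strip]


def process(image, parts=2):
    """Apply the worker to each strip and stitch the strips back together."""
    strips = par_map(threshold, split_rows(image, parts))
    return [row for strip in strips for row in strip]
```

Because `par_map` returns exactly what a sequential `map` would, swapping one for the other does not change the program's result, which is the property a skeleton-based compiler relies on. A thread pool is used here only to show the structure; a real deployment on a network of workstations would distribute the strips across processes.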
Numerical Analysis Of Abrupt Heterojunction Bipolar Transistors.
1998
This paper presents a physical-mathematical model for abrupt heterojunction transistors and its solution using numerical methods with application to InP/InGaAs HBTs. The physical model is based on the combination of the drift-diffusion transport model in the bulk with thermionic emission and tunnelling transmission through the emitter-base interface. Fermi-Dirac statistics and bandgap narrowing distribution between the valence and conduction bands are considered in the model. A compact formulation is used that makes it easy to take into account other effects such as the non-parabolic nature of the bands or the presence of various subbands in the conduction process. The simulator has been implemented for distributed memory multicomputers, making use of the MPI message passing standard library. In order to accelerate the solution process of the linear system, iterative methods with parallel incomplete factorization-based preconditioners have been used.
Parallel Simulator for Heterojunction Bipolar Transistors
1998
This paper presents a physical-mathematical model for abrupt heterojunction transistors and its solution using numerical methods in parallel architectures, with application to InP/InGaAs HBTs. The physical model is based on the combination of the drift-diffusion transport model in the bulk with thermionic emission and tunnelling transmission through the emitter-base interface. Fermi-Dirac statistics and bandgap narrowing distribution between the valence and conduction bands are considered in the model. A compact formulation is used that makes it easy to take into account other effects such as the non-parabolic nature of the bands or the presence of various subbands in the conduction process. The simulator has been implemented for distributed memory multicomputers making use of the MPI message passing standard library and the native communications library of the AP1000 computer (called CELLOS). In order to accelerate the solution process of the linear system, incomplete factor...
Parallel Domain Decomposition Applied to 3D Poisson Equation for Gradual HBT
This paper presents the implementation of a parallel solver for the 3D Poisson equation applied to gradual HBT simulation in a memory distributed multiprocessor. The Poisson equation was discretized using a finite element method (FEM) on an unstructured tetrahedral mesh. Domain decomposition methods were used to solve the linear systems. We have simulated a gradual HBT, and we present electrical results and some measures of the efficiency of the parallel execution for several solvers. This code was implemented using a message passing standard library MPI and was tested on a CRAY T3E. Keywords: 3D Poisson equation, Gradual HBT, Domain Decomposition, Multiprocessors. INTRODUCTION Heterojunction bipolar transistors are nowadays an active area of research due to interest in their high-speed electronic circuit applications. For example InP/InGaAs HBT's have attained frequencies of over 200 GHz [1]. Development of simulators for HBT's is essential in order to better understand their physical beh...
Numerical Analysis Of Continuity Equations For 3D BJT Simulation
1999
This paper presents a numerical analysis for the 3D continuity equations applied to BJT simulation in a memory distributed multiprocessor. The continuity equations were discretized using a finite element method (FEM) on an unstructured tetrahedral mesh. Domain decomposition methods were tested to solve the linear systems. We have applied this formulation to a 3D bipolar junction transistor (BJT), and we present some measures of the parallel execution time for several solvers and some electrical results. This code was implemented using the message-passing standard library MPI and was tested on a CRAY T3E. 1 Introduction The development of semiconductor device simulators is currently an important research area. The first programs enabled one-dimensional simulations to be carried out. Nevertheless, with the reduction of the physical dimensions of the devices to be simulated, the need for carrying out 3D simulations in order to be able to study the diverse factors that affect the device...
Solving Sparse Triangular Systems on Distributed Memory Multicomputers
Euromicro PDP'98, IEEE Computer Society, 1998
In this paper we describe and compare two different methods for solving sparse triangular systems in distributed memory multiprocessor architectures. The two methods involve some preprocessing overhead, so they are primarily of interest when solving many systems with the same coefficient matrix. Both algorithms start from the idea of the classical substitution method. The first algorithm we present introduces a concept of data-driven flow and makes use of non-blocking communications in order to dynamically extract the inherent parallelism of sparse systems. The second algorithm uses a reordering technique for the unknowns, so that the final system can be grouped into blocks of variable size whose rows are independent and can be solved in parallel. This latter technique is called level scheduling because of the way it is represented in the adjacency graph. 1 Introduction The solution of triangular systems is an important part of the solution of sparse linear systems, either using direct m...
Mapping MPI to Machine: Implementing the MPI Standard
... Abstract Device Interface (ADI), which is implemented as compiler macros. The ADI provides four main functions: sending and receiving, data transfer, queuing, and device-dependent operations. The job of the implementor is thus to tailor the lower, device-dependent layer to the target machine; the upper, device-independent layer remains virtually unchanged. Since one of the goals of MPICH is to demonstrate the efficiency of MPI, several optimizations are included. One of them is optimization by message length. Four send protocols are supported. The short send protocol piggybacks the message inside the message envelope. The eager send protocol delivers the message data without waiting for the receiver to request it, on the assumption that the probability is high that the receiver will accept the message. The rendezvous protocol doesn't deliver the data until the receiver explicitly requests it, thus allowing the setup time necessary to send large messages with high bandwidth. And the get protoco...
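The idea of choosing a send protocol by message length can be sketched as below. This is only an illustration of the selection logic, not MPICH code. The function name `choose_protocol` and both byte limits are hypothetical values chosen for the example. The fourth (get) protocol is left out because its description is cut off in the excerpt above.

```python
# Hypothetical cut-off sizes in bytes; real limits depend on the
# device layer and the target machine.
SHORT_LIMIT = 1024
EAGER_LIMIT = 128 * 1024


def choose_protocol(nbytes, short_limit=SHORT_LIMIT, eager_limit=EAGER_LIMIT):
    """Pick a send protocol name from the message length alone."""
    if nbytes < 0:
        raise ValueError("message length must be non-negative")
    if nbytes <= short_limit:
        # Small payload: carried inside the message envelope itself.
        return "short"
    if nbytes <= eager_limit:
        # Medium payload: sent immediately, assuming it will be accepted.
        return "eager"
    # Large payload: data moves only after the receiver asks for it.
    return "rendezvous"
```

Keeping the limits as parameters reflects the point made in the abstract: the selection logic can stay in the device-independent layer while the values are tuned for each machine.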
Parallel Domain Decomposition Applied To 3D Simulation Of Gradual HBTs
1999
This paper presents the implementation of a parallel solver for the Poisson, hole and electron continuity equations applied to a three-dimensional simulation of gradual HBT's in a memory distributed multiprocessor. These equations were discretised using a finite element method (FEM) on an unstructured tetrahedral mesh. Domain decomposition methods were used to solve the linear systems. We have simulated a gradual HBT, and we present electrical results and some measures of the efficiency of the parallel execution for several solvers. This code was implemented using a message-passing standard library MPI and was tested on a CRAY T3E.