Parallel Volume Rendering and Data Coherence
, 1993
The two key issues in implementing a parallel ray-casting volume renderer are the work distribution and the data distribution. We have implemented such a renderer on the Fujitsu AP1000 using an adaptive image-space subdivision algorithm based on the worker-farm paradigm for the work distribution, and a distributed virtual memory, implemented in software, to provide the data distribution. Measurements show that this scheme works efficiently and effectively utilizes the data coherence that is inherent in volume data. Categories and Subject Descriptors: C.1.2 [Processor Architectures]: Multiple Data Stream Architectures - multiple-instruction-stream, multiple-data-stream (MIMD); I.3.1 [Computer Graphics]: Hardware Architecture - parallel processing; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - ray tracing. Key Words: Visualization, volume rendering, worker farm, image space, distributed virtual memory. 1 Introduction Volume rendering using ray-casting is a...
Factorization of the tenth and eleventh Fermat numbers
, 1996
We describe the complete factorization of the tenth and eleventh Fermat numbers. The tenth Fermat number is a product of four prime factors with 8, 10, 40 and 252 decimal digits. The eleventh Fermat number is a product of five prime factors with 6, 6, 21, 22 and 564 decimal digits. We also note a new 27-decimal-digit factor of the thirteenth Fermat number. This number has four known prime factors and a 2391-decimal-digit composite factor. All the new factors reported here were found by the elliptic curve method (ECM). The 40-digit factor of the tenth Fermat number was found after about 140 Mflop-years of computation. We discuss aspects of the practical implementation of ECM, including the use of special-purpose hardware, and note several other large factors found recently by ECM. 1. Introduction For a nonnegative integer n, the nth Fermat number is F_n = 2^(2^n) + 1. It is known that F_n is prime for 0 ≤ n ≤ 4, and composite for 5 ≤ n ≤ 23. Also, for n ≥ 2, the factors of F_n are of th...
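As a small illustration of the definition in this abstract, the following Python sketch computes Fermat numbers F_n = 2^(2^n) + 1 and searches for small factors by trial division. It assumes the classical result (due to Lucas, and not taken from the visible text of the abstract) that for n ≥ 2 every prime factor of F_n has the form k*2^(n+2) + 1. The names `fermat`, `small_factors` and `k_max` are illustrative choices. This is plain trial division, not the elliptic curve method used in the paper.

```python
def fermat(n):
    """Return the nth Fermat number F_n = 2^(2^n) + 1."""
    return (1 << (1 << n)) + 1


def small_factors(n, k_max):
    """Find divisors of F_n among the candidates k*2^(n+2) + 1, 1 <= k <= k_max.

    Assumption: for n >= 2, prime factors of F_n have this form (Lucas).
    A candidate p divides F_n exactly when 2^(2^n) is congruent to -1 mod p,
    which modular exponentiation checks without ever building F_n itself.
    """
    step = 1 << (n + 2)
    found = []
    for k in range(1, k_max + 1):
        p = k * step + 1
        if pow(2, 1 << n, p) == p - 1:
            found.append(p)
    return found


if __name__ == "__main__":
    print(fermat(4))               # 65537, the largest known Fermat prime
    print(small_factors(5, 10))    # [641]
    print(small_factors(11, 200))  # [319489, 974849]
```

With `k_max = 200` the search recovers the two 6-digit factors of the eleventh Fermat number mentioned in the abstract. The larger factors are far beyond the reach of trial division, which is the kind of gap ECM is meant to close.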
Reliable Hardware Barrier Synchronization Schemes
 In Proceedings of the 11th IEEE International Parallel Processing Symposium
, 1997
Barrier synchronization is a crucial operation for parallel systems. Many schemes have been proposed in the literature to achieve fast barrier through software, hardware, or a combination of these mechanisms. However, none of these schemes emphasize fault-tolerant barrier operations. In this paper, we describe inexpensive support that can be added to network switches for achieving reliable hardware-based barrier synchronization while recovering from lost or corrupted messages. Necessary modifications to the switch architecture and the associated fault-tolerant message-passing protocols are presented. The protocols are optimized for the no-fault case while providing means to detect the failure of any step of the operation and to recover from it. The proposed scheme is evaluated with and without specialized support at the network interface and compared with similar approaches using software-based schemes. It promises significant potential to be applied to switch-based parallel systems, e...
Sparse Householder QR Factorization on a Mesh
 In Fourth Euromicro Workshop on Parallel and Distributed Processing
, 1996
A General-Purpose Parallel Sorting Algorithm
, 1995
A parallel sorting algorithm is presented for general-purpose internal sorting on MIMD machines. The algorithm initially sorts the elements within each node using a serial sorting algorithm, then proceeds with a two-phase parallel merge.
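The overall shape of such an algorithm (local serial sort on each node, followed by a merge across nodes) can be sketched as below. This is a sequential Python simulation under stated assumptions: the abstract does not describe the two-phase parallel merge, so an ordinary k-way merge via `heapq.merge` stands in for it, and the round-robin distribution and the name `sort_sketch` are hypothetical choices made for illustration.

```python
import heapq


def sort_sketch(data, num_nodes):
    """Simulate 'sort locally on each node, then merge' on one machine.

    Assumptions (not from the paper): elements are dealt round-robin to
    num_nodes simulated nodes, and a plain k-way merge replaces the
    paper's two-phase parallel merge.
    """
    # Distribute the input across the simulated nodes.
    chunks = [data[i::num_nodes] for i in range(num_nodes)]
    # Step 1: each node sorts its own elements with a serial sort.
    local_runs = [sorted(chunk) for chunk in chunks]
    # Step 2 (stand-in): merge the sorted runs into one sorted sequence.
    return list(heapq.merge(*local_runs))


if __name__ == "__main__":
    print(sort_sketch([5, 3, 9, 1, 7, 2, 8], 3))  # [1, 2, 3, 5, 7, 8, 9]
```

On a real MIMD machine the merge step is where the inter-node communication happens, so its design largely determines how well the algorithm scales.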
Integrating High Performance Computing and Virtual Environments
 Proceedings of the Seventh Parallel Computing Workshop
, 1997
High performance computing has become accepted as a tool that can be used to solve many large-scale computational problems. Because of the complexity of the problems associated with high performance computing, visualization of the output of high performance computing applications has always been an important factor in providing a complete problem solving environment for the high performance computing user. As visualization technology advances, it is important to consider what impact those advances will have on the integration of high performance computing and visualization. Virtual environments are the most recent, and arguably the most powerful, visualization environments in use today. In this paper we analyze the current state of research on integrating visualization, and in particular virtual environments, with high performance computing. We also present a framework for implementing such an environment and report on the status of its implementation at the Australian National Un...
An Approach to Decrease Fill-in in Sparse Orthogonalizations on a MIMD Computer
 In Sixth Parallel Computing Workshop, PCW'96
, 1996
Modified Gram-Schmidt, Householder transformations and Givens plane rotations are popular methods in linear algebra based on orthogonalization. The major advantage of the orthogonal factorization is its stability. We present a general heuristic strategy for fill-in control in the rank-revealing sparse QR decomposition using these three algorithms, which adds little to the execution time. This strategy is based on column pivoting and it maintains accuracy in the results, as we show experimentally on the AP1000 for a set of sparse matrices from the Harwell-Boeing collection. 1 Introduction QR orthogonalization appears in several applications of linear algebra: linear systems of equations, least squares problems [11], eigenvalue calculation [6, 8], which must be solved in many areas, such as fluid dynamics, circuit simulation, structural analysis, and so on. Therefore, high-quality software is necessary for those scientific numeric computations; for instance,...
Dynamic Load Distribution on Meshes with Broadcasting
 Internat. J. High Speed Comput
, 1997
The mesh is a popular multicomputer topology due to its simplicity and need for few connections, regardless of the size of the system. However, one-to-all (broadcasting) or point-to-point communication between two nodes far away results in a long delay. In this paper, we propose a mesh with a global bus as a multicomputer topology. This structure enhances the communication capability of the mesh, and the mesh with a global bus has more salient properties than the mesh, the hypercube, or other variants. These properties include a small diameter, a relatively small degree, small average distance, suitability for broadcasting, small initial data distribution time, etc. We propose a dynamic load distribution algorithm to utilize the enhanced communication capability of the mesh with a global bus. Also, asynchronous bus control and arbitration logic is designed to support the proposed algorithm efficiently. It has been shown through simulation that the proposed dynamic load...
Index Bit Permutations for Automatic Data Redistribution
 Proceedings of the Visualization '97 Parallel Rendering Symposium
, 1997
