Results 1-10 of 15
Parallel Volume Rendering and Data Coherence
, 1993
Abstract

Cited by 21 (2 self)
The two key issues in implementing a parallel ray-casting volume renderer are the work distribution and the data distribution. We have implemented such a renderer on the Fujitsu AP1000 using an adaptive image-space subdivision algorithm based on the worker-farm paradigm for the work distribution, and a distributed virtual memory, implemented in software, to provide the data distribution. Measurements show that this scheme works efficiently and effectively utilizes the data coherence that is inherent in volume data. Categories and Subject Descriptors: C.1.2 [Processor Architectures]: Multiple Data Stream Architectures - multiple-instruction-stream, multiple-data-stream (MIMD); I.3.1 [Computer Graphics]: Hardware Architecture - parallel processing; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - ray tracing. Key Words: Visualization, volume rendering, worker farm, image space, distributed virtual memory. 1 Introduction Volume rendering using ray-casting is a...
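The worker-farm pattern described above can be sketched with ordinary threads pulling image tiles from a shared queue, so faster workers automatically take on more tiles. This is an illustrative sketch only; `render_tile` is a stand-in for ray-casting a tile, and none of the names come from the AP1000 renderer itself.

```python
import queue
import threading

def render_tile(tile):
    # Stand-in for casting one ray per pixel of the tile;
    # here it just reports how many pixels the tile covers.
    x0, y0, x1, y1 = tile
    return (x1 - x0) * (y1 - y0)

def worker_farm(width, height, tile_size, n_workers):
    # Master side: subdivide the image plane into tiles (tasks).
    tasks = queue.Queue()
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            tasks.put((x, y, min(x + tile_size, width),
                       min(y + tile_size, height)))
    results = []
    lock = threading.Lock()

    def worker():
        # Worker side: repeatedly grab the next tile until none remain.
        while True:
            try:
                tile = tasks.get_nowait()
            except queue.Empty:
                return
            r = render_tile(tile)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Because tiles are handed out on demand rather than statically assigned, load balance adapts to tiles of uneven cost, which is the point of the adaptive image-space subdivision.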
Factorization of the tenth and eleventh Fermat numbers
, 1996
Abstract

Cited by 17 (8 self)
We describe the complete factorization of the tenth and eleventh Fermat numbers. The tenth Fermat number is a product of four prime factors with 8, 10, 40 and 252 decimal digits. The eleventh Fermat number is a product of five prime factors with 6, 6, 21, 22 and 564 decimal digits. We also note a new 27-decimal-digit factor of the thirteenth Fermat number. This number has four known prime factors and a 2391-decimal-digit composite factor. All the new factors reported here were found by the elliptic curve method (ECM). The 40-digit factor of the tenth Fermat number was found after about 140 Mflop-years of computation. We discuss aspects of the practical implementation of ECM, including the use of special-purpose hardware, and note several other large factors found recently by ECM. 1. Introduction For a nonnegative integer n, the nth Fermat number is F_n = 2^(2^n) + 1. It is known that F_n is prime for 0 <= n <= 4, and composite for 5 <= n <= 23. Also, for n >= 2, the factors of F_n are of th...
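The definition F_n = 2^(2^n) + 1 is easy to check for small n. The sketch below uses plain trial division, not the ECM of the paper, which only reaches the classical small cases; Euler's factor 641 of F_5 falls out immediately.

```python
def fermat(n):
    """The n-th Fermat number, F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1

def smallest_odd_factor(m, limit=10**6):
    """Naive trial division up to `limit` (ECM, used in the paper,
    is far more powerful; this only illustrates the small cases)."""
    d = 3
    while d * d <= m and d <= limit:
        if m % d == 0:
            return d
        d += 2
    return None
```

For example, `smallest_odd_factor(fermat(5))` recovers 641, giving the classical factorization F_5 = 641 * 6700417; the tenth and eleventh Fermat numbers are of course far beyond this approach.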
Reliable Hardware Barrier Synchronization Schemes
 In Proceedings of the 11th IEEE International Parallel Processing Symposium
, 1997
Abstract

Cited by 9 (3 self)
Barrier synchronization is a crucial operation for parallel systems. Many schemes have been proposed in the literature to achieve fast barriers through software, hardware, or a combination of these mechanisms. However, none of these schemes emphasizes fault-tolerant barrier operations. In this paper, we describe inexpensive support that can be added to network switches for achieving reliable hardware-based barrier synchronization while recovering from lost or corrupted messages. Necessary modifications to the switch architecture and the associated fault-tolerant message-passing protocols are presented. The protocols are optimized for the no-fault case while providing means to detect the failure of any step of the operation and to recover from it. The proposed scheme is evaluated with and without specialized support at the network interface and compared with similar approaches using software-based schemes. It promises significant potential to be applied to switch-based parallel systems, e...
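As a point of reference for the software schemes the paper compares against, a classic sense-reversing counter barrier looks like the sketch below. This is a generic textbook construction, not the paper's switch-based or fault-tolerant protocol.

```python
import threading

class SenseBarrier:
    """Sense-reversing barrier: the shared `sense` flag flips each
    round, so the barrier is immediately reusable."""

    def __init__(self, n):
        self.n = n
        self.count = 0
        self.sense = False
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            local_sense = not self.sense
            self.count += 1
            if self.count == self.n:
                # Last arrival releases everyone and resets for reuse.
                self.count = 0
                self.sense = local_sense
                self.cond.notify_all()
            else:
                while self.sense != local_sense:
                    self.cond.wait()
```

A hardware barrier in the switch collapses the O(n) message traffic this incurs, which is why the paper pushes the operation into the network; fault tolerance then requires the recovery protocol described in the abstract.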
Sparse Householder QR Factorization on a Mesh
, 1996
Abstract

Cited by 4 (4 self)
In this document we analyze the parallelization of QR factorization by means of Householder transformations. This parallelization is carried out on a machine with a mesh topology (a 2-D torus, to be more precise). We use a cyclic distribution over the processors of the elements of the sparse matrix M we want to decompose. Each processor represents the nonzero elements of its part of the matrix by a one-dimensional doubly linked list data structure. Then, we describe the different procedures that constitute the parallel algorithm. As an application of QR factorization, we concentrate on the least squares problem and finally we present an evaluation of the efficiency of this algorithm for a set of test matrices from the Harwell-Boeing sparse matrix collection.
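The per-column Householder kernel that such a parallelization distributes over the torus is, in its serial dense form, the following sketch (NumPy, dense storage; the paper's sparse linked-list representation and cyclic distribution are not reproduced here).

```python
import numpy as np

def householder_qr(A):
    """Serial dense Householder QR: returns Q (orthogonal) and R
    (upper triangular) with A = Q @ R."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.eye(m)
    for k in range(min(m, n)):
        x = A[k:, k]
        v = x.copy()
        # Reflect x onto -sign(x[0]) * ||x|| * e1 for numerical safety.
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue
        v /= norm_v
        # Apply H = I - 2 v v^T to the trailing submatrix and fold it into Q.
        A[k:, k:] -= 2.0 * np.outer(v, v @ A[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, A
```

In the parallel sparse setting, each rank-1 update touches only the nonzeros of the trailing columns, which is what makes the cyclic element distribution and linked-list storage attractive.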
A General-Purpose Parallel Sorting Algorithm
, 1995
Abstract

Cited by 3 (0 self)
A parallel sorting algorithm is presented for general-purpose internal sorting on MIMD machines. The algorithm initially sorts the elements within each node using a serial sorting algorithm, then proceeds with a two-phase parallel merge.
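The sort-then-merge structure can be sketched as follows, with "nodes" simulated as plain Python lists. The k-way merge here is a serial stand-in; the paper's two-phase parallel merge distributes this step across nodes.

```python
import heapq

def sort_then_merge(node_data):
    """Each 'node' sorts its local elements serially, then the
    sorted runs are merged (serially here, in parallel in the paper)."""
    runs = [sorted(chunk) for chunk in node_data]  # phase 1: local sorts
    return list(heapq.merge(*runs))                # stand-in for the merge
```

For example, `sort_then_merge([[3, 1], [5, 2], [4, 0]])` yields the fully sorted sequence across all nodes.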
Integrating High Performance Computing and Virtual Environments
 Proceedings of the Seventh Parallel Computing Workshop
, 1997
Abstract

Cited by 2 (0 self)
High performance computing has become accepted as a tool that can be used to solve many large-scale computational problems. Because of the complexity of the problems associated with high performance computing, visualization of the output of high performance computing applications has always been an important factor in providing a complete problem solving environment for the high performance computing user. As visualization technology advances, it is important to consider what impact those advances will have on the integration of high performance computing and visualization. Virtual environments are the most recent, and arguably the most powerful, visualization environments in use today. In this paper we analyze the current state of research on integrating visualization, and in particular virtual environments, with high performance computing. We also present a framework for implementing such an environment and report on the status of its implementation at the Australian National Un...
An Approach to Decrease Fill-in in Sparse Orthogonalizations on a MIMD Computer
 In Sixth Parallel Computing Workshop, PCW'96
, 1996
Abstract

Cited by 2 (2 self)
Modified Gram-Schmidt, Householder transformations and Givens plane rotations are popular methods in linear algebra based on orthogonalization. The major advantage of the orthogonal factorization is its stability. We present a general heuristic strategy for fill-in control in the rank-revealing sparse QR decomposition, which adds little to the execution times, using these three algorithms. This strategy is based on column pivoting and it maintains accuracy in the results, as we show experimentally on the AP1000 for a set of sparse matrices from the Harwell-Boeing collection. 1 Introduction QR orthogonalization appears in several applications of linear algebra: linear systems of equations, least squares problems [11], eigenvalue calculation [6, 8], which arise in many areas such as fluid dynamics, circuit simulation, structural analysis, etc. Therefore, high-quality software is necessary for these scientific numeric computations; for instance,...
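The column-pivoting idea the strategy builds on is shown below as a dense modified Gram-Schmidt QR with pivoting on residual column norms; this is the generic rank-revealing device, not the paper's sparse fill-in heuristic.

```python
import numpy as np

def mgs_qr_pivoted(A):
    """Column-pivoted QR via modified Gram-Schmidt.
    Returns Q, R and perm such that Q @ R == A[:, perm]."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    perm = list(range(n))
    for k in range(n):
        # Pivot: bring the remaining column of largest residual norm
        # to position k (the rank-revealing step).
        j = k + int(np.argmax(np.linalg.norm(A[:, k:], axis=0)))
        A[:, [k, j]] = A[:, [j, k]]
        R[:, [k, j]] = R[:, [j, k]]
        perm[k], perm[j] = perm[j], perm[k]
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        # MGS update: subtract the new direction from trailing columns.
        for i in range(k + 1, n):
            R[k, i] = Q[:, k] @ A[:, i]
            A[:, i] -= R[k, i] * Q[:, k]
    return Q, R, perm
```

In the sparse setting the pivot choice also weighs predicted fill-in, which is the trade-off the abstract's heuristic addresses.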
Characterization of Message-Passing Overhead on the AP3000 Multicomputer
 In Proceedings of the 30th International Conference on Parallel Processing, ICPP’01, 2001
Abstract

Cited by 1 (1 self)
The performance of the communication primitives of parallel computers is critical for the overall system performance. The characterization of the communication overhead is very important to estimate the global performance of parallel applications and to detect possible bottlenecks. In this work, we evaluate, model and compare the performance of the message-passing libraries provided by the Fujitsu AP3000 multicomputer: MPI/AP, PVM/AP and APlib. Our aim is to fairly characterize the communication primitives using general models and performance metrics.
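A common general model for point-to-point overhead of the kind such characterizations fit is the linear (Hockney-style) form t(m) = t_s + m * t_b, a startup latency plus a per-byte cost. The sketch below fits that model to timings by least squares; the timings in the test are made up for illustration and are not the paper's AP3000 measurements.

```python
def fit_latency_bandwidth(sizes, times):
    """Least-squares fit of t(m) = t_s + m * t_b.
    sizes: message sizes in bytes; times: measured transfer times.
    Returns (t_s, t_b): startup latency and per-byte cost."""
    n = len(sizes)
    mean_m = sum(sizes) / n
    mean_t = sum(times) / n
    cov = sum((m - mean_m) * (t - mean_t) for m, t in zip(sizes, times))
    var = sum((m - mean_m) ** 2 for m in sizes)
    t_b = cov / var                  # slope: inverse asymptotic bandwidth
    t_s = mean_t - t_b * mean_m      # intercept: startup latency
    return t_s, t_b
```

Comparing the fitted (t_s, t_b) pairs across libraries such as MPI/AP, PVM/AP and APlib is exactly the kind of fair, model-based comparison the abstract describes.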
A Parallel Implementation of an Out of Core Dense Matrix Solver using HiDIOS
Abstract

Cited by 1 (0 self)
Many engineering and physical problems produce fully populated matrices. These can be solved in core and in parallel using various algorithms. However, even a small 4000 by 4000 matrix can occupy 256 MB if complex double-precision numbers are used. If larger problems are to be solved then the internal memory requirements need to be reduced. Many parallel computers now contain disk systems accessible from each individual processor. A large matrix can be split into smaller matrices which are then stored externally to reduce the internal memory requirements. Each smaller matrix is updated and factorised to solve the larger matrix. Results are presented using this approach and the HiDIOS disk system on the AP1000 at Imperial College. 1 Introduction One of the primary reasons behind the porting of many engineering applications to parallel computing platforms is the requirement to solve larger problems in less time. Once the application is stable and running effectively in parallel, the pro...
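The out-of-core idea of splitting a large matrix into blocks kept on disk and staged into memory one at a time can be sketched as below. For brevity the staged operation here is a matrix-vector product rather than the update-and-factorise step of the solver; file layout and names are illustrative, not HiDIOS.

```python
import os
import pickle
import tempfile

def write_blocks(matrix_cols, block, dirname):
    """Store column blocks of the matrix on 'disk', one file per block."""
    paths = []
    for i in range(0, len(matrix_cols), block):
        p = os.path.join(dirname, f"block{i}.pkl")
        with open(p, "wb") as f:
            pickle.dump(matrix_cols[i:i + block], f)
        paths.append(p)
    return paths

def out_of_core_matvec(paths, x):
    """Compute y = A @ x while holding only one column block in memory."""
    y = None
    off = 0
    for p in paths:
        with open(p, "rb") as f:
            cols = pickle.load(f)        # stage one block into memory
        for j, col in enumerate(cols):
            if y is None:
                y = [0.0] * len(col)
            for i, a in enumerate(col):
                y[i] += a * x[off + j]
        off += len(cols)
    return y
```

The internal memory footprint is bounded by one block regardless of the full matrix size, which is the property that lets problems larger than main memory be solved.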