Results 1–10 of 16
The PVM Concurrent Computing System: Evolution, Experiences, and Trends
PARALLEL COMPUTING, 1994
Abstract

Cited by 129 (7 self)
The PVM system, a software framework for heterogeneous concurrent computing in networked environments, has evolved in the past several years into a viable technology for distributed and parallel processing in a variety of disciplines. PVM supports a straightforward but functionally complete message passing model, and is capable of harnessing the combined resources of typically heterogeneous networked computing platforms to deliver high levels of performance and functionality. In this paper, we describe the architecture of the PVM system and discuss its computing model, the programming interface it supports, auxiliary facilities for process groups and MPP support, and some of the internal implementation techniques employed. Performance issues, dealing primarily with communication overheads, are analyzed, and recent findings as well as experimental enhancements are presented. In order to demonstrate the viability of PVM for large scale scientific supercomputing, the paper incl...
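The message-passing model the abstract describes can be illustrated with a small sketch. This uses Python's multiprocessing module rather than PVM's actual C interface (pvm_send, pvm_recv), so the structure here is an illustrative assumption, not PVM code:

```python
# Illustrative sketch (not the PVM API): a master process sends a task
# to a worker and receives a result, mirroring the pack/send/receive
# model that PVM provides across heterogeneous networked hosts.
from multiprocessing import Process, Pipe

def worker(conn):
    nums = conn.recv()        # receive a message (PVM analogue: pvm_recv)
    conn.send(sum(nums))      # send back a result (PVM analogue: pvm_send)
    conn.close()

def main():
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send([1, 2, 3, 4])  # master packs and sends a task
    result = parent.recv()     # master blocks on the reply
    p.join()
    return result

if __name__ == "__main__":
    print(main())              # -> 10
```

In PVM the two endpoints could be processes on different machines with different architectures; the Pipe here stands in for that transport.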
Convergence of Algorithms of Decomposition Type for the Eigenvalue Problem
Linear Algebra Appl, 1995
Abstract

Cited by 55 (14 self)
We develop the theory of convergence of a generic GR algorithm for the matrix eigenvalue problem that includes the QR, LR, SR, and other algorithms as special cases. Our formulation allows for shifts of origin and multiple GR steps. The convergence theory is based on the idea that the GR algorithm performs nested subspace iteration with a change of coordinate system at each step. Thus the convergence of the GR algorithm depends on the convergence of certain sequences of subspaces. It also depends on the quality of the coordinate transformation matrices, as measured by their condition numbers. We show that with a certain obvious shifting strategy the GR algorithm typically has a quadratic asymptotic convergence rate. For matrices possessing certain special types of structure, cubic convergence can be achieved.
Key words: eigenvalue, QR algorithm, GR algorithm, subspace iteration, convergence
AMS(MOS) subject classifications: 65F15, 15A18
Running head: Convergence of Eigenvalue Algori...
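The shifted-iteration idea behind GR algorithms can be sketched in the QR special case. The following toy (an illustration, not the paper's generic GR framework) applies Rayleigh-quotient-shifted QR steps to a small symmetric matrix; each step is an orthogonal similarity, so the spectrum is preserved while the iterates converge:

```python
# Shifted QR iteration: factor A - sigma*I = QR, form RQ + sigma*I.
# Each step equals Q^T A Q, an orthogonal change of coordinates, which
# is the "nested subspace iteration with a change of coordinate system"
# viewpoint in the QR special case (G orthogonal, hence well-conditioned).
import numpy as np

def shifted_qr_step(A):
    n = len(A)
    sigma = A[-1, -1]                       # Rayleigh-quotient shift
    Q, R = np.linalg.qr(A - sigma * np.eye(n))
    return R @ Q + sigma * np.eye(n)        # orthogonally similar to A

rng = np.random.default_rng(0)
A0 = rng.standard_normal((5, 5))
A0 = A0 + A0.T                              # symmetric: real eigenvalues
A = A0.copy()
for _ in range(8):
    A = shifted_qr_step(A)

# Similarity preserves the spectrum; A[-1, -1] typically locks onto an
# eigenvalue rapidly under this shift strategy.
```

With a nonorthogonal G (as in LR or SR), the transformation's condition number enters the error analysis, which is why the paper measures convergence quality by those condition numbers.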
QR-like Algorithms for Eigenvalue Problems
SIAM J. Matrix Anal. Appl, 2000
Abstract

Cited by 26 (11 self)
In the year 2000 the dominant method for solving matrix eigenvalue problems is still the QR algorithm. This paper discusses the family of GR algorithms, with emphasis on the QR algorithm. Included are historical remarks, an outline of what GR algorithms are and why they work, and descriptions of the latest, highly parallelizable, versions of the QR algorithm. Now that we know how to parallelize it, the QR algorithm seems likely to retain its dominance for many years to come.
1. Introduction
Since the early 1960s the standard algorithms for calculating the eigenvalues and (optionally) eigenvectors of "small" matrices have been the QR algorithm [29] and its variants. This is still the case in the year 2000 and is likely to remain so for many years to come. For us a small matrix is one that can be stored in the conventional way in a computer's main memory and whose complete eigenstructure can be calculated in a matter of minutes without exploiting whatever sparsity the matrix m...
The Transmission of Shifts and Shift Blurring in the QR Algorithm
1992
Abstract

Cited by 20 (7 self)
The QR algorithm is one of the most widely used algorithms for calculating the eigenvalues of matrices. The multishift QR algorithm with multiplicity m is a version that effects m iterations of the QR algorithm at a time. It is known that roundoff errors cause the multishift QR algorithm to perform poorly when m is large. In this paper the mechanism by which the shifts are transmitted through the matrix in the course of a multishift QR iteration is identified. Numerical evidence showing that the mechanism works well when m is small and poorly when m is large is presented. When the mechanism works poorly, the convergence of the algorithm is degraded proportionately.
1. Introduction
The QR algorithm is one of the most widely used algorithms for calculating the eigenvalues of matrices [7], [9], [16]. It is therefore worrisome that attempts to parallelize the QR algorithm have been mostly unsatisfactory. (However, the work of Henry and van de Geijn [10], [11] is recent good news.) One atte...
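The classical equivalence underlying multishift QR can be checked numerically: one explicit step driven by p(A) = (A - s1*I)(A - s2*I) matches two successive single-shift steps, up to signs. This sketch is a hypothetical illustration, not the paper's experiments:

```python
# Verify that an explicit multishift QR step (multiplicity m = 2)
# is equivalent to two single-shift QR steps, up to a diagonal +/-1
# similarity coming from the sign ambiguity of the QR factorization.
import numpy as np

def single_shift_step(A, s):
    n = len(A)
    Q, R = np.linalg.qr(A - s * np.eye(n))
    return R @ Q + s * np.eye(n)

n = 6
A = 2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)  # symmetric tridiagonal
shifts = [0.5, -1.25]            # arbitrary values, not eigenvalues of A

# Two single-shift steps in sequence.
A_seq = single_shift_step(single_shift_step(A, shifts[0]), shifts[1])

# One explicit multishift step: QR-factor p(A), then transform A.
pA = (A - shifts[0] * np.eye(n)) @ (A - shifts[1] * np.eye(n))
Q, R = np.linalg.qr(pA)
A_multi = Q.T @ A @ Q

# The two results agree entrywise in absolute value.
assert np.allclose(np.abs(A_seq), np.abs(A_multi))
```

In finite precision this equivalence breaks down for large m, which is the shift-blurring phenomenon the paper investigates.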
Experimental implementation of dynamic access ordering
In Proceedings of the IEEE 27th Hawaii International Conference on System Sciences (HICSS-27), 1994
MORPH: A System Architecture for Robust High Performance Using Customization (An NSF 100 TeraOps Point Design Study)
1996
Hardware Support for Dynamic Access Ordering: Performance of Some Design Options
1993
Abstract

Cited by 8 (3 self)
Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance microprocessors to vector-like algorithms, including the "grand challenge" scientific problems. Caching is not the sole solution for these applications due to the poor temporal and spatial locality of their data accesses. Moreover, the nature of memories themselves has changed. Achieving greater bandwidth requires exploiting the characteristics of memory components "on the other side of the cache": they should not be treated as uniform access-time RAM. This paper describes the use of hardware-assisted access ordering on a uniprocessor system. Our technique combines compile-time detection of memory access patterns with a memory subsystem that decouples the order of requests ...
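Why access ordering helps can be shown with a toy memory model. The page size, hit/miss costs, and burst depth below are assumptions for illustration, not the hardware the paper evaluates:

```python
# Toy DRAM-like model: an access to the currently open page is fast,
# any other access pays a page re-open cost.  Interleaving two streams
# element-by-element thrashes the page; servicing each stream in bursts
# amortizes each page opening, which is the point of access ordering.
PAGE = 16                    # elements per DRAM page (assumed)

def cycles(addresses, hit=1, miss=5):
    open_page, total = None, 0
    for a in addresses:
        page = a // PAGE
        total += hit if page == open_page else miss
        open_page = page
    return total

stream_x = range(0, 64)          # vector x
stream_y = range(1024, 1088)     # vector y, far away in memory

# Natural order for y[i] = x[i] + y[i]: alternate x, y every iteration.
natural = [a for pair in zip(stream_x, stream_y) for a in pair]

# Access-ordered: service each stream in bursts of 16 (assumed depth).
ordered = []
xs, ys = list(stream_x), list(stream_y)
for i in range(0, 64, 16):
    ordered += xs[i:i+16] + ys[i:i+16]

print(cycles(natural), cycles(ordered))   # -> 640 160
```

In the natural order every access changes the open page (128 misses); in the burst order only one miss is paid per 16 accesses, a 4x bandwidth improvement in this toy.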
An Analytic Model of SMC Performance
1993
Abstract

Cited by 5 (5 self)
Memory bandwidth is becoming the limiting performance factor for many applications, particularly scientific computations. Access ordering is one technique that can help bridge the processor-memory performance gap. We are part of a team developing a combined hardware/software scheme for implementing access ordering dynamically at runtime. The hardware part of this solution is the Stream Memory Controller, or SMC. In order to validate the SMC concept, we have conducted numerous simulation experiments, the results of which are presented elsewhere. Here we develop an analytical model to bound SMC performance, and demonstrate that the simulation behavior of our dynamic access-ordering heuristics approaches that bound.
Sally A. McKee, Department of Computer Science, University of Virginia, Charlottesville, VA 22903, mckee@cs.virginia.edu
1. Introduction
The growing disparity be...
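In the spirit of such an analytic bound, a toy formula (the constants and the formula itself are assumptions here, not the paper's actual model) shows how a deeper per-stream FIFO amortizes page misses:

```python
# With a per-stream FIFO of depth b, roughly one page miss is paid per
# b accesses, so the average time per access is bounded below by
#     t(b) = (t_miss + (b - 1) * t_hit) / b,
# which approaches the page-hit time (peak bandwidth) as b grows.
T_HIT, T_MISS = 1, 5         # assumed page-hit / page-miss costs

def time_per_access(b):
    return (T_MISS + (b - 1) * T_HIT) / b

for b in (1, 2, 4, 16, 64):
    print(b, time_per_access(b))
# b = 1 costs 5.0 cycles per access; b = 64 already costs 1.0625,
# within about 6% of the page-hit limit.
```

The qualitative point matches the abstract: the measured behavior of dynamic ordering heuristics should approach such a bound as buffer depth increases.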
Uniprocessor SMC Performance on Vectors with Non-Unit Strides
1994
Abstract

Cited by 4 (4 self)
Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance microprocessors to vector-like algorithms, including the "grand challenge" scientific problems. Access ordering is one technique that can help bridge the processor-memory performance gap. Our solution combines compile-time detection of memory access patterns with a memory subsystem that decouples the order of requests generated by the processor from that issued to the memory system. This decoupling permits the requests to be issued in an order that optimizes use of the memory system. The hardware part of this solution is the Stream Memory Controller, or SMC. We have conducted numerous simulation experiments to evaluate uniprocessor SMC performance for unit-stride vectors; the results of these are presented elsewhere. Here we examine uniprocessor SMC performance for non-unit-stride vectors. We present simulation results and extend the analytic performance model proposed in an earlier...
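The stride effect can be illustrated with a toy calculation (assumed parameters, not the paper's model): a stride-s stream touches fewer elements per DRAM page, so page misses are paid more often even under perfect ordering:

```python
# A stride-s stream touches only about PAGE_ELEMS // s elements of each
# DRAM page, so even an ideally ordered burst pays one page miss per
# PAGE_ELEMS // s accesses instead of one per PAGE_ELEMS.
T_HIT, T_MISS, PAGE_ELEMS = 1, 5, 16     # assumed costs and page size

def time_per_element(stride):
    per_page = max(PAGE_ELEMS // stride, 1)   # elements hitting one page
    return (T_MISS + (per_page - 1) * T_HIT) / per_page

for s in (1, 2, 4, 16):
    print(s, time_per_element(s))
# stride 1 averages 1.25 cycles per element; stride 16 touches a new
# page every access and averages the full 5-cycle miss cost.
```

This is the intuition for why non-unit strides degrade SMC bandwidth relative to the unit-stride results reported earlier.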
Parallel Programming Systems for LAN Distributed Computing
In Proceedings of the 14th IEEE International Conference on Distributed Computing Systems, 1994
Abstract

Cited by 1 (1 self)
The goal of this paper is to give some ideas about the run-time efficiency of distributed computing environments. Six tools were investigated: PVM, P4, ANSA, SR, Strand, and Linda. They were chosen because they represent different approaches to the construction of distributed programming systems. Experimental results of communication tests and of processor farm model efficiency are presented and discussed.
1 Introduction
LAN distributed computing is a very active area of research. The results of many projects have recently reached a mature state, and a rich set of software systems and programming languages for distributed processing is available on the market and in the public domain. This paper is a snapshot of a rapidly changing domain. Its goal is to give some ideas about the run-time efficiency of distributed computing environments. Focusing only on efficiency is much too narrow for a complete evaluation; however, it is a useful factor th...
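The processor-farm model whose efficiency the paper measures can be sketched briefly. This uses Python's standard library rather than any of the six surveyed tools, so it is illustrative only:

```python
# Processor farm: a master distributes independent tasks to a pool of
# workers and collects the results.  The surveyed systems (PVM, P4,
# ANSA, SR, Strand, Linda) all support some variant of this pattern;
# the task function here is a hypothetical stand-in unit of work.
from concurrent.futures import ProcessPoolExecutor

def task(n):
    return n * n

def farm(inputs, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, inputs))    # results in input order

if __name__ == "__main__":
    print(farm(range(8)))    # -> [0, 1, 4, 9, 16, 25, 36, 49]
```

Farm efficiency in such experiments is dominated by the ratio of per-task compute time to the communication cost of shipping tasks and results, which is why the paper pairs this model with raw communication tests.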