The PVM Concurrent Computing System: Evolution, Experiences, and Trends
 PARALLEL COMPUTING
, 1994
Cited by 129 (7 self)
The PVM system, a software framework for heterogeneous concurrent computing in networked environments, has evolved in the past several years into a viable technology for distributed and parallel processing in a variety of disciplines. PVM supports a straightforward but functionally complete message-passing model, and is capable of harnessing the combined resources of typically heterogeneous networked computing platforms to deliver high levels of performance and functionality. In this paper, we describe the architecture of the PVM system, and discuss its computing model, the programming interface it supports, auxiliary facilities for process groups and MPP support, and some of the internal implementation techniques employed. Performance issues, dealing primarily with communication overheads, are analyzed, and recent findings as well as experimental enhancements are presented. In order to demonstrate the viability of PVM for large scale scientific supercomputing, the paper incl...
Convergence of Algorithms of Decomposition Type for the Eigenvalue Problem
 Linear Algebra Appl
, 1995
Cited by 55 (14 self)
We develop the theory of convergence of a generic GR algorithm for the matrix eigenvalue problem that includes the QR, LR, SR, and other algorithms as special cases. Our formulation allows for shifts of origin and multiple GR steps. The convergence theory is based on the idea that the GR algorithm performs nested subspace iteration with a change of coordinate system at each step. Thus the convergence of the GR algorithm depends on the convergence of certain sequences of subspaces. It also depends on the quality of the coordinate transformation matrices, as measured by their condition numbers. We show that with a certain obvious shifting strategy the GR algorithm typically has a quadratic asymptotic convergence rate. For matrices possessing certain special types of structure, cubic convergence can be achieved.
Key words. eigenvalue, QR algorithm, GR algorithm, subspace iteration, convergence
AMS(MOS) subject classifications. 65F15, 15A18
Running head: Convergence of Eigenvalue Algori...
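As a rough sketch of the iteration this abstract describes, one GR step of multiplicity $m$ factors a shifted polynomial of the current iterate and then applies the resulting similarity transformation. The symbols $A_i$, $G_i$, $R_i$, $p_i$, and $\sigma_j^{(i)}$ are notation assumed here for illustration; they do not appear in the listing.

\[
p_i(A_{i-1}) = G_i R_i, \qquad A_i = G_i^{-1} A_{i-1} G_i, \qquad p_i(z) = \prod_{j=1}^{m} \bigl(z - \sigma_j^{(i)}\bigr)
\]

Here $R_i$ is upper triangular and $G_i$ is nonsingular. Taking $G_i$ unitary gives the QR algorithm, and taking it unit lower triangular gives LR. The condition numbers $\kappa(G_i)$ are what the abstract refers to as the quality of the coordinate transformation matrices.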
QR-like Algorithms for Eigenvalue Problems
 SIAM J. Matrix Anal. Appl
, 2000
Cited by 25 (11 self)
In the year 2000 the dominant method for solving matrix eigenvalue problems is still the QR algorithm. This paper discusses the family of GR algorithms, with emphasis on the QR algorithm. Included are historical remarks, an outline of what GR algorithms are and why they work, and descriptions of the latest, highly parallelizable, versions of the QR algorithm. Now that we know how to parallelize it, the QR algorithm seems likely to retain its dominance for many years to come.
1. Introduction Since the early 1960's the standard algorithms for calculating the eigenvalues and (optionally) eigenvectors of "small" matrices have been the QR algorithm [29] and its variants. This is still the case in the year 2000 and is likely to remain so for many years to come. For us a small matrix is one that can be stored in the conventional way in a computer's main memory and whose complete eigenstructure can be calculated in a matter of minutes without exploiting whatever sparsity the matrix m...
The Transmission of Shifts and Shift Blurring in the QR Algorithm
, 1992
Cited by 19 (7 self)
The QR algorithm is one of the most widely used algorithms for calculating the eigenvalues of matrices. The multishift QR algorithm with multiplicity m is a version that effects m iterations of the QR algorithm at a time. It is known that roundoff errors cause the multishift QR algorithm to perform poorly when m is large. In this paper the mechanism by which the shifts are transmitted through the matrix in the course of a multishift QR iteration is identified. Numerical evidence showing that the mechanism works well when m is small and poorly when m is large is presented. When the mechanism works poorly, the convergence of the algorithm is degraded proportionately.
1. Introduction The QR algorithm is one of the most widely used algorithms for calculating the eigenvalues of matrices [7], [9], [16]. It is therefore worrisome that attempts to parallelize the QR algorithm have been mostly unsatisfactory. (However, the work of Henry and van de Geijn [10], [11] is recent good news.) One atte...
Experimental Implementation of Dynamic Access Ordering
 University of Virginia, TR
, 1993
Cited by 18 (11 self)
As microprocessor speeds increase, memory bandwidth is rapidly becoming the performance bottleneck in the execution of vector-like algorithms. Although caching provides adequate performance for many problems, caching alone is an insufficient solution for vector applications with poor temporal and spatial locality. Moreover, the nature of memories themselves has changed. Current DRAM components should not be treated as uniform access-time RAM: achieving greater bandwidth requires exploiting the characteristics of components at every level of the memory hierarchy. This paper describes hardware-assisted access ordering and our hardware development effort to build a Stream Memory Controller (SMC) that implements the technique for a commercially available high-performance microprocessor, the Intel i860. Our strategy augments caching by combining compile-time detection of memory access patterns with a memory subsystem that decouples the order of requests generated by the processor from that ...
MORPH: A System Architecture for Robust High Performance Using Customization (An NSF 100 TeraOps Point Design Study)
, 1996
An Analytic Model of SMC Performance
, 1993
Cited by 5 (5 self)
Memory bandwidth is becoming the limiting performance factor for many applications, particularly scientific computations. Access ordering is one technique that can help bridge the processor-memory performance gap. We are part of a team developing a combined hardware/software scheme for implementing access ordering dynamically at runtime. The hardware part of this solution is the Stream Memory Controller, or SMC. In order to validate the SMC concept, we have conducted numerous simulation experiments, the results of which are presented elsewhere. Here we develop an analytical model to bound SMC performance, and demonstrate that the simulation behavior of our dynamic access-ordering heuristics approaches that bound.
Sally A. McKee, Department of Computer Science, University of Virginia, Charlottesville, VA 22903, mckee@cs.virginia.edu
1. Introduction The growing disparity be...
Uniprocessor SMC Performance on Vectors with Non-Unit Strides
, 1994
Cited by 4 (4 self)
Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance microprocessors to vector-like algorithms, including the "grand challenge" scientific problems. Access ordering is one technique that can help bridge the processor-memory performance gap. Our solution combines compile-time detection of memory access patterns with a memory subsystem that decouples the order of requests generated by the processor from that issued to the memory system. This decoupling permits the requests to be issued in an order that optimizes use of the memory system. The hardware part of this solution is the Stream Memory Controller, or SMC. We have conducted numerous simulation experiments to evaluate uniprocessor SMC performance for unit-stride vectors; the results of these are presented elsewhere. Here we examine uniprocessor SMC performance for non-unit stride vectors. We present simulation results and extend the analytic performance model proposed in an earlier...
Parallel Programming Systems for LAN Distributed Computing
 Computing, Proceedings of 14th IEEE International Conference on Distributed Computing Systems
, 1994
Cited by 1 (1 self)
The goal of this paper is to give some idea of the run-time efficiency of distributed computing environments. Six tools were investigated: PVM, P4, ANSA, SR, Strand, and Linda. They were chosen because they represent different approaches to the construction of distributed programming systems. Experimental results for communication tests and for the efficiency of the processor farm model are presented and discussed.
1 Introduction LAN distributed computing is a very active area of research. Many projects have recently reached a mature state, and a rich set of software systems and programming languages for distributed processing is available on the market and in the public domain. This paper is a snapshot of a very rapidly changing domain. The goal of this paper is to give some idea of the run-time efficiency of distributed computing environments. Focusing only on efficiency is much too narrow for a complete evaluation. However, it is a useful factor th...
Search for Anomalous
We have searched for anomalous $Z \to \gamma\gamma\gamma$ events with the L3 detector at LEP. No significant deviations from the expected QED $e^+e^- \to \gamma\gamma\gamma$ events are observed. The branching ratio upper limit for a composite Z decaying directly into three photons is found to be $1.0 \times 10^{-5}$ at 95% C.L. The branching ratio upper limits for the process $Z \to \gamma X$, $X \to \gamma\gamma$ are in the range of 0.4 to $1.3 \times 10^{-5}$, depending on the mass and width of the scalar particle X. In the context of a model with magnetic monopoles coupling to the Z, we find $\mathrm{BR}(Z \to \gamma\gamma\gamma) < 0.8 \times 10^{-5}$ at 95% C.L.; this results in a lower mass limit of 510 GeV for a magnetic monopole. (Submitted to Phys. Lett. B)
Introduction In the Standard Model the decay $Z \to \gamma\gamma\gamma$ proceeds via fermion and W loops and is strongly suppressed; the branching ratio is expected to be about $5.4 \times 10^{-10}$ [1]. An enhanced branching ratio would be a clear indication of new physics. Such enhancements ar...