Execution Time of Symmetric Eigensolvers (1997)

by K Stanley

Results 1 - 6 of 6

LOOKAHEAD AND ALGORITHMIC BLOCKING TECHNIQUES COMPARED FOR PARALLEL MATRIX FACTORIZATION

by Peter E. Strazdins
Cited by 21 (6 self)
Abstract not found

Multigrid Equation Solvers for Large Scale Nonlinear Finite Element Simulations

by Mark Francis Adams, 1999
Cited by 10 (5 self)
Over the past 40 years, the finite element method has grown into a popular method for simulating physical systems in science and engineering. It is used in a wide array of industries; in fact, just about any enterprise that makes a physical product can, and probably does, use finite element technology. The success of the method is due in large part to its ability to support accurate formulations of partial differential equations (PDEs) on arbitrarily general physical domains with complex boundary conditions. Additionally, the rapid growth in the computational power available in today's computers, at an ever more affordable price, has made finite element technology...

OPTIMAL LOAD BALANCING TECHNIQUES FOR BLOCK-CYCLIC DECOMPOSITIONS FOR MATRIX FACTORIZATION

by Peter Strazdins
Cited by 3 (1 self)
In this paper, we present a new load balancing technique, called panel scattering, which is generally applicable to parallel block-partitioned dense linear algebra algorithms, such as matrix factorization. Here, the panels formed in such computations are divided across their length and evenly (re-)distributed among all processors. It is shown how this technique can be efficiently implemented for the general block-cyclic matrix distribution, requiring only the collective communication primitives that are required for block-cyclic parallel BLAS. In most situations, panel scattering yields optimal load balance and cell computation speed across all stages of the computation. It also has the advantage of naturally yielding good memory access patterns. Compared with traditional methods, which minimize communication costs at the expense of load balance, it has a small (in some situations negative) increase in communication volume costs. It does incur extra communication startup costs, but only by a factor not exceeding 2. To maximize load balance, storage block sizes should be kept small; furthermore, in many situations of interest, there will be a small or even negative communication penalty for doing so. Results will be given on the Fujitsu AP+ parallel computer, comparing the performance of panel scattering with previously established methods for LU, LL^T and QR factorization. These are consistent with a detailed performance model for LU factorization.
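
The abstract's remark that small storage block sizes help load balance can be illustrated with a minimal sketch. The code below is not the paper's panel scattering technique; it only models a plain 1-D block-cyclic column distribution and counts how many columns of the trailing (not yet factorized) part each process still owns. The function names `owner`, `trailing_counts` and `imbalance`, and the sizes used, are assumptions chosen for illustration.

```python
from collections import Counter


def owner(index, block_size, num_procs):
    """Process owning a global column index under a 1-D block-cyclic layout."""
    return (index // block_size) % num_procs


def trailing_counts(n, start, block_size, num_procs):
    """Columns in start..n-1 held by each process (the remaining work)."""
    counts = Counter(owner(j, block_size, num_procs) for j in range(start, n))
    return [counts.get(p, 0) for p in range(num_procs)]


def imbalance(counts):
    """Maximum load divided by mean load; 1.0 means perfect balance."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 1.0


if __name__ == "__main__":
    # Hypothetical sizes: 64 columns, 4 processes, 40 columns already factorized.
    n, procs, done = 64, 4, 40
    for nb in (1, 4, 16):
        counts = trailing_counts(n, done, nb, procs)
        print(f"block size {nb:2d}: columns per process {counts}, "
              f"imbalance {imbalance(counts):.2f}")
```

With these hypothetical sizes, the imbalance ratio grows as the block size grows, because late in the factorization the remaining columns sit on fewer processes.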

Application of a High Performance Parallel Eigensolver to Electronic Structure Calculations

by Mark P. Sears, Ken Stanley (U. C. Berkeley), Greg Henry
Cited by 2 (0 self)
In this paper we report the development of a very high performance parallel eigensolver based on the portable ScaLAPACK library, and its application to electronic structure calculations in the MP-Quest code. This work was done on ASCI-Red, a supercomputer based on over 4600 dual-processor Pentium Pro nodes at Sandia National Laboratories. We report sustained performance in the code of 605 GFlops and peak performance in the eigensolver of 684 GFlops. This is comparable to the performance obtained from MP-Linpack on a similar sized problem. For a smaller problem we have sustained performance of 420 GFlops in the application and peak performance in the eigensolver of 563 GFlops. The impact of this work on the specific application is important, but the development of significant improvements to a portable eigensolver and other libraries will also benefit a number of applications. 1 Introduction MP-Quest is a parallel electronic structure program which is used extensively in productio...

A Dense Complex Symmetric Indefinite Solver for the Fujitsu AP3000

by Peter E. Strazdins, 1999
Cited by 2 (1 self)
This paper describes the design, implementation and performance of a parallel direct dense symmetric-indefinite solver routine. Such a solver is required for the large complex systems arising from electromagnetic field analysis, such as are generated by the AccuField application. The primary target architecture for the solver is the Fujitsu AP3000, a distributed memory machine based on the UltraSPARC processor. The routine is written entirely in terms of the DBLAS Distributed Library, recently extended for complex precision. It uses the Bunch-Kaufman diagonal pivoting method and is based on the LAPACK algorithm, with several modifications required for efficient parallel implementation and one modification to reduce the amount of symmetric pivoting. Currently the routine uses a standard BLAS computational interface and can use either the MPI, BLACS or VPPLib communication interfaces (the latter is only available under the APruntime V2.0 system for the AP3000). The routine out-performs its e...

An efficient implementation of parallel eigenvalue computation for massively parallel processing

by Takahiro Katagiri, Yasumasa Kanada - Parallel Computing, 2001
Cited by 1 (1 self)
This article describes an efficient implementation and evaluation of a parallel eigensolver for computing all eigenvalues of dense symmetric matrices. Our eigensolver uses a Householder tridiagonalization method, which has higher parallelism and performance than conventional methods when the problem size is relatively small, e.g. on the order of 10,000. This is very important for practical applications that require many diagonalizations of such matrices. The routine was evaluated on a 1024-processor HITACHI SR2201, giving speedups of about 2 to 5 times over the ScaLAPACK library on the same 1024 processors.
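
For readers unfamiliar with the Householder tridiagonalization step named in this abstract, here is a minimal serial sketch in pure Python. It is the textbook unblocked algorithm, not the authors' parallel implementation, and the function name `householder_tridiagonalize` is an assumption made for illustration. Each step builds a reflector that zeroes one column below the first subdiagonal and applies it from both sides, so the result is orthogonally similar to the input and has the same eigenvalues.

```python
import math


def householder_tridiagonalize(a):
    """Return a tridiagonal matrix orthogonally similar to the symmetric matrix a."""
    n = len(a)
    t = [row[:] for row in a]  # work on a copy
    for k in range(n - 2):
        # Part of column k below the diagonal.
        x = [t[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue  # column is already in the required form
        # Choose the sign that avoids cancellation when forming v.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm_v for vi in v]
        m = len(v)
        # Apply H = I - 2 v v^T from the left to rows k+1..n-1.
        for j in range(n):
            dot = sum(v[i] * t[k + 1 + i][j] for i in range(m))
            for i in range(m):
                t[k + 1 + i][j] -= 2.0 * v[i] * dot
        # Apply H from the right to columns k+1..n-1.
        for i in range(n):
            dot = sum(t[i][k + 1 + j] * v[j] for j in range(m))
            for j in range(m):
                t[i][k + 1 + j] -= 2.0 * dot * v[j]
    return t


if __name__ == "__main__":
    a = [[4.0, 1.0, -2.0, 2.0],
         [1.0, 2.0, 0.0, 1.0],
         [-2.0, 0.0, 3.0, -2.0],
         [2.0, 1.0, -2.0, -1.0]]
    for row in householder_tridiagonalize(a):
        print([round(value, 6) for value in row])
```

The eigenvalues would then be computed from the tridiagonal matrix by a separate method (for example bisection or QR iteration), which this sketch does not cover.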
Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University