The Design and Implementation of the Parallel Out-of-Core ScaLAPACK LU, QR, and Cholesky Factorization Routines. LAPACK Working Note 118, CS-97-247
, 1997
Cited by 26 (5 self)
This paper describes the design and implementation of three core factorization routines (LU, QR and Cholesky) included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to fit entirely in physical memory. The full matrix is stored on disk and the factorization routines transfer submatrix panels into memory. The 'left-looking' column-oriented variant of the factorization algorithm is implemented to reduce the disk I/O traffic. The routines are implemented using a portable I/O interface and utilize high-performance ScaLAPACK factorization routines as in-core computational kernels. We present the details of the implementation for the out-of-core ScaLAPACK factorization routines, as well as performance and scalability results on a Beowulf Linux cluster.
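The left-looking, panel-at-a-time idea in this abstract can be illustrated with a small serial sketch. It is not the ScaLAPACK code: the `PanelStore` class, its read/write counters, and the function names are hypothetical, the "disk" is an in-memory dictionary, and the in-core kernel is a plain unblocked Cholesky written in pure Python. Each column panel is read once, updated with the already factored panels to its left, factored in core, and written back once.

```python
import math


class PanelStore:
    """Hypothetical stand-in for the on-disk matrix: one entry per column panel."""

    def __init__(self, matrix, width):
        self.n = len(matrix)
        self.starts = list(range(0, self.n, width))
        self.panels = {c0: [row[c0:c0 + width] for row in matrix]
                       for c0 in self.starts}
        self.reads = 0
        self.writes = 0

    def read(self, c0):
        self.reads += 1  # counts simulated disk reads
        return [row[:] for row in self.panels[c0]]

    def write(self, c0, panel):
        self.writes += 1  # counts simulated disk writes
        self.panels[c0] = [row[:] for row in panel]


def left_looking_cholesky(store):
    """Overwrite the stored symmetric positive definite matrix with its factor L."""
    n = store.n
    for c0 in store.starts:
        panel = store.read(c0)
        w = len(panel[0])
        # Left-looking step: apply updates from every factored panel to the left.
        for k0 in store.starts:
            if k0 >= c0:
                break
            left = store.read(k0)
            for i in range(c0, n):
                for j in range(w):
                    panel[i][j] -= sum(a * b for a, b in zip(left[i], left[c0 + j]))
        # In-core kernel: unblocked Cholesky of the updated panel.
        for j in range(w):
            d = c0 + j  # global index of the diagonal row for this column
            for i in range(d, n):
                panel[i][j] -= sum(panel[i][t] * panel[d][t] for t in range(j))
            pivot = math.sqrt(panel[d][j])
            for i in range(d, n):
                panel[i][j] /= pivot
        # Entries above the diagonal are not part of L.
        for i in range(n):
            for j in range(w):
                if i < c0 + j:
                    panel[i][j] = 0.0
        store.write(c0, panel)


def assemble(store):
    """Gather all panels back into one dense list-of-rows matrix."""
    return [sum((store.panels[c0][i] for c0 in store.starts), [])
            for i in range(store.n)]
```

In this sketch every panel is written exactly once, while the reads grow with the number of panels to the left; this write-once behaviour is the usual motivation for a left-looking variant when disk writes are expensive.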
Efficient Parallel Out-of-Core Implementation of the Cholesky Factorization
, 1999
Cited by 7 (1 self)
In this paper we describe two efficient parallel out-of-core implementations of the Cholesky factorization. We use the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK) as an extension to the Parallel Linear Algebra Package (PLAPACK) to implement our out-of-core algorithms. The first algorithm uses in-core kernels with additional code to manage the I/O. This is the classical approach to out-of-core implementations of the Cholesky factorization. Our second algorithm adds an out-of-core implementation of the triangular solve with multiple right-hand sides, which doesn't simply bring code in-core and run the in-core algorithm. This algorithm has the added benefit of requiring fewer copies of the matrix to be in-core at one time, thus allowing more of the matrix to be in-core at one time. Despite the extreme simplicity of POOCLAPACK and our out-of-core algorithm, the out-of-core Cholesky factorization implementation is shown to achieve in excess of 80% of peak performance on a 64-node configuration of the Cray T3E-600.
Fast multipole method for the biharmonic equation in three dimensions
- J. Comput. Phys
, 2006
Cited by 7 (5 self)
The evaluation of sums (matrix-vector products) of the solutions of the three-dimensional biharmonic equation can be accelerated using the fast multipole method, while memory requirements can also be significantly reduced. We develop a complete translation theory for these equations. It is shown that translations of elementary solutions of the biharmonic equation can be achieved by considering the translation of a pair of elementary solutions of the Laplace equations. The extension of the theory to the case of polyharmonic equations in ℝ³ is also discussed. An efficient way of performing the FMM for biharmonic equations using the solution of a complex-valued FMM for the Laplace equation is presented. Compared to previous methods presented for the biharmonic equation, our method appears more efficient. The theory is implemented and numerical tests are presented that demonstrate the performance of the method for varying problem sizes and accuracy requirements. In our implementation, the FMM for the biharmonic equation is faster than the direct matrix-vector product for a matrix size of N = 550 for a relative L2 accuracy ε₂ = 10⁻⁴, and N = 3550 for ε₂ = 10⁻¹².
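For orientation, the following standard identities (textbook facts, not taken from the paper) indicate why biharmonic sums in three dimensions are closely tied to Laplace solutions. Here r = |y − x|, δ is the Dirac delta, and the last line is the classical Almansi representation on a domain that is star-shaped about the origin. The paper's own decomposition and kernel normalization may differ.

```latex
\begin{aligned}
\nabla^2 \frac{1}{4\pi r} &= -\delta(\mathbf{y}-\mathbf{x}), \qquad
\nabla^2 r = \frac{2}{r}, \\
\nabla^4 \left(-\frac{r}{8\pi}\right) &= \nabla^2 \left(-\frac{1}{4\pi r}\right)
  = \delta(\mathbf{y}-\mathbf{x}), \\
\psi(\mathbf{x}) &= \phi(\mathbf{x}) + |\mathbf{x}|^2\,\omega(\mathbf{x}), \qquad
\nabla^2 \phi = \nabla^2 \omega = 0 \;\Rightarrow\; \nabla^4 \psi = 0 .
\end{aligned}
```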
Model-based Control of Adaptive Applications: An Overview
- in Proceedings of the Workshop on Next Generation Systems, International Parallel and Distributed Processing Symposium (IPDPS’02)
, 2002
Cited by 6 (0 self)
Model-based control utilizes performance models of applications to choose performant system configurations for execution of applications. The performance models used in this research separate specification of the software system from specification of the execution environment so that model-based control can select software configurations for a given execution environment or, conversely, select execution environments for a given software configuration. Some representations, methods, and tools that enable model-based control are briefly described. Application of some of the methods and tools to a stochastic optimization code is briefly sketched. The problems of model-based control of two other applications are defined and described. The relationship between model-based control and other methods for adaptive control is briefly discussed.
1. Model-Based Control
Current procedures for selection of system configurations for execution of complex, parallel programs on complex, distributed execution environments are mostly ad hoc. Anecdotal evidence suggests that system configurations obtained from current ad hoc procedures are often sub-optimal. System configuration management is rendered even more difficult when the application is adaptive or when the execution environment changes. Model-based control is a methodology for generating effective system configurations that are based upon predictions obtained from performance models of the application and of its execution environment. Model-based control begins with the well-established but infrequently applied practice of using a performance model of the application and the initial execution environment to determine an effective initial system configuration. Performance models can be assembled manually and the analysis leading to effective configurations can incorporate human intelligence.
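The separation of software configuration from execution environment described above can be sketched in a few lines. Everything here is an illustrative assumption rather than the authors' tooling: the `Environment` and `SoftwareConfig` classes, their fields, and the toy cost formula (compute time plus a logarithmic communication term) are all hypothetical. The point is only that one performance model, evaluated over candidate configurations, selects a configuration for a given environment.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of an execution environment."""
    flops_per_node: float  # sustained floating-point rate of one node
    latency_s: float       # cost of one message, in seconds
    max_nodes: int         # nodes available to the application


@dataclass(frozen=True)
class SoftwareConfig:
    """Hypothetical description of one way to run the application."""
    nodes: int
    total_flops: float
    messages_per_node: int


def predicted_runtime(cfg, env):
    """Toy performance model (an assumption): compute time plus communication time."""
    compute = cfg.total_flops / (cfg.nodes * env.flops_per_node)
    communicate = cfg.messages_per_node * env.latency_s * math.log2(cfg.nodes)
    return compute + communicate


def choose_configuration(candidates, env):
    """Return the feasible configuration with the smallest predicted runtime."""
    feasible = [c for c in candidates if c.nodes <= env.max_nodes]
    if not feasible:
        raise ValueError("no candidate configuration fits this environment")
    return min(feasible, key=lambda c: predicted_runtime(c, env))
```

Calling `choose_configuration` again with a different `Environment` corresponds to re-running the selection when the execution environment changes; swapping the roles of the two arguments in the search would give the converse choice of an environment for a fixed configuration.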
Continuum models of carbon nanotube-based composites by the BEM, Electronic Journal of Boundary Elements
- J. Bound. Elements
, 2003
Cited by 3 (1 self)
This paper presents some recent advances in the boundary element method (BEM) for the analysis of carbon nanotube (CNT)-based composites. Carbon nanotubes, formed conceptually by rolling thin graphite sheets, have been found to be extremely stiff, strong and resilient, and therefore may be ideal for reinforcing composite materials. However, the thin cylindrical shape of the CNTs presents great challenges to any computational method when these thin shell-like CNTs are embedded in a matrix material. The BEM, based on exactly the same boundary integral equation (BIE) formulation developed by Rizzo some forty years ago, turns out to be an ideal numerical tool for such simulations using continuum mechanics. Modeling issues regarding model selection, representative volume elements, interface conditions and others will be discussed in this paper. Methods for dealing with nearly-singular integrals, which arise in the BEM analysis of thin or layered materials and are crucial for the accuracy of such analyses, will be reviewed. Numerical examples using the BEM, compared with the finite element method (FEM), will be presented to demonstrate the efficiency and accuracy of the BEM in analyzing the CNT-reinforced composites.
Adaptive Modeling of Composite Structures: Modeling Error Estimation
- Texas Institute for Computational and Applied Mathematics
, 1999
Cited by 2 (1 self)
The accurate simulation of the behavior of composite materials depends on many factors such as the character, size, topology, and mechanical properties of the microstructure, the definition of the domain of interest and the loads as well as the specification of accuracy desired and the goal of the simulation. There is, therefore, a need to develop a systematic technique to adaptively select the most appropriate scale that governs specific features of the response that are of interest. The concept of hierarchical modeling provides such a framework. In this approach, the accuracy of a given mathematical model, compared to a model of finer scale, is evaluated with the use of a posteriori estimates of the "modeling error" and these form the basis of an adaptive procedure. This investigation focuses on the analysis of the equilibrium of linearly elastic heterogeneous bodies characterized by highly oscillatory elastic coefficients. The control of modeling error in such systems was studied in earlie...
Thermal Stress Analysis of Multi-Layer Thin Films and Coatings By an Advanced Boundary Element Method
, 2000
Cited by 2 (2 self)
An advanced boundary element method (BEM) is developed in this paper for analyzing thin layered structures, such as thin films and coatings, under thermal loading. The boundary integral equation (BIE) formulation for steady-state thermoelasticity is reviewed and a special case, that is, the BIE for a uniform distribution of the temperature change, is presented. The new nearly-singular integrals arising from the applications of the BIE/BEM to thin layered structures under thermal loading are treated in the same way as developed earlier for thin structures under mechanical loading. Three 2-D test problems involving layered thin films and coatings on an elastic body are studied using the developed thermal BEM and a commercial FEM software package. Numerical results for displacements and interfacial stresses demonstrate that the developed BIE/BEM remains very accurate, efficient in modeling, and surprisingly stable for thin elastic materials with thickness-to-length ratios down to 10⁻⁹ (the nano-scale). This thermal BEM capability can be employed to investigate other more important and realistic thin film and coating problems, such as residual stresses and interfacial crack initiation and propagation (peeling-off), in electronic packaging or other engineering applications.
Correspondence to: Yijun Liu (E-mail: yijun.liu@uc.edu). X. L. Chen and Y. J. Liu, University of Cincinnati.
Implementation of Out-of-Core Cholesky and QR Factorizations with POOCLAPACK
, 2000
Cited by 1 (0 self)
In this paper, a parallel implementation of out-of-core Cholesky factorization is used to introduce the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK), a flexible infrastructure for parallel implementation of out-of-core linear algebra operations. POOCLAPACK builds on the Parallel Linear Algebra Package (PLAPACK) for in-core parallel dense linear algebra computation. Despite the extreme simplicity of POOCLAPACK, the out-of-core Cholesky factorization implementation is shown to achieve in excess of 80% of peak performance on a 64-node configuration of the Cray T3E-600. The insights gained from examining the Cholesky factorization have been applied to the much more difficult and important QR factorization operation. Preliminary results for parallel implementation of the resulting OOC QR factorization algorithm are included.
unknown title
A fast multipole boundary element method for 2D multi-domain elastostatic problems based on a dual BIE formulation
Supervising Professor: Efficient Parallel Out-of-Core Implementation of the Cholesky Factorization
In this paper we describe two efficient parallel out-of-core implementations of the Cholesky factorization. We use the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK) as an extension to the Parallel Linear Algebra Package (PLAPACK) to implement our out-of-core algorithms. The first algorithm uses in-core kernels with additional code to manage the I/O. This is the classical approach to out-of-core implementations of the Cholesky factorization. Our second algorithm adds an out-of-core implementation of the triangular solve with multiple right-hand sides, which doesn’t simply bring code in-core and run the in-core algorithm. This algorithm has the added benefit of requiring fewer copies of the matrix to be in-core at one time, thus allowing more of the matrix to be in-core at one time. Despite the extreme simplicity of POOCLAPACK and our out-of-core algorithm, the out-of-core Cholesky factorization implementation is shown to achieve in excess of 80% of peak performance on a 64-node configuration of

