The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines. LAPACK Working Note 118, CS-97-247
, 1997
Cited by 28 (5 self)
This paper describes the design and implementation of three core factorization routines (LU, QR and Cholesky) included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to fit entirely in physical memory. The full matrix is stored on disk and the factorization routines transfer submatrix panels into memory. The ‘left-looking’ column-oriented variant of the factorization algorithm is implemented to reduce the disk I/O traffic. The routines are implemented using a portable I/O interface and utilize high-performance ScaLAPACK factorization routines as in-core computational kernels. We present the details of the implementation for the out-of-core ScaLAPACK factorization routines, as well as performance and scalability results on a Beowulf Linux cluster.
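The left-looking panel scheme described in this abstract can be illustrated with a minimal sketch for the Cholesky case. This is not the ScaLAPACK routine: `PanelStore`, `apply_column`, and `left_looking_cholesky` are hypothetical names, the "disk" is an in-memory list with read/write counters, and the code is serial pure Python. It only shows the access pattern: each panel is written once, and previously factored panels are read but never rewritten.

```python
import math


class PanelStore:
    """Hypothetical stand-in for disk storage: keeps column panels and counts transfers."""

    def __init__(self, matrix, panel_width):
        self.n = len(matrix)
        self.width = panel_width
        self.reads = 0
        self.writes = 0
        # Split the matrix into panels of `panel_width` columns, stored column by column.
        self.panels = [
            [[matrix[i][j] for i in range(self.n)]
             for j in range(start, min(start + panel_width, self.n))]
            for start in range(0, self.n, panel_width)
        ]

    def read(self, p):
        self.reads += 1
        return [col[:] for col in self.panels[p]]

    def write(self, p, panel):
        self.writes += 1
        self.panels[p] = [col[:] for col in panel]

    def to_rows(self):
        # Inspection helper (not counted as I/O): reassemble the matrix row by row.
        cols = [col for panel in self.panels for col in panel]
        return [[cols[j][i] for j in range(self.n)] for i in range(self.n)]


def apply_column(target, j, lcol, n):
    """Subtract the contribution of one finished column of L from column j."""
    ljk = lcol[j]
    for i in range(j, n):
        target[i] -= lcol[i] * ljk


def left_looking_cholesky(store):
    """Factor A = L L^T one panel at a time; each panel is overwritten by its part of L."""
    n, w = store.n, store.width
    for p in range(len(store.panels)):
        panel = store.read(p)      # the active panel stays in memory
        first = p * w              # global index of the panel's first column
        # Left-looking step: pull in every earlier panel (read-only) and apply it.
        for q in range(p):
            finished = store.read(q)
            for jc, col in enumerate(panel):
                for lcol in finished:
                    apply_column(col, first + jc, lcol, n)
        # In-core factorization of the updated panel.
        for jc, col in enumerate(panel):
            j = first + jc
            for kc in range(jc):
                apply_column(col, j, panel[kc], n)
            col[j] = math.sqrt(col[j])
            for i in range(j + 1, n):
                col[i] /= col[j]
            for i in range(j):
                col[i] = 0.0       # clear the strict upper triangle
        store.write(p, panel)      # single write per panel
    return store
```

For a 4x4 matrix split into two panels, this sketch performs three panel reads and two panel writes; the write count always equals the number of panels.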
A fast multipole boundary element method for 2D multidomain elastostatic problems based on a dual BIE formulation
, 2008
Fast multipole method for the biharmonic equation in three dimensions
 J. Comput. Phys
, 2006
Cited by 9 (6 self)
The evaluation of sums (matrix-vector products) of the solutions of the three-dimensional biharmonic equation can be accelerated using the fast multipole method, while memory requirements can also be significantly reduced. We develop a complete translation theory for these equations. It is shown that translations of elementary solutions of the biharmonic equation can be achieved by considering the translation of a pair of elementary solutions of the Laplace equations. The extension of the theory to the case of polyharmonic equations in R^3 is also discussed. An efficient way of performing the FMM for biharmonic equations using the solution of a complex-valued FMM for the Laplace equation is presented. Compared to previous methods presented for the biharmonic equation, our method appears more efficient. The theory is implemented and numerical tests are presented that demonstrate the performance of the method for varying problem sizes and accuracy requirements. In our implementation, the FMM for the biharmonic equation is faster than the direct matrix-vector product for a matrix size of 550 for a relative L2 accuracy ε_2 = 10^-4, and N = 3550 for ε_2 = 10^-12.
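For orientation, the direct matrix-vector product that the FMM is compared against can be sketched as a plain double loop. The sketch assumes the kernel |y - x|, which is the free-space fundamental solution of the 3D biharmonic equation up to a constant factor; the function name and the choice of kernel are illustrative assumptions, and the paper's sums may involve other elementary solutions.

```python
import math


def direct_biharmonic_sum(sources, strengths, targets):
    """Direct O(N*M) evaluation of phi(y) = sum_i q_i * |y - x_i| at every target y."""
    values = []
    for y in targets:
        total = 0.0
        for x, q in zip(sources, strengths):
            total += q * math.dist(y, x)  # Euclidean distance |y - x|
        values.append(total)
    return values
```

The cost grows with the product of the number of sources and targets, which is why a faster method only pays off above some problem size, as the crossover figures in the abstract indicate.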
Efficient Parallel Out-of-Core Implementation of the Cholesky Factorization
, 1999
Cited by 7 (1 self)
In this paper we describe two efficient parallel out-of-core implementations of the Cholesky factorization. We use the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK) as an extension to the Parallel Linear Algebra Package (PLAPACK) to implement our out-of-core algorithms. The first algorithm uses in-core kernels with additional code to manage the I/O. This is the classical approach to out-of-core implementations of the Cholesky factorization. Our second algorithm adds an out-of-core implementation of the triangular solve with multiple right-hand sides, which doesn't simply bring code in-core and run the in-core algorithm. This algorithm has the added benefit of requiring fewer copies of the matrix to be in-core at one time, thus allowing more of the matrix to be in-core at one time. Despite the extreme simplicity of POOCLAPACK and our out-of-core algorithm, the out-of-core Cholesky factorization implementation is shown to achieve in excess of 80% of peak performance on a 64-node configuration of the Cray T3E-600.
Model-based Control of Adaptive Applications: An Overview
 in Proceedings of the Workshop on Next Generation Systems, International Parallel and Distributed Processing Symposium (IPDPS’02)
, 2002
Cited by 7 (1 self)
Model-based control utilizes performance models of applications to choose performant system configurations for execution of applications. The performance models used in this research separate specification of the software system from specification of the execution environment so that model-based control can select software configurations for a given execution environment or, conversely, select execution environments for a given software configuration. Some representations, methods, and tools that enable model-based control are briefly described. Application of some of the methods and tools to a stochastic optimization code is briefly sketched. The problems of model-based control of two other applications are defined and described. The relationship between model-based control and other methods for adaptive control is briefly discussed.
1. Model-Based Control
Current procedures for selection of system configurations for execution of complex, parallel programs on complex, distributed execution environments are mostly ad hoc. Anecdotal evidence suggests that system configurations obtained from current ad hoc procedures are often suboptimal. System configuration management is rendered even more difficult when the application is adaptive or when the execution environment changes. Model-based control is a methodology for generating effective system configurations that are based upon predictions obtained from performance models of the application and of its execution environment. Model-based control begins with the well-established but infrequently applied practice of using a performance model of the application and the initial execution environment to determine an effective initial system configuration. Performance models can be assembled manually and the analysis leading to effective configurations can incorporate human intelligence.
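The selection step can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the performance model (perfectly parallel compute time plus a logarithmic communication term), the environment fields `flops_per_proc` and `latency`, and the function names are hypothetical and are not taken from the paper. The point is only that keeping the software parameter (processor count) separate from the environment description lets the same model answer both selection questions.

```python
import math


def predicted_time(work, procs, env):
    """Hypothetical model: ideal parallel compute time plus a tree-style communication cost."""
    compute = work / (procs * env["flops_per_proc"])
    communicate = env["latency"] * math.log2(procs)
    return compute + communicate


def choose_procs(work, env, candidates):
    """Select the software configuration (processor count) for a given environment."""
    return min(candidates, key=lambda p: predicted_time(work, p, env))


def choose_environment(work, procs, environments):
    """Conversely, select the environment for a given software configuration."""
    return min(environments, key=lambda name: predicted_time(work, procs, environments[name]))
```

With a high-latency environment the model favors fewer processors than with a low-latency one, which is the kind of environment-dependent choice the abstract describes.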
Continuum models of carbon nanotube-based composites by the BEM, Electronic Journal of Boundary Elements
 J. Bound. Elements
, 2003
Cited by 5 (1 self)
This paper presents some recent advances in the boundary element method (BEM) for the analysis of carbon nanotube (CNT)-based composites. Carbon nanotubes, formed conceptually by rolling thin graphite sheets, have been found to be extremely stiff, strong and resilient, and therefore may be ideal for reinforcing composite materials. However, the thin cylindrical shape of the CNTs presents great challenges to any computational method when these thin shell-like CNTs are embedded in a matrix material. The BEM, based on exactly the same boundary integral equation (BIE) formulation developed by Rizzo some forty years ago, turns out to be an ideal numerical tool for such simulations using continuum mechanics. Modeling issues regarding model selections, representative volume elements, interface conditions and others will be discussed in this paper. Methods for dealing with nearly-singular integrals, which arise in the BEM analysis of thin or layered materials and are crucial for the accuracy of such analyses, will be reviewed. Numerical examples using the BEM, compared with the finite element method (FEM), will be presented to demonstrate the efficiency and accuracy of the BEM in analyzing the CNT-reinforced composites.
Thermal Stress Analysis of Multi-Layer Thin Films and Coatings By an Advanced Boundary Element Method
, 2000
Cited by 3 (3 self)
An advanced boundary element method (BEM) is developed in this paper for analyzing thin layered structures, such as thin films and coatings, under thermal loading. The boundary integral equation (BIE) formulation for steady-state thermoelasticity is reviewed and a special case, that is, the BIE for a uniform distribution of the temperature change, is presented. The new nearly-singular integrals arising from the applications of the BIE/BEM to thin layered structures under thermal loading are treated in the same way as developed earlier for thin structures under mechanical loading. Three 2D test problems involving layered thin films and coatings on an elastic body are studied using the developed thermal BEM and a commercial FEM software. Numerical results for displacements and interfacial stresses demonstrate that the developed BIE/BEM remains very accurate, efficient in modeling, and surprisingly stable for thin elastic materials with thickness-to-length ratios down to 10^-9 (the nanoscale). This thermal BEM capability can be employed to investigate other more important and realistic thin film and coating problems, such as residual stresses and interfacial crack initiation and propagation (peeling-off), in electronic packaging or other engineering applications. Correspondence to: Yijun Liu (Email: yijun.liu@uc.edu). X. L. Chen and Y. J. Liu, University of Cincinnati.
Adaptive Modeling of Composite Structures: Modeling Error Estimation
 Texas Institute for Computational and Applied Mathematics
, 1999
Cited by 2 (1 self)
The accurate simulation of the behavior of composite materials depends on many factors such as the character, size, topology, and mechanical properties of the microstructure, the definition of the domain of interest and the loads as well as the specification of accuracy desired and the goal of the simulation. There is, therefore, a need to develop a systematic technique to adaptively select the most appropriate scale that governs specific features of the response that are of interest. The concept of hierarchical modeling provides such a framework. In this approach, the accuracy of a given mathematical model, compared to a model of finer scale, is evaluated with the use of a posteriori estimates of the "modeling error" and these form the basis of an adaptive procedure. This investigation focuses on the analysis of the equilibrium of linearly elastic heterogeneous bodies characterized by highly oscillatory elastic coefficients. The control of modeling error in such systems was studied in earlie...
Implementation of Out-of-Core Cholesky and QR Factorizations with POOCLAPACK
, 2000
Cited by 1 (0 self)
In this paper parallel implementation of out-of-core Cholesky factorization is used to introduce the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK), a flexible infrastructure for parallel implementation of out-of-core linear algebra operations. POOCLAPACK builds on the Parallel Linear Algebra Package (PLAPACK) for in-core parallel dense linear algebra computation. Despite the extreme simplicity of POOCLAPACK, the out-of-core Cholesky factorization implementation is shown to achieve in excess of 80% of peak performance on a 64-node configuration of the Cray T3E-600. The insights gained from examining the Cholesky factorization have been applied to the much more difficult and important QR factorization operation. Preliminary results for parallel implementation of the resulting OOC QR factorization algorithm are included.
Efficient Parallel Out-of-Core Implementation of the Cholesky Factorization
In this paper we describe two efficient parallel out-of-core implementations of the Cholesky factorization. We use the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK) as an extension to the Parallel Linear Algebra Package (PLAPACK) to implement our out-of-core algorithms. The first algorithm uses in-core kernels with additional code to manage the I/O. This is the classical approach to out-of-core implementations of the Cholesky factorization. Our second algorithm adds an out-of-core implementation of the triangular solve with multiple right-hand sides, which doesn’t simply bring code in-core and run the in-core algorithm. This algorithm has the added benefit of requiring fewer copies of the matrix to be in-core at one time, thus allowing more of the matrix to be in-core at one time. Despite the extreme simplicity of POOCLAPACK and our out-of-core algorithm, the out-of-core Cholesky factorization implementation is shown to achieve in excess of 80% of peak performance on a 64-node configuration of