Results 1 - 10 of 105
Efficient spectral-Galerkin methods III. Polar and cylindrical geometries
SIAM J. Sci. Comput., 1995
Cited by 81 (32 self)
Efficient direct solvers based on the Chebyshev-Galerkin methods for second and fourth order equations are presented. They are based on appropriate base functions for the Galerkin formulation which lead to discrete systems with specially structured matrices that can be efficiently inverted. Numerical results indicate that the direct solvers presented in this paper are significantly more accurate and efficient than those based on the Chebyshev-tau method.
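The abstract does not say which base functions are used. As a hedged illustration only, the sketch below assumes the combination phi_k = T_k - T_{k+2} of Chebyshev polynomials, a common choice for homogeneous Dirichlet conditions because every phi_k vanishes at x = -1 and x = 1. The names `chebyshev_T` and `phi` are hypothetical helpers, not taken from the paper.

```python
def chebyshev_T(k, x):
    """Evaluate the Chebyshev polynomial T_k(x) with the three-term
    recurrence T_{j+1}(x) = 2 x T_j(x) - T_{j-1}(x)."""
    t_prev, t_cur = 1.0, x  # T_0 and T_1
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_cur = t_cur, 2.0 * x * t_cur - t_prev
    return t_cur


def phi(k, x):
    """Assumed Galerkin basis function phi_k = T_k - T_{k+2}.
    Since T_k(1) = 1 and T_k(-1) = (-1)^k, phi_k is zero at both endpoints."""
    return chebyshev_T(k, x) - chebyshev_T(k + 2, x)


if __name__ == "__main__":
    for k in range(4):
        print(k, phi(k, -1.0), phi(k, 1.0), phi(k, 0.3))
```

Building the boundary conditions into each basis function is what lets a Galerkin formulation avoid separate boundary rows, which is one plausible route to the structured matrices the abstract mentions.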
An inverse free parallel spectral divide and conquer algorithm for nonsymmetric eigenproblems
1997
Cited by 64 (11 self)
We discuss an inverse-free, highly parallel, spectral divide and conquer algorithm. It can compute either an invariant subspace of a nonsymmetric matrix A, or a pair of left and right deflating subspaces of a regular matrix pencil A − λB. This algorithm is based on earlier ones of Bulgakov, Godunov and Malyshev, but improves on them in several ways. It uses only easily parallelizable linear algebra building blocks: matrix multiplication and QR decomposition, but not matrix inversion. Similar parallel algorithms for the nonsymmetric eigenproblem use the matrix sign function, which requires matrix inversion and is faster but can be less stable than the new algorithm.
Learning a spatially smooth subspace for face recognition
IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), 2007
Cited by 26 (2 self)
Subspace-learning-based face recognition methods have attracted considerable interest in recent years, including ...
A Parallel Fast Direct Solver For Block Tridiagonal Systems With Separable Matrices Of Arbitrary Dimension
SIAM J. Sci. Comput., 1996
Cited by 26 (15 self)
A parallel fast direct solver based on the Divide & Conquer method for linear systems with separable block tridiagonal matrices is considered. Such systems appear, for example, when discretizing the Poisson equation in a rectangular domain using the five-point finite difference scheme or the piecewise linear finite elements on a triangulated rectangular mesh. The Divide & Conquer method has arithmetical complexity O(N log N) and is closely related to cyclic reduction, but instead of the matrix polynomial factorization it employs the so-called partial solution technique. The method is presented and analyzed in a general base q framework and, based on this analysis, the base four variant is chosen for parallel implementation using the MPI standard. The generalization of the method to the case of arbitrary block dimension is described. The numerical experiments show the sequential efficiency and numerical stability of the considered method compared to the well-known...
Stable and efficient spectral methods in unbounded domains using Laguerre functions
SIAM Journal on Numerical Analysis, 2000
Cited by 22 (9 self)
Stable and efficient spectral methods using Laguerre functions are proposed and analyzed for model elliptic equations on regular unbounded domains. It is shown that spectral-Galerkin approximations based on Laguerre functions are stable and convergent with spectral accuracy in the usual (not weighted) Sobolev spaces. Efficient, accurate, and well-conditioned algorithms using Laguerre functions are developed and implemented. Numerical results indicating the spectral convergence rate and effectiveness of these algorithms are presented.
Fast tridiagonal solvers on the GPU
In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010), 2010
Cited by 16 (2 self)
We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (PCR) and recursive doubling (RD). We develop an approach to measure, analyze, and optimize the performance of GPU programs in terms of memory access, computation, and control overhead. We find that CR enjoys linear algorithmic complexity but suffers from more algorithmic steps and bank conflicts, while PCR and RD have fewer algorithmic steps but do more work in each step. To combine the benefits of the basic algorithms, we propose hybrid CR+PCR and CR+RD algorithms, which improve the performance of PCR, RD and CR by 21%, 31% and 61% respectively. Our GPU solvers achieve up to a 28x speedup over a sequential LAPACK solver, and a 12x speedup over a multithreaded CPU solver.
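To make the trade-off between steps and work per step concrete, here is a minimal sequential sketch of parallel cyclic reduction (PCR) in plain Python. It is not the paper's GPU code: the function name `pcr_solve` and the argument layout are assumptions, there is no pivoting, and the system is assumed to be diagonally dominant so that no diagonal entry becomes zero. The inner loop over `i` has no dependencies between iterations, so on a GPU each iteration would map to one thread.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction (PCR).

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i];
    a[0] and c[-1] are ignored. Assumes diagonal dominance (no pivoting).
    """
    n = len(b)
    a, b, c, d = ([float(v) for v in seq] for seq in (a, b, c, d))
    a[0] = 0.0
    c[-1] = 0.0
    stride = 1
    while stride < n:  # ceil(log2(n)) reduction steps in total
        na, nb, nc, nd = [0.0] * n, b[:], [0.0] * n, d[:]
        for i in range(n):  # independent iterations: the "parallel" step
            lo, hi = i - stride, i + stride
            if lo >= 0:
                # Eliminate x[lo] from row i using row lo.
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                # Eliminate x[hi] from row i using row hi.
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # Every row is now decoupled: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


if __name__ == "__main__":
    n = 7
    sub = [0.0] + [-1.0] * (n - 1)
    diag = [4.0] * n
    sup = [-1.0] * (n - 1) + [0.0]
    rhs = [1.0] * n
    print(pcr_solve(sub, diag, sup, rhs))
```

Each step touches all n rows, so the total work is O(n log n) over about log2(n) steps. This matches the abstract's remark that PCR takes fewer steps than CR but does more work in each one.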
A new fast-multipole accelerated Poisson solver in two dimensions
SIAM J. Sci. Comput., 2001
Cited by 16 (1 self)
We present an adaptive fast multipole method for solving the Poisson equation in two dimensions. The algorithm is direct, assumes that the source distribution is discretized using an adaptive quad-tree, and allows for Dirichlet, Neumann, periodic, and free-space conditions to be imposed on the boundary of a square. The amount of work per grid point is comparable to that of classical fast solvers, even for highly nonuniform grids.
A direct adaptive Poisson solver of arbitrary order accuracy
J. Comput. Phys., 1996
Cited by 16 (5 self)
We present a direct, adaptive solver for the Poisson equation which can achieve any prescribed order of accuracy. It is based on a domain decomposition approach using local spectral approximation, as well as potential theory and the fast multipole method. In two space dimensions, the algorithm requires O(NK) work, where N is the number of discretization points and K is the desired order of accuracy.
An Algorithm With Polylog Parallel Complexity for. . .
Cited by 15 (7 self)
This paper describes an algorithm for the time-accurate solution of certain classes of parabolic partial differential equations that can be parallelized in both time and space. It has a serial complexity that is proportional to the serial complexities of the best known algorithms. The algorithm is a variant of the multigrid waveform relaxation method where the scalar ordinary differential equations that make up the kernel of computation are solved using a cyclic reduction type algorithm. Experimental results obtained on a massively parallel multiprocessor are presented.
Key words. parabolic partial differential equations, massively parallel computation, waveform relaxation, multigrid, cyclic reduction
AMS subject classifications. primary 65M, 65W; secondary 65L05
1. Introduction. For many numerical problems in scientific computation, the execution time grows without bound as a function of the problem size, independent of the number of processors and of the algorithm used [36], [38], [40]. In particular, for most linear partial differential equations (PDEs) arising in mathematical physics, the parallel complexity grows as Θ(log N), where N is a particular measure of the problem size. The proof is based on deriving upper and lower bounds on the execution time of optimal parallel algorithms for multiprocessors with an unlimited number of processors and no interprocessor communication costs, where both upper and lower bounds are proportional to log N. These optimal parallel algorithms can have very large serial complexities, and the tightness of the bounds on the parallel execution time for practical algorithms is not established by this analysis. In the analysis of standard numerical algorithms for linear PDEs, there is a strong dichotomy in the nature of the growth in the parallel execution time between algorith...
Parallel Solution of Recurrence Problems
IBM J. Res. Develop., 1974
Cited by 12 (0 self)
An mth-order recurrence problem is defined as the computation of the sequence $x_1, \ldots, x_N$, where $x_i = f(a_i, x_{i-1}, \ldots, x_{i-m})$ and $a_i$ is some vector of parameters. This paper investigates general algorithms for solving such problems on highly parallel computers. We show that if the recurrence function $f$ has associated with it two other functions that satisfy certain composition properties, then we can construct elegant and efficient parallel algorithms that can compute all $N$ elements of the series in time proportional to $\lceil \log_2 N \rceil$. The class of problems having this property includes linear recurrences of all orders, both homogeneous and inhomogeneous, recurrences involving matrix or binary quantities, and various nonlinear problems involving operations such as computation with matrix inverses, exponentiation, and modulo division.
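The logarithmic step count can be illustrated on the simplest case the abstract covers, a first-order inhomogeneous linear recurrence x_i = a_i x_{i-1} + b_i. The sketch below is an assumption-laden illustration rather than the paper's general construction: it relies on the fact that affine maps compose associatively, and runs a doubling prefix scan over them. The names `compose` and `recursive_doubling` are hypothetical.

```python
def compose(g, f):
    """Return the affine map 'g after f', with maps stored as (a, b) for x -> a*x + b."""
    return (g[0] * f[0], g[0] * f[1] + g[1])


def recursive_doubling(coeffs, x0):
    """Compute x_1..x_N for x_i = a_i * x_{i-1} + b_i by a doubling prefix scan.

    coeffs[i] holds (a_{i+1}, b_{i+1}). Returns the sequence and the number
    of scan steps, each of which could run in parallel over all indices i.
    """
    maps = list(coeffs)
    n = len(maps)
    stride, steps = 1, 0
    while stride < n:
        # After this step, maps[i] maps x_{i+1-2*stride} (or x_0, if that
        # index would be negative) directly to x_{i+1}.
        maps = [
            compose(maps[i], maps[i - stride]) if i >= stride else maps[i]
            for i in range(n)
        ]
        stride *= 2
        steps += 1
    return [a * x0 + b for a, b in maps], steps


if __name__ == "__main__":
    xs, steps = recursive_doubling([(2, 1), (3, 0), (1, -4), (0, 5), (2, 2)], 1)
    print(xs, steps)
```

For N = 5 the scan takes 3 steps, which equals the ceiling of log2(5). A sequential loop would instead need 5 dependent steps.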