Results 1-10 of 70
Efficient spectral-Galerkin methods III. Polar and cylindrical geometries
SIAM J. Sci. Comput., 1995
Cited by 72 (33 self)
Efficient direct solvers based on the Chebyshev-Galerkin methods for second and fourth order equations are presented. They are based on appropriate base functions for the Galerkin formulation which lead to discrete systems with special structured matrices which can be efficiently inverted. Numerical results indicate that the direct solvers presented in this paper are significantly more accurate and efficient than those based on the Chebyshev-tau method.
Inverse free parallel spectral divide and conquer algorithms for nonsymmetric eigenproblems
Numer. Math., 1994
Cited by 61 (12 self)
We discuss two inverse free, highly parallel, spectral divide and conquer algorithms: one for computing an invariant subspace of a nonsymmetric matrix and another one for computing left and right deflating subspaces of a regular matrix pencil A, B. These two closely related algorithms are based on earlier ones of Bulgakov, Godunov and Malyshev, but improve on them in several ways. These algorithms only use easily parallelizable linear algebra building blocks: matrix multiplication and QR decomposition. The existing parallel algorithms for the nonsymmetric eigenproblem use the matrix sign function, which is faster but can be less stable than the new algorithm. Appears also as
Learning a spatially smooth subspace for face recognition
IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), 2007
Cited by 26 (2 self)
Subspace learning based face recognition methods have attracted considerable interest in recent years, including
A Parallel Fast Direct Solver For Block Tridiagonal Systems With Separable Matrices Of Arbitrary Dimension
SIAM J. Sci. Comput., 1996
Cited by 24 (15 self)
A parallel fast direct solver based on the Divide & Conquer method for linear systems with separable block tridiagonal matrices is considered. Such systems appear, for example, when discretizing the Poisson equation in a rectangular domain using the five-point finite difference scheme or the piecewise linear finite elements on a triangulated rectangular mesh. The Divide & Conquer method has the arithmetical complexity O(N log N), and it is closely related to the cyclic reduction, but instead of using the matrix polynomial factorization the so-called partial solution technique is employed. The method is presented and analyzed in a general base q framework and, based on this analysis, the base four variant is chosen for parallel implementation using the MPI standard. The generalization of the method to the case of arbitrary block dimension is described. The numerical experiments show the sequential efficiency and numerical stability of the considered method compared to the well-known...
Stable and efficient spectral methods in unbounded domains using Laguerre functions
SIAM Journal on Numerical Analysis, 2000
Cited by 20 (9 self)
Stable and efficient spectral methods using Laguerre functions are proposed and analyzed for model elliptic equations on regular unbounded domains. It is shown that spectral-Galerkin approximations based on Laguerre functions are stable and convergent with spectral accuracy in the usual (not weighted) Sobolev spaces. Efficient, accurate, and well-conditioned algorithms using Laguerre functions are developed and implemented. Numerical results indicating the spectral convergence rate and effectiveness of these algorithms are presented.
Fast tridiagonal solvers on the GPU
In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010), 2010
Cited by 15 (2 self)
We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (PCR) and recursive doubling (RD). We develop an approach to measure, analyze, and optimize the performance of GPU programs in terms of memory access, computation, and control overhead. We find that CR enjoys linear algorithm complexity but suffers from more algorithmic steps and bank conflicts, while PCR and RD have fewer algorithmic steps but do more work each step. To combine the benefits of the basic algorithms, we propose hybrid CR+PCR and CR+RD algorithms, which improve the performance of PCR, RD and CR by 21%, 31% and 61% respectively. Our GPU solvers achieve up to a 28x speedup over a sequential LAPACK solver, and a 12x speedup over a multithreaded CPU solver.
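The abstract above names parallel cyclic reduction (PCR) without spelling it out. As a rough illustration only, here is a minimal sequential sketch of the textbook PCR elimination in plain Python. It is not the authors' GPU implementation. The function name `pcr_solve` and the array layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side, with `a[0] = c[-1] = 0`) are assumptions made for this example, and the sketch assumes a diagonally dominant system so that no pivoting is needed.

```python
def pcr_solve(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] by parallel cyclic reduction.

    Hypothetical helper for illustration; assumes a[0] == 0, c[-1] == 0 and a
    diagonally dominant matrix (no pivoting is performed).
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        # Every row i is updated independently of the others within one step,
        # which is what makes the step parallel on real hardware.
        for i in range(n):
            lo, hi = i - stride, i + stride
            if lo >= 0:
                # Eliminate x[lo] from row i using row lo.
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                # Eliminate x[hi] from row i using row hi.
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # All off-diagonal couplings are now zero, so each row is a 1x1 system.
    return [d[i] / b[i] for i in range(n)]


# Small diagonally dominant example whose exact solution is [1, 2, 3, 4, 5].
x = pcr_solve(
    a=[0.0, 1.0, 1.0, 1.0, 1.0],
    b=[4.0, 4.0, 4.0, 4.0, 4.0],
    c=[1.0, 1.0, 1.0, 1.0, 0.0],
    d=[6.0, 12.0, 18.0, 24.0, 24.0],
)
```

Each pass doubles the coupling distance, so about log2(n) passes are needed, and every pass touches all n rows. This matches the abstract's remark that PCR has fewer algorithmic steps than CR but does more work per step.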
Parallel Solution of Recurrence Problems
IBM J. Res. Develop., 1974
Cited by 11 (0 self)
An mth-order recurrence problem is defined as the computation of the sequence $x_1, \ldots, x_N$, where $x_i = f(a_i, x_{i-1}, \ldots, x_{i-m})$ and $a_i$ is some vector of parameters. This paper investigates general algorithms for solving such problems on highly parallel computers. We show that if the recurrence function $f$ has associated with it two other functions that satisfy certain composition properties, then we can construct elegant and efficient parallel algorithms that can compute all N elements of the series in time proportional to $\lceil \log_2 N \rceil$. The class of problems having this property includes linear recurrences of all orders both homogeneous and inhomogeneous, recurrences involving matrix or binary quantities, and various nonlinear problems involving operations such as computation with matrix inverses, exponentiation, and modulo division.
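To make the logarithmic-time claim concrete, here is a small sketch for the simplest case covered by the abstract, a first-order inhomogeneous linear recurrence x_i = a_i * x_{i-1} + b_i. It uses the standard doubling (parallel prefix) idea of composing affine maps. The function names and the choice of this particular recurrence are assumptions for illustration, not taken from the paper, and the inner loop is written sequentially even though its iterations are independent.

```python
def recurrence_by_doubling(a, b, x_init):
    """Compute x_i = a[i] * x_{i-1} + b[i] (with x_{-1} = x_init) by doubling.

    Returns the list of x_i and the number of doubling steps used.
    Hypothetical helper for illustration only.
    """
    n = len(a)
    # (A[i], B[i]) represents the affine map x -> A[i] * x + B[i] that carries
    # the value at the start of a window to x_i; windows start with length 1.
    A, B = list(a), list(b)
    span, steps = 1, 0
    while span < n:
        A_new, B_new = A[:], B[:]
        # Each i reads only old values, so all i could run in parallel.
        for i in range(span, n):
            # Compose the window ending at i with the window ending at i - span.
            A_new[i] = A[i] * A[i - span]
            B_new[i] = A[i] * B[i - span] + B[i]
        A, B = A_new, B_new
        span *= 2
        steps += 1
    return [A[i] * x_init + B[i] for i in range(n)], steps


def recurrence_sequential(a, b, x_init):
    """Reference implementation: one step after another."""
    out, x = [], x_init
    for ai, bi in zip(a, b):
        x = ai * x + bi
        out.append(x)
    return out


a = [2, 1, 3, 1, 2]
b = [1, 0, -1, 4, 5]
values, steps = recurrence_by_doubling(a, b, x_init=1)
```

For these five terms the doubling loop runs three times, which is the ceiling of log2(5), while the sequential reference needs five dependent steps.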
A new fast-multipole accelerated Poisson solver in two dimensions
SIAM J. Sci. Comput., 2001
Cited by 11 (1 self)
We present an adaptive fast multipole method for solving the Poisson equation in two dimensions. The algorithm is direct, assumes that the source distribution is discretized using an adaptive quad-tree, and allows for Dirichlet, Neumann, periodic, and free-space conditions to be imposed on the boundary of a square. The amount of work per grid point is comparable to that of classical fast solvers, even for highly nonuniform grids.
Preconditioned HSS methods for the solution of non-Hermitian positive definite linear systems
2002
Cited by 10 (6 self)
We study the role of preconditioning strategies recently developed for coercive problems in connection with a two-step iterative method (HSS) proposed by Bai, Golub and Ng for the solution of nonsymmetric linear systems whose real part is coercive. As a model problem we consider Finite Differences (FD) matrix sequences $\{A_n(a,p)\}_n$ discretizing the elliptic (convection-diffusion) problem $A_{a,p}u \equiv -\nabla[a(x)\nabla u(x)] + \nabla[p(x)u(x)] = f(x)$, $x \in \Omega$, Dirichlet BC, (1) with $a(x)$ being a uniformly positive function and $p(x)$ denoting the Reynolds function. More precisely, in connection with preconditioned HSS/GMRES like methods, we consider the preconditioning sequence $\{P_n(a)\}_n$, $P_n(a) := D_n^{1/2}(a) A_n(1,0) D_n^{1/2}(a)$, where $D_n(a)$ is the suitably scaled main diagonal of $A_n(a,0)$. If $a(x)$ is positive and regular enough, then the preconditioned sequence shows a strong clustering at unity, so that the sequence $\{P_n(a)\}_n$ turns out to be a superlinear preconditioning sequence for $\{A_n(a,0)\}_n$, where $A_n(a,0)$ represents a good approximation of $\mathrm{Re}(A_n(a,p))$, namely the real part of $A_n(a,p)$.
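The abstract refers to the two-step HSS iteration only by name. As a hedged sketch, the plain (unpreconditioned) iteration as it is commonly stated splits A into its symmetric part H = (A + A^T)/2 and skew-symmetric part S = (A - A^T)/2, and alternates two shifted solves with a parameter alpha > 0. The code below illustrates that basic scheme for a small real matrix. It does not implement the preconditioners studied in the paper, and the helper names, the dense solver, and the default values of `alpha` and `iters` are assumptions for this example.

```python
def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def hss_solve(A, b, alpha=1.0, iters=100):
    """Basic HSS iteration for a real matrix A with positive definite symmetric part.

    Illustrative sketch only: alpha and iters are arbitrary defaults, and the
    shifted systems are solved densely rather than with any fast solver.
    """
    n = len(A)
    H = [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]
    S = [[(A[i][j] - A[j][i]) / 2 for j in range(n)] for i in range(n)]

    def shifted(M, sign):
        # Builds alpha * I + sign * M.
        return [[(alpha if i == j else 0.0) + sign * M[i][j] for j in range(n)]
                for i in range(n)]

    x = [0.0] * n
    for _ in range(iters):
        # First half-step: (alpha I + H) x_half = (alpha I - S) x + b
        rhs = [u + v for u, v in zip(matvec(shifted(S, -1), x), b)]
        x_half = solve_dense(shifted(H, +1), rhs)
        # Second half-step: (alpha I + S) x_new = (alpha I - H) x_half + b
        rhs = [u + v for u, v in zip(matvec(shifted(H, -1), x_half), b)]
        x = solve_dense(shifted(S, +1), rhs)
    return x


# Nonsymmetric test matrix with symmetric part diag(4, 3); exact solution is [1, 2].
A = [[4.0, 1.0], [-1.0, 3.0]]
b = [6.0, 5.0]
x = hss_solve(A, b)
```

In practice the choice of alpha governs the contraction rate, and the two shifted systems are where preconditioning or inexact inner solves would enter.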
An Algorithm With Polylog Parallel Complexity for ...
Cited by 10 (5 self)
This paper describes an algorithm for the time-accurate solution of certain classes of parabolic partial differential equations that can be parallelized in both time and space. It has a serial complexity that is proportional to the serial complexities of the best known algorithms. The algorithm is a variant of the multigrid waveform relaxation method where the scalar ordinary differential equations that make up the kernel of computation are solved using a cyclic reduction type algorithm. Experimental results obtained on a massively parallel multiprocessor are presented.
Key words. parabolic partial differential equations, massively parallel computation, waveform relaxation, multigrid, cyclic reduction
AMS subject classifications. primary 65M, 65W; secondary 65L05
1. Introduction. For many numerical problems in scientific computation, the execution time grows without bound as a function of the problem size, independent of the number of processors and of the algorithm used [36], [38], [40]. In particular, for most linear partial differential equations (PDEs) arising in mathematical physics, the parallel complexity grows as (log N), where N is a particular measure of the problem size. The proof is based on deriving upper and lower bounds on the execution time of optimal parallel algorithms for multiprocessors with an unlimited number of processors and no interprocessor communication costs, where both upper and lower bounds are proportional to log N. These optimal parallel algorithms can have very large serial complexities, and the tightness of the bounds on the parallel execution time for practical algorithms is not established by this analysis. In the analysis of standard numerical algorithms for linear PDEs, there is a strong dichotomy in the nature of the growth in the parallel execution time between algorith...