Results 1–10 of 51
Compile-time Composition of Run-time Data and Iteration Reorderings
, 2003
Cited by 43 (6 self)
Many important applications, such as those using sparse data structures, have memory reference patterns that are unknown at compile time. Prior work has developed run-time reorderings of data and computation that enhance locality in such applications.
Cache-efficient multigrid algorithms
Int. J. High Perform. Comput. Appl.
Cited by 28 (0 self)
Multigrid is widely used as an efficient solver for sparse linear systems arising from the discretization of elliptic boundary value problems. Linear relaxation methods like Gauss-Seidel and Red-Black Gauss-Seidel form the principal computational component of multigrid, and thus affect its efficiency. In the context of multigrid, these iterative solvers are executed for a small number of iterations (2–8). We exploit this property of the algorithm to develop a cache-efficient multigrid, by focusing on improving the memory behavior of the linear relaxation methods. The efficiency in our cache-efficient linear relaxation algorithm comes from two sources: reducing the number of data cache and TLB misses, and reducing the number of memory references by keeping values register-resident. Experiments on five modern computing platforms show a performance improvement of 1.15–2.7 times over a standard implementation of Full Multigrid V-Cycle.
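The Red-Black ordering mentioned in this abstract splits the grid points into two sets such that all points of one color can be updated independently of each other, which is the property that makes blocked, cache-friendly sweep schedules possible. A minimal sketch of the idea, using a 1D odd/even analogue in plain Python (illustrative only, not the paper's implementation):

```python
def redblack_gauss_seidel(u, sweeps):
    """1D odd/even analogue of Red-Black Gauss-Seidel for -u'' = 0.

    Interior points are split into 'red' (odd index) and 'black' (even
    index) sets; every point of one color depends only on points of the
    other color, so each half-sweep is embarrassingly reorderable.
    Illustrative sketch only.
    """
    n = len(u) - 1
    for _ in range(sweeps):
        for first in (1, 2):                 # red pass, then black pass
            for i in range(first, n, 2):
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u

# Dirichlet data u(0) = 0, u(1) = 1 on a 17-point grid; the exact
# discrete solution is the linear profile u(x) = x.
grid = [0.0] * 16 + [1.0]
redblack_gauss_seidel(grid, 2000)
```

The paper's cache-efficient variant fuses a small number (2–8) of such sweeps over cache-sized blocks of the grid instead of streaming the whole grid once per sweep.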
An Overview of Cache Optimization Techniques and Cache-Aware Numerical Algorithms
 Algorithms for Memory Hierarchies — Advanced Lectures, volume 2625 of Lecture Notes in Computer Science
, 2003
Cited by 26 (4 self)
This paper focuses on optimization techniques for enhancing cache performance.
Sketching Stencils
Cited by 24 (5 self)
Performance of stencil computations can be significantly improved through smart implementations that improve memory locality, computation reuse, or parallelize the computation. Unfortunately, efficient implementations are hard to obtain because they often involve non-traditional transformations, which means that they cannot be produced by optimizing the reference stencil with a compiler. In fact, many stencils are produced by code generators that were tediously hand-crafted. In this paper, we show how stencil implementations can be produced with sketching. Sketching is a software synthesis approach where the programmer develops a partial implementation—a sketch—and a separate specification of the desired functionality given by a reference (unoptimized) stencil. The synthesizer then completes the sketch to behave like the specification, filling in code fragments that are difficult to develop manually. Existing sketching systems work only for small finite programs, i.e., programs that can be represented as small Boolean circuits. In this paper, we develop a sketching synthesizer that works for stencil computations, a large class of programs that, unlike circuits, have unbounded inputs and outputs, as well as an unbounded number of computations. The key contribution is a reduction algorithm that turns a stencil into a circuit, allowing us to synthesize stencils using an existing sketching synthesizer.
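The specification/implementation split at the heart of sketching can be illustrated on a toy stencil: an unoptimized reference stencil serves as the specification, and a transformed implementation (here, a hand-written rolling-window variant rather than a synthesized one) is checked to behave identically. A hedged Python sketch, with hypothetical function names:

```python
def reference_stencil(a):
    """Unoptimized 3-point averaging stencil: the 'specification'."""
    n = len(a)
    return [a[i] if i in (0, n - 1) else (a[i - 1] + a[i] + a[i + 1]) / 3.0
            for i in range(n)]

def optimized_stencil(a):
    """A transformed variant that performs one new load per iteration,
    keeping a rolling 3-value window of inputs. In a sketching system the
    synthesizer, not the programmer, would fill in such details and then
    verify them against the reference."""
    n = len(a)
    out = a[:]
    if n < 3:
        return out
    left, mid = a[0], a[1]
    for i in range(1, n - 1):
        right = a[i + 1]             # the only fresh load this iteration
        out[i] = (left + mid + right) / 3.0
        left, mid = mid, right       # rotate the register-resident window
    return out
```

Because both versions perform the same additions in the same order, their outputs match exactly, which is the equivalence a sketching synthesizer must establish for all inputs.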
Streaming Multigrid for Gradient-Domain Operations on Large Images
Cited by 23 (3 self)
We introduce a new tool to solve the large linear systems arising from gradient-domain image processing. Specifically, we develop a streaming multigrid solver, which needs just two sequential passes over out-of-core data. This fast solution is enabled by a combination of three techniques: (1) use of second-order finite elements (rather than traditional finite differences) to reach sufficient accuracy in a single V-cycle, (2) temporally blocked relaxation, and (3) multilevel streaming to pipeline the restriction and prolongation phases into single streaming passes. A key contribution is the extension of the B-spline finite-element method to be compatible with the forward-difference gradient representation commonly used with images. Our streaming solver is also efficient for in-memory images, due to its fast convergence and excellent cache behavior. Remarkably, it can outperform spatially adaptive solvers that exploit application-specific knowledge. We demonstrate seamless stitching and tone-mapping of gigapixel images in about an hour on a notebook PC. Keywords: out-of-core multigrid solver, B-spline finite elements, Poisson equation, gigapixel images, multilevel streaming.
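Technique (2), temporally blocked relaxation, means performing several relaxation sweeps during one streaming pass over the data, keeping only a small working window in fast memory. A minimal 1D sketch in plain Python (two fused Jacobi sweeps; the paper's solver works on 2D out-of-core images, so this is an illustration of the scheduling idea only):

```python
def jacobi(a):
    """One plain Jacobi sweep with fixed boundaries: the unblocked baseline."""
    n = len(a)
    return [a[i] if i in (0, n - 1) else (a[i - 1] + a[i + 1]) / 2.0
            for i in range(n)]

def fused_two_sweeps(a):
    """Two Jacobi sweeps in a single left-to-right streaming pass.

    Only a 3-value window of the intermediate (first-sweep) result is
    kept 'in cache'; everything else is touched exactly once.
    """
    n = len(a)
    out = a[:]
    if n < 3:
        return out
    def sweep1(i):                       # first-sweep value at index i
        return a[i] if i in (0, n - 1) else (a[i - 1] + a[i + 1]) / 2.0
    w = [sweep1(0), sweep1(1), sweep1(2)]
    for i in range(1, n - 1):
        out[i] = (w[0] + w[2]) / 2.0     # second sweep reads the window
        if i + 2 < n:
            w = [w[1], w[2], sweep1(i + 2)]  # slide the window rightward
    return out
```

The fused pass produces exactly the same values as two separate sweeps while reading the input stream only once, which is what makes a small fixed number of V-cycle relaxations feasible out of core.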
Memory Characteristics of Iterative Methods
, 1999
Cited by 22 (9 self)
Conventional implementations of iterative numerical algorithms, especially multigrid methods, reach merely a disappointingly small percentage of the theoretically available CPU performance when applied to representative large problems. One of the most important reasons for this phenomenon is that current DRAM technology cannot provide the data fast enough to keep the CPU busy. Although the fundamentals of cache optimization are quite simple, current compilers cannot optimize even elementary iterative schemes. In this paper, we analyze the memory and cache behavior of iterative methods with extensive profiling and describe program transformation techniques to improve the cache performance of two- and three-dimensional multigrid algorithms.
Combining Performance Aspects of Irregular Gauss-Seidel via Sparse Tiling
In 15th Workshop on Languages and Compilers for Parallel Computing (LCPC)
, 2002
Cited by 20 (10 self)
Finite Element problems are often solved using multigrid techniques. The most time-consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multicoloring and owner-computes based techniques, exploit parallelism and possibly intra-iteration data reuse but not inter-iteration data reuse. Sparse tiling techniques were developed to improve intra-iteration and inter-iteration data locality in iterative smoothers. This paper describes how sparse tiling can additionally provide parallelism. Our results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, specifically compared to owner-computes based Gauss-Seidel methods. The latter employ only parallelism and intra-iteration locality. Our results support the premise that better performance occurs when all three performance aspects (parallelism, intra-iteration, and inter-iteration data locality) are combined.
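The multicoloring baseline this abstract compares against assigns colors to mesh nodes so that no two neighbors share a color; all nodes of one color can then be relaxed in parallel, but no work is grouped across smoother iterations. A minimal greedy-coloring sketch in plain Python (the adjacency structure and names are hypothetical; sparse tiling itself, which additionally tiles across iterations, is not shown):

```python
def greedy_coloring(adj):
    """Greedy multicoloring of an irregular mesh's node adjacency.

    Nodes sharing no edge receive the same color, so each color class of
    a Gauss-Seidel smoother can be updated in parallel. This captures the
    'parallelism only' baseline; it provides no inter-iteration reuse.
    """
    colors = {}
    for v in sorted(adj):                 # deterministic visit order
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:                  # smallest color not used by a neighbor
            c += 1
        colors[v] = c
    return colors

# Tiny symmetric adjacency: a triangle 0-1-2 with a pendant node 3.
mesh = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
coloring = greedy_coloring(mesh)
```

Each color class can be distributed across threads; sparse tiling goes further by carving the iteration space of several smoother sweeps into tiles that fit in cache.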
Optimization and profiling of the cache performance of parallel lattice Boltzmann codes in 2D and 3D
Parallel Processing Letters
, 2003
Cited by 19 (5 self)
When designing and implementing highly efficient scientific applications for parallel computers such as clusters of workstations, it is essential to consider and to optimize the single-CPU performance of the codes. For this purpose, it is particularly important that the codes respect the hierarchical memory designs that computer architects employ in order to hide the effects of the growing gap between CPU performance and main memory speed. In this article, we present techniques to enhance the single-CPU efficiency of lattice Boltzmann methods, which are commonly used in computational fluid dynamics. We show various performance results for both 2D and 3D codes in order to emphasize the effectiveness of our optimization techniques.
Adaptive hybrid FEM/FDM methods for inverse scattering problems
Department of Mathematics, Chalmers University of Technology & Göteborg University
, 2002
Cited by 18 (13 self)
This thesis is devoted to adaptive hybrid finite element / finite difference methods for an inverse scattering problem for the time-dependent acoustic wave equation in 2D and 3D, where we seek to reconstruct an unknown sound velocity c(x) from measured wave-reflection data.