Nonlinear Array Layouts for Hierarchical Memory Systems
, 1999
Cited by 67 (4 self)
Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory. This layout function is fixed at language definition time and constitutes an invisible, non-programmable array attribute. In reality, modern memory systems are architecturally hierarchical rather than flat, with substantial differences in performance among different levels of the hierarchy. This mismatch between the model and the true architecture of memory systems can result in low locality of reference and poor performance. Some of this loss in performance can be recovered by re-ordering computations using transformations such as loop tiling. We explore nonlinear array layout functions as an additional means of improving locality of reference. For a benchmark suite composed of dense matrix kernels, we show by timing and simulation that two specific layouts (4D and Morton) have low implementation costs (2-5% of total running time) and high performance benefits (reducing execution time by factors of 1.1-2.5); that they have smooth performance curves, both across a wide range of problem sizes and over representative cache architectures; and that recursion-based control structures may be needed to fully exploit their potential.
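As an illustration of what a Morton layout function can look like, the following is a minimal sketch that interleaves the bits of a row and column index to obtain a position in a flat array. The function names, the 16-bit default width, and the choice to place row bits in the odd bit positions are assumptions made here for illustration; the paper's own definition of the layout may differ in these details.

```python
def morton_index(row: int, col: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of row and col into one Z-order index.

    Assumption: column bits occupy even positions, row bits odd positions.
    """
    index = 0
    for b in range(bits):
        index |= ((col >> b) & 1) << (2 * b)      # column bit -> even slot
        index |= ((row >> b) & 1) << (2 * b + 1)  # row bit -> odd slot
    return index


def morton_order(n: int) -> list[tuple[int, int]]:
    """List the (row, col) cells of an n x n array in Morton order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: morton_index(rc[0], rc[1]))
```

Under this sketch, the four cells of each aligned 2 x 2 block receive consecutive indices, which is the property that keeps neighbouring elements close together in memory at every scale.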
Recursive Array Layouts and Fast Parallel Matrix Multiplication
- In Proceedings of Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures
, 1999
Cited by 44 (3 self)
Matrix multiplication is an important kernel in linear algebra algorithms, and the performance of both serial and parallel implementations is highly dependent on the memory system behavior. Unfortunately, due to false sharing and cache conflicts, traditional column-major or row-major array layouts incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts for improving the performance of parallel recursive matrix multiplication algorithms. We extend previous work by Frens and Wise on recursive matrix multiplication to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. We show that while recursive array layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2-2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms;...
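For readers unfamiliar with the standard recursive algorithm mentioned above, here is a minimal sequential sketch that splits each operand into quadrants and recurses. It assumes square matrices whose size is a power of two, stored as ordinary nested lists; the names and the `base` cutoff are hypothetical, and the sketch does not use a recursive array layout or any parallelism.

```python
def _rec(a, b, c, ar, ac, br, bc, cr, cc, n, base):
    """Accumulate the n x n product of the blocks of a and b into a block of c."""
    if n <= base:
        # Leaf case: plain triple loop on a small block.
        for i in range(n):
            for k in range(n):
                aik = a[ar + i][ac + k]
                for j in range(n):
                    c[cr + i][cc + j] += aik * b[br + k][bc + j]
        return
    h = n // 2
    # Eight quadrant products: C[i][j] += A[i][k] * B[k][j].
    for i in (0, h):
        for j in (0, h):
            for k in (0, h):
                _rec(a, b, c, ar + i, ac + k, br + k, bc + j, cr + i, cc + j, h, base)


def recursive_matmul(a, b, base=1):
    """Multiply square matrices a and b (size assumed to be a power of two)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    _rec(a, b, c, 0, 0, 0, 0, 0, 0, n, base)
    return c
```

With a recursive layout, each quadrant visited by this recursion would occupy a contiguous range of memory, which is the pairing of algorithm and layout that the abstract investigates.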
Recursive Array Layouts and Fast Matrix Multiplication
, 1999
Cited by 31 (0 self)
The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditional column-major or row-major array layouts to incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts to improve performance and reduce variability. Previous work on recursive matrix multiplication is extended to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. While recursive layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2-2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms. For a purely sequential implementation, it is possible to reorder computation to conserve memory space and improve performance between ...
Cache-Efficient Matrix Transposition
Cited by 22 (2 self)
We investigate the memory system performance of several algorithms for transposing an N × N matrix in-place, where N is large. Specifically, we investigate the relative contributions of the data cache, the translation lookaside buffer, register tiling, and the array layout function to the overall running time of the algorithms. We use various memory models to capture and analyze the effect of various facets of cache memory architecture that guide the choice of a particular algorithm, and attempt to experimentally validate the predictions of the model. Our major conclusions are as follows: limited associativity in the mapping from main memory addresses to cache sets can significantly degrade running time; the limited number of TLB entries can easily lead to thrashing; the fanciest optimal algorithms are not competitive on real machines even at fairly large problem sizes unless cache miss penalties are quite high; low-level performance tuning “hacks”, such as register tiling and array alignment, can significantly distort the effects of improved algorithms; and hierarchical nonlinear layouts are inherently superior to the standard canonical layouts (such as row- or column-major) for this problem.
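To make the idea of tiling an in-place transpose concrete, the following is a minimal sketch of a blocked transpose over a square matrix held as nested lists. It is not one of the algorithms evaluated in the paper; the function name and the `block` parameter are assumptions, and a Python list of lists only stands in for the contiguous arrays the paper studies.

```python
def transpose_blocked(m, block=4):
    """Transpose the square matrix m in place, one block x block tile at a time."""
    n = len(m)
    for ii in range(0, n, block):
        # Visit only tiles on or above the diagonal; each swap also
        # fixes the mirrored element in the tile below the diagonal.
        for jj in range(ii, n, block):
            for i in range(ii, min(ii + block, n)):
                # Inside a diagonal tile, swap only elements above the diagonal.
                j_start = i + 1 if ii == jj else jj
                for j in range(j_start, min(jj + block, n)):
                    m[i][j], m[j][i] = m[j][i], m[i][j]
    return m
```

The tile size would normally be chosen so that a source tile and its mirror tile fit in cache together; the value 4 here is arbitrary.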
Continuous Indexing of Hierarchical Subdivisions of the Globe
- International Journal of Geographical Information Science
, 2000
Cited by 12 (3 self)
We describe how to create a continuous global index of the surface of the earth. The model is based on a hierarchical subdivision of the surface into triangular regions, in which each region is assigned a numerical label according to a spacefilling curve. Sequential labels are assigned to adjacent regions, so labels can be sorted to create a continuous one-dimensional index. Benefits of this continuous model include the implicit preservation of adjacency information, and the ability to vary resolution at different locations. Previously suggested schemes based on similar models produce indices that are discontinuous.
Vertex-Labeling Algorithms for the Hilbert Spacefilling Curve
- Software: Practice and Experience
, 2000
Cited by 7 (1 self)
We describe a method, based on vertex-labeling, to generate algorithms for manipulating the Hilbert spacefilling curve in the following ways: 1. Computing the image of a point in R^1.
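As a discrete analogue of mapping a one-dimensional position to its image on the Hilbert curve, here is a minimal sketch using the widely known bit-manipulation formulation on an n x n grid. This is not the vertex-labeling method the abstract describes; it is a conventional alternative shown only for orientation, and the function names are hypothetical. The grid side `n` is assumed to be a power of two.

```python
def _rotate(s, x, y, rx, ry):
    """Rotate or flip the s x s quadrant so sub-curves join end to end."""
    if ry == 0:
        if rx == 1:
            x = s - 1 - x
            y = s - 1 - y
        x, y = y, x
    return x, y


def hilbert_point(n, d):
    """Map distance d along the curve (0 <= d < n*n) to grid cell (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)   # which half in x at this level
        ry = 1 & (t ^ rx)   # which half in y at this level
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

Consecutive values of `d` map to grid cells that share an edge, which is the locality property that makes the curve useful as an index.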
Discrete Global Grid System
, 1998
Cited by 6 (2 self)
A new class of spatial data structures called discrete global grid systems (DGGS’s) is introduced and the general application classes for it are discussed. DGGS’s based on subdivisions of the platonic solids, called Geodesic DGGS’s, are then introduced. A number of existing and proposed Geodesic DGGS’s are examined by looking at four design choices that must be made in constructing a Geodesic DGGS: the base platonic solid, the orientation of that solid relative to the earth’s surface, the method of subdivision defined on a face of that solid, and a method for relating that planar subdivision to the corresponding spherical surface. Finally, an examination of these design choices leads us to the construction of the ISEA3H DGGS.
A Window Retrieval Algorithm for Spatial Databases Using Quadtrees
- In Proc. 3rd Intl. Workshop on Advances in Geographic Information Systems (ACM-GIS
, 1995
Cited by 2 (0 self)
An algorithm is presented to answer window queries in a quadtree-based spatial database environment by retrieving the covering blocks in the underlying spatial database. It works by decomposing the window operation into suboperations over smaller window partitions. These partitions are the quadtree blocks corresponding to the window. Although a block b in the underlying spatial database may cover several of the smaller window partitions, b is retrieved only once. As a result, the algorithm generates an optimal number of disk I/O requests to answer a window query (i.e., one request per covering block). The algorithm uses an auxiliary main-memory data structure, called the active border, which requires additional storage of O(n) for a window query of size n × n. An analysis of the algorithm's execution time and space requirements is given, as are some experimental results.
1 Introduction
Because of the large volume of spatial databases, spatial access methods are usually used to...
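The decomposition of a window into quadtree blocks, as described in the abstract above, can be sketched as a simple recursion. This covers only the partitioning step; it does not model the active border or the retrieval of covering blocks from the database. The function name, the half-open integer window convention, and the default grid size are assumptions made for illustration.

```python
def decompose_window(wx0, wy0, wx1, wy1, x=0, y=0, size=8):
    """Return maximal quadtree blocks (x, y, size) that exactly tile the
    window [wx0, wx1) x [wy0, wy1) inside a size x size grid.

    Assumes integer window coordinates and a power-of-two grid size.
    """
    # Block lies entirely outside the window: contributes nothing.
    if wx0 >= x + size or wx1 <= x or wy0 >= y + size or wy1 <= y:
        return []
    # Block lies entirely inside the window: emit it whole.
    if wx0 <= x and x + size <= wx1 and wy0 <= y and y + size <= wy1:
        return [(x, y, size)]
    # Partial overlap: recurse into the four quadrants.
    h = size // 2
    blocks = []
    for dy in (0, h):
        for dx in (0, h):
            blocks += decompose_window(wx0, wy0, wx1, wy1, x + dx, y + dy, h)
    return blocks
```

A window aligned to a quadrant collapses to a single block, while a misaligned window of the same area breaks into several smaller ones; for example, `decompose_window(1, 1, 3, 3, size=4)` yields four unit blocks.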
Improving Raster Image Run-Length Encoding Using Data Order
, 2001
We examine the technique of run-length encoding in combination with data order, focusing on good performance of image operations such as rotation, reflection, and zooming. To this end we develop a new type of data order that supports these operations well and allows them to be performed on a variant of a double-queue automaton directly on the compressed data stream.
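To show why data order matters for run-length encoding, the following is a minimal sketch that encodes the same small raster under two traversal orders and lets the run counts be compared. The two orders used here (row-major and boustrophedon) are common textbook choices picked for illustration; they are not the new data order developed in the paper, and all function names are hypothetical.

```python
from itertools import groupby


def rle(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def row_major(img):
    """Read the raster left to right, top to bottom."""
    return [v for row in img for v in row]


def boustrophedon(img):
    """Read the raster in a snake pattern, reversing every other row."""
    out = []
    for i, row in enumerate(img):
        out.extend(row if i % 2 == 0 else reversed(row))
    return out


img = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
]
runs_row_major = rle(row_major(img))
runs_snake = rle(boustrophedon(img))
```

For this raster the row-major order breaks the data into six runs, while the snake order needs only four, because runs can continue across row boundaries.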
Management of Gridded Climate Data for National Scale Integrated Assessment Models
Computer model analyses of climate change impacts are data-intensive due to the spatial and temporal dimensions over which climate operates. Data intensity proves a major constraint in the design of such climate models. For policy-oriented climate models this constraint proves critical, given the lower-specification computer hardware readily available to decision makers. This paper discusses the use of spatial data orderings in combination with run-length encoding to spatially compress climate data. Experiments have been conducted which test the application of various data ordering schemes to the storage of climate data for

