Results 1-10 of 11
Nonlinear Array Layouts for Hierarchical Memory Systems
, 1999
Cited by 73 (5 self)
Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory. This layout function is fixed at language definition time and constitutes an invisible, nonprogrammable array attribute. In reality, modern memory systems are architecturally hierarchical rather than flat, with substantial differences in performance among different levels of the hierarchy. This mismatch between the model and the true architecture of memory systems can result in low locality of reference and poor performance. Some of this loss in performance can be recovered by reordering computations using transformations such as loop tiling. We explore nonlinear array layout functions as an additional means of improving locality of reference. For a benchmark suite composed of dense matrix kernels, we show by timing and simulation that two specific layouts (4D and Morton) have low implementation costs (2-5% of total running time) and high performance benefits (reducing execution time by factors of 1.1-2.5); that they have smooth performance curves, both across a wide range of problem sizes and over representative cache architectures; and that recursion-based control structures may be needed to fully exploit their potential.
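As an illustration of what a nonlinear layout function looks like, the following minimal sketch computes an element-level Morton (Z-order) index by interleaving the bits of the row and column indices, next to the usual row-major function. The names `morton_index` and `row_major_index` and the `bits` parameter are assumptions made for this example; the paper's own layouts may differ in detail (for instance, by applying the ordering to tiles rather than to single elements).

```python
def row_major_index(row: int, col: int, n_cols: int) -> int:
    """Canonical row-major layout: one multiply and one add."""
    return row * n_cols + col


def morton_index(row: int, col: int, bits: int = 16) -> int:
    """Element-level Morton (Z-order) layout.

    Interleaves the low `bits` bits of `row` and `col`: column bits go to
    even positions, row bits to odd positions of the result.
    """
    index = 0
    for b in range(bits):
        index |= ((col >> b) & 1) << (2 * b)
        index |= ((row >> b) & 1) << (2 * b + 1)
    return index


if __name__ == "__main__":
    # Print the memory position of each element of a 4 x 4 array.
    for r in range(4):
        print([morton_index(r, c) for c in range(4)])
```

In this ordering every aligned 2 x 2 block (and, recursively, every aligned 4 x 4 block, and so on) occupies a contiguous range of positions, which is the locality property that recursive layouts aim for.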
Recursive Array Layouts and Fast Parallel Matrix Multiplication
 In Proceedings of Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures
, 1999
Cited by 48 (4 self)
Matrix multiplication is an important kernel in linear algebra algorithms, and the performance of both serial and parallel implementations is highly dependent on the memory system behavior. Unfortunately, due to false sharing and cache conflicts, traditional column-major or row-major array layouts incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts for improving the performance of parallel recursive matrix multiplication algorithms. We extend previous work by Frens and Wise on recursive matrix multiplication to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. We show that while recursive array layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2-2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms;...
Recursive Array Layouts and Fast Matrix Multiplication
, 1999
Cited by 31 (0 self)
The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditional column-major or row-major array layouts to incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts to improve performance and reduce variability. Previous work on recursive matrix multiplication is extended to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. While recursive layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2-2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms. For a purely sequential implementation, it is possible to reorder computation to conserve memory space and improve performance between ...
Cache-Efficient Matrix Transposition
Cited by 23 (1 self)
We investigate the memory system performance of several algorithms for transposing an N × N matrix in-place, where N is large. Specifically, we investigate the relative contributions of the data cache, the translation lookaside buffer, register tiling, and the array layout function to the overall running time of the algorithms. We use various memory models to capture and analyze the effect of various facets of cache memory architecture that guide the choice of a particular algorithm, and attempt to experimentally validate the predictions of the model. Our major conclusions are as follows: limited associativity in the mapping from main memory addresses to cache sets can significantly degrade running time; the limited number of TLB entries can easily lead to thrashing; the fanciest optimal algorithms are not competitive on real machines even at fairly large problem sizes unless cache miss penalties are quite high; low-level performance tuning “hacks”, such as register tiling and array alignment, can significantly distort the effects of improved algorithms; and hierarchical nonlinear layouts are inherently superior to the standard canonical layouts (such as row- or column-major) for this problem.
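For readers unfamiliar with the kind of algorithm being compared, here is a minimal sketch of a tiled in-place transpose. It is not taken from the paper: the function name `transpose_in_place_tiled` and the `tile` parameter are assumptions, and a Python list of lists only illustrates the loop structure, not the cache or TLB behavior the abstract measures.

```python
def transpose_in_place_tiled(a, tile=32):
    """Transpose the square matrix `a` (a list of lists) in place.

    The matrix is visited tile by tile so that each swap touches two small
    blocks; only tiles on or above the diagonal are visited, and within a
    diagonal tile only elements strictly above the diagonal are swapped.
    """
    n = len(a)
    for i0 in range(0, n, tile):
        for j0 in range(i0, n, tile):
            for i in range(i0, min(i0 + tile, n)):
                # In a diagonal tile, start just right of the diagonal.
                j_start = max(j0, i + 1)
                for j in range(j_start, min(j0 + tile, n)):
                    a[i][j], a[j][i] = a[j][i], a[i][j]
    return a


if __name__ == "__main__":
    m = [[5 * r + c for c in range(5)] for r in range(5)]
    transpose_in_place_tiled(m, tile=2)
    for row in m:
        print(row)
```

The tile size would normally be chosen so that two tiles fit in the cache level of interest; the value 32 here is only a placeholder.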
Continuous Indexing of Hierarchical Subdivisions of the Globe
 International Journal of Geographical Information Science
, 2000
Cited by 13 (3 self)
We describe how to create a continuous global index of the surface of the earth. The model is based on a hierarchical subdivision of the surface into triangular regions, in which each region is assigned a numerical label according to a spacefilling curve. Sequential labels are assigned to adjacent regions, so labels can be sorted to create a continuous one-dimensional index. Benefits of this continuous model include the implicit preservation of adjacency information, and the ability to vary resolution at different locations. Previously suggested schemes based on similar models produce indices that are discontinuous.
Vertex-Labeling Algorithms for the Hilbert Spacefilling Curve
 Software: Practice and Experience
, 2000
Cited by 9 (1 self)
We describe a method, based on vertex-labeling, to generate algorithms for manipulating the Hilbert spacefilling curve in the following ways: 1. Computing the image of a point in R^1.
Discrete Global Grid System
, 1998
Cited by 7 (2 self)
A new class of spatial data structures called discrete global grid systems (DGGS’s) is introduced and the general application classes for it are discussed. DGGS’s based on subdivisions of the platonic solids, called Geodesic DGGS’s, are then introduced. A number of existing and proposed Geodesic DGGS’s are examined by looking at four design choices that must be made in constructing a Geodesic DGGS: the base platonic solid, the orientation of that solid relative to the earth’s surface, the method of subdivision defined on a face of that solid, and a method for relating that planar subdivision to the corresponding spherical surface. Finally, an examination of these design choices leads us to the construction of the ISEA3H DGGS.
Spatial Ordering and Encoding for Geographic Data Mining and Visualization
Cited by 4 (0 self)
Geographic information (e.g., locations, networks, and nearest neighbors) is unique and different from other aspatial attributes (e.g., population, sales, or income). It is a challenging problem in spatial data mining to take into account both the geographic information and multiple aspatial variables in the detection of patterns. To tackle this problem, we present and evaluate a variety of spatial ordering methods that can transform spatial relations into a one-dimensional ordering and encoding which preserves spatial locality as much as possible. The ordering can then be used to spatially sort temporal or multivariate data series and thus help reveal patterns across different geographical spaces. The encoding, as a materialization of spatial clusters and neighboring relations, is then amenable to processing together with aspatial variables by any existing (non-spatial) data mining methods. We design a set of measures to evaluate nine different ordering/encoding methods, including two space-filling curves, six hierarchical clustering-based methods, and a one-dimensional Sammon mapping (a multidimensional scaling approach). Evaluation results with various data distributions show that the optimal ordering/encoding with the complete-linkage clustering consistently gives the best overall performance, surpassing well-known space-filling curves in preserving
A Window Retrieval Algorithm for Spatial Databases Using Quadtrees
 In Proc. 3rd Intl. Workshop on Advances in Geographic Information Systems (ACMGIS
, 1995
Cited by 2 (0 self)
An algorithm is presented to answer window queries in a quadtree-based spatial database environment by retrieving the covering blocks in the underlying spatial database. It works by decomposing the window operation into suboperations over smaller window partitions. These partitions are the quadtree blocks corresponding to the window. Although a block b in the underlying spatial database may cover several of the smaller window partitions, b is only retrieved once. As a result, the algorithm generates an optimal number of disk I/O requests to answer a window query (i.e., one request per covering block). The algorithm uses an auxiliary main-memory data structure, called the active border, which requires additional storage of O(n) for a window query of size n × n. An analysis of the algorithm's execution time and space requirements is given, as are some experimental results. 1 Introduction Because of the large volume of spatial databases, spatial access methods are usually used to...
Improving Raster Image Run-Length Encoding Using Data Order
, 2001
We examine the technique of run-length encoding in combination with data order, where our attention is focused on good performance of image operations such as rotation, reflection, and zooming. To this end we develop a new type of data order that supports these operations well and allows them to be performed on a variant of a double-queue automaton directly on the compressed data stream.
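To make the interaction between run-length encoding and data order concrete, the sketch below encodes the same small image in row-major and column-major order and compares the number of runs. It uses only the two canonical orders, not the new data order the abstract refers to; the helper names `rle_encode`, `rle_decode`, `row_major`, and `column_major` are assumptions made for this example.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence as a list of (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Invert rle_encode: expand (value, run_length) pairs into a flat list."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def row_major(image):
    """Flatten a list-of-rows image row by row."""
    return [pixel for row in image for pixel in row]


def column_major(image):
    """Flatten a list-of-rows image column by column."""
    return [row[c] for c in range(len(image[0])) for row in image]


if __name__ == "__main__":
    # An image with vertical stripes: every row is 0, 1, 0, 1.
    stripes = [[0, 1, 0, 1] for _ in range(4)]
    print(len(rle_encode(row_major(stripes))))     # number of runs, row order
    print(len(rle_encode(column_major(stripes))))  # number of runs, column order
```

For the striped image, the column-major traversal produces far fewer runs than the row-major one, which is the sense in which the choice of data order affects how well run-length encoding compresses.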