Results 11–20 of 78
Scanning and traversing: maintaining data for traversals in a memory hierarchy
In Proceedings of the 10th Annual European Symposium on Algorithms, 2002. Cited by 31 (11 self).
Abstract: We study the problem of maintaining a dynamic ordered set subject to insertions, deletions, and traversals of k consecutive elements. This problem is trivially solved on a RAM and on a simple two-level memory hierarchy. We explore this traversal problem on more realistic memory models: the cache-oblivious model, which applies to unknown and multilevel memory hierarchies, and sequential-access models, where sequential block transfers are less expensive than random block transfers.
Architecture-Cognizant Divide and Conquer Algorithms
1999. Cited by 27 (5 self).
Abstract: Divide and conquer programs can achieve good performance on parallel computers and computers with deep memory hierarchies. We introduce architecture-cognizant divide and conquer algorithms, and explore how they can achieve even better performance. An architecture-cognizant algorithm has functionally-equivalent variants of the divide and/or combine functions, and a variant policy that specifies which variant to use at each level of recursion. An optimal variant policy is chosen for each target computer via experimentation. With h levels of recursion, an exhaustive search requires O(v^h) experiments (where v is the number of variants). We present a method based on dynamic programming that reduces this to O(h^c) (where c is typically a small constant) experiments for a class of architecture-cognizant programs. We verify our technique on two kernels (matrix multiply and 2D point Jacobi) using three architectures. Our technique improves performance by up to a factor of two, compared...
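The variant-policy search described in this abstract can be illustrated with a toy dynamic program. Assuming, purely for illustration, that a policy's measured cost decomposes into per-level costs plus interactions between adjacent levels (a simplification of the paper's experimental setup; `cost` and `interact` below are hypothetical stand-ins for measured runtimes), an optimal policy falls out of an O(h·v²) recurrence instead of enumerating all v^h policies:

```python
from itertools import product

def policy_cost(policy, cost, interact):
    """Total cost of a variant policy: per-level costs plus
    interaction costs between adjacent recursion levels."""
    h = len(policy)
    total = sum(cost(i, policy[i]) for i in range(h))
    total += sum(interact(i, policy[i], policy[i + 1]) for i in range(h - 1))
    return total

def best_policy_dp(h, v, cost, interact):
    """Bottom-up DP over levels: best[j] = cheapest cost of the suffix
    of levels when the current level uses variant j. O(h * v^2) work."""
    best = [cost(h - 1, j) for j in range(v)]
    choice = [[0] * v for _ in range(h)]
    for i in range(h - 2, -1, -1):
        new = []
        for j in range(v):
            jc = min(range(v), key=lambda c: interact(i, j, c) + best[c])
            choice[i][j] = jc
            new.append(cost(i, j) + interact(i, j, jc) + best[jc])
        best = new
    j = min(range(v), key=lambda j: best[j])
    policy, total = [j], best[j]
    for i in range(h - 1):
        j = choice[i][j]
        policy.append(j)
    return total, policy

# Toy cost model: hypothetical numbers standing in for measured runtimes.
cost = lambda i, j: (i + 1) * (j % 3) + 1.0
interact = lambda i, j, c: 0.5 if j != c else 0.0

h, v = 5, 3
total, policy = best_policy_dp(h, v, cost, interact)
# Cross-check against exhaustive enumeration of all v^h policies.
brute = min(policy_cost(p, cost, interact) for p in product(range(v), repeat=h))
assert abs(total - brute) < 1e-9
```

The exhaustive check is only feasible here because h and v are tiny; the point of the paper's method is precisely to avoid it.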
Cache Performance Analysis of Traversals and Random Accesses
In Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms, 1999. Cited by 24 (0 self).
Abstract: This paper describes a model for studying the cache performance of algorithms in a direct-mapped cache. Using this model, we analyze the cache performance of several commonly occurring memory access patterns: (i) sequential and random memory traversals, (ii) systems of random accesses, and (iii) combinations of each. For each of these, we give exact expressions for the number of cache misses per memory access in our model. We illustrate the application of these analyses by determining the cache performance of two algorithms: the traversal of a binary search tree and the counting of items in a large array. Trace-driven cache simulations validate that our analyses accurately predict cache performance.
1 Introduction: The concrete analysis of algorithms has a long and rich history. It has played an important role in understanding the performance of algorithms in practice. Traditional concrete analysis of algorithms is interested in approximating as closely as possible the number of "cost...
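The kind of analysis this abstract describes can be sanity-checked by simulation. The sketch below is not the paper's model, just an illustration: it counts misses of a direct-mapped cache for a sequential traversal versus uniformly random accesses. The sequential pattern misses once per block (rate 1/B), while random accesses over N/B blocks on C lines miss at a rate of roughly 1 − C·B/N:

```python
import random

def direct_mapped_misses(addresses, num_lines, block_size):
    """Count misses in a direct-mapped cache: each memory block maps to
    exactly one cache line (block index mod num_lines)."""
    lines = [None] * num_lines            # tag currently held by each line
    misses = 0
    for a in addresses:
        block = a // block_size
        line = block % num_lines
        if lines[line] != block:          # tag mismatch -> miss, then fill
            lines[line] = block
            misses += 1
    return misses

random.seed(0)
N, C, B = 1 << 16, 1 << 10, 16            # addresses, cache lines, block size

seq = list(range(N))                                  # sequential traversal
rnd = [random.randrange(N) for _ in range(N)]         # random accesses

seq_rate = direct_mapped_misses(seq, C, B) / N
rnd_rate = direct_mapped_misses(rnd, C, B) / N

# Sequential traversal misses exactly once per block: rate 1/B.
assert abs(seq_rate - 1 / B) < 1e-6
# Random accesses: each line is shared by (N/B)/C = 4 blocks, so a hit
# needs the previous access to that line to have hit the same block,
# giving a miss rate of about 1 - C*B/N = 3/4.
assert abs(rnd_rate - 0.75) < 0.02
```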
Engineering a cache-oblivious sorting algorithm
In Proc. 6th Workshop on Algorithm Engineering and Experiments, 2004. Cited by 23 (1 self).
Abstract: The cache-oblivious model of computation is a two-level memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache-oblivious model is automatically efficient in a multilevel memory model. Since the introduction of the cache-oblivious model by Frigo et al. in 1999, a number of algorithms and data structures in the model have been proposed and analyzed. However, less attention has been given to whether the nice theoretical properties of cache-oblivious algorithms carry over into practice. This paper is an algorithmic engineering study of cache-oblivious sorting. We investigate a number of implementation issues and parameter choices for the cache-oblivious sorting algorithm Lazy Funnelsort by empirical methods, and compare the final algorithm with Quicksort, the established standard for comparison-based sorting, as well as with recent cache-aware proposals. The main result is a carefully implemented cache-oblivious sorting algorithm, which we compare to the best implementation of Quicksort we can find; it competes very well for input residing in RAM, and outperforms Quicksort for input on disk.
A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using Program Instrumentation
2002. Cited by 23 (0 self).
Abstract: An experimental comparison of cache-aware and cache-oblivious static search tree algorithms is presented. Both cache-aware and cache-oblivious algorithms outperform classic binary search on large data sets because of their better utilization of cache memory. Cache-aware algorithms with implicit pointers perform best overall, but cache-oblivious algorithms do almost as well and do not have to be tuned to the memory block size as cache-aware algorithms require. Program instrumentation techniques are used to compare the cache misses and instruction counts for implementations of these algorithms.
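The standard cache-oblivious static search tree, and plausibly the kind compared in this study, uses the van Emde Boas layout: the complete tree is split at half its height, and the top subtree and each bottom subtree are stored contiguously, recursively, so a root-to-leaf path touches only O(log_B n) memory blocks for any block size B. A minimal sketch (the index arithmetic and names are my own, not taken from the paper):

```python
def veb_order(h):
    """Heap indices (1-based) of a complete binary tree of height h,
    listed in van Emde Boas memory order: the top half-height subtree
    first, then each bottom subtree, all laid out recursively."""
    if h == 1:
        return [1]
    top_h = h // 2
    order = veb_order(top_h)
    bot = veb_order(h - top_h)
    for leaf in range(1 << (top_h - 1), 1 << top_h):   # last row of top tree
        for root in (2 * leaf, 2 * leaf + 1):          # bottom-subtree roots
            for i in bot:                              # local -> global index
                d = i.bit_length() - 1
                order.append((root << d) | (i ^ (1 << d)))
    return order

def build_veb_tree(sorted_keys):
    """Complete BST over sorted_keys (length must be 2**h - 1), stored in
    vEB order. Returns (array, pos) where pos[i] locates heap node i."""
    n = len(sorted_keys)
    keys = [None] * (n + 1)                            # heap order, 1-based
    def fill(i, lo, hi):
        if lo < hi:
            mid = (lo + hi) // 2
            keys[i] = sorted_keys[mid]
            fill(2 * i, lo, mid)
            fill(2 * i + 1, mid + 1, hi)
    fill(1, 0, n)
    order = veb_order(n.bit_length())
    pos = [0] * (n + 1)
    for slot, i in enumerate(order):
        pos[i] = slot
    return [keys[i] for i in order], pos

def veb_search(arr, pos, target):
    """Root-to-leaf search; nearby levels of the path share cache blocks."""
    i, n = 1, len(arr)
    while i <= n:
        k = arr[pos[i]]
        if target == k:
            return True
        i = 2 * i + (target > k)                       # left or right child
    return False

keys = list(range(0, 30, 2))                           # 15 keys: 0, 2, ..., 28
arr, pos = build_veb_tree(keys)
assert all(veb_search(arr, pos, k) for k in keys)
assert not any(veb_search(arr, pos, k) for k in range(1, 30, 2))
```

A cache-aware alternative would instead pack subtrees of exactly the hardware block size, which is where the tuning requirement mentioned in the abstract comes from.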
High-Performance Algorithm Engineering for Computational Phylogenetics
In J. Supercomputing, 2002. Cited by 22 (7 self).
Abstract: A phylogeny is the evolutionary history of a group of organisms; systematists (and other biologists) attempt to reconstruct this history from various forms of data about contemporary organisms. Phylogeny reconstruction is a crucial step in the understanding of evolution as well as an important tool in biological, pharmaceutical, and medical research. Phylogeny reconstruction from molecular data is very difficult: almost all optimization models give rise to NP-hard (and thus computationally intractable) problems. Yet approximations must be of very high quality in order to avoid outright biological nonsense. Thus many biologists have been willing to run farms of processors for many months in order to analyze just one dataset. High-performance algorithm engineering offers a battery of tools that can reduce, sometimes spectacularly, the running time of existing phylogenetic algorithms, as well as help designers produce better algorithms. We present an overview of algorithm engineering techniques, illustrating them with an application to the "breakpoint analysis" method of Sankoff et al., which resulted in the GRAPPA software suite. GRAPPA demonstrated a speedup in running time by over eight orders of magnitude over the original implementation on a variety of real and simulated datasets. We show how these algorithmic engineering techniques are directly applicable to a large variety of challenging combinatorial problems in computational biology.
Designing Practical Efficient Algorithms for Symmetric Multiprocessors (Extended Abstract)
In Algorithm Engineering and Experimentation (ALENEX'99), 1999. Cited by 20 (0 self).
Abstract: Symmetric multiprocessors (SMPs) dominate the high-end server market and are currently the primary candidate for constructing large-scale multiprocessor systems. Yet, the design of efficient parallel algorithms for this platform currently poses several challenges. In this paper, we present a computational model for designing efficient algorithms for symmetric multiprocessors. We then use this model to create efficient solutions to two widely different types of problems: linked-list prefix computations and generalized sorting. Our novel algorithm for prefix computations builds upon the sparse ruling set approach of Reid-Miller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probability...
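The sparse ruling set approach mentioned in this abstract can be sketched as follows: randomly chosen "rulers" split the linked list into sublists; each sublist can be walked independently (in parallel on an SMP) to compute local prefix sums, and a short sequential pass over just the rulers stitches the sublists together. Below is a single-threaded Python sketch of the decomposition only; the actual parallel scheduling, memory-access counts, and complexity bounds of the paper are elided, and the helper names are my own:

```python
import random

def list_prefix_sums(nxt, val, head, num_rulers):
    """Prefix sums over a linked list given as nxt[i] (successor index,
    -1 at the tail) and val[i], using a sparse-ruling-set decomposition."""
    n = len(val)
    rulers = set(random.sample(range(n), num_rulers))
    rulers.add(head)                          # the head always starts a sublist
    # Phase 1 (parallelizable): from each ruler, walk to the next ruler,
    # recording local (within-sublist) prefix sums.
    local = [0] * n
    succ_ruler = {}                           # ruler -> (next ruler, sublist sum)
    for r in rulers:
        s, i = 0, r
        while True:
            s += val[i]
            local[i] = s
            i = nxt[i]
            if i == -1 or i in rulers:
                succ_ruler[r] = (i, s)
                break
    # Phase 2 (sequential, but only over the few rulers): accumulate offsets.
    offset = {head: 0}
    r = head
    while True:
        nr, s = succ_ruler[r]
        if nr == -1:
            break
        offset[nr] = offset[r] + s
        r = nr
    # Phase 3 (parallelizable): add each sublist's offset to its local sums.
    out = [0] * n
    for r, off in offset.items():
        i = r
        while True:
            out[i] = off + local[i]
            i = nxt[i]
            if i == -1 or i in rulers:
                break
    return out

# Build a random list over 50 nodes and check against a sequential walk.
random.seed(3)
n = 50
perm = list(range(n)); random.shuffle(perm)   # list order is a permutation
nxt = [-1] * n
for a, b in zip(perm, perm[1:]):
    nxt[a] = b
val = [random.randrange(10) for _ in range(n)]
ps = list_prefix_sums(nxt, val, perm[0], num_rulers=5)
expect, s = [0] * n, 0
for i in perm:
    s += val[i]
    expect[i] = s
assert ps == expect
```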
Accessing Multiple Sequences Through Set Associative Caches
In Proc., 1999. Cited by 19 (4 self).
Abstract: The cache hierarchy prevalent in today's high-performance processors has to be taken into account in order to design algorithms which perform well in practice. We start from the empirical observation that external-memory algorithms often turn out to be good algorithms for cached memory. This is not self-evident since caches have a fixed and quite restrictive algorithm choosing the content of the cache. We investigate the impact of this restriction for the frequently occurring case of access to multiple sequences. We show that any access pattern to k = Θ(M/B) sequential data streams can be efficiently supported on an a-way set-associative cache with capacity M and line size B. The bounds are tight up to lower-order terms.
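The claim can be illustrated by simulating an a-way set-associative cache with LRU replacement within each set, reading k sequential streams round-robin from randomized start addresses (randomizing the offsets is what spreads the streams across the sets). The parameters below are arbitrary and the code is a sketch, not the paper's analysis:

```python
import random

class SetAssocCache:
    """a-way set-associative cache with LRU replacement within each set."""
    def __init__(self, capacity, block, ways):
        self.block, self.ways = block, ways
        self.sets = [[] for _ in range(capacity // (block * ways))]

    def access(self, addr):
        blk = addr // self.block
        s = self.sets[blk % len(self.sets)]
        if blk in s:
            s.remove(blk); s.append(blk)      # hit: move to MRU position
            return False
        if len(s) == self.ways:
            s.pop(0)                          # miss: evict least recently used
        s.append(blk)
        return True

random.seed(1)
M, B, A = 1 << 15, 64, 8                      # capacity, line size, associativity
k, steps = M // (4 * B), 20000                # k = M/(4B) sequential word streams

# Random start offsets spread the streams over the cache sets.
heads = [random.randrange(1 << 30) for _ in range(k)]
cache = SetAssocCache(M, B, A)
misses = 0
for t in range(steps):
    i = t % k                                  # round-robin over the streams
    misses += cache.access(heads[i])
    heads[i] += 4                              # advance stream by one 4-byte word
ideal = steps * 4 / B                          # one miss per B/4 word accesses
# Miss count stays within a small factor of the sequential-scan optimum,
# i.e. the streams coexist without systematic conflict thrashing.
assert misses < 3 * ideal
```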
A cache-efficient sorting algorithm for database and data mining computations using graphics processors
2005. Cited by 19 (4 self).
Abstract: We present a fast sorting algorithm using graphics processors (GPUs) that adapts well to database and data mining applications. Our algorithm uses texture mapping and blending functionalities of GPUs to implement an efficient bitonic sorting network. We take into account the communication bandwidth overhead to the video memory on the GPUs and reduce the memory bandwidth requirements. We also present strategies to exploit the tile-based computational model of GPUs. Our new algorithm has a memory-efficient data access pattern and we describe an efficient instruction dispatch mechanism to improve the overall sorting performance. We have used our sorting algorithm to accelerate join-based queries and stream mining algorithms. Our results indicate up to an order of magnitude improvement over prior CPU-based and GPU-based sorting algorithms.
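Setting aside the GPU-specific texture-mapping and blending machinery, the underlying bitonic sorting network can be sketched on the CPU. Each compare-exchange stage touches every element independently of the others, which is exactly the property that lets one stage map onto a single GPU rendering pass:

```python
def bitonic_sort(a):
    """In-place bitonic sorting network; len(a) must be a power of two.
    The `for i` loop is one data-parallel compare-exchange stage: on a
    GPU, every iteration would run as an independent fragment."""
    n = len(a)
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:             # compare-exchange distance within a merge
            for i in range(n):
                partner = i ^ j
                if partner > i:   # handle each pair once
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

import random
random.seed(7)
data = [random.randrange(1000) for _ in range(256)]
assert bitonic_sort(data[:]) == sorted(data)
```

The network performs O(n log² n) comparisons, more than optimal comparison sorts, but its fixed, oblivious access pattern is what makes it attractive for the GPU memory model described in the abstract.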
Efficient Sorting Using Registers and Caches
In Proceedings of the 4th Workshop on Algorithm Engineering (WAE 2000), 2000. Cited by 19 (5 self).
Abstract: Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features...