The cost of cache-oblivious searching
In Proc. 44th Ann. Symp. on Foundations of Computer Science (FOCS), 2003
Cited by 18 (8 self)
This paper gives tight bounds on the cost of cache-oblivious searching. The paper shows that no cache-oblivious search structure can guarantee a search performance of fewer than $\lg e \, \log_B N$ memory transfers between any two levels of the memory hierarchy. This lower bound holds even if all of the block sizes are limited to be powers of 2. The paper gives modified versions of the van Emde Boas layout, where the expected number of memory transfers between any two levels of the memory hierarchy is arbitrarily close to $[\lg e + O(\lg\lg B / \lg B)] \log_B N + O(1)$. This factor approaches $\lg e \approx 1.443$ as $B$ increases. The expectation is taken over the random placement in memory of the first element of the structure. Because searching in the disk-access machine (DAM) model can be performed in $\log_B N + O(1)$ block transfers, this result establishes a separation between the (2-level) DAM model and the cache-oblivious model. The DAM model naturally extends to $k$ levels. The paper also shows that as $k$ grows, the search costs of the optimal $k$-level DAM search structure and the optimal cache-oblivious search structure rapidly converge. This result demonstrates that for a multilevel memory hierarchy, a simple cache-oblivious structure almost replicates the performance of an optimal parameterized $k$-level DAM structure.
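For readers unfamiliar with the van Emde Boas layout mentioned in the abstract above, the following is a minimal sketch of the standard, unmodified layout for a complete binary tree. It is not the paper's modified version, and the function name `veb_order` and the breadth-first node numbering (root 1, children `2i` and `2i + 1`) are assumptions made for illustration.

```python
def veb_order(height, root=1):
    """List the nodes of a complete binary tree in van Emde Boas order.

    Nodes are named by breadth-first index (hypothetical convention):
    the root is 1 and node i has children 2*i and 2*i + 1.
    `height` is the number of levels in the subtree rooted at `root`.
    """
    if height == 1:
        return [root]
    top_h = height // 2          # levels in the top subtree
    bottom_h = height - top_h    # levels in each bottom subtree
    # Lay out the top subtree first, contiguously.
    order = veb_order(top_h, root)
    # Then lay out each bottom subtree, left to right. The bottom
    # subtrees hang off the leaves of the top subtree.
    first_leaf = root << (top_h - 1)
    for leaf in range(first_leaf, first_leaf + (1 << (top_h - 1))):
        order += veb_order(bottom_h, 2 * leaf)
        order += veb_order(bottom_h, 2 * leaf + 1)
    return order


if __name__ == "__main__":
    print(veb_order(3))  # [1, 2, 4, 5, 3, 6, 7]
```

Because every recursive subtree occupies a contiguous range of positions, a root-to-leaf path touches few blocks for any block size, which is the property the search bounds above quantify.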
A computational study of external-memory BFS algorithms
In SODA, 2006
Cited by 18 (4 self)
Breadth First Search (BFS) traversal is an archetype for many important graph problems. However, computing a BFS level decomposition for massive graphs was considered nonviable so far, because of the large number of I/Os it incurs. This paper presents the first experimental evaluation of recent external-memory BFS algorithms for general graphs. With our STXXL-based implementations exploiting pipelining and disk parallelism, we were able to compute the BFS level decomposition of a web-crawl-based graph of around 130 million nodes and 1.4 billion edges in less than 4 hours using a single disk and 2.3 hours using 4 disks. We demonstrate that some rather simple external-memory algorithms perform significantly better (minutes as compared to hours) than internal-memory BFS, even if more than half of the input resides internally.
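As a point of reference for the term "BFS level decomposition" used above, here is a minimal internal-memory sketch. It is not one of the external-memory algorithms the paper evaluates and does not use STXXL; the function name `bfs_levels` and the adjacency-dictionary input format are assumptions for illustration.

```python
from collections import deque


def bfs_levels(adj, source):
    """Map each vertex reachable from `source` to its BFS level.

    `adj` is a dict from vertex to an iterable of neighbours
    (hypothetical input format chosen for this sketch).
    """
    level = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in level:          # first visit fixes the level
                level[v] = level[u] + 1
                queue.append(v)
    return level


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(bfs_levels(graph, 0))
```

The `v not in level` lookup is a random access per edge, which is the access pattern that becomes expensive once the graph no longer fits in main memory.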
Subquadratic algorithms for 3SUM
In Proc. 9th Worksh. Algorithms & Data Structures, LNCS 3608, 2005
Cited by 13 (2 self)
We obtain subquadratic algorithms for 3SUM on integers and rationals in several models. On a standard word RAM with $w$-bit words, we obtain a running time of $O(n^2 / \max\{ w / \lg^2 w,\ \lg^2 n / (\lg\lg n)^2 \})$. In the circuit RAM with one nonstandard $AC^0$ operation, we obtain $O(n^2 / (w^2 / \lg^2 w))$. In external memory, we achieve $O(n^2 / (MB))$, even under the standard assumption of data indivisibility. Cache-obliviously, we obtain a running time of $O(n^2 / (MB / \lg^2 M))$. In all cases, our speedup is almost quadratic in the parallelism the model can afford, which may be the best possible. Our algorithms are Las Vegas randomized; time bounds hold in expectation, and in most cases, with high probability.
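For context, the sketch below shows the classical quadratic-time approach to 3SUM (sort, then scan with two pointers), which is the baseline that subquadratic bounds such as those above improve on. It is not the paper's algorithm; the function name `has_3sum` and the convention that the three entries must sit at distinct positions and sum to zero are assumptions for illustration.

```python
def has_3sum(values):
    """Return True if three entries at distinct positions sum to zero.

    Classical O(n^2) baseline: sort once, then for each smallest
    element sweep two pointers inward over the remaining suffix.
    """
    a = sorted(values)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1   # sum too small: move to a larger middle value
            else:
                hi -= 1   # sum too large: move to a smaller top value
    return False


if __name__ == "__main__":
    print(has_3sum([-5, 1, 4, 10]))
    print(has_3sum([1, 2, 3]))
```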
Cache-aware and cache-oblivious adaptive sorting
In Proc. 32nd International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science, 2005
Cited by 11 (4 self)
Two new adaptive sorting algorithms are introduced which perform an optimal number of comparisons with respect to the number of inversions in the input. The first algorithm is based on a new linear time reduction to (non-adaptive) sorting. The second algorithm is based on a new division protocol for the GenericSort algorithm by Estivill-Castro and Wood. From both algorithms we derive I/O-optimal cache-aware and cache-oblivious adaptive sorting algorithms. These are the first I/O-optimal adaptive sorting algorithms.
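The abstract above measures presortedness by the number of inversions, that is, pairs of positions i < j whose elements are out of order. The sketch below counts inversions with a standard merge sort. It only illustrates the measure and is not either of the paper's adaptive algorithms; the function name `count_inversions` is an assumption for illustration.

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of pairs i < j with seq[i] > seq[j])."""
    items = list(seq)
    if len(items) <= 1:
        return items, 0
    mid = len(items) // 2
    left, inv_left = count_inversions(items[:mid])
    right, inv_right = count_inversions(items[mid:])
    merged = []
    inv = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes every remaining element of left,
            # so it forms an inversion with each of them.
            merged.append(right[j])
            j += 1
            inv += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv


if __name__ == "__main__":
    print(count_inversions([3, 1, 2]))  # ([1, 2, 3], 2)
```

A sorted input has zero inversions and a reversed input of length n has n(n-1)/2, so the count spans the range over which an adaptive sort is expected to vary its work.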
Pointerless Implementation of Hierarchical Simplicial Meshes and Efficient Neighbor Finding in Arbitrary Dimensions
In Proc. International Meshing Roundtable (IMR 2004), 2004
Cited by 8 (2 self)
We describe a pointerless representation of hierarchical regular simplicial meshes, based on a bisection approach proposed by Maubach. We introduce a new labeling scheme, called an LPT code, that uniquely encodes each simplex of the hierarchy. We present rules to efficiently compute the neighbors of a given simplex through the use of these codes. In addition, we show how to traverse the associated tree and how to answer point location and interpolation queries. Our system works in arbitrary dimensions.
Basic Network Creation Games
2010
Cited by 5 (0 self)
We study a natural network creation game, in which each node locally tries to minimize its local diameter or its local average distance to other nodes, by swapping one incident edge at a time. The central question is what structure the resulting equilibrium graphs have, in particular, how well they globally minimize diameter. For the local-average-distance version, we prove an upper bound of $2^{O(\sqrt{\lg n})}$, a lower bound of 3, a tight bound of exactly 2 for trees, and give evidence of a general polylogarithmic upper bound. For the local-diameter version, we prove a lower bound of $\Omega(\sqrt{n})$, and a tight upper bound of 3 for trees. All of our upper bounds apply equally well to previously extensively studied network creation games, both in terms of the diameter metric described above and the previously studied price of anarchy (which are related by constant factors). In surprising contrast, our model has no parameter α for the link creation cost, so our results automatically apply for all values of α without additional effort; furthermore, equilibrium can be checked in polynomial time in our model, unlike previous models. Our perspective enables simpler and more general proofs that get at the heart of network creation games.
Cache-Oblivious Databases: Limitations and Opportunities
2008
Cited by 5 (2 self)
Cache-oblivious techniques, proposed in the theory community, have optimal asymptotic bounds on the amount of data transferred between any two adjacent levels of an arbitrary memory hierarchy. Moreover, this optimal performance is achieved without any hardware-platform-specific tuning. These properties are highly attractive to autonomous databases, especially because the hardware architectures are becoming increasingly complex and diverse. In this paper, we present our design, implementation, and evaluation of the first cache-oblivious in-memory query processor, EaseDB. Moreover, we discuss the inherent limitations of the cache-oblivious approach as well as the opportunities given by the upcoming hardware architectures. Specifically, a cache-oblivious technique usually requires sophisticated algorithm design to achieve a comparable performance to its cache-conscious counterpart. Nevertheless, this development-time effort is compensated by the automaticity of performance achievement and the reduced ownership cost. Furthermore, this automaticity enables cache-oblivious techniques to outperform their cache-conscious counterparts in multithreading processors.
A Cache-Aware Algorithm for PDEs on Hierarchical Data Structures Based on Space-Filling Curves
2006
Cited by 4 (0 self)
Competitive numerical algorithms for solving partial differential equations have to work with the most efficient numerical methods, like multigrid and adaptive grid refinement, and thus with hierarchical data structures. Unfortunately, in most implementations, hierarchical data, typically stored in trees, cause a non-negligible overhead in data access. To overcome this quandary (numerical efficiency versus efficient implementation), our algorithm uses space-filling curves to build up data structures which are processed linearly. In fact, the only kind of data structure used in our implementation is stacks. Thus, data access becomes very fast, even faster than the common access to non-hierarchical data stored in matrices, and, in particular, cache misses are reduced considerably. Furthermore, the implementation of multigrid cycles and/or higher order discretizations as well as the parallelization of the whole algorithm become very easy and straightforward on these data structures.
On the limits of cache-oblivious matrix transposition
In Proc. of 2nd Symp. of Trustworthy Global Computing, 2006
Cited by 3 (3 self)
Intuitively, a cache-oblivious algorithm implements an adaptive strategy which runs efficiently on any memory hierarchy without requiring previous knowledge of the parameters of the hierarchy. For this reason, cache-obliviousness is an attractive feature of an algorithm meant for a global computing environment, where software may be run on a variety of different platforms for load management purposes. In this paper we present a negative result on cache-obliviousness, namely, we show that an optimal cache-oblivious algorithm for the fundamental primitive of matrix transposition cannot exist without the tall cache assumption, which forces the (unknown) parameters of the memory hierarchy to satisfy a certain technical relation. Our contribution specializes the result of Brodal and Fagerberg for general permutations to matrix transposition, and provides further evidence that the tall cache assumption is often necessary to attain optimality in the context of cache-oblivious algorithms.
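To make the primitive in the abstract above concrete, here is a minimal sketch of the well-known recursive divide-and-conquer matrix transposition, the kind of cache-oblivious algorithm whose optimality is usually argued under the tall cache assumption. It is not taken from the paper; the function name `transpose`, the list-of-lists representation, and the `base` cutoff are assumptions for illustration, and Python lists do not model real cache behaviour, so the sketch shows only the access pattern.

```python
def transpose(src, dst, r0, r1, c0, c1, base=8):
    """Write the transpose of src[r0:r1][c0:c1] into dst.

    Recursively halves the longer side of the submatrix until both
    sides are at most `base` (assumed >= 1), then copies directly.
    No cache size or block size appears anywhere in the code.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        for i in range(r0, r1):
            for j in range(c0, c1):
                dst[j][i] = src[i][j]
    elif rows >= cols:
        mid = r0 + rows // 2
        transpose(src, dst, r0, mid, c0, c1, base)
        transpose(src, dst, mid, r1, c0, c1, base)
    else:
        mid = c0 + cols // 2
        transpose(src, dst, r0, r1, c0, mid, base)
        transpose(src, dst, r0, r1, mid, c1, base)


if __name__ == "__main__":
    m = [[i * 5 + j for j in range(5)] for i in range(3)]
    out = [[None] * 3 for _ in range(5)]
    transpose(m, out, 0, 3, 0, 5, base=2)
    print(out)
```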
Brief announcement: Low depth cache-oblivious sorting
In ACM SPAA. ACM, 2009
Cited by 2 (0 self)
Cache-oblivious algorithms have the advantage of achieving good sequential cache complexity across all levels of a multi-level cache hierarchy, regardless of the specifics (cache size and cache line size) of each level. In this paper, we describe cache-oblivious sorting algorithms with optimal work, optimal cache complexity and polylogarithmic depth. Using known mappings, these lead to low cache complexities on shared-memory multiprocessors with a single level of private caches or a single shared cache. Moreover, the low cache complexities extend to shared-memory multiprocessors with common configurations of multi-level caches. The key factor in the low cache complexity on multiprocessors is the low depth of the algorithms we propose.