Efficient Sorting Using Registers and Caches
- In Proceedings of the 4th Workshop on Algorithm Engineering (WAE 2000), 2000
Cited by 18 (5 self)
Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features...
The cost of cache-oblivious searching
- In Proc. 44th Ann. Symp. on Foundations of Computer Science (FOCS), 2003
Cited by 17 (7 self)
This paper gives tight bounds on the cost of cache-oblivious searching. The paper shows that no cache-oblivious search structure can guarantee a search performance of fewer than $\lg e \log_B N$ memory transfers between any two levels of the memory hierarchy. This lower bound holds even if all of the block sizes are limited to be powers of 2. The paper gives modified versions of the van Emde Boas layout, where the expected number of memory transfers between any two levels of the memory hierarchy is arbitrarily close to $[\lg e + O(\lg\lg B/\lg B)]\log_B N + O(1)$. This factor approaches $\lg e \approx 1.443$ as $B$ increases. The expectation is taken over the random placement in memory of the first element of the structure. Because searching in the disk-access machine (DAM) model can be performed in $\log_B N + O(1)$ block transfers, this result establishes a separation between the (2-level) DAM model and the cache-oblivious model. The DAM model naturally extends to $k$ levels. The paper also shows that as $k$ grows, the search costs of the optimal $k$-level DAM search structure and the optimal cache-oblivious search structure rapidly converge. This result demonstrates that for a multilevel memory hierarchy, a simple cache-oblivious structure almost replicates the performance of an optimal parameterized $k$-level DAM structure.
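The plain, unmodified van Emde Boas layout that the paper starts from can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's modified randomized layout. It assumes a complete tree of 2**h - 1 sorted keys. It gives the top recursive subtree ⌊h/2⌋ levels, although conventions vary. The function names `veb_order`, `build` and `search` are hypothetical. `search` also counts how many distinct blocks of `block_size` slots one root-to-node search touches, which is the quantity the bounds above measure.

```python
def veb_order(height):
    """BFS indices (root = 1, children 2i and 2i + 1) of a complete binary
    tree with `height` levels, listed in van Emde Boas order."""
    order = []

    def layout(root, h):
        if h == 1:
            order.append(root)
            return
        top_h = h // 2            # levels in the top subtree (one convention)
        bottom_h = h - top_h      # levels in each bottom subtree
        layout(root, top_h)       # top subtree is stored first, contiguously
        first = root << top_h     # leftmost node top_h levels below root
        for child_root in range(first, first + (1 << top_h)):
            layout(child_root, bottom_h)   # then each bottom subtree in turn

    layout(1, height)
    return order


def build(keys):
    """Store sorted `keys` (length 2**h - 1) in van Emde Boas order.
    Returns (array, pos, h), where pos maps a BFS index to its array slot."""
    n = len(keys)
    h = n.bit_length()
    assert n == (1 << h) - 1, "sketch assumes a complete tree"
    pos = {bfs: slot for slot, bfs in enumerate(veb_order(h))}
    array = [None] * n
    for bfs, slot in pos.items():
        d = bfs.bit_length() - 1                      # depth of the node
        p = bfs - (1 << d)                            # position in its level
        rank = (2 * p + 1) * (1 << (h - 1 - d)) - 1   # in-order rank
        array[slot] = keys[rank]
    return array, pos, h


def search(array, pos, h, target, block_size):
    """Binary-search-tree lookup; returns (found, distinct blocks touched)."""
    bfs, blocks = 1, set()
    for _ in range(h):
        slot = pos[bfs]
        blocks.add(slot // block_size)
        key = array[slot]
        if key == target:
            return True, len(blocks)
        bfs = 2 * bfs + (target > key)   # go left (2i) or right (2i + 1)
    return False, len(blocks)
```

For example, after `array, pos, h = build(list(range(1023)))`, the call `search(array, pos, h, 500, 16)` returns `(True, b)`. Here `b` is the number of 16-slot blocks on the search path. Because each recursive subtree occupies a contiguous range of slots, small subtrees share blocks, and that is where the layout's locality comes from.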
Caches As Filters: A Framework for the Analysis of Caching Systems
- 2001
Cited by 15 (4 self)
This dissertation describes the Cache Filter Model, an analytical framework for cache system analysis. This framework provides a language and formal notation that enables researchers to reason and communicate about systems in an insightful new way. There are four major components that form the framework. First, the TSpec notation is a formal way for researchers to communicate with clarity about memory references generated by a processor. Second, the concept of an equivalence class of memory references provides an abstraction for eliminating artifacts due to chance address bindings or specific inputs. Third, the functional cache filter model uses the TSpec notation and equivalence class concept to allow designers to more clearly understand the effects of cache systems on particular memory references. Fourth, new metrics provide more insight into cache system behavior than current measures such as hit rate or average memory access time. This dissertation presents the cache filter framework in detail and demonstrates its use on several example kernels.
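The framework's starting point can be shown with a small simulation. A cache turns the reference stream it receives into a smaller stream of misses for the next level. This sketch rests on several assumptions: fully associative LRU caches, byte addresses, and the hypothetical function name `lru_filter`. It is not the dissertation's TSpec notation or its metrics.

```python
from collections import OrderedDict


def lru_filter(addresses, num_lines, line_size):
    """Pass byte addresses through a fully associative LRU cache and return
    the references it forwards to the next level (one per miss)."""
    cache = OrderedDict()            # line number -> None, oldest first
    misses = []
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)              # hit: now most recently used
        else:
            misses.append(line * line_size)      # miss: forwarded downstream
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)        # evict least recently used


    return misses


# Two sequential passes over 1024 eight-byte elements (8 KiB).
trace = [i * 8 for i in range(1024)] * 2
l1_out = lru_filter(trace, num_lines=8, line_size=64)     # 512 B "L1"
l2_out = lru_filter(l1_out, num_lines=256, line_size=64)  # 16 KiB "L2"
print(len(trace), len(l1_out), len(l2_out))
```

Chaining the calls models a hierarchy in which each level sees only what the previous level let through. In this run the small L1 misses once per line on both passes. The larger L2 then absorbs the second pass entirely, which a single hit rate per level does not convey.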
Efficient sorting using registers and caches
- WAE, Workshop on Algorithm Engineering, Lecture Notes in Computer Science, 2000
Cited by 7 (0 self)
Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line fetching, and streaming behavior. Inadequate models lead to poor algorithmic choices and an incomplete understanding of algorithm behavior on real machines. A key step toward developing better models is to quantify the performance effects of features not reflected in the models. This paper explores the effect of memory system features on sorting performance. We introduce a new cache-conscious sorting algorithm, R-merge, which achieves better performance in practice than algorithms that are superior in the theoretical models. R-merge is designed to minimize memory stall cycles rather than cache misses by considering features common to many system designs.
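The abstract does not describe R-merge's internals. As a point of reference, here is a generic two-phase cache-conscious merge sort in Python. It is not R-merge. It sorts runs whose length is chosen, by assumption, so that one run fits in cache, and then merges all runs in a single k-way pass. The name `cache_conscious_sort` and the `run_length` parameter are hypothetical. Python cannot expose the register-level and memory-stall effects the paper targets.

```python
import heapq


def cache_conscious_sort(data, run_length):
    """Two-phase merge sort: sort cache-sized runs, then k-way merge them."""
    # Phase 1: each run is small enough (by assumption) to stay cache-resident
    # while it is sorted.
    runs = [sorted(data[i:i + run_length])
            for i in range(0, len(data), run_length)]
    # Phase 2: one k-way merge streams every run through memory exactly once.
    return list(heapq.merge(*runs))
```

The design choice is to trade several merge passes over memory for one pass with many input streams. Whether that wins on a real machine depends on the very features the abstract lists: associativity, line fetching and streaming behavior.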
Cache-oblivious algorithms and data structures
- In SWAT, 2004
Cited by 7 (1 self)
In 1999, Frigo, Leiserson, Prokop and Ramachandran introduced the ideal-cache model as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the term cache-oblivious algorithms. Cache-oblivious algorithms are described as standard RAM algorithms with only one memory level, i.e., without any knowledge about memory hierarchies, but are analyzed in the two-level I/O model of Aggarwal and Vitter for an arbitrary memory and block size and an optimal off-line cache replacement strategy. The results are algorithms that automatically apply to multi-level memory hierarchies. This paper gives an overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al.
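Cost in the two-level I/O model is the number of block transfers between a fast memory of M elements and an unbounded slow memory. Blocks hold B elements, and an optimal off-line policy decides what to evict. Below is a minimal Python sketch of that accounting. It assumes a fully associative fast memory of M // B blocks and uses Belady's rule (evict the block whose next use is furthest in the future) as the optimal off-line policy. `io_count_opt` is a hypothetical name.

```python
def io_count_opt(trace, M, B):
    """Count block transfers for a trace of element indices, with a fast
    memory of M elements in blocks of B elements and Belady's optimal
    off-line replacement."""
    blocks = [x // B for x in trace]
    capacity = M // B
    # next_use[i]: position of the next access to blocks[i] after i (or inf).
    next_use = [float("inf")] * len(blocks)
    last_seen = {}
    for i in range(len(blocks) - 1, -1, -1):
        next_use[i] = last_seen.get(blocks[i], float("inf"))
        last_seen[blocks[i]] = i
    cache = {}        # resident block -> position of its next use
    transfers = 0
    for i, b in enumerate(blocks):
        if b not in cache:
            transfers += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)   # needed furthest ahead
                del cache[victim]
        cache[b] = next_use[i]
    return transfers
```

Scanning N = 1000 consecutive elements with B = 16 costs 63 transfers, that is, ⌈N/B⌉, for any M ≥ B. A cache-oblivious algorithm never mentions M or B in its code. They appear only in this kind of analysis.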
Precise Automatable Analytical Modeling of the Cache Behavior of Codes with Indirections
- Concurrency and Computation: Practice and Experience (2006), 2005
Cited by 5 (3 self)
The performance of memory hierarchies, in which caches play an essential role, is critical in today's general-purpose and embedded computing systems because of the growing memory bottleneck problem. Unfortunately, cache behavior is very unstable and difficult to predict. This is particularly true in the presence of irregular access patterns, which exhibit little locality. Such patterns are very common, for example, in applications in which pointers or compressed sparse matrices give rise to indirections. Nevertheless, cache behavior in the presence of irregular access patterns has not been widely studied. In this paper we present an extension of a systematic analytical modeling technique based on PMEs (probabilistic miss equations), previously developed by the authors, that allows the automated analysis of the cache behavior of codes with irregular access patterns resulting from indirections. The model generates very accurate predictions despite the irregularities and has very low computing requirements; it is the first model that combines these desirable characteristics with the ability to analyze this kind of code automatically. These properties enable the model to help drive compiler optimizations, as we show with an example.
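A deliberately simplified case illustrates the kind of estimate a probabilistic miss model produces. This is not the authors' PME formulation. The sketch makes three assumptions:

- the cache is direct-mapped, with one line per set;
- the array has `n_lines` lines, mapped to sets modulo `n_sets`;
- each intervening access goes to one of the other lines, chosen uniformly and independently, as with an indirection x[idx[i]] over uniformly random indices.

Under these assumptions, a reused line survives each intervening access with probability 1 - (n_lines/n_sets - 1)/(n_lines - 1). The function names are hypothetical, and the Monte Carlo routine exists only to check the formula.

```python
import random


def reuse_hit_probability(k, n_lines, n_sets):
    """P(a reused line is still cached) after k intervening uniform accesses
    to other lines, in a direct-mapped cache with n_sets one-line sets.
    Assumes n_lines is a multiple of n_sets."""
    conflicting = n_lines // n_sets - 1          # other lines in the same set
    p_evict = conflicting / (n_lines - 1)        # per intervening access
    return (1 - p_evict) ** k


def simulate(k, n_lines, n_sets, trials=5000, seed=0):
    """Monte Carlo estimate of the same probability (reused line = line 0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        evicted = False
        for _ in range(k):
            other = rng.randrange(1, n_lines)    # uniform over lines != 0
            if other % n_sets == 0:              # maps to line 0's set
                evicted = True
        hits += not evicted
    return hits / trials
```

For 1024 lines in a 256-set cache and 100 intervening accesses, the formula gives about 0.75, and the simulation agrees to within sampling error. Real indirection patterns are rarely this uniform.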
Scanning Multiple Sequences Via Cache Memory
- Algorithmica, 2003
Cited by 5 (0 self)
We consider the simple problem of scanning multiple sequences. There are k sequences of total length N which are to be scanned concurrently. One pointer into each sequence is maintained and an adversary specifies which pointer is to be advanced. The concept of scanning multiple sequences is ubiquitous in algorithms designed for hierarchical memory.
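A simulation of this setting shows why the number of sequences matters. This sketch is not the paper's algorithm or analysis. It assumes the k sequences are stored back to back and the cache is a fully associative LRU cache of M // B blocks. A schedule supplied by the caller stands in for the adversary. `scan_misses` is a hypothetical name.

```python
from collections import OrderedDict


def scan_misses(lengths, schedule, M, B):
    """Count block misses while advancing pointers into sequences stored
    back to back; schedule[t] is the sequence advanced at step t."""
    starts, offset = [], 0
    for n in lengths:
        starts.append(offset)
        offset += n
    pointers = [0] * len(lengths)
    cache, misses = OrderedDict(), 0          # LRU set of resident blocks
    for s in schedule:
        assert pointers[s] < lengths[s], "schedule advances past the end"
        block = (starts[s] + pointers[s]) // B
        pointers[s] += 1
        if block in cache:
            cache.move_to_end(block)
        else:
            misses += 1
            cache[block] = None
            if len(cache) > M // B:
                cache.popitem(last=False)     # evict least recently used
    return misses
```

Consider four sequences of 64 elements each, with B = 8 and M = 64. A round-robin schedule misses 32 times, once per block. Now take sixteen one-block sequences and leave room for only four blocks (M = 32). The same round-robin pattern then misses on all 128 advances, because k exceeds M/B.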
Automated and accurate cache behavior analysis for codes with irregular access patterns
- 12th Workshop on Compilers for Parallel Computers (CPC 2006), 2006
Cited by 5 (4 self)
The memory hierarchy plays an essential role in the performance of current computers, so good analysis tools that help predict and understand its behavior are required. Analytical modeling is the ideal basis for such tools if its traditional limitations in accuracy and scope of application are overcome. For example, while there has been extensive research on the modeling of codes with regular access patterns, less attention has been paid to codes with irregular patterns because they are more difficult to analyze. Nevertheless, many important applications exhibit this kind of pattern, and their lack of locality makes them more cache-demanding, which makes their study more relevant. In this paper we define the information requirements of an existing analytical model that can provide fast and accurate predictions of the cache behavior of codes with irregular access patterns. In addition, we describe the integration of the model in a research compiler oriented to automatic kernel recognition in scientific codes. The paper shows how to exploit the powerful information-gathering capabilities provided by the compiler to allow automated modeling of loop-oriented scientific codes.
An Optimal Cache-Oblivious Priority Queue and its Application to Graph Algorithms
- SIAM Journal on Computing, 2007
Cited by 2 (0 self)
We develop an optimal cache-oblivious priority queue data structure, supporting insertion, deletion, and delete-min operations in $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ amortized memory transfers, where $M$ and $B$ are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hierarchy. In a cache-oblivious data structure, $M$ and $B$ are not used in the description of the structure. Our structure is as efficient as several previously developed external memory (cache-aware) priority queue data structures, which all rely crucially on knowledge about $M$ and $B$. Priority queues are a critical component in many of the best known external memory graph algorithms, and using our cache-oblivious priority queue we develop several cache-oblivious graph algorithms.
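A worked instance shows the size of this bound. The sizes are hypothetical and measured in elements: N = 2^30 items, M = 2^20 and B = 2^10.

$$\frac{M}{B} = 2^{10},\qquad \frac{N}{B} = 2^{20},\qquad \log_{M/B}\frac{N}{B} = \log_{2^{10}} 2^{20} = 2,$$

$$\frac{1}{B}\log_{M/B}\frac{N}{B} = \frac{2}{2^{10}} \approx 0.002 .$$

Ignoring the constant hidden by the $O$, each operation costs about 0.002 amortized memory transfers, or roughly one block transfer per 512 operations. The structure achieves this without knowing $M$ or $B$.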
Cache Behavior Modelling for Codes Involving Banded Matrices
- Proc. 19th Intl. Workshop on Languages and Compilers for Parallel Computing
Cited by 1 (0 self)
Understanding and improving the memory hierarchy behavior is one of the most important challenges in current architectures. Analytical models are a good approach for this, but they have traditionally been limited by either their restricted scope of application or their lack of accuracy. Most models can only predict the cache behavior of codes that generate regular access patterns. The Probabilistic Miss Equation (PME) model, however, can accurately model the cache behavior of codes with irregular access patterns due to data-dependent conditionals or indirections. Its main limitation is that it only considers irregular access patterns that exhibit a uniform distribution of the accesses. In this work, we extend the PME model to enable the analysis of more realistic and complex irregular accesses. Namely, we consider indirections due to the compressed storage of most real banded matrices.
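A sparse matrix-vector product over a compressed banded matrix shows the indirections the abstract refers to. This sketch assumes compressed sparse row (CSR) storage, which is one common compressed format; the abstract does not say which format is modeled. The function names are hypothetical. The read `x[col_idx[k]]` is the indirection. Its addresses depend on the contents of `col_idx`, but for a banded matrix they stay within a window of `lower + upper + 1` entries around the diagonal. They are not spread uniformly.

```python
def banded_csr(n, lower, upper):
    """CSR arrays (row_ptr, col_idx, values) of an n x n banded matrix with
    `lower` sub-diagonals and `upper` super-diagonals; values are all 1.0."""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j in range(max(0, i - lower), min(n, i + upper + 1)):
            col_idx.append(j)
            values.append(1.0)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def spmv(row_ptr, col_idx, values, x):
    """y = A x for A in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]   # indirect, data-dependent read
        y[i] = acc
    return y
```

For a 5 x 5 tridiagonal matrix (`lower = upper = 1`) and `x = [1, 2, 3, 4, 5]`, `spmv` returns `[3.0, 6.0, 9.0, 12.0, 9.0]`.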

