• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Efficient hash probes on modern processors (2006)

by K A Ross
Add To MetaCart

Tools

Sorted by:
Results 1 - 5 of 5

ABSTRACT Adaptive Aggregation on Chip Multiprocessors

by John Cieslewicz
"... The recent introduction of commodity chip multiprocessors requires that the design of core database operations be carefully examined to take full advantage of on-chip parallelism. In this paper we examine aggregation in a multi-core environment, the Sun UltraSPARC T1, a chip multiprocessor with eigh ..."
Abstract - Cited by 14 (3 self) - Add to MetaCart
The recent introduction of commodity chip multiprocessors requires that the design of core database operations be carefully examined to take full advantage of on-chip parallelism. In this paper we examine aggregation in a multi-core environment, the Sun UltraSPARC T1, a chip multiprocessor with eight cores and a shared L2 cache. Aggregation is an important aspect of query processing that is seemingly easy to understand and implement. Our research, however, demonstrates that a chip multiprocessor adds new dimensions to understanding hash-based aggregation performance— concurrent sharing of aggregation data structures and contentious accesses to frequently used values. We also identify a trade off between private data structures assigned to each thread versus shared data structures for aggregation. Depending on input characteristics, different aggregation strategies are optimal and choosing the wrong strategy can result in a performance penalty of over an order of magnitude. We provide a thorough explanation of the factors affecting aggregation performance on chip multiprocessors and identify three key input characteristics that dictate performance: (1) average run length of identical group-by values, (2) locality of references to the aggregation hash table, and (3) frequency of repeated accesses to the same hash table location. We then introduce an adaptive aggregation operator that performs lightweight sampling of the input to choose the correct aggregation strategy with high accuracy. Our experiments verify that our adaptive algorithm chooses the highest performing aggregation strategy on a number of common input distributions. 1.

Vectorized Data Processing on the Cell Broadband Engine

by Sándor Héman, Niels Nes, Marcin Zukowski, Peter Boncz - In Proc. of the Third International Workshop on Data Management on New Hardware , 2007
"... Engine for database processing. We start by outlining the main architectural features of Cell and use microbenchmarks to characterize the latency and throughput of its memory infrastructure. Then, we discuss the challenges of porting RDBMS software to Cell: (i) all computations need to SIMD-ized, (i ..."
Abstract - Cited by 6 (0 self) - Add to MetaCart
Engine for database processing. We start by outlining the main architectural features of Cell and use microbenchmarks to characterize the latency and throughput of its memory infrastructure. Then, we discuss the challenges of porting RDBMS software to Cell: (i) all computations need to SIMD-ized, (ii) all performance-critical branches need to be eliminated, (iii) a very small and hard limit on program code size should be respected. While we argue that conventional database implementations, i.e. row-stores with Volcano-style tuple pipelining, are a hard fit to Cell, it turns out that the three challenges are quite easily met in databases that use column-wise processing. We managed to implement a proof-of-concept port of the vectorized query processing model of MonetDB/X100 on Cell by running the operator pipeline on the PowerPC, but having it execute the vectorized primitives (data parallel) on its SPE cores. A performance evaluation on TPC-H Q1 shows that vectorized query processing on Cell can beat conventional PowerPC and Itanium2 CPUs by a factor 20. 1.

History-Independent Cuckoo Hashing

by Moni Naor, Gil Segev, Udi Wieder
"... Cuckoo hashing is an efficient and practical dynamic dictionary. It provides expected amortized constant update time, worst case constant lookup time, and good memory utilization. Various experiments demonstrated that cuckoo hashing is highly suitable for modern computer architectures and distribute ..."
Abstract - Cited by 6 (4 self) - Add to MetaCart
Cuckoo hashing is an efficient and practical dynamic dictionary. It provides expected amortized constant update time, worst case constant lookup time, and good memory utilization. Various experiments demonstrated that cuckoo hashing is highly suitable for modern computer architectures and distributed settings, and offers significant improvements compared to other schemes. In this work we construct a practical history-independent dynamic dictionary based on cuckoo hashing. In a history-independent data structure, the memory representation at any point in time yields no information on the specific sequence of insertions and deletions that led to its current content, other than the content itself. Such a property is significant when preventing unintended leakage of information, and was also found useful in several algorithmic settings. Our construction enjoys most of the attractive properties of cuckoo hashing. In particular, no dynamic memory allocation is required, updates are performed in expected amortized constant time, and membership queries are performed in worst case constant time. Moreover, with high probability, the lookup procedure queries only two memory entries which are independent and can be queried in parallel. The approach underlying our construction is to enforce a canonical memory representation on cuckoo hashing. That is, up to the initial randomness, each set of elements has a unique memory representation.

De-amortized Cuckoo Hashing: Provable Worst-Case Performance and Experimental Results

by Yuriy Arbitman, Moni Naor, Gil Segev
"... Cuckoo hashing is a highly practical dynamic dictionary: it provides amortized constant insertion time, worst case constant deletion time and lookup time, and good memory utilization. However, with a noticeable probability during the insertion of n elements some insertion requires Ω(log n) time. Whe ..."
Abstract - Cited by 4 (1 self) - Add to MetaCart
Cuckoo hashing is a highly practical dynamic dictionary: it provides amortized constant insertion time, worst case constant deletion time and lookup time, and good memory utilization. However, with a noticeable probability during the insertion of n elements some insertion requires Ω(log n) time. Whereas such an amortized guarantee may be suitable for some applications, in other applications (such as high-performance routing) this is highly undesirable. Kirsch and Mitzenmacher (Allerton ’07) proposed a de-amortization of cuckoo hashing using queueing techniques that preserve its attractive properties. They demonstrated a significant improvement to the worst case performance of cuckoo hashing via experimental results, but left open the problem of constructing a scheme with provable properties. In this work we present a de-amortization of cuckoo hashing that provably guarantees constant worst case operations. Specifically, for any sequence of polynomially many operations, with overwhelming probability over the randomness of the initialization phase, each operation is performed in constant time. In addition, we present a general approach for proving that the performance guarantees are preserved when using hash functions with limited independence

Design and evaluation of main memory hash join algorithms for multi-core CPUs

by Spyros Blanas, Yinan Li, Jignesh M. Patel - In SIGMOD Conference , 2011
"... The focus of this paper is on investigating efficient hash join algorithms for modern multi-core processors in main memory environments. This paper dissects each internal phase of a typical hash join algorithm and considers different alternatives for implementing each phase, producing a family of ha ..."
Abstract - Cited by 3 (1 self) - Add to MetaCart
The focus of this paper is on investigating efficient hash join algorithms for modern multi-core processors in main memory environments. This paper dissects each internal phase of a typical hash join algorithm and considers different alternatives for implementing each phase, producing a family of hash join algorithms. Then, we implement these main memory algorithms on two radically different modern multiprocessor systems, and carefully examine the factors that impact the performance of each method. Our analysis reveals some interesting results – a very simple hash join algorithm is very competitive to the other more complex methods. This simple join algorithm builds a shared hash table and does not partition the input relations. Its simplicity implies that it requires fewer parameter settings, thereby making it far easier for query optimizers and execution engines to use it in practice. Furthermore, the performance of this simple algorithm improves dramatically as the skew in the input data increases, and it quickly starts to outperform all other algorithms. Based on our results, we propose that database implementers consider adding this simple join algorithm to their repertoire of main memory join algorithms, or adapt their methods to mimic the strategy employed by this algorithm, especially when joining inputs with skewed data distributions.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University