• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Buffering Database Operations for Enhanced Instruction Cache Performance (2004)

by J Zhou, K A Ross
Venue:In SIGMOD Conference
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 17
Next 10 →

Optimizing Indirect Branch Prediction Accuracy in Virtual Machine Interpreters

by M. Anton Ertl, David Gregg - PLDI'03 , 2003
"... Interpreters designed for efficiency execute a huge number of indirect branches and can spend more than half of the execution time in indirect branch mispredictions. Branch target buffers are the best widely available form of indirect branch prediction; however, their prediction accuracy for existin ..."
Abstract - Cited by 38 (7 self) - Add to MetaCart
Interpreters designed for efficiency execute a huge number of indirect branches and can spend more than half of the execution time in indirect branch mispredictions. Branch target buffers are the best widely available form of indirect branch prediction; however, their prediction accuracy for existing interpreters is only 2%–50%. In this paper we investigate two methods for improving the prediction accuracy of BTBs for interpreters: replicating virtual machine (VM) instructions and combining sequences of VM instructions into superinstructions. We investigate static (interpreter buildtime) and dynamic (interpreter run-time) variants of these techniques and compare them and several combinations of these techniques. These techniques can eliminate nearly all of the dispatch branch mispredictions, and have other benefits, resulting in speedups by a factor of up to 3.17 over efficient threaded-code interpreters, and speedups by a factor of up to 1.3 over techniques relying on superinstructions alone.

QPipe: A Simultaneously Pipelined Relational Query Engine

by Stavros Harizopoulos, Vladislav Shkapenyuk, Anastassia Ailamaki - In Proc. SIGMOD , 2005
"... Relational DBMS typically execute concurrent queries independently by invoking a set of operator instances for each query. To exploit common data retrievals and computation in concurrent queries, researchers have proposed a wealth of techniques, ranging from buffering disk pages to constructing mate ..."
Abstract - Cited by 35 (10 self) - Add to MetaCart
Relational DBMS typically execute concurrent queries independently by invoking a set of operator instances for each query. To exploit common data retrievals and computation in concurrent queries, researchers have proposed a wealth of techniques, ranging from buffering disk pages to constructing materialized views and optimizing multiple queries. The ideas proposed, however, are inherently limited by the query-centric philosophy of modern engine designs. Ideally, the query engine should proactively coordinate same-operator execution among concurrent queries, thereby exploiting common accesses to memory and disks as well as common intermediate result computation.

Performance tradeoffs in read-optimized databases

by Stavros Harizopoulos, Velen Liang, Daniel J. Abadi, Samuel Madden - In VLDB 2006: Proceedings of the 32nd international conference on Very large data bases , 2006
"... Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certai ..."
Abstract - Cited by 31 (11 self) - Add to MetaCart
Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certain broad classes of workloads, column-based systems can outperform rowbased systems. Previous work, however, has not characterized the precise conditions under which a particular query workload can be expected to perform better on a column-oriented database. In this paper we first identify the distinctive components of a read-optimized DBMS and describe our implementation of a high-performance query engine that can operate on both row and column-oriented data. We then use our prototype to perform an in-depth analysis of the tradeoffs between column and row-oriented architectures. We explore these tradeoffs in terms of disk bandwidth, CPU cache latency, and CPU cycles. We show that for most database workloads, a carefully designed column system can outperform a carefully designed row system, sometimes by an order of magnitude. We also present an analytical model to predict whether a given workload on a particular hardware configuration is likely to perform better on a row or column-based system. 1.

Improving Database Performance on Simultaneous Multithreading Processors

by Jingren Zhou, John Cieslewicz, Kenneth A. Ross, Mihir Shah , 2005
"... Simultaneous multithreading (SMT) allows multiple threads to supply instructions to the instruction pipeline of a superscalar processor. ..."
Abstract - Cited by 18 (2 self) - Add to MetaCart
Simultaneous multithreading (SMT) allows multiple threads to supply instructions to the instruction pipeline of a superscalar processor.

STEPS towards cacheresident transaction processing

by Stavros Harizopoulos, Anastassia Ailamaki - In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB , 2004
"... Online transaction processing (OLTP) is a multibillion dollar industry with high-end database servers employing state-of-the-art processors to maximize performance. Unfortunately, recent studies show that CPUs are far from realizing their maximum intended throughput because of delays in the processo ..."
Abstract - Cited by 13 (6 self) - Add to MetaCart
Online transaction processing (OLTP) is a multibillion dollar industry with high-end database servers employing state-of-the-art processors to maximize performance. Unfortunately, recent studies show that CPUs are far from realizing their maximum intended throughput because of delays in the processor caches. When running OLTP, instruction-related delays in the memory subsystem account for 25 to 40 % of the total execution time. In contrast to data, instruction misses cannot be overlapped with out-of-order execution, and instruction caches cannot grow as the slower access time directly affects the processor speed. The challenge is to alleviate the instructionrelated delays without increasing the cache size. We propose Steps, a technique that minimizes instruction cache misses in OLTP workloads by multiplexing concurrent transactions and exploiting common code paths. One transaction paves the cache with instructions, while close followers enjoy a nearly miss-free execution. Steps yields up to 96.7 % reduction in instruction cache misses for each additional concurrent transaction, and at the same time eliminates up to 64 % of mispredicted branches by loading a repeating execution pattern into the CPU. This paper (a) describes the design and implementation of Steps, (b) analyzes Steps using microbenchmarks, and (c) shows Steps performance when running TPC-C on top of the Shore storage manager. 1

Database Architecture Evolution: Mammals Flourished long before Dinosaurs became Extinct

by Stefan Manegold, Martin L. Kersten, Peter Boncz
"... The holy grail for database architecture research is to find a solution that is Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters, Stable & Secure, to service a broad user community, Small & Simple, to be comprehensible to a small team of pro ..."
Abstract - Cited by 9 (2 self) - Add to MetaCart
The holy grail for database architecture research is to find a solution that is Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters, Stable & Secure, to service a broad user community, Small & Simple, to be comprehensible to a small team of programmers, Self-managing, to let it run out-of-the-box without hassle. In this paper, we provide a trip report on this quest, covering both past experiences, ongoing research on hardware-conscious algorithms, and novel ways towards self-management specifically focused on column store solutions. 1.

Shore-MT: A Scalable Storage Manager for the Multicore Era

by Ryan Johnson, Ippokratis Pandis, Nikos Hardavellas, Anastasia Ailamaki, Babak Falsafi - EXTENDING DATABASE TECHNOLOGY (EDBT) , 2009
"... Database storage managers have long been able to efficiently handle multiple concurrent requests. Until recently, however, a computer contained only a few single-core CPUs, and therefore only a few transactions could simultaneously access the storage manager's internal structures. This allowed stora ..."
Abstract - Cited by 7 (5 self) - Add to MetaCart
Database storage managers have long been able to efficiently handle multiple concurrent requests. Until recently, however, a computer contained only a few single-core CPUs, and therefore only a few transactions could simultaneously access the storage manager's internal structures. This allowed storage managers to use non-scalable approaches without any penalty. With the arrival of multicore chips, however, this situation is rapidly changing. More and more threads can run in parallel, stressing the internal scalability of the storage manager. Systems optimized for high performance at a limited number of cores are not assured similarly high performance at a higher core count, because unanticipated scalability obstacles arise. We benchmark four popular open-source storage managers (Shore, BerkeleyDB, MySQL, and PostgreSQL) on a modern multicore machine, and find that they all suffer in terms of scalability. We briefly examine the bottlenecks in the various storage engines. We then present Shore-MT, a multithreaded and highly scalable version of Shore which we developed by identifying and successively removing internal bottlenecks. When compared to other DBMS, Shore-MT exhibits superior scalability and 2-4 times higher absolute throughput than its peers. We also show that designers should favor scalability to single-thread performance, and highlight important principles for writing scalable storage engines, illustrated with real examples from the development of Shore-MT.

Vectorized Data Processing on the Cell Broadband Engine

by Sándor Héman, Niels Nes, Marcin Zukowski, Peter Boncz - In Proc. of the Third International Workshop on Data Management on New Hardware , 2007
"... Engine for database processing. We start by outlining the main architectural features of Cell and use microbenchmarks to characterize the latency and throughput of its memory infrastructure. Then, we discuss the challenges of porting RDBMS software to Cell: (i) all computations need to SIMD-ized, (i ..."
Abstract - Cited by 6 (0 self) - Add to MetaCart
Engine for database processing. We start by outlining the main architectural features of Cell and use microbenchmarks to characterize the latency and throughput of its memory infrastructure. Then, we discuss the challenges of porting RDBMS software to Cell: (i) all computations need to SIMD-ized, (ii) all performance-critical branches need to be eliminated, (iii) a very small and hard limit on program code size should be respected. While we argue that conventional database implementations, i.e. row-stores with Volcano-style tuple pipelining, are a hard fit to Cell, it turns out that the three challenges are quite easily met in databases that use column-wise processing. We managed to implement a proof-of-concept port of the vectorized query processing model of MonetDB/X100 on Cell by running the operator pipeline on the PowerPC, but having it execute the vectorized primitives (data parallel) on its SPE cores. A performance evaluation on TPC-H Q1 shows that vectorized query processing on Cell can beat conventional PowerPC and Itanium2 CPUs by a factor 20. 1.

Bridging the Gap between OLAP and SQL

by Jens-Peter Dittrich, Donald Kossmann, Alexander Kreutz, Er Kreutz - In VLDB , 2005
"... In the last ten years, database vendors have invested heavily in order to extend their products with new features for decision support. ..."
Abstract - Cited by 4 (3 self) - Add to MetaCart
In the last ten years, database vendors have invested heavily in order to extend their products with new features for decision support.

MCCDB: minimizing cache conflicts in multi-core processors for databases

by Rubao Lee, Xiaodong Zhang - in Proc. VLDB’09
"... In a typical commercial multi-core processor, the last level cache (LLC) is shared by two or more cores. Existing studies have shown that the shared LLC is beneficial to concurrent query processes with commonly shared data sets. However, the shared LLC can also be a performance bottleneck to concurr ..."
Abstract - Cited by 3 (1 self) - Add to MetaCart
In a typical commercial multi-core processor, the last level cache (LLC) is shared by two or more cores. Existing studies have shown that the shared LLC is beneficial to concurrent query processes with commonly shared data sets. However, the shared LLC can also be a performance bottleneck to concurrent queries, each of which has private data structures, such as a hash table for the widely used hash join operator, causing serious cache conflicts. We show that cache conflicts on multi-core processors can significantly degrade overall database performance. In this paper, we propose a hybrid system method called MCC-DB for accelerating executions of warehouse-style queries, which relies on the DBMS knowledge of data access patterns to minimize LLC conflicts in multicore systems through an enhanced OS facility of cache partitioning. MCC-DB consists of three components: (1) a cacheaware query optimizer carefully selects query plans in order to balance the numbers of cache-sensitive and cache-insensitive plans; (2) a query execution scheduler makes decisions to corun queries with an objective of minimizing LLC conflicts; and (3) an enhanced OS kernel facility partitions the shared LLC according to each query’s cache capacity need and locality strength. We have implemented MCC-DB by patching the three components in PostgreSQL and Linux kernel. Our intensive measurements on an Intel multi-core system with warehouse-style queries show that MCC-DB can reduce query execution times by up to 33%. 1.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University