Results 1 - 10
of
24
Column-Stores vs. Row-Stores: How Different Are They Really
- In SIGMOD
, 2008
"... There has been a significant amount of excitement and recent work on column-oriented database systems (“column-stores”). These database systems have been shown to perform more than an order of magnitude better than traditional row-oriented database systems (“row-stores”) on analytical workloads such ..."
Abstract
-
Cited by 34 (3 self)
- Add to MetaCart
There has been a significant amount of excitement and recent work on column-oriented database systems (“column-stores”). These database systems have been shown to perform more than an order of magnitude better than traditional row-oriented database systems (“row-stores”) on analytical workloads such as those found in data warehouses, decision support, and business intelligence applications. The elevator pitch behind this performance difference is straightforward: column-stores are more I/O efficient for read-only queries since they only have to read from disk (or from memory) those attributes accessed by a query. This simplistic view leads to the assumption that one can obtain the performance benefits of a column-store using a row-store: either by vertically partitioning the schema, or by indexing every column so that columns can be accessed independently. In this paper, we demonstrate that this assumption is false. We compare the performance of a commercial row-store under a variety of different configurations with a column-store and show that the row-store performance is significantly slower on a recently proposed data warehouse benchmark. We then analyze the performance difference and show that there are some important differences between the two systems at the query executor level (in addition to the obvious differences at the storage layer level). Using the column-store, we then tease apart these differences, demonstrating the impact on performance of a variety of column-oriented query execution techniques, including vectorized query processing, compression, and a new join algorithm we introduce in this paper. We conclude that while it is not impossible for a row-store to achieve some of the performance advantages of a column-store, changes must be made to both the storage layer and the query executor to fully obtain the benefits of a column-oriented approach.
Performance tradeoffs in read-optimized databases
- In VLDB 2006: Proceedings of the 32nd international conference on Very large data bases
, 2006
"... Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certai ..."
Abstract
-
Cited by 31 (11 self)
- Add to MetaCart
Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certain broad classes of workloads, column-based systems can outperform rowbased systems. Previous work, however, has not characterized the precise conditions under which a particular query workload can be expected to perform better on a column-oriented database. In this paper we first identify the distinctive components of a read-optimized DBMS and describe our implementation of a high-performance query engine that can operate on both row and column-oriented data. We then use our prototype to perform an in-depth analysis of the tradeoffs between column and row-oriented architectures. We explore these tradeoffs in terms of disk bandwidth, CPU cache latency, and CPU cycles. We show that for most database workloads, a carefully designed column system can outperform a carefully designed row system, sometimes by an order of magnitude. We also present an analytical model to predict whether a given workload on a particular hardware configuration is likely to perform better on a row or column-based system. 1.
Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS
- In Proc. of the 33 rd Intl. Conf. on Very Large Databases (VLDB
, 2007
"... This paper analyzes the performance of concurrent (index) scan operations in both record (NSM/PAX) and column (DSM) disk storage models and shows that existing scheduling policies do not fully exploit data-sharing opportunities and therefore result in poor disk bandwidth utilization. We propose the ..."
Abstract
-
Cited by 20 (2 self)
- Add to MetaCart
This paper analyzes the performance of concurrent (index) scan operations in both record (NSM/PAX) and column (DSM) disk storage models and shows that existing scheduling policies do not fully exploit data-sharing opportunities and therefore result in poor disk bandwidth utilization. We propose the Cooperative Scans framework that enhances performance in such scenarios by improving data-sharing between concurrent scans. It performs dynamic scheduling of queries and their data requests, taking into account the current system situation. We first present results on top of an NSM/PAX storage layout, showing that it achieves significant performance improvements over traditional policies in terms of both the number of I/Os and overall execution time, as well as latency of individual queries. We provide benchmarks with varying system parameters, data sizes and query loads to confirm the improvement occurs in a wide range of scenarios. Then we extend our proposal to a more complicated DSM scenario, discussing numerous problems related to the two-dimensional nature of disk scheduling in column stores. 1.
Main-Memory Scan Sharing For Multi-Core CPUs
"... Computer architectures are increasingly based on multi-core CPUs and large memories. Memory bandwidth, which has not kept pace with the increasing number of cores, has become the primary processing bottleneck, replacing disk I/O as the limiting factor. To address this challenge, we provide novel alg ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
Computer architectures are increasingly based on multi-core CPUs and large memories. Memory bandwidth, which has not kept pace with the increasing number of cores, has become the primary processing bottleneck, replacing disk I/O as the limiting factor. To address this challenge, we provide novel algorithms for increasing the throughput of Business Intelligence (BI) queries, as well as for ensuring fairness and avoiding starvation among a concurrent set of such queries. To maximize throughput, we propose a novel FullSharing scheme that allows all concurrent queries, when performing base-table I/O, to share the cache belonging to a given core. We then generalize this approach to a BatchSharing scheme that avoids thrashing on ”agg-tables”—hash tables that are used for aggregation processing—caused by execution of too many queries on a core. This scheme partitions queries into batches such that the working-set of agg-table entries for each batch can fit into a cache; an efficient sampling technique is used to estimate selectivities and working-set sizes for purposes of query partitioning. Finally, we use lottery-scheduling techniques to ensure fairness and impose a hard upper bound on staging time to avoid starvation. On our 8-core testbed, we were able to completely remove the memory I/O bottleneck, increasing throughput by a factor of 2 to 2.5, while also maintaining fairness and avoiding starvation. 1.
Database Architecture Evolution: Mammals Flourished long before Dinosaurs became Extinct
"... The holy grail for database architecture research is to find a solution that is Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters, Stable & Secure, to service a broad user community, Small & Simple, to be comprehensible to a small team of pro ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
The holy grail for database architecture research is to find a solution that is Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters, Stable & Secure, to service a broad user community, Small & Simple, to be comprehensible to a small team of programmers, Self-managing, to let it run out-of-the-box without hassle. In this paper, we provide a trip report on this quest, covering both past experiences, ongoing research on hardware-conscious algorithms, and novel ways towards self-management specifically focused on column store solutions. 1.
Data Processing on FPGAs
"... Computer architectures are quickly changing toward heterogeneous many-core systems. Such a trend opens up interesting opportunities but also raises immense challenges since the efficient use of heterogeneous many-core systems is not a trivial problem. In this paper, we explore how to program data pr ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
Computer architectures are quickly changing toward heterogeneous many-core systems. Such a trend opens up interesting opportunities but also raises immense challenges since the efficient use of heterogeneous many-core systems is not a trivial problem. In this paper, we explore how to program data processing operators on top of field-programmable gate arrays (FPGAs). FPGAs are very versatile in terms of how they can be used and can also be added as additional processing units in standard CPU sockets. In the paper, we study how data processing can be accelerated using an FPGA. Our results indicate that efficient usage of FPGAs involves non-trivial aspects such as having the right computation model (an asynchronous sorting network in this case); a careful implementation that balances all the design constraints in an FPGA; and the proper integration strategy to link the FPGA to the rest of the system. Once these issues are properly addressed, our experiments show that FPGAs exhibit performance figures competitive with those of modern general-purpose CPUs while offering significant advantages in terms of power consumption and parallel stream evaluation. 1.
MRShare: Sharing Across Multiple Queries in MapReduce
"... Large-scale data analysis lies in the core of modern enterprises and scientific research. With the emergence of cloud computing, the use of an analytical query processing infrastructure (e.g., Amazon EC2) can be directly mapped to monetary value. MapReduce has been a popular framework in the context ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
Large-scale data analysis lies in the core of modern enterprises and scientific research. With the emergence of cloud computing, the use of an analytical query processing infrastructure (e.g., Amazon EC2) can be directly mapped to monetary value. MapReduce has been a popular framework in the context of cloud computing, designed to serve long running queries (jobs) which can be processed in batch mode. Taking into account that different jobs often perform similar work, there are many opportunities for sharing. In principle, sharing similar work reduces the overall amount of work, which can lead to reducing monetary charges incurred while utilizing the processing infrastructure. In this paper we propose a sharing framework tailored to MapReduce. Our framework, MRShare, transforms a batch of queries into a new batch that will be executed more efficiently, by merging jobs into groups and evaluating each group as a single query. Based on our cost model for MapReduce, we define an optimization problem and we provide a solution that derives the optimal grouping of queries. Experiments in our prototype, built on top of Hadoop, demonstrate the overall effectiveness of our approach and substantial savings. 1.
Predictable Performance for Unpredictable Workloads
"... This paper introduces Crescando: a scalable, distributed relational table implementation designed to perform large numbers of queries and updates with guaranteed access latency and data freshness. To this end, Crescando leverages a number of modern query processing techniques and hardware trends. Sp ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
This paper introduces Crescando: a scalable, distributed relational table implementation designed to perform large numbers of queries and updates with guaranteed access latency and data freshness. To this end, Crescando leverages a number of modern query processing techniques and hardware trends. Specifically, Crescando is based on parallel, collaborative scans in main memory and so-called “querydata” joins known from data-stream processing. While the proposed approach is not always optimal for a given workload, it provides latency and freshness guarantees for all workloads. Thus, Crescando is particularly attractive if the workload is unknown, changing, or involves many different queries. This paper describes the design, algorithms, and implementation of a Crescando storage node, and assesses its performance on modern multi-core hardware. 1.
A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses
"... Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention, because the physical plans—unaware of each other—compete for access to the underlying I/O and computation res ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention, because the physical plans—unaware of each other—compete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex data analysis query, their performance suffers significantly when multiple complex queries run at the same time. We describe an augmentation of traditional query engines that improves join throughput in large-scale concurrent data warehouses. In contrast to the conventional query-at-a-time model, our approach employs a single physical plan that can share I/O, computation, and tuple storage across all in-flight join queries. We use an “alwayson” pipeline of non-blocking operators, coupled with a controller that continuously examines the current query mix and performs run-time optimizations. Our design allows the query engine to scale gracefully to large data sets, provide predictable execution times, and reduce contention. In our empirical evaluation, we found that our prototype outperforms conventional commercial systems by an order of magnitude for tens to hundreds of concurrent queries. 1.
To Share or Not to Share?
"... Intuitively, aggressive work sharing among concurrent queries in a database system should always improve performance by eliminating redundant computation or data accesses. We show that, contrary to common intuition, this is not always the case in practice, especially in the highly parallel world of ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Intuitively, aggressive work sharing among concurrent queries in a database system should always improve performance by eliminating redundant computation or data accesses. We show that, contrary to common intuition, this is not always the case in practice, especially in the highly parallel world of chip multiprocessors. As the number of cores in the system increases, a trade-off appears between exploiting work sharing opportunities and the available parallelism. To resolve the trade-off, we develop an analytical approach that predicts the effect of work sharing in multi-core systems. Database systems can use the model to determine, statically or at runtime, whether work sharing is beneficial and apply it only when appropriate. The contributions of this paper are as follows. First, we introduce and analyze the effects of the trade-off between work sharing and parallelism on database systems running complex decision-support queries. Second, we propose an intuitive and simple model that can evaluate the trade-off using real-world measurement approximations of the query execution processes. Furthermore, we integrate the model into a prototype database execution engine, and demonstrate that selective work sharing according to the model outperforms never-share static schemes by 20 % on average and always-share ones by 2.5x. 1.

