QPipe: A Simultaneously Pipelined Relational Query Engine
In Proc. SIGMOD, 2005
Cited by 35 (10 self)
Abstract:
Relational DBMS typically execute concurrent queries independently by invoking a set of operator instances for each query. To exploit common data retrievals and computation in concurrent queries, researchers have proposed a wealth of techniques, ranging from buffering disk pages to constructing materialized views and optimizing multiple queries. The ideas proposed, however, are inherently limited by the query-centric philosophy of modern engine designs. Ideally, the query engine should proactively coordinate same-operator execution among concurrent queries, thereby exploiting common accesses to memory and disks as well as common intermediate result computation.
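A circular shared scan is one concrete form of the same-operator sharing described above. A single physical table scan feeds every query attached to it, and a query that arrives mid-scan wraps around to pick up the pages it missed. The Python sketch below illustrates the idea under simplifying assumptions: pages are numbered integers, one page is read per `step()`, and every query needs the whole table. The class `CircularSharedScan` and its methods are hypothetical and are not QPipe's implementation.

```python
class CircularSharedScan:
    """A single physical scan whose pages are delivered to every attached query.

    A query that attaches mid-scan starts at the current position and keeps
    receiving pages as the scan wraps around, until it has seen each page once.
    """

    def __init__(self, num_pages: int) -> None:
        self.num_pages = num_pages
        self.position = 0                          # next page the physical scan reads
        self.remaining: dict[str, int] = {}        # query id -> pages still needed
        self.delivered: dict[str, list[int]] = {}  # query id -> pages received, in order
        self.page_reads = 0                        # physical page reads performed

    def attach(self, query_id: str) -> None:
        """Register a new query; it will need every page exactly once."""
        self.remaining[query_id] = self.num_pages
        self.delivered[query_id] = []

    def step(self) -> None:
        """Read one page and hand it to all attached queries."""
        if not self.remaining:
            return
        page = self.position
        self.page_reads += 1
        for query_id in list(self.remaining):
            self.delivered[query_id].append(page)
            self.remaining[query_id] -= 1
            if self.remaining[query_id] == 0:
                del self.remaining[query_id]  # this query has now seen the whole table
        self.position = (self.position + 1) % self.num_pages


# Usage: q2 arrives after q1 has already read 3 of the 8 pages.
scan = CircularSharedScan(num_pages=8)
scan.attach("q1")
for _ in range(3):
    scan.step()
scan.attach("q2")
while scan.remaining:
    scan.step()
print(scan.page_reads)       # 11 physical reads instead of 16
print(scan.delivered["q2"])  # [3, 4, 5, 6, 7, 0, 1, 2]
```

All of the saving comes from the overlapping part of the two scans. A query that attaches only after the first one has finished gets no benefit.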
Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS
In Proc. of the 33rd Intl. Conf. on Very Large Data Bases (VLDB), 2007
Cited by 20 (2 self)
Abstract:
This paper analyzes the performance of concurrent (index) scan operations in both record (NSM/PAX) and column (DSM) disk storage models and shows that existing scheduling policies do not fully exploit data-sharing opportunities and therefore result in poor disk bandwidth utilization. We propose the Cooperative Scans framework that enhances performance in such scenarios by improving data-sharing between concurrent scans. It performs dynamic scheduling of queries and their data requests, taking into account the current system situation. We first present results on top of an NSM/PAX storage layout, showing that it achieves significant performance improvements over traditional policies in terms of both the number of I/Os and overall execution time, as well as latency of individual queries. We provide benchmarks with varying system parameters, data sizes and query loads to confirm the improvement occurs in a wide range of scenarios. Then we extend our proposal to a more complicated DSM scenario, discussing numerous problems related to the two-dimensional nature of disk scheduling in column stores.
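What sets cooperative scans apart from a plain shared scan is that the system chooses which data to load next across all running queries. The sketch below approximates that idea with a simplified greedy rule: load the chunk wanted by the most active queries and deliver it to all of them. It ignores buffer capacity and caching. The rule shows why letting queries accept chunks out of order helps when their ranges differ. The function name `cooperative_io_count` and the policy itself are illustrative assumptions, not the paper's actual relevance-based scheduler.

```python
from collections import Counter


def cooperative_io_count(queries: list[tuple[int, range]]) -> int:
    """Count chunk I/Os under a simplified "most-wanted chunk first" policy.

    queries: (arrival_step, chunk ids the query must read). Each step performs
    at most one chunk I/O, and a loaded chunk is delivered to every active query
    that still needs it, in whatever order chunks arrive. Caching is not modeled.
    """
    pending = [(arrival, set(chunks)) for arrival, chunks in queries]
    step = ios = 0
    while any(need for _, need in pending):
        active = [need for arrival, need in pending if arrival <= step and need]
        if active:
            demand = Counter(chunk for need in active for chunk in need)
            # Highest demand first; ties are broken by the lowest chunk id.
            chunk = min(demand, key=lambda c: (-demand[c], c))
            for need in active:
                need.discard(chunk)
            ios += 1
        step += 1
    return ios


queries = [(0, range(10)), (2, range(5, 10)), (2, range(5, 8))]
print(cooperative_io_count(queries))              # 10, the union of needed chunks
print(sum(len(chunks) for _, chunks in queries))  # 18 if every query scans alone
```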
To Share or Not to Share?
Cited by 3 (0 self)
Abstract:
Intuitively, aggressive work sharing among concurrent queries in a database system should always improve performance by eliminating redundant computation or data accesses. We show that, contrary to common intuition, this is not always the case in practice, especially in the highly parallel world of chip multiprocessors. As the number of cores in the system increases, a trade-off appears between exploiting work sharing opportunities and the available parallelism. To resolve the trade-off, we develop an analytical approach that predicts the effect of work sharing in multi-core systems. Database systems can use the model to determine, statically or at runtime, whether work sharing is beneficial and apply it only when appropriate. The contributions of this paper are as follows. First, we introduce and analyze the effects of the trade-off between work sharing and parallelism on database systems running complex decision-support queries. Second, we propose an intuitive and simple model that can evaluate the trade-off using real-world measurement approximations of the query execution processes. Furthermore, we integrate the model into a prototype database execution engine, and demonstrate that selective work sharing according to the model outperforms never-share static schemes by 20% on average and always-share ones by 2.5x.
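To make the trade-off concrete, here is a deliberately crude cost model. It is an assumption of this sketch, not the measurement-based model from the paper. A shared operator runs once on one core and pays a forwarding cost for every consumer. Unshared queries run fully in parallel up to the core count. All function and parameter names (`shared_frac`, `fanout_cost`) are hypothetical.

```python
def no_share_time(n: int, cores: int, work: float) -> float:
    """Each of n queries runs its own pipeline; up to `cores` run in parallel."""
    return n * work / min(n, cores)


def share_time(n: int, cores: int, work: float, shared_frac: float,
               fanout_cost: float) -> float:
    """The shared fraction runs once, serially, and forwards its output to each
    of the n consumers (fanout_cost * work per consumer); the private remainder
    of every query then runs in parallel as in the unshared case."""
    shared = shared_frac * work + n * fanout_cost * work
    private = n * (1 - shared_frac) * work / min(n, cores)
    return shared + private


def should_share(n: int, cores: int, work: float, shared_frac: float,
                 fanout_cost: float) -> bool:
    """Share only when the modeled shared plan finishes sooner."""
    return (share_time(n, cores, work, shared_frac, fanout_cost)
            < no_share_time(n, cores, work))


for cores in (1, 8):
    print(cores, should_share(n=8, cores=cores, work=1.0,
                              shared_frac=0.5, fanout_cost=0.05))
# 1 True   (4.9 vs 8.0 time units)
# 8 False  (1.4 vs 1.0 time units)
```

With one core, the eliminated work dominates. With as many cores as queries, the serial shared stage becomes the bottleneck. This is the qualitative effect the abstract describes, though the real decision in the paper rests on measured approximations rather than fixed constants like these.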
Request Window: an Approach to Improve Throughput of RDBMS-based Data Integration System by Utilizing Data Sharing Across Concurrent Distributed Queries
Cited by 1 (1 self)
Abstract:
This paper focuses on the problem of improving the distributed query throughput of an RDBMS-based data integration system, which has to inherit the query execution model of the underlying RDBMS: execute each query independently and use a global buffer pool mechanism to provide disk page sharing across concurrent query execution processes. However, this model is not suitable for processing concurrent distributed queries because its foundation, the memory-disk hierarchy, does not exist for data provided by remote sources. Therefore, the query engine cannot exploit any data sharing, so each process has to interact with data sources independently: issue data requests and fetch data over the network. This paper presents Request Window, a novel distributed query processing (DQP) mechanism that can detect and employ data sharing opportunities across concurrent distributed queries. By combining multiple similar data requests issued to the same data source into a common data request, Request Window allows concurrent query execution processes to share the common result data. By reducing source load and data transfers, it can significantly improve the throughput of the query engine. This paper also introduces the IGNITE system, an extended PostgreSQL with DQP support. Our experimental results show that Request Window gives IGNITE a 1.7x speedup over a commercial data integration system when running a workload of distributed TPC-H queries.
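The request-combining step can be sketched as follows. Requests collected during one window are grouped by target source and table, and the union of the requested columns is fetched once. This minimal sketch assumes that requests carry only a projection (no predicates). The `Request` type and the `merge_window` function are hypothetical, not IGNITE's API. Each query would then project its own columns out of the shared result.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    query_id: str
    source: str         # remote data source the request is sent to
    table: str
    columns: frozenset  # columns the query needs from that table


def merge_window(requests: list[Request]) -> list[tuple[str, str, frozenset, list[str]]]:
    """Combine the requests collected in one window that target the same
    (source, table) into a single request for the union of their columns."""
    groups: dict[tuple[str, str], list[Request]] = defaultdict(list)
    for req in requests:
        groups[(req.source, req.table)].append(req)
    merged = []
    for (source, table), reqs in groups.items():
        columns = frozenset().union(*(r.columns for r in reqs))
        merged.append((source, table, columns, [r.query_id for r in reqs]))
    return merged


window = [
    Request("q1", "src1", "lineitem", frozenset({"l_orderkey", "l_quantity"})),
    Request("q2", "src1", "lineitem", frozenset({"l_orderkey", "l_discount"})),
    Request("q3", "src2", "orders", frozenset({"o_orderkey"})),
]
for source, table, columns, queries in merge_window(window):
    print(source, table, sorted(columns), queries)
# src1 lineitem ['l_discount', 'l_orderkey', 'l_quantity'] ['q1', 'q2']
# src2 orders ['o_orderkey'] ['q3']
```

Three requests become two remote fetches here. Handling predicates would require a merging rule (for example, fetching the disjunction and filtering locally), which this sketch does not attempt.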
Deadlock Resolution in Pipelined Query Graphs
2005
Abstract:
Pipelining is a widely used technique that query execution engines employ to improve individual query execution times. In recently proposed settings, pipelining is used as an alternative to materialization to evaluate query plan graphs, where nodes in a query plan can have multiple parents. Such scenarios include shared table scans, runtime operator sharing, parallel sorting, and pipelined Multi-Query Optimization (MQO) plans. While pipelining in query graphs can increase performance, it can also lead to runtime deadlocks. The existing deadlock solutions focus on total deadlock prevention by statically deciding between pipelining and materialization, and apply only to MQO. The lack of runtime information leads to highly conservative decisions. Formally, this conservatism makes it NP-hard to find an optimal pipelining strategy that materializes a minimum cost set of nodes in a query graph.
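The kind of runtime deadlock the abstract mentions can be reproduced with a toy simulation. One shared node feeds two parents through bounded buffers. A single consumer reads all of one input before it touches the other, as a hypothetical join would, so the producer eventually blocks on the second buffer while the consumer waits on the first. The sketch also shows one generic runtime remedy: materializing the blocked buffer when no progress is possible. The function name, the buffer model, and the remedy are illustrative assumptions, not the paper's algorithm.

```python
from collections import deque


def simulate(n_tuples: int, capacity: int, spill_on_deadlock: bool = False) -> str:
    """Toy model: one shared node feeds buffers 'a' and 'b' (each `capacity`).

    The consumer reads ALL of 'a' before it reads anything from 'b', like a
    hypothetical join that builds on one input and then probes with the other.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    a, b = deque(), deque()
    spilled = deque()  # unbounded materialized copy of 'b' (used only if enabled)
    produced = read_a = read_b = 0
    while read_a < n_tuples or read_b < n_tuples:
        progress = False
        # Producer: a tuple is emitted only when BOTH parent buffers have room.
        if produced < n_tuples and len(a) < capacity and len(b) < capacity:
            a.append(produced)
            b.append(produced)
            produced += 1
            progress = True
        # Consumer: phase 1 drains 'a'; phase 2 drains spilled tuples, then 'b'.
        if read_a < n_tuples:
            if a:
                a.popleft()
                read_a += 1
                progress = True
        elif spilled or b:
            (spilled if spilled else b).popleft()
            read_b += 1
            progress = True
        if not progress:
            if not spill_on_deadlock:
                return f"deadlock after {produced} tuples"
            # Runtime resolution: materialize 'b' so the producer can continue.
            spilled.extend(b)
            b.clear()
    return "completed"


print(simulate(10, capacity=4))                          # deadlock after 4 tuples
print(simulate(10, capacity=4, spill_on_deadlock=True))  # completed
print(simulate(3, capacity=4))                           # completed
```

The third call shows why a static decision has to be conservative. The same plan completes when the input fits in the buffers and deadlocks when it does not, and that difference is only known at runtime.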

