QPipe: A Simultaneously Pipelined Relational Query Engine
- In Proc. SIGMOD, 2005
- Cited by 35 (10 self)
Relational DBMS typically execute concurrent queries independently by invoking a set of operator instances for each query. To exploit common data retrievals and computation in concurrent queries, researchers have proposed a wealth of techniques, ranging from buffering disk pages to constructing materialized views and optimizing multiple queries. The ideas proposed, however, are inherently limited by the query-centric philosophy of modern engine designs. Ideally, the query engine should proactively coordinate same-operator execution among concurrent queries, thereby exploiting common accesses to memory and disks as well as common intermediate result computation.
The Nimble XML Data Integration System
- In ICDE, 2001
- Cited by 17 (1 self)
For better or for worse, XML has emerged as a de facto standard for data interchange. This consensus is likely to lead to increased demand for technology that allows users to integrate data from a variety of applications, repositories, and partners which are located across the corporate intranet or on the Internet. Nimble Technology has spent two years developing a product to service this market. Originally conceived after decades of person-years of research on data integration, the product is now being deployed at several Fortune-500 beta-customer sites. This abstract reports on the key challenges we faced in the design of our product and highlights some issues we think require more attention from the research community. In particular, we address architectural issues arising from designing a product to support XML as its core representation, choices in the design of the underlying algebra, on-the-fly data cleaning, and caching and materialization policies.
Views and Queries: Determinacy and Rewriting
- In PODS, 2005
- Cited by 12 (0 self)
We investigate the question of whether a query Q can be answered using a set V of views. We first define the problem in information-theoretic terms: we say that V determines Q if V provides enough information to uniquely determine the answer to Q. Next, we look at the problem of rewriting Q in terms of V using a specific language. Given a view language 𝒱 and query language 𝒬, we say that a rewriting language ℛ is complete for 𝒱-to-𝒬 rewritings if every Q ∈ 𝒬 can be rewritten in terms of V ∈ 𝒱 using a query in ℛ, whenever V determines Q. While query rewriting using views has been extensively investigated for some specific languages, the connection to the information-theoretic notion of determinacy, and the question of completeness of a rewriting language, have received little attention. In this paper we investigate systematically the notion of determinacy and its connection to rewriting. The results concern decidability of determinacy for various view and query languages, as well as the power required of complete rewriting languages. We consider languages ranging from first-order to conjunctive queries.
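The information-theoretic definition of determinacy stated in this abstract can be written out formally. The following is a sketch in assumed notation that does not appear in the abstract: D_1 and D_2 range over database instances, and V(D) denotes the result of evaluating every view in V on D.

```latex
\[
  \mathbf{V} \text{ determines } Q
  \;\iff\;
  \forall D_1, D_2 :\;
  \mathbf{V}(D_1) = \mathbf{V}(D_2) \;\implies\; Q(D_1) = Q(D_2)
\]
```

In words: any two databases that the views cannot tell apart must also give the same answer to Q.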
Materializing Views with Minimal Size To Answer Queries
- 2003
- Cited by 9 (4 self)
In this paper we study the following problem. Given a database and a set of queries, we want to find, in advance, a set of views that can compute the answers to the queries, such that the size of the viewset (i.e., the amount of space, in bytes, required to store the viewset) is minimal on the given database. This problem is important for many applications such as distributed databases, data warehousing, and data integration. We explore the decidability and complexity of the problem for workloads of conjunctive queries. We show that results differ significantly depending on whether the workload queries have self-joins. If queries can have self-joins, then a disjunctive viewset can be a better solution than any set of conjunctive views. We show that the problem of finding a minimal-size disjunctive viewset is decidable, and give an upper bound on its complexity. If workload queries cannot have self-joins, there is no need to consider disjunctive viewsets, and we show that the problem is in NP. We describe a very compact search space of conjunctive views, which contains all views in at least one optimal disjunctive viewset. We give a dynamic-programming algorithm for finding minimal-size disjunctive viewsets for queries without self-joins, and discuss heuristics to make the algorithm efficient.
To Share or Not to Share?
- Cited by 3 (0 self)
Intuitively, aggressive work sharing among concurrent queries in a database system should always improve performance by eliminating redundant computation or data accesses. We show that, contrary to common intuition, this is not always the case in practice, especially in the highly parallel world of chip multiprocessors. As the number of cores in the system increases, a trade-off appears between exploiting work sharing opportunities and the available parallelism. To resolve the trade-off, we develop an analytical approach that predicts the effect of work sharing in multi-core systems. Database systems can use the model to determine, statically or at runtime, whether work sharing is beneficial and apply it only when appropriate. The contributions of this paper are as follows. First, we introduce and analyze the effects of the trade-off between work sharing and parallelism on database systems running complex decision-support queries. Second, we propose an intuitive and simple model that can evaluate the trade-off using real-world measurement approximations of the query execution processes. Furthermore, we integrate the model into a prototype database execution engine, and demonstrate that selective work sharing according to the model outperforms never-share static schemes by 20% on average and always-share ones by 2.5x.
Automated Design of Multidimensional Clustering Tables for Relational Databases
- 2004
- Cited by 2 (2 self)
The ability to physically cluster a database table on multiple dimensions is a powerful technique that offers significant performance benefits in many OLAP, warehousing, and decision-support systems. An industrial implementation of this technique for the DB2 Universal Database™ (DB2 UDB) product, called multidimensional clustering (MDC), which co-exists with other classical forms of data storage and indexing methods, was described in VLDB 2003. This paper describes the first published model for automating the selection of clustering keys in single-dimensional and multidimensional relational databases that use a cell/block storage structure for MDC. For any significant dimensionality (3 or more), the possible solution space is combinatorially complex. The automated MDC design model is based on what-if query cost modeling, data sampling, and a search algorithm for evaluating a large constellation of possible combinations. The model is effective at trading the benefits of potential combinations of clustering keys against data sparsity and performance. It also effectively selects the granularity at which dimensions should be used for clustering (such as week of year versus month of year). We show results from experiments indicating that the model provides design recommendations of comparable quality to those made by human experts. The model has been implemented in the IBM DB2 UDB for Linux, UNIX and Windows Version 8.2 release.
Minimizing Data-Communication Costs by Decomposing Query Results in Client-Server Environments
- Information and Computer Science, 2003
- Cited by 1 (1 self)
Many database applications adopt a client-server architecture, in which data resides on a server that receives queries from a client. For each client's query, the server often needs to transfer to the client a large amount of data that is an answer to the query. The communication network in these environments could become a bottleneck in the computation. In this paper we study how to minimize the communication costs of transferring answers to large-join queries from server to client. We propose a novel technique that decomposes the answer into intermediate results, or views, which can reduce the redundancy in the answer. These views are transferred to the client and are used by the client to compute the final answer. There are several challenges in implementing this technique: (1) the number of possible plans to decompose the answers could be very large; (2) the technique requires an efficient algorithm to give an accurate estimate of the size of each view; and (3) many factors could affect the decomposition choice; one such factor is whether relevant data is cached on the client. Our extensive experiments on queries adapted from the TPC-H benchmark show that our technique can significantly reduce the communication costs of transferring answers to large-join queries. The extra steps used in our approach pay off by reducing the total time of transferring the result of a query when the result has a lot of redundancy.
Interaction-Aware Scheduling of Report Generation Workloads
- Cited by 1 (1 self)
The typical workload in a database system consists of a mix of multiple queries of different types that run concurrently. Interactions among the different queries in a query mix can have a significant impact on database performance. Hence, optimizing database performance requires reasoning about query mixes rather than considering queries individually. Current database systems lack the ability to do such reasoning. We propose a new approach based on planning experiments and statistical modeling to capture the impact of query interactions. Our approach requires no prior assumptions about the internal workings of the database system or the nature and cause of query interactions, making it portable across systems. To demonstrate the potential of modeling and exploiting query interactions, we have developed a novel interaction-aware query scheduler for report-generation workloads. Our scheduler, called QShuffler, uses two query scheduling algorithms that leverage models of query interactions. The first algorithm is optimized for workloads where queries are submitted in large batches. The second algorithm targets workloads where queries arrive continuously, and scheduling decisions have to be made on-line. We report an experimental evaluation of QShuffler using TPC-H workloads running on IBM DB2. The evaluation shows that QShuffler, by modeling and exploiting query interactions, can consistently out-
View Selection for Stream Processing
- 2002
Consider XML content-based document routing: a stream of XML documents is routed through a network, and routing decisions are taken based on the result of evaluating XPath predicates on these documents. Parsing XML documents and interpreting XPath expressions is the main bottleneck in such systems. We propose a novel solution to speed up the evaluation of XPath predicates based on precomputing views for the XML documents. There are both similarities to and differences from the "view selection problem" in relational databases. We describe an architecture for using these views, discuss several design choices and give a brief theoretical analysis for one special case. Finally, we report some initial experiments, showing the potential for query speedup by using stream views.
Compression Aware Physical Database Design
Modern RDBMSs support the ability to compress data using methods such as null suppression and dictionary encoding. Data compression offers the promise of significantly reducing storage requirements and improving I/O performance for decision support queries. However, compression can also slow down update and query performance due to the CPU costs of compression and decompression. In this paper, we study how data compression affects the choice of appropriate physical database design, such as indexes, for a given workload. We observe that approaches that decouple the decision of whether or not to choose an index from whether or not to compress the index can result in poor solutions. Thus, we focus on the novel problem of integrating compression into physical database design in a scalable manner. We have implemented our techniques by modifying Microsoft SQL Server and the Database Engine Tuning Advisor (DTA) physical design tool. Our techniques are general and are potentially applicable to DBMSs that support other compression methods. Our experimental results on real-world as well as TPC-H benchmark workloads demonstrate the effectiveness of our techniques.
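The two compression methods named in this abstract can be illustrated with a toy example. This is a generic sketch, not the paper's technique and not how Microsoft SQL Server implements compression; the function names and the choice of space-padded fixed-width strings for null suppression are assumptions made for illustration.

```python
def dictionary_encode(column):
    """Replace each value with an integer code into a table of distinct values."""
    dictionary = []   # code -> value
    code_of = {}      # value -> code
    codes = []
    for value in column:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Recover the original column from the dictionary and the code list."""
    return [dictionary[code] for code in codes]


def null_suppress(value, width):
    """Store a fixed-width, space-padded string without its trailing padding."""
    if len(value) != width:
        raise ValueError("value must already be padded to the column width")
    return value.rstrip(" ")


def null_restore(stored, width):
    """Re-pad a suppressed value to the fixed column width."""
    return stored.ljust(width)


if __name__ == "__main__":
    # Hypothetical low-cardinality column: repeated values share one dictionary entry.
    dictionary, codes = dictionary_encode(["US", "DE", "US", "US", "FR"])
    print(dictionary, codes)
    print(repr(null_suppress("abc   ", 6)))
```

Both schemes need a decode step on every read (`dictionary_decode`, `null_restore`), which is the kind of CPU cost the abstract weighs against the storage and I/O savings.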

