Asking the right questions: Model-driven optimization using probes
In Proc. of the 2006 ACM Symp. on Principles of Database Systems, 2006
Cited by 9 (6 self)
In several database applications, parameters like selectivities and load are known only with some associated uncertainty, which is specified, or modeled, as a distribution over values. The performance of query optimizers and monitoring schemes can be improved by spending resources like time or bandwidth in observing or resolving these parameters, so that better query plans can be generated. In a resource-constrained situation, deciding which parameters to observe in order to best optimize the expected quality of the plan generated (or in general, optimize the expected value of a certain objective function) itself becomes an interesting optimization problem. We present a framework for studying such problems, and present several scenarios arising in anomaly detection in complex systems, monitoring extreme values in sensor networks, load shedding in data stream systems, and estimating rates in wireless channels and minimum latency routes in networks, which can be modeled in this framework with the appropriate objective functions. Even for several simple objective functions, we show the problems are NP-hard. We present greedy algorithms with good performance bounds. The proofs of the performance bounds are via novel submodularity arguments.
Collecting and Maintaining Just-in-time statistics
In Proceedings of ICDE 2007, 2007
Cited by 7 (3 self)
Traditional DBMSs decouple statistics collection and query optimization both in space and time. Decoupling in time may lead to outdated statistics. Decoupling in space may cause statistics not to be available at the desired granularity needed to optimize a particular query, or some important statistics may not be available at all. Overall, this decoupling often leads to large cardinality estimation errors and, in consequence, to the selection of suboptimal plans for query execution. In this paper, we present JITS, a system for proactively collecting query-specific statistics during query compilation. The system employs a lightweight sensitivity analysis to choose which statistics to collect by making use of previously collected statistics and database activity patterns. The collected statistics are materialized and incrementally updated for later reuse. We present the basic concepts, architecture, and key features of JITS. We demonstrate its benefits through an extensive experimental study on a prototype inside the IBM DB2 engine.
Query evaluation on a database given by a random graph
In ICDT, 2007
Cited by 5 (1 self)
We consider random graphs, and their extensions to random structures, with edge probabilities of the form βn^{−α}, where n is the number of vertices, α and β are fixed, and α > 1 (α > arity − 1 for structures of higher arity). We consider conjunctive properties over these random graphs, and investigate the problem of computing their asymptotic conditional probabilities. This provides a novel approach to dealing with uncertainty in databases, with applications to data privacy and other database problems.
ROX: Run-time Optimization of XQueries
2009
Cited by 5 (4 self)
Optimization of complex XQueries combining many XPath steps and joins is currently hindered by the absence of good cardinality estimation and cost models for XQuery. Additionally, the state of the art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated joins and selections. In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute, materialize partial results, and use sampling-based estimation techniques to observe the characteristics of intermediates. The proposed technique takes as input a Join Graph where the edges are either equi-joins or XPath steps, and the execution environment provides value- and structural-join algorithms, as well as structural and value-based indices. While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges in keeping resource usage under control, both for the materialization of intermediates and for the cost of plan exploration using sampling. Our approach deals with these issues by limiting the run-time search space to so-called “zero-investment” algorithms for which sampling can be guaranteed to be strictly linear in sample size. All operators and XML value indices used by ROX for sampling have the zero-investment property. We perform an extensive experimental evaluation on large XML datasets, which shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.
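The idea of sampling-based estimation whose cost is linear in the sample size can be illustrated with a minimal Python sketch. It is not ROX's implementation: the helper names `build_index` and `estimate_join_cardinality`, the dictionary-based rows, and the hash index are all assumptions made for illustration. The sketch samples rows from one join input, probes an existing index on the other input, and scales the match count up to the full input size.

```python
import random
from collections import defaultdict


def build_index(rows, key):
    """Hypothetical value index: maps each key value to the rows holding it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def estimate_join_cardinality(left_rows, right_index, key, sample_size, rng):
    """Estimate |left JOIN right| on `key` from a uniform sample of left_rows.

    Each sampled row costs one index lookup, so the work is linear in the
    sample size (assuming constant-time lookups in the index).
    """
    if not left_rows:
        return 0.0
    k = min(sample_size, len(left_rows))
    sample = rng.sample(left_rows, k)
    # Count join partners of the sampled rows via the index.
    matches = sum(len(right_index.get(row[key], ())) for row in sample)
    # Scale the sample's match count up to the full left input.
    return matches * len(left_rows) / k


# Illustrative data: every left row has exactly one join partner.
left = [{"id": i % 5} for i in range(100)]
right = [{"id": j} for j in range(5)]
estimate = estimate_join_cardinality(
    left, build_index(right, "id"), "id", sample_size=10, rng=random.Random(0)
)
```

With skewed or correlated data the estimate varies with the sample drawn, which is why the sample size trades accuracy against exploration cost.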
A Pay-As-You-Go Framework for Query Execution Feedback
Cited by 3 (0 self)
Past work has suggested that query execution feedback can be useful in improving the quality of plans by correcting cardinality estimation errors in the query optimizer. The state-of-the-art approach for obtaining execution feedback is “passive” monitoring, which records the cardinality of each operator in the execution plan. We observe that there are many cases where even after repeated executions of the same query with use of feedback from passive monitoring, suboptimal choices in the execution plan cannot be corrected. We present a novel “pay-as-you-go” framework in which a query potentially incurs a small overhead on each execution but obtains cardinality information that is not available with passive monitoring alone. Such a framework can significantly extend the reach of query execution feedback in obtaining better plans. We have implemented our techniques in Microsoft SQL Server, and our evaluation on real-world and synthetic queries suggests that plan quality can improve significantly compared to passive monitoring even at low overheads.
Adaptive and Robust Query Processing with SHARP
2006
Cited by 2 (0 self)
Database catalogs often do not contain enough statistical information to correctly cost all possible physical plans. In the absence of such statistics, the optimizer can produce incorrect estimates and select sub-optimal plans for execution. To address this problem for a sub-class of queries, we propose SHARP, a new multi-join, adaptive, relational operator that joins three or more relations of a star-join. SHARP reduces the possible impact of optimizer mistakes by determining which plan to execute independently of optimization estimates. During normal query processing, SHARP collects statistics, and by using a combination of late-binding plan decisions and tuple routing strategies, it is able to change join order and table access methods. However, unlike previous tuple routing operators used for in-memory stream processing, SHARP was designed to process local relations with sizes much larger than available memory. We have implemented SHARP in the open-source DBMS Predator, and we present an extensive experimental evaluation showing the significant performance benefits of SHARP.
How to Probe for an Extreme Value
Proceedings of the 25th ACM SIGACT-SIGMOD-SIGART, 2006
Cited by 2 (2 self)
In several systems applications, parameters such as load are known only with some associated uncertainty, which is specified, or modeled, as a distribution over values. The performance of the system optimization and monitoring schemes can be improved by spending resources such as time or bandwidth in observing or resolving the values of these parameters. In a resource-constrained situation, deciding which parameters to observe in order to best optimize the expected system performance (or in general, optimize the expected value of a certain objective function) itself becomes an interesting optimization problem. In this paper, we initiate the study of such problems that we term “model-driven optimization”. In particular, we study the problem of optimizing the minimum value in the presence of observable distributions. We show that this problem is NP-hard, and present greedy algorithms with good performance bounds. The proofs of the performance bounds are via novel submodularity arguments and connections to covering integer programs.
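To make the probing problem concrete, here is a small Python sketch of a naive greedy probe selection. It is an illustration only, not the paper's algorithm, and it carries none of the paper's performance bounds. The objective is an assumption: after probing a set of independent discrete variables, the decision maker takes the smaller of the lowest observed value and the lowest mean among unprobed variables. The function names `expected_min_after_probing` and `greedy_probes` are hypothetical, and the exhaustive enumeration is only practical for tiny instances.

```python
from itertools import product


def expected_min_after_probing(dists, probed):
    """Expected value obtained if the variables in `probed` are observed.

    dists:  list of dicts mapping value -> probability (independent variables).
    probed: set of indices to observe exactly; the rest are known by mean only.
    """
    means = [sum(v * p for v, p in d.items()) for d in dists]
    unprobed_best = min(
        (means[i] for i in range(len(dists)) if i not in probed),
        default=float("inf"),
    )
    total = 0.0
    # Enumerate every joint outcome of the probed variables.
    for outcome in product(*(dists[i].items() for i in sorted(probed))):
        prob = 1.0
        best = unprobed_best
        for value, p in outcome:
            prob *= p
            best = min(best, value)
        total += prob * best
    return total


def greedy_probes(dists, budget):
    """Pick up to `budget` probes, each time adding the most helpful one."""
    chosen = set()
    for _ in range(budget):
        candidates = [i for i in range(len(dists)) if i not in chosen]
        if not candidates:
            break
        best = min(
            candidates,
            key=lambda i: expected_min_after_probing(dists, chosen | {i}),
        )
        chosen.add(best)
    return chosen


# Illustrative instance: variable 0 is uncertain, variables 1 and 2 are fixed.
dists = [{0: 0.5, 10: 0.5}, {4: 1.0}, {5: 1.0}]
probes = greedy_probes(dists, budget=1)
```

In this instance, probing the uncertain variable lowers the expected minimum, while probing either fixed variable reveals nothing new, so the greedy step selects index 0.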
K Nearest Neighbor Queries and KNN-Joins in Large Relational Databases (Almost) for Free
"... Abstract — Finding the ..."
Xplus: A SQL-Tuning-Aware Query Optimizer
Cited by 1 (0 self)
The need to improve a suboptimal execution plan picked by the query optimizer for a repeatedly run SQL query arises routinely. Complex expressions, skewed or correlated data, and changing conditions can cause the optimizer to make mistakes. For example, the optimizer may pick a poor join order, overlook an important index, use a nested-loop join when a hash join would have done better, or cause an expensive, but avoidable, sort to happen. SQL tuning is also needed while tuning multi-tier services to meet service-level objectives. The difficulty of SQL tuning can be lessened considerably if users and higher-level tuning tools can tell the optimizer: “I am not satisfied with the performance of the plan p being used for the query Q that runs repeatedly. Can you generate a (δ%) better plan?” This paper designs, implements, and evaluates Xplus which, to our knowledge, is the first query optimizer to provide this feature. Xplus goes beyond the traditional plan-first-execute-next approach: Xplus runs some (sub)plans proactively, collects monitoring data from the runs, and iterates. A nontrivial challenge is in choosing a small set of plans to run. Xplus guides this process efficiently using an extensible architecture comprising SQL-tuning experts with different goals, and a policy to arbitrate among the experts. We show the effectiveness of Xplus on real-life tuning scenarios created using TPC-H queries on a PostgreSQL database.
Robustness in Automatic Physical Database Design
Automatic physical database design tools rely on “what-if” interfaces to the query optimizer to estimate the execution time of the training query workload under different candidate physical designs. The tools use these what-if interfaces to recommend physical designs that minimize the estimated execution time of the input training workload. Minimizing estimated execution time alone can lead to designs that are not robust to query optimizer errors and workload changes. In particular, if the optimizer makes errors in estimating the execution time of the workload queries, then the recommended physical design may actually degrade the performance of these queries. In this sense, the physical design is risky. Furthermore, if the production queries are slightly different from the training queries, the recommended physical design may not benefit them at all. In this sense, the physical design

