ROX: Run-time Optimization of XQueries, 2009
Cited by 5 (4 self)
Optimization of complex XQueries combining many XPath steps and joins is currently hindered by the absence of good cardinality estimation and cost models for XQuery. Additionally, the state-of-the-art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated joins and selections. In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute, materialize partial results, and use sampling-based estimation techniques to observe the characteristics of intermediates. The proposed technique takes as input a Join Graph where the edges are either equi-joins or XPath steps, and the execution environment provides value- and structural-join algorithms, as well as structural and value-based indices. While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges with respect to keeping resource usage under control, both with respect to the materialization of intermediates, as well as the cost of plan exploration using sampling. Our approach deals with these issues by limiting the run-time search space to so-called “zero-investment” algorithms for which sampling can be guaranteed to be strictly linear in sample size. All operators and XML value indices used by ROX for sampling have the zero-investment property. We perform extensive experimental evaluation on large XML datasets that shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.
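As an illustration of the sampling idea in this abstract, the minimal Python sketch below estimates the size of an equi-join by probing a pre-built value index with a random sample of the left input. It is not the ROX algorithm: the function names, the rows-as-dicts layout, and the hash index are assumptions made for this example. Each probe is a single index lookup, so the work per estimate grows linearly with the sample size.

```python
import random
from collections import defaultdict


def build_value_index(rows, key):
    """Hypothetical value index: maps each key value to the rows holding it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def estimate_join_cardinality(left, right_index, key, sample_size, rng):
    """Estimate |left JOIN right| on `key` from a random sample of `left`.

    One index probe per sampled row, so the cost is linear in the sample size.
    """
    if not left:
        return 0.0
    sample_size = min(sample_size, len(left))
    sample = rng.sample(left, sample_size)
    # Count the join partners of each sampled row without materializing them.
    matches = sum(len(right_index.get(row[key], ())) for row in sample)
    # Scale the observed match count up to the full left input.
    return matches * len(left) / sample_size
```

When the sample covers the whole left input, the estimate equals the exact join size; smaller samples trade accuracy for a lower probing cost.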
S.: XML tree structure compression. In: XANTEC
Cited by 3 (0 self)
In an XML document a considerable fraction consists of markup, that is, begin and end-element tags describing the document’s tree structure. XML compression tools such as XMill separate the tree structure from the data content and compress each separately. The main focus in these compression tools is how to group similar data content together prior to performing standard data compression such as gzip, bzip2, or ppm. In contrast, the focus of this paper is on compressing the tree structure part of an XML document. We use a known algorithm to derive a grammar representation of the tree structure which factors out the repetition of tree patterns. We then investigate several succinct binary encodings of these grammars. Our experiments show that we can be consistently smaller than the tree structure compression carried out by XMill, using the same backend compressors as XMill on our encodings. However, the most surprising result is that our own Huffman-like encoding of the grammars (without any backend compressor whatsoever) consistently outperforms XMill with gzip backend. This is of particular interest because our Huffman-like encoding can be queried without prior decompression. To the best of our knowledge this offers the smallest queryable XML tree structure representation currently available.
Dependable Cardinality Forecasts for XQuery
Cited by 2 (2 self)
Though inevitable for effective cost-based query rewriting, the derivation of meaningful cardinality estimates has remained a notoriously hard problem in the context of XQuery. By basing the estimation on a relational representation of the XQuery syntax, we show how existing cardinality estimation techniques for XPath and proven relational estimation machinery can play together to yield dependable forecasts for arbitrary XQuery (sub)expressions. In our approach, domain identifiers guide our query analyzer through the estimation process and allow for informed decisions even in case of deeply nested XQuery expressions. A variant of projection paths [15] provides a versatile interface into which existing techniques for XPath cardinality estimation can be plugged in seamlessly. We demonstrate an implementation of this interface based on data guides. Experiments show how our approach can cope equally with both structure- and value-based queries. It is robust with respect to intermediate estimation errors, from which we typically found our implementation to recover gracefully.
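To make the role of a data guide in cardinality estimation more concrete, the Python sketch below builds a simple path summary that records how many elements are reachable by each distinct root-to-node label path. This is only a simplified illustration of a count-annotated path summary, not the interface or estimator of the paper; the function name `path_counts` and the string form of the paths are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_counts(xml_text):
    """Count the elements reached by each distinct child-axis label path."""
    counts = Counter()

    def visit(elem, prefix):
        path = prefix + "/" + elem.tag
        counts[path] += 1
        for child in elem:
            visit(child, path)

    visit(ET.fromstring(xml_text), "")
    return counts
```

For a path expression that uses only child steps, such as `/lib/book/title`, the stored count is the exact result size; other axes and value predicates would need further estimation machinery on top of such a summary.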
Efficient Memory Representation of XML Document Trees
Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. In this paper, a technique is presented that represents the tree structure of an XML document in an efficient way. The representation exploits the high regularity in XML documents by compressing their tree structure; the latter means detecting and removing repetitions of tree patterns. Formally, context-free tree grammars that generate only a single tree are used for tree compression. The functionality of basic tree operations, like traversal along edges, is preserved under this compressed representation. This allows queries (and in particular, bulk operations) to be executed directly, without prior decompression. The complexity of certain computational problems like validation against XML types or testing equality is investigated for compressed input trees. Key words: Tree grammar, compression, in-memory XML representation
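The idea of removing repeated structure can be sketched with a simpler technique than the one in this abstract: sharing identical subtrees, so that the tree becomes a directed acyclic graph. The paper's context-free tree grammars also share repeated patterns that are not complete subtrees, which this sketch does not attempt. The tuple representation `(label, [children])` and both function names are assumptions made for this example.

```python
def share_subtrees(tree, table=None):
    """Assign one id per distinct subtree; identical subtrees get the same id.

    `tree` is a (label, [children]) pair. Returns (root_id, table), where
    `table` maps (label, child_ids) to the id of that shared node.
    """
    if table is None:
        table = {}
    label, children = tree
    child_ids = tuple(share_subtrees(child, table)[0] for child in children)
    key = (label, child_ids)
    if key not in table:
        table[key] = len(table)
    return table[key], table


def count_nodes(tree):
    """Number of nodes in the uncompressed tree."""
    label, children = tree
    return 1 + sum(count_nodes(child) for child in children)
```

Comparing `count_nodes(tree)` with `len(table)` shows how much repetition was factored out; highly regular documents produce far fewer shared nodes than tree nodes.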
Selectivity Estimation of Twig Queries on Cyclic Graphs
Recent applications including the Semantic Web, Web ontology and XML have sparked a renewed interest in graph-structured databases. Among others, twig queries have been a popular tool for retrieving subgraphs from graph-structured databases. To optimize twig queries, selectivity estimation has been a crucial and classical step. However, the majority of existing works on selectivity estimation focuses on relational and tree data. In this paper, we investigate selectivity estimation of twig queries on possibly cyclic graph data. To facilitate selectivity estimation on cyclic graphs, we propose a matrix representation of graphs derived from prime labeling, a scheme for reachability queries on directed acyclic graphs. With this representation, we exploit the consecutive ones property (C1P) of matrices. As a consequence, a node is mapped to a point in a two-dimensional space whereas a query is mapped to multiple points. We adopt histograms for scalable selectivity estimation. We perform an extensive experimental evaluation on the proposed technique and show that our technique keeps the estimation error under 1.3% on XMARK and DBLP, which is more accurate than previous techniques. On TREEBANK, we produce RMSE and NRMSE 6.8 times smaller than previous techniques.
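Prime labeling for reachability on directed acyclic graphs can be illustrated with the Python sketch below. It uses one common formulation, assumed here rather than taken from the paper: every node receives its own prime, and its label is that prime multiplied by the least common multiple of its parents' labels, so a node reaches another exactly when its label divides the other's. The paper's matrix representation and histograms are not reproduced, and all function names are hypothetical.

```python
from math import gcd


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for small examples)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def prime_labels(nodes_in_topological_order, parents):
    """Label each DAG node with the product of the primes of all its
    ancestors and itself. `parents` maps a node to its list of parents."""
    gen = primes()
    labels = {}
    for node in nodes_in_topological_order:
        inherited = 1
        for parent in parents.get(node, ()):
            # Least common multiple, so shared ancestors are counted once.
            inherited = inherited * labels[parent] // gcd(inherited, labels[parent])
        labels[node] = next(gen) * inherited
    return labels


def reaches(labels, u, v):
    """True if v is reachable from u (every node reaches itself)."""
    return labels[v] % labels[u] == 0
```

Labels grow quickly with the number of ancestors, so this direct form suits small graphs only.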
Towards Internet-Scale Cardinality Estimation of XPath Queries over Distributed XML Data
In the last decade, we have witnessed a huge success of the peer-to-peer (P2P) computing model. This has led to the development of many Internet-scale applications and systems that are used commercially. Recently, the problem of computing statistics over data in Internet-scale systems has received attention. In this paper, we discuss the problem of cardinality estimation of XPath queries over distributed XML data stored in an Internet-scale environment such as a P2P network. Such cardinality estimates are useful for XQuery optimization and statistical hypothesis testing in domains such as health informatics. We present a novel gossip algorithm called XGossip, which, given an XPath query, estimates the number of XML documents that contain a match for the query. XGossip is designed to be scalable, decentralized, and robust to failures – properties that are desirable in a large-scale distributed system. XGossip employs a novel divide-and-conquer strategy for load balancing and reducing bandwidth consumption. We conduct theoretical analyses on the quality of cardinality estimates, message complexity, and bandwidth consumption. We present a preliminary performance evaluation on PlanetLab and discuss our ongoing work.

