Results 1 - 10
of
52
TIMBER: A Native XML Database
- The VLDB Journal
, 2002
"... This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in ..."
Abstract
-
Cited by 113 (10 self)
- Add to MetaCart
This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in the XML context, and new cost estimation and query optimization techniques have also been developed. We present performance numbers to support some of our design decisions. We believe that the key intellectual contribution of this system is a comprehensive set-at-a-time query processing ability in a native XML store, with all the standard components of relational query processing, including algebraic rewriting and a cost-based optimizer.
A Comprehensive XQuery to SQL Translation Using Dynamic Interval Encoding
, 2003
"... The W3C XQuery language recommendation, based on a hierarchical and ordered document model, supports a wide variety of constructs and use cases. There is a diversity of approaches and strategies for evaluating XQuery expressions, in many cases only dealing with limited subsets of the language. In th ..."
Abstract
-
Cited by 88 (2 self)
- Add to MetaCart
The W3C XQuery language recommendation, based on a hierarchical and ordered document model, supports a wide variety of constructs and use cases. There is a diversity of approaches and strategies for evaluating XQuery expressions, in many cases only dealing with limited subsets of the language. In this paper we describe an implementation approach that handles XQuery with arbitrarily-nested FLWR expressions, element constructors and built-in functions (including structural comparisons). Our proposal maps an XQuery expression to a single equivalent SQL query using a novel dynamic interval encoding of a collection of XML documents as relations, augmented with information tied to the query evaluation environment. The dynamic interval technique enables (suitably enhanced) relational engines to produce predictably good query plans that do not restrict the use of sort-merge join query operators. The benefits are realized despite the challenges presented by intermediate results that create arbitrary documents and the need to preserve document order as prescribed by semantics of XQuery. Finally, our experimental results demonstrate that (native or relational) XML systems can benefit from the above technique to avoid a quadratic scale up penalty that e#ectively prevents the evaluation of nested FLWR expressions for large documents.
MonetDB/XQuery: a fast XQuery processor powered by a relational engine
- In SIGMOD
, 2006
"... Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-bas ..."
Abstract
-
Cited by 84 (22 self)
- Add to MetaCart
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state-of-theart with a number of new technical contributions, such as looplifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system is evaluated on the XMark benchmark up to data sizes of 11 GB. The performance section also provides an extensive comparison of all major XMark results published previously, which confirm that the goal of purely relational XQuery processing, namely speed and scalability, was met. 1.
ViST: A Dynamic Index Method for Querying XML Data by Tree Structures
- In SIGMOD
, 2003
"... much research has been done in providing flexible query facilities to extract data from structured XML documents. In this paper, we propose ViST, a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, we show that query ..."
Abstract
-
Cited by 81 (5 self)
- Add to MetaCart
much research has been done in providing flexible query facilities to extract data from structured XML documents. In this paper, we propose ViST, a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, we show that querying XML data is equivalent to finding subsequence matches. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B Trees without using any specialized data structures that are not well supported by DBMSs. Our experiments show that ViST is e#ective, scalable, and e#cient in supporting structural queries.
Accelerating XPath Evaluation in any RDBMS
- ACM Transactions on Database Systems
, 2003
"... This article is a proposal for a database index structure, the XPath accelerator, that has been specifically designed to support the evaluation of XPath path expressions. As such, the index is capable to support all XPath axes (including ancestor, following, preceding-sibling, descendant-or-self, et ..."
Abstract
-
Cited by 50 (8 self)
- Add to MetaCart
This article is a proposal for a database index structure, the XPath accelerator, that has been specifically designed to support the evaluation of XPath path expressions. As such, the index is capable to support all XPath axes (including ancestor, following, preceding-sibling, descendant-or-self, etc.). This feature lets the index stand out among related work on XML indexing structures which had a focus on the child and descendant axes only. The index has been designed with a close eye on the XPath semantics as well as the desire to engineer its internals so that it can be supported well by existing relational database query processing technology: the index (a) permits set-oriented (or, rather, sequence-oriented) path evaluation, and (b) can be implemented and queried using well-established relational index structures, notably B-trees and R-trees
On Labeling Schemes for the Semantic Web
, 2003
"... This paper focuses on the optimization of the navigation through voluminous subsumption hierarchies of topics employed by Portal Catalogs like Netscape Open Directory (ODP). Weadvocate for the use of labeling schemes for modeling these hierarchies in order to efficiently answer queries such as subsu ..."
Abstract
-
Cited by 27 (5 self)
- Add to MetaCart
This paper focuses on the optimization of the navigation through voluminous subsumption hierarchies of topics employed by Portal Catalogs like Netscape Open Directory (ODP). Weadvocate for the use of labeling schemes for modeling these hierarchies in order to efficiently answer queries such as subsumption check, descendants, ancestors or nearest common ancestor, which usually require costly transitive closure computations. We first give a qualitative comparison of three main families of schemes, namely bit vector, prefix and interval based schemes. We then show that two labeling schemes are good candidates for an efficient implementation of label querying using standard relational DBMS, namely, the Dewey Prefix scheme[6] and an Interval scheme byAgrawal, Borgida and Jagadish[1] We compare their storage and query evaluation performance for the 16 ODP hierarchies using the PostgreSQL engine.
A Succinct Physical Storage Scheme for Efficient Evaluation
- of Path Queries in XML. In ICDE’04, pages 54 – 65
, 2004
"... Path expressions are ubiquitous in XML processing languages. Existing approaches evaluate a path expression by selecting nodes that satisfies the tag-name and value constraints and then joining them according to the structural constraints. In this paper, we propose a novel approach, next-of-kin (NoK ..."
Abstract
-
Cited by 27 (12 self)
- Add to MetaCart
Path expressions are ubiquitous in XML processing languages. Existing approaches evaluate a path expression by selecting nodes that satisfies the tag-name and value constraints and then joining them according to the structural constraints. In this paper, we propose a novel approach, next-of-kin (NoK) pattern matching, to speed up the nodeselection step, and to reduce the join size significantly in the second step. To efficiently perform NoK pattern matching, we also propose a succinct XML physical storage scheme that is adaptive to updates and streaming XML as well. Our performance results demonstrate that the proposed storage scheme and path evaluation algorithm is highly efficient and outperforms the other tested systems in most cases. 1.
Labeling schemes for small distances in trees
- In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms
, 2003
"... Abstract. We consider labeling schemes for trees, supporting various relationships between nodes at small distance. For instance, we show that given a tree T and an integer k we can assign labels to each node of T such that given the label of two nodes we can decide, from these two labels alone, if ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
Abstract. We consider labeling schemes for trees, supporting various relationships between nodes at small distance. For instance, we show that given a tree T and an integer k we can assign labels to each node of T such that given the label of two nodes we can decide, from these two labels alone, if the distance between v and w is at most k and if so compute it. For trees with n nodes and k ≥ 2, we give a lower bound on the maximum label length of log n + Ω(log log n) bits, and for constant k, we give an upper bound of log n+O(log log n). Bounds for ancestor, sibling, connectivity and bi- and triconnectivity labeling schemes are also presented. Key words. Labeling schemes, trees. AMS subject classifications. 68R10, 68W01
Boxes: Efficient maintenance of order-based labeling for dynamic XML data
- In Proc. of ICDE
, 2005
"... Order-based element labeling for tree-structured XML data is an important technique in XML processing. It lies at the core of many fundamental XML operations such as containment join and twig matching. While labeling for static XML documents is well understood, less is known about how to maintain ac ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
Order-based element labeling for tree-structured XML data is an important technique in XML processing. It lies at the core of many fundamental XML operations such as containment join and twig matching. While labeling for static XML documents is well understood, less is known about how to maintain accurate labeling for dynamic XML documents, when elements and subtrees are inserted and deleted. Most existing approaches do not work well for arbitrary update patterns; they either produce unacceptably long labels or incur enormous relabeling costs. We present two novel I/O-efficient data structures, W-BOX and B-BOX, that efficiently maintain labeling for large, dynamic XML documents. We show analytically and experimentally that both, despite consuming minimal amounts of storage, gracefully handle arbitrary update patterns without sacrificing lookup efficiency. The two structures together provide a nice tradeoff between update and lookup costs: W-BOX has logarithmic amortized update cost and constant worst-case lookup cost, while B-BOX has constant amortized update cost and logarithmic worst-case lookup cost. We further propose techniques to eliminate the lookup cost for read-heavy workloads. 1.
Incremental maintenance of path expression views
- In SIGMOD
, 2005
"... Caching data by maintaining materialized views typically requires updating the cache appropriately to reflect dynamic source updates. Extensive research has addressed the problem of incremental view maintenance for relational data but only few works have addressed it for semi-structured data. In thi ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Caching data by maintaining materialized views typically requires updating the cache appropriately to reflect dynamic source updates. Extensive research has addressed the problem of incremental view maintenance for relational data but only few works have addressed it for semi-structured data. In this paper we address the problem of incremental maintenance of views defined over XML documents using pathexpressions. The approach described in this paper has the following main features that distinguish it from the previous works: (1) The view specification language is powerful and standardized enough to be used in realistic applications. (2) The size of the auxiliary data maintained with the views depends on the expression size and the answer size regardless of the source data size.(3) No source schema is assumed to exist; the source data can be any general well-formed XML document. Experimental evaluation is conducted to assess the performance benefits of the proposed approach.

