Holistic Twig Joins on Indexed XML Documents
In Proc. of VLDB, 2003
Cited by 55 (2 self)
Finding all the occurrences of a twig pattern specified by a selection predicate on multiple elements in an XML document is a core operation for efficient evaluation of XML queries. Holistic twig join algorithms were proposed recently as an optimal solution when the twig pattern only involves ancestor-descendant relationships. In this paper, we address the problem of efficient processing of holistic twig joins on all/partly indexed XML documents. In particular, we propose an algorithm that utilizes available indices on element sets. While it can be shown analytically that the proposed algorithm is as efficient as the existing state-of-the-art algorithms in terms of worst case I/O and CPU cost, experimental results on various datasets indicate that the proposed index-based algorithm performs significantly better than the existing ones, especially when binary structural joins in the twig pattern have varying join selectivities.
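As background for the "binary structural joins" mentioned in this abstract, the following is a minimal sketch of a single stack-based ancestor-descendant join. It is not the paper's index-based holistic algorithm. It assumes each element carries a hypothetical `(start, end)` region code from a document-order traversal, that both inputs are sorted by `start`, and that regions nest properly as they do in a tree.

```python
def structural_join(ancestors, descendants):
    """Return (ancestor, descendant) pairs where the ancestor region contains the descendant.

    Both inputs are lists of (start, end) tuples sorted by start.
    Assumes tree-shaped data: regions either nest or are disjoint.
    """
    out = []
    stack = []  # chain of nested ancestor regions, outermost at the bottom
    i = 0
    for d in descendants:
        # Push every ancestor candidate that starts before this descendant.
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            # Drop stack entries that ended before this candidate starts.
            while stack and stack[-1][1] < a[0]:
                stack.pop()
            stack.append(a)
            i += 1
        # Drop stack entries that ended before this descendant starts.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Everything left on the stack encloses d.
        for a in stack:
            out.append((a, d))
    return out


pairs = structural_join(
    [(1, 10), (2, 5), (12, 15)],
    [(3, 4), (6, 7), (13, 14), (16, 17)],
)
```

Each input is scanned once; an index on the element sets, as the abstract proposes, would additionally let such a join skip over elements that cannot participate in any match.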
On the Integration of Structure Indexes and Inverted Lists
In SIGMOD, 2004
Cited by 44 (0 self)
Recently, there has been a great deal of interest in the development of techniques to evaluate path expressions over collections of XML documents. In general, these path expressions contain both structural and keyword components. Several methods have been proposed for processing path expressions over graph/tree-structured XML data. These methods can be classified into two broad classes. The first involves graph traversal where the input query is evaluated by traversing the data graph or some compressed representation. The other class involves information-retrieval style processing using inverted lists. In this framework, structure indexes have been proposed to be used as a substitute for graph traversal. These structure indexes are proven to be very effective when applied to queries that examine the "coarse" structure of documents. For example, for many
Efficient Processing of XML Twig Patterns with Parent Child Edges: A Look-ahead Approach
In CIKM, 2004
Cited by 34 (12 self)
With the growing importance of semi-structured data in information exchange, much research has been done to provide an effective mechanism to match a twig query in an XML database. A number of algorithms have been proposed recently to process a twig query holistically. Those algorithms are quite efficient for queries with only ancestor-descendant edges. But for queries with mixed ancestor-descendant and parent-child edges, the previous approaches still may produce large intermediate results, even when the input and output size are more manageable. To overcome this limitation, in this paper, we propose a novel holistic twig join algorithm, namely TwigStackList. Our main technique is to look-ahead read some elements in input data streams and cache a limited number of them to lists in main memory.
Stack-based algorithms for pattern matching on dags
In Proc. of VLDB, 2005
Cited by 23 (3 self)
Existing work for query processing over graph data models often relies on pre-computing the transitive closure or path indexes. In this paper, we propose a family of stack-based algorithms to handle path, twig, and dag pattern queries for directed acyclic graphs (DAGs) in particular. Our algorithms do not precompute the transitive closure or path indexes for a given graph; however, they achieve an optimal runtime complexity quadratic in the average size of the query variable bindings. We prove the soundness and completeness of our algorithms and present the experimental results.
PBiTree coding and efficient processing of containment joins
2003
Cited by 22 (5 self)
This paper addresses issues related to containment join processing in tree-structured data such as XML documents. A containment join takes two sets of XML node elements as input and returns pairs of elements such that the containment relationship holds between them. While there are previous algorithms for processing containment joins, they require both element sets to be either sorted or indexed. This paper proposes a novel and complete containment query processing framework based on a new coding scheme, PBiTree code. The PBiTree code allows us to determine the ancestor-descendant relationship between two elements from their PBiTree-based codes efficiently. We present algorithms in the framework that are optimized for various combinations of settings. In particular, the newly proposed partitioning based algorithms can process containment joins efficiently without sorting or indexes. Experimental results indicate that the containment join processing algorithms based on the proposed coding scheme outperform existing algorithms significantly.
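To illustrate how a code alone can decide the ancestor-descendant relationship, here is a minimal sketch under a stated assumption: each element's code is its in-order position in a perfect binary tree (so leaves get odd numbers and a node's height equals the number of trailing zero bits). The abstract does not spell out the PBiTree scheme, so the function names and the hash-probing join below are illustrative and are not the paper's partitioning algorithms.

```python
def height(code: int) -> int:
    """Height of a node: the number of trailing zero bits in its positive in-order code."""
    return (code & -code).bit_length() - 1


def ancestor_at(code: int, h: int) -> int:
    """Code of the ancestor of `code` at height h: clear the low h+1 bits, then set bit h."""
    return ((code >> (h + 1)) << (h + 1)) | (1 << h)


def is_ancestor(a: int, d: int) -> bool:
    """True if node a is a proper ancestor of node d."""
    ha = height(a)
    return ha > height(d) and ancestor_at(d, ha) == a


def containment_join(ancestors, descendants):
    """Join two unsorted code collections by probing a hash set of ancestor codes."""
    anc = set(ancestors)
    heights = sorted({height(a) for a in anc})
    out = []
    for d in descendants:
        hd = height(d)
        for h in heights:
            if h > hd:
                a = ancestor_at(d, h)
                if a in anc:
                    out.append((a, d))
    return out


# In a 7-node perfect binary tree the root is 4, its children are 2 and 6,
# and the leaves are 1, 3, 5, 7.
result = containment_join([4, 6, 2], [5, 7, 1])
```

Because the ancestor at any height is computed with a few bit operations, neither input needs to be sorted or indexed in this sketch; the cost is one hash probe per distinct ancestor height for each descendant.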
XML-to-SQL Query Translation Literature: The State of the Art and Open Problems
In XSym, 2003
Cited by 18 (0 self)
Recently, the database research literature has seen an explosion of publications with the goal of using an RDBMS to store and/or query XML data. The problems addressed and solved in this area are diverse. This diversity renders it difficult to know how the various results presented fit together, and even makes it hard to know what open problems remain. As a first step to rectifying this situation, we present a classification of the problem space and discuss how almost 40 papers fit into this classification. As a result of this study, we find that some basic questions are still open. In particular, for the XML publishing of relational data and for "schema-based" shredding of XML documents into relations, there is no published algorithm for translating even simple path expression queries (with the // axis) into SQL when the XML schema is recursive.
Efficient Processing of XML Twig Queries with OR-Predicates
2004
Cited by 18 (0 self)
An XML twig query, represented as a labeled tree, is essentially a complex selection predicate on both structure and content of an XML document. Twig query matching has been identified as a core operation in querying tree-structured XML data. A number of algorithms have been proposed recently to process a twig query holistically. Those algorithms, however, only deal with twig queries without OR-predicates. A straightforward approach that first decomposes a twig query with OR-predicates into multiple twig queries without OR-predicates and then combines their results is obviously not optimal in most cases. In this paper, we study novel holistic-processing algorithms for twig queries with OR-predicates without decomposition. In particular, we present a merge-based algorithm for sorted XML data and an index-based algorithm for indexed XML data. We show that holistic processing is much more efficient than the decomposition approach. Furthermore, we show that using indexes can significantly improve the performance for matching twig queries with OR-predicates, especially when the queries have large inputs but relatively small outputs.
FIX: Feature-based Indexing Technique for XML Documents
Proceedings of the 32nd VLDB Conference, 2006
Cited by 14 (3 self)
In this paper, we study the problem of indexing an XML database. Existing XML indexing techniques focus on clustering methods based on the combinatorial structural properties of an XML document. These techniques cluster tree nodes into an index tree or graph based on their similarities in ancestor-descendant or sibling relationships. Index look-up then amounts to pattern matching on the clustered tree or graph. In this paper, we propose a feature-based indexing technique, called FIX, based on spectral graph theory. The basic idea is that for each twig pattern in a collection of XML documents, we calculate a vector of features based on its structural properties. These features are used as a key for the patterns and stored in a B-tree or a multidimensional index tree. Given an XPath query, its feature vector is first calculated and looked up in the index. Then a further refinement phase is performed to fetch the final results. We experimentally study the indexing technique over two scenarios: a large collection of relatively smaller documents, and a single large document. Our experiments show that FIX provides great pruning power and could gain an order of magnitude performance improvement for many XPath queries over existing evaluation techniques.
Indexing Dataspaces
2007
Cited by 14 (1 self)
Dataspaces are collections of heterogeneous and partially unstructured data. Unlike data-integration systems that also offer uniform access to heterogeneous data sources, dataspaces do not assume that all the semantic relationships between sources are known and specified. Much of the user interaction with dataspaces involves exploring the data, and users do not have a single schema to which they can pose queries. Consequently, it is important that queries are allowed to specify varying degrees of structure, spanning keyword queries to more structure-aware queries. This paper considers indexing support for queries that combine keywords and structure. We describe several extensions to inverted lists to capture structure when it is present. In particular, our extensions incorporate attribute labels, relationships between data items, hierarchies of schema elements, and synonyms among schema elements. We describe experiments showing that our indexing techniques improve query efficiency by an order of magnitude compared with alternative approaches, and scale well with the size of the data.
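One of the extensions named in this abstract, attaching attribute labels to inverted-list entries, can be sketched as follows. This is a minimal illustration, not the paper's index layout: the `keyword//attribute` key format, the function names, and the sample items are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(items):
    """Build an inverted list keyed by plain keywords and by keyword//attribute pairs.

    `items` maps an item id to a dict of attribute name -> text value.
    """
    index = defaultdict(set)
    for item_id, attrs in items.items():
        for attr, value in attrs.items():
            for word in value.lower().split():
                index[word].add(item_id)              # keyword-only entry
                index[f"{word}//{attr}"].add(item_id)  # structure-aware entry
    return index


def search(index, keyword, attribute=None):
    """Look up a keyword, optionally restricted to a given attribute label."""
    key = keyword.lower()
    if attribute is not None:
        key = f"{key}//{attribute}"
    return index.get(key, set())


# Hypothetical sample data for illustration only.
items = {
    1: {"name": "Ann Smith"},
    2: {"author": "Ann Lee", "title": "Smith charts"},
}
index = build_index(items)
anywhere = search(index, "smith")
as_name = search(index, "smith", attribute="name")
```

A single lookup structure then serves both a pure keyword query and a query that also names the attribute, which is the range of structure the abstract says queries should be allowed to specify.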
Ctree: A Compact Tree for Indexing XML Data
Proceedings of the 6th annual ACM international workshop on web information and data management, 2004
Cited by 10 (0 self)
In this paper, we propose a novel compact tree (Ctree) for XML indexing, which provides not only concise path summaries at the group level but also detailed child-parent links at the element level. Group level mapping allows efficient pruning of a large search space while element level mapping provides fast access to the parent of an element. Due to the tree nature of XML data and queries, such fast child-to-parent access is essential for efficient XML query processing. Using group-based element reference, Ctree enables the clustering of inverted lists according to groups, which provides efficient join between inverted lists and structural index group extents. Our experiments reveal that Ctree is efficient for processing both single-path and branching queries with various value predicates.

