Processing Queries on Tree-Structured Data Efficiently
In PODS’06, 2006
Cited by 9 (0 self)
This is a survey of algorithms, complexity results, and general solution techniques for efficiently processing queries on tree-structured data. I focus on query languages that compute nodes or tuples of nodes – conjunctive queries, first-order queries, datalog, and XPath. I also point out a number of connections among previous results that have not been observed before. The techniques belong to five groups:
1. employing orders on the nodes of the tree for efficient labeling schemes and structural joins,
2. linear-time algorithms for evaluating Horn-SAT (the datalog technique),
3. structural decomposition techniques for queries,
4. query rewriting, and
5. holistic query processing techniques that can be explained using ideas from constraint satisfaction.
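The survey singles out linear-time Horn-SAT as the core of the datalog technique. A minimal sketch of the classic counter-based propagation algorithm (in the style of Dowling and Gallier) follows; the clause encoding as `(body, head)` pairs and the function name `horn_sat` are assumptions made for illustration, not the survey's own formulation.

```python
from collections import deque


def horn_sat(clauses):
    """Decide a Horn formula in time linear in its total size.

    Each clause is (body, head): body lists atoms that must all be true,
    head is an atom or None (None encodes a purely negative goal clause).
    Facts have an empty body. Returns the minimal model as a set of atoms,
    or None if the formula is unsatisfiable.
    """
    remaining = []   # per clause: number of distinct body atoms not yet true
    watchers = {}    # atom -> indices of clauses whose body mentions it
    true_atoms = set()
    queue = deque()

    for i, (body, head) in enumerate(clauses):
        distinct = set(body)
        remaining.append(len(distinct))
        for atom in distinct:
            watchers.setdefault(atom, []).append(i)
        if not distinct:
            if head is None:
                return None          # an empty clause is unsatisfiable
            queue.append(head)       # a fact: its head is true

    while queue:
        atom = queue.popleft()
        if atom in true_atoms:
            continue                 # each atom is processed only once
        true_atoms.add(atom)
        for i in watchers.get(atom, []):
            remaining[i] -= 1        # each (atom, clause) edge is touched once
            if remaining[i] == 0:
                head = clauses[i][1]
                if head is None:
                    return None      # a goal clause's whole body became true
                queue.append(head)
    return true_atoms
```

Each body occurrence decrements one counter exactly once, which is where the linear bound comes from; a ground datalog program can be fed in as facts and rules in this encoding.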
Reasoning about XML update constraints
In BDA, 2006
Cited by 6 (3 self)
In this paper we introduce a class of constraints for describing how an XML document can evolve, namely XML update constraints. For these constraints, we study the implication problem, giving algorithms and complexity results for constraints of varying expressive power. Besides classical constraint implication, we also consider an instance-based approach. More precisely, we study implication with respect to a current tree instance, resulting from a series of unknown updates. The main motivation of our work is reasoning about data integrity under update restrictions in contexts where owners may lose control over their data, such as in publishing or exchange.
XPath evaluation in linear time
In Proc. of PODS 2008, 2008
Cited by 5 (1 self)
We consider a fragment of XPath where attribute values can only be tested for equality. We show that for any fixed unary query in this fragment, the set of nodes that satisfy the query can be calculated in time linear in the document size.
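To give a feel for why an attribute equality test does not by itself break linear data complexity, here is a hand-compiled plan for one fixed query, `//item[@ref = //product/@id]`, which hashes all product ids in one pass and filters items in a second. This is an illustrative sketch, not the paper's algorithm; the `Node` class, the element and attribute names, and the function names are hypothetical, and hashing gives expected rather than worst-case linear time.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def iter_nodes(root):
    """Yield every node exactly once in document order (iterative DFS)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def items_referencing_products(root):
    """Evaluate //item[@ref = //product/@id] with two linear passes.

    XPath 1.0 compares an attribute with a node-set existentially, so an
    item qualifies if its ref equals the id of *some* product node.
    """
    product_ids = {n.attrs["id"] for n in iter_nodes(root)
                   if n.label == "product" and "id" in n.attrs}
    return [n for n in iter_nodes(root)
            if n.label == "item" and n.attrs.get("ref") in product_ids]
```

For a fixed query the number of passes is constant, so the work grows linearly with the document.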
XPath Evaluation in Linear Time with Polynomial Combined Complexity
2009
Cited by 4 (0 self)
We consider a fragment of XPath 1.0, where attribute and text values may be compared. We show that for any unary query in this fragment, the set of nodes that satisfy the query can be calculated in time linear in the document size and polynomial in the size of the query. The previous algorithm for this fragment also had linear data complexity but exponential complexity in the query size.
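Linear data complexity combined with polynomial query complexity is easiest to see in the set-at-a-time style known from Core XPath (Gottlob, Koch, and Pichler), where every location step maps a node set to a node set in O(|D|) time, so a query with |Q| steps costs O(|D| · |Q|). The sketch below shows that style only for navigation. It does not handle the value comparisons that this paper adds, and the `Tree` class and axis function names are hypothetical.

```python
class Tree:
    """Tree stored as preorder arrays: node 0 is the root, parent[0] == -1."""

    def __init__(self, nested):
        self.label, self.parent = [], []
        stack = [(nested, -1)]
        while stack:
            (lab, kids), par = stack.pop()
            me = len(self.label)          # preorder id: parents precede children
            self.label.append(lab)
            self.parent.append(par)
            for kid in reversed(kids):    # first child is popped next
                stack.append((kid, me))
        self.n = len(self.label)


def labeled(t, name):
    return {v for v in range(t.n) if t.label[v] == name}


def child(t, s):
    return {v for v in range(t.n) if t.parent[v] in s}


def parent(t, s):
    return {t.parent[v] for v in s if t.parent[v] != -1}


def descendant(t, s):
    inside = [False] * t.n
    for v in range(1, t.n):               # parents are finalized before children
        p = t.parent[v]
        inside[v] = p in s or inside[p]
    return {v for v in range(t.n) if inside[v]}


def ancestor(t, s):
    has_below = [False] * t.n
    for v in range(t.n - 1, 0, -1):       # children are visited before parents
        if v in s or has_below[v]:
            has_below[t.parent[v]] = True
    return {v for v in range(t.n) if has_below[v]}


if __name__ == "__main__":
    t = Tree(("r", [("a", [("b", []), ("c", [])]), ("a", [("c", [])])]))
    # descendant::a[child::b]/child::c, evaluated from the root element
    step = descendant(t, {0}) & labeled(t, "a")
    step &= parent(t, labeled(t, "b"))    # predicate [child::b] via the inverse axis
    print(child(t, step) & labeled(t, "c"))  # prints {3}
```

Predicates are handled through the inverse axis (nodes with a `b` child are exactly `parent(b-nodes)`), which keeps every step a single linear pass.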
Incremental XPath Evaluation
Cited by 4 (1 self)
We study the problem of incrementally maintaining the result of an XPath query on an XML database under updates. In its most general form, this problem asks to maintain a materialized XPath view over an XML database. It assumes an underlying XML database D and a query Q. One is given a sequence of updates U to D, and the problem is to compute Q(U(D)), i.e., the result of evaluating query Q on the database D after having applied the updates U. In order to answer this quickly, we are allowed to maintain an auxiliary data structure, and the complexity of the maintenance algorithms is measured in (i) the size of the auxiliary data structure, (ii) the worst-case time per update needed to compute Q(U(D)), and (iii) the worst-case time per update needed to bring the auxiliary data structure up to date. We allow three kinds of updates: node insertion, node deletion, and node relabeling. Our main results are that downward XPath queries can be incrementally maintained in time O(depth(D) · poly(|Q|)) per update and conjunctive forward XPath queries in time O(depth(D) · log(width(D)) · poly(|Q|)) per update, where |Q| is the size of the query, and depth(D) and width(D) are the nesting depth and maximum number of siblings in the database D, respectively. The auxiliary data structures for maintenance are linear in |D| and polynomial in |Q| in all these cases.
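Why depth(D) shows up in the bound can be seen on a single fixed downward query. The toy below maintains `//a[.//b]` (nodes labeled `a` with a `b`-labeled descendant) by keeping, per node, a count of `b` nodes strictly below it; every supported update walks once from the changed node to the root. This is a hypothetical illustration of the cost shape, not the paper's algorithm for general downward XPath. The class and method names are invented, and only leaf insertion and deletion are supported.

```python
class DownwardView:
    """Incrementally maintains the answer to the downward query //a[.//b].

    Auxiliary data: for every node, the number of 'b'-labeled nodes strictly
    below it, so storage is linear in the number of nodes.
    """

    def __init__(self):
        self.label, self.parent, self.n_children = {}, {}, {}
        self.b_below = {}     # node -> count of 'b' nodes in its proper subtree
        self.result = set()   # current answer: 'a' nodes with a 'b' descendant

    def _refresh(self, v):
        if self.label[v] == "a" and self.b_below[v] > 0:
            self.result.add(v)
        else:
            self.result.discard(v)

    def _shift_ancestors(self, v, delta):
        # One walk from v's parent up to the root: O(depth(D)) work per update.
        p = self.parent[v]
        while p is not None:
            self.b_below[p] += delta
            self._refresh(p)
            p = self.parent[p]

    def insert_leaf(self, v, label, parent=None):
        self.label[v], self.parent[v] = label, parent
        self.n_children[v], self.b_below[v] = 0, 0
        if parent is not None:
            self.n_children[parent] += 1
        self._refresh(v)
        if label == "b":
            self._shift_ancestors(v, +1)

    def delete_leaf(self, v):
        if self.n_children[v]:
            raise ValueError("this sketch only deletes leaves")
        if self.label[v] == "b":
            self._shift_ancestors(v, -1)
        if self.parent[v] is not None:
            self.n_children[self.parent[v]] -= 1
        for table in (self.label, self.parent, self.n_children, self.b_below):
            del table[v]
        self.result.discard(v)

    def relabel(self, v, new_label):
        delta = (new_label == "b") - (self.label[v] == "b")
        self.label[v] = new_label
        self._refresh(v)
        if delta:
            self._shift_ancestors(v, delta)
```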
Four Lessons in Versatility or How Query Languages Adapt to the Web
Cited by 3 (3 self)
Exposing not only human-centered information but also machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled new kinds of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposure has fractured the Web into islands of data, each in a different Web format: some providers choose XML, others RDF, still others JSON or OWL for their data, even in similar domains. This fracturing stifles innovation, as application builders have to cope not only with one Web stack (e.g., XML technology) but with several, each of considerable complexity. With Xcerpt we have developed a rule- and pattern-based query language that aims to shield application builders from much of this complexity: in a single query language, XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply to querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet also provides linear-time and linear-space querying for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards more convenient, yet highly efficient data access in a “Web of Data”.
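For the tree-shaped case, the interval labelings mentioned above reduce ancestor tests to a constant-time comparison: one depth-first traversal assigns each node a start and end counter, and u is a proper ancestor of v exactly when v's interval nests strictly inside u's. The sketch covers trees only; the generalization to graphs used by Xcerpt is not attempted here, and the nested-tuple encoding with unique node names is an assumption.

```python
def interval_labels(tree):
    """Assign each node a (start, end) interval with one depth-first traversal.

    tree is a nested tuple (name, [children]); node names are assumed unique.
    """
    labels, counter = {}, 0

    def visit(node):
        nonlocal counter
        name, kids = node
        start = counter
        counter += 1
        for kid in kids:
            visit(kid)
        labels[name] = (start, counter)   # end is assigned after all descendants
        counter += 1

    visit(tree)
    return labels


def is_ancestor(labels, u, v):
    """u is a proper ancestor of v iff v's interval nests strictly inside u's."""
    (su, eu), (sv, ev) = labels[u], labels[v]
    return su < sv and ev < eu
```

Once labels are computed, structural joins between candidate ancestor and descendant sets only need these comparisons and never revisit the tree.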
Probabilistic XML via Markov Chains
2009
Cited by 1 (1 self)
We show how Recursive Markov Chains (RMCs) and their restrictions can define probabilistic distributions over XML documents, and study tractability
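As a loose intuition for what "a probability distribution over XML documents" means, the sketch below uses a deliberately simplified generative model in the spirit of probabilistic tree grammars: each label independently picks its tuple of child labels from a fixed distribution, and a document's probability is the product of those choices. This is an assumption made for illustration. It is not the Recursive Markov Chain formalism studied in the paper, and `MODEL` and `probability` are hypothetical names.

```python
from math import prod

# Hypothetical model: for each label, a distribution over the tuple of child labels.
MODEL = {
    "doc": {("sec",): 0.5, ("sec", "sec"): 0.5},
    "sec": {(): 0.3, ("para",): 0.7},
    "para": {(): 1.0},
}


def probability(node, model):
    """Probability that the model generates exactly this document tree.

    node is a nested tuple (label, [children]); unseen choices get probability 0.
    """
    label, children = node
    choice = tuple(child[0] for child in children)
    p = model.get(label, {}).get(choice, 0.0)
    return p * prod(probability(c, model) for c in children)
```

For the document `("doc", [("sec", [("para", [])]), ("sec", [])])` this gives 0.5 · 0.7 · 0.3, about 0.105.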
Foundations of RDF Databases
Cited by 1 (0 self)
The goal of this paper is to give an overview of the basics of the theory of RDF databases. We provide a formal definition of RDF that includes the features that distinguish this model from other graph data models. We then move into the fundamental issue of querying RDF data. We start by considering the RDF query language SPARQL, which has been a W3C Recommendation since January 2008. We provide an algebraic syntax and a compositional semantics for this language, study the complexity of the evaluation problem for different fragments of SPARQL, and consider the problem of optimizing the evaluation of SPARQL queries, showing that a natural fragment of this language has some good properties in this respect. We furthermore study the expressive power of SPARQL by comparing it with well-known query languages such as relational algebra. We conclude by considering the issue of querying RDF data in the presence of RDFS vocabulary. In particular, we present a recently proposed extension of SPARQL with navigational capabilities.
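A compositional SPARQL semantics is usually phrased in terms of sets of mappings from variables to RDF terms, with operators for join, union, difference, and left outer join (the meaning of OPTIONAL). The sketch below follows that mapping-based style as it appears in the theory literature (for example, Pérez, Arenas, and Gutierrez). Lists stand in for sets, FILTER is omitted, and all function names are hypothetical.

```python
from itertools import product


def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)


def join(o1, o2):
    return [{**m1, **m2} for m1, m2 in product(o1, o2) if compatible(m1, m2)]


def union(o1, o2):
    return o1 + o2


def minus(o1, o2):
    """Mappings of o1 that are compatible with no mapping of o2."""
    return [m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)]


def left_outer_join(o1, o2):
    """Semantics of P1 OPTIONAL P2: extend where possible, else keep as is."""
    return join(o1, o2) + minus(o1, o2)


def match(pattern, triple):
    """Match one triple pattern; terms starting with '?' are variables."""
    m = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if m.setdefault(term, value) != value:
                return None          # a repeated variable bound inconsistently
        elif term != value:
            return None
    return m


def eval_triple(graph, pattern):
    results = []
    for triple in graph:
        m = match(pattern, triple)
        if m is not None:
            results.append(m)
    return results
```

For instance, `(?p name ?n) OPTIONAL (?p email ?e)` over a graph where only one person has an email returns one extended mapping and one mapping without `?e`.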
Impact of XML Schema Evolution
2011
Cited by 1 (1 self)
We consider the problem of XML Schema evolution. In the ever-changing context of the web, XML schemas continuously change in order to cope with the natural evolution of entities they describe. Schema changes have important consequences. First, existing documents valid with respect to the original schema are no longer guaranteed to fulfill the constraints described by the evolved schema. Second, the evolution also impacts programs manipulating documents whose structure is described by the original schema.
We propose a unifying framework for determining the effects of XML Schema evolution both on the validity of documents and on queries. The system can analyze various scenarios in which forward or backward compatibility of schemas is broken, and in which the result of a query may no longer be what was expected. Specifically, the system offers a predicate language that allows one to formulate properties related to schema evolution. The system then relies on exact reasoning techniques to perform a fine-grained analysis. This yields either a formal proof of the property or a counter-example that can be used for debugging purposes. The system has been fully implemented and tested with real-world use cases, in particular with the main standard document formats used on the web, as defined by the W3C. The system precisely identifies compatibility relations between document formats. When these relations do not hold, the system can identify queries that must be reformulated in order to produce the expected results across successive schema versions.
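The "proof or counter-example" outcome can be illustrated on a much smaller problem than full schema evolution: checking whether every child sequence allowed by one element's original content model is still allowed by its evolved content model. With both content models given as DFAs, a breadth-first search over the product automaton either finds a shortest sequence accepted by the old model and rejected by the new one, or proves that none exists. This single-content-model simplification, the DFA encoding, and the `book` example are assumptions; the paper reasons about whole tree languages and queries with its own logic-based machinery.

```python
from collections import deque


def find_counterexample(old, new, alphabet):
    """Return a shortest word accepted by `old` but rejected by `new`, or None.

    Each DFA is (start, accepting_states, delta), where delta maps
    (state, symbol) -> state; a missing entry means the word is rejected.
    None is reserved as the implicit dead state.
    """
    (s1, acc1, d1), (s2, acc2, d2) = old, new
    start = (s1, s2)
    parent = {start: None}           # product state -> (previous state, symbol)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        q1, q2 = state
        if q1 in acc1 and q2 not in acc2:
            word = []
            while parent[state] is not None:
                state, symbol = parent[state]
                word.append(symbol)
            return word[::-1]
        for a in alphabet:
            n1 = d1.get((q1, a))
            if n1 is None:
                continue             # old model rejects every extension
            nxt = (n1, d2.get((q2, a)))
            if nxt not in parent:
                parent[nxt] = (state, a)
                queue.append(nxt)
    return None                      # inclusion holds: L(old) is a subset of L(new)


# Hypothetical content models for a <book> element:
# old: title author+        new: title author+ year?
OLD_BOOK = (0, {2}, {(0, "title"): 1, (1, "author"): 2, (2, "author"): 2})
NEW_BOOK = (0, {2, 3}, {(0, "title"): 1, (1, "author"): 2, (2, "author"): 2,
                        (2, "year"): 3})
```

Here every sequence valid under the old model stays valid, while checking in the opposite direction yields the counter-example `title author year`.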
A Study of Positive XPath with Parent/Child Navigation
We study the expressiveness of Positive XPath with parent/child navigation, denoted XPath+, from two angles. First, we establish that XPath+ is equivalent in expressive power to some of its sub-fragments as well as to the class of tree queries, a sub-class of the first-order conjunctive queries defined over label, parent, and child predicates. The translation algorithm from tree queries to XPath+ yields a simple normal form for XPath+ queries. Using this normal form, we can effectively partition an XPath+ query into subqueries that can be expressed in a very small sub-fragment of XPath+ for which efficient evaluation strategies are available. Second, we characterize the expressiveness of XPath+ in terms of its ability to distinguish nodes in a document. We show that two such nodes cannot be distinguished if and only if the paths from the root of the document to these nodes have equal length and corresponding nodes on these paths are bisimilar.
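The node-indistinguishability criterion above can be checked mechanically once bisimulation classes are known. On finite trees, under the usual label-preserving downward bisimulation (an assumption; the paper's exact notion may differ in detail), two nodes are bisimilar iff they have the same label and the same *set* of child classes, so classes can be computed bottom-up. The sketch then compares root paths position by position. The dictionary encoding of the tree and all function names are hypothetical.

```python
def bisim_classes(tree, root=0):
    """Map each node id to its downward-bisimulation class.

    tree: dict id -> (label, [child ids]). Two nodes are bisimilar iff they
    carry the same label and the sets of classes of their children coincide.
    """
    classes, table = {}, {}

    def visit(v):
        label, kids = tree[v]
        for k in kids:
            visit(k)
        key = (label, frozenset(classes[k] for k in kids))
        classes[v] = table.setdefault(key, len(table))  # new key -> fresh class id

    visit(root)
    return classes


def root_path(tree, v):
    parent = {k: p for p, (_, kids) in tree.items() for k in kids}
    path = [v]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def indistinguishable(tree, u, v):
    """Equal-length root paths whose corresponding nodes are pairwise bisimilar."""
    cls = bisim_classes(tree)
    pu, pv = root_path(tree, u), root_path(tree, v)
    return len(pu) == len(pv) and all(cls[x] == cls[y] for x, y in zip(pu, pv))
```

Giving one of two otherwise identical `a` nodes an extra `c` child breaks bisimilarity on the path, so the `b` leaves below them become distinguishable.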

