Early Nested Word Automata for XPath Query Answering on XML Streams
, 2013
Cited by 8 (2 self)
Algorithms for answering XPath queries on XML streams have been studied intensively in the last decade. Nevertheless, there still exists no solution with high efficiency and large coverage. In this paper, we introduce early nested word automata in order to approximate earliest query answering algorithms for nested word automata in a highly efficient manner. This approximation is tight in practice for automata obtained from XPath expressions, and even exact for many of them. We have implemented an XPath streaming algorithm based on early nested word automata in the Fxp tool. Fxp outperforms most previous tools in efficiency, while covering more queries of the XPathMark benchmark.
XML Prefiltering as a String Matching Problem
Cited by 7 (0 self)
We propose a new technique for efficient search and navigation in XML documents and streams. This technique takes string matching algorithms designed for efficient keyword search in flat strings into the second dimension, to navigate in tree-structured data. We consider the important XML data management task of prefiltering XML documents (also called XML projection) as an application for our approach. Unlike existing prefiltering schemes, we usually process only fractions of the input and get by with very economical consumption of both main memory and processing time. Our experiments reveal that, even on low-complexity problems such as XPath filtering, in-memory query engines can experience speed-ups of two orders of magnitude.
Minimization of Tree Pattern Queries with Constraints
Cited by 7 (0 self)
Tree pattern queries (TPQs) provide a natural and easy formalism to query tree-structured XML data, and the efficient processing of such queries has attracted a lot of attention. Since the size of a TPQ is a key determinant of its evaluation cost, recent research has focused on the problem of query minimization using integrity constraints to eliminate redundant query nodes; specifically, TPQ minimization has been studied for the class of forward and subtype constraints (FT-constraints). In this paper, we explore the TPQ minimization problem further for a richer class of FBST-constraints that includes not only FT-constraints but also backward and sibling constraints. By exploiting the properties of minimal queries under FBST-constraints, we propose efficient algorithms both to compute a single minimal query and to enumerate all minimal queries. In addition, we develop more efficient minimization algorithms for the previously studied class of FT-constraints. Our experimental study demonstrates the effectiveness and efficiency of query minimization using FBST-constraints.
Projection for XML update optimization
- In EDBT’11
, 2011
Cited by 4 (4 self)
While projection techniques have been extensively investigated for XML querying, we are not aware of applications to XML updating. This paper investigates a projection-based optimization mechanism for XQuery Update Facility expressions in the presence of a schema. It includes a formal development and study of the method, as well as experiments attesting to its effectiveness.
Complexity of Earliest Query Answering with Streaming Tree Automata
, 2008
Cited by 3 (0 self)
We investigate the complexity of earliest query answering for n-ary node selection queries defined by streaming tree automata (STAs). We elaborate an algorithm that selects query answers upon reception of the shortest relevant prefix of the input tree on the stream. In general, deciding if a prefix is sufficient for the selection of an n-tuple is DEXPTIME-complete (even for n = 0). For queries defined by deterministic STAs, this decision problem is in polynomial time combined complexity, as implemented in our earliest query answering algorithm.
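To illustrate what earliest selection means in the abstract above, here is a minimal Python sketch for one hand-coded query, "select every a element that has a b child". It is not the STA-based algorithm of the paper: the event encoding, the function name `earliest_a_with_b_child`, and the sample stream are assumptions made purely for illustration. The point it shows is that an answer can be reported at the opening tag of the first b child, which is the shortest prefix that makes the selection certain, rather than at the closing tag of a.

```python
def earliest_a_with_b_child(events):
    """Yield (node_id, event_index) for each 'a' element that has a 'b' child.

    The pair is emitted at the first event after which the selection is
    certain. `events` is a list of ("open" | "close", tag) pairs; node ids
    are assigned in document order (hypothetical encoding).
    """
    stack = []      # entries: [node_id, tag, already_selected]
    next_id = 0
    for i, (kind, tag) in enumerate(events):
        if kind == "open":
            if stack and tag == "b":
                parent = stack[-1]
                if parent[1] == "a" and not parent[2]:
                    parent[2] = True       # select the parent exactly once
                    yield parent[0], i
            stack.append([next_id, tag, False])
            next_id += 1
        else:
            stack.pop()


# Event stream for <a><c/><b/><b/></a><a/>
sample = [
    ("open", "a"), ("open", "c"), ("close", "c"),
    ("open", "b"), ("close", "b"),
    ("open", "b"), ("close", "b"),
    ("close", "a"),
    ("open", "a"), ("close", "a"),
]
answers = list(earliest_a_with_b_child(sample))
```

In this sample the first a element (node 0) is selected at event index 3, four events before its closing tag arrives, and the second a element is never selected.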
XQuery Streaming by Forest Transducers
- IN: ICDE
, 2013
Cited by 3 (1 self)
Streaming of XML transformations is a challenging task and only a few existing systems support streaming. Research approaches generally define custom fragments of XQuery and XPath that are amenable to streaming, and then design custom algorithms for each fragment. These languages have several shortcomings. Here we take a more principled approach to the problem of streaming XQuery-based transformations. We start with an elegant transducer model for which many static analysis problems are well understood: the Macro Forest Transducer (MFT). We show that a large fragment of XQuery can be translated into MFTs; indeed, this fragment can express important features that are missing from other XQuery stream engines such as GCX, since it supports XPath predicates and let-statements. We then use an existing streaming engine for MFTs and apply a well-founded set of optimizations from functional programming, such as strictness analysis and deforestation. Our prototype achieves time and memory efficiency comparable to the fastest known engine for XQuery streaming, GCX. This is surprising because our engine relies on the OCaml built-in garbage collector and does not use any specialized buffer management, while GCX’s efficiency is due to clever and explicit buffer management.
A functional language for hyperstreaming XSLT. Unpublished manuscript available at http://researchers.lille.inria.fr/ niehren/Papers/X-Fun/0.pdf
, 2013
Cited by 2 (1 self)
The problem of how to transform large data trees received on streams with a much smaller memory is still an open challenge despite a decade of research on XML. Therefore, the current approach of the XSLT working group of the W3C is to provide streaming support only for a smaller fragment of XSLT 3.0. This has the drawback that many existing XSLT programs need to be rewritten in order to become executable on XML streams, while many others cannot be rewritten at all, since they define non-streamable transformations. In this paper, we propose a new hyperstreaming approach that does not require any a priori restrictions. The model of hyperstreaming generalizes the model of streaming by adding shredding operations for the output stream, so that its parts may be plugged together later on. Many transformations, such as flips of document pairs, are hyperstreamable but not streamable. We then present the functional language X-Fun for defining transformations between XML data trees, while providing shredding instructions. X-Fun can be understood as an extension of Frisch's XSTREAM language with output shredding, while pattern matching is replaced by tree navigation with XPath expressions. We provide a compiler from XSLT into a fragment of X-Fun, which can be considered as the core of XSLT. We then present a hyperstreaming algorithm for evaluating X-Fun programs which combines a recent XPath evaluator with a traditional functional programming engine. We have implemented a hyperstreaming evaluator for X-Fun, and thus for XSLT, and compare it experimentally with SAXON's XSLT implementation. It turns out that many XSLT programs become hyperstreamable with good efficiency and without any manual rewriting.
Note from February 2014: The first contribution of this report is the definition of X-Fun. This definition has since become outdated, as X-Fun has evolved a lot. See our follow-up paper at http://hal.inria.fr/hal-00954692. The second contribution, on hyperstreaming, is not described there, though.
Semantic Query Optimization for Processing XML Streams with Minimized Memory Footprint
, 2007
Cited by 1 (1 self)
XML streams have become increasingly prevalent in modern applications, ranging from network traffic monitoring to real-time information publishing. XQuery evaluation over XML streams requires the temporary buffering of XML elements, which not only utilizes system buffer and CPU resources but also causes unnecessary output latency. This thesis presents a semantic query optimization solution to minimize memory footprint during XQuery evaluation by exploiting XML schema knowledge. In many practical applications, XML streams are generated conforming to pre-defined schema constraints typically expressed via a DTD or an XML schema specification. Utilizing such constraints enables us to predict on the fly the non-occurrence of a given pattern within a bound context. This helps us to release buffered data earlier or possibly avoid ever storing it, thus achieving a minimized memory footprint. In this work, we focus on one particular class of constraints, namely, the Pattern Non-Occurrence (PNO) constraint. We develop an automaton-based technique to detect PNO constraints at runtime. For a given query, optimiza-
Optimisation de Mises à jour XML par typage et projection
Cited by 1 (1 self)
XML projection is one of the main optimization techniques adopted for reducing memory consumption in XQuery in-memory engines. The main idea behind this technique is quite simple: given a query Q over an XML document D, instead of evaluating Q on D, the query Q is evaluated on a smaller document D′ obtained from D by pruning out, at loading time, the parts of D that are irrelevant for Q. The actual queried document D′ is a projection of the original one, and is often much smaller than D because queries tend to be quite selective in general. While projection techniques have been extensively investigated for XML querying, we are not aware of applications to XML updating. The purpose of the paper is to investigate a projection-based optimization mechanism for updates of XML documents that exploits the typing of the documents.
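The projection idea described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the type-based method of the paper: the function name `project`, the sample document, and the choice of identifying the relevant parts by tag name are all hypothetical, and the pruning here is applied to an already-loaded tree rather than at loading time.

```python
import xml.etree.ElementTree as ET


def project(elem, needed_tags):
    """Prune `elem` in place; return True if `elem` must be kept.

    An element is kept if its tag is needed (then its whole subtree is
    kept) or if it is an ancestor of a needed element.
    """
    if elem.tag in needed_tags:
        return True
    keep_any = False
    for child in list(elem):        # copy: we remove children while iterating
        if project(child, needed_tags):
            keep_any = True
        else:
            elem.remove(child)      # drop a subtree the query never touches
    return keep_any


# D: the original document; the query Q is assumed to need only <title>.
doc = ET.fromstring(
    "<lib>"
    "<book><title>A</title><body>long text</body></book>"
    "<journal><issue>1</issue></journal>"
    "</lib>"
)
project(doc, {"title"})             # doc now plays the role of D'
projected = ET.tostring(doc, encoding="unicode")
titles = [t.text for t in doc.iter("title")]
```

Evaluating the title query on the pruned tree gives the same answer as on the original, while the body and journal subtrees are no longer held in memory.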
Efficiently loading and processing XML streams
- In IDEAS ’08: Proceedings of the 2008 international symposium on Database engineering and applications
, 2008
Cited by 1 (0 self)
XML stream applications bring the novel challenge of efficiently processing queries on sequentially accessible token-based input streams. Our Raindrop project is the first to accommodate token-based stream processing using an algebraic framework where both tokens and tuples are modeled in a uniform manner. In this paper, we illustrate how the stream loading model of our system conducts XML navigation over the input stream on the fly by concurrently constructing a minimized, lightweight XML tree representation, called a navigation-free data instance. These captured XML fragments are minimized in terms of buffer consumption. Based on the compact representation of the navigation-free data instances, we propose techniques for subsequent algebraic query evaluation, in particular, effective strategies for supporting multi-mode query operators and alternative data output semantics. The proposed stream loading model requires a much smaller buffer footprint than alternative solutions in the literature such as Y-Filter. The proposed algebra-based evaluation techniques also offer effective ways to handle data recursion over XML streams, i.e., they avoid the overhead of structural join operators. Our stream loading and query evaluation techniques have been implemented as part of the Raindrop system. Experimental results based on the Raindrop system are also reported in this paper.