Combined static and dynamic analysis for effective buffer minimization in streaming XQuery evaluation. (2007)

by M Schmidt, S Scherzinger, C Koch
Venue: In ICDE
Results 1 - 10 of 15

Early Nested Word Automata for XPath Query Answering on XML Streams

by Denis Debarbieux, Olivier Gauwin, Joachim Niehren, Tom Sebastian, Mohamed Zergaoui, 2013
Abstract - Cited by 8 (2 self)
Algorithms for answering XPath queries on XML streams have been studied intensively in the last decade. Nevertheless, there still exists no solution with high efficiency and large coverage. In this paper, we introduce early nested word automata in order to approximate earliest query answering algorithms for nested word automata in a highly efficient manner. This approximation is tight in practice for automata obtained from XPath expressions, and even exact for many of them. We have implemented an XPath streaming algorithm based on early nested word automata in the Fxp tool. Fxp outperforms most previous tools in efficiency, while covering more queries of the XPathMark benchmark.

Citation Context

...is a major format for information exchange besides JSON, also for RDF linked open data and relational data. Therefore, complex event processing for XML streams has been studied for more than a decade [12,7,19,20,5,17,13,9,18]. Query answering for XPath is the most basic algorithmic task on XML streams, since XPath is a language hosted by the W3C standards XSLT and XQuery. Memory efficiency is essential for processing XML ...

XML Prefiltering as a String Matching Problem

by Christoph Koch, Stefanie Scherzinger, Michael Schmidt
Abstract - Cited by 7 (0 self)
We propose a new technique for the efficient search and navigation in XML documents and streams. This technique takes string matching algorithms designed for efficient keyword search in flat strings into the second dimension, to navigate in tree-structured data. We consider the important XML data management task of prefiltering XML documents (also called XML projection) as an application for our approach. Unlike existing prefiltering schemes, we usually process only fractions of the input and get by with very economical consumption of both main memory and processing time. Our experiments reveal that, even on low-complexity problems such as XPath filtering, in-memory query engines can experience speed-ups of two orders of magnitude.

Citation Context

...often process XML data ad-hoc, without loading it into a database or building an in-memory tree representation. Doing this efficiently has been recognized as an important data management problem [1]–[7]. In XML data management, we face similar problems as in string matching, as we often need to detect patterns (such as a specific tag name) within XML input streams. However, the state of the art in st...

Minimization of Tree Pattern Queries with Constraints

by Ding Chen, Chee-Yong Chan
Abstract - Cited by 7 (0 self)
Tree pattern queries (TPQs) provide a natural and easy formalism to query tree-structured XML data, and the efficient processing of such queries has attracted a lot of attention. Since the size of a TPQ is a key determinant of its evaluation cost, recent research has focused on the problem of query minimization using integrity constraints to eliminate redundant query nodes; specifically, TPQ minimization has been studied for the class of forward and subtype constraints (FT-constraints). In this paper, we explore the TPQ minimization problem further for a richer class of FBST-constraints that includes not only FT-constraints but also backward and sibling constraints. By exploiting the properties of minimal queries under FBST-constraints, we propose efficient algorithms both to compute a single minimal query and to enumerate all minimal queries. In addition, we develop more efficient minimization algorithms for the previously studied class of FT-constraints. Our experimental study demonstrates the effectiveness and efficiency of query minimization using FBST-constraints.

Citation Context

...h FBST and CTPQ to minimize the input queries on both XMark and DBLP datasets, and then evaluated the respective minimized queries on the XML datasets using the efficient query evaluation engine, GCX [16]. The test queries on XMark are shown in the first column of Table 2. Query X2 corresponds to XMark’s benchmark query Q2, while X1 and X3 are slightly modified versions of XMark’s benchmark querie...

Projection for XML update optimization

by Mohamed-Amine Baazizi, Nicole Bidoit, Dario Colazzo, Noor Malla, Marina Sahakyan - In EDBT’11, 2011
Abstract - Cited by 4 (4 self)
While projection techniques have been extensively investigated for XML querying, we are not aware of applications to XML updating. This paper investigates a projection-based optimization mechanism for XQuery Update Facility expressions in the presence of a schema. This paper includes a formal development and study of the method as well as experiments testifying to its effectiveness.

Citation Context

...r q. The queried document t′, a projection of the original one, is often much smaller than t due to selectivity of queries. In order to determine an optimal projection of t several approaches exist [11, 12, 18, 19]. Most of them are based on query path extraction: all the paths expressing the data needs for the query q are first extracted and then used for projecting t. In particular, the type-based approach [11...

Complexity of Earliest Query Answering with Streaming Tree Automata

by Olivier Gauwin, Anne-Cécile Caron, Joachim Niehren, Sophie Tison, 2008
Abstract - Cited by 3 (0 self)
We investigate the complexity of earliest query answering for n-ary node selection queries defined by streaming tree automata (STAs). We elaborate an algorithm that selects query answers upon reception of the shortest relevant prefix of the input tree on the stream. In general, deciding if a prefix is sufficient for the selection of an n-tuple is DEXPTIME-complete (even for n = 0). For queries defined by deterministic STAs, this decision problem is in polynomial time combined complexity, as implemented in our earliest query answering algorithm.

XQuery Streaming by Forest Transducers

by Shizuya Hakuta, Sebastian Maneth, Keisuke Nakano, Hideya Iwasaki - In: ICDE, 2013
Abstract - Cited by 3 (1 self)
Streaming of XML transformations is a challenging task and only a few existing systems support streaming. Research approaches generally define custom fragments of XQuery and XPath that are amenable to streaming, and then design custom algorithms for each fragment. These languages have several shortcomings. Here we take a more principled approach to the problem of streaming XQuery-based transformations. We start with an elegant transducer model for which many static analysis problems are well-understood: the Macro Forest Transducer (MFT). We show that a large fragment of XQuery can be translated into MFTs; indeed, this fragment can express important features that are missing from other XQuery stream engines such as GCX: it supports XPath predicates and let-statements. We then use an existing streaming engine for MFTs and apply a well-founded set of optimizations from functional programming such as strictness analysis and deforestation. Our prototype achieves time and memory efficiency comparable to the fastest known engine for XQuery streaming, GCX. This is surprising because our engine relies on the OCaml built-in garbage collector and does not use any specialized buffer management, while GCX’s efficiency is due to clever and explicit buffer management.

A functional language for hyperstreaming XSLT. Unpublished manuscript available at http://researchers.lille.inria.fr/ niehren/Papers/X-Fun/0.pdf

by Pavel Labath, Joachim Niehren, 2013
Abstract - Cited by 2 (1 self)
The problem of how to transform large data trees received on streams with a much smaller memory is still an open challenge despite a decade of research on XML. Therefore, the current approach of the XSLT working group of the W3C is to provide streaming support only for a smaller fragment of XSLT 3.0. This has the drawback that many existing XSLT programs need to be rewritten in order to become executable on XML streams, while many others cannot be rewritten at all, since they define non-streamable transformations. In this paper, we propose a new hyperstreaming approach that does not require any a priori restrictions. The model of hyperstreaming generalizes the model of streaming by adding shredding operations for the output stream, so that its parts may be plugged together later on. Many transformations such as flips of document pairs are hyperstreamable but not streamable. We then present the functional language X-Fun for defining transformations between XML data trees, while providing shredding instructions. X-Fun can be understood as an extension of Frisch's XSTREAM language with output shredding, while pattern matching is replaced by tree navigation with XPATH expressions. We provide a compiler from XSLT into a fragment of X-Fun, which can be considered as the core of XSLT. We then present a hyperstreaming algorithm for evaluating X-Fun programs which combines a recent XPath evaluator with a traditional functional programming engine. We have implemented a hyperstreaming evaluator for X-Fun and thus for XSLT and compare it experimentally with SAXON's XSLT implementation. It turns out that many XSLT programs become hyperstreamable with good efficiency and without any manual rewriting. Note from February 2014: The first contribution of this report is the definition of X-Fun. This definition is now outdated, since X-Fun has evolved a lot. See our follow-up paper at http://hal.inria.fr/hal-00954692. The second contribution on hyperstreaming is not described there though.

Citation Context

...mplementation. It turns out that many XSLT programs become hyperstreamable with good efficiency and without any manual rewriting. Note from February 2014: The first contribution of this report is the definition of X-Fun. This definition is now outdated, since X-Fun has evolved a lot. See our follow-up paper at http://hal.inria.fr/hal-00954692. The second contribution on hyperstreaming is not described there though. 1. Introduction. The problem of how to transform large data trees received on streams with a much smaller memory is still an open challenge despite a decade of research on XML [1, 10, 13, 15, 17, 19, 22, 24, 25]. The most used programming languages for defining transformations of data trees are XSLT, JAVASCRIPT for JSON [5], NOSQL languages [4], CDUCE [3], beside many others. (This research was supported in part by the grant VEGA 1/0979/12.) Memory efficiency is essential for processing data trees of several gigabytes that do not fit into main memory, while time efficiency is important too. However, some transformations cannot be streamed with a bounded memory. An example is the insertion of a table of contents into a book: book(t)⇒ ...

Semantic Query Optimization for Processing XML Streams with Minimized Memory Footprint

by Ming Li, 2007
Abstract - Cited by 1 (1 self)
XML streams have become increasingly prevalent in modern applications, ranging from network traffic monitoring to real-time information publishing. XQuery evaluation over XML streams requires the temporary buffering of XML elements, which not only utilizes system buffer and CPU resources but also causes unnecessary output latency. This thesis presents a semantic query optimization solution to minimize memory footprint during XQuery evaluation by exploiting XML schema knowledge. In many practical applications, XML streams are generated conforming to pre-defined schema constraints typically expressed via a DTD or an XML schema specification. Utilizing such constraints enables us to predict on the fly the non-occurrence of a given pattern within a bound context. This helps us to release buffered data earlier or possibly avoid ever storing it, thus achieving a minimized memory footprint. In this work, we focus on one particular class of constraints, namely, the Pattern Non-Occurrence (PNO) constraint. We develop an automaton-based technique to detect PNO constraints at runtime. For a given query, optimiza-

Citation Context

...RETURN” structure and so on. In such a query with deep nesting structure, execution without an efficient buffer strategy might be very costly. Chapter 8: Related Work. Projecting XML [MS03] [BCCN06] [SSK07] aimed to address the problem of reducing memory by pre-filtering the data from the input stream based on the paths from the query. [BGKS03] utilized a pre-computed index to reduce the memory and CPU ...

Optimisation de Mises à jour XML par typage et projection

by Nicole Bidoit, Dario Colazzo, Noor Malla, Marina Sahakyan
Abstract - Cited by 1 (1 self)
XML projection is one of the main optimization techniques adopted for reducing memory consumption in XQuery in-memory engines. The main idea behind this technique is quite simple: given a query Q over an XML document D, instead of evaluating Q on D, the query Q is evaluated on a smaller document D′ obtained from D by pruning out, at loading time, parts of D that are irrelevant for Q. The actual queried document D′ is a projection of the original one, and is often much smaller than D because queries tend to be quite selective in general. While projection techniques have been extensively investigated for XML querying, we are not aware of applications to XML updating. The purpose of the paper is to propose a projection-based optimization mechanism for updates of XML documents that exploits document typing.

Citation Context

...ction of the original one, and is often much smaller than D due to the fact that queries tend to be quite selective in general. In order to determine an optimal projection D′ several approaches exist [7, 9, 15, 16], and most of them are based on query path extraction: all the paths occurring in Q, and expressing the real data needs for the query, are first extracted and then used to build the projection D′. In ...
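
The path-based projection idea described in this context can be illustrated with a minimal Python sketch. It is not the algorithm of any of the papers listed here: the function name `project` and the representation of paths as tuples of tag names are assumptions made for illustration, only the child axis is handled, and the whole document is parsed into memory first, whereas the cited approaches prune at loading time.

```python
import copy
import xml.etree.ElementTree as ET


def project(root, paths):
    """Return a pruned copy of `root` that keeps only nodes on `paths`.

    `paths` is an iterable of tag-name tuples starting at the root tag,
    e.g. ("lib", "book", "title"). This format is a hypothetical
    simplification: child steps only, no predicates or descendant axis.
    """
    wanted = [tuple(p) for p in paths]

    def prune(elem, prefix):
        here = prefix + (elem.tag,)
        # A path ends at this node: keep the whole subtree.
        if here in wanted:
            return copy.deepcopy(elem)
        # The node leads to no extracted path: drop it.
        if not any(p[:len(here)] == here for p in wanted):
            return None
        # The node is an inner step of some path: keep it as a skeleton
        # (tag and attributes only) and recurse into its children.
        kept = ET.Element(elem.tag, elem.attrib)
        for child in elem:
            sub = prune(child, here)
            if sub is not None:
                kept.append(sub)
        return kept

    return prune(root, ())


doc = ET.fromstring(
    "<lib><book><title>A</title><price>9</price></book>"
    "<cd><title>B</title></cd></lib>"
)
projected = project(doc, [("lib", "book", "title")])
result = ET.tostring(projected, encoding="unicode")
print(result)
```

A query that only needs `/lib/book/title` can then be evaluated on the smaller tree `projected` instead of `doc`; the `price` and `cd` subtrees are never handed to the query engine.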

Efficiently loading and processing XML streams

by Ming Li, Murali Mani, Elke A Rundensteiner - In IDEAS ’08: Proceedings of the 2008 international symposium on Database engineering and applications, 2008
Abstract - Cited by 1 (0 self)
XML stream applications bring the novel challenge of efficiently processing queries on sequentially accessible token-based input streams. Our Raindrop project is the first to accommodate token-based stream processing using an algebraic framework where both tokens and tuples are modeled in a uniform manner. In this paper, we illustrate how the stream loading model of our system conducts XML navigation over the input stream on the fly by concurrently constructing a minimized lightweight XML tree representation, called a navigation-free data instance. These captured XML fragments are minimized in terms of buffer consumption. Based on the compact representation of the navigation-free data instances, we propose techniques for subsequent algebraic query evaluation, in particular, effective strategies for supporting multi-mode query operators and alternative data output semantics. The proposed stream loading model requires a much smaller buffer footprint compared to alternative solutions in the literature such as Y-Filter. The proposed algebra-based evaluation techniques offer effective ways to handle data recursion over XML streams, i.e., avoiding overhead from the structural join operators. Our stream loading and query evaluation techniques have been implemented as part of the Raindrop system. Experimental results based on the Raindrop system are also reported in this paper.

Citation Context

.... After the loading period, XQueries can then be evaluated using the well-studied algebraic query evaluation methods on persistent XML data. However, this two-phase approach does not perform the query processing on the fly; rather, it completely separates the pattern retrieval from the other functionalities in the query. Such a two-step solution does not accommodate streaming XML processing applications such as network data filtering and dissemination, which require real-time continuous query response. Another approach towards processing XML streams is what we call on-the-fly query evaluation [2,3,5,7,8,11,12,16,17]. In this approach: (1) the pattern retrieval is done concurrently while the streaming input is being processed and (2) whenever the result data is ready to be constructed and output, the query engine immediately generates query results. Several recent research projects have addressed on-the-fly XQuery evaluation over streams using an automaton-based model [2,5,7,8]. The automata technique, typically used for matching patterns over strings, is a natural paradigm for structural pattern retrieval on XML token streams [4,8]. However, the automata model suffers from not being able to strike a balan...
