XDuce: A Statically Typed XML Processing Language, 2002
Cited by 127 (5 self)
In this paper we describe a statically typed XML processing language called XDuce (officially pronounced "transduce"). XDuce is a functional language whose primitive data structures represent XML documents and whose types, called regular expression types, correspond to document schemas. The motivating principle behind its design is that a simple, clean, and powerful type system for XML processing can be based directly on the theory of regular tree automata.
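To make the link between regular expression types and tree automata concrete, here is a minimal Python sketch (not XDuce itself) that computes, bottom-up, the set of types a tree belongs to. Each hypothetical type in `TYPES` names an element label and a regular expression over the single-letter type names of its children, loosely mirroring an XDuce-style declaration such as `addrbook[Person*]`. The tree encoding, the names `TYPES` and `types_of`, and the use of Python's `re` on strings of type names are assumptions made for illustration.

```python
import re
from itertools import product

# A tree is (label, [children]); text content is omitted in this sketch.
# Hypothetical type definitions: single-letter type name ->
# (element label, regular expression over the type names of the children).
TYPES = {
    "A": ("addrbook", r"P*"),   # addrbook[Person*]
    "P": ("person", r"NT?"),    # person[Name, Tel?]
    "N": ("name", r""),         # name[]
    "T": ("tel", r""),          # tel[]
}

def types_of(tree):
    """Return the set of type names that `tree` belongs to, computed bottom-up."""
    label, children = tree
    child_sets = [types_of(c) for c in children]
    result = set()
    for name, (lab, regex) in TYPES.items():
        if lab != label:
            continue
        # A child may belong to several types, so try every combination.
        # This is exponential in general; it only illustrates the semantics.
        if any(re.fullmatch(regex, "".join(choice)) for choice in product(*child_sets)):
            result.add(name)
    return result

doc = ("addrbook", [
    ("person", [("name", []), ("tel", [])]),
    ("person", [("name", [])]),
])
print(types_of(doc))  # {'A'}
```

A child sequence in the wrong order, such as `person[tel, name]`, gets the empty set of types. A real implementation would compile the definitions into a tree automaton rather than enumerate combinations.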
Regular expression pattern matching for XML, 2003
Cited by 104 (10 self)
We propose regular expression pattern matching as a core feature of programming languages for manipulating XML. We extend conventional pattern-matching facilities (as in ML) with regular expression operators such as repetition (*), alternation (|), etc., that can match arbitrarily long sequences of subtrees, allowing a compact pattern to extract data from the middle of a complex sequence. We then show how to check standard notions of exhaustiveness and redundancy for these patterns. Regular expression patterns are intended to be used in languages with type systems based on regular expression types. To avoid excessive type annotations, we develop a type inference scheme that propagates type constraints to pattern variables from the type of input values. The type inference algorithm translates types and patterns into regular tree automata, and then works in terms of standard closure operations (union, intersection, and difference) on tree automata. The main technical challenge is dealing with the interaction of repetition and alternation patterns with the first-match policy, which gives rise to subtleties concerning both the termination and precision of the analysis. We address these issues by introducing a data structure representing these closure operations.
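The core idea of binding a variable to a run of subtrees in the middle of a sequence can be approximated in a few lines of Python. This sketch is an assumption-laden stand-in, not the paper's algorithm: it encodes each child's label as one character (the `CODES` table is hypothetical) and relies on Python's backtracking `re` engine, which tries alternatives left to right and repetitions greedily, roughly in the spirit of a first-match policy. It does not perform the exhaustiveness, redundancy, or type-inference analyses described above.

```python
import re

# Hypothetical one-character codes for element labels, so that a sequence of
# subtrees can be matched by Python's backtracking `re` engine.
CODES = {"name": "n", "tel": "t", "email": "e", "note": "x"}

def match_seq(pattern, children):
    """Match a list of (label, value) subtrees against `pattern`, a regex over
    label codes with named groups acting as pattern variables.
    Returns {variable: matched sublist of subtrees}, or None if no match."""
    word = "".join(CODES[label] for label, _ in children)
    m = re.fullmatch(pattern, word)
    if m is None:
        return None
    # Map each participating group's character span back to the subtree list.
    return {g: children[m.start(g):m.end(g)]
            for g in m.groupdict() if m.start(g) != -1}

person = [("name", "Ann"), ("tel", "123"), ("email", "a@x"),
          ("email", "a@y"), ("note", "hi")]
# Extract the emails from the middle of the sequence.
print(match_seq(r"nt?(?P<mails>e*)x*", person))
# {'mails': [('email', 'a@x'), ('email', 'a@y')]}
```

With two adjacent repetitions such as `(?P<a>e*)(?P<b>e*)`, the greedy engine gives every element to `a` and leaves `b` empty, which is the kind of ambiguity a first-match policy has to resolve.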
Querying unranked trees with stepwise tree automata - International Conf. on Rewriting Techniques and Applications, 2004
Cited by 30 (12 self)
The problem of selecting nodes in unranked trees is the most basic querying problem for XML. We propose stepwise tree automata for querying unranked trees. Stepwise tree automata can express the same monadic queries as monadic Datalog and monadic second-order logic. We prove this result by reduction to the ranked case, via a new systematic correspondence that relates unranked and ranked queries.
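A stepwise tree automaton reads an unranked tree a(t1, ..., tn) as the curried binary term ((a @ t1) @ ...) @ tn: each label gets an initial state, and a binary transition combines the state of the partial tree read so far with the state of the next child. The deterministic Python sketch below follows that reading; the automaton itself (accepting trees that contain a node labelled `b`) and the names `INIT`, `step`, `FINAL`, `run`, and `accepts` are hypothetical, and node selection (the monadic queries of the abstract) is not shown.

```python
from functools import reduce

# A tree is (label, [children]).
# Hypothetical deterministic stepwise automaton accepting trees that contain
# at least one node labelled "b". States: 0 = "no b seen", 1 = "b seen".
INIT = {"a": 0, "b": 1}   # initial state for each label
FINAL = {1}

def step(q_partial, q_child):
    """Binary transition: combine the state of the partial tree read so far
    with the state of its next child."""
    return max(q_partial, q_child)

def run(tree):
    """State of a(t1..tn) = step(...step(step(INIT[a], run(t1)), run(t2))..., run(tn))."""
    label, children = tree
    return reduce(step, (run(c) for c in children), INIT[label])

def accepts(tree):
    return run(tree) in FINAL

t1 = ("a", [("a", []), ("a", [("b", [])])])
t2 = ("a", [("a", [])])
print(accepts(t1), accepts(t2))  # True False
```

Because `reduce` folds the children left to right, the automaton sees siblings in document order, which is what lets a binary (ranked) automaton work on unranked trees.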
Extensions of Attribute Grammars for Structured Document Queries, 1999
Cited by 29 (6 self)
Document specification languages such as XML model documents using extended context-free grammars. These differ from standard context-free grammars in that they allow arbitrary regular expressions on the right-hand side of productions. To query such documents, we introduce a new form of attribute grammars (extended AGs) that work directly over extended context-free grammars rather than over standard context-free grammars. Viewed as a query language, extended AGs are particularly relevant as they can take into account the inherent order of the children of a node in a document.
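The following toy sketch illustrates only the two ingredients named above: an extended context-free grammar whose right-hand sides are regular expressions over child labels (checked with Python's `re`, with labels encoded as single characters), and a query whose answer depends on the order of siblings. The grammar, the `CODES` encoding, and the functions `valid` and `first_paras` are hypothetical; this is not the extended-AG formalism of the paper.

```python
import re

# A tree is (label, [children]).
# Hypothetical extended CFG (DTD-like): each element label has a regular
# expression over one-character codes of its children's labels.
CODES = {"doc": "d", "sec": "s", "title": "t", "para": "p"}
GRAMMAR = {"doc": r"s*", "sec": r"tp*", "title": r"", "para": r""}

def valid(tree):
    """Check every node's child sequence against its regular right-hand side."""
    label, children = tree
    word = "".join(CODES[c[0]] for c in children)
    return (re.fullmatch(GRAMMAR[label], word) is not None
            and all(valid(c) for c in children))

def first_paras(tree, path=()):
    """Order-sensitive unary query: paths (tuples of child indices) of every
    para whose immediately preceding sibling is a title."""
    label, children = tree
    hits = []
    for i, child in enumerate(children):
        if child[0] == "para" and i > 0 and children[i - 1][0] == "title":
            hits.append(path + (i,))
        hits.extend(first_paras(child, path + (i,)))
    return hits

doc = ("doc", [("sec", [("title", []), ("para", []), ("para", [])]),
               ("sec", [("title", [])])])
print(valid(doc), first_paras(doc))  # True [(0, 1)]
```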
Query Automata - In Proceedings of the Eighteenth ACM Symposium on Principles of Database Systems, 1999
Cited by 27 (8 self)
A main task in document transformation and information retrieval is locating subtrees satisfying some pattern. Therefore, unary queries, i.e., queries that map a tree to a set of its nodes, play an important role in the context of structured document databases. We want to understand how the natural and well-studied computation model of tree automata can be used to compute such queries. We define a query automaton (QA) as a deterministic two-way finite automaton over trees that has the ability to select nodes depending on the state and the label at those nodes. We study QAs over ranked as well as over unranked trees. Unranked trees differ from ranked ones in that there is no bound on the number of children of nodes. We characterize the expressiveness of the different formalisms as the unary queries definable in monadic second-order logic (MSO). Surprisingly, in contrast to the ranked case, special stay transitions had to be added to QAs over unranked trees to capture MSO. We es...
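As a small illustration of a unary query, the Python sketch below selects every `b`-labelled node that has an `a`-labelled ancestor and a `c`-labelled proper descendant. It uses one bottom-up pass followed by one top-down pass, each carrying only a bit of state per node, which hints at why an automaton for such queries benefits from moving both up and down the tree. It is not a query automaton as defined in the paper (there are no two-way moves or stay transitions), and the query and all names are hypothetical.

```python
# A tree is (label, [children]); nodes are identified by paths of child indices.

def up_pass(tree, path=(), info=None):
    """Bottom-up pass: record for each node whether a proper descendant is labelled 'c'."""
    if info is None:
        info = {}
    label, children = tree
    found = False
    for i, child in enumerate(children):
        up_pass(child, path + (i,), info)
        found = found or child[0] == "c" or info[path + (i,)]
    info[path] = found
    return info

def down_pass(tree, info, path=(), a_above=False, out=None):
    """Top-down pass: carry one bit (is there an 'a' ancestor?) and select nodes."""
    if out is None:
        out = []
    label, children = tree
    if label == "b" and a_above and info[path]:
        out.append(path)
    for i, child in enumerate(children):
        down_pass(child, info, path + (i,), a_above or label == "a", out)
    return out

def query(tree):
    """The unary query: map a tree to the set (here, list) of selected node paths."""
    return down_pass(tree, up_pass(tree))

t = ("a", [("b", [("c", [])]),
           ("b", []),
           ("x", [("b", [("d", [("c", [])])])])])
print(query(t))  # [(0,), (2, 0)]
```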
Counting in Trees for Free, 2004
Cited by 25 (1 self)
In [22], it was shown that MSO logic for ordered unranked trees becomes undecidable if Presburger constraints are allowed at children of nodes. We now show that a decidable logic is obtained if we use a modal fixpoint logic instead. We present an automata-theoretic characterization of this logic by means of deterministic Presburger tree automata (PTA) and show how it can be used to express numerical document queries. Surprisingly, the complexity of satisfiability for the extended logic is asymptotically the same as for the original logic. The non-emptiness problem for PTAs is in general PSPACE-complete, which is moderate given that it is already PSPACE-hard to test whether the complement of a regular expression is non-empty. We also identify a subclass of PTAs with a tractable non-emptiness problem. Further, deciding whether a tree t satisfies a formula φ takes time polynomial in the size of φ and linear in the size of t.
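The distinctive feature of a Presburger tree automaton is a transition that tests a linear arithmetic constraint on how many children are in each state. The deterministic Python sketch below assumes a hypothetical three-label alphabet: `good` and `bad` leaves, and `dir` nodes whose state is `ok` exactly when the constraint #ok > #bad holds on the child states. The labels, states, and constraint are illustrative choices, not taken from the paper; each node is visited once, so a run is linear in the size of the tree.

```python
from collections import Counter

# A tree is (label, [children]).
# Hypothetical deterministic automaton with a Presburger-style constraint:
# a "dir" node is in state "ok" iff strictly more of its children are "ok"
# than "bad".

def state(tree):
    label, children = tree
    if label == "good":
        return "ok"
    if label == "bad":
        return "bad"
    # "dir": count the states of the children (Counter returns 0 if absent).
    counts = Counter(state(c) for c in children)
    # Presburger constraint on the child-state counts: #ok - #bad > 0.
    return "ok" if counts["ok"] > counts["bad"] else "bad"

def accepts(tree):
    return state(tree) == "ok"

t = ("dir", [("good", []), ("bad", []), ("dir", [("good", [])])])
print(state(t))  # ok
```

Note that the property is order-independent: only the number of children in each state matters, which is exactly what the counting constraint captures.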
On Diving in Trees - In Proc. MFCS
Cited by 20 (3 self)
The paper is concerned with queries on tree-structured data. It defines fragments of first-order logic (FO) and FO extended by regular expressions along paths. These fragments have the same expressive power as the full logics themselves. On the other hand, they can be evaluated reasonably efficiently, even if the formula that represents the query is considered part of the input. The results are first established for unary queries that can only express properties of a vertex that depend solely on the subtree rooted at that vertex. They are then extended to queries of arbitrary arity that take the whole tree into account. The latter kind of result is also obtained for the corresponding fragment of monadic second-order logic that was introduced in [NS00].
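To see what a unary query with a regular expression along paths looks like, the naive Python sketch below selects every vertex v from which some downward path (starting at v) has a label sequence matching a given regular expression, so the answer for v depends only on the subtree rooted at v. Labels are assumed to be single characters so that Python's `re` can be used directly; the names `paths_from` and `select` are hypothetical. The enumeration of paths is deliberately simple and makes no attempt at the efficient evaluation studied in the paper.

```python
import re

# A tree is (label, [children]) with single-character labels.

def paths_from(tree):
    """Yield the label strings of all downward paths starting at the root of `tree`."""
    label, children = tree
    yield label
    for child in children:
        for rest in paths_from(child):
            yield label + rest

def select(tree, regex, path=()):
    """Return the paths of all vertices v such that some downward path from v
    matches `regex` (a property of the subtree rooted at v only)."""
    label, children = tree
    hits = [path] if any(re.fullmatch(regex, p) for p in paths_from(tree)) else []
    for i, child in enumerate(children):
        hits.extend(select(child, regex, path + (i,)))
    return hits

t = ("a", [("b", [("a", [("c", [])])]), ("c", [])])
print(select(t, r"ab*c"))  # [(), (0, 0)]
```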
Tight lower bounds for query processing on streaming and external memory data - ICALP, 2005
Cited by 20 (12 self)
We study a clean machine model for external memory and stream processing. We show that the number of scans of the external data induces a strict hierarchy (as long as work space is sufficiently small, e.g., polylogarithmic in the size of the input). We also show that neither joins nor sorting are feasible if the product of the number $r(n)$ of scans of the external memory and the size $s(n)$ of the internal memory buffers is sufficiently small, e.g., of size $o(\sqrt[5]{n})$. We also establish tight bounds for the complexity of XPath evaluation and filtering.
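For intuition about the streaming side of this model (not its lower bounds), the Python sketch below filters a document against an absolute child-axis path such as /a/b/c in a single left-to-right scan. Its internal memory is just the stack of currently open element labels, so it grows with nesting depth rather than document length. The `("start", label)` / `("end", label)` event format and the name `matches_path` are hypothetical stand-ins for a SAX-style parser.

```python
def matches_path(events, path):
    """One scan over ("start", label) / ("end", label) events.
    Returns True as soon as an element is reached by the absolute child-axis
    path `path` (e.g. ["a", "b", "c"] for /a/b/c).
    Memory used: the stack of open labels, i.e. O(depth) labels."""
    stack = []
    for kind, label in events:
        if kind == "start":
            stack.append(label)
            if stack == path:
                return True
        else:
            stack.pop()
    return False

# <a><b/><b><c/></b></a> as a stream of events
events = [("start", "a"), ("start", "b"), ("end", "b"),
          ("start", "b"), ("start", "c"), ("end", "c"),
          ("end", "b"), ("end", "a")]
print(matches_path(events, ["a", "b", "c"]), matches_path(events, ["a", "c"]))  # True False
```

A single scan with depth-bounded memory suffices for this simple path filter; the abstract's results concern what happens for harder tasks (joins, sorting, general XPath) when scans and memory are both restricted.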

