Results 1 - 10
of
76
Regular Expression Types for XML
, 2003
"... We propose regular expression types as a foundation for statically typed XML processing languages. Regular expression types, like most schema languages for XML, introduce regular expression notations such as repetition (*), alternation (|), etc., to describe XML documents. The novelty of our type sy ..."
Abstract
-
Cited by 157 (18 self)
- Add to MetaCart
We propose regular expression types as a foundation for statically typed XML processing languages. Regular expression types, like most schema languages for XML, introduce regular expression notations such as repetition (*), alternation (|), etc., to describe XML documents. The novelty of our type system is a semantic presentation of subtyping, as inclusion between the sets of documents denoted by two types. We give several examples illustrating the usefulness of this form of subtyping in XML processing. The decision problem for the subtype relation reduces to the inclusion problem between tree automata, which is known to be exptime-complete. To avoid this high complexity in typical cases, we develop a practical algorithm that, unlike classical algorithms based on determinization of tree automata, checks the inclusion relation by a top-down traversal of the original type expressions. The main advantage of this algorithm is that it can exploit the property that type expressions being compared often share portions of their representations. Our algorithm is a variant of Aiken and Murphy’s set-inclusion constraint solver, to which are added several new implementation techniques, correctness proofs, and preliminary performance measurements on some small programs in the domain of typed XML processing.
Typechecking for XML Transformers
- IN PROCEEDINGS OF THE NINETEENTH ACM SIGMOD-SIGACT-SIGART SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS
, 2000
"... ..."
XDuce: A Statically Typed XML Processing Language
, 2002
"... this paper we describe a statically typed XML processing language called XDuce (o#cially pronounced "transduce"). XDuce is a functional language whose primitive data structures represent XML documents and whose types---called regular expression types---correspond to document schemas. The motivating ..."
Abstract
-
Cited by 127 (5 self)
- Add to MetaCart
this paper we describe a statically typed XML processing language called XDuce (o#cially pronounced "transduce"). XDuce is a functional language whose primitive data structures represent XML documents and whose types---called regular expression types---correspond to document schemas. The motivating principle behind its design is that a simple, clean, and powerful type system for XML processing can be based directly on the theory of regular tree automata
Regular expression pattern matching for XML
, 2003
"... We propose regular expression pattern matching as a core feature of programming languages for manipulating XML. We extend conventional pattern-matching facilities (as in ML) with regular expression operators such as repetition (*), alternation (|), etc., that can match arbitrarily long sequences of ..."
Abstract
-
Cited by 104 (10 self)
- Add to MetaCart
We propose regular expression pattern matching as a core feature of programming languages for manipulating XML. We extend conventional pattern-matching facilities (as in ML) with regular expression operators such as repetition (*), alternation (|), etc., that can match arbitrarily long sequences of subtrees, allowing a compact pattern to extract data from the middle of a complex sequence. We then show how to check standard notions of exhaustiveness and redundancy for these patterns. Regular expression patterns are intended to be used in languages with type systems based on regular expression types. To avoid excessive type annotations, we develop a type inference scheme that propagates type constraints to pattern variables from the type of input values. The type inference algorithm translates types and patterns into regular tree automata, and then works in terms of standard closure operations (union, intersection, and difference) on tree automata. The main technical challenge is dealing with the interaction of repetition and alternation patterns with the first-match policy, which gives rise to subtleties concerning both the termination and precision of the analysis. We address these issues by introducing a data structure representing these closure operations
Typechecking for Semistructured Data
- SIGACT News
, 2001
"... look for the tags they need, and ignore all the others. At a more detailed level, an agreement should also constrain the structure. For example a data item with a price tag must contain an integer value, while a data item with a product tag must contain nested items labeled name, price, and descrip ..."
Abstract
-
Cited by 81 (3 self)
- Add to MetaCart
look for the tags they need, and ignore all the others. At a more detailed level, an agreement should also constrain the structure. For example a data item with a price tag must contain an integer value, while a data item with a product tag must contain nested items labeled name, price, and description. An agreement on the structure enables applications to navigate the data in a meaningful way. We call a collection of constraints on the structure a type. Several type formalism have been proposed for semistructured data [BDFS97, GW97, BM99], and several are considered for XML [Con98, BLM + 99, BFRW84]. There is an obvious analogy between types in semistructured data types in programming languages. But there is an important dierence. The former are global constraints on the data, while the latter are local constraints. For example, if a
A Transducer-Based XML Query Processor
, 2002
"... The XML Stream Machine (XSM) system is a novel XQuery processing paradigm that is tuned to the efficient processing of sequentially accessed XML data (streams). The system compiles a given XQuery into an XSM, which is an XML stream transducer, i.e., an abstract device that takes as input one o ..."
Abstract
-
Cited by 66 (1 self)
- Add to MetaCart
The XML Stream Machine (XSM) system is a novel XQuery processing paradigm that is tuned to the efficient processing of sequentially accessed XML data (streams). The system compiles a given XQuery into an XSM, which is an XML stream transducer, i.e., an abstract device that takes as input one or more XML data streams and produces one or more output streams, potentially using internal buffers. We present a systematic way to translate XQueries into efficient XSMs: First the XQuery is translated into a network of XSMs that correspond to the basic operators of the XQuery language and exchange streams. The network is reduced to a single XSM by repeated application of an XSM composition operation that is optimized to reduce the number of tests and actions that the XSM performs as well as the number of intermediate buffers that it uses. Finally, the optimized XSM is compiled into a C program. First empirical results illustrate the performance benefits of the XSM-based processor.
Validating Streaming XML Documents
, 2002
"... This paper investigates the on-line validation of streaming XML documents with respect to a DTD, under memory constraints. We first consider validation using constant memory, formalized by a finite-state automaton (fsa). We examine two flavors of the problem, depending on whether or not the XML doc ..."
Abstract
-
Cited by 64 (2 self)
- Add to MetaCart
This paper investigates the on-line validation of streaming XML documents with respect to a DTD, under memory constraints. We first consider validation using constant memory, formalized by a finite-state automaton (fsa). We examine two flavors of the problem, depending on whether or not the XML document is assumed to be well-formed. The main results of the paper provide conditions on the DTDs under which validation of either flavor can be done using an fsa. For DTDs that cannot be validated by an fsa, we investigate two alternatives. The first relaxes the constant memory requirement by allowing a stack bounded in the depth of the XML document, while maintaining the deterministic, one-pass requirement. The second approach consists in refining the DTD to provide additional information that allows validation by an fsa.
Containment for XPath Fragments under DTD Constraints
- In ICDT
, 2003
"... The containment and equivalence problems for various fragments of XPath have been studied by a number of authors. For some fragments, deciding containment (and even minimisation) has been shown to be in ptime, while for minor extensions containment has been shown to be conp-complete. When contai ..."
Abstract
-
Cited by 59 (1 self)
- Add to MetaCart
The containment and equivalence problems for various fragments of XPath have been studied by a number of authors. For some fragments, deciding containment (and even minimisation) has been shown to be in ptime, while for minor extensions containment has been shown to be conp-complete. When containment is with respect to trees satisfying a set of constraints (such as a schema or DTD), the problem seems to be more difficult. For example, containment under DTDs is conp- complete for an XPath fragment denoted XP for which containment is in ptime. It is also undecidable for a larger class of XPath queries when the constraints are so-called simple XPath integrity constraints (SXICs).

