The Design Space of Type Checkers for XML Transformation Languages
, 2004
Cited by 32 (5 self)
We survey work on statically type checking XML transformations, covering a wide range of notations and ambitions. The concept of type may vary from idealizations of DTD to full-blown XML Schema or even more expressive formalisms. The notion of transformation may vary from clean and simple transductions to domain-specific languages or integration of XML in general-purpose programming languages. Type annotations can be either explicit or implicit, and type checking ranges from exact decidability to pragmatic approximations. We characterize
Efficient memory representation of XML documents
- In DBPL
, 2005
Cited by 23 (7 self)
Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. Here a technique is presented that allows the tree structure of an XML document to be represented efficiently. The representation exploits the high regularity in XML documents by “compressing” their tree structure, that is, by detecting and removing repetitions of tree patterns. The functionality of basic tree operations, like traversal along edges, is preserved in the compressed representation. This allows queries (and in particular, bulk operations) to be executed directly, without prior decompression. For certain tasks like validation against an XML type or checking equality of documents, the representation allows for provably more efficient algorithms than those running on conventional representations.
The Xtatic experience
- University of Pennsylvania
, 2005
Cited by 22 (6 self)
Xtatic is a lightweight extension of C# with native support for statically typed XML processing. It features XML trees as built-in values, a refined type system based on regular types in the style of XDuce, and “tree grep”-style regular patterns for traversing and manipulating XML. Previous papers on Xtatic have reported results on a number of specific technical issues: basic theoretical properties of an idealized core language, novel compilation algorithms for regular pattern matching, and efficient runtime support for XML processing in the style encouraged by Xtatic. The aim of the present paper is to discuss Xtatic, less formally and more holistically, from the perspective of language design. We survey the most significant issues we faced in the design process and evaluate the choices we have made in addressing them.
Type-based optimization for regular patterns
- In First International Workshop on High Performance XML Processing
, 2005
Cited by 19 (4 self)
Pattern matching mechanisms based on regular tree expressions feature in a number of recent languages for processing XML. The flexibility of these mechanisms demands novel approaches to the familiar problems of pattern-match compilation: how to minimize the number of tests performed during matching while keeping the size of the output code small. We describe semantic compilation methods in which we use the schema of the value flowing into a pattern matching expression to generate efficient target code. We start by discussing a pragmatic algorithm used currently in the compiler of Xtatic and report some preliminary performance results. For a more fundamental analysis, we define an optimality criterion of “no useless tests” and show that it is not satisfied by Xtatic’s algorithm. We constructively demonstrate that the problem of generating optimal pattern matching code is decidable for finite (non-recursive) patterns.
Statically typed document transformation: An Xtatic experience
- BRICS, Department of Computer Science, University of Aarhus
, 2006
Cited by 6 (0 self)
XTATIC is a lightweight extension of C# with native support for statically typed XML processing. It features XML trees as built-in values, a refined type system based on regular types à la XDUCE, and regular patterns for investigating and manipulating XML. We describe our experiences using XTATIC in a real-world application: a program for transforming XMLSPEC, a format used for authoring W3C technical reports, into HTML. Our implementation closely follows an existing one written in XSLT, facilitating comparison of the two languages and analysis of the costs and benefits (both significant) of rich static typing for XML-intensive code.
Paths into patterns
, 2004
Cited by 6 (3 self)
The XML Path Language (XPath) is an industry standard notation for addressing parts of an XML document. It is supported by many XML processing libraries and has been used as the foundation for several dedicated XML processing languages. Regular patterns, an alternative way of investigating and destructing XML documents, were first proposed in the XDuce language and feature in a number of its descendants. The processing styles offered by XPath and by regular patterns are each quite convenient for certain sorts of tasks, and the designer of a future XML processing language might well like to provide both. This designer might wonder, however, to what extent these mechanisms can be based on a common foundation. Can one be implemented by translating it into the other? Can aspects of both be combined into a single notation? As a first step toward addressing these questions, we show in this paper that a language closely related to the “downward axis” fragment of XPath can be accurately translated into ambiguous XDuce-style regular patterns with a “collect all matches” interpretation.
Revealing the X/O impedance mismatch (Changing lead into gold)
- In Datatype-Generic Programming, volume 4719 of LNCS
, 2007
Cited by 5 (2 self)
We take the term X/O impedance mismatch to describe the difficulty of the OO paradigm to accommodate XML processing by means of recasting it to typed OO programming. In particular, given XML types (say, XML schemas), it is notoriously difficult to map them automatically to object types (say, object models) that (i) reasonably compare to native object types typically devised by OO developers; (ii) fully preserve the intent of the original XML types; (iii) fully support round-tripping of arbitrary, valid XML data; and (iv) provide a general and convenient programming model for XML data hosted by objects. We reveal the X/O impedance mismatch in particular detail. That is, we survey the relevant differences between XML and objects in terms of their data models and their type systems. In this process, we systematically record and assess X-to-O mapping options. Our illustrations employ XSD (1.0) as the XML-schema language of choice and C# (1.0–3.0) as the bound of OO language expressiveness.
Type-safe Computation with Heterogeneous Data
, 2007
Cited by 2 (2 self)
Computation with large-scale heterogeneous data typically requires universal traversal to search for all occurrences of a substructure that matches a possibly complex search pattern, whose context may be different in different places within the data. Both aspects cause difficulty for existing general-purpose programming languages, because these languages are designed for homogeneous data and have problems typing the different substructures in heterogeneous data, and the complex patterns to match with the substructures. Programmers either have to hard-code the structures and search patterns, preventing programs from being reusable and scalable, or have to use low-level untyped programming or programming with special-purpose query languages, opening the door to type mismatches that cause a high risk of program correctness and security problems. This thesis invents the concept of pattern structures, and proposes a general solution to the above problems -- a programming technique using pattern structures. In this solution, well-typed pattern structures are defined to represent complex search patterns, and pattern searching over heterogeneous data is programmed with pattern parameters, in a statically-typed language that supports first-class typing of structures and patterns. The resulting programs are statically-typed, highly reusable for different data structures and different patterns, and highly scalable in terms of the complexity of data structures and patterns. Adding new kinds of patterns for an application no longer requires changing the language in use or creating new ones, but is only a programming task. The thesis demonstrates the application of this approach to, and its advantages in, two important examples of computation with heterogeneous data, i.e., XML data processing and Java bytecode analysis.
A Type-Safe Embedding of XDuce into ML
- ML 2005, Preliminary Version
Key words: Semi-structured data handling, program transformation, language integration/extension.
1 Introduction
There has been some notable interest in making use of typed programming languages for XML processing [16,13,3,11]. The advantages of such an approach are clear. In a typed setting we can provide some static guarantees about the well-formedness of XML documents and transformations. Previous work can be roughly divided into two categories.

