Results 1 - 10
of
11
Regular Expression Types for XML
, 2003
"... We propose regular expression types as a foundation for statically typed XML processing languages. Regular expression types, like most schema languages for XML, introduce regular expression notations such as repetition (*), alternation (|), etc., to describe XML documents. The novelty of our type sy ..."
Abstract
-
Cited by 157 (18 self)
- Add to MetaCart
We propose regular expression types as a foundation for statically typed XML processing languages. Regular expression types, like most schema languages for XML, introduce regular expression notations such as repetition (*), alternation (|), etc., to describe XML documents. The novelty of our type system is a semantic presentation of subtyping, as inclusion between the sets of documents denoted by two types. We give several examples illustrating the usefulness of this form of subtyping in XML processing. The decision problem for the subtype relation reduces to the inclusion problem between tree automata, which is known to be exptime-complete. To avoid this high complexity in typical cases, we develop a practical algorithm that, unlike classical algorithms based on determinization of tree automata, checks the inclusion relation by a top-down traversal of the original type expressions. The main advantage of this algorithm is that it can exploit the property that type expressions being compared often share portions of their representations. Our algorithm is a variant of Aiken and Murphy’s set-inclusion constraint solver, to which are added several new implementation techniques, correctness proofs, and preliminary performance measurements on some small programs in the domain of typed XML processing.
XDuce: A Statically Typed XML Processing Language
, 2002
"... this paper we describe a statically typed XML processing language called XDuce (o#cially pronounced "transduce"). XDuce is a functional language whose primitive data structures represent XML documents and whose types---called regular expression types---correspond to document schemas. The motivating ..."
Abstract
-
Cited by 127 (5 self)
- Add to MetaCart
this paper we describe a statically typed XML processing language called XDuce (o#cially pronounced "transduce"). XDuce is a functional language whose primitive data structures represent XML documents and whose types---called regular expression types---correspond to document schemas. The motivating principle behind its design is that a simple, clean, and powerful type system for XML processing can be based directly on the theory of regular tree automata
XDuce: A Typed XML Processing Language
- In Proc. of Workshop on the Web and Data Bases (WebDB
, 2000
"... this paper, we present a preliminary design for a statically typed programming language, XDuce (pronounced "transduce "). XDuce is a tree transformation language, similar in spirit to mainstream functional languages but specialized to the domain of XML processing. Its novel features are regular expr ..."
Abstract
-
Cited by 122 (7 self)
- Add to MetaCart
this paper, we present a preliminary design for a statically typed programming language, XDuce (pronounced "transduce "). XDuce is a tree transformation language, similar in spirit to mainstream functional languages but specialized to the domain of XML processing. Its novel features are regular expression types and a corresponding mechanism for regular expression pattern matching. Regular expression types are a natural generalization of DTDs, describing, as DTDs do, structures in XML documents using regular expression operators (i.e., *, ?, ---, etc.). Moreover, regular expression types support a simple but powerful notion of subtyping, yielding a substantial degree of flexibility in programming. Regular expression pattern matching is similar to ML pattern matching except that regular expression types can be embedded in patterns, which allows even more flexible matching.
Visibly pushdown languages
, 2004
"... Abstract. We study congruences on words in order to characterize the class of visibly pushdown languages (Vpl), a subclass of context-free languages. For any language L, we define a natural congruence on words that resembles the syntactic congruence for regular languages, such that this congruence i ..."
Abstract
-
Cited by 99 (14 self)
- Add to MetaCart
Abstract. We study congruences on words in order to characterize the class of visibly pushdown languages (Vpl), a subclass of context-free languages. For any language L, we define a natural congruence on words that resembles the syntactic congruence for regular languages, such that this congruence is of finite index if, and only if, L is a Vpl. We then study the problem of finding canonical minimal deterministic automata for Vpls. Though Vpls in general do not have unique minimal automata, we consider a subclass of VPAs called k-module single-entry VPAs that correspond to programs with recursive procedures without input parameters, and show that the class of well-matched Vpls do indeed have unique minimal k-module single-entry automata. We also give a polynomial time algorithm that minimizes such k-module single-entry VPAs. 1 Introduction The class of visibly pushdown languages (Vpl), introduced in [1], is a subclassof context-free languages accepted by pushdown automata in which the input letter determines the type of operation permitted on the stack. Visibly push-down languages are closed under all boolean operations, and problems such as inclusion, that are undecidable for context-free languages, are decidable for Vpl. Vpls are relevant to several applications that use context-free languages suchas the model-checking of software programs using their pushdown models [1-3]. Recent work has shown applications in other contexts: in modeling semanticsof effects in processing XML streams [4], in game semantics for programming languages [5], and in identifying larger classes of pushdown specifications thatadmit decidable problems for infinite games on pushdown graphs [6].
Adding nesting structure to words
- In Developments in Language Theory, LNCS 4036
, 2006
"... We propose the model of nested words for representation of data with both a linear ordering and a hierarchically nested matching of items. Examples of data with such dual linear-hierarchical structure include executions of structured programs, annotated linguistic data, and HTML/XML documents. Neste ..."
Abstract
-
Cited by 42 (10 self)
- Add to MetaCart
We propose the model of nested words for representation of data with both a linear ordering and a hierarchically nested matching of items. Examples of data with such dual linear-hierarchical structure include executions of structured programs, annotated linguistic data, and HTML/XML documents. Nested words generalize both words and ordered trees, and allow both word and tree operations. We define nested word automata—finite-state acceptors for nested words, and show that the resulting class of regular languages of nested words has all the appealing theoretical properties that the classical regular word languages enjoys: deterministic nested word automata are as expressive as their nondeterministic counterparts; the class is closed under union, intersection, complementation, concatenation, Kleene-*, prefixes, and language homomorphisms; membership, emptiness, language inclusion, and language equivalence are all decidable; and definability in monadic second order logic corresponds exactly to finite-state recognizability. We also consider regular languages of infinite nested words and show that the closure properties, MSO-characterization, and decidability of decision problems carry over. The linear encodings of nested words give the class of visibly pushdown languages of words, and this class lies between balanced languages and deterministic context-free languages. We argue that for algorithmic verification of structured programs, instead of viewing the program as a context-free language over words, one should view it as a regular language of nested words (or equivalently, a visibly pushdown language), and this would allow model checking of many properties (such as stack inspection, pre-post conditions) that are not expressible in existing specification logics. We also study the relationship between ordered trees and nested words, and the corresponding automata: while the analysis complexity of nested word automata is the same as that of classical tree automata, they combine both bottom-up and top-down traversals, and enjoy expressiveness and succinctness benefits over tree automata. 1
Smoothing and Compression with Stochastic k-testable Tree Languages ⋆ Abstract
"... In this paper, we describe some techniques to learn probabilistic k-testable tree models, a generalization of the well known k-gram models, that can be used to compress or classify structured data. These models are easy to infer from samples and allow for incremental updates. Moreover, as shown here ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
In this paper, we describe some techniques to learn probabilistic k-testable tree models, a generalization of the well known k-gram models, that can be used to compress or classify structured data. These models are easy to infer from samples and allow for incremental updates. Moreover, as shown here, backing-off schemes can be defined to solve data sparseness, a problem that often arises when using trees to represent the data. These features make them suitable to compress structured data files at a better rate than string-based methods.
Streaming tree transducers
- CoRR
"... Theory of tree transducers provides a foundation for understanding expressiveness and complexity of analysis problems for specification languages for transforming hierarchically structured data such as XML documents. We introduce streaming tree transducers as an analyzable, executable, and expressiv ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Theory of tree transducers provides a foundation for understanding expressiveness and complexity of analysis problems for specification languages for transforming hierarchically structured data such as XML documents. We introduce streaming tree transducers as an analyzable, executable, and expressive model for transforming unranked ordered trees (and hedges) in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the output in linear time using a finite-state control, a visibly pushdown stack, and a finite number of variables that store output chunks that can be combined using the operations of string-concatenation and tree-insertion. We prove that the expressiveness of the model coincides with transductions definable using monadic second-order logic (MSO). Existing models of tree transducers either cannot implement all MSO-definable transformations, or require regular look ahead that prohibits single-pass implementation. We show a variety of analysis problems such as type-checking and checking functional equivalence are decidable for our model. 1
Compositionality of Handlers on Intermediate Servers in a Web Service Architecture
, 2006
"... This research addresses the problem of inter-operability among different components in a Web service architecture on the Internet. “Message Handlers ” are pieces of code that intercept the message between clients and services in order to create/modify different pieces of the message, or to perform a ..."
Abstract
- Add to MetaCart
This research addresses the problem of inter-operability among different components in a Web service architecture on the Internet. “Message Handlers ” are pieces of code that intercept the message between clients and services in order to create/modify different pieces of the message, or to perform a processing task. For several reasons, such as management, reusability, optimization, routing, providing value-added services and transformations, it might be desirable to have chains of handlers on intermediate servers on the path between the client and the server; however, automatic selection of the set of handlers to be executed and their order of execution is still a problem. We try to solve this problem based on the type of the messages that are sent by the clients and the expected server input type. We leverage two algorithms based on formal language theory in order to solve the handler composition problem. Our case studies show how our approach could help in solving the problem of schema migration and version incompatibility in a large system such as eBay. ii Table of Contents Abstract................................... ii
The University of Tokyo- Japan
"... Despite the extensiveness of recent investigations on static typing for XML, parametric polymorphism has rarely been treated. This well-established typing discipline can also be useful in XML processing in particular for programs involving “parametric schemas, ” i.e., schemas parameterized over othe ..."
Abstract
- Add to MetaCart
Despite the extensiveness of recent investigations on static typing for XML, parametric polymorphism has rarely been treated. This well-established typing discipline can also be useful in XML processing in particular for programs involving “parametric schemas, ” i.e., schemas parameterized over other schemas (e.g., SOAP). The difficulty in treating polymorphism for XML lies in how to extend the “semantic ” approach used in the mainstream (monomorphic) XML type systems. A naive extension would be “semantic ” quantification over all substitutions for type variables. However, this approach reduces to an NEXPTIME-complete problem for which no practical algorithm is known and induces a subtyping relation that may not always match the programmer’s intuition. In this paper, we propose a different method that smoothly extends the semantic approach yet is algorithmically easier. The key idea here is to devise a novel and simple marking technique, where we interpret a polymorphic type as a set of values with annotations of which subparts are parameterized. We exploit this interpretation in every ingredient of our polymorphic type system such as subtyping, inference of type arguments, etc. As a result, we achieve a sensible system that types are never reconstructed”—in a reminiscence of Reynold’s parametricity theory. Also, we obtain a set of practical algorithms for typechecking by local modifications to existing ones for a monomorphic system.
On The Power Of Distributed Bottom-up Tree Automata
"... Abstract — Tree automata have been defined to accept trees. Different types of acceptance like bottom-up, top-down, tree walking have been considered in the literature. In this paper, we consider bottom-up tree automata and discuss the sequential distributed version of this model. Generally, this ty ..."
Abstract
- Add to MetaCart
Abstract — Tree automata have been defined to accept trees. Different types of acceptance like bottom-up, top-down, tree walking have been considered in the literature. In this paper, we consider bottom-up tree automata and discuss the sequential distributed version of this model. Generally, this type of distribution is called cooperative distributed automata or the blackboard model. We define the traditional five modes of cooperation, viz. ∗-mode, t-mode, = k, ≥ k, ≤ k (k ≥ 1) modes on bottom-up tree automata. We discuss the accepting power of cooperative distributed tree automata under these modes of cooperation. We find that the ∗-mode does not increase the power, whereas the other modes increase the power. We discuss a few results comparing the acceptance power under different modes of cooperation.

