Results 1 - 10 of 13
Regular Expression Types for XML
, 2003
Cited by 177 (20 self)
We propose regular expression types as a foundation for statically typed XML processing languages. Regular expression types, like most schema languages for XML, introduce regular expression notations such as repetition (*), alternation (|), etc., to describe XML documents. The novelty of our type system is a semantic presentation of subtyping, as inclusion between the sets of documents denoted by two types. We give several examples illustrating the usefulness of this form of subtyping in XML processing. The decision problem for the subtype relation reduces to the inclusion problem between tree automata, which is known to be EXPTIME-complete. To avoid this high complexity in typical cases, we develop a practical algorithm that, unlike classical algorithms based on determinization of tree automata, checks the inclusion relation by a top-down traversal of the original type expressions. The main advantage of this algorithm is that it can exploit the property that type expressions being compared often share portions of their representations. Our algorithm is a variant of Aiken and Murphy’s set-inclusion constraint solver, to which are added several new implementation techniques, correctness proofs, and preliminary performance measurements on some small programs in the domain of typed XML processing.
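The idea of subtyping as set inclusion can be illustrated on a much smaller scale. The sketch below is not the paper's algorithm: it assumes flat words of single-letter labels instead of XML trees, uses Python's `re` module for the types, and only checks inclusion by brute force up to a length bound. The function name `bounded_subtype` and its parameters are hypothetical.

```python
import re
from itertools import product

def bounded_subtype(sub: str, sup: str, alphabet: str, max_len: int) -> bool:
    """Return False if some word of length <= max_len matches `sub` but not `sup`.

    A True result only means no counterexample exists up to the bound;
    it is not a proof of inclusion.
    """
    sub_re, sup_re = re.compile(sub), re.compile(sup)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if sub_re.fullmatch(word) and not sup_re.fullmatch(word):
                return False  # counterexample: denoted by `sub` but not by `sup`
    return True

print(bounded_subtype("a*", "(a|b)*", "ab", 6))   # True: no counterexample found
print(bounded_subtype("(a|b)*", "a*", "ab", 6))   # False: "b" is a counterexample
```

A real checker would work on tree automata or on the type expressions themselves, as the abstract describes, so that the answer holds for documents of every size.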
XDuce: A Statically Typed XML Processing Language
, 2002
Cited by 146 (7 self)
In this paper we describe a statically typed XML processing language called XDuce (officially pronounced "transduce"). XDuce is a functional language whose primitive data structures represent XML documents and whose types, called regular expression types, correspond to document schemas. The motivating principle behind its design is that a simple, clean, and powerful type system for XML processing can be based directly on the theory of regular tree automata.
Visibly pushdown languages
, 2004
Cited by 131 (15 self)
We study congruences on words in order to characterize the class of visibly pushdown languages (Vpl), a subclass of context-free languages. For any language L, we define a natural congruence on words that resembles the syntactic congruence for regular languages, such that this congruence is of finite index if, and only if, L is a Vpl. We then study the problem of finding canonical minimal deterministic automata for Vpls. Though Vpls in general do not have unique minimal automata, we consider a subclass of VPAs called k-module single-entry VPAs that correspond to programs with recursive procedures without input parameters, and show that the class of well-matched Vpls does indeed have unique minimal k-module single-entry automata. We also give a polynomial time algorithm that minimizes such k-module single-entry VPAs.
1 Introduction
The class of visibly pushdown languages (Vpl), introduced in [1], is a subclass of context-free languages accepted by pushdown automata in which the input letter determines the type of operation permitted on the stack. Visibly pushdown languages are closed under all boolean operations, and problems such as inclusion, that are undecidable for context-free languages, are decidable for Vpl. Vpls are relevant to several applications that use context-free languages such as the model-checking of software programs using their pushdown models [13]. Recent work has shown applications in other contexts: in modeling semantics of effects in processing XML streams [4], in game semantics for programming languages [5], and in identifying larger classes of pushdown specifications that admit decidable problems for infinite games on pushdown graphs [6].
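The defining restriction, that the input letter alone determines the stack operation, can be sketched in a few lines. This is a minimal illustration and not a construction from the paper: the alphabet partition (`<` as the only call letter, `>` as the only return letter, everything else internal) and the function name are assumptions, and the automaton has a single control state.

```python
CALLS = {"<"}     # assumed call letters: always push
RETURNS = {">"}   # assumed return letters: always pop

def accepts_well_matched(word: str) -> bool:
    """Accept words in which every call letter has a matching return letter."""
    stack = []
    for letter in word:
        if letter in CALLS:
            stack.append(letter)   # the letter, not the state, forces a push
        elif letter in RETURNS:
            if not stack:
                return False       # return with no pending call
            stack.pop()            # the letter forces a pop
        # internal letters leave the stack untouched
    return not stack               # accept only if no call is left pending

print(accepts_well_matched("<a<b>>"))  # True
print(accepts_well_matched("<a>>"))    # False
```

Because two such automata reading the same word always push and pop at the same positions, their stacks stay synchronized, which is the intuition behind the closure and decidability properties mentioned above.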
XDuce: A Typed XML Processing Language
 In Proc. of Workshop on the Web and Data Bases (WebDB
, 2000
Cited by 127 (7 self)
In this paper, we present a preliminary design for a statically typed programming language, XDuce (pronounced "transduce"). XDuce is a tree transformation language, similar in spirit to mainstream functional languages but specialized to the domain of XML processing. Its novel features are regular expression types and a corresponding mechanism for regular expression pattern matching. Regular expression types are a natural generalization of DTDs, describing, as DTDs do, structures in XML documents using regular expression operators (i.e., *, ?, |, etc.). Moreover, regular expression types support a simple but powerful notion of subtyping, yielding a substantial degree of flexibility in programming. Regular expression pattern matching is similar to ML pattern matching except that regular expression types can be embedded in patterns, which allows even more flexible matching.
Adding nesting structure to words
 In Developments in Language Theory, LNCS 4036
, 2006
Cited by 72 (11 self)
We propose the model of nested words for representation of data with both a linear ordering and a hierarchically nested matching of items. Examples of data with such dual linear-hierarchical structure include executions of structured programs, annotated linguistic data, and HTML/XML documents. Nested words generalize both words and ordered trees, and allow both word and tree operations. We define nested word automata, finite-state acceptors for nested words, and show that the resulting class of regular languages of nested words has all the appealing theoretical properties that the classical regular word languages enjoy: deterministic nested word automata are as expressive as their nondeterministic counterparts; the class is closed under union, intersection, complementation, concatenation, Kleene-*, prefixes, and language homomorphisms; membership, emptiness, language inclusion, and language equivalence are all decidable; and definability in monadic second order logic corresponds exactly to finite-state recognizability. We also consider regular languages of infinite nested words and show that the closure properties, MSO-characterization, and decidability of decision problems carry over. The linear encodings of nested words give the class of visibly pushdown languages of words, and this class lies between balanced languages and deterministic context-free languages. We argue that for algorithmic verification of structured programs, instead of viewing the program as a context-free language over words, one should view it as a regular language of nested words (or equivalently, a visibly pushdown language), and this would allow model checking of many properties (such as stack inspection, pre-post conditions) that are not expressible in existing specification logics.
We also study the relationship between ordered trees and nested words, and the corresponding automata: while the analysis complexity of nested word automata is the same as that of classical tree automata, they combine both bottom-up and top-down traversals, and enjoy expressiveness and succinctness benefits over tree automata.
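To make the link between a linear encoding and the nesting structure concrete, the sketch below recovers the hierarchical matching from a tagged word in one left-to-right pass. It is an illustration under assumptions, not code from the paper: the caller supplies which letters count as calls and returns, and the name `nesting_edges` is hypothetical. Unmatched calls and returns are reported instead of rejected.

```python
def nesting_edges(word, calls, returns):
    """Return (edges, pending_calls, pending_returns) for a tagged word.

    edges holds (call_position, return_position) pairs; the two pending
    lists hold positions of calls and returns that have no partner.
    """
    edges, pending_calls, pending_returns = [], [], []
    for position, letter in enumerate(word):
        if letter in calls:
            pending_calls.append(position)          # wait for a matching return
        elif letter in returns:
            if pending_calls:
                edges.append((pending_calls.pop(), position))
            else:
                pending_returns.append(position)    # return with no open call
        # all other letters only contribute to the linear order
    return edges, pending_calls, pending_returns

print(nesting_edges("<a<b>>", {"<"}, {">"}))  # ([(2, 4), (0, 5)], [], [])
```

The linear order is the word itself and the returned edges are the nesting relation; together they carry the dual structure the abstract describes.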
Parametric polymorphism for XML
 In POPL ’05, 32nd ACM Symposium on Principles of Programming Languages
, 2005
Cited by 25 (3 self)
Despite the extensiveness of recent investigations on static typing for XML, parametric polymorphism has rarely been treated. This well-established typing discipline can also be useful in XML processing, in particular for programs involving “parametric schemas,” i.e., schemas parameterized over other schemas (e.g., SOAP). The difficulty in treating polymorphism for XML lies in how to extend the “semantic” approach used in the mainstream (monomorphic) XML type systems. A naive extension would be “semantic” quantification over all substitutions for type variables. However, this approach reduces to an NEXPTIME-complete problem for which no practical algorithm is known. In this paper, we propose a different method that smoothly extends the semantic approach yet is algorithmically easier. In this, we devise a novel and simple marking technique, where we interpret a polymorphic type as a set of values with annotations of which subparts are parameterized. We exploit this interpretation in every ingredient of our polymorphic type system such as subtyping, inference of type arguments, and so on. As a result, we achieve a sensible system that directly represents a usual expected behavior of polymorphic type systems, “values of variable types are never reconstructed,” reminiscent of Reynolds’s parametricity theory. Also, we obtain a set of practical algorithms for typechecking by local modifications to existing ones for a monomorphic system.
Smoothing and Compression with Stochastic k-testable Tree Languages
Cited by 5 (4 self)
In this paper, we describe some techniques to learn probabilistic k-testable tree models, a generalization of the well-known k-gram models, that can be used to compress or classify structured data. These models are easy to infer from samples and allow for incremental updates. Moreover, as shown here, backing-off schemes can be defined to solve data sparseness, a problem that often arises when using trees to represent the data. These features make them suitable to compress structured data files at a better rate than string-based methods.
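As a rough illustration of the k-gram analogy, the sketch below handles only the simplest case, assumed here to be k = 2: it counts how often each node label occurs with each sequence of child labels and converts the counts to relative frequencies. The tree encoding as `(label, [children])` pairs and both function names are hypothetical, and the smoothing and backing-off schemes of the paper are not reproduced.

```python
from collections import Counter, defaultdict

def fork_counts(tree, counts=None):
    """Count (label, tuple of child labels) pairs over one tree."""
    if counts is None:
        counts = Counter()
    label, children = tree
    counts[(label, tuple(child[0] for child in children))] += 1
    for child in children:
        fork_counts(child, counts)   # same Counter, so updates are incremental
    return counts

def fork_probabilities(trees):
    """Estimate P(child labels | label) by relative frequency over a sample."""
    counts = Counter()
    for tree in trees:
        fork_counts(tree, counts)
    totals = defaultdict(int)
    for (label, _), n in counts.items():
        totals[label] += n
    return {fork: n / totals[fork[0]] for fork, n in counts.items()}

sample = [("a", [("b", []), ("b", [])]), ("a", [("b", [])])]
print(fork_probabilities(sample))
```

Adding a new sample tree only increments counters, which matches the abstract's remark that such models allow incremental updates. A fork never seen in the sample gets no entry here, which is the sparseness problem that backing-off is meant to address.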
Streaming tree transducers
CoRR
Cited by 3 (1 self)
Theory of tree transducers provides a foundation for understanding expressiveness and complexity of analysis problems for specification languages for transforming hierarchically structured data such as XML documents. We introduce streaming tree transducers as an analyzable, executable, and expressive model for transforming unranked ordered trees (and hedges) in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the output in linear time using a finite-state control, a visibly pushdown stack, and a finite number of variables that store output chunks that can be combined using the operations of string-concatenation and tree-insertion. We prove that the expressiveness of the model coincides with transductions definable using monadic second-order logic (MSO). Existing models of tree transducers either cannot implement all MSO-definable transformations, or require regular look ahead that prohibits single-pass implementation. We show a variety of analysis problems such as typechecking and checking functional equivalence are decidable for our model.
Foundations of XML Processing
, 2007
Cited by 1 (0 self)
1.1 Documents, Schemas, and Schema Languages
Toward a More Complete Alloy
Cited by 1 (1 self)
Many model-finding tools, such as Alloy, charge users with providing bounds on the sizes of models. It would be preferable to automatically compute sufficient upper-bounds whenever possible. The Bernays-Schönfinkel-Ramsey fragment of first-order logic can relieve users of this burden in some cases: its sentences are satisfiable iff they are satisfied in a finite model, whose size is computable from the input problem. Researchers have observed, however, that the class of sentences for which such a theorem holds is richer in a many-sorted framework, which Alloy inhabits, than in the one-sorted case. This paper studies this phenomenon in the general setting of order-sorted logic supporting overloading and empty sorts. We establish a syntactic condition generalizing the Bernays-Schönfinkel-Ramsey form that ensures the Finite Model Property. We give a linear-time algorithm for deciding this condition and a polynomial-time algorithm for computing the bound on model sizes. As a consequence, model-finding is a complete decision procedure for sentences in this class. Our work has been incorporated into Margrave, a tool for policy analysis, and applies in real-world situations.