Results 1 - 10 of 140
Efficient Static Analysis of XML Paths and Types
, 2008
Cited by 90 (47 self)
We present an algorithm to solve XPath decision problems under regular tree type constraints and show its use to statically typecheck XPath queries. To this end, we prove the decidability of a logic with converse for finite ordered trees whose time complexity is a simple exponential of the size of a formula. The logic corresponds to the alternation-free modal µ-calculus without greatest fixpoint, restricted to finite trees, and where formulas are cycle-free. Our proof method is based on two auxiliary results. First, XML regular tree types and XPath expressions have a linear translation to cycle-free formulas. Second, the least and greatest fixpoints are equivalent for finite trees, hence the logic is closed under negation. Building on these results, we describe a practical, effective system for solving the satisfiability of a formula. The system has been experimented with some decision problems such as XPath emptiness, containment, overlap, and coverage, with or without type constraints. The benefit of the approach is that our system can be effectively used in static analyzers for programming languages
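As a rough illustration of the decision problems named in this abstract, the following Python sketch evaluates two XPath expressions on one sample document and compares the selected node sets. It only tests a single tree, so it can refute containment with a counterexample but cannot prove it; it is not the satisfiability-based algorithm of the paper. The document, the two paths, and the helper names are assumptions made up for this example, and `xml.etree.ElementTree` supports only a limited XPath subset.

```python
import xml.etree.ElementTree as ET


def select(root, path):
    """Return the set of elements that `path` selects starting from `root`."""
    return set(root.findall(path))


def is_empty_on(root, path):
    """True if `path` selects nothing on this particular document."""
    return not select(root, path)


def contained_on(root, p1, p2):
    """True if every node selected by p1 is also selected by p2 on this document."""
    return select(root, p1) <= select(root, p2)


def overlap_on(root, p1, p2):
    """True if p1 and p2 select at least one common node on this document."""
    return bool(select(root, p1) & select(root, p2))


# Hypothetical sample document and paths, chosen only for illustration.
DOC = ET.fromstring("<r><a><b/></a><b/></r>")
P1 = ".//a/b"   # b elements that are children of some a
P2 = ".//b"     # all b descendants of the root
```

On this document every node selected by `P1` is also selected by `P2`, while the reverse fails, so the tree is a counterexample to the containment of `P2` in `P1`.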
Two-variable logic on data trees and XML reasoning
Cited by 84 (17 self)
Motivated by reasoning tasks for XML languages, the satisfiability problem of logics on data trees is investigated. The nodes of a data tree have a label from a finite set and a data value from a possibly infinite set. It is shown that satisfiability for two-variable first-order logic is decidable if the tree structure can be accessed only through the child and the next sibling predicates and the access to data values is restricted to equality tests. From this main result, decidability of satisfiability and containment for a data-aware fragment of XPath and of the implication problem for unary key and inclusion constraints is concluded.
Automata and logics for words and trees over an infinite alphabet
 IN CSL 2006
, 2006
Cited by 64 (0 self)
In a data word or a data tree each position carries a label from a finite alphabet and a data value from some infinite domain. These models have been considered in the realm of semistructured data, timed automata and extended temporal logics. This paper surveys several known results on automata and logics manipulating data words and data trees, the focus being on their relative expressive power and decidability.
Inference of concise DTDs from XML data
Cited by 53 (9 self)
XML is the lingua franca for data exchange on the Internet. Within applications or communities, XML data is usually not arbitrary but adheres to some structure possibly imposed by a schema. The advantages offered by the presence of such a schema are numerous. The most direct application is of course automatic validation of the document structure. Input validation, for instance, not only facilitates automatic processing but also ensures soundness of the input data. Indeed, unvalidated input from web requests is considered the number one vulnerability for web applications. The presence of a schema allows for automation and optimization of search, integration, and processing of XML data. Further, the existence of schemas is imperative when integrating (meta) data through schema matching and in the area of generic model management. A final advantage of a schema is that it assigns meaning to the data. That is, it provides a user with a concrete semantics of the document and aids in the specification of meaningful queries over XML data. Although the examples mentioned here just scratch the surface of current applications, they already underscore the importance of schemas accompanying XML data. References are provided in the original paper. Problem setting: Given a collection of XML documents, a schema should be inferred without user intervention and
Learning Deterministic Regular Expressions for the Inference of Schemas from XML Data
, 2008
Cited by 39 (7 self)
Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning deterministic regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regular expressions from positive examples only, as we will show. The regular expressions occurring in practical DTDs and XSDs, however, are such that every alphabet symbol occurs only a small number of times. As such, in practice it suffices to learn the subclass of regular expressions in which each alphabet symbol occurs at most k times, for some small k. We refer to such expressions as k-occurrence regular expressions (k-OREs for short). Motivated by this observation, we provide a probabilistic algorithm that learns k-OREs for increasing values of k, and selects the one that best describes the sample based on a Minimum Description Length argument. The effectiveness of the method is empirically validated both on real-world and synthetic data. Furthermore, the method is shown to be conservative over the simpler classes of expressions considered in previous work.
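The definition of a k-occurrence regular expression can be made concrete with a small Python sketch that counts how often each alphabet symbol appears in a content model. This checks only the occurrence bound from the definition; it is not the probabilistic learning algorithm of the paper. The textual expression syntax (element names separated by operators) and the function names are assumptions for this example.

```python
import re
from collections import Counter

# Assumption: alphabet symbols are XML-like element names; everything else
# (parentheses, |, *, +, ?, commas, whitespace) is treated as an operator.
SYMBOL = re.compile(r"[A-Za-z_][\w.-]*")


def max_occurrence(expr):
    """Return the largest number of times any single symbol occurs in expr."""
    counts = Counter(SYMBOL.findall(expr))
    return max(counts.values(), default=0)


def is_k_ore(expr, k):
    """True if every alphabet symbol occurs at most k times in expr."""
    return max_occurrence(expr) <= k
```

For instance, `title author+ (year | date)?` uses each symbol once and so satisfies the bound for k = 1, whereas `a (b | c)* a` repeats `a` and needs k = 2.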
On the complexity of XPath containment in the presence of disjunction, DTDs, and variables
 LOGICAL METHODS IN COMPUTER SCIENCE, VOL. 2 (3:1)
, 2006
Static analysis of Active XML systems
 in PODS, 2008
Cited by 35 (15 self)
Active XML is a high-level specification language tailored to data-intensive, distributed, dynamic Web services. Active XML is based on XML documents with embedded function calls. The state of a document evolves depending on the result of internal function calls (local computations) or external ones (interactions with users or other services). Function calls return documents that may be active, so may activate new subtasks. The focus of the paper is on the verification of temporal properties of runs of Active XML systems, specified in a tree-pattern-based temporal logic, Tree-LTL, that allows expressing a rich class of semantic properties of the application. The main results establish the boundary of decidability and the complexity of automatic verification of Tree-LTL properties.
The complexity of query containment in expressive fragments of XPath 2.0
 In Proc. PODS’07
, 2007
Cited by 31 (6 self)
(full version, including appendices, of the PODS’07 paper)
Optimizing schema languages for XML: Numerical constraints and interleaving
 ICDT
, 2007
Cited by 29 (8 self)
The presence of a schema offers many advantages in processing, translating, querying, and storage of XML data. Basic decision problems like equivalence, inclusion, and nonemptiness of intersection of schemas form the basic building blocks for schema optimization and integration, and algorithms for static analysis of transformations. It is therefore paramount to establish the exact complexity of these problems. Most common schema languages for XML can be adequately modeled by some kind of grammar with regular expressions at right-hand sides. In this paper, we observe that apart from the usual regular operators of union, concatenation and Kleene-star, schema languages also allow numerical occurrence constraints and interleaving operators. Although the expressiveness of these operators remains within the regular languages, their presence or absence has a significant impact on the complexity of the basic decision problems. We present a complete overview of the complexity of the basic decision problems for DTDs, XSDs and Relax NG with regular expressions incorporating numerical occurrence constraints and interleaving. We also discuss chain regular expressions and the complexity of the schema simplification problem incorporating the new operators.
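One intuition for why numerical occurrence constraints stay within the regular languages yet can affect complexity (an illustration, not a result taken from the paper) is that a counter such as a{2,4} can be unfolded into the standard operators, but the unfolded expression grows linearly in the numeric bound, which is exponential in the length of the bound written in binary. The function names and the textual expression syntax in this Python sketch are assumptions.

```python
import re


def expand_counter(expr, low, high):
    """Unfold expr{low,high} into concatenation and optional groups only."""
    if low < 0 or high < low:
        raise ValueError("need 0 <= low <= high")
    required = [expr] * low          # the mandatory copies
    optional = ""
    for _ in range(high - low):      # nest the optional copies: (e (e)?)?
        optional = f"({expr} {optional})?" if optional else f"({expr})?"
    parts = required + ([optional] if optional else [])
    return " ".join(parts)


def matches(unfolded, word):
    """Check a word against the unfolded expression using Python's re module."""
    return re.fullmatch(unfolded.replace(" ", ""), word) is not None
```

For example, `expand_counter("a", 2, 4)` yields `a a (a (a)?)?`, which accepts exactly the words with two to four occurrences of `a`.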
Information preserving XML schema embedding
 In VLDB
, 2005
Cited by 28 (4 self)
A fundamental concern of information integration in an XML context is the ability to embed one or more source documents in a target document so that (a) the target document conforms to a target schema and (b) the information in the source document(s) is preserved. In this paper, information preservation for XML is formally studied, and the results of this study guide the definition of a novel notion of schema embedding between two XML DTD schemas represented as graphs. Schema embedding generalizes the conventional notion of graph similarity by allowing an edge in a source DTD schema to be mapped to a path in the target DTD. Instance-level embeddings can be defined from the schema embedding in a straightforward manner, such that conformance to a target schema and information preservation are guaranteed. We show that it is NP-complete to find an embedding between two DTD schemas. We also provide efficient heuristic algorithms to find candidate embeddings, along with experimental results to evaluate and compare the algorithms. These yield the first systematic and effective approach to finding information preserving XML mappings.