Results 1  10
of
39
Learning Deterministic Regular Expressions for the Inference of Schemas from XML Data
, 2008
"... Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning deterministic regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regu ..."
Abstract

Cited by 39 (7 self)
 Add to MetaCart
Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning deterministic regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regular expressions from positive examples only, as we will show. The regular expressions occurring in practical DTDs and XSDs, however, are such that every alphabet symbol occurs only a small number of times. As such, in practice it suffices to learn the subclass of regular expressions in which each alphabet symbol occurs at most k times, for some small k. We refer to such expressions as koccurrence regular expressions (kOREs for short). Motivated by this observation, we provide a probabilistic algorithm that learns kOREs for increasing values of k, and selects the one that best describes the sample based on a Minimum Description Length argument. The effectiveness of the method is empirically validated both on real world and synthetic data. Furthermore, the method is shown to be conservative over the simpler classes of expressions considered in previous work.
Optimizing schema languages for XML: Numerical constraints and interleaving
 ICDT
, 2007
"... The presence of a schema offers many advantages in processing, translating, querying, and storage of XML data. Basic decision problems like equivalence, inclusion, and nonemptiness of intersection of schemas form the basic building blocks for schema optimization and integration, and algorithms for ..."
Abstract

Cited by 29 (8 self)
 Add to MetaCart
(Show Context)
The presence of a schema offers many advantages in processing, translating, querying, and storage of XML data. Basic decision problems like equivalence, inclusion, and nonemptiness of intersection of schemas form the basic building blocks for schema optimization and integration, and algorithms for static analysis of transformations. It is thereby paramount to establish the exact complexity of these problems. Most common schema languages for XML can be adequately modeled by some kind of grammar with regular expressions at righthand sides. In this paper, we observe that apart from the usual regular operators of union, concatenation and Kleenestar, schema languages also allow numerical occurrence constraints and interleaving operators. Although the expressiveness of these operators remain within the regular languages, their presence or absence has significant impact on the complexity of the basic decision problems. We present a complete overview of the complexity of the basic decision problems for DTDs, XSDs and Relax NG with regular expressions incorporating numerical occurrence constraints and interleaving. We also discuss chain regular expressions and the complexity of the schema simplification problem incorporating the new operators.
Conjunctive Query Containment over Trees
 DBPL 2007. LNCS
, 2007
"... The complexity of containment and satisfiability of conjunctive queries over finite, unranked, labeled trees is studied with respect to the axes Child, NextSibling, their transitive and reflexive closures, and Following. For the containment problem a trichotomy is presented, classifying the problem ..."
Abstract

Cited by 28 (0 self)
 Add to MetaCart
(Show Context)
The complexity of containment and satisfiability of conjunctive queries over finite, unranked, labeled trees is studied with respect to the axes Child, NextSibling, their transitive and reflexive closures, and Following. For the containment problem a trichotomy is presented, classifying the problems as in PTIME, coNPcomplete, or Π P 2complete. For the satisfiability problem most problems are classified as either in PTIME or NPcomplete.
Reasoning about XML with Temporal Logics and Automata
 In LPAR’08
"... We show that problems arising in static analysis of XML specifications and transformations can be dealt with using techniques similar to those developed for static analysis of programs. Many properties of interest in the XML context are related to navigation, and can be formulated in temporal logics ..."
Abstract

Cited by 25 (4 self)
 Add to MetaCart
(Show Context)
We show that problems arising in static analysis of XML specifications and transformations can be dealt with using techniques similar to those developed for static analysis of programs. Many properties of interest in the XML context are related to navigation, and can be formulated in temporal logics for trees. We choose a logic that admits a simple singleexponential translation into unranked tree automata, in the spirit of the classical LTLtoBüchi automata translation. Automata arising from this translation have a number of additional properties; in particular, they are convenient for reasoning about unary nodeselecting queries, which are important in the XML context. We give two applications of such reasoning: one deals with a classical XML problem of reasoning about navigation in the presence of schemas, and the other relates to verifying security properties of XML views.
Inference of Concise Regular Expressions and DTDs
"... We consider the problem of inferring a concise Document Type Definition (DTD) for a given set of XMLdocuments, a problem that basically reduces to learning concise regular expressions from positive examples strings. We identify two classes of concise regular expressions—the single occurrence regula ..."
Abstract

Cited by 25 (4 self)
 Add to MetaCart
We consider the problem of inferring a concise Document Type Definition (DTD) for a given set of XMLdocuments, a problem that basically reduces to learning concise regular expressions from positive examples strings. We identify two classes of concise regular expressions—the single occurrence regular expressions (SOREs) and the chain regular expressions (CHAREs)—that capture the far majority of expressions used in practical DTDs. For the inference of SOREs we present several algorithms that first infer an automaton for a given set of example strings and then translate that automaton to a corresponding SORE, possibly repairing the automaton when no equivalent SORE can be found. In the process, we introduce a novel automaton to regular expression rewrite technique which is of independent interest. When only a very small amount of XML data is available, however (for instance when the data is generated by Web service requests or by answers to queries), these algorithms produce regular expressions that are too specific. Therefore, we introduce a novel learning algorithm CRX that directly infers CHAREs (which form a subclass of SOREs) without going through an automaton representation. We show that CRX performs very well within its target class on very small datasets. 11
Complexity of decision problems for XML schemas and chain regular expressions
 Siam J. Comp
"... Abstract. We study the complexity of the inclusion, equivalence, and intersection problem of extended CHAin Regular Expressions (eCHAREs). These are regular expressions with a very simple structure: they basically consist of the concatenation of factors, where each factor is a disjunction of strings ..."
Abstract

Cited by 19 (8 self)
 Add to MetaCart
(Show Context)
Abstract. We study the complexity of the inclusion, equivalence, and intersection problem of extended CHAin Regular Expressions (eCHAREs). These are regular expressions with a very simple structure: they basically consist of the concatenation of factors, where each factor is a disjunction of strings, possibly extended with “∗”, “+”, or “?”. Though of a very simple from, the usage of such expressions is widespread as eCHAREs, for instance, constitute a super class of the regular expressions most frequently used in practice in schema languages for XML. In particular, we show that all our lower and upper bounds for the inclusion and equivalence problem carry over to the corresponding decision problems for extended contextfree grammars, and to singletype and restrained competition tree grammars. These grammars form abstractions of Document Type Definitions (DTDs), XML Schema definitions (XSDs) and the class of onepass preorder typeable XML schemas, respectively. For the intersection problem, we show that obtained complexities only carry over to DTDs. In this respect, we also study two other classes of regular expressions related to XML: deterministic expressions and expressions where the number of occurrences of alphabet symbols is bounded by a constant. 1. Introduction. Although
Simplifying XML Schema: Effortless Handling of Nondeterministic Regular Expressions
, 2009
"... Whether beloved or despised, XML Schema is momentarily the only industrially accepted schema language for XML and is unlikely to become obsolete any time soon. Nevertheless, many nontransparent restrictions unnecessarily complicate the design of XSDs. For instance, complex content models in XML Sche ..."
Abstract

Cited by 16 (10 self)
 Add to MetaCart
(Show Context)
Whether beloved or despised, XML Schema is momentarily the only industrially accepted schema language for XML and is unlikely to become obsolete any time soon. Nevertheless, many nontransparent restrictions unnecessarily complicate the design of XSDs. For instance, complex content models in XML Schema are constrained by the infamous unique particle attribution (UPA) constraint. In formal language theoretic terms, this constraint restricts content models to deterministic regular expressions. As the latter constitute a semantic notion and no simple corresponding syntactical characterization is known, it is very difficult for nonexpert users to understand exactly when and why content models do or do not violate UPA. In the present paper, we therefore investigate solutions to relieve users from the burden of UPA by automatically transforming nondeterministic expressions into concise deterministic ones defining the same language or constituting good approximations. The presented techniques facilitate XSD construction by reducing the design task at hand more towards the complexity of the modeling task. In addition, our algorithms can serve as a plugin for
Incremental XPath Evaluation
"... We study the problem of incrementally maintaining the result of an XPath query on an XML database under updates. In its most general form, this problem asks to maintain a materialized XPath view over an XML database. It assumes an underlying XML database D and a query Q. One is given a sequence of u ..."
Abstract

Cited by 15 (4 self)
 Add to MetaCart
We study the problem of incrementally maintaining the result of an XPath query on an XML database under updates. In its most general form, this problem asks to maintain a materialized XPath view over an XML database. It assumes an underlying XML database D and a query Q. One is given a sequence of updates U to D and the problem is to compute the result of Q(U(D)), i.e., the result of evaluating query Q on the database D after having applied the updates U. In order to quickly answer this question, we are allowed to maintain an auxiliary data structure, and the complexity of the maintenance algorithms is measured in (i) the size of the auxiliary data structure, (ii) the worstcase time per update needed to compute Q(U(D)) and (iii) the worstcase time per update needed to bring the auxiliary data structure up to date. We allow three kinds of updates: node insertion, node deletion, and node relabeling. Our main results are that downward XPath queries can be incrementally maintained in time O(depth(D) · poly(Q)) per update and conjunctive forward XPath queries in time O(depth(D)·log(width(D))·poly(Q)) per update, where Q  is the size of the query, and depth(D) and width(D) are the nesting depth and maximum number of siblings in the database D, respectively. The auxiliary data structures for maintenance are linear in D  and polynomial in Q  in all these cases.
Reasoning about XML update constraints
 PODS'07
, 2007
"... We introduce in this paper a class of constraints for describing how an XML document can evolve, namely XML update constraints. For these constraints, we study the implication problem, giving algorithms and complexity results for constraints of varying expressive power. Besides classical constraint ..."
Abstract

Cited by 15 (3 self)
 Add to MetaCart
We introduce in this paper a class of constraints for describing how an XML document can evolve, namely XML update constraints. For these constraints, we study the implication problem, giving algorithms and complexity results for constraints of varying expressive power. Besides classical constraint implication, we also consider an instancebased approach. More precisely, we study implication with respect to a current tree instance, resulting from a series of unknown updates. The main motivation of our work is reasoning about data integrity under update restrictions in contexts where owners may lose control over their data, such as in publishing or exchange.