A Normal Form for XML Documents
Cited by 107 (8 self)
This paper takes a first step towards the design and normalization theory for XML documents. We show that, like relational databases, XML documents may contain redundant information, and may be prone to update anomalies. Furthermore, such problems are caused by certain functional dependencies among paths in the document. Our goal is to find a way of converting an arbitrary DTD into a well-designed one that avoids these problems. We first introduce the concept of a functional dependency for XML, and define its semantics via a relational representation of XML. We then define an XML normal form, XNF, that avoids update anomalies and redundancies. We study its properties and show that it generalizes BCNF and a normal form for nested relations when those are appropriately coded as XML documents. Finally, we present a lossless algorithm for converting any DTD into one in XNF.
The Query Set Specification Language (QSSL)
- In WebDB, 2003
Cited by 8 (4 self)
Applications require access to multiple information sources and the data of other applications. WSDL-based web services are becoming a popular way of making information sources available on the web and, hence, to applications that need to consume them, often via data integration systems that combine the data of multiple sources. We argue that the function signature paradigm that is used today by web services cannot capture the query capabilities provided by structurally rich and functionally powerful information sources, such as relational databases. We propose the Query Set Specification Language (QSSL) that allows the concise description of sets of parameterized XPath queries. A QSS is embedded in a WSDL specification to form a specialized type of web service, called a Data Service. Data Services connect the calls that the source accepts with the underlying schema. QSSL will be enhanced to describe subsets of XQuery expressions beyond XPath ones.
Attribute Grammars for Unranked Trees as a query language for structured documents
Cited by 5 (1 self)
Document specification languages, such as XML, model documents using extended context-free grammars. These differ from standard context-free grammars in that they allow arbitrary regular expressions on the right-hand side of productions. To query such documents, we introduce a new form of attribute grammars (extended AGs) that work directly over extended context-free grammars rather than over standard context-free grammars. Viewed as a query language, extended AGs are particularly relevant as they can take into account the inherent order of the children of a node in a document.
Complexity of decision problems for XML schemas and chain regular expressions
- SIAM J. Comp
Cited by 4 (3 self)
We study the complexity of the inclusion, equivalence, and intersection problems of extended CHAin Regular Expressions (eCHAREs). These are regular expressions with a very simple structure: they basically consist of the concatenation of factors, where each factor is a disjunction of strings, possibly extended with “∗”, “+”, or “?”. Though of a very simple form, the usage of such expressions is widespread as eCHAREs, for instance, constitute a superclass of the regular expressions most frequently used in practice in schema languages for XML. In particular, we show that all our lower and upper bounds for the inclusion and equivalence problems carry over to the corresponding decision problems for extended context-free grammars, and to single-type and restrained competition tree grammars. These grammars form abstractions of Document Type Definitions (DTDs), XML Schema definitions (XSDs) and the class of one-pass preorder typeable XML schemas, respectively. For the intersection problem, we show that the obtained complexities only carry over to DTDs. In this respect, we also study two other classes of regular expressions related to XML: deterministic expressions and expressions where the number of occurrences of alphabet symbols is bounded by a constant.
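To make the shape of such expressions concrete, the following is a minimal sketch, not taken from the paper, that builds a chain expression from a list of factors and tests word membership with Python's standard `re` module. The `(strings, modifier)` encoding, the function names, and the use of single characters as alphabet symbols are assumptions made for illustration; the decision problems studied in the paper (inclusion, equivalence, intersection) are not implemented here.

```python
import re

ALLOWED_MODIFIERS = ("", "*", "+", "?")

def echare_to_regex(factors):
    """Compile a chain expression given as (strings, modifier) factors.

    Each factor is a disjunction of strings, optionally followed by
    '*', '+', or '?'. This encoding is a hypothetical one chosen for
    the example, not the paper's notation.
    """
    parts = []
    for strings, modifier in factors:
        if modifier not in ALLOWED_MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier!r}")
        # Escape each string so it is matched literally.
        alternatives = "|".join(re.escape(s) for s in strings)
        parts.append(f"(?:{alternatives}){modifier}")
    return re.compile("".join(parts))

def matches(expr, word):
    """Return True if the whole word belongs to the language of expr."""
    return expr.fullmatch(word) is not None

# The chain expression (a|bc)* (d)+ (e)? as three factors.
expr = echare_to_regex([(["a", "bc"], "*"), (["d"], "+"), (["e"], "?")])
```

For instance, `matches(expr, "abcad")` holds because `a`, `bc`, `a` are consumed by the first factor and `d` by the second, while `matches(expr, "e")` fails because the second factor requires at least one `d`.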
Simplifying XML Schema: Single-Type Approximations of Regular Tree Languages
- 2010
Cited by 2 (1 self)
XML Schema Definitions (XSDs) can be adequately abstracted by the single-type regular tree languages. It is well known that these form a strict subclass of the robust class of regular unranked tree languages. Sadly, in this respect, XSDs are not closed under the basic operations of union and set difference, complicating important tasks in schema integration and evolution. The purpose of this paper is to investigate how the union and difference of two XSDs can be approximated within the framework of single-type regular tree languages. We consider both optimal lower and upper approximations. We also address the more general question of how to approximate an arbitrary regular tree language by an XSD and consider the complexity of associated decision problems.
Generating, sampling and counting subclasses of regular tree languages
Cited by 2 (0 self)
To experimentally validate learning and approximation algorithms for XML Schema Definitions (XSDs), we need algorithms to generate uniformly at random a corpus of XSDs as well as a similarity measure to compare how closely the generated XSD resembles the target schema. In this paper, we provide the formal foundation for such a testbed. We adopt similarity measures based on counting the number of common and different trees in the two languages, and we develop the necessary machinery for computing them. We use the formalism of extended DTDs (EDTDs) to represent the unranked regular tree languages. In particular, we obtain an efficient algorithm to count the number of trees up to a certain size in an unambiguous EDTD. The latter class of unambiguous EDTDs encompasses the more familiar classes of single-type, restrained competition and bottom-up deterministic EDTDs. The single-type EDTDs correspond precisely to the core of XML Schema, while the others are strictly more expressive. We also show how constraints on the shape of allowed trees can be incorporated. As we make use of a translation into a well-known formalism for combinatorial specifications, we get for free a sampling procedure to draw members of any unambiguous EDTD. When dropping the restriction to unambiguous EDTDs, i.e. taking the full class of EDTDs into account, we show that the counting problem becomes #P-complete and provide an approximation algorithm. Finally, we discuss uniform generation of ...
On the Midpoint of a Set of XML Documents
- Proc. DEXA 2005
Cited by 1 (0 self)
The WWW contains a huge number of documents. Some of them share the same subject, but are generated by different people or even organizations. To guarantee the interchange of such documents, we can use XML, which allows sharing documents that do not have the same structure. However, this makes it difficult to understand the core of such heterogeneous documents (in general, a schema is not available). In this paper, we offer a characterization and an algorithm to obtain the midpoint (in terms of a resemblance function) of a set of semi-structured, heterogeneous documents without optional elements. The trivial case of midpoint would be the elements common to all documents. Nevertheless, in cases with several heterogeneous documents this may result in an empty set. Thus, we consider elements present in a given number of documents to belong to the midpoint. An exact schema could always be found by generating optional elements. However, the exact schema of the whole set may result in overspecialization (lots of optional elements), which would make it useless.
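The thresholding idea in this abstract (keep elements that occur in at least a given number of documents) can be illustrated with a short sketch. This is not the paper's algorithm and does not use its resemblance function; it only counts flat element names with the standard `xml.etree.ElementTree` parser. The function names, the sample documents, and the `min_docs` parameter are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def element_names(xml_text):
    """Return the set of element names occurring in one XML document."""
    root = ET.fromstring(xml_text)
    return {elem.tag for elem in root.iter()}

def frequent_elements(documents, min_docs):
    """Return element names present in at least min_docs documents.

    Simplification for illustration: elements are compared by name only,
    ignoring their position in the tree.
    """
    counts = Counter()
    for doc in documents:
        # Each document contributes at most once per element name.
        counts.update(element_names(doc))
    return {tag for tag, n in counts.items() if n >= min_docs}

# Hypothetical sample documents with heterogeneous structure.
docs = [
    "<book><title/><author/></book>",
    "<book><title/><isbn/></book>",
    "<book><author/><year/></book>",
]

common_to_all = frequent_elements(docs, min_docs=3)
in_majority = frequent_elements(docs, min_docs=2)
```

With these sample documents, requiring presence in all three leaves only the root name `book`, whereas lowering the threshold to two also keeps `title` and `author`, which is the effect the abstract describes.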
Structural Equivalence of Regularly Extended E0L Grammars: An Automata Theoretic Proof
- 2003
Regularly extended E0L grammars allow an infinite number of rules for a given nonterminal provided that the set of right sides of the rules for each nonterminal is a regular language. We show that structural equivalence remains decidable for regularly extended E0L grammars.

