Results 1 - 10 of 25
A Normal Form for XML Documents
Cited by 107 (8 self)
This paper takes a first step towards the design and normalization theory for XML documents. We show that, like relational databases, XML documents may contain redundant information, and may be prone to update anomalies. Furthermore, such problems are caused by certain functional dependencies among paths in the document. Our goal is to find a way of converting an arbitrary DTD into a well-designed one that avoids these problems. We first introduce the concept of a functional dependency for XML, and define its semantics via a relational representation of XML. We then define an XML normal form, XNF, that avoids update anomalies and redundancies. We study its properties and show that it generalizes BCNF and a normal form for nested relations when those are appropriately coded as XML documents. Finally, we present a lossless algorithm for converting any DTD into one in XNF.
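As a rough illustration of the idea in the abstract above, the following sketch flattens a small XML document into rows and checks whether a functional dependency holds over them. It is a simplified stand-in, not the paper's formal definition: the document, the element and attribute names (`course`, `student`, `cno`, `sno`, `name`), and the helper functions are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: student names are repeated once per enrolled course.
DOC = """
<courses>
  <course cno="csc200">
    <student sno="st1"><name>Deere</name></student>
    <student sno="st2"><name>Smith</name></student>
  </course>
  <course cno="mat100">
    <student sno="st1"><name>Deere</name></student>
  </course>
</courses>
"""


def flatten(xml_text):
    """Turn each course/student pair into one flat row (a simplified relational view)."""
    root = ET.fromstring(xml_text)
    rows = []
    for course in root.findall("course"):
        for student in course.findall("student"):
            rows.append({
                "cno": course.get("cno"),
                "sno": student.get("sno"),
                "name": student.findtext("name"),
            })
    return rows


def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def redundant_copies(rows, lhs, rhs):
    """Count rows that repeat an (lhs, rhs) association already seen."""
    seen = set()
    repeats = 0
    for row in rows:
        pair = (tuple(row[a] for a in lhs), tuple(row[a] for a in rhs))
        if pair in seen:
            repeats += 1
        else:
            seen.add(pair)
    return repeats


rows = flatten(DOC)
print(fd_holds(rows, ["sno"], ["name"]))
print(redundant_copies(rows, ["sno"], ["name"]))
```

In this toy document the student number determines the name, yet the name is stored once per enrollment. That repeated storage is the kind of redundancy a normal form is meant to rule out.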
Strong Functional Dependencies and Their Application to Normal Forms in XML
- ACM Transactions on Database Systems, 2004
Cited by 29 (8 self)
In this article, we address the problem of how to extend the definition of functional dependencies (FDs) in incomplete relations to XML documents (called XFDs) using the well-known strong satisfaction approach. We propose a syntactic definition of strong XFD satisfaction in an XML document and then justify it by showing that, similar to the case in relational databases, for the case of simple paths, keys in XML are a special case of XFDs. We also propose a normal form for XML documents based on our definition of XFDs and provide a formal justification for it by proving that it is a necessary and sufficient condition for the elimination of redundancy in an XML document.
Information-theoretic tools for mining database structure from large data sets
- In ACM SIGMOD, 2004
Cited by 19 (2 self)
Data design has been characterized as a process of arriving at a design that maximizes the information content of each piece of data (or equivalently, one that minimizes redundancy). Information content (or redundancy) is measured with respect to a prescribed model for the data, a model that is often expressed as a set of constraints. In this work, we consider the problem of data redesign in an environment where the prescribed model is unknown or incomplete. Specifically, we consider the problem of finding structural clues in an instance of data, an instance which may contain errors, missing values, and duplicate records. We propose a set of information-theoretic tools for finding structural summaries that are useful in characterizing the information content of the data, and ultimately useful in data design. We provide algorithms for creating these summaries over large, categorical data sets. We study the use of these summaries in one specific physical design task, that of ranking functional dependencies based on their data redundancy. We show how our ranking can be used by a physical data-design tool to find good vertical decompositions of a relation (decompositions that improve the information content of the design). We present an evaluation of the approach on real data sets.
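To make the information-theoretic angle concrete, here is a minimal sketch that scores candidate dependencies X → Y by the empirical conditional entropy H(Y | X) of an instance. The score is zero exactly when X determines Y in the data, and small positive values suggest a dependency that holds up to a few errors. This is a generic textbook measure chosen for illustration, not the summaries or ranking defined in the paper; the sample rows and column names are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(rows, x_cols, y_cols):
    """Empirical H(Y | X) in bits over a list of dict rows."""
    n = len(rows)
    joint = Counter(
        (tuple(r[c] for c in x_cols), tuple(r[c] for c in y_cols)) for r in rows
    )
    marginal = Counter(tuple(r[c] for c in x_cols) for r in rows)
    # H(Y|X) = sum over (x, y) of p(x, y) * log2(p(x) / p(x, y))
    return sum(
        (count / n) * log2(marginal[x] / count) for (x, _y), count in joint.items()
    )


# Hypothetical instance with one noisy record (the same zip maps to two cities).
ROWS = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "SF"},
    {"zip": "94105", "city": "Oakland"},
]

candidates = [(["zip"], ["city"]), (["city"], ["zip"])]
# Candidates closest to an exact dependency (lowest entropy) come first.
ranked = sorted(candidates, key=lambda fd: conditional_entropy(ROWS, fd[0], fd[1]))
for lhs, rhs in ranked:
    print(lhs, "->", rhs, conditional_entropy(ROWS, lhs, rhs))
```

A design tool could use a score of this kind to shortlist dependencies worth decomposing on, though any threshold for "close enough to exact" is a judgment call that depends on the data.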
RRXS: Redundancy reducing XML storage in relations
- In Proceedings of the International Conference on Very Large Data Bases (VLDB), 2003
Cited by 14 (4 self)
Current techniques for storing XML using relational technology consider the structure of an XML document but ignore its semantics as expressed by keys or functional dependencies. However, when the semantics of a document are considered, redundancy may be reduced, node identifiers removed where value-based keys are available, and semantic constraints validated using relational primary key technology.
On redundancy vs dependency preservation in normalization: An information-theoretic study of 3NF
- In PODS, 2006
Cited by 7 (3 self)
A recently introduced information-theoretic approach to analyzing redundancies in database design was used to justify normal forms like BCNF that completely eliminate redundancies. The main notion is that of an information content of each datum in an instance (which is a number in [0, 1]): the closer to 1, the less redundancy it carries. In practice, however, one usually settles for 3NF which, unlike BCNF, may not eliminate all redundancies but always guarantees dependency preservation. In this paper we use the information-theoretic approach to prove that 3NF is the best normal form if one needs to achieve dependency preservation. For each dependency-preserving normal form, we define the price of dependency preservation as an information-theoretic measure of redundancy that gets introduced to compensate for dependency preservation. This is a number in the [0, 1] range: the smaller it is, the less redundancy a normal form guarantees. We prove that for every dependency-preserving normal form, the price of dependency preservation is at least 1/2, and it is precisely 1/2 for 3NF. Hence, 3NF has the least amount of redundancy among all dependency-preserving normal forms. We also show that, information-theoretically, unnormalized schemas have at least twice the amount of redundancy than schemas
XQuery containment in presence of variable binding dependencies
- In: Proc. Int. Conf. World Wide, 2005
Cited by 7 (1 self)
Semantic caching is an important technology for improving the response time of future user queries specified over remote servers. This paper deals with the fundamental query containment problem in an XQuery-based semantic caching system. To the best of our knowledge, the impact on query containment of subtle differences in XQuery semantics caused by different ways of specifying variables has not yet been studied. We introduce the concept of variable binding dependencies for representing the hierarchical element dependencies preserved by an XQuery. We analyze the problem of XQuery containment in the presence of such dependencies. We propose a containment mapping technique for nested XQuery in the presence of variable binding dependencies. The implication of the nested block structure on XQuery containment is also considered. We report the performance gains achieved by a semantic caching system we built based on the proposed technique.
XML-based Reference Modelling: Foundations of an EPC Markup Language
- Referenzmodellierung - Proceedings of the 8th GI-Workshop on Reference Modelling, MKWI, 2004
Cited by 7 (4 self)
The advent of XML... This paper describes the proposal of an EPC Markup Language from its guiding design principles to its concrete definition. We gather findings from other XML standardisation initiatives and derive general EPML design principles, as well as theoretical and practical XML design guidelines. A survey on graph representation in XML languages grounds the decision to model EPC processes as edge element lists. Subsequently, the syntactical elements of EPML describing EPC hierarchies, EPC control flow, graphical display of objects, and business perspectives on EPCs are discussed.
On Schema Discovery
- IEEE Data Engineering Bulletin, 2003
Cited by 4 (1 self)
Structured data is distinguished from unstructured data by the presence of a schema describing the logical structure and semantics of the data. The schema is the means through which we understand and query the underlying data. The schema permits the more sophisticated structured queries that are not possible over schema-less data. Most systems assume that the schema is predefined and is an accurate reflection of the data. This assumption is often not valid in networked databases that may contain data originating from many sources and may not be valid within legacy databases where the semantics of data have evolved over time. As a result, querying and tasks that depend on structured queries (including data integration and schema mapping) may not be effective. In this paper, we consider the problem of discovering schemas from data. We focus on discovering properties of data that can be exploited in querying and transforming data. Finally, we very briefly consider the suitability of mining approaches to the task of schema discovery.
XML: From practice to theory
- In SBBD, 2003
Cited by 3 (0 self)
The development of XML technology has occurred very rapidly, initially leaving theory behind. As is often the case in such situations, practical development sometimes seemed more ad-hoc than well principled. There is now a substantial body of work providing formal foundations for XML. This paper presents a personal perspective on some recent developments in the theory of XML.
XML Design for Relational Storage
- WWW 2007, 2007
Cited by 3 (1 self)
Design principles for XML schemas that eliminate redundancies and avoid update anomalies have been studied recently. Several normal forms, generalizing those for relational databases, have been proposed. All of them, however, are based on the assumption of a native XML storage, while in practice most XML data is stored in relational databases. In this paper we study XML design and normalization for relational storage of XML documents. To be able to relate and compare XML and relational designs, we use an information-theoretic framework that measures information content in relations and documents, with higher values corresponding to lower levels of redundancy. We show that most common relational storage schemes preserve the notion of being well-designed (i.e., anomaly- and redundancy-free). Thus, existing XML normal forms guarantee well-designed relational storages as well. We further show that if this perfect option is not achievable, then a slight restriction on XML constraints guarantees a "second-best" relational design, according to possible values of the information-theoretic measure. We finally consider an edge-based relational representation of XML documents, and show that while it has similar information-theoretic properties to other relational representations, it can behave significantly worse in terms of enforcing integrity constraints.

