Structured Materialized Views for XML Queries
, 2007
Cited by 14 (5 self)
The performance of XML database queries can be greatly enhanced by rewriting them using materialized views. We study the problem of rewriting a query using materialized views, where both the query and the views are described by a tree pattern language, appropriately extended to capture a large XQuery subset. The pattern language features optional nodes and nesting, which allow it to capture the data needs of nested XQueries. The language also allows describing storage features such as structural identifiers, which enlarge the space of rewritings. We study pattern containment and equivalent rewriting under the constraints expressed in a structural summary, whose enhanced form also entails integrity constraints. Our approach is implemented in the ULoad [7] prototype, and we present a performance analysis.
Learning Deterministic Regular Expressions for the Inference of Schemas from XML Data
, 2008
Cited by 13 (4 self)
Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning deterministic regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regular expressions from positive examples only, as we will show. The regular expressions occurring in practical DTDs and XSDs, however, are such that every alphabet symbol occurs only a small number of times. As such, in practice it suffices to learn the subclass of regular expressions in which each alphabet symbol occurs at most k times, for some small k. We refer to such expressions as k-occurrence regular expressions (k-OREs for short). Motivated by this observation, we provide a probabilistic algorithm that learns k-OREs for increasing values of k, and selects the one that best describes the sample based on a Minimum Description Length argument. The effectiveness of the method is empirically validated on both real-world and synthetic data. Furthermore, the method is shown to be conservative over the simpler classes of expressions considered in previous work.
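The k-occurrence condition described in this abstract can be illustrated with a minimal sketch. It assumes a DTD-style content model written as a string in which every identifier is an alphabet symbol; the function names `occurrence_bound` and `is_k_ore` are hypothetical and are not taken from the paper. The sketch only counts symbol occurrences; it does not test determinism and it is not the learning algorithm itself.

```python
import re
from collections import Counter


def occurrence_bound(expression: str) -> int:
    """Return the smallest k such that every symbol occurs at most k times.

    Assumption: symbols are identifiers (element names); operators such as
    ',', '|', '*', '+', '?' and parentheses are ignored.
    """
    symbols = re.findall(r"[A-Za-z_][\w.-]*", expression)
    counts = Counter(symbols)
    return max(counts.values(), default=0)


def is_k_ore(expression: str, k: int) -> bool:
    """Check the occurrence condition only (determinism is not checked)."""
    return occurrence_bound(expression) <= k


if __name__ == "__main__":
    # Each element name occurs once, so the bound is 1.
    print(occurrence_bound("(title, author+, (year | date)?)"))
    # The symbol 'a' occurs twice, so the bound is 2.
    print(occurrence_bound("(a, b)*, a"))
```

Under these assumptions, a content model in which every element name appears once satisfies the condition for k = 1, which is the common case the abstract refers to.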
The ICS-FORTH SWIM: A Powerful Semantic Web Integration Middleware
- In Proc. SWDB’03
, 2003
Cited by 12 (5 self)
Semantic Web (SW) technology aims to facilitate the integration of legacy data sources spread worldwide. Despite the plethora of SW languages (e.g., RDF/S, DAML+OIL, OWL) recently proposed for supporting large scale information interoperation, the vast majority of legacy sources still rely on relational databases (RDB) published on the Web or corporate intranets as virtual XML. In this paper, we advocate a Datalog framework for mediating high-level queries to relational and/or XML sources using community ontologies expressed in a SW language such as RDF/S. We describe the architecture and the reasoning services of our SW integration middleware, called SWIM, and we present the main design choices and techniques for supporting powerful mappings between different data models, as well as the reformulation and optimization of queries expressed against mediation schemas and views.
Path Summaries and Path Partitioning in Modern XML Databases
- World Wide Web (2008) 11:117–151
, 2008
Cited by 11 (2 self)
XML path summaries are compact structures representing all the simple parent-child paths of an XML document. Such paths have also been used in many works as a basis for partitioning the document’s content in a persistent store, under the form of path indices or path tables. We revisit the notions of path summaries and path-driven storage model in the context of current-day XML databases. This context is characterized by complex queries, typically expressed in an XQuery subset, and by the presence of efficient encoding techniques such as structural node identifiers. We review a path summary’s many uses for query optimization, and give them a common basis, namely relevant paths. We discuss summary-based tree pattern minimization and present some efficient summary-based minimization heuristics. We consider relevant path computation and provide a time- and memory-efficient computation algorithm. We combine the principle of path partitioning with the presence of structural identifiers in a simple path-partitioned storage model, which allows for selective data access and efficient query plans. This model improves the efficiency of twig query processing by up to two orders of magnitude over the similar
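The notion of a path summary in this abstract (all simple parent-child paths of a document) can be sketched in a few lines. This is an illustrative sketch only, assuming element nodes alone (attributes and text are ignored); the function name `path_summary` is hypothetical and the code is not the computation algorithm of the paper.

```python
import xml.etree.ElementTree as ET


def path_summary(xml_text: str) -> list[str]:
    """Collect the distinct root-to-node element paths of a document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def visit(node, prefix):
        # Extend the parent's path with this element's tag.
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            visit(child, path)

    visit(root, "")
    return sorted(paths)


if __name__ == "__main__":
    doc = "<a><b><c/></b><b><d/></b><c/></a>"
    for path in path_summary(doc):
        print(path)
```

The two `b` elements share the single summary path `/a/b`, which is what makes the summary compact compared with the document itself.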
MemBeR: A Micro-benchmark Repository for XQuery
- In XSym
, 2005
Cited by 10 (4 self)
XQuery is a feature-rich language with complex semantics. This makes it hard to come up with a benchmark suite which covers all performance-critical features of the language, and at the same time allows one to individually validate XQuery evaluation techniques. This paper presents MemBeR, a micro-benchmark repository, allowing the evaluation of an XQuery implementation with respect to precise evaluation techniques. We take the view that a fixed set of queries is probably insufficient to allow testing for various performance aspects; thus, the users of the repository must be able to add new data sets and/or queries for specific performance assessment tasks. We present our methodology for constructing the micro-benchmark repository, and illustrate it with some sample micro-benchmarks.
Supporting Efficient Streaming and Insertion of XML Data in RDBMS
- Proc. 3rd Int. Workshop Data Integration over the Web (DIWeb), 2004
, 2004
Cited by 9 (1 self)
Relational database systems are increasingly used to manage XML documents, especially for data-centric XML. In this paper we present a new approach to efficiently manage document-centric XML data based on a generic relational mapping. Such a generic XML storage is especially useful in data integration systems to manage highly diverse XML documents. We focus on efficient insert operations, support of streamed data and fast retrieval of document fragments. Therefore, we introduce a new numbering scheme called DLN (Dynamic Level Numbering) and several variants of it. A performance evaluation based on a prototype implementation demonstrates the high efficiency of DLN.
An Efficient Infrastructure for Native Transactional XML Processing
- Appears in Data & Knowledge Engineering, Elsevier
, 2007
Cited by 9 (3 self)
Implementation techniques for relational database management systems (DBMSs) have proven their efficiency and robustness in many existing systems. However, many of these concepts and mechanisms cannot be used when implementing a native XML DBMS (XDBMS) because of substantial differences in the processing properties of natively stored XML documents as compared to relational tables. Therefore, we have to develop new and appropriate techniques with ACID transaction guarantees tailored to the processing characteristics of tree documents and the operations on them. For this reason, we want to provide an efficient infrastructure for XDBMSs, consisting of tree node addressing and indexing together with fine-grained locking of tree nodes. In this respect, our prime and novel contribution is to reveal the potential of our prefix-based node labeling scheme, called DeweyIDs, for supporting record addressing, indexing, and locking protocols. In this paper, we first sketch our version of prefix-based node labeling and summarize a quantitative study of it. An overview of our layered XDBMS architecture indicates the concepts and functionalities to be reused from relational DBMS implementations. The core part of the paper describes the infrastructural services for XML document storage with compressed DeweyIDs, the principles and methods for navigational and declarative processing of queries, as well as the lock modes and protocols to enable efficient collaboration. Selected empirical experiments evaluate the XTC system performance and support our system assessment.
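The general idea behind prefix-based node labeling mentioned in this abstract can be sketched as follows. This is a generic Dewey-style illustration under simplifying assumptions (labels are dot-separated integers, one division per tree level); it is not the compressed DeweyID encoding or the division rules of the XTC system, and the function names are hypothetical.

```python
def parse_label(text: str) -> tuple[int, ...]:
    """Turn a dotted label such as '1.3.5' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def is_ancestor(ancestor: tuple[int, ...], node: tuple[int, ...]) -> bool:
    """A node is an ancestor if its label is a proper prefix of the other."""
    return len(ancestor) < len(node) and node[: len(ancestor)] == ancestor


def document_order(labels: list[str]) -> list[str]:
    """Sort labels in document order using tuple comparison."""
    return sorted(labels, key=parse_label)


if __name__ == "__main__":
    print(is_ancestor(parse_label("1.3"), parse_label("1.3.5.7")))
    print(document_order(["1.5", "1.3.1", "1.3", "1"]))
```

Because ancestry is decided from the labels alone, such a scheme lets a system lock or index a node's ancestors without touching the stored document, which is the kind of use the abstract alludes to.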
An Extended Preorder Index for Optimising XPath Expressions
- Proceedings of 3rd XSym, LNCS
, 2005
Cited by 6 (4 self)
Many of the problems with native XML databases relate to query performance, and consequently it can be difficult to convince traditional database users of the benefits of using semi- or unstructured databases. Presently, an index structure providing efficient support for both structural queries and the traditional data-centric and content queries is still lacking. This paper presents an extended index structure based on the preorder traversal rank and the level (or depth) rank of each node in a document tree. The extended index fully supports the navigation of all XPath axes while efficiently supporting data-centric queries. The ability to start path traversals from arbitrary nodes in a document tree also enables the extended index to support the evaluation of path traversals embedded in XQuery expressions. Furthermore, an encoding technique is presented where properties of the level ranking may be exploited to provide efficient and optimised level-based XPath evaluations.
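How a preorder rank combined with a level rank can answer a structural query may be sketched as below. This is a minimal illustration assuming element nodes only, with the list position standing in for the preorder rank; the names `preorder_index` and `descendants` are hypothetical and the code does not reproduce the extended index of the paper.

```python
import xml.etree.ElementTree as ET


def preorder_index(xml_text: str) -> list[tuple[str, int]]:
    """Return (tag, level) pairs; the list position is the preorder rank."""
    root = ET.fromstring(xml_text)
    index = []

    def visit(node, level):
        index.append((node.tag, level))
        for child in node:
            visit(child, level + 1)

    visit(root, 0)
    return index


def descendants(index: list[tuple[str, int]], pre: int) -> list[int]:
    """Preorder ranks of the descendants of the node with rank `pre`.

    Descendants form the contiguous run of following nodes whose level
    is strictly greater than the level of the context node.
    """
    level = index[pre][1]
    result = []
    for rank in range(pre + 1, len(index)):
        if index[rank][1] <= level:
            break
        result.append(rank)
    return result


if __name__ == "__main__":
    idx = preorder_index("<a><b><c/><d/></b><e/></a>")
    print(idx)
    print(descendants(idx, 1))
```

The traversal can start from any rank, which mirrors the abstract's point about beginning path traversals at arbitrary nodes.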
The ICS-FORTH Semantic Web Integration Middleware (SWIM)
- IEEE Data Engineering Bulletin
, 2003
Cited by 5 (0 self)
This paper presents the fundamental ideas for devising a comprehensive framework that allows user communities to 1. specify XML-to-RDF and RDB-to-RDF mappings; 2. verify that these mappings conform to the semantics of the employed SW ontologies; 3. compose RQL queries with these mappings and produce XML or RDB queries; 4. specify further levels of abstraction as RDF-to-RDF views; 5. compose RQL queries with such views; 6. perform query optimizations
Simplifying XML Schema: Effortless Handling of Nondeterministic Regular Expressions
, 2009
Cited by 5 (4 self)
Whether beloved or despised, XML Schema is currently the only industrially accepted schema language for XML and is unlikely to become obsolete any time soon. Nevertheless, many nontransparent restrictions unnecessarily complicate the design of XSDs. For instance, complex content models in XML Schema are constrained by the infamous unique particle attribution (UPA) constraint. In formal language theoretic terms, this constraint restricts content models to deterministic regular expressions. As the latter constitute a semantic notion and no simple corresponding syntactical characterization is known, it is very difficult for non-expert users to understand exactly when and why content models do or do not violate UPA. In the present paper, we therefore investigate solutions to relieve users of the burden of UPA by automatically transforming nondeterministic expressions into concise deterministic ones defining the same language or constituting good approximations. The presented techniques facilitate XSD construction by reducing the design task at hand more towards the complexity of the modeling task. In addition, our algorithms can serve as a plug-in for

