Indexing and Querying XML Data for Regular Path Expressions
- In VLDB
, 2001
Cited by 265 (9 self)
With the advent of XML as a standard for data representation and exchange on the Internet, storing and querying XML data becomes more and more important. Several XML query languages have been proposed, and the common feature of the languages is the use of regular path expressions to query XML data. This poses a new challenge concerning indexing and searching XML data, because conventional approaches based on tree traversals may not meet the processing requirements under heavy access requests. In this paper, we propose a new system for indexing and storing XML data based on a numbering scheme for elements. This numbering scheme quickly determines the ancestor-descendant relationship between elements in the hierarchy of XML data. We also propose several algorithms for processing regular path expressions, namely, (1) EE-Join for searching paths from an element to another, (2) EA-Join for scanning sorted elements and attributes to find element-attribute pairs, and (3) KC-Join for finding Kleene-Closure on repeated paths or elements. The EE-Join algorithm is highly effective particularly for searching paths that are very long or whose lengths are unknown. Experimental results from our prototype system implementation show that the proposed algorithms can process XML queries with regular path expressions by up to an or... This work was sponsored in part by National Science Foundation CAREER Award (IIS-9876037) and Research Infrastructure program EIA-0080123. The authors assume all responsibility for the contents of the paper.
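The idea of deciding ancestor-descendant relationships from element numbers alone can be illustrated with a minimal sketch. It assumes a plain preorder position plus a descendant count per element, which is a simplification chosen for illustration and not necessarily the exact scheme of the paper; the names `Numbered`, `number_elements`, and `is_ancestor` are hypothetical.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Numbered:
    tag: str
    order: int  # preorder position of the element (assumed numbering)
    size: int   # number of descendants below the element


def number_elements(root: ET.Element) -> list[Numbered]:
    """Assign (order, size) to every element in document order."""
    out: list[Numbered] = []

    def visit(elem: ET.Element) -> int:
        node = Numbered(elem.tag, order=len(out) + 1, size=0)
        out.append(node)
        for child in elem:
            node.size += visit(child)
        return node.size + 1  # this element plus all of its descendants

    visit(root)
    return out


def is_ancestor(a: Numbered, d: Numbered) -> bool:
    """True when a is a proper ancestor of d; only the numbers are compared."""
    return a.order < d.order <= a.order + a.size


doc = ET.fromstring("<book><chapter><title/><para/></chapter><appendix/></book>")
nodes = {n.tag: n for n in number_elements(doc)}
```

With this numbering, `is_ancestor(nodes["book"], nodes["para"])` holds while `is_ancestor(nodes["chapter"], nodes["appendix"])` does not, and neither check walks the tree.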
XMill: an Efficient Compressor for XML Data
, 1999
Cited by 165 (0 self)
We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, incorporates and combines existing compressors in order to apply them to heterogeneous XML data: it uses zlib, the library function for gzip, a collection of datatype specific compressors for simple data types, and, possibly, user defined compressors for application specific data types. 1 Introduction We have implemented a compressor/decompressor for XML data, to be used in data exchange and archiving, that achieves about twice the compression rate of general-purpose compressors (gzip), at about the same speed. The tool can be downloaded from www.research.att.com/sw/tools/xmill/. XML is now being adopted by many organizations and industry groups, like the healthcare, banking, chemical, and telecommunications industries. The attraction in XML is that it is a self-describi...
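The general idea of separating heterogeneous XML content so that zlib sees similar values together can be sketched as follows. This is an assumption-laden illustration, not XMill's actual container format: values are grouped by element tag only, attributes and tail text are ignored, no datatype-specific compressors are applied, and the function name `compress_grouped` and the `<structure>` key are hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def compress_grouped(xml_text: str) -> dict[str, bytes]:
    """Split a document into a structure stream plus one value container per tag."""
    root = ET.fromstring(xml_text)
    structure: list[str] = []                      # tags, "#" text markers, "/" closers
    containers: dict[str, list[str]] = defaultdict(list)

    def walk(elem: ET.Element) -> None:
        structure.append(elem.tag)
        text = (elem.text or "").strip()
        if text:
            structure.append("#")                  # placeholder: value lives in a container
            containers[elem.tag].append(text)
        for child in elem:
            walk(child)
        structure.append("/")                      # close the element

    walk(root)
    blobs = {"<structure>": zlib.compress("\n".join(structure).encode("utf-8"))}
    for tag, values in containers.items():
        # Each container holds values of one kind, compressed independently.
        blobs[tag] = zlib.compress("\n".join(values).encode("utf-8"))
    return blobs


sample = (
    "<log><entry><ip>10.0.0.1</ip><code>200</code></entry>"
    "<entry><ip>10.0.0.2</ip><code>404</code></entry></log>"
)
blobs = compress_grouped(sample)
```

Here `blobs` holds three independently compressed streams: the structure, the `ip` values, and the `code` values. Whether this grouping improves the compression ratio depends on the data, and the sketch makes no claim about the figures reported in the abstract.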
ViST: A Dynamic Index Method for Querying XML Data by Tree Structures
- In SIGMOD
, 2003
Cited by 81 (5 self)
Much research has been done in providing flexible query facilities to extract data from structured XML documents. In this paper, we propose ViST, a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, we show that querying XML data is equivalent to finding subsequence matches. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+ Trees without using any specialized data structures that are not well supported by DBMSs. Our experiments show that ViST is effective, scalable, and efficient in supporting structural queries.
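The notion of matching a query as a subsequence of a structure-encoded document can be sketched in a few lines. The encoding below, a preorder list of (tag, path-from-root) pairs, is an assumed simplification: it has no wildcards, no content values, and no index, and plain subsequence matching of this kind may not coincide with true tree matching in every case. The names `encode` and `is_subsequence` are hypothetical.

```python
import xml.etree.ElementTree as ET

Pair = tuple[str, tuple[str, ...]]


def encode(elem: ET.Element, prefix: tuple[str, ...] = ()) -> list[Pair]:
    """Preorder sequence of (tag, path of ancestor tags) pairs."""
    seq: list[Pair] = [(elem.tag, prefix)]
    for child in elem:
        seq.extend(encode(child, prefix + (elem.tag,)))
    return seq


def is_subsequence(query: list[Pair], doc: list[Pair]) -> bool:
    """True when the query pairs occur in the document sequence in order."""
    remaining = iter(doc)
    # 'item in remaining' advances the iterator, so order is preserved.
    return all(item in remaining for item in query)


doc_seq = encode(ET.fromstring("<P><S><N/><I><M/></I></S><B><L/></B></P>"))
hit = encode(ET.fromstring("<P><S><I/></S><B/></P>"))
miss = encode(ET.fromstring("<P><B><I/></B></P>"))
```

In this example `is_subsequence(hit, doc_seq)` succeeds, while `is_subsequence(miss, doc_seq)` fails because no `I` element occurs under the path `P/B` in the document.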
eXist: An Open Source Native XML Database
- Web-Services, and Database Systems, NODe 2002 Web and Database-Related Workshops
, 2002
Cited by 53 (0 self)
With the advent of native and XML enabled database systems, techniques for efficiently storing, indexing and querying large collections of XML documents have become an important research topic. This paper presents the storage, indexing and query processing architecture of eXist, an Open Source native XML database system. eXist is tightly integrated with existing tools and covers most of the native XML database features. An enhanced indexing scheme at the architecture’s core supports quick identification of structural node relationships. Based on this scheme, we extend the application of path join algorithms to implement most parts of the XPath query language specification and add support for keyword search on element and attribute contents.
The Simplest Query Language That Could Possibly Work
- In Proceedings of the 2nd INEX Workshop
, 2003
Cited by 47 (8 self)
The INEX'03 query language proved to be much too complicated for the INEX participants to use well, let alone anyone else. We need something simpler, but not too simple. Something which is basically a hybrid between Boolean IR queries and a stripped down CSS will do the job.
Efficient Static Analysis of XML Paths and Types
, 2008
Cited by 44 (28 self)
We present an algorithm to solve XPath decision problems under regular tree type constraints and show its use to statically type-check XPath queries. To this end, we prove the decidability of a logic with converse for finite ordered trees whose time complexity is a simple exponential of the size of a formula. The logic corresponds to the alternation free modal µ-calculus without greatest fixpoint, restricted to finite trees, and where formulas are cycle-free. Our proof method is based on two auxiliary results. First, XML regular tree types and XPath expressions have a linear translation to cycle-free formulas. Second, the least and greatest fixpoints are equivalent for finite trees, hence the logic is closed under negation. Building on these results, we describe a practical, effective system for solving the satisfiability of a formula. The system has been experimented with some decision problems such as XPath emptiness, containment, overlap, and coverage, with or without type constraints. The benefit of the approach is that our system can be effectively used in static analyzers for programming languages
A Two-Layered Integration Approach for Product Information in B2B E-commerce
, 2001
Cited by 37 (11 self)
Electronic B2B marketplaces bring together many online suppliers and buyers, each of which can potentially use his own format to represent the products in his product catalog. The marketplaces have to perform non-trivial mappings of these catalogs. In this paper, we analyze the problems which occur during the integration, taking several leading XML-based standards as an example. We advocate a three-layer product integration framework to resolve the difficulties in overcoming these problems with a direct one-layer integration. In this paper, we focus on the first two layers: the XML-based syntax layer and the data models layer expressed in RDF. The approach operates in three main steps. First, we create an RDF data model from the XML catalog, which eliminates all syntactical peculiarities of the catalog. Second, the catalog is translated from the source model to the RDF model of the target catalog. Finally, the transformation from RDF to XML restores all syntactical regulations required by the target catalog format. The approach is suitable for inter-operation with higher-level document and workflow ontologies.
Constraints for Multimedia Presentation Generation
, 2002
Cited by 25 (5 self)
Automatic multimedia presentation generation is applicable in a wide variety of circumstances because of its ability to adapt to different presentation contexts such as hardware platforms, user expertise and user interest. The process ...
View Maintenance for Hierarchical Semistructured Data
- In Data Warehousing and Knowledge Discovery
, 2000
Cited by 22 (2 self)
Over the last few years, efficient access to heterogeneous data sources has become tremendously important. One common technique for increasing efficiency is to maintain locally stored views in data warehouses, which must be kept current with respect to the changes in the underlying data sources. While this problem has been extensively studied in the context of select-project-join (SPJ) views and relational warehouses, many of the data sources accessible today over the Web are highly irregular. Views over this irregular data often perform complex restructuring and regrouping far beyond traditional SPJ views. This paper describes WHAX (Warehouse Architecture for XML), an architecture for defining and maintaining views over hierarchical semistructured data and relational data sources with key constraints. The WHAX model is a variant of the deterministic model of [8], but is more reminiscent of XML. The view definition language is a variation of XML-QL that has been adapted to the ...
An Analysis of Integration Problems of XML-Based Catalogs for B2B Electronic Commerce
- In: Proceedings of the Ninth IFIP 2.6 Working Conference on Database Semantics, Hong Kong
, 2001
Cited by 18 (13 self)
Electronic B2B marketplaces bring together many online suppliers and buyers, each of which can potentially use his own format to represent the products in his product catalog. The marketplaces have to perform non-trivial mappings of these catalogs. In this paper we analyze the problems which occur during integration, taking several leading XML and non-XML formats as examples. We discuss the method for applying XSLT technology to the integration problems, propose typical solutions to these problems, and give the corresponding examples of integration rules.

