Results 1 - 10
of
40
Storing and querying ordered xml using a relational database system
- In SIGMOD
, 2002
"... XML is quickly becoming the de facto standard for data exchange over the Intemet. This is creating a new set of data management requirements involving XML, such as the need to store and query XML documents. Researchers have proposed using relational database systems to satisfy these requirements by ..."
Abstract
-
Cited by 180 (1 self)
- Add to MetaCart
XML is quickly becoming the de facto standard for data exchange over the Intemet. This is creating a new set of data management requirements involving XML, such as the need to store and query XML documents. Researchers have proposed using relational database systems to satisfy these requirements by devising ways to "shred " XML documents into relations, and translate XML queries into SQL queries over these relations. However, a key issue with such an approach, which has largely been ignored in the research literature, is how (and whether) the ordered XML data model can be efficiently supported by the unordered relational data model. This paper shows that XML's ordered data model can indeed be efficiently supported by a relational database system. This is accomplished by encoding order as a data value. We propose three order encoding methods that can be used to represent XML order in the relational data model, and also propose algorithms for translating ordered XPath expressions into SQL using these encoding methods. Finally, we report the results of an experimental study that investigates the performance of the proposed order encoding methods on a workload of ordered XML queries and updates. 1.
TIMBER: A Native XML Database
- The VLDB Journal
, 2002
"... This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in ..."
Abstract
-
Cited by 113 (10 self)
- Add to MetaCart
This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in the XML context, and new cost estimation and query optimization techniques have also been developed. We present performance numbers to support some of our design decisions. We believe that the key intellectual contribution of this system is a comprehensive set-at-a-time query processing ability in a native XML store, with all the standard components of relational query processing, including algebraic rewriting and a cost-based optimizer.
From XML schema to relations: A cost-based approach to XML storage
- In ICDE
, 2002
"... bohannon,juliana,prasan,simeon¡ XML has become an important medium for data representation, particularly when that data is exchanged over or browsed on the Internet. As the volume of XML data increases, there is a growing interest in storing XML in relational databases so that the well-developed fea ..."
Abstract
-
Cited by 113 (14 self)
- Add to MetaCart
bohannon,juliana,prasan,simeon¡ XML has become an important medium for data representation, particularly when that data is exchanged over or browsed on the Internet. As the volume of XML data increases, there is a growing interest in storing XML in relational databases so that the well-developed features of these systems (e.g., concurrency control, crash recovery, query processors) can be re-used. However, given the wide variety of XML applications and the mismatch between XML’s nested-tree structure and the flat tuples of the relational model, storing XML documents in relational databases presents interesting challenges. LegoDB is a cost-based XML-to-relational mapping engine that addresses this problem. It explores a space of possible mappings and selects the best mapping for a given application (defined by an XML Schema, XML data statistics, and an XML query workload). LegoDB leverages existing XML and relational technologies: it represents the target application using XML standards and constructs the space of configurations using XML-specific operations, and it uses a traditional relational optimizer to obtain accurate cost estimates of the derived configurations. In this paper, we describe the LegoDB mapping engine and provide experimental results that demonstrate the effectiveness of this approach. 1
Efficient Relational Storage and Retrieval of XML Documents
- In ACM SIGMOD Workshop on the Web and Databases (WebDB
, 2000
"... In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, exible and semantically homogeneous units ..."
Abstract
-
Cited by 85 (5 self)
- Add to MetaCart
In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, exible and semantically homogeneous units we are able to exploit the performance potential of vertical fragmentation. Moreover, our approach provides clear and intuitive semantics, which facilitates the definition of a declarative query algebra. Our experimental results with large collections of XML documents demonstrate the effectiveness of the techniques proposed.
Cache-and-Query for Wide Area Sensor Databases
, 2003
"... Webcams, microphones, pressure gauges and other sensors provide exciting new opportunities for querying and monitoring the physical world. In this paper we focus on querying wide area sensor databases, containing (XML) data derived from sensors spread over tens to thousands of miles. We present the ..."
Abstract
-
Cited by 72 (18 self)
- Add to MetaCart
Webcams, microphones, pressure gauges and other sensors provide exciting new opportunities for querying and monitoring the physical world. In this paper we focus on querying wide area sensor databases, containing (XML) data derived from sensors spread over tens to thousands of miles. We present the first scalable system for executing XPATH queries on such databases. The system maintains the logical view of the data as a single XML document, while physically the data is fragmented across any number of host nodes. For scalability, sensor data is stored close to the sensors, but can be cached elsewhere as dictated by the queries (auto-tuning). Our design enables self-starting distributed queries that jump directly to the lowest common ancestor of the query result, dramatically reducing query response times. We present a novel query-evaluategather technique (using XSLT) for detecting (1) which data in a local database fragment is part of the query result, and (2) how to gather the missing parts. We define partitioning and cache invariants that ensure that even partial matches on cached data are exploited and that correct answers are returned, despite our dynamic query-driven caching. Experimental results demonstrate that our techniques dramatically increase query throughputs and decrease query response times in wide area sensor databases.
XPRESS: a queriable compression for XML data
- In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data
, 2003
"... Like HTML, many XML documents are resident on native file systems. Since XML data is irregular and verbose, the disk space and the network bandwidth are wasted. To overcome the verbosity problem, the research on compressors for XML data has been conducted. However, some XML compressors do not suppor ..."
Abstract
-
Cited by 37 (3 self)
- Add to MetaCart
Like HTML, many XML documents are resident on native file systems. Since XML data is irregular and verbose, the disk space and the network bandwidth are wasted. To overcome the verbosity problem, the research on compressors for XML data has been conducted. However, some XML compressors do not support querying compressed data, while other XML compressors which support querying compressed data blindly encode tags and data values using predefined encoding methods. Thus, the query performance on compressed XML data is degraded. In this paper, we propose XPRESS, an XML compressor which supports direct and efficient evaluations of queries on compressed XML data. XPRESS adopts a novel encoding method, called reverse arithmetic encoding, which is intended for encoding label paths of XML data, and applies diverse encoding methods depending on the types of data values. Experimental results with real-life data sets show that XPRESS achieves significant improvements on query performance for compressed XML data and reasonable compression ratios. On the average, the query performance of XPRESS is 2.83 times better than that of an existing XML compressor and the compression ratio of XPRESS is 73%. 1.
An efficient and scalable algorithm for clustering xml documents by structure
- IEEE Transactions on Knowledge and Data Engineering
, 2004
"... Abstract—With the standardization of XML as an information exchange language over the net, a huge amount of information is formatted in XML documents. In order to analyze this information efficiently, decomposing the XML documents and storing them in relational tables is a popular practice. However, ..."
Abstract
-
Cited by 27 (0 self)
- Add to MetaCart
Abstract—With the standardization of XML as an information exchange language over the net, a huge amount of information is formatted in XML documents. In order to analyze this information efficiently, decomposing the XML documents and storing them in relational tables is a popular practice. However, query processing becomes expensive since, in many cases, an excessive number of joins is required to recover information from the fragmented data. If a collection consists of documents with different structures (for example, they come from different DTDs), mining clusters in the documents could alleviate the fragmentation problem. We propose a hierarchical algorithm (S-GRACE) for clustering XML documents based on structural information in the data. The notion of structure graph (s-graph) is proposed, supporting a computationally efficient distance metric defined between documents and sets of documents. This simple metric yields our new clustering algorithm which is efficient and effective, compared to other approaches based on tree-edit distance. Experiments on real data show that our algorithm can discover clusters not easily identified by manual inspection. Index Terms—Data mining, clustering, XML, semistructured data, query processing. 1
An analysis of XML database solutions for the management of MPEG-7 media descriptions
- ACM Computing Surveys
, 2003
"... MPEG-7 constitutes a promising standard for the description of multimedia content. It can be expected that a lot of applications based on MPEG-7 media descriptions will be set up in the near future. Therefore, means for the adequate management of large amounts of MPEG-7-compliant media descriptions ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
MPEG-7 constitutes a promising standard for the description of multimedia content. It can be expected that a lot of applications based on MPEG-7 media descriptions will be set up in the near future. Therefore, means for the adequate management of large amounts of MPEG-7-compliant media descriptions are certainly desirable. Essentially,
An XML Indexing Structure with Relative Region Coordinate
, 2001
"... For most index structures for XML data proposed so far, update is a problem because XML element's coordinates are expressed by absolute values. Due to the structural relationship among elements in XML documents, we have to re-compute these absolute values if the content of source data is updated. Th ..."
Abstract
-
Cited by 20 (2 self)
- Add to MetaCart
For most index structures for XML data proposed so far, update is a problem because XML element's coordinates are expressed by absolute values. Due to the structural relationship among elements in XML documents, we have to re-compute these absolute values if the content of source data is updated. The reconstruction requires update of large portion of index files, which causes a serious problem especially when XML data content is frequently updated. In this paper, we propose an indexing structure scheme based on the Relative Region Coordinate that can effectively deal with the update problem. The main idea is that we express the coordinate of an XML element based on the region of its parent element. We present an algorithm to construct a treestructured index in which related coordinates are stored together. In consequence, our indexing scheme requires update of only a small portion of index file in case of updating. 1. Introduction XML (Extensive Markup Language) was adopted by World W...
NeT and CoT: Translating Relational Schemas to XML Schemas using Semantic Constraints
- In: ACM International Conference on Information and Knowledge Management
, 2002
"... Two algorithms, called NeT and CoT, to translate relational schemas to XML schemas using various semantic constraints are presented. The XML schema representation we use is a language-independent formalism named XSchema, that is both precise and concise. A given XSchema can be mapped to a schema in ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
Two algorithms, called NeT and CoT, to translate relational schemas to XML schemas using various semantic constraints are presented. The XML schema representation we use is a language-independent formalism named XSchema, that is both precise and concise. A given XSchema can be mapped to a schema in any of the existing XML schema language proposals. Our proposed algorithms have the following characteristics: (1) NeT derives a nested structure from a flat relational model by repeatedly applying the nest operator on each table so that the resulting XML schema becomes hierarchical, and (2) CoT considers not only the structure of relational schemas, but also semantic constraints such as inclusion dependencies during the translation. It takes as input a relational schema where multiple tables are interconnected through inclusion dependencies and converts it into a good XSchema. To validate our proposals, we present experimental results using both real schemas from the UCI repository and synthetic schemas from TPC-H.

