Archiving scientific data
In ACM SIGMOD, 2002
Cited by 97 (8 self)
Archiving is important for scientific data, where it is necessary to record all past versions of a database in order to verify findings based upon a specific version. Much scientific data is held in a hierarchical format and has a key structure that provides a canonical identification for each element of the hierarchy. In this article, we exploit these properties to develop an archiving technique that is both efficient in its use of space and preserves the continuity of elements through versions of the database, something that is not provided by traditional minimum-edit-distance diff approaches. The approach also uses timestamps: all versions of the data are merged into one hierarchy where an element appearing in multiple versions is stored only once, along with a timestamp. By identifying the semantic continuity of elements and merging them into one data structure, our technique is capable of providing meaningful change descriptions, and the archive allows us to easily answer certain temporal queries, such as retrieving any specific version from the archive and finding the history of an element. This is in contrast with approaches that store a sequence of deltas, where such operations may require undoing a large number of changes or significant reasoning with the ...
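The merge-with-timestamps idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each version is assumed to be a nested dict keyed by element keys with string leaves, and every archive node stores the full set of versions in which it occurs (the paper's timestamp inheritance and interval compaction are omitted). The names `Node`, `merge`, `retrieve`, and `history` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Versions (timestamps) in which this keyed element exists.
    versions: set = field(default_factory=set)
    # Keyed children; an element present in many versions is stored once.
    children: dict = field(default_factory=dict)
    # Leaf text -> versions in which the leaf had that text.
    values: dict = field(default_factory=dict)


def merge(node: Node, tree, t: int) -> None:
    """Merge version `t` (a nested dict, or a string leaf) into the archive."""
    node.versions.add(t)
    if isinstance(tree, dict):
        for key, subtree in tree.items():
            # Elements are matched by key, not by edit distance.
            merge(node.children.setdefault(key, Node()), subtree, t)
    else:
        node.values.setdefault(tree, set()).add(t)


def retrieve(node: Node, t: int):
    """Rebuild version `t` by keeping only elements whose timestamps contain t."""
    for value, stamps in node.values.items():
        if t in stamps:
            return value
    return {key: retrieve(child, t)
            for key, child in node.children.items() if t in child.versions}


def history(node: Node, path: list) -> set:
    """Return the versions in which the element at the given key path exists."""
    for key in path:
        node = node.children[key]
    return node.versions
```

Because an element keeps its key across versions, it is stored once with a set of timestamps; retrieving a version or an element's history is then a single traversal rather than a replay of deltas.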
Towards an Internet-Scale XML Dissemination Service
2004
Cited by 87 (3 self)
Publish/subscribe systems have demonstrated the ability to scale to large numbers of users and high data rates when providing content-based data dissemination services on the Internet. However, their services are limited by the data semantics and query expressiveness that they support. On the other hand, recent work on selective dissemination of XML data has made significant progress in moving from XML filtering to the richer functionality of transformation for result customization, but has generally ignored the challenges of deploying such XML-based services at Internet scale. In this paper, we address these challenges in the context of incorporating the rich functionality of XML data dissemination into a highly scalable system. We present the architectural design of ONYX, a system based on an overlay network. We identify the salient technical challenges in supporting XML filtering and transformation in this environment and propose techniques for solving them.
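To make "XML filtering" concrete, here is a minimal Python sketch of a broker that matches published documents against subscribers' path expressions. It is not ONYX's design: it assumes a tiny hypothetical subscription language with only `/` (child), `//` (descendant), and `*` (any tag), assumes no namespaces, and checks every subscription separately, whereas a filter meant for Internet scale would share work across subscriptions. `Broker`, `compile_subscription`, and `node_paths` are illustrative names.

```python
import re
import xml.etree.ElementTree as ET


def compile_subscription(expr: str) -> re.Pattern:
    """Translate a path like /a/*/b or //b into a regex over root-to-node tag paths."""
    pattern, axis = "", "/"
    for token in re.findall(r"//|/|[^/]+", expr):
        if token in ("/", "//"):
            axis = token
            continue
        label = "[^/]+" if token == "*" else re.escape(token)
        # A "//" step may skip any number of intermediate elements.
        pattern += ("(?:/[^/]+)*/" if axis == "//" else "/") + label
    return re.compile(pattern)


def node_paths(elem, prefix=""):
    """Yield the tag path of every element, e.g. /news/item/title."""
    path = f"{prefix}/{elem.tag}"
    yield path
    for child in elem:
        yield from node_paths(child, path)


class Broker:
    def __init__(self):
        self.subscriptions = {}  # subscriber -> list of compiled path patterns

    def subscribe(self, subscriber: str, expr: str) -> None:
        self.subscriptions.setdefault(subscriber, []).append(compile_subscription(expr))

    def route(self, xml_text: str) -> set:
        """Return the subscribers with at least one path matching some element."""
        paths = list(node_paths(ET.fromstring(xml_text)))
        return {who for who, patterns in self.subscriptions.items()
                if any(p.fullmatch(path) for p in patterns for path in paths)}
```

For example, a subscriber to `//price` receives any document containing a `price` element at any depth, while `/news/*/author` requires `author` exactly two levels below a `news` root.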
XGRIND: A Query-friendly XML Compressor
In ICDE, 2002
Cited by 71 (0 self)
XML documents are extremely verbose, since the "schema" is repeated for every "record" in the document. While a variety of compressors are available to address this problem, they are not designed to support direct querying of the compressed document, a useful feature from a database perspective. In this paper, we propose a new compression tool called XGrind that directly supports queries in the compressed domain. A special feature of XGrind is that the compressed document retains the structure of the original document, permitting reuse of standard XML techniques for processing the compressed document. Performance evaluation over a variety of XML documents and user queries indicates that XGrind simultaneously delivers improved query processing times and reasonable compression ratios.
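The key trick behind querying in the compressed domain can be sketched in Python: if values are compressed with a static (context-free) code, a query constant can be compressed with the same code and compared bit-for-bit, and because a Huffman code is prefix-free, equal bit strings imply equal text. This is a simplified sketch, not XGrind itself: it assumes one global Huffman code for all text (rather than per-element codes), dictionary-encodes tags as small integers, ignores attributes and tail text, and uses hypothetical names (`build_huffman`, `CompressedDoc`, `count_equal`).

```python
import heapq
import xml.etree.ElementTree as ET
from collections import Counter


def build_huffman(freq: Counter) -> dict:
    """Static (non-adaptive) Huffman code: symbol -> bit string."""
    if not freq:
        return {}
    if len(freq) == 1:
        return {sym: "0" for sym in freq}
    # The unique integer breaks weight ties so dicts are never compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)
        w2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


class CompressedDoc:
    """Tags become small integers, text becomes Huffman bits; nesting is kept."""

    def __init__(self, xml_text: str):
        root = ET.fromstring(xml_text)
        texts = [e.text for e in root.iter() if e.text and e.text.strip()]
        self.code = build_huffman(Counter("".join(texts)))
        self.tag_ids, self.tokens = {}, []
        self._encode(root)

    def _bits(self, text: str) -> str:
        return "".join(self.code[ch] for ch in text)

    def _encode(self, elem) -> None:
        tid = self.tag_ids.setdefault(elem.tag, len(self.tag_ids))
        self.tokens.append(("start", tid))
        if elem.text and elem.text.strip():
            self.tokens.append(("text", self._bits(elem.text)))
        for child in elem:
            self._encode(child)
        self.tokens.append(("end", tid))

    def count_equal(self, tag: str, constant: str) -> int:
        """Count <tag> elements whose text equals `constant`, without decompressing."""
        if tag not in self.tag_ids or any(ch not in self.code for ch in constant):
            return 0
        start, target = ("start", self.tag_ids[tag]), ("text", self._bits(constant))
        return sum(1 for a, b in zip(self.tokens, self.tokens[1:])
                   if a == start and b == target)
```

Since the token stream keeps start and end markers, the compressed form still has the document's nesting, which is what lets structure-aware processing run on it directly.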
System Support for Pervasive Applications
ACM Transactions on Computer Systems, 2002
Cited by 65 (2 self)
Change-Centric Management of Versions in an XML Warehouse
In Proceedings of VLDB 2001
Cited by 60 (6 self)
We present a change-centric method to manage versions in a Web warehouse of XML data. The starting point is a sequence of snapshots of XML documents that we obtain from the Web. By running a diff algorithm, we compute the changes between two consecutive versions. We then represent the sequence using a novel representation of changes based on completed deltas and persistent identifiers. We present the foundations of the logical representation and some aspects of the physical storage policy. The work presented here was developed in the context of the Xyleme project, a massive warehouse for XML data from the Web. It has been implemented and tested, and we briefly discuss the implementation.
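A minimal Python sketch of why completed deltas plus persistent identifiers are convenient: if every operation records both the old and the new state of the node it touches, a delta can be applied forward or inverted mechanically to go backward. The sketch assumes every node already carries a persistent identifier (XID) and flattens each version to a map from XID to (parent XID, label, text); it skips the hard part the paper depends on, matching nodes between snapshots that lack identifiers. `Op`, `diff`, `apply_delta`, and `invert` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A node state is (parent XID, label, text); None means "node absent".
State = Optional[Tuple[Optional[int], str, Optional[str]]]


@dataclass(frozen=True)
class Op:
    kind: str     # "insert", "delete", or "update"
    xid: int      # persistent identifier, stable across versions
    old: State    # state before the change (kept even for deletes: "completed")
    new: State    # state after the change


def diff(v1: dict, v2: dict) -> list:
    """Compute a completed delta between two versions keyed by XID."""
    ops = []
    for xid in sorted(v1.keys() | v2.keys()):
        before, after = v1.get(xid), v2.get(xid)
        if before == after:
            continue
        kind = "insert" if before is None else "delete" if after is None else "update"
        ops.append(Op(kind, xid, before, after))
    return ops


def apply_delta(version: dict, delta: list) -> dict:
    """Apply a delta, checking each operation against the current state."""
    result = dict(version)
    for op in delta:
        if result.get(op.xid) != op.old:
            raise ValueError(f"delta does not match node {op.xid}")
        if op.new is None:
            del result[op.xid]
        else:
            result[op.xid] = op.new
    return result


def invert(delta: list) -> list:
    """Because each op records both states, inversion just swaps them."""
    flip = {"insert": "delete", "delete": "insert", "update": "update"}
    return [Op(flip[op.kind], op.xid, op.new, op.old) for op in reversed(delta)]
```

Storing the old state on deletes and updates costs space, but it means any stored version can be reached from a neighboring one in either direction without recomputing a diff.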
Structuring labeled trees for optimal succinctness, and beyond
In FOCS, 2005
Cited by 44 (8 self)
Consider an ordered, static tree T on t nodes where each node has a label from alphabet set Σ. Tree T may be of arbitrary degree and of arbitrary shape. Say we wish to support basic navigational operations such as finding the parent of node u, the ith child of u, and any child of u with label α. In a seminal work over fifteen years ago, Jacobson [15] observed that pointer-based tree representations are wasteful in space and introduced the notion of succinct data structures. He studied the special case of unlabeled trees and presented a succinct data structure of 2t + o(t) bits supporting navigational operations in O(1) time. The space used is asymptotically optimal with respect to the information-theoretic lower bound averaged over all trees. This led to a slew of results on succinct data structures for arrays, trees, strings ...
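Why about 2t bits: the number of ordered trees on t nodes is the Catalan number C(t-1) = binom(2t-2, t-1)/t, whose base-2 logarithm is 2t - O(log t). One classic encoding that meets this bound is Jacobson's level-order unary degree sequence (LOUDS), sketched below in Python for the operations listed above. This is an illustration, not the paper's own structure: rank and select are naive linear scans instead of the o(t)-bit constant-time directories a real succinct structure adds, labels are kept in a plain array in breadth-first order, and the class name `LoudsTree` is hypothetical. Input trees are assumed to be `(label, [children])` tuples.

```python
from collections import deque


class LoudsTree:
    """Level-order unary degree sequence: 2t + 1 bits for a tree on t nodes.

    Nodes are numbered 1..t in breadth-first order; labels are stored in a
    separate list in the same order.
    """

    def __init__(self, root):
        self.bits, self.labels = [1, 0], []  # "10" encodes a virtual super-root
        queue = deque([root])
        while queue:
            label, children = queue.popleft()
            self.labels.append(label)
            self.bits.extend([1] * len(children) + [0])  # degree in unary
            queue.extend(children)

    # Naive O(n) rank/select. Succinct structures add o(t)-bit directories
    # that answer both in O(1) time.
    def _select(self, bit: int, k: int) -> int:
        """Position of the k-th occurrence (1-based) of `bit`."""
        seen = 0
        for pos, b in enumerate(self.bits):
            if b == bit:
                seen += 1
                if seen == k:
                    return pos
        raise ValueError("not enough occurrences")

    def _rank1(self, pos: int) -> int:
        """Number of 1 bits in bits[0..pos]."""
        return sum(self.bits[: pos + 1])

    def degree(self, k: int) -> int:
        return self._select(0, k + 1) - self._select(0, k) - 1

    def child(self, k: int, i: int):
        """The i-th child (1-based) of node k, or None."""
        if not 1 <= i <= self.degree(k):
            return None
        return self._rank1(self._select(0, k) + i)

    def parent(self, k: int):
        """Parent of node k, or None for the root."""
        pos = self._select(1, k)
        return (pos + 1 - self._rank1(pos)) or None  # zeros before node k's bit

    def child_with_label(self, k: int, label):
        """First child of node k carrying `label`, or None."""
        for i in range(1, self.degree(k) + 1):
            c = self.child(k, i)
            if self.labels[c - 1] == label:
                return c
        return None
```

For the tree A(B(D), C), the bit string is 101101000: the leading "10" for the super-root, then "110", "10", "0", "0" for the degrees of A, B, C, D.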
XPRESS: a queriable compression for XML data
In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, 2003
Cited by 37 (3 self)
Like HTML documents, many XML documents reside on native file systems. Since XML data is irregular and verbose, storing and transmitting it wastes disk space and network bandwidth. To overcome this verbosity problem, several compressors for XML data have been developed. However, some XML compressors do not support querying compressed data, while others that do support it blindly encode tags and data values using predefined encoding methods, which degrades query performance on compressed XML data. In this paper, we propose XPRESS, an XML compressor that supports direct and efficient evaluation of queries on compressed XML data. XPRESS adopts a novel encoding method, called reverse arithmetic encoding, which is intended for encoding label paths of XML data, and applies diverse encoding methods depending on the types of data values. Experimental results with real-life data sets show that XPRESS achieves significant improvements in query performance for compressed XML data and reasonable compression ratios. On average, the query performance of XPRESS is 2.83 times better than that of an existing XML compressor, and the compression ratio of XPRESS is 73%.
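One way to read "reverse arithmetic encoding" is sketched below in Python: give each tag a subinterval of [0, 1), then arithmetic-code a label path starting from its last tag and narrowing toward the root. Every path ending in `.../book/author` then lands inside the interval for `book/author`, so a suffix query such as `//book/author` becomes an interval-containment test. This is an illustrative sketch under assumptions, not XPRESS's exact scheme: tag intervals are assumed proportional to tag frequency, exact `Fraction` arithmetic replaces finite-precision coding, and `tag_intervals`, `encode`, and `matches_suffix` are hypothetical names.

```python
from collections import Counter
from fractions import Fraction


def tag_intervals(paths: list) -> dict:
    """Assign each tag a subinterval of [0, 1) proportional to its frequency."""
    freq = Counter(tag for path in paths for tag in path)
    total = sum(freq.values())
    intervals, lo = {}, Fraction(0)
    for tag in sorted(freq):
        width = Fraction(freq[tag], total)
        intervals[tag] = (lo, lo + width)
        lo += width
    return intervals


def encode(path: tuple, intervals: dict) -> tuple:
    """Reverse arithmetic coding: start from the last tag, narrow toward the root."""
    lo, hi = Fraction(0), Fraction(1)
    for tag in reversed(path):
        s_lo, s_hi = intervals[tag]
        lo, hi = lo + (hi - lo) * s_lo, lo + (hi - lo) * s_hi
    return lo, hi


def matches_suffix(path_iv: tuple, query_iv: tuple) -> bool:
    """A path ends with the query's tags iff its interval lies inside the query's."""
    return query_iv[0] <= path_iv[0] and path_iv[1] <= query_iv[1]
```

The design choice is the reversal: coding from the leaf upward makes shared suffixes, rather than shared prefixes, correspond to nested intervals, which suits XPath's common `//a/b` pattern.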
XML-enabled Workflow Management for E-Services across Heterogeneous Platforms
VLDB Journal, 2001
Cited by 34 (1 self)
Advanced e-services require efficient, flexible, and easy-to-use workflow technology that integrates well with mainstream Internet technologies such as XML and Web servers. This paper discusses an XML-enabled architecture for distributed workflow management that is implemented in the latest version of our Mentor-lite prototype system. The key asset of this architecture is an XML mediator that handles the exchange of business and flow-control data between workflow and business-object servers on one side and client activities on the other, via XML messages over HTTP. Our implementation of the mediator makes use of Oracle's XSQL servlet. The major benefit of the advocated architecture is that it provides seamless integration of client applications into e-service workflows with scalable efficiency and very little explicit coding, in contrast to an earlier, Java-based version of our Mentor-lite prototype that required much more code and exhibited potential performance problems.
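A minimal Python sketch of the mediator role described above: client activities and server-side handlers exchange data only as XML messages, and the mediator parses a request, dispatches it, and serializes the reply. Everything here is an assumption for illustration: the message format (`workflowMessage` with `var` children), the names `to_message`, `from_message`, and `Mediator`, and the omission of the HTTP transport and of Oracle's XSQL servlet, which the paper's implementation actually uses.

```python
import xml.etree.ElementTree as ET


def to_message(activity: str, variables: dict) -> str:
    """Serialize workflow variables into an XML message (hypothetical format)."""
    root = ET.Element("workflowMessage", {"activity": activity})
    for name, value in variables.items():
        ET.SubElement(root, "var", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")


def from_message(text: str) -> tuple:
    """Parse a message back into (activity, variables)."""
    root = ET.fromstring(text)
    return root.get("activity"), {v.get("name"): v.text or "" for v in root.findall("var")}


class Mediator:
    """Routes XML requests from client activities to registered server-side handlers."""

    def __init__(self):
        self.handlers = {}

    def register(self, activity: str, handler) -> None:
        self.handlers[activity] = handler

    def handle(self, request_xml: str) -> str:
        activity, variables = from_message(request_xml)
        return to_message(activity, self.handlers[activity](variables))
```

Because both sides speak only the message format, a new client activity needs a handler registration rather than new glue code, which is the kind of saving the abstract attributes to the XML-based design.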
Path Queries on Compressed XML
In VLDB, 2003
Cited by 33 (2 self)
Central to any XML query language is a path language such as XPath, which operates on the tree structure of the XML document. We demonstrate in this paper that the tree structure can be effectively compressed and manipulated using techniques derived from symbolic model checking. Specifically, we show first that succinct representations of document tree structures based on sharing subtrees are highly effective. Second, we show that compressed structures can be queried directly and efficiently through a process of manipulating selections of nodes and partial decompression.
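Subtree sharing can be illustrated in Python with hash-consing: each distinct (label, children) combination is stored once, turning the document tree into its minimal DAG. The sketch then counts the tree nodes selected by a simple child-only path directly on the DAG, memoizing per (DAG node, step) so shared subtrees are evaluated once. It is a simplified illustration, not the paper's algorithm (which handles richer XPath via node selections and partial decompression); trees are assumed to be `(label, [children])` tuples and `SharedTree` is a hypothetical name.

```python
class SharedTree:
    """Minimal DAG for a tree: identical subtrees are stored once (hash-consing)."""

    def __init__(self, tree):
        self.ids = {}    # (label, child ids) -> node id
        self.nodes = []  # node id -> (label, child ids)
        self.root = self._intern(tree)

    def _intern(self, tree) -> int:
        label, children = tree
        key = (label, tuple(self._intern(c) for c in children))
        if key not in self.ids:
            self.ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self.ids[key]

    def count_path(self, path: list) -> int:
        """Count tree nodes selected by /path[0]/path[1]/..., evaluated on the DAG.

        Each (DAG node, step) pair is computed once, so a shared subtree is not
        re-traversed even though it stands for many tree nodes.
        """
        memo = {}

        def matches(node: int, step: int) -> int:
            if (node, step) not in memo:
                label, kids = self.nodes[node]
                if label != path[step]:
                    memo[(node, step)] = 0
                elif step == len(path) - 1:
                    memo[(node, step)] = 1
                else:
                    memo[(node, step)] = sum(matches(k, step + 1) for k in kids)
            return memo[(node, step)]

        return matches(self.root, 0)
```

A library with three identical `book/title` records and one `cd/title` has 9 tree nodes but only 4 DAG nodes; repetitive, data-centric XML is where this sharing pays off most.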
Deep Store: An archival storage system architecture
In Proceedings of the 21st International Conference on Data Engineering (ICDE ’05), 2005
Cited by 31 (6 self)
We present the Deep Store archival storage architecture, a large-scale storage system that stores immutable data efficiently and reliably for long periods of time. Archived data is stored across a cluster of nodes and recorded to hard disk. The design differentiates itself from traditional file systems by eliminating redundancy within and across files, distributing content for scalability, associating rich metadata with content, and using variable levels of replication based on the importance or degree of dependency of each piece of stored data. We evaluate the foundations of our design, including PRESIDIO, a virtual content-addressable storage framework with multiple methods for inter-file and intra-file compression that effectively addresses the data-dependent variability of data compression. We measure content and metadata storage efficiency, demonstrate the need for a variable-degree replication model, and provide preliminary results for storage performance.
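Eliminating redundancy across files is commonly done with content-defined chunking plus content addressing, sketched below in Python. PRESIDIO itself chooses among several compression methods; this sketch shows only one generic technique, and the parameters (`WINDOW`, `BASE`, `MOD`, `MASK`) and names (`chunk_data`, `ChunkStore`) are hypothetical. Chunk boundaries depend only on the last `WINDOW` bytes, so inserting bytes near the start of a file shifts content without changing most later chunks, and each unique chunk is stored once under its SHA-256 digest. Real systems also enforce minimum and maximum chunk sizes, omitted here.

```python
import hashlib

WINDOW = 16            # bytes covered by the rolling hash
BASE = 257
MOD = (1 << 31) - 1    # prime modulus for the polynomial hash
MASK = (1 << 6) - 1    # cut where the low 6 bits are all ones (roughly 64-byte chunks)


def chunk_data(data: bytes) -> list:
    """Content-defined chunking: once WINDOW bytes are read, cuts depend only on the window."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h, start, chunks = 0, 0, []
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        if (h & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Content-addressable store: unique chunks by digest, files as chunk recipes."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes (each stored once)
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name: str, data: bytes) -> int:
        """Store a file; return how many previously unseen chunks it added."""
        added, recipe = 0, []
        for chunk in chunk_data(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = chunk
                added += 1
            recipe.append(key)
        self.files[name] = recipe
        return added

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[k] for k in self.files[name])
```

Storing a byte-identical copy adds no new chunks, and a copy with a short header prepended typically adds only the first chunk or two, which is the inter-file redundancy the architecture aims to remove.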

