Tradeoffs in XML Database Compression
- In DCC, 2006
Abstract - Cited by 3 (0 self)
Large XML data files, or XML databases, are now a common way to distribute scientific and bibliographic data, and storing such data efficiently is an important concern. A number of approaches to XML compression have been proposed in the last five years. The most competitive approaches employ one or more statistical text compressors based on PPM or arithmetic coding in which some of the context is provided by the XML document structure. The purpose of this paper is to investigate the relationship between the extant proposals in more detail. We review the two main statistical modeling approaches proposed so far, and evaluate their performance on two representative XML databases.
An Analysis of XML Compression Efficiency
Abstract - Cited by 2 (0 self)
XML simplifies data exchange among heterogeneous computers, but it is notoriously verbose and has spawned the development of many XML-specific compressors and binary formats. We present an XML test corpus and a combined efficiency metric integrating compression ratio and execution speed. We use this corpus and linear regression to assess 14 general-purpose and XML-specific compressors relative to the proposed metric. We also identify key factors when selecting a compressor. Our results show XMill or WBXML may be useful in some instances, but a general-purpose compressor is often the best choice.
Categories and Subject Descriptors: E.4 [Data]: Coding and Information Theory—Data Compaction and Compression; H.3.4 [Systems and Software]: Performance

