Improving Table Compression with Combinatorial Optimization (2002)

by Adam L. Buchsbaum, et al.
Citations:23 - 4 self
BibTeX

@MISC{Buchsbaum02improvingtable,
    author = {Adam L. Buchsbaum and others},
    title = {Improving Table Compression with Combinatorial Optimization},
    year = {2002}
}

Abstract

We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [SODA’00], in which a table is partitioned by an off-line training procedure into disjoint intervals of columns, each of which is compressed separately by a standard, on-line compressor like gzip. We provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates. Based on the theory, we devise the first on-line training algorithms for table compression, which can be applied to individual files, not just continuously operating sources; and also a new, off-line training algorithm, based on a link to the asymmetric traveling salesman problem, which improves on prior work by rearranging columns prior to partitioning. We demonstrate these results experimentally. On various test files, the on-line algorithms provide 35–55% improvement over gzip with negligible slowdown; the off-line reordering provides up to 20% further improvement over partitioning alone. We also show that a variation of the table compression problem is MAX-SNP hard.
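
The partitioning idea in the abstract can be illustrated with a minimal sketch. It assumes a table stored as equal-length byte rows (one byte per column), uses Python's zlib as a stand-in for gzip, and picks the cheapest split into contiguous column intervals with a simple dynamic program. The names `interval_cost` and `best_partition` are hypothetical, and this is an illustration of the paradigm only, not the training algorithms of the paper.

```python
import zlib
from typing import List, Sequence, Tuple


def interval_cost(rows: Sequence[bytes], start: int, end: int) -> int:
    """Compressed size of columns [start, end), serialized row by row."""
    data = b"".join(row[start:end] for row in rows)
    return len(zlib.compress(data))


def best_partition(rows: Sequence[bytes]) -> Tuple[int, List[Tuple[int, int]]]:
    """Cheapest split of the columns into contiguous intervals.

    Assumes a non-empty table whose rows all have the same length.
    best[e] holds the minimum total compressed size of columns [0, e).
    """
    width = len(rows[0])
    best = [0] + [-1] * width   # -1 marks "not computed yet"
    cut = [0] * (width + 1)     # cut[e] = start of the last interval ending at e
    for end in range(1, width + 1):
        for start in range(end):
            cost = best[start] + interval_cost(rows, start, end)
            if best[end] < 0 or cost < best[end]:
                best[end] = cost
                cut[end] = start
    # Walk the recorded cuts backwards to recover the intervals.
    intervals: List[Tuple[int, int]] = []
    end = width
    while end > 0:
        intervals.append((cut[end], end))
        end = cut[end]
    intervals.reverse()
    return best[width], intervals
```

Calling `best_partition(rows)` returns the total compressed size and the list of `(start, end)` column intervals. Because the single interval covering every column is one of the candidates, the result is never larger than compressing the whole table at once with the same compressor. The sketch runs a quadratic number of compressions in the table width, so it is only practical on a small training sample.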

Keyphrases

table compression    combinatorial optimization    individual file    new theory    first on-line training algorithm    off-line reordering    off-line training procedure    on-line compressor    off-line training algorithm    salesman problem    various test file    negligible slowdown    disjoint interval    compression rate    massive table    max-snp hard    partition-training paradigm    column permutation    prior work    heuristic observation    on-line algorithm    previous experimental observation    table compression problem   
