Results 1 - 6 of 6
Parameterised Compression for Sparse Bitmaps
Proc. ACM SIGIR International Conference on Research and Development in Information Retrieval
, 1992
Abstract

Cited by 29 (8 self)
Full-text retrieval systems typically use either a bitmap or an inverted file to identify which documents contain which words, so that the documents containing any combination of words can be quickly located. Bitmaps of word occurrences are large, but are usually sparse, and thus are amenable to a variety of compression techniques. Here we consider techniques in which the encoding of each bitvector within the bitmap is parameterised, so that a different code can be used for each bitvector. Our experimental results show that the new methods yield better compression than previous techniques. Categories and Subject Descriptors: E.4 [Coding and Information Theory]: Data compaction and compression; H.3.2 [Information Storage]: File organisation. Keywords: Full-text retrieval, data compression, document database, Huffman coding, geometric distribution, inverted file. 1 Introduction Full-text retrieval systems are used for storing and accessing document collections such as newspaper a...
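As an illustrative aside (not taken from the paper): per-bitvector parameterised coding of this kind is commonly realised with Golomb codes, since the gaps between 1-bits in a sparse bitvector are roughly geometrically distributed and the parameter b can be tuned to each bitvector's density. A minimal sketch, with function name and bit-string output our own:

```python
def golomb_encode(n, b):
    """Golomb codeword for gap n >= 0 with parameter b >= 1, as a
    '0'/'1' string: quotient n // b in unary (terminated by '0'),
    remainder n % b in truncated binary."""
    q, r = divmod(n, b)
    code = "1" * q + "0"                 # unary part
    k = (b - 1).bit_length()             # ceil(log2 b); 0 when b == 1
    if k == 0:
        return code                      # b == 1 degenerates to pure unary
    c = (1 << k) - b                     # count of short, (k-1)-bit remainders
    if r < c:
        return code + format(r, "0{}b".format(k - 1))
    return code + format(r + c, "0{}b".format(k))
```

A dense bitvector suits a small b (short unary parts), a very sparse one a large b; choosing b per bitvector is what makes the code "parameterised".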
Is Huffman Coding Dead?
 Computing
, 1993
Abstract

Cited by 17 (3 self)
In recent publications about data compression, arithmetic codes are often suggested as the state of the art, rather than the more popular Huffman codes. While it is true that Huffman codes are not optimal in all situations, we show that the advantage of arithmetic codes in compression performance is often negligible. Referring also to other criteria, we conclude that for many applications, Huffman codes should still remain a competitive choice. 1. Introduction It is paradoxical that, as the technology for storing and transmitting information has gotten cheaper and more effective, interest in data compression has increased. There are many explanations, but most conspicuous is that improvements in media have expanded our sense of what we wish to store. For example, CD-ROM technology allows us to store whole libraries instead of records describing individual items; but the requirements of storing full text easily exceed the capabilities even of the optical format. Similarly, there is ...
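For a rough sense of why the gap to arithmetic coding is often negligible: a Huffman code's average length is always within one bit of the entropy per symbol, and meets it exactly for dyadic probabilities. A hedged sketch (names ours, not from the paper) that computes only the codeword lengths:

```python
import heapq
from math import log2

def huffman_lengths(freqs):
    """Codeword length per symbol of a Huffman code for `freqs`
    (symbol -> probability or weight); ties broken arbitrarily."""
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    uid = len(heap)                       # unique tie-breaker for the heap
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)    # merge the two lightest subtrees,
        w2, _, b = heapq.heappop(heap)    # deepening every leaf under them
        for s in a + b:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, uid, a + b))
        uid += 1
    return lengths

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_lengths(probs)
avg = sum(p * lengths[s] for s, p in probs.items())
entropy = -sum(p * log2(p) for p in probs.values())
# for these dyadic probabilities both avg and entropy are 1.75 bits/symbol
```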
Robust Universal Complete Codes for Transmission and Compression
 Discrete Applied Mathematics
, 1996
Abstract

Cited by 10 (4 self)
Several measures are defined and investigated, which allow the comparison of codes as to their robustness against errors. Then new universal and complete sequences of variable-length codewords are proposed, based on representing the integers in a binary Fibonacci numeration system. Each sequence is constant and need not be generated for every probability distribution. These codes can be used as alternatives to Huffman codes when the optimal compression of the latter is not required, and simplicity, faster processing and robustness are preferred. The codes are compared on several "real-life" examples. 1. Motivation and Introduction Let A = {A_1, A_2, ..., A_n} be a finite set of elements, called cleartext elements, to be encoded by a static uniquely decipherable (UD) code. For notational ease, we use the term 'code' as abbreviation for 'set of codewords'; the corresponding encoding and decoding algorithms are always either given or clear from the context. A code i...
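The Fibonacci numeration idea can be sketched concretely: Zeckendorf's theorem gives every positive integer a unique sum of non-consecutive Fibonacci numbers, so the digit string never contains "11", and appending a final '1' makes "11" a self-delimiting end marker, which is the source of the robustness against bit errors. A sketch under our own naming, not the paper's construction verbatim:

```python
def fib_encode(n):
    """Fibonacci codeword for n >= 1 as a '0'/'1' string: the
    Zeckendorf digits of n (least-significant first, over the
    Fibonacci numbers 1, 2, 3, 5, ...) followed by a final '1'.
    The digits never contain '11', so each codeword ends at the
    first occurrence of '11'."""
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = ["0"] * (len(fibs) - 1)     # fibs[-1] > n is never used
    for i in range(len(fibs) - 2, -1, -1):
        if fibs[i] <= n:                 # greedy Zeckendorf step
            digits[i] = "1"
            n -= fibs[i]
    return "".join(digits) + "1"
```

The code is universal and fixed in advance: unlike a Huffman code, it needs no per-distribution construction, matching the abstract's "each sequence is constant" point.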
Models of Bitmap Generation: A Systematic Approach to Bitmap Compression
 Information Processing & Management, vol. 28
, 1992
Abstract

Cited by 5 (2 self)
In large IR systems, information about word occurrence may be stored in the form of a bit matrix, with rows corresponding to different words and columns to documents. Such a matrix is generally very large and very sparse. New methods for compressing such matrices are presented, which exploit possible correlations between rows and between columns. The methods are based on partitioning the matrix into small blocks and predicting the 1-bit distribution within a block by means of various bit generation models. Each block is then encoded using Huffman or arithmetic coding. The methods also use a new way of enumerating subsets of fixed size from a given superset. Preliminary experimental results indicate improvements over previous methods. 1. Introduction The common approach to processing complex boolean queries in large full-text document retrieval systems is to use inverted files: a concordance is accessed via a dictionary, and includes for each different word of the text, the ordered list ...
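The "enumerating subsets of fixed size" step is in the spirit of enumerative coding via the combinatorial number system: a block of m bits containing k ones is one of C(m, k) patterns, so its index fits in ceil(log2 C(m, k)) bits once k is known. A sketch of such a ranking (our own formulation, not necessarily the paper's):

```python
from math import comb
from itertools import combinations

def subset_rank(positions):
    """Rank (0-based) of a sorted list of 1-bit positions among all
    subsets of the same size, via the combinatorial number system:
    rank = sum over i of C(positions[i], i + 1)."""
    return sum(comb(p, i + 1) for i, p in enumerate(positions))

# ranks of all 3-subsets of {0..5} form a bijection onto 0..C(6,3)-1
ranks = sorted(subset_rank(list(c)) for c in combinations(range(6), 3))
```

Because the ranking is a bijection, a block can be stored as (k, rank) and reconstructed exactly; the model's job is then to code k cheaply.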
Optimal Algorithms for Inserting a Random Element Into a Random Heap
, 1996
Abstract

Cited by 1 (0 self)
Two algorithms for inserting a random element into a random heap are shown to be optimal (in the sense that they use the least number of comparisons on the average among all comparison-based algorithms) for different values of n under a uniform model.
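As a baseline for the comparison counts being optimised, the textbook sift-up insertion is sketched below, instrumented to count comparisons; its worst case is the heap's height, and the paper's algorithms improve on its average. This sketch is ours, not one of the paper's two optimal algorithms:

```python
def heap_insert(heap, x):
    """Insert x into the min-heap `heap` (a Python list) by standard
    sift-up; returns the number of element comparisons performed."""
    heap.append(x)
    i = len(heap) - 1
    comparisons = 0
    while i > 0:
        parent = (i - 1) // 2
        comparisons += 1
        if heap[parent] <= heap[i]:
            break                        # heap property restored
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
    return comparisons
```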
Variable-to-Fixed Entropy Coders: Why and How? (And their application to H.263)
, 2000
Abstract
Entropy coders fall into several general categories: Huffman and Huffman-like coders that parse the input into fixed-length pieces and encode each with a variable-length output, arithmetic coders that take an arbitrarily long string as an input and encode it with a single output string, and Tunstall-like coders that parse the input into variable-length strings and encode each with a fixed-length output. This paper is about a Tunstall-like coder called BAC (for Block Arithmetic Coding). We argue that this class of coders is very appropriate for many situations, especially when the probabilities vary, when channel errors may occur, or when fast operation is needed. In particular, we discuss how an H.263 encoder/decoder can be modified to replace the syntax arithmetic code with a block arithmetic code to get greater speed and better error resiliency.
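The variable-to-fixed idea behind Tunstall-like coders can be sketched as follows: starting from the single symbols, repeatedly expand the most probable parse string by every alphabet symbol until the dictionary is large enough, then assign each entry a fixed-length index. This is a generic Tunstall construction with our own names, not BAC itself:

```python
import heapq

def tunstall_dictionary(probs, size):
    """Tunstall-style variable-to-fixed parse dictionary (sketch).
    `probs` maps symbols to probabilities. Repeatedly replaces the
    most probable parse string by its one-symbol extensions until at
    least `size` entries exist; each entry then gets a fixed-length
    index of ceil(log2(len(dictionary))) bits."""
    entries = [(-p, s) for s, p in probs.items()]   # max-heap via negation
    heapq.heapify(entries)
    while len(entries) < size:
        p, word = heapq.heappop(entries)            # most probable string
        for s, ps in probs.items():
            heapq.heappush(entries, (p * ps, word + s))
    return sorted(word for _, word in entries)
```

Each expansion adds one entry per symbol and removes one, so with more than two symbols the loop can overshoot `size` slightly; the fixed output width and complete prefix-free parse set are what give the error resiliency the abstract highlights.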