Results 1 - 9 of 9
Offline compression by greedy textual substitution
 PROC. IEEE
, 2000
Cited by 25 (1 self)
Greedy offline textual substitution refers to the following approach to compression or structural inference. Given a long textstring x, a substring w is identified such that replacing all instances of w in x except one by a suitable pair of pointers yields the highest possible contraction of x; the process is then repeated on the contracted textstring until substrings capable of producing contractions can no longer be found. This paper examines computational issues arising in the implementation of this paradigm and describes some applications and experiments.
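The selection step this abstract describes can be sketched in a few lines of Python. This is an illustration of the paradigm, not the paper's implementation: it assumes a fixed two-symbol pointer cost, caps candidate lengths at a hypothetical `max_len`, and counts occurrences allowing overlaps to keep the sketch short.

```python
from collections import Counter

def best_substring(x, max_len=8, pointer_cost=2):
    """One greedy selection round: find the substring w (length >= 2) whose
    replacement yields the largest contraction, assuming every occurrence
    but one is replaced by a pointer costing `pointer_cost` symbols.
    Occurrences are counted allowing overlaps, to keep the sketch short."""
    counts = Counter(x[i:i + L]
                     for L in range(2, max_len + 1)
                     for i in range(len(x) - L + 1))
    best_w, best_gain = None, 0
    for w, k in counts.items():
        if k < 2:
            continue
        gain = (k - 1) * (len(w) - pointer_cost)  # symbols saved net of pointers
        if gain > best_gain:
            best_w, best_gain = w, gain
    return best_w, best_gain
```

A full greedy compressor would repeat this selection, rewrite the text, and stop once no substring yields a positive gain.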
Robust Universal Complete Codes for Transmission and Compression
 Discrete Applied Mathematics
, 1996
Cited by 10 (4 self)
Several measures are defined and investigated, which allow the comparison of codes as to their robustness against errors. Then new universal and complete sequences of variable-length codewords are proposed, based on representing the integers in a binary Fibonacci numeration system. Each sequence is constant and need not be generated for every probability distribution. These codes can be used as alternatives to Huffman codes when the optimal compression of the latter is not required, and simplicity, faster processing and robustness are preferred. The codes are compared on several "real-life" examples. 1. Motivation and Introduction: Let A = {A_1, A_2, ..., A_n} be a finite set of elements, called cleartext elements, to be encoded by a static uniquely decipherable (UD) code. For notational ease, we use the term 'code' as an abbreviation for 'set of codewords'; the corresponding encoding and decoding algorithms are always either given or clear from the context. A code i...
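A minimal sketch of the Fibonacci-based encoding the abstract refers to, assuming the common Zeckendorf construction (the paper's exact code tables and variants are not reproduced here):

```python
def fib_encode(n):
    """Fibonacci codeword for a positive integer n (Zeckendorf form, least
    significant position first, terminated by an extra 1). Every codeword
    ends in '11', and '11' occurs nowhere else inside it, which is the
    source of the robustness against bit errors."""
    assert n >= 1
    fibs = [1, 2]                       # F(2), F(3), ...
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    for f in reversed(fibs[:-1]):       # greedy Zeckendorf decomposition
        if f <= n:
            bits.append('1')
            n -= f
        else:
            bits.append('0')
    return ''.join(reversed(bits)) + '1'
```

Because the delimiter '11' appears only at the end of a codeword, a single bit error can desynchronize at most a bounded number of codewords, unlike Huffman streams.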
Punctured Elias codes for variable-length coding of the integers
 The University of Auckland
, 1996
Cited by 8 (1 self)
The compact representation of integers is an important problem in areas such as data compression, especially where there is a nearly monotonic decrease in the likelihood of larger integers. While many different representations have been described, it is not always clear in which circumstances a particular code is to be preferred. This report introduces a variant of the Elias γ code which is shown to be better than other codes for some distributions. 1. Compact integer representations: The efficient representation of symbols of differing probabilities is one of the classical problems of information theory and coding theory, with efficient solutions known since the early 1950's (Shannon-Fano and Huffman codes [9]). In traditional, non-adaptive, coding we assume a priori probabilities of the input symbols and construct suitable codes to represent those symbols efficiently. There is no necessary or simple relation between a symbol and its representation. Here we are concerned with a different problem, especially as the symbol alphabet (integers of arbitrary upper bound) may be so large as to preclude the formal construction of an efficient code. Given an arbitrary integer we wish to represent it as compactly as possible, preferably by an algorithm which recognises only the magnitude and bit pattern of the integer (no table lookup or mapping needed). Equally, a simple algorithm should be able to recover an integer from an input bit stream, even if that particular integer has never been seen before. The binary representation of the integer is often visible within the representation and other information is appended to indicate the length or precision. Many variable-length representations have been described; here we concentrate on just a few, emphasising those which have a simple relation between code and value and are instantaneous or nearly so.
Following Elias [3], we first introduce two preliminary representations which are relatively unimportant per se, but are used in many other codes:
• α(n) is the unary representation: n 0's followed by a 1 (or 1's followed by a 0).
• β(n) is the natural binary representation of n, from the most significant 1.
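The α and β representations above, and the Elias γ code assembled from them, can be sketched directly; the report's punctured variant itself is not reproduced here:

```python
def alpha(n):
    """Unary representation: n 0's followed by a terminating 1."""
    return '0' * n + '1'

def beta(n):
    """Natural binary representation of n >= 1, from the most significant 1."""
    return bin(n)[2:]

def elias_gamma(n):
    """Elias gamma: announce len(beta(n)) via alpha, then emit the bits of
    beta(n) after its leading 1 (that leading 1 doubles as alpha's stop bit)."""
    b = beta(n)
    return alpha(len(b) - 1) + b[1:]
```

Note how γ makes the binary form of the integer visible inside the codeword, with the unary prefix supplying the length, exactly the "simple relation between code and value" the report emphasises.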
Optimal Lossless Compression of a Class of Dynamic Sources
 Proc Data Compression Conference, edited by J.A. Storer and J.H. Reif. IEEE Computer Society Press, Los Alamitos, CA
, 1997
Cited by 4 (0 self)
The usual assumption for proofs of the optimality of lossless encoding is a stationary ergodic source. Dynamic sources with non-stationary probability distributions occur in many practical situations where the data source is constructed by a composition of distinct sources, for example, a document with multiple authors, a multimedia document, or the composition of distinct packets sent over a communication channel. There is a vast literature of adaptive methods used to tailor the compression to dynamic sources. However, little is known about optimal or near optimal methods for lossless compression of strings generated by sources that are not stationary ergodic. Here we do not assume the source is stationary. Instead we assume that the source produces an infinite sequence of concatenated finite strings s_1, s_2, ..., where (i) each finite string s_i is generated by a sampling of a (possibly distinct) stationary ergodic source S_i, and (ii) the length of each of the s_i is lower b...
A Relationship Between Linear Complexity and k-Error Linear Complexity
, 2000
Cited by 4 (0 self)
The k-error linear complexity of a periodic sequence of period N is defined as the smallest linear complexity that can be obtained by changing k or fewer bits of the sequence per period. This paper shows a relationship between the linear complexity and the minimum value k for which the k-error linear complexity is strictly less than the linear complexity. Keywords: Cryptology, Stream cipher, Linear complexity.
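The definition can be illustrated by brute force for short binary sequences: linear complexity is computed with the standard Berlekamp-Massey algorithm over GF(2), and the k-error value is the minimum over all patterns of at most k bit flips. This is exponential in k and is only a sketch of the definition, not the paper's method for relating the two quantities.

```python
from itertools import combinations

def linear_complexity(s):
    """Linear complexity of a binary sequence (list of 0/1) via the
    Berlekamp-Massey algorithm over GF(2)."""
    n = len(s)
    c = [1] + [0] * n   # current connection polynomial
    b = [1] + [0] * n   # polynomial before the last length change
    L, m = 0, -1
    for i in range(n):
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:               # discrepancy: update the connection polynomial
            t = c[:]
            for j in range(n - i + m + 1):
                c[j + i - m] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L

def k_error_linear_complexity(s, k):
    """Smallest linear complexity over all patterns of at most k bit flips.
    Brute force: fine for short periods, exponential in general."""
    best = linear_complexity(s)
    for e in range(1, k + 1):
        for pos in combinations(range(len(s)), e):
            t = list(s)
            for p in pos:
                t[p] ^= 1
            best = min(best, linear_complexity(t))
    return best
```

For example, a period of seven 1's and one 0 has high linear complexity, but a single flip reduces it to 1, illustrating why the minimum k causing a strict drop is a natural quantity to study.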
SASE: Implementation of a Compressed Text Search Engine
 In Usenix Symposium on Internet Technologies and Systems
, 1997
Keyword-based search engines are the basic building block of text retrieval systems. Higher-level systems like content-sensitive search engines and knowledge-based systems still rely on keyword search as the underlying text retrieval mechanism. With the explosive growth in content, Internet and Intranet information repositories require efficient mechanisms to store as well as index data. In this paper we discuss the implementation of the Shrink and Search Engine (SASE) framework, which unites text compression and indexing to maximize keyword search performance while reducing storage cost. SASE features the novel capability of being able to directly search through compressed text without explicit decompression. The implementation includes a search server architecture, which can be accessed from a Java front-end to perform keyword search on the Internet. The performance results show that the compression efficiency of SASE is within 7-17% of GZIP, one of the best lossless compression scheme...
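The idea of searching compressed text without decompression can be illustrated with a toy static word-dictionary codec: compress the query with the same dictionary as the text, then scan the compressed bytes directly. This is only a sketch of the general technique; SASE's actual code format and index structures differ.

```python
def build_codec(words):
    """Toy static dictionary: each distinct word maps to a two-byte code
    whose first byte has its top bits set (0xC0). With fewer than 192 words
    the second byte is always < 0xC0, so codeword boundaries self-align and
    a byte-level scan cannot match across a codeword boundary."""
    vocab = sorted(set(words))
    assert len(vocab) < 192, "toy codec: keep second bytes below 0xC0"
    return {w: bytes([0xC0, i]) for i, w in enumerate(vocab)}

def compress(words, codec):
    """Encode a word sequence as a byte string."""
    return b''.join(codec[w] for w in words)

def search(query_words, codec, compressed):
    """Keyword search directly on the compressed text: encode the query with
    the same codec and scan for the byte pattern. Returns the word offset of
    the first match, or -1 if absent."""
    pattern = b''.join(codec[w] for w in query_words)
    pos = compressed.find(pattern)
    return pos // 2 if pos >= 0 else -1
```

Because the query is shrunk by the same codec as the text, the scan touches fewer bytes than searching the plaintext would, which is the performance argument the paper builds on.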
Optimal Encoding of Non-Stationary Sources
, 2001
The usual assumption for proofs of the optimality of lossless encoding is a stationary ergodic source. Dynamic sources with non-stationary probability distributions occur in many practical situations where the data source is formed from a composition of distinct sources, for example, a document with multiple authors, a multimedia document, or the composition of distinct packets sent over a communication channel. There is a vast literature of adaptive methods used to tailor the compression to dynamic sources. However, little is known about optimal or near-optimal methods for lossless compression of strings generated by sources that are not stationary ergodic. Here, we do not assume the source is stationary. Instead, we assume that the source produces an infinite sequence of concatenated finite strings s_1, ..., s_n, where: (i) each finite string s_i is generated by a sampling of a (possibly distinct) stationary ergodic source S_i, and (ii) the length of each of the s_i is lower bounded by a function L(n) such that L(n)/log(n) grows unboundedly with the length n of all the text within s_1 .. s_i.
Searching and Encoding for Infinite Ordered Sets
, 1979
We consider the relationships between binary search algorithms and binary prefix encodings of infinite linearly ordered sets. It is known that each search algorithm determines a prefix code, and in three cases we show to what extent the converse is true. For sets similar to the natural numbers we show that search-related codes are as flexible as all prefix codes, while for general ordered sets they are only asymptotically as flexible. KEY WORDS: Unbounded search; prefix codes; search codes; linear order; infinite sets.
Efficient Coding of Integers for Certain Probability Distributions
, 2006
Abstract — Methods for prefix coding integers generally either consider specific distributions that decline more quickly than a power law (for Golomb-like codes) or simultaneously consider all finite-entropy distributions (for universal codes). Particular power-law and similar distributions, however, are often known to model particular random variables. Codes for such distributions can be judged based on (estimated) compression ratio. This paper introduces a family of universal source codes with an eye towards near-optimal coding of known distributions. Compression ratios are found for well-known probability distributions using these codes and other prefix codes. One application of these near-optimal codes is an improved representation of rational numbers.
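To make the "judged based on compression ratio" criterion concrete, here is a small sketch comparing the expected Elias γ codeword length with the entropy of a truncated Zipf (power-law) distribution. The truncation point N and exponent s are illustrative choices; the paper's own code family is not reproduced here.

```python
import math

def gamma_len(n):
    """Length in bits of the Elias gamma codeword for n >= 1."""
    return 2 * n.bit_length() - 1

def expected_lengths(N, s=1.5):
    """Expected gamma-code length vs. source entropy (bits/symbol) for a
    Zipf(s) distribution truncated to {1..N}. The gap between the two is
    the redundancy a distribution-specific code could try to close."""
    z = sum(i ** -s for i in range(1, N + 1))
    p = [i ** -s / z for i in range(1, N + 1)]
    avg = sum(p[i - 1] * gamma_len(i) for i in range(1, N + 1))
    H = -sum(pi * math.log2(pi) for pi in p)
    return avg, H
```

Since γ is a prefix code, its expected length can never fall below the entropy; how close it comes for a given power-law exponent is the kind of comparison the paper tabulates across code families.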