Results 1-10 of 10
Data Compression Using Adaptive Coding and Partial String Matching
IEEE Transactions on Communications, 1984
Cited by 331 (20 self)
The recently developed technique of arithmetic coding, in conjunction with a Markov model of the source, is a powerful method of data compression in situations where a linear treatment is inappropriate. Adaptive coding allows the model to be constructed dynamically by both encoder and decoder during the course of the transmission, and has been shown to incur a smaller coding overhead than explicit transmission of the model's statistics. But there is a basic conflict between the desire to use high-order Markov models and the need to have them formed quickly as the initial part of the message is sent. This paper describes how the conflict can be resolved with partial string matching, and reports experimental results which show that mixed-case English text can be coded in as little as 2.2 bits/character with no prior knowledge of the source.
Retrieving Collocations from Text: Xtract
Computational Linguistics, 1993
Cited by 288 (1 self)
Natural languages are full of collocations, recurrent combinations of words that co-occur more often than expected by chance and that correspond to arbitrary word usages. Recent work in lexicography indicates that collocations are pervasive in English; apparently, they are common in all types of writing, including both technical and nontechnical genres. Several approaches have been proposed to retrieve various types of collocations from the analysis of large samples of textual data. These techniques automatically produce large numbers of collocations along with statistical figures intended to reflect the relevance of the associations. However, none of these techniques provides functional information along with the collocation. Also, the results produced often contained improper word associations reflecting some spurious aspect of the training corpus that did not stand for true collocations. In this paper, we describe a set of techniques based on statistical methods for retrieving and identifying collocations from large textual corpora. These techniques produce a wide range of collocations and are based on some original filtering methods that allow the production of richer and higher-precision output. These techniques have been implemented and resulted in a lexicographic tool, Xtract. The techniques are described and some results are presented on a 10 million-word corpus of stock market news reports. A lexicographic evaluation of Xtract as a collocation retrieval tool has been made, and the estimated precision of Xtract is 80%.
Arithmetic coding
IBM J. Res. Develop, 1979
Cited by 195 (0 self)
Arithmetic coding is a data compression technique that encodes data (the data string) by creating a code string which represents a fractional value on the number line between 0 and 1. The coding algorithm is symbolwise recursive; i.e., it operates upon and encodes (decodes) one data symbol per iteration or recursion. On each recursion, the algorithm successively partitions an interval
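The interval partitioning described in this abstract can be illustrated with a minimal Python sketch. It is not the algorithm from the cited paper: it assumes a fixed, known symbol distribution (the hypothetical `PROBS` table) and uses exact rational arithmetic instead of the finite-precision registers a practical coder needs. All function names are illustrative.

```python
from fractions import Fraction

def build_intervals(probs):
    """Assign each symbol a sub-interval [low, high) of [0, 1)."""
    intervals = {}
    low = Fraction(0)
    for sym, p in probs.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def encode(message, probs):
    """Narrow [0, 1) once per symbol; return the final [low, high)."""
    intervals = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = intervals[sym]
        low = low + width * s_low          # shift into the symbol's slice
        width = width * (s_high - s_low)   # shrink by the symbol's probability
    return low, low + width

def decode(value, length, probs):
    """Recover `length` symbols from any value inside the final interval."""
    intervals = build_intervals(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(sym)
                # Rescale so the chosen slice becomes [0, 1) again.
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

# Hypothetical source distribution, assumed for illustration only.
PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

low, high = encode("abca", PROBS)
decoded = decode(low, 4, PROBS)
print(low, high, decoded)  # 11/32 23/64 abca
```

The width of the final interval equals the product of the symbol probabilities (here 1/64), so roughly 6 bits suffice to name a point inside it.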
Data Compression Using Dynamic Markov Modelling
The Computer Journal, 1986
Cited by 79 (3 self)
A method to dynamically construct Markov models that describe the characteristics of binary messages is developed. Such models can be used to predict future message characters and can therefore be used as a basis for data compression. To this end, the Markov modelling technique is combined with Guazzo coding to produce a powerful method of data compression. The method has the advantage of being adaptive: messages may be encoded or decoded with just a single pass through the data. Experimental results reported here indicate that the Markov modelling approach generally achieves much better data compression than that observed with competing methods on typical computer data. Categories and Subject Descriptors: E.4 [Coding and Information Theory]: data compaction and compression; C.2.0 [Computer-Communication Networks]: data communications. General Terms: Experimentation, Algorithms. Additional Key Words and Phrases: Data compression, text compression, adaptive coding, Guazzo coding. January...
Practical Implementations of Arithmetic Coding
IN IMAGE AND TEXT, 1992
Cited by 34 (6 self)
We provide a tutorial on arithmetic coding, showing how it provides nearly optimal data compression and how it can be matched with almost any probabilistic model. We indicate the main disadvantage of arithmetic coding, its slowness, and give the basis of a fast, space-efficient, approximate arithmetic coder with only minimal loss of compression efficiency. Our coder is based on the replacement of arithmetic by table lookups coupled with a new deterministic probability estimation scheme.
On-Line Stochastic Processes in Data Compression
1996
Cited by 15 (6 self)
The ability to predict the future based upon the past in finite-alphabet sequences has many applications, including communications, data security, pattern recognition, and natural language processing. By Shannon's theory and the breakthrough development of arithmetic coding, any sequence $a_1 a_2 \cdots a_n$ can be encoded in a number of bits that is essentially equal to the minimal information-lossless codelength, $\sum_i -\log_2 p(a_i \mid a_1 \cdots a_{i-1})$. The goal of universal on-line modeling, and therefore of universal data compression, is to deduce the model of the input sequence $a_1 a_2 \cdots a_n$ that can estimate each $p(a_i \mid a_1 \cdots a_{i-1})$ knowing only $a_1 a_2 \cdots a_{i-1}$ so that the ex...
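The codelength expression in this abstract, the sum over i of -log2 p(a_i | a_1 ... a_{i-1}), can be made concrete with a small Python sketch. The predictor used here is an assumption chosen for simplicity (an order-0 adaptive model with add-one smoothing); it is not the modeling method of the cited work, and the function name is hypothetical.

```python
import math
from collections import Counter

def adaptive_code_length(sequence, alphabet):
    """Ideal codelength in bits under an adaptive order-0 model.

    Each symbol is predicted from counts of the symbols seen so far,
    so an encoder and a decoder could both rebuild the same model.
    The add-one estimator below is an illustrative assumption.
    """
    counts = Counter()
    total_bits = 0.0
    for seen, symbol in enumerate(sequence):
        # p(a_i | a_1 ... a_{i-1}) estimated from past symbols only.
        p = (counts[symbol] + 1) / (seen + len(alphabet))
        total_bits += -math.log2(p)
        counts[symbol] += 1  # update the model after coding the symbol
    return total_bits

bits = adaptive_code_length("aaab", "ab")
print(round(bits, 4))  # 4.3219
```

For "aaab" over the alphabet {a, b} the successive estimates are 1/2, 2/3, 3/4 and 1/5, whose product is 1/20, giving log2(20) bits in total.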
Arithmetic coding for finite-state noiseless channels
IEEE Trans. Inform. Theory, 1994
Cited by 2 (0 self)
Abstract: We analyze the expected delay for infinite precision arithmetic codes, and suggest a practical implementation that closely approximates the idealized infinite precision model. Index Terms: Arithmetic coding, expected delay analysis, finite precision arithmetic. the reader to [7] for some perspective on these modifications. Other practical arithmetic codes which are similar to the codes of Elias and Pasco can be found in [8] and [9]. Guazzo realized that arithmetic coding could be used to map source sequences into more general code alphabets than those with N equal cost letters. In [10], he described a practical arithmetic code which efficiently maps sequences of source symbols into sequences of letters from a channel with memoryless letter costs; i.e., the cost of transmitting any code letter depends only on that letter and different letters may have different transmission costs. A practical arithmetic code to efficiently encode source sequences into sequences from a channel with finite-state letter costs was specified in [11]; here, the cost of transmitting a code letter depends on the letter, the string of previously transmitted letters, and the state of the channel before transmission began. In this paper, we provide an alternate approach to arithmetic coding by concentrating on the issue of coding delay. We will generalize Elias' code first to memoryless cost channels and later to finite-state channels, and demonstrate that the expected value of coding delay is bounded for both types of channels. We also suggest a practical implementation that focuses on delay and is closely related to Elias' ideal arithmetic code. For the case of binary equal cost code letters, the expected delay analysis and an alternate implementation appeared earlier in course notes prepared by the second author.
A Two-Stage Modelling Method for Compressing Binary Images by Arithmetic Coding
The Computer Journal, 1992
A two-stage modelling schema to be used together with arithmetic coding is proposed. The main motivation of the work has been the relatively slow operation of arithmetic coding. The new modelling schema reduces the use of arithmetic coding by applying to large white regions global modelling, which consumes less time. This composite method works well, and with a set of test images it took only ca. 41% of the time required by the QM-coder. At the same time, the loss in compression ratio is only marginal. Index terms: Image compression, arithmetic coding, block coding, modelling. 1. Introduction Pictorial information is expressed by a very simple model in black-and-white images. Only two colours, black and white, are recognised and even the greyness of different picture elements is omitted so that the image consists of a configuration of pixels each representing the pure black or white colour. In spite of the binary nature of the image files, they have a high demand for storage space. This brings ma...