The Context Tree Weighting Method: Basic Properties
 IEEE Transactions on Information Theory
, 1995
Abstract

Cited by 81 (1 self)
We describe a sequential universal data compression procedure for binary tree sources that performs the "double mixture". Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding distribution for tree sources with an unknown model and unknown parameters. Computational and storage complexity of the proposed procedure are both linear in the source sequence length. We derive a natural upper bound on the cumulative redundancy of our method for individual sequences. The three terms in this bound can be identified as coding, parameter and model redundancy. The bound holds for all source sequence lengths, not only for asymptotically large lengths. The analysis that leads to this bound is based on standard techniques and turns out to be extremely simple. Our upper bound on the redundancy shows that the proposed context tree weighting procedure is optimal in the sense that i...
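The per-node coding distribution that CTW mixes is the Krichevsky-Trofimov (KT) estimator; a minimal sketch of that estimator alone (the function name is illustrative, and the full context-tree weighting recursion over all tree models is omitted):

```python
def kt_probability(bits):
    """Sequential Krichevsky-Trofimov estimate of a binary sequence.

    After seeing `zeros` zeros and `ones` ones, the next bit is predicted
    to be 1 with probability (ones + 1/2) / (zeros + ones + 1); the product
    of these conditionals is the KT block probability used as the per-node
    coding distribution in context-tree weighting.
    """
    zeros = ones = 0
    prob = 1.0
    for b in bits:
        p_one = (ones + 0.5) / (zeros + ones + 1.0)
        prob *= p_one if b else (1.0 - p_one)
        if b:
            ones += 1
        else:
            zeros += 1
    return prob

# KT block probability of the sequence 0110
print(kt_probability([0, 1, 1, 0]))  # → 0.0234375 (= 3/128)
```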
A New Method of N-gram Statistics for Large Number of n and Automatic Extraction of Words and Phrases from Large Text Data of Japanese
 In COLING94
, 1994
Abstract

Cited by 53 (0 self)
In the process of establishing information theory, C. E. Shannon proposed the Markov process as a good model to characterize a natural language. The core of this idea is to calculate the frequencies of strings composed of n characters (n-grams), but this statistical analysis of large text data for a large n had never been carried out because of the memory limitations of computers and the shortage of text data. Taking advantage of recent powerful computers, we developed a new algorithm for computing n-grams of large text data for arbitrarily large n, and successfully calculated, within a relatively short time, n-grams of some Japanese text data containing between two and thirty million characters. From this experiment it became clear that the automatic extraction or determination of words, compound words and collocations is possible by mutually comparing n-gram statistics for different values of n.
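The basic statistic the paper computes at scale can be sketched in a few lines (the function name and example text are illustrative, not from the paper):

```python
from collections import Counter

def char_ngrams(text, n):
    """Count all overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

counts = char_ngrams("abracadabra", 2)
print(counts.most_common(3))  # the most frequent bigrams and their counts
```

Comparing these counts across different values of n is what lets repeated multi-character units (candidate words and collocations) stand out.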
Predicting Unseen Triphones with Senones
 IEEE International Conference on Acoustics, Speech, and Signal Processing, Minneapolis, MN
, 1993
An Extensible Meta-Learning Approach for Scalable and Accurate Inductive Learning
, 1996
Abstract

Cited by 48 (8 self)
Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Some learning algorithms assume that the entire data set fits into main memory, which is not feasible for massive amounts of data, especially for applications in data mining. One approach to handling a large data set is to partition the data set into subsets, run the learning algorithm on each of the subsets, and combine the results. Moreover, data can be inherently distributed across multiple sites on the network and merging all the data in one location can be expensive or prohibitive. In this thesis we propose, investigate, and evaluate a meta-learning approach to integrating the results of mul...
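The partition-train-combine scheme the abstract describes can be sketched as follows, with a toy threshold learner standing in for an arbitrary base algorithm (all names and data here are illustrative, not from the thesis):

```python
import statistics

def train_threshold(points):
    # Toy base learner on (x, label) pairs: threshold at the midpoint
    # between the mean of positive and negative examples.
    pos = [x for x, y in points if y == 1]
    neg = [x for x, y in points if y == 0]
    t = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return lambda x: 1 if x > t else 0

def meta_learn(data, k):
    """Partition the data into k subsets, train one model per subset,
    and combine the models by majority vote."""
    subsets = [data[i::k] for i in range(k)]
    models = [train_threshold(s) for s in subsets]
    def vote(x):
        return 1 if sum(m(x) for m in models) * 2 > len(models) else 0
    return vote

data = [(x, 1 if x > 5 else 0) for x in range(11)]
combined = meta_learn(data, 3)
print(combined(9), combined(1))  # → 1 0
```

Because each base model only ever sees its own partition, the scheme also applies when the partitions live at different network sites and cannot be merged.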
Fifty Years of Shannon Theory
, 1998
Abstract

Cited by 38 (0 self)
A brief chronicle is given of the historical development of the central problems in the theory of fundamental limits of data compression and reliable communication.
Practical Implementations of Arithmetic Coding
 In Image and Text
, 1992
Abstract

Cited by 35 (6 self)
We provide a tutorial on arithmetic coding, showing how it provides nearly optimal data compression and how it can be matched with almost any probabilistic model. We indicate the main disadvantage of arithmetic coding, its slowness, and give the basis of a fast, space-efficient, approximate arithmetic coder with only minimal loss of compression efficiency. Our coder is based on the replacement of arithmetic by table lookups coupled with a new deterministic probability estimation scheme.
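The interval-narrowing idea behind arithmetic coding can be sketched with floating point (for illustration only; as the abstract notes, a practical coder replaces this arithmetic with integer operations and table lookups):

```python
def arithmetic_encode(symbols, probs):
    """Narrow [low, high) by each symbol's probability slice; any number
    in the final interval identifies the whole sequence."""
    low, high = 0.0, 1.0
    for s in symbols:
        width = high - low
        cum = 0.0
        for sym, p in probs.items():
            if sym == s:
                high = low + (cum + p) * width  # uses the pre-update low
                low = low + cum * width
                break
            cum += p
    return low, high

probs = {"a": 0.5, "b": 0.3, "c": 0.2}
low, high = arithmetic_encode("abc", probs)
print(low, high)  # any value in [low, high) decodes back to "abc"
```

The width of the final interval equals the product of the symbol probabilities, so roughly -log2(width) bits suffice to name a point inside it, which is why the method approaches the entropy of the model.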
Robust temporal coding of contrast by V1 neurons for transient but not for steady-state stimuli
 J Neurosci
, 1998
Some equivalences between Shannon entropy and Kolmogorov complexity
 IEEE Transactions on Information Theory
, 1978
Abstract

Cited by 31 (0 self)
It is shown that the average codeword length L_{1:1} for the best one-to-one (not necessarily uniquely decodable) code for X is shorter than the average codeword length L_{UD} for the best uniquely decodable code by no more than (log2 log2 n) + 3. Let Y be a random variable taking on a finite or countable number of values and having entropy H. Then it is proved that L_{1:1} >= H - log2(H+1) - log2 log2(H+1) - ... - 6. Some relations are established among the Kolmogorov, Chaitin, and extension complexities. Finally it is shown that, for all computable probability distributions, the universal prefix codes associated with the conditional Chaitin complexity have expected codeword length within a constant of the Shannon entropy.
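One-to-one codes can beat the entropy precisely because their codewords need not be prefix-free; a small numeric check (an illustration, not from the paper) assigns the k-th most probable outcome the k-th shortest binary string, which for k >= 1 has length floor(log2 k) when the empty string is allowed:

```python
import math

def one_to_one_avg_length(n):
    # Best one-to-one code for a uniform source on n values: the k-th
    # codeword is the k-th shortest binary string ("", "0", "1", "00", ...),
    # so it has length floor(log2 k).
    return sum(math.floor(math.log2(k)) for k in range(1, n + 1)) / n

n = 1024
H = math.log2(n)               # entropy of the uniform distribution
L = one_to_one_avg_length(n)
print(H, L)  # L is strictly below H
```

For n = 1024 the average one-to-one codeword length is about 8.01 bits against an entropy of 10 bits, consistent with the abstract's point that the shortfall is bounded by iterated-logarithm terms.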
Transferring Previously Learned Back-Propagation Neural Networks to New Learning Tasks
, 1993
Efficient Universal Lossless Data Compression Algorithms Based on a Greedy Sequential Grammar Transform - Part One: Without Context Models
 IEEE Transactions on Information Theory
, 2000
Abstract

Cited by 23 (4 self)
A grammar transform is a transformation that converts any data sequence to be compressed into a grammar from which the original data sequence can be fully reconstructed. In a grammar-based code, a data sequence is first converted into a grammar by a grammar transform and then losslessly encoded. In this paper, a greedy grammar transform is first presented; this grammar transform constructs sequentially a sequence of irreducible grammars from which the original data sequence can be recovered incrementally. Based on this grammar transform, three universal lossless data compression algorithms, a sequential algorithm, an improved sequential algorithm, and a hierarchical algorithm, are then developed. These algorithms combine the power of arithmetic coding with that of string matching. It is shown that these algorithms are all universal in the sense that they can achieve asymptotically the entropy rate of any stationary, ergodic source. Moreover, it is proved that their worst case redundancies among all individual sequences of length n are upper-bounded by c log log n / log n, where c is a constant. Simulation results show that the proposed algorithms outperform the Unix Compress and Gzip algorithms, which are based on LZ78 and LZ77, respectively.
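A greatly simplified, Re-Pair-style stand-in for a greedy grammar transform (an illustration, not the authors' exact irreducible-grammar algorithm) replaces the most frequent adjacent pair with a new nonterminal until no pair repeats:

```python
from collections import Counter

def greedy_grammar(seq):
    """Repeatedly rewrite the most frequent adjacent pair as a fresh
    nonterminal; returns the final start string and the rule set."""
    rules = {}
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats: nothing left to factor out
        nt = f"R{next_id}"
        next_id += 1
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):  # greedy left-to-right replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

start, rules = greedy_grammar(list("abababab"))
print(start, rules)  # → ['R1', 'R1'] {'R0': ('a', 'b'), 'R1': ('R0', 'R0')}
```

The original sequence is recovered by expanding the rules from the start string, which is the sense in which the grammar is a lossless transform of the data.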