The Context-Tree Weighting Method: Basic Properties
IEEE Trans. Inform. Theory, 1995
Cited by 120 (10 self)
We describe a sequential universal data compression procedure for binary tree sources that performs the "double mixture." Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding distribution for tree sources with an unknown model and unknown parameters. Computational and storage complexity of the proposed procedure are both linear in the source sequence length. We derive a natural upper bound on the cumulative redundancy of our method for individual sequences. The three terms in this bound can be identified as coding, parameter, and model redundancy. The bound holds for all source sequence lengths, not only for asymptotically large lengths. The analysis that leads to this bound is based on standard techniques and turns out to be extremely simple. Our upper bound on the redundancy shows that the proposed context-tree weighting procedure is optimal in the sense that it achieves the Rissanen (1984) lower bound.
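The weighting recursion described in this abstract can be illustrated with a minimal sketch. It assumes the standard Krichevsky-Trofimov (KT) estimator at each node and equal weights of 1/2, and it recomputes the whole tree in batch rather than sequentially, so it does not reproduce the linear-complexity procedure of the paper. The names `kt` and `ctw_probability` are hypothetical.

```python
from fractions import Fraction


def kt(zeros, ones):
    """KT block probability of a sequence with the given symbol counts."""
    p = Fraction(1)
    a = b = 0
    for _ in range(zeros):
        p *= Fraction(2 * a + 1, 2 * (a + b + 1))
        a += 1
    for _ in range(ones):
        p *= Fraction(2 * b + 1, 2 * (a + b + 1))
        b += 1
    return p


def ctw_probability(past, seq, depth):
    """Weighted probability of the binary `seq` given `past` (assumed len >= depth)."""
    assert len(past) >= depth
    counts = {}
    history = list(past)
    for bit in seq:
        # Context: the `depth` most recent symbols, most recent first.
        context = tuple(reversed(history[-depth:])) if depth else ()
        for d in range(depth + 1):
            counts.setdefault(context[:d], [0, 0])[bit] += 1
        history.append(bit)

    def weighted(node):
        a, b = counts.get(node, (0, 0))
        if a + b == 0:
            return Fraction(1)  # unvisited node contributes nothing
        pe = kt(a, b)
        if len(node) == depth:
            return pe  # leaf of the context tree
        # Mix "stop here" with "split on one more context symbol".
        return (pe + weighted(node + (0,)) * weighted(node + (1,))) / 2

    return weighted(())
```

Exact fractions are used so that the weighted probabilities of all sequences of a given length can be checked to sum to one.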
Universal Portfolios with Side Information
IEEE Transactions on Information Theory, 1996
Cited by 65 (1 self)
We present a sequential investment algorithm, the μ-weighted universal portfolio with side-information, which achieves, to first order in the exponent, the same wealth as the best side-information dependent investment strategy (the best state-constant rebalanced portfolio) determined in hindsight from observed market and side-information outcomes. This is an individual sequence result which shows that the difference between the exponential growth rates of wealth of the best state-constant rebalanced portfolio and the universal portfolio with side-information is uniformly less than (d/(2n)) log(n + 1) + (k/n) log 2 for every stock market and side-information sequence and for all time n. Here d = k(m − 1) is the number of degrees of freedom in the state-constant rebalanced portfolio with k states of side-information and m stocks. The proof of this result establishes a close connection between universal investment and universal data compression. Keywords: Universal investment, univ...
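To get a feel for how quickly the growth-rate gap stated in this abstract shrinks, the bound can be evaluated numerically. The sketch below assumes natural logarithms (the abstract does not state the base), and the function name `growth_rate_gap_bound` is hypothetical.

```python
from math import log


def growth_rate_gap_bound(n, k, m):
    """Evaluate (d/(2n)) log(n + 1) + (k/n) log 2 with d = k(m - 1).

    n: number of trading periods, k: side-information states, m: stocks.
    Natural logarithm is an assumption, not stated in the abstract.
    """
    d = k * (m - 1)  # degrees of freedom
    return d / (2 * n) * log(n + 1) + k / n * log(2)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, growth_rate_gap_bound(n, k=2, m=3))
```

The bound decays roughly as (log n)/n, so the per-period penalty for not knowing the best state-constant rebalanced portfolio in advance vanishes as n grows.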
The Context Tree Weighting Method: Basic Properties
IEEE Transactions on Information Theory, 1995
Cited by 62 (1 self)
We describe a sequential universal data compression procedure for binary tree sources that performs the "double mixture". Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding distribution for tree sources with an unknown model and unknown parameters. Computational and storage complexity of the proposed procedure are both linear in the source sequence length. We derive a natural upper bound on the cumulative redundancy of our method for individual sequences. The three terms in this bound can be identified as coding, parameter and model redundancy. The bound holds for all source sequence lengths, not only for asymptotically large lengths. The analysis that leads to this bound is based on standard techniques and turns out to be extremely simple. Our upper bound on the redundancy shows that the proposed context tree weighting procedure is optimal in the sense that i...
PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning
1998
Cited by 56 (4 self)
Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. These classifiers first build a decision tree and then prune subtrees from the decision tree in a subsequent pruning phase to improve accuracy and prevent "overfitting". Generating the decision tree in two distinct phases could result in a substantial amount of wasted effort since an entire subtree constructed in the first phase may later be pruned in the next phase. In this paper, we propose PUBLIC, an improved decision tree classifier that integrates the second "pruning" phase with the initial "building" phase. In PUBLIC, a node is not expanded during the building phase, if it is determined that it will be pruned during the subsequent pruning phase. In order to ma...
Information-Theoretic Analysis of Neural Coding
J. Comp. Neuroscience, 1998
Cited by 46 (13 self)
We describe an approach to analyzing single- and multi-unit (ensemble) discharge patterns based on information-theoretic distance measures and on empirical theories derived from work in universal signal processing. In this approach, we quantify the difference between response patterns, be they time-varying or not, using information-theoretic distance measures. We apply these techniques to single and multiple unit processing of sound amplitude and sound location. These examples illustrate that neurons can simultaneously represent at least two kinds of information with different levels of fidelity. The fidelity can persist through a transient and a subsequent steady-state response, indicating that it is possible for an evolving neural code to represent information with constant fidelity. Johnson et al., Analysis of Neural Coding. 1 Introduction: Neural coding has been classified into two broadly defined types: rate codes (the average rate of spike discharge) and timing codes (the t...
On prediction using variable order Markov models
Journal of Artificial Intelligence Research, 2004
Cited by 42 (1 self)
This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a “decomposed” CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
A Vector Quantization Approach to Universal Noiseless Coding and Quantization
IEEE Trans. Inform. Theory, 1996
Cited by 37 (10 self)
A two-stage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may be noiseless codes, fixed-rate quantizers, or variable-rate quantizers. We take a vector quantization approach to two-stage coding, in which the first stage code can be regarded as a vector quantizer that “quantizes” the input data of length n to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the first-stage quantizer, using induced measures of rate and distortion, to design locally optimal two-stage codes. On a source of medical images, two-stage variable-rate vector quantizers designed in this way outperform standard (one-stage) fixed-rate vector quantizers by over 9 dB. The tail of the operational distortion-rate function of the first-stage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of two-stage codes. We show that there exist two-stage universal noiseless codes, fixed-rate quantizers, and variable-rate quantizers whose per-letter rate and distortion redundancies converge to zero as (k/2) n^{-1} log n, when the universe of sources has finite dimension k. This extends the achievability part of Rissanen’s theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as O(n^{-1}) when the universe of sources is countable, and as O(n^{-1+ε}) when the universe of sources is infinite-dimensional, under appropriate conditions. Index Terms: Two-stage, adaptive, compression, minimum description length, clustering.
On Universal Quantization by Randomized Uniform/Lattice Quantizers
IEEE Trans. Inform. Theory, 1992
Cited by 35 (13 self)
Uniform quantization with dither, or lattice quantization with dither in the vector case, followed by a universal lossless source encoder (entropy coder), is a simple procedure for universal coding with distortion of a source that may take continuously many values. The rate of this universal coding scheme is examined, and we derive a general expression for it. An upper bound for the redundancy of this scheme, defined as the difference between its rate and the minimal possible rate, given by the rate distortion function of the source, is derived. This bound holds for all distortion levels. Furthermore, we present a composite upper bound on the redundancy as a function of the quantizer resolution which leads to a tighter bound in the high rate (low distortion) case. Key Words: Uniform and Lattice Quantization, Randomized Quantization, Universal Coding, Rate-Distortion Performance Meir Feder was also supported by The Andrew W. Mellon Foundation, Woods Hole Oceanographic Institu...
Universal Lossless Source Coding With the Burrows Wheeler Transform
IEEE Transactions on Information Theory, 2002
Cited by 34 (3 self)
The Burrows Wheeler Transform (BWT) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: 1) statistical characterizations of the BWT output on both finite strings and sequences of length , 2) a variety of very simple new techniques for BWT-based lossless source coding, and 3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
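As a reminder of what "reversible sequence transformation" means here, the following is a minimal textbook sketch of the BWT and its inverse. It uses naive rotation sorting and an end-of-string sentinel `$` (both assumptions made for brevity) and is not one of the coding techniques analyzed in the paper; the function names are hypothetical.

```python
def bwt(text, eos="$"):
    """Forward transform: last column of the sorted rotations of text + eos."""
    assert eos not in text, "sentinel must not occur in the input"
    t = text + eos
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, eos="$"):
    """Inverse transform: rebuild the sorted rotation table column by column."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # Prepend the last column to every row, then re-sort.
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(eos))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("banana")
    print(encoded, inverse_bwt(encoded))
```

The transform tends to group identical symbols that share a following context, which is the ordering a subsequent lossless code can exploit.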
The Context-Tree Weighting Method: Extensions
IEEE Transactions on Information Theory, 1994
Cited by 27 (0 self)
First we modify the basic (binary) context-tree weighting method such that the past symbols x_{1−D}, x_{2−D}, ..., x_0 are not needed by the encoder and the decoder. Then we describe how to make the context tree depth D infinite, which results in optimal redundancy behavior for all tree sources, while the number of records in the context tree is not larger than 2T − 1. Here T is the length of the source sequence. For this extended context-tree weighting algorithm we show that with probability one the compression ratio is not larger than the source entropy for source sequence length T → ∞ for stationary and ergodic sources. Keywords: Sequential data compression, universal source coding, tree sources, modeling procedure, cumulative redundancy bounds, binary stationary and ergodic sources. 1. Introduction: The context-tree weighting method, first presented at the San Antonio ISIT [7], appears to be an efficient implementation for weighting (mixing) the cod...

