The Context-Tree Weighting Method: Basic Properties
IEEE Trans. Inform. Theory, 1995
Cited by 159 (12 self)
We describe a sequential universal data compression procedure for binary tree sources that performs the "double mixture." Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding distribution for tree sources with an unknown model and unknown parameters. Computational and storage complexity of the proposed procedure are both linear in the source sequence length. We derive a natural upper bound on the cumulative redundancy of our method for individual sequences. The three terms in this bound can be identified as coding, parameter, and model redundancy. The bound holds for all source sequence lengths, not only for asymptotically large lengths. The analysis that leads to this bound is based on standard techniques and turns out to be extremely simple. Our upper bound on the redundancy shows that the proposed context-tree weighting procedure is optimal in the sense that it achieves the Rissanen (1984) lower bound.
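The recursive weighting described in this abstract can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the Krichevsky-Trofimov (KT) estimator at each node (the abstract does not name the estimator), a fixed depth `depth`, and an all-zero past when none is given. The names `Node`, `log2_sum`, and `ctw_code_length` are hypothetical.

```python
import math


class Node:
    """One context-tree node: symbol counts plus log2 probabilities."""

    def __init__(self):
        self.counts = [0, 0]   # zeros and ones seen in this context
        self.log_pe = 0.0      # log2 of the KT estimate (assumed estimator)
        self.log_pw = 0.0      # log2 of the weighted probability
        self.children = {}     # next-older context bit -> Node


def log2_sum(x, y):
    """Return log2(2**x + 2**y) while staying in the log domain."""
    hi, lo = max(x, y), min(x, y)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))


def ctw_code_length(bits, depth, past=None):
    """Ideal code length in bits, -log2 of the weighted probability of `bits`."""
    history = list(past) if past is not None else [0] * depth
    if len(history) < depth:
        raise ValueError("need at least `depth` past symbols")
    root = Node()
    for bit in bits:
        # Walk from the root along the `depth` most recent symbols.
        path = [root]
        for age in range(1, depth + 1):
            path.append(path[-1].children.setdefault(history[-age], Node()))
        # Update every node on the path, from the leaf back to the root.
        for level in range(depth, -1, -1):
            node = path[level]
            zeros, ones = node.counts
            node.log_pe += math.log2((node.counts[bit] + 0.5) / (zeros + ones + 1.0))
            node.counts[bit] += 1
            if level == depth:
                # A node at maximum depth uses its own estimate.
                node.log_pw = node.log_pe
            else:
                # Internal node: half the local estimate plus half the product
                # of the children's weighted probabilities. A child that was
                # never created has probability 1 and contributes nothing.
                split = sum(child.log_pw for child in node.children.values())
                node.log_pw = log2_sum(node.log_pe, split) - 1.0
        history.append(bit)
    return -root.log_pw
```

Each symbol touches only the `depth + 1` nodes on its context path, which is where the linear time and storage behavior mentioned in the abstract comes from. Because the weighted probabilities form a proper distribution for a fixed past, the values `2 ** -ctw_code_length(x, depth)` over all binary sequences `x` of a given length should sum to one.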
Universal Portfolios with Side Information
IEEE Transactions on Information Theory, 1996
Cited by 85 (3 self)
We present a sequential investment algorithm, the μ-weighted universal portfolio with side information, which achieves, to first order in the exponent, the same wealth as the best side-information dependent investment strategy (the best state-constant rebalanced portfolio) determined in hindsight from observed market and side-information outcomes. This is an individual sequence result which shows that the difference between the exponential growth rates of wealth of the best state-constant rebalanced portfolio and the universal portfolio with side information is uniformly less than (d/(2n)) log(n + 1) + (k/n) log 2 for every stock market and side-information sequence and for all time n. Here d = k(m − 1) is the number of degrees of freedom in the state-constant rebalanced portfolio with k states of side information and m stocks. The proof of this result establishes a close connection between universal investment and universal data compression. Keywords: Universal investment, univ...
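To get a feel for the size of the bound quoted in this abstract, it can be evaluated directly. The sketch below is only arithmetic on the stated formula. The function name `growth_rate_gap_bound` is hypothetical, and the use of the natural logarithm is an assumption, since the abstract does not state the base.

```python
import math


def growth_rate_gap_bound(n, k, m):
    """Evaluate (d / (2n)) log(n + 1) + (k / n) log 2 with d = k (m - 1).

    n: number of investment periods, k: side-information states, m: stocks.
    Natural logarithms are assumed here.
    """
    if n < 1 or k < 1 or m < 1:
        raise ValueError("n, k and m must be positive integers")
    d = k * (m - 1)  # degrees of freedom of the state-constant portfolio
    return d / (2 * n) * math.log(n + 1) + k / n * math.log(2)


# The bound shrinks roughly like (d / 2) * log(n) / n as n grows.
for periods in (10, 100, 1000, 10000):
    print(periods, growth_rate_gap_bound(periods, k=2, m=3))
```

The bound vanishes as n grows, which is the sense in which the universal portfolio matches the best state-constant rebalanced portfolio to first order in the exponent.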
PUBLIC: A decision tree classifier that integrates building and pruning
Proceedings of the 24th VLDB Conference, 1998
Cited by 60 (4 self)
Abstract. Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. These classifiers first build a decision tree and then prune subtrees from the decision tree in a subsequent pruning phase to improve accuracy and prevent “overfitting”. Generating the decision tree in two distinct phases could result in a substantial amount of wasted effort since an entire subtree constructed in the first phase may later be pruned in the next phase. In this paper, we propose PUBLIC, an improved decision tree classifier that integrates the second “pruning” phase with the initial “building” phase. In PUBLIC, a node is not expanded during the building phase, if it is determined that it will be pruned during the subsequent pruning phase. In order to make this determination for a node, before it is expanded, PUBLIC computes a lower bound on the minimum cost subtree rooted at the node. This estimate is then used by PUBLIC to identify the nodes that are certain to be pruned, and for such nodes, not expend effort on splitting them. Experimental results with real-life as well as synthetic data sets demonstrate the effectiveness of PUBLIC’s integrated approach which has the ability to deliver substantial performance improvements. Keywords: data mining, classification, decision tree
Information-Theoretic Analysis of Neural Coding
J. Comp. Neuroscience, 1998
Cited by 57 (13 self)
We describe an approach to analyzing single and multi-unit (ensemble) discharge patterns based on information-theoretic distance measures and on empirical theories derived from work in universal signal processing. In this approach, we quantify the difference between response patterns, be they time-varying or not, using information-theoretic distance measures. We apply these techniques to single and multiple unit processing of sound amplitude and sound location. These examples illustrate that neurons can simultaneously represent at least two kinds of information with different levels of fidelity. The fidelity can persist through a transient and a subsequent steady-state response, indicating that it is possible for an evolving neural code to represent information with constant fidelity. 1 Introduction Neural coding has been classified into two broadly defined types: rate codes, the average rate of spike discharge, and timing codes, the t...
On prediction using variable order Markov models
Journal of Artificial Intelligence Research, 2004
Cited by 56 (1 self)
This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a “decomposed” CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
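The average log-loss used as the quality measure in this abstract can be illustrated with a short sketch. It assumes the predictor has already assigned a probability to each symbol that actually occurred, and it assumes base-2 logarithms, so the result reads as bits per symbol. The function name `average_log_loss` is hypothetical.

```python
import math


def average_log_loss(probabilities):
    """Mean of -log2 p over the probabilities given to the observed symbols."""
    if not probabilities:
        raise ValueError("need at least one prediction")
    if any(p <= 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("each probability must lie in (0, 1]")
    return -sum(math.log2(p) for p in probabilities) / len(probabilities)


# A predictor that is always uniform over a 4-letter alphabet pays 2 bits per symbol.
print(average_log_loss([0.25] * 8))
```

Multiplying the average log-loss by the sequence length gives the ideal code length of a compressor driven by the same predictor, which is the link between prediction and lossless compression that the abstract relies on.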
On Universal Quantization by Randomized Uniform/Lattice Quantizers
IEEE Trans. Inform. Theory, 1992
Cited by 48 (15 self)
Uniform quantization with dither, or lattice quantization with dither in the vector case, followed by a universal lossless source encoder (entropy coder), is a simple procedure for universal coding with distortion of a source that may take continuously many values. The rate of this universal coding scheme is examined, and we derive a general expression for it. An upper bound for the redundancy of this scheme, defined as the difference between its rate and the minimal possible rate, given by the rate distortion function of the source, is derived. This bound holds for all distortion levels. Furthermore, we present a composite upper bound on the redundancy as a function of the quantizer resolution which leads to a tighter bound in the high rate (low distortion) case. Key Words: Uniform and Lattice Quantization, Randomized Quantization, Universal Coding, Rate-Distortion Performance. Meir Feder was also supported by The Andrew W. Mellon Foundation, Woods Hole Oceanographic Institu...
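The scalar case of the scheme in this abstract can be sketched as follows. This is a minimal illustration that assumes subtractive dither, meaning the decoder knows the dither value and subtracts it, and it omits the entropy coder entirely. The name `dithered_quantize` and its parameters are hypothetical.

```python
import math
import random


def dithered_quantize(x, step, rng):
    """Quantize x on a uniform grid of width `step` with subtractive dither.

    Returns (reconstruction, index). The integer index is what an entropy
    coder would encode. The dither is drawn from `rng`, which encoder and
    decoder are assumed to share.
    """
    dither = rng.uniform(-step / 2, step / 2)
    index = math.floor((x + dither) / step + 0.5)  # nearest grid point
    return index * step - dither, index


rng = random.Random(1)
step = 0.5
samples = [rng.gauss(0.0, 1.0) for _ in range(10000)]
errors = [dithered_quantize(x, step, rng)[0] - x for x in samples]
mse = sum(e * e for e in errors) / len(errors)
print(mse, step ** 2 / 12)  # the two numbers should be close
```

With subtractive dither the reconstruction error is uniform on an interval of width `step` regardless of the input, so the mean squared error is close to `step ** 2 / 12` for any source. That source-independence of the distortion is what makes the scheme convenient for universal coding.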
A Vector Quantization Approach to Universal Noiseless Coding and Quantization
IEEE Trans. Inform. Theory, 1996
Cited by 44 (10 self)
Abstract: A two-stage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may be noiseless codes, fixed-rate quantizers, or variable-rate quantizers. We take a vector quantization approach to two-stage coding, in which the first stage code can be regarded as a vector quantizer that “quantizes” the input data of length n to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the first-stage quantizer, using induced measures of rate and distortion, to design locally optimal two-stage codes. On a source of medical images, two-stage variable-rate vector quantizers designed in this way outperform standard (one-stage) fixed-rate vector quantizers by over 9 dB. The tail of the operational distortion-rate function of the first-stage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of two-stage codes. We show that there exist two-stage universal noiseless codes, fixed-rate quantizers, and variable-rate quantizers whose per-letter rate and distortion redundancies converge to zero as (k/2) n^{-1} log n, when the universe of sources has finite dimension k. This extends the achievability part of Rissanen’s theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as O(n^{-1}) when the universe of sources is countable, and as O(n^{-1+ε}) when the universe of sources is infinite-dimensional, under appropriate conditions. Index Terms: Two-stage, adaptive, compression, minimum description length, clustering.
Universal Lossless Source Coding With the Burrows Wheeler Transform
IEEE Transactions on Information Theory, 2002
Cited by 38 (3 self)
The Burrows Wheeler Transform (BWT) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: 1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, 2) a variety of very simple new techniques for BWT-based lossless source coding, and 3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
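For readers unfamiliar with the transform this abstract analyzes, the sketch below shows a textbook forward and inverse BWT. It is a deliberately naive illustration based on sorting all rotations, not one of the coding schemes from the paper. The end-of-string sentinel `$` and the function names `bwt` and `inverse_bwt` are assumptions of this sketch.

```python
def bwt(text, sentinel="$"):
    """Return the last column of the sorted rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Rebuild the original text by repeatedly prepending and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The row that ends with the sentinel is the original string.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("banana"))          # annb$aa
print(inverse_bwt("annb$aa"))  # banana
```

The output groups symbols that share a following context, for example the run `nn` and the run `aa` in `annb$aa`. That clustering is the ordering that the lossless code following the BWT tries to exploit. Practical implementations use suffix arrays rather than sorting full rotations.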
The Context-Tree Weighting Method: Extensions
IEEE Transactions on Information Theory, 1994
Cited by 35 (1 self)
First we modify the basic (binary) context-tree weighting method such that the past symbols x_{1−D}, x_{2−D}, ..., x_0 are not needed by the encoder and the decoder. Then we describe how to make the context tree depth D infinite, which results in optimal redundancy behavior for all tree sources, while the number of records in the context tree is not larger than 2T − 1. Here T is the length of the source sequence. For this extended context-tree weighting algorithm we show that with probability one the compression ratio is not larger than the source entropy for source sequence length T → ∞ for stationary and ergodic sources. Keywords: Sequential data compression, universal source coding, tree sources, modeling procedure, cumulative redundancy bounds, binary stationary and ergodic sources. 1. Introduction The context-tree weighting method, first presented at the San Antonio ISIT [7], appears to be an efficient implementation for weighting (mixing) the cod...