Results 1  10
of
134
The minimum description length principle in coding and modeling
 IEEE Trans. Inform. Theory
, 1998
"... Abstract — We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized m ..."
Abstract

Cited by 306 (12 self)
 Add to MetaCart
Abstract — We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples. Index Terms—Complexity, compression, estimation, inference, universal modeling.
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length
 Machine Learning
, 1996
"... . We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions gene ..."
Abstract

Cited by 172 (16 self)
 Add to MetaCart
. We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KLdivergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs, can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in humanmachine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second ...
The ContextTree Weighting Method: Basic Properties
 IEEE Trans. Inform. Theory
, 1995
"... We describe a sequential universal data compression procedure for binary tree sources that performs the "double mixture." Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding ..."
Abstract

Cited by 160 (12 self)
 Add to MetaCart
We describe a sequential universal data compression procedure for binary tree sources that performs the "double mixture." Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding distribution for tree sources with an unknown model and unknown parameters. Computational and storage complexity of the proposed procedure are both linear in the source sequence length. We derive a natural upper bound on the cumulative redundancy of our method for individual sequences. The three terms in this bound can be identified as coding, parameter, and model redundancy. The bound holds for all source sequence lengths, not only for asymptotically large lengths. The analysis that leads to this bound is based on standard techniques and turns out to be extremely simple. Our upper bound on the redundancy shows that the proposed contexttree weighting procedure is optimal in the sense that it achieves the Rissanen (1984) lower bound.
Universal prediction of individual sequences
 IEEE Transactions on Information Theory
, 1992
"... AbstructThe problem of predicting the next outcome of an individual binary sequence using finite memory, is considered. The finitestate predictability of an infinite sequence is defined as the minimum fraction of prediction errors that can be made by any finitestate (FS) predictor. It is proved t ..."
Abstract

Cited by 156 (13 self)
 Add to MetaCart
AbstructThe problem of predicting the next outcome of an individual binary sequence using finite memory, is considered. The finitestate predictability of an infinite sequence is defined as the minimum fraction of prediction errors that can be made by any finitestate (FS) predictor. It is proved that this FS predictability can be attained by universal sequential prediction schemes. Specifically, an efficient prediction procedure based on the incremental parsing procedure of the LempelZiv data compression algorithm is shown to achieve asymptotically the FS predictability. Finally, some relations between compressibility and predictability are pointed out, and the predictability is proposed as an additional measure of the complexity of a sequence. Index TermsPredictability, compressibility, complexity, finitestate machines, Lempel Ziv algorithm.
LeZiUpdate: An InformationTheoretic Approach to Track Mobile Users in PCS Networks
, 1999
"... The complexity of the mobility tracking problem in a cellular environment has been characterized under an informationtheoretic framework. Shannon’s entropy measure is identified as a basis for comparing user mobility models. By building and maintaining a dictionary of individual user’s path update ..."
Abstract

Cited by 112 (12 self)
 Add to MetaCart
The complexity of the mobility tracking problem in a cellular environment has been characterized under an informationtheoretic framework. Shannon’s entropy measure is identified as a basis for comparing user mobility models. By building and maintaining a dictionary of individual user’s path updates (as opposed to the widely used location updates), the proposed adaptive online algorithm can learn subscribers’ profiles. This technique evolves out of the concepts of lossless compression. The compressibility of the variabletofixed length encoding of the acclaimed LempelZiv family of algorithms reduces the update cost, whereas their builtin predictive power can be effectively used to reduce paging cost.
Data Compression
 ACM Computing Surveys
, 1987
"... This paper surveys a variety of data compression methods spanning almost forty years of research, from the work of Shannon, Fano and Huffman in the late 40's to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effectiv ..."
Abstract

Cited by 86 (4 self)
 Add to MetaCart
This paper surveys a variety of data compression methods spanning almost forty years of research, from the work of Shannon, Fano and Huffman in the late 40's to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important application in the areas of file storage and distributed systems. Concepts from information theory, as they relate to the goals and evaluation of data compression methods, are discussed briefly. A framework for evaluation and comparison of methods is constructed and applied to the algorithms presented. Comparisons of both theoretical and empirical natures are reported and possibilities for future research are suggested. INTRODUCTION Data compression is often referred to as coding, where coding is a very general term encompassing any special representation of data which satisfies a given need. Information theory is defined to be the study of eff...
Variable Length Markov Chains
 Annals of Statistics
, 1999
"... We study estimation in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of higher order, but with memory of variable length yielding a much bigger and structurally richer class of models than ordinary higher order Markov ..."
Abstract

Cited by 86 (5 self)
 Add to MetaCart
We study estimation in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of higher order, but with memory of variable length yielding a much bigger and structurally richer class of models than ordinary higher order Markov chains. From a more algorithmic view, the VLMC model class has attracted interest in information theory and machine learning but statistical properties have not been explored very much. Provided that good estimation is available, an additional structural richness of the model class enhances predictive power by finding a better tradeoff between model bias and variance and allows better structural description which can be of specific interest. The latter is exemplified with some DNA data. A version of the treestructured context algorithm, proposed by Rissanen (1983) in an information theoretical setup, is shown to have new good asymptotic properties for estimation in the class of VLMC's, even when the underlying model increases in dimensionality: consistent estimation of minimal state spaces and mixing properties of fitted models are given. We also propose a new bootstrap scheme based on fitted VLMC's. We show its validity for quite general stationary categorical time series and for a broad range of statistical procedures. AMS 1991 subject classifications. Primary 62M05; secondary 60J10, 62G09, 62M10, 94A15 Key words and phrases. Bootstrap, categorical time series, central limit theorem, context algorithm, data compression, finitememory sources, FSMX model, KullbackLeibler distance, model selection, tree model. Short title: Variable Length Markov Chain 1 Research supported in part by the Swiss National Science Foundation. Part of the work has been done while visiting th...
Universal Portfolios with Side Information
 IEEE Transactions on Information Theory
, 1996
"... We present a sequential investment algorithm, the ¯weighted universal portfolio with sideinformation, which achieves, to first order in the exponent, the same wealth as the best sideinformation dependent investment strategy (the best stateconstant rebalanced portfolio) determined in hindsight fr ..."
Abstract

Cited by 85 (3 self)
 Add to MetaCart
We present a sequential investment algorithm, the ¯weighted universal portfolio with sideinformation, which achieves, to first order in the exponent, the same wealth as the best sideinformation dependent investment strategy (the best stateconstant rebalanced portfolio) determined in hindsight from observed market and sideinformation outcomes. This is an individual sequence result which shows that the difference between the exponential growth rates of wealth of the best stateconstant rebalanced portfolio and the universal portfolio with sideinformation is uniformly less than (d=(2n)) log(n + 1) + (k=n) log 2 for every stock market and sideinformation sequence and for all time n. Here d = k(m \Gamma 1) is the number of degrees of freedom in the stateconstant rebalanced portfolio with k states of sideinformation and m stocks. The proof of this result establishes a close connection between universal investment and universal data compression. Keywords: Universal investment, univ...
The Context Tree Weighting Method: Basic Properties
 IEEE Transactions on Information Theory
, 1995
"... We describe a sequential universal data compression procedure for binary tree sources that performs the "double mixture". Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding ..."
Abstract

Cited by 79 (1 self)
 Add to MetaCart
We describe a sequential universal data compression procedure for binary tree sources that performs the "double mixture". Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding distribution for tree sources with an unknown model and unknown parameters. Computational and storage complexity of the proposed procedure are both linear in the source sequence length. We derive a natural upper bound on the cumulative redundancy of our method for individual sequences. The three terms in this bound can be identified as coding, parameter and model redundancy. The bound holds for all source sequence lengths, not only for asymptotically large lengths. The analysis that leads to this bound is based on standard techniques and turns out to be extremely simple. Our upper bound on the redundancy shows that the proposed context tree weighting procedure is optimal in the sense that i...
Lossless Compression of Continuoustone Images via Context Selection, Quantization, and Modeling
, 1996
"... Context modeling is an extensively studied paradigm for lossless compression of continuoustone images. However, without careful algorithm design, highorder Markovian modeling of continuoustone images is too expensive in both computational time and space to be practical. Furthermore, the exponenti ..."
Abstract

Cited by 70 (3 self)
 Add to MetaCart
Context modeling is an extensively studied paradigm for lossless compression of continuoustone images. However, without careful algorithm design, highorder Markovian modeling of continuoustone images is too expensive in both computational time and space to be practical. Furthermore, the exponential growth of the number of modeling states in the order of a Markov model can quickly lead to the problem of context dilution; that is, an image may not have enough samples for good estimates of conditional probabilities associated with the modeling states. In this paper new techniques for context modeling of DPCM errors are introduced that can exploit contextdependent DPCM error structures to the benefit of compression. New algorithmic techniques of forming and quantizing modeling contexts are also developed to alleviate the problem of context dilution and reduce both time and space complexities. By innovative formation, quantization, and use of modeling contexts, the proposed lossless i...