Results 11 - 20
of
40
Semantically Motivated Improvements for PPM Variants
- The Computer Journal
, 1997
"... This paper explains how to significantly improve the compression performance of any PPM variant ..."
Abstract
-
Cited by 23 (3 self)
- Add to MetaCart
This paper explains how to significantly improve the compression performance of any PPM variant
Memory-Universal Prediction of Stationary Random Processes
- IEEE Trans. Inform. Theory
, 1998
"... We consider the problem of one-step-ahead prediction of a real-valued, stationary, strongly mixing random process fX i g i=01 . The best mean-square predictor of X0 is its conditional mean given the entire infinite past fX i g i=01 . Given a sequence of observations X1 X2 111 XN, we propose estimato ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
We consider the problem of one-step-ahead prediction of a real-valued, stationary, strongly mixing random process fX i g i=01 . The best mean-square predictor of X0 is its conditional mean given the entire infinite past fX i g i=01 . Given a sequence of observations X1 X2 111 XN, we propose estimators for the conditional mean based on sequences of parametric models of increasing memory and of increasing dimension, for example, neural networks and Legendre polynomials. The proposed estimators select both the model memory and the model dimension, in a data-driven fashion, by minimizing certain complexity regularized least squares criteria. When the underlying predictor function has a finite memory, we establish that the proposed estimators are memory-universal: the proposed estimators, which do not know the true memory, deliver the same statistical performance (rates of integrated mean-squared error) as that delivered by estimators that know the true memory. Furthermore, when the underlying predictor function does not have a finite memory, we establish that the estimator based on Legendre polynomials is consistent.
An Universal Predictor Based on Pattern Matching
- IEEE Trans. Inform. Theory
, 2000
"... We consider here an universal predictor based on pattern matching. For a given string x 1 ; x 2 ; : : : ; xn , the predictor will guess the next symbol xn+1 in such a way that the prediction error tends to zero as n ! 1 provided the string x n 1 = x 1 ; x 2 ; : : : ; xn is generated by a mixing s ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
We consider here an universal predictor based on pattern matching. For a given string x 1 ; x 2 ; : : : ; xn , the predictor will guess the next symbol xn+1 in such a way that the prediction error tends to zero as n ! 1 provided the string x n 1 = x 1 ; x 2 ; : : : ; xn is generated by a mixing source. We shall prove that the rate of convergence of the prediction error is O(n \Gamma" ) for any " ? 0. In this preliminary version, we only prove our results for memoryless sources and a sketch for mixing sources. However, we indicate that our algorithm can predict equally successfully the next k symbols as long as k = O(1). 1 Introduction Prediction is important in communication, control, forecasting, investment and other areas. We understand how to do optimal prediction when the data model is known, but one needs to design universal prediction algorithm that will perform well no matter what the underlying probabilistic model is. More precisely, let X 1 ; X 2 ; : : : be an infinite ...
On-Line Stochastic Processes in Data Compression
, 1996
"... The ability to predict the future based upon the past in finite-alphabet sequences has many applications, including communications, data security, pattern recognition, and natural language processing. By Shannon's theory and the breakthrough development of arithmetic coding, any sequence, a 1 a 2 \ ..."
Abstract
-
Cited by 14 (6 self)
- Add to MetaCart
The ability to predict the future based upon the past in finite-alphabet sequences has many applications, including communications, data security, pattern recognition, and natural language processing. By Shannon's theory and the breakthrough development of arithmetic coding, any sequence, a 1 a 2 \Delta \Delta \Delta a n , can be encoded in a number of bits that is essentially equal to the minimal information-lossless codelength, P i \Gamma log 2 p(a i ja 1 \Delta \Delta \Delta a i\Gamma1 ). The goal of universal on-line modeling, and therefore of universal data compression, is to deduce the model of the input sequence a 1 a 2 \Delta \Delta \Delta a n that can estimate each p(a i ja 1 \Delta \Delta \Delta a i\Gamma1 ) knowing only a 1 a 2 \Delta \Delta \Delta a i\Gamma1 so that the ex...
Hierarchical Universal Coding
- IEEE Trans. Inform. Theory
, 1998
"... In an earlier paper, we proved a strong version of the redundancy-capacity converse theorem of universal coding, stating that for `most' sources in a given class, the universal coding redundancy is essentially lower bounded by the capacity of the channel induced by this class. Since this result hold ..."
Abstract
-
Cited by 12 (2 self)
- Add to MetaCart
In an earlier paper, we proved a strong version of the redundancy-capacity converse theorem of universal coding, stating that for `most' sources in a given class, the universal coding redundancy is essentially lower bounded by the capacity of the channel induced by this class. Since this result holds for general classes of sources, it extends Rissanen's strong converse theorem for parametric families. While our earlier result has established strong optimality only for mixture codes weighted by the capacityachieving prior, our first result herein extends this finding to a general prior. For some cases our technique also leads to a simplified proof of the above mentioned strong converse theorem. The major interest in this paper, however, is in extending the theory of universal coding to hierarchical structures of classes, where each class may have a different capacity. In this setting, one wishes to incur redundancy essentially as small as that corresponding to the active class, and not ...
New Techniques for Context Modeling
, 1995
"... We introduce three new techniques for statistical language models: extension modeling, nonmonotonic contexts, and the divergence heuristic. Together these techniques result in language models that have few states, even fewer parameters, and low message entropies. ..."
Abstract
-
Cited by 12 (2 self)
- Add to MetaCart
We introduce three new techniques for statistical language models: extension modeling, nonmonotonic contexts, and the divergence heuristic. Together these techniques result in language models that have few states, even fewer parameters, and low message entropies.
Markov Types and Minimax Redundancy for Markov Sources
- IEEE Trans. Information Theory
, 2003
"... Redundancy of universal codes for a class of sources determines by how much the actual code length exceeds the optimal code length. In the minimax scenario one designs the best code for the worst source within the class. Such minimax redundancy comes in two flavors: either on average or for individu ..."
Abstract
-
Cited by 11 (6 self)
- Add to MetaCart
Redundancy of universal codes for a class of sources determines by how much the actual code length exceeds the optimal code length. In the minimax scenario one designs the best code for the worst source within the class. Such minimax redundancy comes in two flavors: either on average or for individual sequences. The latter is also known as the maximal or the worst case minimax redundancy. We study the maximal minimax redundancy of universal block codes for Markovian sources of any order. We prove that the maximal minimax redundancy for Markov sources of order r is asymptotically equal to 1) log 2 n + log 2 A (ln ln m 1/(m-1) )/ ln m + o(1), where n is the length of a source sequence, m is the size of the alphabet and A m is an explicit constant (e.g., we find that for a binary alphabet m = 2 and Markov of order r = 1 the constant 14.655449504 where G is the Catalan number). Unlike previous attempts, we view the redundancy problem as an asymptotic evaluation of certain sums over a set of matrices representing Markov types. The enumeration of Markov types is accomplished by reducing it to counting Eulerian paths in a multigraph. In particular, we propose an asymptotic formula for the number of strings of a given Markov type. All of these findings are obtained by analytic and combinatorial tools of analysis of algorithms. Index terms: Minimax redundancy, Markov sources, Markov types, Eulerian paths, multidimensional generating functions, analytic information theory. # A preliminary version of this paper was presented at Colloquium on Mathematics and Computer Science: Algorithms, Trees, Combinatorics and Probabilities, Versailles, 2002.
Precise Average Redundancy of an Idealized Arithmetic Coding
- Coding, Proc. Data Compression Conference, 222-231, Snowbird
, 2002
"... Redundancy is defined as the excess of the code length over the optimal (ideal) code length. We study the average redundancy of an idealized arithmetic coding (for memoryless sources with unknown distributions) in which the Krichevsky and Tro mov estimator is followed by the Shannon-Fano code. We sh ..."
Abstract
-
Cited by 10 (6 self)
- Add to MetaCart
Redundancy is defined as the excess of the code length over the optimal (ideal) code length. We study the average redundancy of an idealized arithmetic coding (for memoryless sources with unknown distributions) in which the Krichevsky and Tro mov estimator is followed by the Shannon-Fano code. We shall ignore here important practical implementation issues such as finite precisions and finite buffer sizes. In fact, our idealized arithmetic code can be viewed as an adaptive infinite precision implementation of arithmetic encoder that resembles Elias coding. However, we provide very precise results for the average redundancy that takes into account integer-length constraints. These findings are obtained by analytic methods of analysis of algorithms such as theory of distribution of sequences modulo 1 and Fourier series. These estimates can be used to study the average redundancy of codes for tree sources, and ultimately the context-tree weighting algorithms.
Model Selection for Variable Length Markov Chains and Tuning the Context Algorithm
, 2000
"... We consider the model selection problem in the class of stationary variable length Markov chains (VLMC) on a nite space. The processes in this class are still Markovian of higher order, but with memory of variable length. Various aims in selecting a VLMC can be formalized with dierent non-equivalent ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
We consider the model selection problem in the class of stationary variable length Markov chains (VLMC) on a nite space. The processes in this class are still Markovian of higher order, but with memory of variable length. Various aims in selecting a VLMC can be formalized with dierent non-equivalent risks, such as nal prediction error or expected Kullback-Leibler information. We consider the asymptotic behavior of dierent risk functions and show how they can be generally estimated with the same resampling strategy. Such estimated risks then yield new model selection criteria. In particular, we obtain a data-driven tuning of Rissanen's tree structured context algorithm which is a computationally feasible procedure for selection and estimation of a VLMC. Key words and phrases. Bootstrap, zero-one loss, nal prediction error, nite-memory source, FSMX model, Kullback-Leibler information, L 2 loss, optimal tree pruning, resampling, tree model. Short title: Selecting variable length Mar...
Computing the Entropy of User Navigation in the Web
- International Journal of Information Technology and Decision Making
, 1999
"... Navigation through the web, colloquially known as "surfing", is one of the main activities of users during web interaction. When users follow a navigation trail they often tend to get disoriented in terms of the goals of their original query and thus the discovery of typical user trails could be ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
Navigation through the web, colloquially known as "surfing", is one of the main activities of users during web interaction. When users follow a navigation trail they often tend to get disoriented in terms of the goals of their original query and thus the discovery of typical user trails could be useful in providing navigation assistance. Herein we give a theoretical underpinning of user navigation in terms of the entropy of an underlying Markov chain modelling the web topology. We present a novel method for online incremental computation of the entropy and a large deviation result regarding the length of a trail to realise the said entropy. We provide an error analysis for our estimation of the entropy in terms of the divergence between the empirical and actual probabilities. We also provide an extension of our technique to higher-order Markov chains by a suitable reduction of a higher-order Markov chain model to a first-order one. 1

