Results 11–20 of 43

Memory-Universal Prediction of Stationary Random Processes
 IEEE Trans. Inform. Theory
, 1998
Abstract

Cited by 26 (1 self)
We consider the problem of one-step-ahead prediction of a real-valued, stationary, strongly mixing random process {X_i : -∞ < i < ∞}. The best mean-square predictor of X_0 is its conditional mean given the entire infinite past {X_i : i ≤ -1}. Given a sequence of observations X_1, X_2, …, X_N, we propose estimators for the conditional mean based on sequences of parametric models of increasing memory and of increasing dimension, for example, neural networks and Legendre polynomials. The proposed estimators select both the model memory and the model dimension, in a data-driven fashion, by minimizing certain complexity-regularized least squares criteria. When the underlying predictor function has a finite memory, we establish that the proposed estimators are memory-universal: the proposed estimators, which do not know the true memory, deliver the same statistical performance (rates of integrated mean-squared error) as that delivered by estimators that know the true memory. Furthermore, when the underlying predictor function does not have a finite memory, we establish that the estimator based on Legendre polynomials is consistent.
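The complexity-regularized selection of model memory described in this abstract can be pictured with a toy sketch. Everything specific below is an assumption for illustration: a linear autoregressive model class stands in for the paper's neural networks and Legendre polynomials, and the penalty form c·d·log(N)/N is hypothetical, not the paper's criterion.

```python
import numpy as np

def penalized_ar_selection(x, max_memory=5, c=1.0):
    """Fit linear AR predictors of increasing memory d and return the one
    minimizing empirical MSE plus a complexity penalty (hypothetical
    penalty form c * d * log(N) / N; the paper's criterion differs)."""
    N = len(x)
    best = None
    for d in range(1, max_memory + 1):
        # design matrix of lagged windows (X_{i-d}, ..., X_{i-1})
        A = np.array([x[i - d:i] for i in range(d, N)])
        y = x[d:N]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        score = np.mean((A @ coef - y) ** 2) + c * d * np.log(N) / N
        if best is None or score < best[0]:
            best = (score, d, coef)
    return best[1], best[2]

# synthetic AR(2) data: the true predictor memory is 2
rng = np.random.default_rng(0)
x = np.zeros(2000)
for i in range(2, 2000):
    x[i] = 0.5 * x[i - 1] - 0.3 * x[i - 2] + rng.normal()
d_hat, coef = penalized_ar_selection(x)
print(d_hat)
```

On data like this the selected memory should recover the true order 2 without being told it, a toy version of the memory-universal behaviour the abstract describes.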
Semantically Motivated Improvements for PPM Variants
 The Computer Journal
, 1997
Abstract

Cited by 25 (3 self)
This paper explains how to significantly improve the compression performance of any PPM variant
A Universal Predictor Based on Pattern Matching
 IEEE Trans. Inform. Theory
, 2000
Abstract

Cited by 23 (1 self)
We consider here a universal predictor based on pattern matching. For a given string x_1, x_2, …, x_n, the predictor will guess the next symbol x_{n+1} in such a way that the prediction error tends to zero as n → ∞, provided the string x_1^n = x_1, x_2, …, x_n is generated by a mixing source. We shall prove that the rate of convergence of the prediction error is O(n^{-ε}) for any ε > 0. In this preliminary version, we only prove our results for memoryless sources and sketch them for mixing sources. However, we indicate that our algorithm can predict equally successfully the next k symbols as long as k = O(1).

1 Introduction

Prediction is important in communication, control, forecasting, investment, and other areas. We understand how to do optimal prediction when the data model is known, but one needs to design universal prediction algorithms that perform well no matter what the underlying probabilistic model is. More precisely, let X_1, X_2, … be an infinite ...
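A minimal, quadratic-time illustration of the idea (not the paper's exact algorithm, whose suffix statistics and error analysis are more refined): find the longest suffix of the observed string that occurred earlier, and predict the symbol that most often followed it.

```python
def predict_next(x):
    """Guess x_{n+1} from x_1..x_n by longest-suffix pattern matching."""
    n = len(x)
    for L in range(n - 1, 0, -1):  # try the longest suffix first
        suffix = x[n - L:]
        followers = [x[i + L] for i in range(n - L) if x[i:i + L] == suffix]
        if followers:
            # most frequent symbol seen after this pattern
            return max(set(followers), key=followers.count)
    # no repeated suffix: fall back to the overall most frequent symbol
    return max(set(x), key=x.count)

guess = predict_next("abababab")
print(guess)
```

Here the longest repeated suffix is "ababab", which was always followed by "a", so the predictor guesses "a".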
Markov Types and Minimax Redundancy for Markov Sources
 IEEE Trans. Inform. Theory
, 2003
Abstract

Cited by 18 (10 self)
Redundancy of universal codes for a class of sources determines by how much the actual code length exceeds the optimal code length. In the minimax scenario one designs the best code for the worst source within the class. Such minimax redundancy comes in two flavors: either on average or for individual sequences. The latter is also known as the maximal or the worst-case minimax redundancy. We study the maximal minimax redundancy of universal block codes for Markovian sources of any order. We prove that the maximal minimax redundancy for Markov sources of order r is asymptotically equal to (m^r(m - 1)/2) log_2 n + log_2 A_m^r + o(1), where n is the length of a source sequence, m is the size of the alphabet, and A_m^r is an explicit constant (e.g., we find that for a binary alphabet m = 2 and Markov order r = 1 the constant is 16G ≈ 14.655449504, where G is Catalan's constant). Unlike previous attempts, we view the redundancy problem as an asymptotic evaluation of certain sums over a set of matrices representing Markov types. The enumeration of Markov types is accomplished by reducing it to counting Eulerian paths in a multigraph. In particular, we propose an asymptotic formula for the number of strings of a given Markov type. All of these findings are obtained by analytic and combinatorial tools of analysis of algorithms.

Index terms: Minimax redundancy, Markov sources, Markov types, Eulerian paths, multidimensional generating functions, analytic information theory.

A preliminary version of this paper was presented at the Colloquium on Mathematics and Computer Science: Algorithms, Trees, Combinatorics and Probabilities, Versailles, 2002.
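The "matrices representing Markov types" can be made concrete by brute force for small n. This sketch uses a cyclic transition-count convention, an assumption chosen for illustration; it merely enumerates the distinct first-order binary Markov types among all length-8 strings.

```python
from itertools import product
from collections import Counter

def markov_type(s):
    """First-order Markov type of a binary string: the 2x2 matrix of
    transition counts k[i][j] = #{t : s_t = i, s_{t+1} = j}, taken
    cyclically here (an illustrative convention)."""
    counts = [[0, 0], [0, 0]]
    for t in range(len(s)):
        counts[s[t]][s[(t + 1) % len(s)]] += 1
    return tuple(map(tuple, counts))

n = 8
types = Counter(markov_type(s) for s in product((0, 1), repeat=n))
num_types = len(types)
print(num_types)
```

The number of distinct types grows only polynomially in n while the number of strings grows exponentially, which is what makes summing over types, rather than over strings, tractable.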
Hierarchical Universal Coding
 IEEE Trans. Inform. Theory
, 1998
Abstract

Cited by 16 (3 self)
In an earlier paper, we proved a strong version of the redundancy-capacity converse theorem of universal coding, stating that for 'most' sources in a given class, the universal coding redundancy is essentially lower-bounded by the capacity of the channel induced by this class. Since this result holds for general classes of sources, it extends Rissanen's strong converse theorem for parametric families. While our earlier result established strong optimality only for mixture codes weighted by the capacity-achieving prior, our first result herein extends this finding to a general prior. For some cases our technique also leads to a simplified proof of the above-mentioned strong converse theorem. The major interest in this paper, however, is in extending the theory of universal coding to hierarchical structures of classes, where each class may have a different capacity. In this setting, one wishes to incur redundancy essentially as small as that corresponding to the active class, and not ...
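A toy instance of the mixture codes weighted by a prior that this abstract discusses: here the class is a hypothetical two-element family of Bernoulli sources with a uniform prior, and all names and numbers are illustrative.

```python
import math

def bernoulli_prob(x, theta):
    """P_theta(x) for an i.i.d. Bernoulli(theta) source over {0, 1}."""
    k = sum(x)
    return theta ** k * (1 - theta) ** (len(x) - k)

def mixture_code_length(x, thetas, weights):
    """Code length -log2 sum_theta w(theta) * P_theta(x) of the mixture
    (Bayes) code over the class {P_theta}."""
    mix = sum(w * bernoulli_prob(x, th) for th, w in zip(thetas, weights))
    return -math.log2(mix)

x = (1, 0, 1, 1)
bits = mixture_code_length(x, thetas=[0.3, 0.7], weights=[0.5, 0.5])
print(round(bits, 3))
```

The redundancy of such a code on a sequence is this length minus -log2 P_theta(x) under the true source; the theorem cited above lower-bounds it, for most sources, by the capacity of the induced channel.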
On-Line Stochastic Processes in Data Compression
, 1996
Abstract

Cited by 15 (6 self)
The ability to predict the future based upon the past in finite-alphabet sequences has many applications, including communications, data security, pattern recognition, and natural language processing. By Shannon's theory and the breakthrough development of arithmetic coding, any sequence, a_1 a_2 ··· a_n, can be encoded in a number of bits that is essentially equal to the minimal information-lossless code length, Σ_i -log_2 p(a_i | a_1 ··· a_{i-1}). The goal of universal on-line modeling, and therefore of universal data compression, is to deduce the model of the input sequence a_1 a_2 ··· a_n that can estimate each p(a_i | a_1 ··· a_{i-1}) knowing only a_1 a_2 ··· a_{i-1}, so that the ex...
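The code-length sum above is easy to compute once a sequential model supplies the conditional probabilities. In this sketch the model is a simple add-one (Laplace) estimator, a stand-in assumption for the universal on-line model the passage is about.

```python
import math
from collections import Counter

def ideal_code_length(seq, alphabet_size):
    """Sum_i -log2 p(a_i | a_1 ... a_{i-1}), with the conditionals from
    an add-one (Laplace) adaptive model: p = (count(a) + 1) / (i + |A|)."""
    counts = Counter()
    bits = 0.0
    for i, a in enumerate(seq):
        p = (counts[a] + 1) / (i + alphabet_size)
        bits -= math.log2(p)
        counts[a] += 1
    return bits

bits = ideal_code_length("aabbaab", alphabet_size=2)
print(round(bits, 3))
```

An arithmetic coder driven by the same model would emit essentially this many bits; a better sequential model lowers the sum.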
New Techniques for Context Modeling
, 1995
Abstract

Cited by 13 (2 self)
We introduce three new techniques for statistical language models: extension modeling, nonmonotonic contexts, and the divergence heuristic. Together these techniques result in language models that have few states, even fewer parameters, and low message entropies.
Model Selection for Variable Length Markov Chains and Tuning the Context Algorithm
, 2000
Abstract

Cited by 10 (3 self)
We consider the model selection problem in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of higher order, but with memory of variable length. Various aims in selecting a VLMC can be formalized with different non-equivalent risks, such as final prediction error or expected Kullback-Leibler information. We consider the asymptotic behavior of different risk functions and show how they can be generally estimated with the same resampling strategy. Such estimated risks then yield new model selection criteria. In particular, we obtain a data-driven tuning of Rissanen's tree-structured context algorithm, which is a computationally feasible procedure for selection and estimation of a VLMC.

Key words and phrases: Bootstrap, zero-one loss, final prediction error, finite-memory source, FSMX model, Kullback-Leibler information, L_2 loss, optimal tree pruning, resampling, tree model.

Short title: Selecting variable length Mar...
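The variable-length memory can be pictured as a lookup in a context tree. The tree below is a hypothetical example represented as a plain set of context strings, a toy stand-in for the tree-structured context algorithm the abstract tunes.

```python
def active_context(tree, past):
    """Return the longest suffix of `past` that is a context in `tree`;
    the VLMC reads only this much of the past to predict the next symbol."""
    for L in range(len(past), -1, -1):
        if past[len(past) - L:] in tree:
            return past[len(past) - L:]
    return None

# hypothetical context tree: memory depth depends on what was just seen
tree = {"", "0", "10", "110"}
ctx = active_context(tree, "0110")
print(ctx)
```

After "...0110" the chain conditions on the depth-3 context "110", but after "...00" it conditions only on "0": the memory length varies with the data, which is exactly what distinguishes a VLMC from a fixed-order chain.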
Computing the Entropy of User Navigation in the Web
 International Journal of Information Technology and Decision Making
, 1999
Abstract

Cited by 10 (1 self)
Navigation through the web, colloquially known as "surfing", is one of the main activities of users during web interaction. When users follow a navigation trail they often tend to get disoriented in terms of the goals of their original query, and thus the discovery of typical user trails could be useful in providing navigation assistance. Herein we give a theoretical underpinning of user navigation in terms of the entropy of an underlying Markov chain modelling the web topology. We present a novel method for online incremental computation of the entropy and a large deviation result regarding the length of a trail needed to realise the said entropy. We provide an error analysis for our estimation of the entropy in terms of the divergence between the empirical and actual probabilities. We also provide an extension of our technique to higher-order Markov chains by a suitable reduction of a higher-order Markov chain model to a first-order one.
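The entropy in question is the entropy rate of the underlying Markov chain. A direct, non-incremental computation of it, on a hypothetical two-page web topology chosen purely for illustration:

```python
import numpy as np

def entropy_rate(P):
    """Entropy rate H = -sum_i pi_i sum_j P_ij log2 P_ij of an ergodic
    Markov chain, where pi is the stationary distribution (left
    eigenvector of P for eigenvalue 1)."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi = pi / pi.sum()
    logs = np.zeros_like(P)
    logs[P > 0] = np.log2(P[P > 0])
    return float(-(pi[:, None] * P * logs).sum())

# hypothetical topology: page 0 links to both pages, page 1 only back to 0
P = np.array([[0.5, 0.5],
              [1.0, 0.0]])
H = entropy_rate(P)
print(round(H, 4))
```

Here the stationary distribution is (2/3, 1/3), and only page 0 contributes uncertainty, giving H = 2/3 bits per step; the paper's contribution is computing such a quantity incrementally as trails arrive.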
Precise Average Redundancy of an Idealized Arithmetic Coding
 Proc. Data Compression Conference, 222–231, Snowbird
, 2002
Abstract

Cited by 10 (6 self)
Redundancy is defined as the excess of the code length over the optimal (ideal) code length. We study the average redundancy of an idealized arithmetic coding (for memoryless sources with unknown distributions) in which the Krichevsky-Trofimov estimator is followed by the Shannon-Fano code. We shall ignore here important practical implementation issues such as finite precision and finite buffer sizes. In fact, our idealized arithmetic code can be viewed as an adaptive infinite-precision implementation of an arithmetic encoder that resembles Elias coding. However, we provide very precise results for the average redundancy that take into account integer-length constraints. These findings are obtained by analytic methods of analysis of algorithms, such as the theory of distributions of sequences modulo 1 and Fourier series. These estimates can be used to study the average redundancy of codes for tree sources, and ultimately the context-tree weighting algorithms.
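The Krichevsky-Trofimov estimator itself is easy to state: after n symbols of which n_a were symbol a, the probability assigned to a next is (n_a + 1/2)/(n + m/2) for alphabet size m. A sketch of the sequential probability it induces (the Shannon-Fano integer-length step and the arithmetic-coder mechanics are omitted):

```python
from collections import Counter

def kt_probability(counts, symbol, m=2):
    """KT estimate P(next = symbol) = (n_symbol + 1/2) / (n + m/2)."""
    n = sum(counts.values())
    return (counts.get(symbol, 0) + 0.5) / (n + m / 2)

def kt_sequence_probability(seq, m=2):
    """Product of the sequential KT probabilities; -log2 of it is the
    ideal (pre-rounding) arithmetic-coding code length."""
    counts, p = Counter(), 1.0
    for a in seq:
        p *= kt_probability(counts, a, m)
        counts[a] += 1
    return p

p = kt_sequence_probability("0011")
print(p)
```

For "0011" the sequential factors are 1/2, 3/4, 1/6, 3/8, whose product is 3/128; the paper's redundancy analysis measures how far -log2 of such probabilities, after integer-length rounding, sits above the source entropy on average.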