Results 1–10 of 21
Universal prediction
 IEEE Transactions on Information Theory
, 1998
Abstract

Cited by 136 (11 self)
Abstract — This paper is an overview of universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described, with emphasis on the analogy and the differences between results in the two settings. Index Terms — Bayes envelope, entropy, finite-state machine, linear prediction, loss function, probability assignment, redundancy-capacity, stochastic complexity, universal coding, universal prediction.
Relative Loss Bounds for Online Density Estimation with the Exponential Family of Distributions
 MACHINE LEARNING
, 2000
Abstract

Cited by 116 (11 self)
We consider online density estimation with a parameterized density from the exponential family. The online algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example, the algorithm incurs a loss, which is the negative log-likelihood of the example with respect to the past parameter of the algorithm. An offline algorithm can choose the best parameter based on all the examples. We prove bounds on the additional total loss of the online algorithm over the total loss of the best offline parameter. These relative loss bounds hold for an arbitrary sequence of examples. The goal is to design algorithms with the best possible relative loss bounds. We use a Bregman divergence to derive and analyze each algorithm. These divergences are relative entropies between two exponential distributions. We also use our methods to prove relative loss bounds for linear regression.
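A toy sketch of this setting for the Bernoulli member of the exponential family (the smoothing constants a, b and all names below are illustrative choices of ours, not the paper's algorithm, which is far more general):

```python
import math

def online_bernoulli_loss(xs, a=1.0, b=1.0):
    """Online log-loss of a Bernoulli predictor whose parameter is a
    smoothed running average of past examples, compared with the loss
    of the best offline parameter chosen in hindsight."""
    n_ones, n, online_loss = 0, 0, 0.0
    for x in xs:
        theta = (n_ones + a) / (n + a + b)   # essentially an average of the past
        p = theta if x == 1 else 1.0 - theta
        online_loss += -math.log(p)          # negative log-likelihood loss
        n_ones += x
        n += 1
    # the best offline parameter is the empirical mean (clamped away from 0/1)
    mu = min(max(n_ones / n, 1e-12), 1.0 - 1e-12)
    offline_loss = -(n_ones * math.log(mu) + (n - n_ones) * math.log(1.0 - mu))
    return online_loss, offline_loss

xs = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
on, off = online_bernoulli_loss(xs)
relative_loss = on - off  # the quantity bounded by relative loss bounds
```

With a = b = 1 (Laplace smoothing) the relative loss on any binary sequence of length n is at most log(n + 1), a simple instance of the arbitrary-sequence bounds the paper studies.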
Predicting a Binary Sequence Almost as Well as the Optimal Biased Coin
, 1996
Abstract

Cited by 40 (5 self)
We apply the exponential weight algorithm, introduced by Littlestone and Warmuth [17] and by Vovk [24], to the problem of predicting a binary sequence almost as well as the best biased coin. We first show that for the case of the logarithmic loss, the derived algorithm is equivalent to the Bayes algorithm with Jeffreys' prior, which was studied by Xie and Barron under probabilistic assumptions [26]. We derive a uniform bound on the regret which holds for any sequence. We also show that if the empirical distribution of the sequence is bounded away from 0 and from 1, then, as the length of the sequence increases to infinity, the difference between this bound and a corresponding bound on the average-case regret of the same algorithm (which is asymptotically optimal in that case) is only 1/2. We show that this gap of 1/2 is necessary by calculating the regret of the minimax optimal algorithm for this problem and showing that the asymptotic upper bound is tight. We also study the application...
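The Bayes algorithm with Jeffreys' Beta(1/2, 1/2) prior is the classical Krichevsky–Trofimov predictor; a minimal sketch of its per-round predictions and its regret against the best biased coin (function names and the example sequence are ours, not the paper's):

```python
import math

def kt_log_loss(xs):
    """Cumulative log-loss (bits) of the Bayes mixture under Jeffreys'
    Beta(1/2, 1/2) prior, i.e. the Krichevsky-Trofimov predictor."""
    ones, n, loss = 0, 0, 0.0
    for x in xs:
        p1 = (ones + 0.5) / (n + 1.0)  # KT predictive probability of a 1
        loss += -math.log2(p1 if x == 1 else 1.0 - p1)
        ones += x
        n += 1
    return loss

def best_coin_loss(xs):
    """Log-loss (bits) of the best biased coin chosen in hindsight."""
    k, n = sum(xs), len(xs)
    if k in (0, n):
        return 0.0
    mu = k / n
    return -(k * math.log2(mu) + (n - k) * math.log2(1.0 - mu))

xs = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
regret = kt_log_loss(xs) - best_coin_loss(xs)
```

The regret of this predictor is uniformly at most (1/2) log2(n) + 1 bits for every binary sequence of length n, which is the kind of individual-sequence guarantee the abstract refers to.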
Precise Minimax Redundancy and Regret
 IEEE TRANS. INFORMATION THEORY
, 2004
Abstract

Cited by 33 (13 self)
Recent years have seen a resurgence of interest in the redundancy of lossless coding. The redundancy (regret) of universal fixed-to-variable length coding for a class of sources determines by how much the actual code length exceeds the optimal (ideal over the class) code length. In a minimax scenario one finds the best code for the worst source, either in the worst case (also called maximal minimax) or on average. We first study the worst-case minimax redundancy over a class of stationary ergodic sources and replace Shtarkov's bound by an exact formula. Among other results, we prove that a generalized Shannon code minimizes the worst-case redundancy, derive its redundancy asymptotically, and establish some general properties. This allows us to obtain precise redundancy rates for memoryless, Markov, and renewal sources. For example, we derive the exact constant of the redundancy rate for memoryless and Markov sources by showing that the integer nature of coding contributes log(log m/(m−1))/log m + o(1), where m is the size of the alphabet. Then we deal with the average minimax redundancy and regret. Our approach...
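The worst-case (maximal) minimax redundancy referred to here is the logarithm of Shtarkov's sum, which can be computed exactly for small classes; a sketch for the Bernoulli class, grouping sequences by their type to keep the sum small (the function name and the choice of class are ours, for illustration only):

```python
import math
from math import comb

def shtarkov_redundancy(n):
    """Worst-case minimax redundancy (bits) of the Bernoulli class at
    block length n: log2 of the Shtarkov sum  sum_x sup_theta P_theta(x),
    computed by grouping the 2^n sequences by their type k = #ones."""
    total = 0.0
    for k in range(n + 1):
        if 0 < k < n:
            mu = k / n  # maximum-likelihood parameter for this type
            p_hat = (mu ** k) * ((1.0 - mu) ** (n - k))
        else:
            p_hat = 1.0  # all-zeros / all-ones sequences
        total += comb(n, k) * p_hat
    return math.log2(total)

r100 = shtarkov_redundancy(100)
# grows like (1/2) log2(n) plus a constant, matching the (m-1)/2 log n rates
```

This exact computation converges to the well-known (1/2) log2(n·pi/2) asymptotic for the binary memoryless class, one instance of the precise redundancy rates discussed above.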
Markov Types and Minimax Redundancy for Markov Sources
 IEEE Trans. Information Theory
, 2003
Abstract

Cited by 18 (10 self)
Redundancy of universal codes for a class of sources determines by how much the actual code length exceeds the optimal code length. In the minimax scenario one designs the best code for the worst source within the class. Such minimax redundancy comes in two flavors: either on average or for individual sequences. The latter is also known as the maximal or the worst-case minimax redundancy. We study the maximal minimax redundancy of universal block codes for Markovian sources of any order. We prove that the maximal minimax redundancy for Markov sources of order r is asymptotically equal to (m^r(m−1)/2) log_2 n + log_2 A_m^r + (ln ln m^{1/(m−1)})/ln m + o(1), where n is the length of a source sequence, m is the size of the alphabet, and A_m^r is an explicit constant (e.g., we find that for a binary alphabet m = 2 and Markov order r = 1 the constant is 16G ≈ 14.655449504, where G is Catalan's constant). Unlike previous attempts, we view the redundancy problem as an asymptotic evaluation of certain sums over a set of matrices representing Markov types. The enumeration of Markov types is accomplished by reducing it to counting Eulerian paths in a multigraph. In particular, we propose an asymptotic formula for the number of strings of a given Markov type. All of these findings are obtained by analytic and combinatorial tools from the analysis of algorithms. Index terms: Minimax redundancy, Markov sources, Markov types, Eulerian paths, multidimensional generating functions, analytic information theory. A preliminary version of this paper was presented at the Colloquium on Mathematics and Computer Science: Algorithms, Trees, Combinatorics and Probabilities, Versailles, 2002.
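The matrices of transition counts mentioned above can be made concrete for binary strings; a brute-force sketch (all names are illustrative) that groups strings by first-order Markov type and checks the balance condition that makes each type matrix correspond to an Eulerian path in a two-node multigraph:

```python
from itertools import product

def markov_type(bits):
    """First-order Markov type of a binary string: the 2x2 matrix of
    transition counts k[a][b] = #positions where a is followed by b,
    flattened to a tuple (k00, k01, k10, k11)."""
    k = [[0, 0], [0, 0]]
    for a, b in zip(bits, bits[1:]):
        k[a][b] += 1
    return (k[0][0], k[0][1], k[1][0], k[1][1])

# group all binary strings of length n by their Markov type
n = 8
by_type = {}
for bits in product((0, 1), repeat=n):
    t = markov_type(bits)
    by_type[t] = by_type.get(t, 0) + 1

# Eulerian-path balance: in any realizable type, the 0->1 and 1->0
# transition counts differ by at most 1
for (k00, k01, k10, k11) in by_type:
    assert abs(k01 - k10) <= 1
```

The counts in by_type are exactly the "number of strings of a given Markov type" that the paper evaluates asymptotically; here they are obtained by exhaustive enumeration, which is feasible only for tiny n.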
The Minimax Distortion Redundancy in Noisy Source Coding
, 2003
Abstract

Cited by 12 (5 self)
Consider the problem of finite-rate filtering of a discrete memoryless process {X_i}_{i≥1} based on its noisy observation sequence {Z_i}_{i≥1}, which is the output of a Discrete Memoryless Channel (DMC) whose input is {X_i}_{i≥1}. When the distribution of the pairs (X_i, Z_i), P_{X,Z}, is known, and for a given distortion measure, the solution to this problem is well known to be given by classical rate-distortion theory upon the introduction of a modified distortion measure. In this work we address the case where P_{X,Z}, rather than being completely specified, is only known to belong to some set Θ. For a fixed encoding rate R we look at the worst case, over all θ ∈ Θ, of the difference between the expected distortion of a given scheme which is not allowed to depend on the active source θ ∈ Θ and the value of the distortion-rate function at R corresponding to the noisy source θ. We study the minimum attainable value achievable by any scheme operating at rate R for this worst-case quantity, denoted by D(Θ, R). Linking between this problem and that of source coding under several distortion measures, we prove a coding theorem for the latter problem and apply it to characterize D(Θ, R) for the case where all members of Θ share the same noisy marginal. For the case of a general Θ, we obtain a single-letter characterization of D(Θ, R) for the finite-alphabet case. This gives, in particular, a necessary and sufficient condition on the set Θ for the existence of a coding scheme which is universally optimal for all members of Θ and characterizes the approximation-estimation tradeoff for statistical modelling of noisy source coding problems. Finally, we obtain D(Θ, R) in closed form for cases where Θ consists of distributions on the (channel) input-output pair of a Bernoul...
Analytic Variations on Redundancy Rates of Renewal Processes
 IEEE Trans. Information Theory
, 2002
Abstract

Cited by 8 (5 self)
Csiszár and Shields have recently proved that the minimax redundancy for a class of (stationary) renewal processes is Θ(√n), where n is the block length. This interesting result provides a first nontrivial bound on redundancy for a nonparametric family of processes. The present paper gives a precise estimate of the redundancy rate for such (nonstationary) renewal sources, namely (2/ln 2)√(cn) + O(log n), where c = π²/6 − 1. This asymptotic expansion is derived by complex-analytic methods that include generating function representations, Mellin transforms, singularity analysis, and saddle point estimates. This work places itself within the framework of analytic information theory.
The Precise Minimax Redundancy
 IN PROCEEDINGS OF IEEE SYMPOSIUM ON INFORMATION THEORY
, 2002
Abstract

Cited by 4 (0 self)
We start with a quick introduction of the redundancy problem. A code C_n : A^n → {0,1}* is defined as a mapping from the set A^n of all sequences x_1^n = (x_1, ..., x_n) of length n over the finite alphabet A to the set {0,1}* of all binary sequences. Given a probabilistic source model, we let P(x_1^n) be the probability of the message x_1^n; given a code C_n, we let L(C_n, x_1^n) be the code length for x_1^n. From Shannon's work we know that the entropy H_n(P) = −∑ P(x_1^n) lg P(x_1^n) is the absolute lower bound on the expected code length, where lg := log_2 denotes the binary logarithm. Hence −lg P(x_1^n) can be viewed as the "ideal" code length. The next natural question is to ask by how much the length L(C_n, x_1^n) of a code differs from the ideal code length, either for individual sequences or on average. The pointwise redundancy is R_n(C_n, P; x_1^n) = L(C_n, x_1^n) + lg P(x_1^n), while the average redundancy R_n(C_n, P) and the maximal redundancy R_n(C_n, ...
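A minimal numeric illustration of these definitions, using the classical Shannon code lengths ⌈−lg P(x)⌉ over a toy three-message distribution (the distribution and function names are our own choices):

```python
import math

def shannon_code_length(p):
    """Shannon code length ceil(-lg p); these lengths satisfy the Kraft
    inequality, so a prefix code with them always exists."""
    return math.ceil(-math.log2(p))

def redundancies(P):
    """Pointwise, average, and maximal redundancy of the Shannon code
    for a finite distribution P (a toy stand-in for sequences x_1^n)."""
    # pointwise redundancy: L(C, x) + lg P(x)
    point = {x: shannon_code_length(p) + math.log2(p) for x, p in P.items()}
    avg = sum(p * point[x] for x, p in P.items())  # average redundancy
    mx = max(point.values())                       # maximal redundancy
    return point, avg, mx

P = {"a": 0.5, "b": 0.3, "c": 0.2}
point, avg, mx = redundancies(P)
```

For the Shannon code every pointwise redundancy lies in [0, 1), so the average and maximal redundancies do too; the papers above ask how small these quantities can be made when P itself is unknown.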
Robustly Minimax Codes for Universal Data Compression
 The 21st Symposium on Information Theory and Its Applications
, 1998
Abstract

Cited by 4 (1 self)
Abstract — We introduce a notion of ‘relative redundancy’ for universal data compression and propose a universal code which asymptotically achieves the minimax value of the relative redundancy. The relative redundancy is a hybrid of redundancy and coding regret (pointwise redundancy), where a class of information sources and a class of codes are assumed. The minimax code for relative redundancy is an extension of the modified Jeffreys mixture, which was introduced by Takeuchi and Barron and is minimax for regret.