The minimum description length principle in coding and modeling
IEEE TRANS. INFORM. THEORY, 1998
Cited by 382 (17 self)
We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples.
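As a rough illustration of the normalized maximized likelihood mentioned in this abstract, the sketch below computes the corresponding code length for the Bernoulli model class. The choice of the Bernoulli class and the function names (`max_log2_likelihood`, `shtarkov_sum`, `nml_code_length`, `kraft_sum`) are assumptions made for this example, not taken from the paper.

```python
import math
from itertools import product


def max_log2_likelihood(k: int, n: int) -> float:
    """log2 of the maximized Bernoulli likelihood for k ones among n symbols.

    Uses the convention 0 * log 0 = 0.
    """
    ll = 0.0
    if k > 0:
        ll += k * math.log2(k / n)
    if k < n:
        ll += (n - k) * math.log2((n - k) / n)
    return ll


def shtarkov_sum(n: int) -> float:
    """Normalizer: sum of maximized likelihoods over all binary strings of length n."""
    return sum(
        math.comb(n, k) * 2.0 ** max_log2_likelihood(k, n) for k in range(n + 1)
    )


def nml_code_length(bits) -> float:
    """Ideal NML code length in bits (not rounded to an integer)."""
    n, k = len(bits), sum(bits)
    return -max_log2_likelihood(k, n) + math.log2(shtarkov_sum(n))


def kraft_sum(n: int) -> float:
    """Sum of 2^(-length) over all strings of length n; equals 1 for NML."""
    return sum(2.0 ** -nml_code_length(s) for s in product((0, 1), repeat=n))


if __name__ == "__main__":
    print(nml_code_length([1, 0, 1, 1, 0, 1, 1, 1]))
    print(kraft_sum(6))
```

The Kraft sum equalling 1 reflects that the normalized maximized likelihood is a proper probability assignment over sequences of a fixed length.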
Universal prediction
IEEE TRANSACTIONS ON INFORMATION THEORY, 1998
Cited by 183 (16 self)
This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described with emphasis on the analogy and the differences between results in the two settings.
Game Theory, Maximum Entropy, Minimum Discrepancy And Robust Bayesian Decision Theory
ANNALS OF STATISTICS, 2004
Competitive online statistics
International Statistical Review, 1999
Cited by 95 (14 self)
A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential statistics). In this approach, which we call “competitive online statistics”, it is not assumed that data are generated by some stochastic mechanism; the bounds derived for the performance of competitive online statistical procedures are guaranteed to hold (and not just hold with high probability or on the average). This paper reviews some results in this area; the new material in it includes the proofs for the performance of the Aggregating Algorithm in the problem of linear regression with square loss. Keywords: Bayes’s rule, competitive online algorithms, linear regression, prequential statistics, worst-case analysis.
Universal compression of memoryless sources over unknown alphabets
IEEE TRANSACTIONS ON INFORMATION THEORY, 2004
Cited by 56 (21 self)
It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols, and its pattern, the order in which the symbols appear. Concentrating on the latter, we show that the patterns of i.i.d. strings over all, including infinite and even unknown, alphabets, can be compressed with diminishing redundancy, both in block and sequentially, and that the compression can be performed in linear time. To establish these results, we show that the number of patterns is the Bell number, that the number of patterns with a given number of symbols is the Stirling number of the second kind, and that the redundancy of patterns can be bounded using results of Hardy and Ramanujan on the number of integer partitions. The results also imply an asymptotically optimal solution for the Good-Turing probability-estimation problem.
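The notion of a pattern, and the claim that patterns of a given length are counted by the Bell number, can be checked by brute force for small lengths. The sketch below is only an illustration; the helper names (`pattern`, `bell`, `count_patterns`) and the convention of numbering symbols from 1 in order of first appearance are assumptions of this example.

```python
from itertools import product


def pattern(s):
    """Replace each symbol by the rank of its first appearance (1-based)."""
    first_seen = {}
    out = []
    for sym in s:
        if sym not in first_seen:
            first_seen[sym] = len(first_seen) + 1
        out.append(first_seen[sym])
    return tuple(out)


def bell(n: int) -> int:
    """n-th Bell number, computed with the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        new = [row[-1]]
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[-1]


def count_patterns(n: int) -> int:
    """Count distinct patterns among all length-n strings over n symbols."""
    return len({pattern(s) for s in product(range(n), repeat=n)})


if __name__ == "__main__":
    print(pattern("abracadabra"))
    for n in range(1, 6):
        print(n, count_patterns(n), bell(n))
```

An alphabet of size n suffices for the enumeration because a string of length n contains at most n distinct symbols.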
Precise Minimax Redundancy and Regret
IEEE TRANS. INFORMATION THEORY, 2004
Cited by 46 (15 self)
Recent years have seen a resurgence of interest in redundancy of lossless coding. The redundancy (regret) of universal fixed-to-variable length coding for a class of sources determines by how much the actual code length exceeds the optimal (ideal over the class) code length. In a minimax scenario one finds the best code for the worst source either in the worst case (called also maximal minimax) or on average. We first study the worst case minimax redundancy over a class of stationary ergodic sources and replace Shtarkov's bound by an exact formula. Among others, we prove that a generalized Shannon code minimizes the worst case redundancy, derive asymptotically its redundancy, and establish some general properties. This allows us to obtain precise redundancy rates for memoryless, Markov and renewal sources. For example, we derive the exact constant of the redundancy rate for memoryless and Markov sources by showing that an integer nature of coding contributes $-\log(\log m/(m-1))/\log m + o(1)$ where $m$ is the size of the alphabet. Then we deal with the average minimax redundancy and regret. Our approach
Inequalities between Entropy and Index of Coincidence derived from Information Diagrams
IEEE Trans. Inform. Theory, 2001
Cited by 26 (11 self)
To any discrete probability distribution $P$ we can associate its entropy $H(P) = -\sum p_i \ln p_i$ and its index of coincidence $IC(P) = \sum p_i^2$. The main result of the paper is the determination of the precise range of the map $P \mapsto (IC(P), H(P))$. The range looks much like that of the map $P \mapsto (P_{\max}, H(P))$ where $P_{\max}$ is the maximal point probability, cf. research from 1965 (Kovalevskij [18]) to 1994 (Feder and Merhav [7]). The earlier results, which actually focus on the probability of error $1 - P_{\max}$ rather than $P_{\max}$, can be conceived as limiting cases of results obtained by methods here presented. Ranges of maps as those indicated are called Information Diagrams. The main result gives rise to precise lower as well as upper bounds for the entropy function. Some of these bounds are essential for the exact solution of certain problems of universal coding and prediction for Bernoulli sources. Other applications concern Shannon theory (relations between various measures of divergence), statistical decision theory and rate distortion theory. Two methods are developed. One is topological, another involves convex analysis and is based on a “lemma of replacement” which is of independent interest in relation to problems of optimization of mixed type (concave/convex optimization).
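The two quantities defined in this abstract are easy to compute for a finite distribution. The sketch below evaluates both and checks the elementary bound $H(P) \ge -\ln IC(P)$, which follows from Jensen's inequality; it does not reproduce the precise range described in the paper. The function names and the sample distributions are assumptions of this example.

```python
import math


def entropy(p) -> float:
    """Shannon entropy in nats, with 0 * ln 0 taken as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def index_of_coincidence(p) -> float:
    """Sum of squared point probabilities."""
    return sum(x * x for x in p)


def point(p):
    """The point (IC(P), H(P)) that P contributes to the information diagram."""
    return index_of_coincidence(p), entropy(p)


if __name__ == "__main__":
    samples = [
        (0.25, 0.25, 0.25, 0.25),   # uniform: the Jensen bound holds with equality
        (0.5, 0.25, 0.25),
        (0.9, 0.05, 0.05),
        (1.0, 0.0),                 # deterministic: IC = 1, H = 0
    ]
    for p in samples:
        ic, h = point(p)
        print(p, round(ic, 4), round(h, 4), h >= -math.log(ic) - 1e-12)
```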
Markov Types and Minimax Redundancy for Markov Sources
IEEE Trans. Information Theory, 2003
Cited by 22 (11 self)
Redundancy of universal codes for a class of sources determines by how much the actual code length exceeds the optimal code length. In the minimax scenario one designs the best code for the worst source within the class. Such minimax redundancy comes in two flavors: either on average or for individual sequences. The latter is also known as the maximal or the worst case minimax redundancy. We study the maximal minimax redundancy of universal block codes for Markovian sources of any order. We prove that the maximal minimax redundancy for Markov sources of order $r$ is asymptotically equal to $\frac{1}{2} m^r (m-1) \log_2 n + \log_2 A_m^r - (\ln \ln m^{1/(m-1)})/\ln m + o(1)$, where $n$ is the length of a source sequence, $m$ is the size of the alphabet and $A_m^r$ is an explicit constant (e.g., we find that for a binary alphabet $m = 2$ and Markov of order $r = 1$ the constant $A_2^1 = 16 \cdot G \approx 14.655449504$ where $G$ is the Catalan constant). Unlike previous attempts, we view the redundancy problem as an asymptotic evaluation of certain sums over a set of matrices representing Markov types. The enumeration of Markov types is accomplished by reducing it to counting Eulerian paths in a multigraph. In particular, we propose an asymptotic formula for the number of strings of a given Markov type. All of these findings are obtained by analytic and combinatorial tools of analysis of algorithms. Index terms: Minimax redundancy, Markov sources, Markov types, Eulerian paths, multidimensional generating functions, analytic information theory. A preliminary version of this paper was presented at Colloquium on Mathematics and Computer Science: Algorithms, Trees, Combinatorics and Probabilities, Versailles, 2002.
The Last-Step Minimax Algorithm
Pages 279-290 of: Proc. 11th International Conference on Algorithmic Learning Theory, 2000
Cited by 15 (3 self)
We consider online density estimation with a parameterized density from an exponential family. In each trial $t$ the learner predicts a parameter $\theta_t$. Then it receives an instance $x_t$ chosen by the adversary and incurs loss $-\ln p(x_t \mid \theta_t)$, which is the negative log-likelihood of $x_t$ w.r.t. the predicted density of the learner. The performance of the learner is measured by the regret defined as the total loss of the learner minus the total loss of the best parameter chosen off-line. We develop an algorithm called the Last-step Minimax Algorithm that predicts with the minimax optimal parameter assuming that the current trial is the last one. For one-dimensional exponential families, we give an explicit form of the prediction of the Last-step Minimax Algorithm and show that its regret is $O(\ln T)$, where $T$ is the number of trials. In particular, for Bernoulli density estimation the Last-step Minimax Algorithm is slightly better than the standard Laplace estimator. This work was done while...
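The regret defined in this abstract can be made concrete for the Bernoulli case using the Laplace estimator that the abstract names as the baseline. The sketch below implements only that baseline, with the usual add-one rule $(k+1)/(t+2)$, and does not attempt the Last-step Minimax Algorithm itself, whose explicit form is not given here. All function names are assumptions of this example.

```python
import math
from itertools import product


def laplace_total_loss(bits) -> float:
    """Cumulative log loss (nats) of the add-one Laplace estimator."""
    ones = 0
    loss = 0.0
    for t, x in enumerate(bits):
        p_one = (ones + 1) / (t + 2)  # prediction made before seeing x
        loss += -math.log(p_one if x == 1 else 1.0 - p_one)
        ones += x
    return loss


def best_offline_loss(bits) -> float:
    """Total log loss of the single best Bernoulli parameter in hindsight."""
    n, k = len(bits), sum(bits)
    loss = 0.0
    if k > 0:
        loss -= k * math.log(k / n)
    if k < n:
        loss -= (n - k) * math.log((n - k) / n)
    return loss


def regret(bits) -> float:
    """Learner's total loss minus the best off-line total loss."""
    return laplace_total_loss(bits) - best_offline_loss(bits)


def worst_case_regret(T: int) -> float:
    """Maximum regret over all binary sequences of length T (brute force)."""
    return max(regret(s) for s in product((0, 1), repeat=T))


if __name__ == "__main__":
    for T in (2, 4, 8):
        print(T, worst_case_regret(T), math.log(T + 1))
```

For this baseline the worst case is attained on constant sequences, where the regret equals $\ln(T+1)$, consistent with the $O(\ln T)$ order stated in the abstract.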