Results 1–9 of 9
Sequential prediction of individual sequences under general loss functions
 IEEE Trans. on Information Theory
, 1998
The consistency of the BIC Markov order estimator.
Abstract

Cited by 57 (3 self)
The Bayesian Information Criterion (BIC) estimates the order of a Markov chain (with finite alphabet A) from observation of a sample path x_1, x_2, …, x_n, as the value k̂ that minimizes the sum of the negative logarithm of the kth-order maximum likelihood and the penalty term (|A|^k (|A| − 1) / 2) log n. We show that k̂ equals the correct order of the chain, eventually almost surely as n → ∞, thereby strengthening earlier consistency results that assumed an a priori bound on the order. A key tool is a strong ratio-typicality result for Markov sample paths. We also show that the Bayesian estimator, or minimum description length estimator, of which the BIC estimator is an approximation, fails to be consistent for the uniformly distributed i.i.d. process. AMS 1991 subject classification: Primary 62F12, 62M05; Secondary 62F13, 60J10. Key words and phrases: Bayesian Information Criterion, order estimation, ratio-typicality, Markov chains. …
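The penalized-likelihood rule described in this abstract can be sketched directly. The following is a minimal illustration, not the paper's implementation: the function name and the empirical transition-count estimator are my own. It scores each candidate order k by the negative log of the kth-order maximum likelihood plus (|A|^k (|A| − 1)/2) log n, and returns the minimizing k.

```python
from collections import defaultdict
from math import log

def bic_order(x, alphabet, max_k=5):
    """Estimate Markov order: minimize -log ML_k(x) + (|A|^k (|A|-1)/2) log n."""
    n = len(x)
    best_k, best_score = 0, float("inf")
    for k in range(max_k + 1):
        # Count transitions out of each length-k context.
        counts = defaultdict(lambda: defaultdict(int))
        for i in range(k, n):
            counts[tuple(x[i - k:i])][x[i]] += 1
        # Negative log of the kth-order maximum likelihood.
        neg_log_ml = 0.0
        for nxt in counts.values():
            total = sum(nxt.values())
            for c in nxt.values():
                neg_log_ml -= c * log(c / total)
        penalty = len(alphabet) ** k * (len(alphabet) - 1) / 2 * log(n)
        if neg_log_ml + penalty < best_score:
            best_k, best_score = k, neg_log_ml + penalty
    return best_k

# The alternating sequence 0,1,0,1,... is a deterministic first-order chain:
print(bic_order([0, 1] * 500, [0, 1]))  # prints 1
```

Note that the sketch searches only up to a fixed max_k; the paper's point is precisely that consistency holds even without such an a priori bound.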
Universal compression of memoryless sources over unknown alphabets
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 2004
Abstract

Cited by 35 (10 self)
It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols and its pattern, the order in which the symbols appear. Concentrating on the latter, we show that the patterns of i.i.d. strings over all alphabets, including infinite and even unknown ones, can be compressed with diminishing redundancy, both in block and sequentially, and that the compression can be performed in linear time. To establish these results, we show that the number of patterns is the Bell number, that the number of patterns with a given number of symbols is the Stirling number of the second kind, and that the redundancy of patterns can be bounded using results of Hardy and Ramanujan on the number of integer partitions. The results also imply an asymptotically optimal solution for the Good-Turing probability-estimation problem.
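The pattern map and the Bell/Stirling counts invoked in this abstract are easy to verify on small cases. A minimal sketch (function names are my own), assuming the standard recurrence S(n,k) = k·S(n−1,k) + S(n−1,k−1) and Bell(n) = Σ_k S(n,k):

```python
from functools import lru_cache
from itertools import product

def pattern(s):
    """Replace each symbol by the order of its first appearance."""
    seen = {}
    return tuple(seen.setdefault(c, len(seen) + 1) for c in s)

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def bell(n):
    """Bell numbers: total count of patterns of length n."""
    return sum(stirling2(n, k) for k in range(n + 1))

print(pattern("abracadabra"))  # (1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1)

# Every pattern of length n is realized by some string over an n-letter
# alphabet, and the number of distinct patterns is the nth Bell number:
n = 5
assert len({pattern(s) for s in product(range(n), repeat=n)}) == bell(n) == 52
```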
Pointwise Redundancy in Lossy Data Compression and Universal Lossy Data Compression
 IEEE Trans. Inform. Theory
, 1999
Abstract

Cited by 23 (14 self)
We characterize the achievable pointwise redundancy rates for lossy data compression at a fixed distortion level. "Pointwise redundancy" refers to the difference between the description length achieved by an nth-order block code and the optimal nR(D) bits. For memoryless sources, we show that the best achievable redundancy rate is of order O(√n) in probability. This follows from a second-order refinement to the classical source coding theorem, in the form of a "one-sided central limit theorem." Moreover, we show that, along (almost) any source realization, the description lengths of any sequence of block codes operating at distortion level D exceed nR(D) by at least as much as C√(n log log n), infinitely often. Corresponding direct coding theorems are also given, showing that these rates are essentially achievable. The above rates are in sharp contrast with the expected redundancy rates of order O(log n) recently reported by various authors. Our approach is based on showing that…
Second-order noiseless source coding theorems
 IEEE TRANS. INFORM. THEORY
, 1997
Abstract

Cited by 17 (7 self)
Shannon’s celebrated source coding theorem can be viewed as a “one-sided law of large numbers.” We formulate second-order noiseless source coding theorems for the deviation of the codeword lengths from the entropy. For a class of sources that includes Markov chains we prove a “one-sided central limit theorem” and a law of the iterated logarithm.
Worst Case Prediction over Sequences under Log Loss
 In The Mathematics of Information Coding, Extraction, and Distribution
, 1997
Abstract

Cited by 16 (1 self)
We consider the game of sequentially assigning probabilities to future data based on past observations under logarithmic loss. We are not making probabilistic assumptions about the generation of the data, but consider a situation where a player tries to minimize his loss relative to the loss of the (with hindsight) best distribution from a target class, for the worst sequence of data. We give bounds on the minimax regret in terms of the metric entropies of the target class with respect to suitable distances between distributions.
1. Introduction. The assignment of probabilities to the possible outcomes of future data based on past observations has important applications to prediction, data compression and gambling. In a scenario where the data are assumed to occur at random with an unknown probability distribution, this problem can be treated as a well-known statistical estimation problem. Optimal strategies can be found within a game-theoretic approach, where a statistician …
While deciphering the Enigma Code during World …
Abstract
problem of estimating a probability distribution from a sample of data. They derived a surprising and unintuitive formula that has since been used in a variety of applications and studied by a number of researchers. Borrowing an information-theoretic and machine-learning framework, we define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet larger than one. We then derive an estimator whose attenuation is one; namely, as the length of any sequence increases, the per-symbol probability assigned by the estimator is as high as possible. Interestingly, some of the proofs use celebrated results by Hardy and Ramanujan on the number of partitions of an integer. To better understand the behavior of the estimator, we study the probability it assigns to several simple sequences. We show that for some sequences this probability agrees with our intuition, while for others it is rather unexpected.
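The Good-Turing estimator that this abstract analyzes has a compact textbook form: the total probability of unseen symbols is estimated by N_1/n (N_r being the number of symbols seen exactly r times), and a symbol seen r times gets probability (r+1)·N_{r+1}/(N_r·n). A minimal sketch under those standard formulas; function names are my own, and no smoothing of the frequency-of-frequencies is attempted, so the raw formula is patched with an empirical fallback when N_{r+1} = 0:

```python
from collections import Counter

def missing_mass(sample):
    """Good-Turing estimate of the total probability of unseen symbols: N1 / n."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)

def gt_prob(symbol, sample):
    """Good-Turing probability of a seen symbol: (r+1) N_{r+1} / (N_r n)."""
    counts = Counter(sample)
    n_r = Counter(counts.values())  # frequency of frequencies
    r = counts[symbol]
    if n_r[r + 1] == 0:
        return r / len(sample)  # raw formula undefined; fall back to empirical
    return (r + 1) * n_r[r + 1] / (n_r[r] * len(sample))

print(missing_mass("aabbc"))  # 0.2: one singleton ('c') out of 5 observations
```

On such tiny samples the unsmoothed formula behaves erratically, which is one motivation for the attenuation analysis the abstract describes.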
Code Design Via Selection Of A Statistical Model
Abstract
Consider the problem of designing a code via which to encode data samples from a finite set, where the data-generating model is unknown but belongs to a known model class. Code design techniques developed by information theorists involve the selection of a statistical model as "coding distribution," which is based in some way upon the models in the known model class. Some design techniques employ Bayesian analysis that would be familiar to any statistician. Other techniques employ insights that may not be widely known outside the information theory community. Several code design techniques are surveyed, where the emphasis is upon techniques that lead to codes that are nearly optimal in terms of redundancy performance.
I. Introduction. Let Ω be a finite data set. A code on Ω is a one-to-one mapping φ with domain Ω such that: (a.1) For each x ∈ Ω, φ(x) is a binary string of finite length (the codeword into which x is encoded by φ). (a.2) The set of codewords…
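The definition of a code on Ω given here (a one-to-one map into finite binary strings) is usually paired with the prefix condition and the Kraft inequality, which together make codeword sequences uniquely decodable. A minimal sketch of both checks; the function names are illustrative, not from the survey:

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another. In lexicographic order,
    any word extending w follows w immediately, so adjacent checks suffice."""
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def kraft_sum(codewords):
    """Kraft inequality: a prefix-free binary code satisfies
    sum over codewords of 2^(-length) <= 1."""
    return sum(2.0 ** -len(w) for w in codewords)

# A complete prefix-free code on a 4-element data set:
code = {"x1": "0", "x2": "10", "x3": "110", "x4": "111"}
assert is_prefix_free(code.values())
print(kraft_sum(code.values()))  # 1.0: the code is complete (no slack)
```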
Editors
Abstract
Theory will publish survey and tutorial articles in the following topics: • Coded modulation • Coding theory and practice • Communication complexity • Communication system design • Cryptology and data security • Data compression • Data networks • Demodulation and equalization • Denoising • Detection and estimation • Information theory and statistics • Information theory and computer science • Joint source/channel coding • Modulation and signal design • Multiuser detection