Results 11–20 of 55
Probability Estimation in the Rare-Events Regime
Abstract

Cited by 8 (0 self)
We address the problem of estimating the probability of an observed string that is drawn i.i.d. from an unknown distribution. Motivated by models of natural language, we consider the regime in which the length of the observed string and the size of the underlying alphabet are comparably large. In this regime, the maximum likelihood distribution tends to overestimate the probability of the observed letters, so the Good-Turing probability estimator is typically used instead. We show that when used to estimate the sequence probability, the Good-Turing estimator is not consistent in this regime. We then introduce a novel sequence probability estimator that is consistent. This estimator also yields consistent estimators for other quantities of interest and a consistent universal classifier.
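The Good-Turing estimator discussed above can be made concrete. The following is a minimal sketch (the function name and the example are mine, not from the paper): a symbol seen r times is assigned probability (r+1)·N_{r+1}/(N_r·n), where N_r is the number of distinct symbols seen exactly r times, and mass N_1/n is reserved for unseen symbols. Note that in this raw form the estimate collapses to zero for the highest counts, which is why practical variants smooth the N_r values.

```python
from collections import Counter

def good_turing_estimates(sample):
    """Raw Good-Turing estimates for an i.i.d. sample.

    A symbol observed r times gets probability (r + 1) * N_{r+1} / (N_r * n),
    where N_r is the number of distinct symbols observed exactly r times and
    n is the sample length.  Total mass N_1 / n is left for unseen symbols.
    """
    n = len(sample)
    counts = Counter(sample)          # symbol -> r
    n_r = Counter(counts.values())    # r -> N_r
    estimates = {
        sym: (r + 1) * n_r.get(r + 1, 0) / (n_r[r] * n)
        for sym, r in counts.items()
    }
    unseen_mass = n_r.get(1, 0) / n
    return estimates, unseen_mass

# "abracadabra": a appears 5 times, b and r twice, c and d once.
est, p_unseen = good_turing_estimates("abracadabra")
```

Here N_1 = 2 (the singletons c and d), so 2/11 of the mass goes to unseen symbols, while the most frequent symbol a gets a zero raw estimate because N_6 = 0.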
Lossy Compression Of Individual Signals Based On String Matching And One Pass Codebook Design
 In Proceedings ICASSP
Abstract

Cited by 5 (1 self)
This paper describes an effort to extend the Lempel-Ziv algorithm to a practical universal lossy compression algorithm. It is based on the idea of approximate string matching with a rate-distortion (R-D) criterion and is addressed within the framework of vector quantization (VQ) [4]. A practical one-pass algorithm for VQ codebook construction and adaptation for individual signals is developed, which assumes no prior knowledge of the source statistics and involves no iteration. We call this technique rate-distortion Lempel-Ziv (RDLZ). As in the case of the Lempel-Ziv algorithm, the encoded bit stream consists of codebook (dictionary) updates as well as indices (pointers) to the codebook. The idea of "trading" bits for distortion in modifying the codebook is introduced. Experimental results show that, for Gaussian sources as well as real images, RDLZ performs comparably, and sometimes favorably, to a static-codebook VQ trained on the corresponding sources or images.
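The update-or-point mechanism described above can be illustrated with a toy one-pass scheme. This is a drastic simplification under my own assumptions (exhaustive nearest-neighbor search, a fixed squared-error threshold `dmax`, and no bit accounting), not the RDLZ algorithm itself: a vector within the distortion budget is encoded as a pointer into the codebook, otherwise bits are "spent" on a codebook update.

```python
def one_pass_codebook(vectors, dmax):
    """Toy one-pass codebook construction (hypothetical simplification of
    the update/pointer idea): encode each input vector by its nearest
    codeword if the squared distortion is within dmax, otherwise append
    the vector to the codebook and emit it as an update."""
    codebook = []
    stream = []
    for v in vectors:
        best, best_d = None, float("inf")
        for i, c in enumerate(codebook):
            d = sum((a - b) ** 2 for a, b in zip(v, c))
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= dmax:
            stream.append(("index", best))   # cheap: pointer into codebook
        else:
            codebook.append(v)               # expensive: dictionary update
            stream.append(("update", v))
    return codebook, stream

cb, stream = one_pass_codebook(
    [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 0)], dmax=0.25
)
```

Raising `dmax` trades distortion for fewer updates, which is the bits-for-distortion trade the abstract alludes to.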
Universal Source Coding for Monotonic and Fast Decaying Monotonic Distributions
, 2007
Abstract

Cited by 5 (0 self)
We study universal compression of sequences generated by monotonic distributions. We show that for a monotonic distribution over an alphabet of size k, each probability parameter costs essentially 0.5 log(n/k^3) bits, where n is the coded sequence length, as long as k = o(n^{1/3}). Otherwise, for k = O(n), the total average sequence redundancy is O(n^{1/3+ε}) bits overall. We then show that there exists a subclass of monotonic distributions over infinite alphabets for which redundancy of O(n^{1/3+ε}) bits overall is still achievable. This class contains fast-decaying distributions, including many distributions over the integers and geometric distributions. For some slower decays, including other distributions over the integers, redundancy of o(n) bits overall is achievable, and a method to compute specific redundancy rates for such distributions is derived. The results hold specifically for finite-entropy monotonic distributions. Finally, we study individual-sequence redundancy behavior assuming a sequence is governed by a monotonic distribution. We show that for sequences whose empirical distributions are monotonic, individual redundancy bounds similar to those in the average case can be obtained. However, even if monotonicity of the empirical distribution is violated, diminishing per-symbol individual-sequence redundancies with respect to the monotonic maximum-likelihood description length may still be achievable.
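A quick numerical reading of the 0.5 log(n/k^3) figure (the function name is mine, and this only illustrates the stated asymptotic, not an exact bound):

```python
import math

def monotonic_redundancy_bits(n, k):
    """Approximate total redundancy, in bits, for coding a length-n
    sequence from a monotonic distribution over k symbols in the regime
    k = o(n**(1/3)): about 0.5 * log2(n / k**3) bits for each of the k
    probability parameters.  A numerical illustration only."""
    assert k ** 3 < n, "illustration applies when k**3 << n"
    return k * 0.5 * math.log2(n / k ** 3)

# k = 100 symbols, n = 10^9: roughly 5 bits per parameter, ~500 bits total.
total_bits = monotonic_redundancy_bits(10**9, 100)
```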
Universal compression of Markov and related sources over arbitrary alphabets
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 2006
Abstract

Cited by 4 (2 self)
Recent work has considered encoding a string by separately conveying its symbols and its pattern, that is, the order in which the symbols appear. It was shown that the patterns of i.i.d. strings can be losslessly compressed with diminishing per-symbol redundancy. In this paper, the pattern redundancy of distributions with memory is considered. Close lower and upper bounds are established on the pattern redundancy of strings generated by hidden Markov models with a small number of states, showing in particular that their per-symbol pattern redundancy diminishes with increasing string length. The upper bounds are obtained by analyzing the growth rate of the number of multidimensional integer partitions, and the lower bounds by using Hayman's theorem.
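The notion of a pattern used above is easy to make concrete (a minimal sketch; the function name is mine):

```python
def pattern(s):
    """Pattern of a string: each symbol is replaced by the 1-based index
    of its first appearance.  The pattern abstracts away the symbol
    identities and keeps only the order in which new symbols arrive."""
    first_seen = {}
    out = []
    for ch in s:
        if ch not in first_seen:
            first_seen[ch] = len(first_seen) + 1
        out.append(first_seen[ch])
    return out
```

For example, "abracadabra" and "obrocodobro" share the pattern 1 2 3 1 4 1 5 1 2 3 1, which is why the pattern and the symbol dictionary can be conveyed separately.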
Minimax Pointwise Redundancy for Memoryless Models over Large Alphabets
Abstract

Cited by 4 (0 self)
We study the minimax pointwise redundancy of universal coding for memoryless models over large alphabets and present two main results. First, we complete studies initiated by Orlitsky and Santhanam [15], deriving precise asymptotics of the minimax pointwise redundancy for all ranges of the alphabet size relative to the sequence length. Second, we consider the pointwise minimax redundancy for a family of models in which some symbol probabilities are fixed. The latter problem leads to a binomial sum for functions with super-polynomial growth. Our findings can be used to approximate numerically the minimax pointwise redundancy for various ranges of the sequence length and the alphabet size. These results are obtained by analytic techniques such as tree-like generating functions and the saddle point method.
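The quantity being analyzed is the logarithm of the Shtarkov sum, which can be computed exactly by brute force for tiny instances (function name mine; the paper's asymptotics describe precisely the regime this brute force cannot reach):

```python
import itertools
import math

def shtarkov_redundancy_bits(n, k):
    """Pointwise minimax (Shtarkov) redundancy, in bits, for memoryless
    sources over a k-ary alphabet at block length n: log2 of the sum,
    over all k**n sequences, of each sequence's maximum likelihood,
    attained by its own empirical distribution.
    Exponential-time brute force; tiny n and k only."""
    total = 0.0
    for x in itertools.product(range(k), repeat=n):
        counts = [x.count(a) for a in range(k)]
        total += math.prod((c / n) ** c for c in counts if c > 0)
    return math.log2(total)
```

For n = 2, k = 2 the four sequences have maximum likelihoods 1, 1/4, 1/4, 1, so the redundancy is log2(2.5) ≈ 1.32 bits.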
About Adaptive Coding on Countable Alphabets
, 2012
Abstract

Cited by 4 (1 self)
This paper sheds light on universal coding with respect to classes of memoryless sources over a countable alphabet defined by an envelope function with finite and non-decreasing hazard rate. We prove that the auto-censuring (AC) code introduced by Bontemps (2011) is adaptive with respect to the collection of such classes. The analysis builds on the tight characterization of universal redundancy rates in terms of metric entropy by Haussler and Opper (1997) and on a careful analysis of the performance of the AC-coding algorithm. The latter relies on non-asymptotic bounds for maxima of samples from discrete distributions with finite and non-decreasing hazard rate.
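The hazard-rate condition is simple to check numerically. A sketch under my own assumptions (truncated support, function name mine): the discrete hazard rate is h(k) = p(k)/P(X ≥ k), and for a geometric distribution it is essentially constant, hence non-decreasing, so geometric envelopes fall inside the class discussed above.

```python
def hazard_rates(pmf, support):
    """Discrete hazard rates h(k) = p(k) / P(X >= k) over a (truncated)
    support.  The envelope classes above require h to be finite and
    non-decreasing."""
    tail = sum(pmf(k) for k in support)   # P(X >= first support point)
    rates = []
    for k in support:
        rates.append(pmf(k) / tail)
        tail -= pmf(k)
    return rates

# Geometric with parameter q = 0.5: p(k) = 0.5 * 0.5**k, k = 0, 1, ...
rates = hazard_rates(lambda k: 0.5 * 0.5 ** k, range(20))
```

On the truncated support the computed rates start at essentially 0.5 and never decrease, as required.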
Universal Compression of Ergodic Quantum Sources
, 2003
Abstract

Cited by 3 (0 self)
1) For a real r > 0, let F(r) be the family of all stationary ergodic quantum sources with von Neumann entropy rate less than r. We prove that, for any r > 0, there exists a blind, source-independent block compression scheme which compresses every source from F(r) to rn qubits per input block of size n with arbitrarily high fidelity for all large enough n. 2) We show that the stationarity and the ergodicity of a quantum source {ρ_m}_{m=1}^∞ are preserved by any trace-preserving completely positive linear map of the tensor-product form E^{⊗m}, where a copy of E acts locally on each spin lattice site. We also establish ergodicity criteria for so-called classically correlated quantum sources.
What risks lead to ruin
Abstract

Cited by 3 (3 self)
Insurance transfers losses associated with risks to the insurer for a price, the premium. Considering a natural probabilistic framework for the insurance problem, we derive a necessary and sufficient condition on loss models such that the insurer remains solvent despite the losses taken on. In particular, there need not be any upper bound on the loss; rather, it is the structure of the model space that decides insurability. Insurance is a way of managing losses associated with risks (for example, floods, network outages, and earthquakes), primarily by transferring risk to another entity, the insurer, for a price, the premium. The insurer attempts to break even by balancing the possible loss that may be suffered by a few (risk) with the guaranteed payments of many (premium).
On Universal Coding of Unordered Data
Abstract

Cited by 3 (1 self)
There are several applications in information transfer and storage where the order of source letters is irrelevant at the destination. For these source-destination pairs, multiset communication, rather than the more difficult task of sequence communication, may be performed. In this work, we study universal multiset communication. For classes of countable-alphabet sources that meet Kieffer's condition for sequence communication, we present a scheme that universally achieves a rate of n + o(n) bits per multiset letter for multiset communication. We also define redundancy measures that are normalized by the logarithm of the multiset size rather than per multiset letter, and show that these redundancy measures cannot be driven to zero for the class of finite-alphabet memoryless multisets. This further implies that finite-alphabet memoryless multisets cannot be encoded universally with vanishing fractional redundancy.
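The gap between sequence and multiset communication comes from the orderings a multiset erases, and its size is easy to quantify (a minimal sketch; the function name is mine):

```python
import math
from collections import Counter

def multiset_savings_bits(seq):
    """Bits saved by conveying only the multiset of seq rather than seq
    itself: log2 of the number of distinct orderings of the multiset,
    n! / (m_1! * m_2! * ...), where m_i are the symbol multiplicities."""
    counts = Counter(seq)
    orderings = math.factorial(len(seq))
    for m in counts.values():
        orderings //= math.factorial(m)
    return math.log2(orderings)
```

For instance, the multiset {a, a, b} has three orderings, so naming the multiset instead of the sequence saves log2(3) ≈ 1.58 bits; a constant sequence saves nothing.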