## Fifty Years of Shannon Theory (1998)

12423 |
Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context ...rase not seen previously) can be encoded and decoded very easily. 17 Remarkably, the Lempel–Ziv algorithm encodes any stationary ergodic source at its entropy rate as shown by Ziv [100] and Wyner–Ziv =-=[101]-=-, [102]. The analysis of the statistical properties of the Lempel–Ziv algorithm has proven to be a fertile research ground [98], [103]–[108]. Despite its optimality and simplicity, the Lempel–Ziv algo... |

1522 | A universal algorithm for sequential data compression. - Ziv, Lempel - 1977 |

1372 |
Theory of communications
- Gabor
- 1946
Citation Context ...limited linear systems. Unbeknownst to those authors, E. Whittaker [7] (1915) and J. Whittaker [8] (1929) had found how to interpolate losslessly the sampled values of bandlimited functions. D. Gabor =-=[9]-=- (1946) realized the importance of the duration–bandwidth product and proposed a time–frequency uncertainty principle. R. Hartley’s 1928 paper [10] uses terms such as “rate of communication,” “intersy... |

1371 |
A method for the construction of minimum-redundancy codes
- Huffman
- 1952
Citation Context ...) the construction of a minimum average-length code, and 2) the converse variable-length source coding theorem. The variable-length source code that minimizes average length was obtained by D. Huffman =-=[48]-=-, as an outgrowth of a homework problem assigned in R. Fano’s MIT information theory class [49]. The practicality of the Huffman code has withstood the test of time with a myriad applications ranging ... |
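
The minimum average-length construction credited to Huffman in the context above can be sketched in a few lines. This is a minimal illustration, not code from the cited paper: the function names `huffman_code_lengths` and `average_length` and the example distribution are assumptions made here, and the sketch only tracks codeword lengths for a binary code alphabet.

```python
import heapq
from itertools import count


def huffman_code_lengths(probs):
    """Return {symbol: codeword length} for a binary Huffman code.

    `probs` maps each symbol to its probability (hypothetical input format).
    """
    if len(probs) == 1:
        # A single symbol still needs one bit to be transmitted.
        return {next(iter(probs)): 1}
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside
        # them moves one level deeper in the code tree.
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths


def average_length(probs, lengths):
    """Expected codeword length in bits per source symbol."""
    return sum(probs[s] * lengths[s] for s in probs)


if __name__ == "__main__":
    example = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    lengths = huffman_code_lengths(example)
    print(lengths, average_length(example, lengths))
```

For the dyadic distribution in the example, the average length equals the source entropy (1.75 bits), which no uniquely decodable code can beat.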

809 | A Block-Sorting Lossless Data Compression Algorithm. Digital Systems Research Center, Research Report 124
- Burrows, Wheeler
- 1994
Citation Context ...ces, lower encoding/decoding complexity can be achieved by the adaptive fixed-to-variable source codes of B. Ryabko [89], [90]. 16 Showing experimental promise, the nonprobabilistic sorting method of =-=[93]-=- preprocesses sources with memory so that universal codes for memoryless sources achieve good compression efficiency. Suppose now that we adopt a parametric description of the source uncertainty, say ... |

800 |
Arithmetic Coding for Data Compression.
- Witten, Neal, et al.
- 1987
Citation Context ...o apply to the Shannon–Fano code mentioned in Section II-A. The second shortcoming is circumvented by the arithmetic coding method of J. Rissanen [60] (generalized in [61] and [62] and popularized in =-=[63]-=-), whose philosophy is related to that of the Shannon–Fano code. 14 The use of arithmetic coding is now widespread in the data-compression industry (and, in particular, in image and video applications ... |

712 |
Three approaches to the quantitative definition of information
- Kolmogorov
- 1968
Citation Context ... of runlength encoding [78], already anticipated by Shannon [1], [79], as well as several of the universal coding techniques discussed in the next subsection. E. Universal Source Coding A. Kolmogorov =-=[80]-=- coined the term “universal” to refer to data-compression algorithms that do not know a priori the distribution of the source. Since exact statistical knowledge of the source is the exception rather t... |

704 |
Extrapolation, Interpolation and Smoothing of Stationary Time Series
- Wiener
- 1949
Citation Context ...s of noise, nor had they modeled sources of information probabilistically. Much of the credit for importing random processes into the toolbox of the 1940’s communications engineer is due to N. Wiener =-=[11]-=- 1 and to S. Rice [12]. Probabilistic modeling of information sources has in fact a very long history as a result of its usefulness in cryptography. As early as 1380 and 1658, tables of frequencies of... |

640 |
A mathematical theory of communication, Bell Syst
- Shannon
- 1948
Citation Context ...ommunication. Index Terms— Channel capacity, data compression, entropy, history of Information Theory, reliable communication, source coding. CLAUDE Shannon’s “A mathematical theory of communication” =-=[1]-=- published in July and October of 1948 is the Magna Carta of the information age. Shannon’s discovery of the fundamental laws of data compression and transmission marks the birth of Information Theory... |

562 |
A technique for high performance data compression.
- Welch
- 1984
Citation Context ... and has led to the universal optimal method of F. Willems, Y. Starkov, and T. 16 Rediscovered in [91] and [92]. 17 Practical issues on the implementation of the Lempel–Ziv algorithm are addressed in =-=[99]-=-. 18 Reference [77] gives a survey of the interplay between delay and redundancy for universal source coding with various knowledge of the statistics of the source. Tjalkens [110]. The method of [110]... |

508 |
Three models for the description of language.
- Chomsky
- 1956
Citation Context ... 3 (and, thus, the achievability part of the source coding theorem) applies to stationary Markov chain sources. In 1953, a step-by-step proof of the generalization of Shannon’s 6 A view challenged in =-=[30]-=- by N. Chomsky, the father of modern linguistics. Theorem 3 to Markov chains was given by A. Khinchin in the first Russian article on information theory [31]. In 1953, B. McMillan [32] used the statis... |

394 | The minimum description length principle in coding and modeling
- Barron, Rissanen, et al.
- 1998
Citation Context ...nce and is the length of the parameter string. The relevance of the information-theoretic MDL principle transcends data compression and is now established as a major approach in statistical inference =-=[95]-=-. The most widely used universal source-coding method is the algorithm introduced by A. Lempel and J. Ziv in slightly different versions in 1976–1978 [96]–[98]. Unlike the methods mentioned so far in ... |

345 |
Universal coding, information, prediction, and estimation.
- Rissanen
- 1984
Citation Context ...we estimate the distribution (i.e., the more complex the model) the more efficiently we can compress the source, but also the longer it takes to describe the parameter string to the decoder. Rissanen =-=[94]-=- showed that there are fundamental reasons to choose the minimum description length (MDL) criterion for model selection. According to the MDL principle, the parameter string is chosen to minimize the ... |

339 |
Certain Topics in Telegraph Transmission Theory
- Nyquist
- 1928
Citation Context ...on. Furthermore, he posed the question of how much improvement in telegraphy transmission rate could be achieved by replacing the Morse code by an “optimum” code. K. Küpfmüller [4] (1924), H. Nyquist =-=[5]-=- (1928), and V. Kotel’nikov [6] (1933) studied the maximum telegraph signaling speed sustainable by bandlimited linear systems. Unbeknownst to those authors, E. Whittaker [7] (1915) and J. Whittaker [... |

332 | Entropy and Information Theory.
- Gray
- 1990
Citation Context ... the test of time with a myriad applications ranging from facsimile [50] to high-definition television [51]. 7 Tutorials on the interplay between information theory and ergodic theory can be found in =-=[35]-=-–[37]. 8 General coding theorems for nonstationary/nonergodic sources can be found in [46]. [p. 2060, IEEE Transactions on Information Theory, vol. 44, no. 6, October 1998] No formula is known for the minim... |

305 |
Run-length encoding.
- Golomb
- 1966
Citation Context ...ariable codes [75], [76]. Although variable-to-variable source coding has not received as much attention as the other techniques (cf. [77]), it encompasses the popular technique of runlength encoding =-=[78]-=-, already anticipated by Shannon [1], [79], as well as several of the universal coding techniques discussed in the next subsection. E. Universal Source Coding A. Kolmogorov [80] coined the term “unive... |

266 | An Introduction to Arithmetic Coding.
- Langdon
- 1984
Citation Context ... by the Huffman code also apply to the Shannon–Fano code mentioned in Section II-A. The second shortcoming is circumvented by the arithmetic coding method of J. Rissanen [60] (generalized in [61] and =-=[62]-=- and popularized in [63]), whose philosophy is related to that of the Shannon–Fano code. 14 The use of arithmetic coding is now widespread in the data-compression industry (and, in particular, in image... |

205 |
Information Theory and Coding
- Abramson
- 1963
Citation Context ...r approaches the entropy rate hyperbolically in the blocklength [59]. 14 The Shannon–Fano code is frequently referred to as the Shannon–Fano–Elias code, and the arithmetic coding methods described in =-=[64]-=- and [65] are attributed to P. Elias therein. Those attributions are unfounded [66]. In addition to [1], other contributions relevant to the development of modern arithmetic coding are [67] and [68]. ... |

173 | A locally adaptive data compression scheme
- Bentley, Sleator, et al.
- 1986
Citation Context ... in this direction has its roots in the finite-memory “context-tree” model introduced by Rissanen [109] and has led to the universal optimal method of F. Willems, Y. Starkov, and T. 16 Rediscovered in =-=[91]-=- and [92]. 17 Practical issues on the implementation of the Lempel–Ziv algorithm are addressed in [99]. 18 Reference [77] gives a survey of the interplay between delay and redundancy for universal sou... |

171 | Approximation theory of output statistics
- Han, Verdú
- 1993
Citation Context ...n television [51]. 7 Tutorials on the interplay between information theory and ergodic theory can be found in [35]–[37]. 8 General coding theorems for nonstationary/nonergodic sources can be found in =-=[46]-=-. No formula is known for the minimum average length in terms of the distribution of the source. In [1], Shannon showed that ... |

171 |
The performance of universal encoding
- Krichevsky, Trofimov
- 1981
Citation Context ... been shown to have certain performance advantages over fixed-to-variable codes [75], [76]. Although variable-to-variable source coding has not received as much attention as the other techniques (cf. =-=[77]-=-), it encompasses the popular technique of runlength encoding [78], already anticipated by Shannon [1], [79], as well as several of the universal coding techniques discussed in the next subsection. E.... |

146 |
Variations on a theme by Huffman
- Gallager
- 1978
Citation Context ...g is due to its rational exploitation of source memory by using the conditional probability of the next symbol to be encoded given the observed past. 9 Tighter distribution-dependent bounds are known =-=[52]-=-, [53]. 10 Kraft [54] credits the derivation of the inequality to R. M. Redheffer, who would later coauthor the well-known undergraduate text [55]. 11 Minimum average-length source-coding problems hav... |

144 |
Generalized Kraft Inequality and Arithmetic Coding.
- Rissanen
- 1976
Citation Context ... Both difficulties encountered by the Huffman code also apply to the Shannon–Fano code mentioned in Section II-A. The second shortcoming is circumvented by the arithmetic coding method of J. Rissanen =-=[60]-=- (generalized in [61] and [62] and popularized in [63]), whose philosophy is related to that of the Shannon–Fano code. 14 The use of arithmetic coding is now widespread in the data-compression industry... |

123 |
Dynamic Huffman coding
- Knuth
- 1985
Citation Context ...dapt to it. The same is true for the decoder because its output is a lossless reconstruction of the source sequence. Adaptive Huffman coding was initially considered in [86] and [52], and modified in =-=[87]-=- and [88]. For 15 cf. Section III-G. [Verdú: Fifty Years of Shannon Theory, p. 2061] large-alphabet sources, lower encoding/decoding complexity can be achieved by the adaptive fixed-to-variable source codes... |

113 |
Universal noiseless coding.
- Davisson
- 1973
Citation Context ...to-variable [81] and variable-to-fixed [82] coding. If the uncertainty on the source distribution can be modeled by a class of distributions, it was shown by B. Fitingof in [83] and by L. Davisson in =-=[84]-=- that for some uncertainty classes there is no asymptotic loss of compression efficiency if we use a source code tuned to the “center of gravity” of the uncertainty set. Constructive methods for vario... |

99 |
On the functions which are represented by the expansion of interpolating theory
- Whittaker
- 1915
Citation Context ...er [4] (1924), H. Nyquist [5] (1928), and V. Kotel’nikov [6] (1933) studied the maximum telegraph signaling speed sustainable by bandlimited linear systems. Unbeknownst to those authors, E. Whittaker =-=[7]-=- (1915) and J. Whittaker [8] (1929) had found how to interpolate losslessly the sampled values of bandlimited functions. D. Gabor [9] (1946) realized the importance of the duration–bandwidth product a... |

94 |
Mathematical analysis of random noise, Bell Syst
- Rice
- 1944
Citation Context ...ey modeled sources of information probabilistically. Much of the credit for importing random processes into the toolbox of the 1940’s communications engineer is due to N. Wiener [11] 1 and to S. Rice =-=[12]-=-. Probabilistic modeling of information sources has in fact a very long history as a result of its usefulness in cryptography. As early as 1380 and 1658, tables of frequencies of letters and pairs of ... |

80 |
Bernoulli shifts with the same entropy are isomorphic
- Ornstein
- 1970
Citation Context ...on theory was made evident by McMillan in 1953, the key role that entropy plays in ergodic theory was revealed by A. Kolmogorov [33] in 1958 and would eventually culminate in D. Ornstein’s 1970 proof =-=[34]-=- of one of the pillars of modern ergodic theory: the isomorphy theorem. 7 Shannon’s Theorem 3 states that the normalized log-probability of the source string converges in probability as its length goes... |

73 |
A Device for Quantizing, Grouping, and Coding Amplitude Modulated Pulses. MSc thesis
- Kraft
- 1949
Citation Context ...bit, 9 but he did not give a lower bound. Before Huffman, another MIT student, L. Kraft, had attacked the construction of minimum redundancy codes unsuccessfully. However, in his 1949 Master’s thesis =-=[54]-=-, Kraft gave a basic condition (known as the Kraft inequality) that must be satisfied by the codeword lengths of a prefix code (i.e., a code where no codeword is the prefix of another). 10 Seven years... |
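
The condition described in the context above, that the codeword lengths of a prefix code must satisfy the Kraft inequality (the sum of D^(-length) over all codewords is at most 1 for a code alphabet of size D), is easy to check numerically. The following is a minimal sketch under stated assumptions: the helper names `kraft_sum`, `satisfies_kraft`, and `is_prefix_free` are hypothetical and do not come from the cited thesis.

```python
from fractions import Fraction


def kraft_sum(lengths, alphabet_size=2):
    """Exact value of sum(D ** -l) over the given codeword lengths."""
    return sum(Fraction(1, alphabet_size ** l) for l in lengths)


def satisfies_kraft(lengths, alphabet_size=2):
    """True if the lengths could belong to some prefix code."""
    return kraft_sum(lengths, alphabet_size) <= 1


def is_prefix_free(codewords):
    """True if no codeword is a prefix of another.

    After sorting, a codeword that is a prefix of any other codeword
    is also a prefix of its immediate successor, so checking adjacent
    pairs is enough.
    """
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


if __name__ == "__main__":
    code = ["0", "10", "110", "111"]  # illustrative binary prefix code
    print(is_prefix_free(code), kraft_sum(len(w) for w in code))
```

Exact rational arithmetic via `Fraction` avoids rounding problems when the sum sits right at the boundary value of 1.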

68 |
Source coding algorithms for fast data compression
- Pasco
- 1976
Citation Context ...countered by the Huffman code also apply to the Shannon–Fano code mentioned in Section II-A. The second shortcoming is circumvented by the arithmetic coding method of J. Rissanen [60] (generalized in =-=[61]-=- and [62] and popularized in [63]), whose philosophy is related to that of the Shannon–Fano code. 14 The use of arithmetic coding is now widespread in the data-compression industry (and, in particular,... |

67 |
Certain factors affecting telegraph speed
- Nyquist
- 1924
Citation Context ...unication system used to transmit analog continuous-time signals; d) at the expense of reduced fidelity, the bandwidth used by the Vocoder [2] was less than the message bandwidth. In 1924, H. Nyquist =-=[3]-=- argued that the transmission rate is proportional to the logarithm of the number of signal levels in a unit duration. Furthermore, he posed the question of how much improvement in telegraphy transmis... |

63 |
New metric invariants of transitive dynamical systems and automorphisms of Lebesgue spaces
- Kolmogorov
- 1958
Citation Context ...theorem. While the fundamental importance of ergodic theory to information theory was made evident by McMillan in 1953, the key role that entropy plays in ergodic theory was revealed by A. Kolmogorov =-=[33]-=- in 1958 and would eventually culminate in D. Ornstein’s 1970 proof [34] of one of the pillars of modern ergodic theory: the isomorphy theorem. 7 Shannon’s Theorem 3 states that the normalized log-prob... |

62 | Code and Parse Trees for Lossless Source Encoding
- Abrahams
- 1998
Citation Context ...th source-coding problems have been solved with additional constraints such as unequal symbol lengths, infinite alphabets, lexicographic ordering of encoded strings, maximum codeword length, etc. See =-=[58]-=- for a recent survey. 12 As a result of its emphasis on asymptotic stationary settings, Shannon theory has not been engulfed in the Bayesian/non-Bayesian schism that has plagued the field of statistic... |

59 |
The individual ergodic theorem for information theory.
- Breiman
- 1957
Citation Context ...the source string converges in probability as its length goes to infinity. Although this is enough for most lossless source coding theorems of interest, almost-sure convergence also holds as shown in =-=[38]-=- and (with a simpler proof) in [39]. Generalizations of the Shannon–McMillan theorem to continuous-valued random processes and to other functionals of interest in information theory have been accompli... |

59 |
Probabilistic Information Theory
- Jelinek
- 1968
Citation Context ...hes the entropy rate hyperbolically in the blocklength [59]. 14 The Shannon–Fano code is frequently referred to as the Shannon–Fano–Elias code, and the arithmetic coding methods described in [64] and =-=[65]-=- are attributed to P. Elias therein. Those attributions are unfounded [66]. In addition to [1], other contributions relevant to the development of modern arithmetic coding are [67] and [68]. D. Variab... |

59 |
Synthesis of Noiseless Compression Codes.
- Tunstall
- 1967
Citation Context ...ce into consecutive variable-length phrases. In variable-to-fixed source coding, those phrases belong to a predetermined fixed-size dictionary. Given the size of the dictionary, the Tunstall algorithm =-=[70]-=- selects its entries optimally under the condition that no phrase is the prefix of another and that every source sequence has a prefix in the dictionary. For memoryless sources, the Tunstall algorithm... |

58 |
Enumerative Source Coding
- Cover
- 1973
Citation Context ...n [64] and [65] are attributed to P. Elias therein. Those attributions are unfounded [66]. In addition to [1], other contributions relevant to the development of modern arithmetic coding are [67] and =-=[68]-=-. D. Variable-to-Fixed Source Coding So far we have considered data-compression methods whereby fixed-size blocks of source symbols are encoded into either variable-length or fixed-length strings. The... |

56 |
The basic theorems of information theory.
- McMillan
- 1953
Citation Context ...w challenged in [30] by N. Chomsky, the father of modern linguistics. Theorem 3 to Markov chains was given by A. Khinchin in the first Russian article on information theory [31]. In 1953, B. McMillan =-=[32]-=- used the statistical-mechanics phrase “asymptotic equipartition property” (AEP) to describe the typicality property of Shannon’s Theorem 3: the set of atypical sequences has vanishing probability. Mo... |

55 |
Two inequalities implied by unique decipherability
- McMillan
- 1956
Citation Context ...ty) that must be satisfied by the codeword lengths of a prefix code (i.e., a code where no codeword is the prefix of another). 10 Seven years later, and apparently unaware of Kraft’s thesis, McMillan =-=[56]-=- showed that that condition must hold not just for prefix codes but for any uniquely decodable code. (A particularly simple proof was given in [57].) It is immediate to show (McMillan [56] attributes ... |

53 | The strong ergodic theorem for densities: Generalized Shannon-McMillan-Breiman theorem - Barron - 1985 |

51 |
An adaptive system for data compression
- Faller
- 1973
Citation Context ...” the source distribution and adapt to it. The same is true for the decoder because its output is a lossless reconstruction of the source sequence. Adaptive Huffman coding was initially considered in =-=[86]-=- and [52], and modified in [87] and [88]. For 15 cf. Section III-G. large-alphabet sources, lower encoding/decoding complexity can be achieved by the adaptive... |

47 | Coding theorems for individual sequences
- Ziv
- 1978
Citation Context ...e is the shortest phrase not seen previously) can be encoded and decoded very easily. 17 Remarkably, the Lempel–Ziv algorithm encodes any stationary ergodic source at its entropy rate as shown by Ziv =-=[100]-=- and Wyner–Ziv [101], [102]. The analysis of the statistical properties of the Lempel–Ziv algorithm has proven to be a fertile research ground [98], [103]–[108]. Despite its optimality and simplicity,... |

43 | Interval and recency Rank source coding: two on-line adaptive variable-length schemes
- Elias
Citation Context ...direction has its roots in the finite-memory “context-tree” model introduced by Rissanen [109] and has led to the universal optimal method of F. Willems, Y. Starkov, and T. 16 Rediscovered in [91] and =-=[92]-=-. 17 Practical issues on the implementation of the Lempel–Ziv algorithm are addressed in [99]. 18 Reference [77] gives a survey of the interplay between delay and redundancy for universal source codin... |

39 |
A sandwich proof of the Shannon-McMillan-Breiman theorem.
- Algoet, Cover
- 1988
Citation Context ...ability as its length goes to infinity. Although this is enough for most lossless source coding theorems of interest, almost-sure convergence also holds as shown in [38] and (with a simpler proof) in =-=[39]-=-. Generalizations of the Shannon–McMillan theorem to continuous-valued random processes and to other functionals of interest in information theory have been accomplished in [40]–[45]. Sources that are... |

36 |
International digital facsimile coding standards, Proc
- Hunter, Robinson
- 1980
Citation Context ...th of a homework problem assigned in R. Fano’s MIT information theory class [49]. The practicality of the Huffman code has withstood the test of time with a myriad applications ranging from facsimile =-=[50]-=- to high-definition television [51]. 7 Tutorials on the interplay between information theory and ergodic theory can be found in [35]–[37]. 8 General coding theorems for nonstationary/nonergodic source... |

33 |
The sliding-window Lempel-Ziv algorithm is asymptotically optimal
- Wyner, Ziv
- 1994
Citation Context ...t seen previously) can be encoded and decoded very easily. 17 Remarkably, the Lempel–Ziv algorithm encodes any stationary ergodic source at its entropy rate as shown by Ziv [100] and Wyner–Ziv [101], =-=[102]-=-. The analysis of the statistical properties of the Lempel–Ziv algorithm has proven to be a fertile research ground [98], [103]–[108]. Despite its optimality and simplicity, the Lempel–Ziv algorithm i... |

30 |
The Fourier theory of the cardinal functions
- Whittaker
- 1929
Citation Context ...] (1928), and V. Kotel’nikov [6] (1933) studied the maximum telegraph signaling speed sustainable by bandlimited linear systems. Unbeknownst to those authors, E. Whittaker [7] (1915) and J. Whittaker =-=[8]-=- (1929) had found how to interpolate losslessly the sampled values of bandlimited functions. D. Gabor [9] (1946) realized the importance of the duration–bandwidth product and proposed a time–frequency... |

28 |
Codes based on inaccurate source probabilities
- Gilbert
- 1971
Citation Context ...ncy. For memoryless sources, the increase in rate for compressing assuming distribution Q when the true source distribution is P is equal to the divergence 15 of P with respect to Q for both fixed-to-variable =-=[81]-=- and variable-to-fixed [82] coding. If the uncertainty on the source distribution can be modeled by a class of distributions, it was shown by B. Fitingof in [83] and by L. Davisson in [84] that for so... |

24 |
Transmission of information, Bell Syst
- Hartley
- 1928
Citation Context ...the sampled values of bandlimited functions. D. Gabor [9] (1946) realized the importance of the duration–bandwidth product and proposed a time–frequency uncertainty principle. R. Hartley’s 1928 paper =-=[10]-=- uses terms such as “rate of communication,” “intersymbol interference,” and “capacity of a system to transmit information.” He summarizes his main accomplishment as the point of view developed is use... |

24 |
Generalized Tunstall codes for sources with memory
- Savari, Gallager
- 1997
Citation Context .... Further results on the behavior of the Tunstall algorithm for memoryless sources have been obtained in [71] and [72]. For Markov sources, optimal variable-to-fixed codes have been found in [73] and =-=[74]-=-. Variable-to-fixed codes have been shown to have certain performance advantages over fixed-to-variable codes [75], [76]. Although variable-to-variable source coding has not received as much attention... |

22 |
On the transmission capacity of “ether” and wire
- Kotel’nikov
- 1933
Citation Context ...uestion of how much improvement in telegraphy transmission rate could be achieved by replacing the Morse code by an “optimum” code. K. Küpfmüller [4] (1924), H. Nyquist [5] (1928), and V. Kotel’nikov =-=[6]-=- (1933) studied the maximum telegraph signaling speed sustainable by bandlimited linear systems. Unbeknownst to those authors, E. Whittaker [7] (1915) and J. Whittaker [8] (1929) had found how to inte... |

22 | The interactions between ergodic theory and information theory, Information Theory
- Shields
- 1998
Citation Context ...test of time with a myriad applications ranging from facsimile [50] to high-definition television [51]. 7 Tutorials on the interplay between information theory and ergodic theory can be found in [35]–=-=[37]-=-. 8 General coding theorems for nonstationary/nonergodic sources can be found in [46]. No formula is known for the minimum av... |

21 |
Universal Compression and Retrieval
- Krichevsky
- 1994
Citation Context ...e Bayesian/non-Bayesian schism that has plagued the field of statistics. 13 For most Markov sources the minimum average length per letter approaches the entropy rate hyperbolically in the blocklength =-=[59]-=-. 14 The Shannon–Fano code is frequently referred to as the Shannon–Fano–Elias code, and the arithmetic coding methods described in [64] and [65] are attributed to P. Elias therein. Those attributions... |

21 |
Compression of individual sequences via variable-rate coding, Information Theory
- Ziv, Lempel
- 1978
Citation Context ... major approach in statistical inference [95]. The most widely used universal source-coding method is the algorithm introduced by A. Lempel and J. Ziv in slightly different versions in 1976–1978 [96]–=-=[98]-=-. Unlike the methods mentioned so far in this subsection, the Lempel–Ziv algorithm is not based on approximating or estimating the source distribution. Like variable-to-fixed source coding, Lempel–Ziv... |

19 | The Early Days of Information Theory - Pierce - 1973 |

19 |
Variable-length binary encodings, Bell Syst
- Gilbert, Moore
- 1959
Citation Context ...scribed in [64] and [65] are attributed to P. Elias therein. Those attributions are unfounded [66]. In addition to [1], other contributions relevant to the development of modern arithmetic coding are =-=[67]-=- and [68]. D. Variable-to-Fixed Source Coding So far we have considered data-compression methods whereby fixed-size blocks of source symbols are encoded into either variable-length or fixed-length str... |

18 |
Universal data compression and repetition times
- Willems
- 1989
Citation Context ...ic source at its entropy rate as shown by Ziv [100] and Wyner–Ziv [101], [102]. The analysis of the statistical properties of the Lempel–Ziv algorithm has proven to be a fertile research ground [98], =-=[103]-=-–[108]. Despite its optimality and simplicity, the Lempel–Ziv algorithm is not the end of the story in universal source coding. Prior knowledge of general structural properties of the source can be ex... |

17 |
Image and video coding standards
- Aravind, Cash, et al.
- 1993
Citation Context ...), whose philosophy is related to that of the Shannon–Fano code. 14 The use of arithmetic coding is now widespread in the data-compression industry (and, in particular, in image and video applications =-=[69]-=-). Much of the success of arithmetic coding is due to its rational exploitation of source memory by using the conditional probability of the next symbol to be encoded given the observed past. 9 Tighte... |

15 | Dynamic Huffman Coding
- Vitter
- 1989
Citation Context ...t. The same is true for the decoder because its output is a lossless reconstruction of the source sequence. Adaptive Huffman coding was initially considered in [86] and [52], and modified in [87] and =-=[88]-=-. For 15 cf. Section III-G. large-alphabet sources, lower encoding/decoding complexity can be achieved by the adaptive fixed-to-variable source codes of B. Ry... |

13 |
On the complexity of individual sequences
- Lempel, Ziv
- 1976
Citation Context ... as a major approach in statistical inference [95]. The most widely used universal source-coding method is the algorithm introduced by A. Lempel and J. Ziv in slightly different versions in 1976–1978 =-=[96]-=-–[98]. Unlike the methods mentioned so far in this subsection, the Lempel–Ziv algorithm is not based on approximating or estimating the source distribution. Like variable-to-fixed source coding, Lempe... |

11 |
Variable to fixed-length codes for Markov sources
- Tjalkens, Willems
- 1987
Citation Context ...d phrases. Further results on the behavior of the Tunstall algorithm for memoryless sources have been obtained in [71] and [72]. For Markov sources, optimal variable-to-fixed codes have been found in =-=[73]-=- and [74]. Variable-to-fixed codes have been shown to have certain performance advantages over fixed-to-variable codes [75], [76]. Although variable-to-variable source coding has not received as much ... |

9 |
Probability, likelihood, and quantity of information in the logic of uncertain inference
- Fisher
- 1933
Citation Context ...ndently of Shannon, the differential entropy [27] which he used in the context of Gaussian random variables. A distant relative of the differential entropy dating back to 1934 is Fisher’s information =-=[28]-=-, which gives a fundamental limit on the achievable mean-square error of parametric estimation. ... per source symbol is achievable provided we are willin... |

9 | The role of the asymptotic equipartition property in noiseless source coding
- Verdú, Han
- 1997
Citation Context ...[40]–[45]. Sources that are either nonstationary or nonergodic need not satisfy Theorem 3 8 ; that is, some sources require less than the entropy rate to be encoded, some require more. It is shown in =-=[47]-=- that the AEP is not only sufficient but necessary for the validity of the source coding theorem (in the general setting of finite-alphabet sources with nonzero entropy). Furthermore, [47] shows that ... |

8 | A Simple Proof of the Moy-Perez Generalization of the Shannon-McMillan Theorem - Kieffer - 1974 |

8 |
Variable-to-fixed length codes provide better large deviations performance than fixed-to-variable length codes
- Merhav, Neuhoff
- 1992
Citation Context ...For Markov sources, optimal variable-to-fixed codes have been found in [73] and [74]. Variable-to-fixed codes have been shown to have certain performance advantages over fixed-to-variable codes [75], =-=[76]-=-. Although variable-to-variable source coding has not received as much attention as the other techniques (cf. [77]), it encompasses the popular technique of runlength encoding [78], already anticipate... |

8 |
Tunstall adaptive coding and miscoding
- Fabris, Sgarro, et al.
- 1996
Citation Context ..., the increase in rate for compressing assuming distribution Q when the true source distribution is P is equal to the divergence 15 of P with respect to Q for both fixed-to-variable [81] and variable-to-fixed =-=[82]-=- coding. If the uncertainty on the source distribution can be modeled by a class of distributions, it was shown by B. Fitingof in [83] and by L. Davisson in [84] that for some uncertainty classes ther... |

7 |
Über Einschwingvorgänge in Wellenfiltern
- Küpfmüller
- 1924
Citation Context ...levels in a unit duration. Furthermore, he posed the question of how much improvement in telegraphy transmission rate could be achieved by replacing the Morse code by an “optimum” code. K. Küpfmüller =-=[4]-=- (1924), H. Nyquist [5] (1928), and V. Kotel’nikov [6] (1933) studied the maximum telegraph signaling speed sustainable by bandlimited linear systems. Unbeknownst to those authors, E. Whittaker [7] (1... |

7 |
A mathematical theory of cryptography
- Shannon
- 1945
Citation Context ...rs and pairs of letters, respectively, had been compiled for the purpose of decrypting secret messages [13]. 2 At the conclusion of his WWII work on cryptography, Shannon prepared a classified report =-=[14]-=- 3 where he included several of the notions (including entropy and the phrase “information theory”) pioneered in [1] (cf. [16]). However, Shannon had started his work on information theory (and, in pa... |

7 |
A conversation with Claude Shannon
- Price
- 1984
Citation Context ...on of his WWII work on cryptography, Shannon prepared a classified report [14] 3 where he included several of the notions (including entropy and the phrase “information theory”) pioneered in [1] (cf. =-=[16]-=-). However, Shannon had started his work on information theory (and, in particular, on probabilistic modeling of information sources) well before his involvement with cryptography. 4 Having read Hartl... |

7 |
Die Entstehung von Informationskonzepten in der Nachrichtentechnik: Eine Fallstudie zur Theoriebildung in der Technik in Industrie- und Kriegsforschung, Doctoral dissertation
- Hagemeyer
- 1979
Citation Context ...ation of the general proportion between the numbers of particles, nouns and verbs.” 3 Later declassified and superseded by [1] and [15]. 4 According to interviews with Claude Shannon recorded in [16]–=-=[18]-=-. We can think of a discrete source as generating the message, symbol by symbol. It chooses successive symbols according to certain probabilities depending, in general, on preceding choices as well as... |

7 |
Theoretical limitation on the rate of transmission of information
- Tuller
- 1949
Citation Context ...nized by various researchers. Several theories and principles were put forth in the space of a few months by A. Clavier [22], C. Earp [23], S. Goldman [24], J. Laplume [25], C. Shannon [1], W. Tuller =-=[26]-=-, and N. Wiener [27]. One of those theories would prove to be everlasting. II. LOSSLESS DATA COMPRESSION A. The Birth of Data Compression The viewpoint established by Hartley [10] and Wiener [11] is e... |

7 |
The Grand Alliance system for
- Challapali, Lebegue, et al.
- 1995
Citation Context ...n R. Fano’s MIT information theory class [49]. The practicality of the Huffman code has withstood the test of time with a myriad applications ranging from facsimile [50] to high-definition television =-=[51]-=-. 7 Tutorials on the interplay between information theory and ergodic theory can be found in [35]–[37]. 8 General coding theorems for nonstationary/nonergodic sources can be found in [46]. ... |

7 |
A Universal Variable-to-Fixed Length Source Code Based on Lawrence’s Algorithm
- Tjalkens, Willems
- 1992
Citation Context ...ertainty set. Constructive methods for various restricted classes of sources (such as memoryless and Markov) have been proposed by R. Krichevsky and V. Trofimov [59] and by T. Tjalkens and F. Willems =-=[85]-=-. In universal source coding, the encoder can exploit the fact that it observes the source output and, thus, can “learn” the source distribution and adapt to it. The same is true for the decoder becau... |

6 |
Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung respektive den Sätzen über das Wärmegleichgewicht
- Boltzmann
Citation Context ...kle that problem, he considers a single random variable taking values 1, ..., n with probabilities p_1, ..., p_n and defines its entropy: H = −Σ_i p_i log p_i. 5 Shannon points out the similarity with Boltzmann’s entropy in statistical mechanics =-=[29]-=- and gives an axiomatic rationale for this measure of information, as the only measure that is i) continuous in the probabilities, ii) increasing with n if the random variable is equiprobable, and iii) ... |

6 |
On the Shannon-Perez-Moy theorem
- Orey
- 1985
Citation Context ...simpler proof) in [39]. Generalizations of the Shannon–McMillan theorem to continuous-valued random processes and to other functionals of interest in information theory have been accomplished in [40]–=-=[45]-=-. Sources that are either nonstationary or nonergodic need not satisfy Theorem 3 8 ; that is, some sources require less than the entropy rate to be encoded, some require more. It is shown in [47] that... |

6 | Variable-to-Fixed Length Codes are Better than Fixed-to-Variable Length Codes for Markov Sources
- Ziv
- 1990
Citation Context ...[72]. For Markov sources, optimal variable-to-fixed codes have been found in [73] and [74]. Variable-to-fixed codes have been shown to have certain performance advantages over fixed-to-variable codes =-=[75]-=-, [76]. Although variable-to-variable source coding has not received as much attention as the other techniques (cf. [77]), it encompasses the popular technique of runlength encoding [78], already anti... |

5 |
Secret and Urgent
- Pratt
- 1939
Citation Context ...sult of its usefulness in cryptography. As early as 1380 and 1658, tables of frequencies of letters and pairs of letters, respectively, had been compiled for the purpose of decrypting secret messages =-=[13]-=-. 2 At the conclusion of his WWII work on cryptography, Shannon prepared a classified report [14] 3 where he included several of the notions (including entropy and the phrase “information theory”) pio... |

5 |
A simple proof of an inequality of McMillan
- Karush
- 1961
Citation Context ..., and apparently unaware of Kraft’s thesis, McMillan [56] showed that that condition must hold not just for prefix codes but for any uniquely decodable code. (A particularly simple proof was given in =-=[57]-=-.) It is immediate to show (McMillan [56] attributes this observation to J. L. Doob) that the average length of any code that satisfies the Kraft inequality cannot be less than the source entropy. Thi... |

4 | The Vocoder, Bell Labs - Dudley - 1939 |

4 |
The entropy concept in probability theory. Uspekhi Matematicheskikh Nauk. 8, 3–20. English translation
- Khinchin
- 1953
Citation Context ...zation of Shannon’s 6 A view challenged in [30] by N. Chomsky, the father of modern linguistics. Theorem 3 to Markov chains was given by A. Khinchin in the first Russian article on information theory =-=[31]-=-. In 1953, B. McMillan [32] used the statistical-mechanics phrase “asymptotic equipartition property” (AEP) to describe the typicality property of Shannon’s Theorem 3: the set of atypical sequences ha... |

4 |
Generalizations of Shannon–McMillan theorem
- Moy
- 1961
Citation Context ...th a simpler proof) in [39]. Generalizations of the Shannon–McMillan theorem to continuous-valued random processes and to other functionals of interest in information theory have been accomplished in =-=[40]-=-–[45]. Sources that are either nonstationary or nonergodic need not satisfy Theorem 3 8 ; that is, some sources require less than the entropy rate to be encoded, some require more. It is shown in [47]... |

4 |
Variations on a theme by Gallager,” Image and Text
- Capocelli, Santis
Citation Context ...ue to its rational exploitation of source memory by using the conditional probability of the next symbol to be encoded given the observed past. 9 Tighter distribution-dependent bounds are known [52], =-=[53]-=-. 10 Kraft [54] credits the derivation of the inequality to R. M. Redheffer, who would later coauthor the well-known undergraduate text [55]. 11 Minimum average-length source-coding problems have been... |

4 |
A fast on-line adaptive code
- Ryabko
- 1992
Citation Context ... 15 cf. Section III-G. ... large-alphabet sources, lower encoding/decoding complexity can be achieved by the adaptive fixed-to-variable source codes of B. Ryabko =-=[89]-=-, [90]. 16 Showing experimental promise, the nonprobabilistic sorting method of [93] preprocesses sources with memory so that universal codes for memoryless sources achieve good compression efficiency... |

3 |
A symbolic analysis of relay and switching circuits
- Shannon
- 1938
Citation Context ...aking abstraction of the communication process subject to a mean-square fidelity criterion [19]. After writing his landmark Master’s thesis on the application of Boole’s algebra to switching circuits =-=[20]-=- and his Ph.D. dissertation on population dynamics [21], Shannon returned to communication theory upon joining the Institute for Advanced Study at Princeton and, then, Bell Laboratories in 1941 [16]. ... |

3 |
Evaluation of transmission efficiency according to Hartley’s expression of information content
- Clavier
- 1948
Citation Context ...f transmission rate, reliability, bandwidth, and signal-to-noise ratio was recognized by various researchers. Several theories and principles were put forth in the space of a few months by A. Clavier =-=[22]-=-, C. Earp [23], S. Goldman [24], J. Laplume [25], C. Shannon [1], W. Tuller [26], and N. Wiener [27]. One of those theories would prove to be everlasting. II. LOSSLESS DATA COMPRESSION A. The Birth of... |

3 |
Relationship Between Rate of Transmission of Information, Frequency Bandwidth and Signal-Noise Ratio
- Earp
- 1948
Citation Context ... rate, reliability, bandwidth, and signal-to-noise ratio was recognized by various researchers. Several theories and principles were put forth in the space of a few months by A. Clavier [22], C. Earp =-=[23]-=-, S. Goldman [24], J. Laplume [25], C. Shannon [1], W. Tuller [26], and N. Wiener [27]. One of those theories would prove to be everlasting. II. LOSSLESS DATA COMPRESSION A. The Birth of Data Compress... |

3 |
Some Fundamental Considerations Concerning Noise Reduction and Range
- Goldman
- 1948
Citation Context ...y, bandwidth, and signal-to-noise ratio was recognized by various researchers. Several theories and principles were put forth in the space of a few months by A. Clavier [22], C. Earp [23], S. Goldman =-=[24]-=-, J. Laplume [25], C. Shannon [1], W. Tuller [26], and N. Wiener [27]. One of those theories would prove to be everlasting. II. LOSSLESS DATA COMPRESSION A. The Birth of Data Compression The viewpoint... |

3 |
Sur le nombre de signaux discernables en présence du bruit erratique dans un système de transmission à bande passante limitée
- Laplume
- 1948
Citation Context ... signal-to-noise ratio was recognized by various researchers. Several theories and principles were put forth in the space of a few months by A. Clavier [22], C. Earp [23], S. Goldman [24], J. Laplume =-=[25]-=-, C. Shannon [1], W. Tuller [26], and N. Wiener [27]. One of those theories would prove to be everlasting. II. LOSSLESS DATA COMPRESSION A. The Birth of Data Compression The viewpoint established by H... |

3 | The Shannon–McMillan–Breiman theorem for amenable groups - Ornstein, Weiss - 1983 |

2 |
Letter to Vannevar Bush, Feb. 16, 1939, in Claude Elwood Shannon Collected Papers
- Shannon
- 1993
Citation Context ... his undergraduate days, Shannon, as a twenty-two-year-old graduate student at MIT, came up with a ground-breaking abstraction of the communication process subject to a mean-square fidelity criterion =-=[19]-=-. After writing his landmark Master’s thesis on the application of Boole’s algebra to switching circuits [20] and his Ph.D. dissertation on population dynamics [21], Shannon returned to communication ... |

2 |
An algebra for theoretical genetics
- Shannon
- 1940
Citation Context ...to a mean-square fidelity criterion [19]. After writing his landmark Master’s thesis on the application of Boole’s algebra to switching circuits [20] and his Ph.D. dissertation on population dynamics =-=[21]-=-, Shannon returned to communication theory upon joining the Institute for Advanced Study at Princeton and, then, Bell Laboratories in 1941 [16]. By 1948 the need for a theory of communication encompas... |

2 | Information theory and ergodic theory,” Probl - Csiszár - 1987 |

2 | Information and information stability of ergodic sources,” Probl - Marton - 1972 |

2 |
Claude Shannon lecture: Application of transforms to coding and related topics
- Reed
- 1982
Citation Context ...coding theorem. The variable-length source code that minimizes average length was obtained by D. Huffman [48], as an outgrowth of a homework problem assigned in R. Fano’s MIT information theory class =-=[49]-=-. The practicality of the Huffman code has withstood the test of time with a myriad applications ranging from facsimile [50] to high-definition television [51]. 7 Tutorials on the interplay between in... |

2 |
personal communication
- Elias
Citation Context ...n–Fano code is frequently referred to as the Shannon–Fano–Elias code, and the arithmetic coding methods described in [64] and [65] are attributed to P. Elias therein. Those attributions are unfounded =-=[66]-=-. In addition to [1], other contributions relevant to the development of modern arithmetic coding are [67] and [68]. D. Variable-to-Fixed Source Coding So far we have considered data-compression metho... |

2 |
The estimation of redundancy for coding the messages generated by a Bernoulli source,” Probl
- Khodak
- 1972
Citation Context ... memoryless sources, the Tunstall algorithm maximizes the expected length of the parsed phrases. Further results on the behavior of the Tunstall algorithm for memoryless sources have been obtained in =-=[71]-=- and [72]. For Markov sources, optimal variable-to-fixed codes have been found in [73] and [74]. Variable-to-fixed codes have been shown to have certain performance advantages over fixed-to-variable c... |

2 |
On variable-length to block coding
- Jelinek, Schneider
- 1972
Citation Context ...ss sources, the Tunstall algorithm maximizes the expected length of the parsed phrases. Further results on the behavior of the Tunstall algorithm for memoryless sources have been obtained in [71] and =-=[72]-=-. For Markov sources, optimal variable-to-fixed codes have been found in [73] and [74]. Variable-to-fixed codes have been shown to have certain performance advantages over fixed-to-variable codes [75]... |

2 |
Efficient coding of a binary source with one very infrequent symbol
- Shannon
- 1954
Citation Context ...e-to-variable source coding has not received as much attention as the other techniques (cf. [77]), it encompasses the popular technique of runlength encoding [78], already anticipated by Shannon [1], =-=[79]-=-, as well as several of the universal coding techniques discussed in the next subsection. E. Universal Source Coding A. Kolmogorov [80] coined the term “universal” to refer to data-compression algorit... |

2 |
Optimal encoding with unknown and variable message statistics,” Probl
- Fitingof
- 1966
Citation Context ... respect to for both fixed-to-variable [81] and variable-to-fixed [82] coding. If the uncertainty on the source distribution can be modeled by a class of distributions, it was shown by B. Fitingof in =-=[83]-=- and by L. Davisson in [84] that for some uncertainty classes there is no asymptotic loss of compression efficiency if we use a source code tuned to the “center of gravity” of the uncertainty set. Con... |

2 |
Data compression by means of a book stack,” Probl
- Ryabko
- 1980
Citation Context .... Section III-G. ... large-alphabet sources, lower encoding/decoding complexity can be achieved by the adaptive fixed-to-variable source codes of B. Ryabko [89], =-=[90]-=-. 16 Showing experimental promise, the nonprobabilistic sorting method of [93] preprocesses sources with memory so that universal codes for memoryless sources achieve good compression efficiency. Supp... |