## Statistical Techniques for Language Recognition: An Introduction and Guide for Cryptanalysts (1993)

Venue: Cryptologia

Citations: 12 (2 self)

### BibTeX

@ARTICLE{Ganesan93statisticaltechniques,
  author  = {Ganesan and Sherman},
  title   = {Statistical Techniques for Language Recognition: An Introduction and Guide for Cryptanalysts},
  journal = {Cryptologia},
  year    = {1993},
  volume  = {17},
  number  = {4},
  pages   = {321--366}
}

### Abstract

We explain how to apply statistical techniques to solve several language-recognition problems that arise in cryptanalysis and other domains. Language recognition is important in cryptanalysis because, among other applications, an exhaustive key search of any cryptosystem from ciphertext alone requires a test that recognizes valid plaintext. Written for cryptanalysts, this guide should also be helpful to others as an introduction to statistical inference on Markov chains. Modeling language as a finite stationary Markov process, we adapt a statistical model of pattern recognition to language recognition. Within this framework we consider four well-defined language-recognition problems: 1) recognizing a known language, 2) distinguishing a known language from uniform noise, 3) distinguishing unknown 0th-order noise from unknown 1st-order language, and 4) detecting non-uniform unknown language. For the second problem we give a most powerful test based on the Neyman-Pearson Lemma. For the oth...
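The abstract's Problem 2 (known language versus uniform noise) admits a most powerful test by the Neyman-Pearson Lemma. A minimal sketch of that likelihood-ratio decision, using an invented two-letter alphabet and made-up transition probabilities rather than anything from the paper:

```python
import math

def log_likelihood_markov(text, trans):
    """Log-probability of the observed transitions under a first-order
    Markov model with trans[(a, b)] = P(next = b | current = a)."""
    return sum(math.log(trans.get(pair, 1e-12)) for pair in zip(text, text[1:]))

def log_likelihood_uniform(text, alphabet_size):
    """Log-probability of the same transitions under uniform noise."""
    return (len(text) - 1) * math.log(1.0 / alphabet_size)

def looks_like_language(text, trans, alphabet_size, threshold=0.0):
    """Neyman-Pearson style decision: accept 'language' when the
    log-likelihood ratio exceeds the threshold."""
    ratio = log_likelihood_markov(text, trans) - log_likelihood_uniform(text, alphabet_size)
    return ratio > threshold

# Toy two-letter 'language' in which letters tend to alternate.
trans = {("a", "b"): 0.9, ("a", "a"): 0.1, ("b", "a"): 0.9, ("b", "b"): 0.1}
print(looks_like_language("ababababab", trans, 2))  # True: alternation fits the model
print(looks_like_language("aaaaabbbbb", trans, 2))  # False: runs fit uniform noise better
```

Because both hypotheses here are simple (one fully specified model each), thresholding this ratio is exactly the construction the Neyman-Pearson Lemma makes most powerful; the threshold trades false alarms against misses.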

### Citations

4510 | A tutorial on hidden markov models and selected applications in speech recognition
- Rabiner
- 1989
Citation Context ...it outputs `Z', and with probability 0.1 it outputs a randomly chosen letter of the alphabet. Trivially, each Markov model is also a degenerate HMM. Rabiner [72] reviews the hidden Markov model and its applications to speech recognition, including an explanation of the forward-backward procedure for computing the likelihood of an observed plaintext given a pa... |

3173 |
Probability and Measure
- Billingsley
- 1986
Citation Context ...uivalence classes. For example, one 11 Unless otherwise noted, we shall use the phrase Markov chain to mean a finite stationary Markov process. For a review of Markov chains, see Bhat [9], Billingsley [12], or Kemeny [56]. 12 Throughout this paper, we denote the set of positive integers by the symbol Z + . 13 A Markov chain is ergodic if and only if it is stationary, recurrent, and aperiodic---see Bhat... |
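The finite stationary Markov model referenced throughout these contexts can be fitted from sample text by simple bigram counting. A toy sketch (the function name and example strings are ours, not the paper's):

```python
from collections import Counter

def estimate_transitions(text):
    """Maximum-likelihood estimate of first-order transition probabilities:
    P(b | a) = count of bigram ab / count of a in a non-final position."""
    bigram_counts = Counter(zip(text, text[1:]))
    row_totals = Counter(a for a, _ in zip(text, text[1:]))
    return {(a, b): c / row_totals[a] for (a, b), c in bigram_counts.items()}

probs = estimate_transitions("abab")
print(probs[("a", "b")])  # 1.0: every 'a' in this sample is followed by 'b'
```

In practice such raw estimates are usually "flattened" (smoothed) before use, as the Good citations below discuss, so that unseen bigrams do not receive probability zero.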

1957 |
Robust Statistics
- Huber
- 1981
Citation Context ...sfied. Thus, a robust test for recognizing Markov English should still work well when applied to real English. It is possible to construct formal models of robustness classes---for example, see Huber [50] and Poor [71]. Ganesan and Sherman, Statistical Techniques for Language Recognition---February 25, 1993 15 7 Test Statistics and Decision Procedures: Solutions to Problems 1--4 We now present solutio... |

1516 |
Information Theory and Reliable Communication
- Gallager
- 1968
Citation Context ...r more known languages. 17 For a formal definition of when two statistics are equivalent, see Lehmann [64, p. 43]. 18 Entropy is a measure of uncertainty, expressed in bits---for details see Gallager [28]. Shannon [80] measured the entropy of printed English experimentally and found that it is approximately 1.0 bit/character. ... |
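For intuition, a plug-in entropy estimate is easy to compute. Note that a unigram estimate ignores context, so it sits well above the ~1.0 bit/character figure Shannon obtained for English with full context (the helper below is a hypothetical illustration, not code from the paper):

```python
import math
from collections import Counter

def unigram_entropy_bits(text):
    """Empirical entropy of the unigram distribution, in bits per character.
    Ignoring context makes this an overestimate of the per-character
    entropy Shannon measured (~1 bit for English)."""
    n = len(text)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(text).values())

print(unigram_entropy_bits("aaaa"))  # 0.0: a constant text carries no information
print(unigram_entropy_bits("abab"))  # 1.0: two equiprobable letters give 1 bit each
```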

842 |
Communication theory of secrecy systems
- Shannon
- 1948
Citation Context ...n-theoretic requirement, we assume that the length of the ciphertext equals or exceeds the unicity distance of the cipher. 10 9 In our experimental work [29] we use k = 2. 10 As formalized by Shannon [79], the unicity distance of a cipher is the minimum number of ciphertext characters required to guarantee that, for any cryptogram, the expected number of spurious decipherments is approximately zero. ... |
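Shannon's unicity distance is the ratio U = H(K)/D of key entropy to per-character redundancy. A sketch for a simple substitution cipher, assuming for illustration a per-character plaintext entropy of 1.5 bits (Shannon's asymptotic figure for English is closer to 1.0):

```python
import math

def unicity_distance(key_entropy_bits, redundancy_bits_per_char):
    """Shannon's estimate U = H(K) / D: the ciphertext length at which the
    expected number of spurious decipherments drops to roughly zero."""
    return key_entropy_bits / redundancy_bits_per_char

# Simple substitution over a 26-letter alphabet: 26! equally likely keys.
h_key = math.log2(math.factorial(26))  # ~88.4 bits of key entropy
d = math.log2(26) - 1.5                # redundancy under the assumed 1.5 bits/char
print(round(unicity_distance(h_key, d)))  # 28: shorter ciphertexts admit spurious solutions
```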

694 |
An Introduction to Signal Detection and Estimation
- Poor
- 1994
Citation Context ... robust test for recognizing Markov English should still work well when applied to real English. It is possible to construct formal models of robustness classes---for example, see Huber [50] and Poor [71]. 7 Test Statistics and Decision Procedures: Solutions to Problems 1--4 We now present solutions to each of ... |

517 | Cryptography and Data Security
- Denning
- 1982
Citation Context ...analysis, submarine detection, and speaker identification. Therefore, we expect our work to 1 We assume the reader is familiar with basics of cryptology---as explained by Beker and Piper [8], Denning [20], Rivest [76], or Simmons [81], for example. We also assume the reader is familiar with elementary statistics---as explained by Hoel [48] or Larsen and Marx [62]. 2 We use the term k-gram to refer to ... |

496 |
Testing Statistical Hypotheses
- Lehmann, Romano
- 2005
Citation Context ...d normal random variable, rejecting all values that fall outside some interval [−τ, τ]. 8 In solving the Decipher Puzzle, they arbitrarily selected τ = 4. Although the central limit theorem [64] guarantees that S 2 is asymptotically standard normal when applied to independent English bigrams (the special case of their application), Baldwin and Sherman say nothing about the distribution of S ... |

359 |
Prediction and entropy of printed english
- Shannon
- 1951
Citation Context ...nguage statistics, such as those reported by Solso, Juel, King, and Rubin [83, 84, 85, 86]. For additional sources of language statistics, see Shannon's study of the entropy and redundancy of English [80], Good's survey of the statistics of language [34, pp. 577--578], and cryptology texts by Beker and Piper [8], Denning [20], Friedman [26], and Kullback [60]. In addition to computing unigram, bigram... |

263 |
Speaker-independent phone recognition using hidden Markov models
- Lee, Hon
- 1989
Citation Context ...researchers have extensively used the hidden Markov model (HMM), which we discuss in Section 8.1. For example, Tishby [87] applies an HMM to identify speakers. Ljolje and Levinson [65] and Lee and Hon [63] also apply this model to speech-recognition problems. Using Bayesian techniques, Raviv [73] developed a program to recognize printed characters in legal text, and Valiveti and Oommen [91] present alg... |

199 |
Discrete Multivariate Analysis
- Bishop, Fienberg, et al.
- 1975
Citation Context ...bserved k-gram frequencies) that can be analyzed under various assumptions about dependencies among the data. Agresti [1] reviews exact techniques for such problems, and Bishop, Fienberg, and Holland [13] provide a basic introduction to the related area of discrete multivariate analysis. McCloskey and Pittenger [68] give closed-form expressions for maximum-likelihood estimates that arise from testing ... |

199 |
Detection, Estimation, and Modulation Theory
- van Trees
- 1968
Citation Context ... presence of such noise. The problem of recognizing plaintext in the presence of noise is a discrete version of the signal-detection problem, which has been extensively studied. For example, van Trees [88, 89] gives a thorough engineering treatment of this problem, and Osteyee and Good [70] discuss the problem from the point of view of information theory and the weight of evidence. In the rest of this sect... |

163 |
Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions
- Self, Liang
- 1987
Citation Context ...t level of robustness? From a statistical perspective, part of this question deals with how to treat situations in which parameters lie on the boundary of the parameter space (e.g. see Self and Liang [78])---but the traditional statistical model is not necessarily the most useful model for harnessing the power of negative deductions. Finally, a fundamental challenge is to develop and to extend the the... |

161 |
The estimation of probabilities: An essay on modern bayesian methods
- Good
- 1965
Citation Context ...ting the estimated parameters, it is possible to create an equivalent effect by appropriately modifying the test statistics. For a discussion and comparison of various flattening techniques, see Good [39, 40]. Language Statistics for Natural Languages To carry out many tests described in this paper, it is necessary to know transition probabilities of the base language. Some practitioners may choose to est... |

153 |
Mathematical Statistics
- Wilks
- 1962
Citation Context ...istic refers to the random variable /; the term test refers to the entire testing process. 15 For a more detailed explanation of statistical hypothesis testing, see Lehmann [64], Rohatgi [77], and Wilks [93]. Simple versus Compound Hypotheses We assume the distribution of / belongs to some known class of parameter... |

145 |
Probability and Statistics with Reliability
- Trivedi
- 2002

118 |
A probabilistic distance measure for hidden Markov models
- Juang, Rabiner
- 1985
Citation Context ... and its applications to speech recognition, including an explanation of the forward-backward procedure for computing the likelihood of an observed plaintext given a particular HMM. Juang and Rabiner [51] develop a distance measure for this model. As a recent practical example, Krishnamurthy, Moore, and Chung [58], who, to enhance the performance of a biomedical instrument, develop maximum-likelihood t... |

108 |
Probability and the Weighing of Evidence
- Good
- 1950
Citation Context ...spectively, the restrictions of H 0 and H 1 to parameters θ 0 and θ 1 . This ratio is sometimes called the weight of evidence in favor of H 0 as opposed to H 1 , provided by X ; for example, see Good [32, 38, 70]. For Problem 2, whose hypotheses are simple, this construction yields a most powerful test by the Neyman-Pearson Lemma. For Problems 1, 3, and 4, whose alternative hypotheses are compound, we compute... |

100 | A Survey of Exact Inference for Contingency Tables
- Agresti
- 1992
Citation Context ... known transition probabilities contains exactly d zeroes. For this situation, we assume that the parameter space Ω_H1 for the alternative hypothesis H 1 consists of all m × m matrices over [0, 1], different from PB , that contain exactly d zeroes and whose zeroes appear in the same positions as do the zeroes in PB . Similarly, for the other problems, we assume that the zeroes in the matrices ... |

96 |
Good Thinking: The Foundations of Probability and Its Applications
- Good
- 1983
Citation Context ... including tests for 0th order, stationarity, and specific chains. A review of prior statistics research on language recognition would be incomplete without mentioning the prolific work of I. J. Good [35, 37], who was Turing's chief statistical assistant during World War II. In his work on likelihood ratio tests [30] and on the frequency counts of Markov chains [31], Good refines some of the seminal resul... |

84 |
Statistical inference about Markov chains
- Anderson, Goodman
- 1958
Citation Context ...tationary Markov chain, determining the order of the chain that produced X , or classifying X among two or more known languages. For details about these additional problems, see T. Anderson and Goodman [4], Bhat [9], or Billingsley [11]. 5.2 Examples We present four examples to illustrate how the foregoing language-recognition problems arise in cryptanalytic practice. For each situation, the cryptanalys... |

76 |
The Codebreakers: The Story of Secret Writing
- Kahn
- 1967

73 |
Statistical Inference for Markov Processes
- Billingsley
- 1961
Citation Context ...mining the order of the chain that produced X , or classifying X among two or more known languages. For details about these additional problems, see T. Anderson and Goodman [4], Bhat [9], or Billingsley [11]. 5.2 Examples We present four examples to illustrate how the foregoing language-recognition problems arise in cryptanalytic practice. For each situation, the cryptanalyst selects one of the problems ... |

71 |
Introduction to Mathematical Statistics
- Hoel
- 1971
Citation Context ...s of cryptology---as explained by Beker and Piper [8], Denning [20], Rivest [76], or Simmons [81], for example. We also assume the reader is familiar with elementary statistics---as explained by Hoel [48] or Larsen and Marx [62]. 2 We use the term k-gram to refer to any sequence of exactly k letters. For k = 1; 2; 3, we call any such gram a unigram, bigram, or trigram, respectively. ... |

63 |
Fundamental Algorithms, The Art of
- Knuth
- 1973
Citation Context ... apply. Good and Crook [18, 19, 41] analyze the G statistic and compare it with the X 2 and log-likelihood statistics and with a Bayes factor F . In his study of pseudorandom number generators, Knuth [57] describes many tests for nonrandomness, including his so-called spectral test, which interprets frequency counts in a geometric fashion. Knuth [57, p. 89] asserts that, for linear congruential generat... |

60 |
An Introduction to
- Larsen, Marx
- 1986
Citation Context ...lained by Beker and Piper [8], Denning [20], Rivest [76], or Simmons [81], for example. We also assume the reader is familiar with elementary statistics---as explained by Hoel [48] or Larsen and Marx [62]. 2 We use the term k-gram to refer to any sequence of exactly k letters. For k = 1; 2; 3, we call any such gram a unigram, bigram, or trigram, respectively. ... |

58 |
Testing for independence in a two-way table: new interpretations of the chi-square statistic. With discussions and with a reply by the authors
- Diaconis, Efron
- 1985
Citation Context ...lts can be obtained by using a variation of X 2 , as derived by Lehmann [64], that takes advantage of the signs of these deviations. For independence problems related to Problem 3, Diaconis and Efron [21] propose a new interpretation of the X 2 statistic. And for Problem 4, Good [40], Good, Gover, and Mitchell [45], and Good and Crook [44] recommend using Cochran's continuity-adjusted X′ 2 statistic ... |
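For reference, against a uniform null hypothesis the Pearson X 2 statistic discussed in these contexts reduces to a one-line sum (a generic sketch, not tied to any particular test in the paper):

```python
def chi_square_against_uniform(counts):
    """Pearson's X^2 goodness-of-fit statistic against a uniform null:
    sum over cells of (observed - expected)^2 / expected."""
    n, m = sum(counts), len(counts)
    expected = n / m
    return sum((obs - expected) ** 2 / expected for obs in counts)

print(chi_square_against_uniform([25, 25, 25, 25]))  # 0.0: perfectly uniform counts
print(chi_square_against_uniform([40, 20, 20, 20]))  # 12.0: noticeable roughness
```

Large values of the statistic (relative to a chi-square distribution with m − 1 degrees of freedom) indicate a non-uniform source, the situation Problem 4 is designed to detect.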

54 |
Pattern Analysis and Understanding
- Niemann
- 1990
Citation Context ...recognition, one can draw upon the many statistical techniques developed in artificial intelligence for automatic pattern recognition. Mendel and Fu [67] overview such statistical techniques. Niemann [69] also offers an introduction to this area, with an emphasis on the analysis of patterns from visual images and sound. For surveys on automatic pattern recognition and statistical techniques in pattern... |

51 |
Cipher System: The Protection of Communication
- Beker, Piper
- 1982
Citation Context ...ition, image analysis, submarine detection, and speaker identification. Therefore, we expect our work to 1 We assume the reader is familiar with basics of cryptology---as explained by Beker and Piper [8], Denning [20], Rivest [76], or Simmons [81], for example. We also assume the reader is familiar with elementary statistics---as explained by Hoel [48] or Larsen and Marx [62]. 2 We use the term k-gra... |

50 |
Elements of Applied Stochastic Processes
- Bhat
- 1972
Citation Context ...alphabet into equivalence classes. For example, one 11 Unless otherwise noted, we shall use the phrase Markov chain to mean a finite stationary Markov process. For a review of Markov chains, see Bhat [9], Billingsley [12], or Kemeny [56]. 12 Throughout this paper, we denote the set of positive integers by the symbol Z + . 13 A Markov chain is ergodic if and only if it is stationary, recurrent, and ape... |

46 |
Statistical methods in Markov chains
- Billingsley
- 1961
Citation Context ...by Hoel [47] and Good [30]. 20 For a concise summary of this review, see Bhat [9, Chapter 5]. For an extensive survey of the asymptotic theory of statistical methods on Markov chains, see Billingsley [10, 11]. Another similar and more general view is to cast the problem as an inference problem on contingency tables. A contingency table is simply a table of data (e.g. observed k-gram frequencies) that can ... |

46 |
Effects of Sample Size in Classifier Design
- Fukunaga, Hayes
- 1989
Citation Context ...sent algorithms for classifying strings into known distributions. In addition, Lund and Lee [66] apply Wald's Sequential Probability Ratio Test (SPRT) to authenticate speakers, and Fukunaga and Hayes [27] study the effect of sample size on parameter estimates used in linear and quadratic classifiers. 8.3 Open Problems Our study of language recognition raises several important questions involving: theo... |

29 |
On the application of mixture AR hidden Markov models to text independent speaker recognition
- Tishby
- 1991
Citation Context ...ply statistical pattern-recognition techniques. In speech-recognition tasks, many researchers have extensively used the hidden Markov model (HMM), which we discuss in Section 8.1. For example, Tishby [87] applies an HMM to identify speakers. Ljolje and Levinson [65] and Lee and Hon [63] also apply this model to speech-recognition problems. Using Bayesian techniques, Raviv [73] developed a program to re... |

28 |
On the application of symmetric Dirichlet distributions and their mixtures to contingency tables
- Good
- 1976
Citation Context ...ian statistic G, based on his Bayes/Non-Bayes Compromise [43, 44]. Good claims this statistic is sometimes useful for Problem 4 for small samples when the X 2 statistic does not apply. Good and Crook [18, 19, 41] analyze the G statistic and compare it with the X 2 and log-likelihood statistics and with a Bayes factor F . In his study of pseudorandom number generators, Knuth [57] describes many tests for nonra... |

23 |
Pseudo-random generators and complexity classes
- Boppana, Hirschfeld
- 1989
Citation Context ...for time- and space-bounded computations, and to identify such optimal tests for language-recognition problems. For some relevant foundational work, see Blum and Goldreich [14], Boppana and Hirschfeld [15], and Yao [95]. This important area offers a synergistic opportunity for cooperation among statisticians, complexity theorists, and cryptologists. 9 Conclusion In this introductory guide, we have show... |

20 |
Decision making in Markov chains applied to the problem of pattern recognition
- Raviv
- 1967
Citation Context ... 8.1. For example, Tishby [87] applies a HMM to identify speakers. Ljolje and Levinson [65] and Lee and Hon [63] also apply this model to speech-recognition problems. Using Bayesian techniques, Raviv [73] developed a program to recognize printed characters in legal text, and Valiveti and Oommen [91] present algorithms for classifying strings into known distributions. In addition, Lund and Lee [66] app... |

18 |
Statistical Inference
- Rohatgi
- 1984
Citation Context ...e term test statistic refers to the random variable /; the term test refers to the entire testing process. 15 For a more detailed explanation of statistical hypothesis testing, see Lehmann [64], Rohatgi [77], and Wilks [93]. Simple versus Compound Hypotheses We assume the distribution of / belongs to some known cl... |

15 |
Positional frequency and versatility of bigrams for two- through nine-letter English words
- Solso, Juel
- 1980
Citation Context ... language. Some practitioners may choose to estimate their own transition probabilities; others may prefer to use published language statistics, such as those reported by Solso, Juel, King, and Rubin [83, 84, 85, 86]. For additional sources of language statistics, see Shannon's study of the entropy and redundancy of English [80], Good's survey of the statistics of language [34, pp. 577--578], and cryptology text... |

13 |
On the Cryptanalysis of Rotor Machines and Substitution-Permutation Networks
- Andleman, Reeds
- 1982

12 |
A Bayesian significance test for multinomial distributions
- Good
- 1967
Citation Context ...ting the estimated parameters, it is possible to create an equivalent effect by appropriately modifying the test statistics. For a discussion and comparison of various flattening techniques, see Good [39, 40]. Language Statistics for Natural Languages To carry out many tests described in this paper, it is necessary to know transition probabilities of the base language. Some practitioners may choose to est... |

12 |
The Bayes/Non-Bayes Compromise: a Brief Review
- Good
- 1992
Citation Context ...s Other relevant statistics are also suggested in the statistics and computer science literature. For example, Good [40] introduces a non-Bayesian statistic G, based on his Bayes/Non-Bayes Compromise [43, 44]. Good claims this statistic is sometimes useful for Problem 4 for small samples when the X 2 statistic does not apply. Good and Crook [18, 19, 41] analyze the G statistic and compare it with the X 2 ... |

12 |
editors, Adaptive, Learning and Pattern Recognition Systems
- Mendel, Fu
- 1970
Citation Context ... viewing language recognition as a form of pattern recognition, one can draw upon the many statistical techniques developed in artificial intelligence for automatic pattern recognition. Mendel and Fu [67] overview such statistical techniques. Niemann [69] also offers an introduction to this area, with an emphasis on the analysis of patterns from visual images and sound. For surveys on automatic patter... |

11 |
Elementary Cryptanalysis: A Mathematical Approach. Random House
- Sinkov
- 1968
Citation Context ...variable is standard normal if and only if it has a Gaussian distribution with mean 0 and variance 1. Sinkov [82] points out two applications of the IC. First, as a measure of roughness of the distribution of observed ciphertext characters, the IC can help identify the unknown encryption scheme. Second, it can b... |
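The index of coincidence (IC) that Sinkov applies can be sketched in a few lines (a generic implementation, ours rather than Sinkov's): it estimates the probability that two randomly chosen positions hold the same letter, which is near 0.066 for English and near 1/26 ≈ 0.038 for uniform random text.

```python
from collections import Counter

def index_of_coincidence(text):
    """IC = sum over letters of f * (f - 1) / (n * (n - 1)), where f is each
    letter's frequency count and n the text length. Rough text (skewed letter
    frequencies) scores high; flat, noise-like text scores low."""
    counts = Counter(text)
    n = len(text)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

print(index_of_coincidence("aaaa"))  # 1.0: a single repeated letter
print(index_of_coincidence("abcd"))  # 0.0: all letters distinct
```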

10 |
The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain
- Dickey, Lientz
- 1970
Citation Context ...that interprets the power spectrum of finite strings. To evaluate his statistic, Feldman applies it to strings produced by short-round versions of the DES cryptosystem. In addition, Dickey and Lientz [22] propose a Bayesian test for Markov order based on a weighted likelihood ratio. 7.5 Decision Procedures To interpret values of these test statistics, the cryptanalyst must adopt some decision procedur... |

10 |
Patterns in pattern recognition
- Kanal
- 1974
Citation Context ...n to this area, with an emphasis on the analysis of patterns from visual images and sound. For surveys on automatic pattern recognition and statistical techniques in pattern classification, see Kanal [53] and Ho and Agrawala [46]. Other techniques, such as neural networks, rule-based systems, and fuzzy logic, have also been tried in pattern recognition, but we shall focus here on the application of st... |

10 |
Development of an acoustic-phonetic hidden markov model forcontinuous speech recognition
- Ljolje, Levinson
- 1991
Citation Context ...ognition tasks, many researchers have extensively used the hidden Markov model (HMM), which we discuss in Section 8.1. For example, Tishby [87] applies a HMM to identify speakers. Ljolje and Levinson [65] and Lee and Hon [63] also apply this model to speech-recognition problems. Using Bayesian techniques, Raviv [73] developed a program to recognize printed characters in legal text, and Valiveti and Oo... |

10 |
Cryptography,” in Handbook of Theoretical Computer
- Rivest
- 1990
Citation Context ...marine detection, and speaker identification. Therefore, we expect our work to 1 We assume the reader is familiar with basics of cryptology---as explained by Beker and Piper [8], Denning [20], Rivest [76], or Simmons [81], for example. We also assume the reader is familiar with elementary statistics---as explained by Hoel [48] or Larsen and Marx [62]. 2 We use the term k-gram to refer to any sequence ... |

9 |
The frequency goodness of fit test for probability chains
- Bartlett
- 1951
Citation Context ... the asymptotic behavior of the tests. For each test statistic based on the log-likelihood ratio, initial conditions can be simply incorporated by adding the appropriate summand, as shown by Bartlett [7] for example. Another approach for dealing with small samples is given by Yakowitz [94], who, motivated by river modeling problems, proposes a new class of tests for the order of a Markov chain. Multip... |

9 | How to Break Gifford's Cipher
- Cain, Sherman
- 1994
Citation Context ...of plaintext in transposition ciphers and to complete partial solutions of polyalphabetic substitution ciphers. Furthermore, in their new ciphertext-only attack on filter generators, Cain and Sherman [16, 17] apply a language-recognition subroutine to detect when they have discovered part of the initial fill. Related statistical techniques are also useful in breaking the Hagelin cryptograph [75] and vario... |

8 |
A New Spectral Test for Nonrandomness and the DES
- Feldman
- 1990
Citation Context ...unts in a geometric fashion. Knuth [57, p. 89] asserts that, for linear congruential generators, his test "is by far the most powerful test known." Using ideas from digital signal processing, Feldman [23] proposes a new spectral test for nonrandomness that interprets the power spectrum of finite strings. To evaluate his statistic, Feldman applies it to strings produced by short-round versions of the D... |