Strong consistency of the Good-Turing estimator
- in IEEE Int. Symp. Inf. Theor. Proc., 2006
Cited by 1 (1 self)
We consider the problem of estimating the total probability of all symbols that appear with a given frequency in a string of i.i.d. random variables with unknown distribution. We focus on the regime in which the block length is large yet no symbol appears frequently in the string. This is accomplished by allowing the distribution to change with the block length. Under a natural convergence assumption on the sequence of underlying distributions, we show that the total probabilities converge to a deterministic limit, which we characterize. We then show that the Good-Turing total probability estimator is strongly consistent.
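For orientation, the classical Good-Turing estimate of the total probability of all symbols seen exactly k times is (k + 1) N_{k+1} / n, where N_{k+1} is the number of distinct symbols seen k + 1 times and n is the sample length. The sketch below illustrates only that textbook formula; the function name and the example string are hypothetical and are not taken from the paper.

```python
from collections import Counter


def good_turing_total_probability(sample, k):
    """Good-Turing estimate of the total probability of all symbols
    that appear exactly k times in `sample` (hypothetical helper)."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    symbol_counts = Counter(sample)                 # symbol -> frequency
    freq_of_freq = Counter(symbol_counts.values())  # frequency -> number of symbols
    return (k + 1) * freq_of_freq.get(k + 1, 0) / n


if __name__ == "__main__":
    text = "abracadabra"  # a:5, b:2, r:2, c:1, d:1
    for k in range(3):
        print(k, good_turing_total_probability(text, k))
```

With k = 0 this reduces to the familiar missing-mass estimate N_1 / n, the estimated probability of all symbols not yet observed.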
Universal compression of Markov and related sources over arbitrary alphabets
- IEEE Transactions on Information Theory, 2006
Cited by 1 (0 self)
Recent work has considered encoding a string by separately conveying its symbols and its pattern, the order in which the symbols appear. It was shown that the patterns of i.i.d. strings can be losslessly compressed with diminishing per-symbol redundancy. In this paper the pattern redundancy of distributions with memory is considered. Close lower and upper bounds are established on the pattern redundancy of strings generated by Hidden Markov Models with a small number of states, showing in particular that their per-symbol pattern redundancy diminishes with increasing string length. The upper bounds are obtained by analyzing the growth rate of the number of multi-dimensional integer partitions, and the lower bounds by using Hayman's Theorem. Index Terms: Hidden Markov Models, integer partitions, large alphabets, multi-dimensional partitions, patterns,
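As a small illustration of the notion of a pattern used above, each symbol can be replaced by the index of its first appearance. This is a minimal sketch of that definition only; the function name is hypothetical and nothing here reproduces the paper's coding scheme or bounds.

```python
def pattern(sequence):
    """Replace every symbol by the order (1, 2, 3, ...) in which it
    first appeared in `sequence` (hypothetical helper)."""
    first_seen = {}
    result = []
    for symbol in sequence:
        if symbol not in first_seen:
            first_seen[symbol] = len(first_seen) + 1  # next unused index
        result.append(first_seen[symbol])
    return result


if __name__ == "__main__":
    print(pattern("abracadabra"))
```

Two strings over different alphabets share a pattern exactly when one can be obtained from the other by renaming symbols, which is why the pattern can be conveyed separately from the symbols themselves.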
Book Review: The Essential Turing, reviewed by Andrew Hodges
"... The Essential Turing is a selection of writings of the ..."
On the Relation Between Additive Smoothing and Universal Coding
We analyze the performance of smoothing methods for language modeling from the perspective of universal compression. We use existing asymptotic bounds on the performance of simple additive rules for compression of finite-alphabet memoryless sources to explain the empirical predictive abilities of additive smoothing techniques. We further suggest a smoothing method that overcomes some of the problems observed in previous approaches. The new method outperforms existing ones on the Wall Street Journal (WSJ) database for bigram and trigram models. We then suggest possible directions for future research.
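For readers unfamiliar with the simple additive rule mentioned above, a minimal unigram sketch follows: each count is increased by a constant delta before normalizing. The function name, the default delta, and the toy vocabulary are assumptions made for illustration; this is plain add-delta smoothing, not the new method the paper proposes.

```python
from collections import Counter


def additive_smoothing(tokens, vocabulary, delta=1.0):
    """Add-delta estimate p(w) = (count(w) + delta) / (n + delta * |V|).

    Assumes every token belongs to `vocabulary` (hypothetical helper).
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    counts = Counter(tokens)
    total = len(tokens) + delta * len(vocabulary)
    return {word: (counts[word] + delta) / total for word in vocabulary}


if __name__ == "__main__":
    probabilities = additive_smoothing(list("aaab"), ["a", "b", "c"])
    print(probabilities)
```

The unseen symbol "c" receives nonzero probability, which is the property that makes additive rules usable as sequential compressors.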
Estimating bacterial diversity from clone libraries with flat rank abundance distributions
A Universal Compression Perspective of Smoothing
We analyze smoothing algorithms from a universal-compression perspective. Instead of evaluating their performance on an empirical sample, we analyze their performance on the most inconvenient sample possible. Consequently the performance of the algorithm can be guaranteed even on unseen data. We show that universal compression bounds can explain the empirical performance of several smoothing methods. We also describe a new interpolated additive smoothing algorithm, and show that it has lower training complexity and better compression performance than existing smoothing techniques. Key words: Language modeling, universal compression, smoothing
NetQTM: Node Configuration In Network Setup By Quantum Turing Machine
The quantum Turing machine (QTM) was introduced by Deutsch as an abstract model of quantum computation. In this paper we try to introduce a new transition function of a QTM that can be used for any node configuration in the network. We introduce the fundamentals of NetQTM, such as a well-observed lemma. A machine allowing classical and quantum computations is motivated by the emergence of models of quantum computation such as the one-way model. Furthermore, this model allows a formal and rigorous treatment of problems requiring classical interactions, such as the halting [8] of a QTM. Finally, it opens new perspectives for the construction of a universal QTM.
Error Bounds and Improved Probability Estimation using the Maximum Likelihood Set
, 2007
The maximum likelihood set (MLS) is a novel candidate for nonparametric probability estimation from small samples that permits incorporating prior or structural knowledge into the estimator [1]. It is a set of probability distributions which assign to the observed type (or empirical distribution) a likelihood that is no lower than the likelihood they assign to any other type. The MLS has been shown to have many highly desirable properties, including strong consistency of MLS-based estimates; yet the probability that the MLS contains the data-generating distribution may be arbitrarily small! In this paper, we propose to overcome this shortcoming via an ε-fattening of the MLS. The proposed set, called the High Likelihood Set (HLS), with ε → 0 slowly in sample size, ensures that the HLS contains the data-generating distribution with arbitrarily large probability, while retaining most desirable properties of the MLS. In particular, the
Entropy Inference and the James-Stein
, 2008
Entropy is a fundamental quantity in statistics and machine learning. In this note, we present a novel procedure for statistical learning of entropy from high-dimensional small-sample data. Specifically, we introduce a simple yet very powerful small-sample estimator of the Shannon entropy based on James-Stein-type shrinkage. This results in an estimator that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms (in part substantially) eight other competing entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, including in cases of severe undersampling. A computer program is available that implements the proposed estimator.
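To make the idea of shrinkage-based entropy estimation concrete, here is a rough sketch. It assumes a uniform shrinkage target and a plug-in shrinkage intensity clipped to [0, 1]; neither choice is stated in the abstract, so both should be read as illustrative assumptions rather than the authors' exact procedure. The function name is hypothetical.

```python
import math


def shrinkage_entropy(counts):
    """Plug-in Shannon entropy (in nats) of cell frequencies shrunk
    toward the uniform distribution (illustrative assumptions only)."""
    n = sum(counts)
    cells = len(counts)
    if n < 2 or cells < 1:
        raise ValueError("need at least two observations and one cell")
    ml = [c / n for c in counts]      # maximum-likelihood frequencies
    target = 1.0 / cells              # assumed uniform shrinkage target
    numerator = 1.0 - sum(q * q for q in ml)
    denominator = (n - 1) * sum((target - q) ** 2 for q in ml)
    if denominator == 0:
        intensity = 1.0               # frequencies already equal the target
    else:
        intensity = min(1.0, max(0.0, numerator / denominator))
    shrunk = [intensity * target + (1.0 - intensity) * q for q in ml]
    return -sum(q * math.log(q) for q in shrunk if q > 0)


if __name__ == "__main__":
    print(shrinkage_entropy([3, 1]))
    print(shrinkage_entropy([10, 0]))
```

With very few observations the intensity is close to 1 and the estimate moves toward the entropy of the uniform target; with many observations it approaches the ordinary plug-in estimate.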
and
, 705
Many applications use sequences of n consecutive symbols (n-grams). We review n-gram hashing and prove that recursive hash families are pairwise independent at best. We prove that hashing by irreducible polynomials is pairwise independent whereas hashing by cyclic polynomials is quasi-pairwise independent: we make it pairwise independent by discarding n − 1 bits. One application of hashing is to estimate the number of distinct n-grams, a view-size estimation problem. While view sizes can be estimated by sampling under statistical assumptions, we desire a statistically unassuming algorithm with universally valid accuracy bounds. Most related work has focused on repeatedly hashing the data, which is prohibitive for large data sources. We prove that a one-pass one-hash algorithm is sufficient for accurate estimates if the hashing is sufficiently independent. For example, we can improve by a factor of 2 the theoretical bounds on estimation accuracy by replacing pairwise independent hashing by 4-wise independent hashing. We show that recursive random hashing is sufficiently independent in practice. Perhaps surprisingly, our experiments showed that hashing by cyclic polynomials, which is only quasi-pairwise independent, sometimes outperformed 10-wise independent hashing while being twice as fast. For comparison, we measured the time to obtain exact n-gram counts using suffix arrays and show that, while we used hardly any storage, we were an order of magnitude faster. The experiments used a large collection of English text from Project Gutenberg as well as synthetic data.
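The recursive (rolling) character of n-gram hashing by cyclic polynomials can be sketched as follows: each symbol is mapped to a random word, and the hash of the next n-gram is obtained from the previous one with a rotation and two XORs. The 32-bit word size, the seeded random table, and all names below are assumptions for illustration; the sketch does not discard the n − 1 bits that the abstract says are needed for pairwise independence.

```python
import random

WORD_BITS = 32                      # assumed word size
MASK = (1 << WORD_BITS) - 1
_rng = random.Random(0)             # fixed seed so runs are repeatable
_symbol_table = {}                  # symbol -> random WORD_BITS-bit value


def symbol_hash(symbol):
    """Random value assigned to a symbol on first use."""
    if symbol not in _symbol_table:
        _symbol_table[symbol] = _rng.getrandbits(WORD_BITS)
    return _symbol_table[symbol]


def rotl(value, shift):
    """Rotate a WORD_BITS-bit value left by `shift` positions."""
    shift %= WORD_BITS
    return ((value << shift) | (value >> (WORD_BITS - shift))) & MASK


def direct_hash(ngram):
    """Hash one n-gram from scratch."""
    h = 0
    for symbol in ngram:
        h = rotl(h, 1) ^ symbol_hash(symbol)
    return h


def rolling_hashes(sequence, n):
    """Hashes of all n-grams of `sequence`, updated in O(1) per step."""
    if n < 1 or len(sequence) < n:
        return []
    h = direct_hash(sequence[:n])
    hashes = [h]
    for i in range(n, len(sequence)):
        outgoing, incoming = sequence[i - n], sequence[i]
        h = rotl(h, 1) ^ rotl(symbol_hash(outgoing), n) ^ symbol_hash(incoming)
        hashes.append(h)
    return hashes


if __name__ == "__main__":
    print(rolling_hashes("the quick brown fox", 5)[:3])
```

Because rotation distributes over XOR, the rolling update yields the same values as hashing every window from scratch, which is what makes a single pass over the data sufficient.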

