Results 1–3 of 3
Monte Carlo Estimation of Minimax Regret with an Application to MDL Model Selection, 2008
Abstract

Cited by 2 (1 self)
Minimum description length (MDL) model selection, in its modern NML formulation, involves a model complexity term which is equivalent to minimax/maximin regret. When the data are discrete-valued, the complexity term is the logarithm of a sum of maximized likelihoods over all possible datasets. Because the sum has an exponential number of terms, its evaluation is in many cases intractable. In the continuous case, the sum is replaced by an integral for which a closed form is available in only a few cases. We present an approach based on Monte Carlo sampling, which works for all model classes and gives strongly consistent estimators of the minimax regret. The estimates converge almost surely to the correct value as the number of iterations increases. For the important class of Markov models, one of the presented estimators is particularly efficient: in empirical experiments, accuracy sufficient for model selection is usually achieved already after the first iteration, even for long sequences.
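As a rough illustration of the sampling idea described in the abstract (not the paper's own estimator), the NML complexity for a Bernoulli model over binary length-n sequences can be estimated by drawing sequences uniformly at random: the sum of maximized likelihoods over all 2^n sequences equals 2^n times the expected maximized likelihood of a uniform draw. The function names and the Bernoulli model choice here are assumptions for the sketch; the exact sum is also computable in this toy case, which lets us check the estimate.

```python
import math
import random

def max_likelihood_bernoulli(x):
    """Maximized likelihood P(x | theta_hat) of a binary sequence,
    with theta_hat = k/n the empirical frequency of ones."""
    n, k = len(x), sum(x)
    p = k / n
    # Python's 0.0 ** 0 == 1.0 handles the boundary cases p in {0, 1}.
    return (p ** k) * ((1 - p) ** (n - k))

def mc_nml_complexity(n, iters, seed=0):
    """Monte Carlo estimate of log sum_{x^n} P(x^n | theta_hat(x^n)).
    Sequences are sampled uniformly, so the sum is 2^n * E[max-likelihood];
    by the law of large numbers the estimate is strongly consistent."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iters):
        x = [rng.randint(0, 1) for _ in range(n)]
        total += max_likelihood_bernoulli(x)
    return n * math.log(2) + math.log(total / iters)

def exact_nml_complexity(n):
    """Exact log-complexity via the binomial sum over the count k of ones."""
    s = sum(math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
            for k in range(n + 1))
    return math.log(s)
```

For models over larger alphabets or longer sequences the exact sum is intractable, which is exactly the regime the Monte Carlo approach targets.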
Competitive Classification and Closeness Testing
25th Annual Conference on Learning Theory
Abstract
We study the problems of classification and closeness testing. A classifier associates a test sequence with one of two training sequences that was generated by the same distribution. A closeness test determines whether two sequences were generated by the same or by different distributions. For both problems all natural algorithms are symmetric: they make the same decision under all symbol relabelings. With no assumptions on the distributions' support size or relative distance, we construct a classifier and closeness test that require at most Õ(n^{3/2}) samples to attain the n-sample accuracy of the best symmetric classifier or closeness test designed with knowledge of the underlying distributions. Both algorithms run in time linear in the number of samples. Conversely, we also show that for any classifier or closeness test, there are distributions that require Ω̃(n^{7/6}) samples to achieve the n-sample accuracy of the best symmetric algorithm that knows the underlying distributions.
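To make the symmetry property concrete, here is a minimal baseline closeness test (not the paper's algorithm) that compares empirical distributions by total-variation distance. Relabeling the symbols permutes both count tables identically, so the statistic, and hence the decision, is unchanged; the threshold value is an arbitrary assumption for the sketch.

```python
from collections import Counter

def naive_closeness_test(seq_a, seq_b, threshold=0.3):
    """Baseline symmetric closeness test: accept "same distribution"
    when the total-variation distance between the empirical
    distributions of the two sequences is at most `threshold`.
    The statistic depends only on symbol counts, so any relabeling
    of the alphabet leaves the decision unchanged."""
    ca, cb = Counter(seq_a), Counter(seq_b)
    na, nb = len(seq_a), len(seq_b)
    support = set(ca) | set(cb)
    tv = 0.5 * sum(abs(ca[s] / na - cb[s] / nb) for s in support)
    return tv <= threshold
```

A test like this needs many samples when the support is large relative to the sequence length; the point of the paper's construction is to compete with the best symmetric test using only polynomially more samples, without support-size assumptions.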
Competitive Closeness Testing
24th Annual Conference on Learning Theory
Abstract
We test whether two sequences are generated by the same distribution or by two different ones. Unlike previous work, we make no assumptions on the distributions' support size. Additionally, we compare our performance to that of the best possible test. We describe an efficiently computable algorithm based on pattern maximum likelihood that is near optimal whenever the best possible error probability is ≤ exp(−14 n^{2/3}) using length-n sequences.
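The "pattern" underlying pattern maximum likelihood is the relabeling-invariant summary of a sequence: each symbol is replaced by the order of its first appearance. A short sketch of that notion (an illustration, not the paper's estimator, whose computation is substantially more involved):

```python
def pattern(seq):
    """Pattern of a sequence: replace each symbol by the index of its
    first appearance (1-based). Any one-to-one relabeling of the
    symbols yields the same pattern, so statistics computed from
    patterns are automatically symmetric."""
    idx = {}
    return [idx.setdefault(s, len(idx) + 1) for s in seq]
```

For example, `pattern("abracadabra")` gives `[1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]`, and any sequence obtained by renaming the letters produces the same list.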