Results 11–20 of 49
The minimax strategy for Gaussian density estimation
In COLT, 2000
Abstract

Cited by 8 (1 self)
We consider online density estimation with a Gaussian of unit variance. In each trial t the learner predicts a mean θt. Then it receives an instance xt chosen by the adversary and incurs loss ½(θt − xt)². The performance of the learner is measured by the regret, defined as the total loss of the learner minus the total loss of the best mean parameter chosen offline. We assume that the horizon T of the protocol is fixed and known to both parties. We give the optimal strategies for both the learner and the adversary. The value of the game is ½X²(ln T − ln ln T + O(ln ln T / ln T)), where X is an upper bound on the 2-norm of the instances. We also consider the standard algorithm that predicts with θt = ∑_{q=1}^{t−1} xq / (t − 1 + a) for a fixed a. We show that the regret of this algorithm is ½X²(ln T − O(1)) regardless of the choice of a. This work was done while Eiji Takimoto was on a sabbatical
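The protocol and the standard predictor described in this abstract are easy to simulate. The sketch below is an illustration only: the adversary is replaced by random instances, and the bound X, horizon T, and constant a are arbitrary choices of ours. It computes the regret of θt = ∑_{q<t} xq / (t − 1 + a) against the best offline mean.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000                       # horizon, known in advance in the protocol
X = 1.0                        # assumed bound on |x_t| (illustrative choice)
xs = rng.uniform(-X, X, T)     # random stand-in for the adversary's instances

a = 1.0                        # the fixed constant of the standard predictor
learner_loss, cum = 0.0, 0.0
for t, x in enumerate(xs, start=1):
    theta = cum / (t - 1 + a)  # theta_t = sum_{q<t} x_q / (t - 1 + a)
    learner_loss += 0.5 * (theta - x) ** 2
    cum += x

best = xs.mean()               # best offline (constant) mean parameter
offline_loss = 0.5 * ((xs - best) ** 2).sum()
regret = learner_loss - offline_loss
print(regret, 0.5 * X ** 2 * np.log(T))   # regret vs. the (X^2/2) ln T scale
```

On typical random data the regret stays well below the worst-case ½X²(ln T − O(1)) value that the paper establishes for an adversarial sequence.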
The Last-Step Minimax Algorithm
Pages 279–290 of: Proc. 11th International Conference on Algorithmic Learning Theory, 2000
Abstract

Cited by 8 (1 self)
We consider online density estimation with a parameterized density from an exponential family. In each trial t the learner predicts a parameter θt. Then it receives an instance xt chosen by the adversary and incurs loss −ln p(xt | θt), which is the negative log-likelihood of xt w.r.t. the predicted density of the learner. The performance of the learner is measured by the regret, defined as the total loss of the learner minus the total loss of the best parameter chosen offline. We develop an algorithm called the Last-step Minimax Algorithm that predicts with the minimax optimal parameter assuming that the current trial is the last one. For one-dimensional exponential families, we give an explicit form of the prediction of the Last-step Minimax Algorithm and show that its regret is O(ln T), where T is the number of trials. In particular, for Bernoulli density estimation the Last-step Minimax Algorithm is slightly better than the standard Laplace estimator. This work was done while...
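For the Bernoulli case, a last-step minimax prediction can be computed by equalizing the regret over the two possible outcomes of the current trial: predict proportionally to the maximized likelihood that the length-t sequence would attain under each possible next outcome. The sketch below is our illustration of that idea (function names are ours, not the paper's), compared against the Laplace estimator.

```python
def bernoulli_ml(m, n):
    # sup over theta of theta^m (1-theta)^(n-m) = (m/n)^m ((n-m)/n)^(n-m)
    return (m / n) ** m * ((n - m) / n) ** (n - m)

def last_step_minimax(k, t):
    """Prediction of P(x_t = 1) after observing k ones in t - 1 trials:
    equalize the regret over the two possible outcomes of trial t."""
    a1 = bernoulli_ml(k + 1, t)   # ML of the full sequence if x_t = 1
    a0 = bernoulli_ml(k, t)       # ML of the full sequence if x_t = 0
    return a1 / (a0 + a1)

def laplace(k, t):
    # standard add-one (Laplace) estimator
    return (k + 1) / (t + 1)

for t, k in [(2, 0), (3, 1), (10, 9)]:
    print(t, k, last_step_minimax(k, t), laplace(k, t))
```

Both predictors agree at the symmetric point (k ones in 2k trials give probability ½), but the last-step minimax prediction is more conservative near the boundary counts.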
Calculating the normalized maximum likelihood distribution for Bayesian forests
In Proc. IADIS International Conference on Intelligent Systems and Agents, 2007
Abstract

Cited by 7 (6 self)
When learning Bayesian network structures from sample data, an important issue is how to evaluate the goodness of alternative network structures. Perhaps the most commonly used model (class) selection criterion is the marginal likelihood, which is obtained by integrating over a prior distribution for the model parameters. However, the problem of determining a reasonable prior for the parameters is a highly controversial issue, and no completely satisfying Bayesian solution has yet been presented in the non-informative setting. The normalized maximum likelihood (NML), based on Rissanen's information-theoretic MDL methodology, offers an alternative, theoretically solid criterion that is objective and non-informative and requires no parameter prior. It has been previously shown that for discrete data, this criterion can be computed in linear time for Bayesian networks with no arcs, and in quadratic time for the so-called Naive Bayes network structure. Here we extend the previous results by showing how to compute the NML criterion in polynomial time for tree-structured Bayesian networks. The order of the polynomial depends on the number of values of the variables, but neither on the number of variables itself, nor on the sample size.
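For intuition about what is being computed, consider the simplest case of a single K-valued multinomial variable (a network with no arcs): the NML criterion normalizes the maximized likelihood by a sum over all possible samples of size n. The sketch below (ours, for illustration) computes that normalizer both by brute-force enumeration and, for K = 2, by the sum over counts that the fast algorithms generalize.

```python
import math
from itertools import product

def nml_normalizer_bruteforce(K, n):
    # sum of maximized likelihoods over all K^n samples (exponential time)
    total = 0.0
    for sample in product(range(K), repeat=n):
        counts = [sample.count(v) for v in range(K)]
        total += math.prod((c / n) ** c for c in counts if c > 0)
    return total

def nml_normalizer_binomial(n):
    # for K = 2 the sum collapses to a sum over counts: linear time
    return sum(math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
               for k in range(n + 1))

n = 6
print(nml_normalizer_bruteforce(2, n))   # 2^6 = 64 terms
print(nml_normalizer_binomial(n))        # same value from only 7 terms
```

Grouping samples by their count vector is what turns the exponential sum into a polynomial one; the tree-structured case extends this bookkeeping to counts along the tree.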
Combining Expert Advice Efficiently
Abstract

Cited by 7 (4 self)
We show how models for prediction with expert advice can be defined concisely and clearly using hidden Markov models (HMMs); standard HMM algorithms can then be used to efficiently calculate how the expert predictions should be weighted according to the model. We cast many existing models as HMMs and recover the best known running times in each case. We also describe two new models: the switch distribution, which was recently developed to improve Bayesian/Minimum Description Length model selection, and a new generalisation of the fixed share algorithm based on run-length coding. We give loss bounds for all models and shed new light on the relationships between them.
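As a concrete instance of the HMM view, the classical fixed-share forecaster corresponds to an HMM whose hidden state is the currently best expert, with self-transition probability 1 − α and uniform switching otherwise; running the forward algorithm then yields the mixing weights. A minimal sketch, with the function name and loss interface chosen by us (assumes K ≥ 2 experts and log-loss-style exponential weighting):

```python
import numpy as np

def fixed_share_weights(expert_losses, alpha):
    """Fixed-share mixing weights via the HMM forward algorithm.
    expert_losses: (T, K) array of per-trial losses of K experts.
    Hidden state = current expert; stay w.p. 1-alpha, else switch uniformly."""
    T, K = expert_losses.shape
    w = np.full(K, 1.0 / K)                # uniform prior over the K states
    A = np.full((K, K), alpha / (K - 1))   # transition matrix of the HMM
    np.fill_diagonal(A, 1.0 - alpha)
    history = []
    for t in range(T):
        history.append(w.copy())           # weights used to mix at trial t
        w = w * np.exp(-expert_losses[t])  # condition on the observed losses
        w = A.T @ w                        # forward-algorithm transition step
        w /= w.sum()
    return np.array(history)
```

Setting α = 0 recovers the plain Bayesian mixture over static experts; exploiting the uniform off-diagonal structure reduces the per-trial cost from O(K²) to O(K), which is the kind of saving the HMM formulation makes systematic.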
NML Computation Algorithms for Tree-Structured Multinomial Bayesian Networks
, 2007
Abstract

Cited by 6 (5 self)
Typical problems in bioinformatics involve large discrete datasets. Therefore, in order to apply statistical methods in such domains, it is important to develop efficient algorithms suitable for discrete data. The minimum description length (MDL) principle is a theoretically well-founded, general framework for performing statistical inference. The mathematical formalization of MDL is based on the normalized maximum likelihood (NML) distribution, which has several desirable theoretical properties. In the case of discrete data, straightforward computation of the NML distribution requires exponential time with respect to the sample size, since the definition involves a sum over all the possible data samples of a fixed size. In this paper, we first review some existing algorithms for efficient NML computation in the case of multinomial and naive Bayes model families. Then we proceed by extending these algorithms to more complex, tree-structured Bayesian networks.
Analyzing the Stochastic Complexity via Tree Polynomials
, 2005
Abstract

Cited by 6 (5 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure
Computing the Regret Table for Multinomial Data
, 2005
Abstract

Cited by 5 (2 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. In the case
A Fast Normalized Maximum Likelihood Algorithm for Multinomial Data
In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), 2005
Abstract

Cited by 5 (3 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. In the case of multinomial data, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Furthermore, in order to apply NML in practice, one often needs to compute a whole table of these exponential sums. In our previous work, we were able to compute this table by a recursive algorithm. The purpose of this paper is to significantly improve the time complexity of this algorithm. The techniques used here are based on the discrete Fourier transform and the convolution theorem.
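The convolution structure behind the speed-up can be seen directly: writing g_K(k) = k^k/k! · C_K(k) for the normalizer table C_K, the recurrence that merges two alphabets of sizes K1 and K2 becomes a plain convolution of their g-tables, which is exactly what the discrete Fourier transform accelerates. The sketch below is ours: np.convolve stands in for the FFT product, and N is kept tiny so the intermediate k^k/k! values stay in floating-point range.

```python
import numpy as np
from math import comb, factorial

def g_table(C, N):
    # g(k) = k^k / k! * C(k), with the convention 0^0 = 1
    return np.array([k ** k / factorial(k) * C[k] for k in range(N + 1)])

def combine(C1, C2, N):
    # C_{K1+K2}(n) = n!/n^n * (g_{K1} * g_{K2})[n] -- a plain convolution,
    # which the convolution theorem lets the FFT evaluate for all n at once
    h = np.convolve(g_table(C1, N), g_table(C2, N))[:N + 1]
    return np.array([1.0] + [factorial(n) / n ** n * h[n]
                             for n in range(1, N + 1)])

N = 8
C1 = np.ones(N + 1)          # one category: every sample has ML probability 1
C2 = combine(C1, C1, N)      # binary alphabet
C4 = combine(C2, C2, N)      # four categories, the same convolution again

# sanity check against the direct binomial-sum formula for K = 2
direct = sum(comb(N, k) * (k / N) ** k * ((N - k) / N) ** (N - k)
             for k in range(N + 1))
print(C2[N], direct)
```

Since one convolution produces the entire column of the regret table at once, replacing np.convolve with an FFT-based product gives the improved time complexity the paper is after.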
The Precise Minimax Redundancy
In Proceedings of IEEE Symposium on Information Theory, 2002
Abstract

Cited by 4 (0 self)
We start with a quick introduction to the redundancy problem. A code C_n : A^n → {0,1}* is defined as a mapping from the set A^n of all sequences x_1^n = (x_1, ..., x_n) of length n over the finite alphabet A to the set {0,1}* of all binary sequences. Given a probabilistic source model P, we let P(x_1^n) be the probability of the message x_1^n; given a code C_n, we let L(C_n, x_1^n) be the code length for x_1^n. From Shannon's works we know that the entropy H_n(P) = −∑ P(x_1^n) lg P(x_1^n) is the absolute lower bound on the expected code length, where lg := log_2 denotes the binary logarithm. Hence −lg P(x_1^n) can be viewed as the "ideal" code length. The next natural question is to ask by how much the length L(C_n, x_1^n) of a code differs from the ideal code length, either for individual sequences or on average. The pointwise redundancy is R_n(C_n, P; x_1^n) = L(C_n, x_1^n) + lg P(x_1^n), while the average redundancy R_n(C_n, P) and the maximal redundancy R_n(C_n,
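To make these quantities concrete, the sketch below (our illustration; the parameter p and length n are arbitrary choices) enumerates all length-n binary sequences from a memoryless source and computes the redundancies of a Shannon code, whose length ⌈−lg P(x)⌉ keeps the pointwise redundancy within [0, 1).

```python
import math
from itertools import product

p = 0.3                                # Bernoulli source parameter (illustrative)
n = 4

def prob(x):                           # P(x_1^n) for the memoryless source
    k = sum(x)
    return p ** k * (1 - p) ** (n - k)

# Shannon code: L(x) = ceil(-lg P(x)); Kraft's inequality guarantees that
# a prefix code with these lengths exists.
avg_redundancy = 0.0
max_redundancy = 0.0
for x in product((0, 1), repeat=n):
    ideal = -math.log2(prob(x))        # the "ideal" code length -lg P(x)
    L = math.ceil(ideal)
    r = L - ideal                      # pointwise redundancy L + lg P(x)
    avg_redundancy += prob(x) * r
    max_redundancy = max(max_redundancy, r)
print(avg_redundancy, max_redundancy)
```

The paper's concern is the precise asymptotics of such average and maximal redundancies when the source parameters are unknown, where the excess over the ideal code length no longer fits in a single bit.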