## Efficient Computation of Stochastic Complexity (2003)


Venue: Proceedings of the Ninth International Conference on Artificial Intelligence and Statistics

Citations: 15 (11 self)

### BibTeX

    @INPROCEEDINGS{Kontkanen03efficientcomputation,
      author    = {Petri Kontkanen and Wray Buntine and Petri Myllymäki and Jorma Rissanen and Henry Tirri},
      title     = {Efficient Computation of Stochastic Complexity},
      booktitle = {Proceedings of the Ninth International Conference on Artificial Intelligence and Statistics},
      year      = {2003},
      pages     = {233--238}
    }


### Abstract

Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. Unfortunately, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Therefore, in order to be able to apply the stochastic complexity measure in practice, in most cases it has to be approximated. In this paper, we show that for some interesting and important cases with multinomial data sets, the exponentiality can be removed without loss of accuracy. We also introduce a new computationally efficient approximation scheme based on analytic combinatorics and assess its accuracy, together with earlier approximations, by comparing them to the exact form.

### Citations

2307 | Estimating the dimension of a model - Schwarz - 1978 |

Citation Context ...ral evolutionary steps during the last two decades. For example, the early realization of the MDL principle, the two-part code MDL (Rissanen, 1978), takes the same form as the Bayesian BIC criterion (Schwarz, 1978), which has led some people to incorrectly believe that MDL and BIC are equivalent. The latest instantiation of MDL discussed here is not directly related to BIC, but to a more evolved formalization ... |

1160 | Modeling by shortest data description - Rissanen - 1978 |

Citation Context ...tions of future events. The most well-founded theoretical formalization of the intuitively appealing minimum encoding approach is the Minimum Description Length (MDL) principle developed by Rissanen (Rissanen, 1978, 1987, 1996). The MDL principle has gone through several evolutionary steps during the last two decades. For example, the early realization of the MDL principle, the two-part code MDL (Rissanen, 1978... |

1039 | Bayesian Theory - Bernardo, Smith - 1994 |

498 | Stochastic Complexity - Rissanen - 1989 |

296 | Stochastic Complexity in Statistical Inquiry - Rissanen - 1989 |

Citation Context ... where K is the number of values of the multinomial variable. As the name implies, the BIC has a Bayesian interpretation, but it can also be given a formulation in the MDL setting, as shown in (Rissanen, 1989). In the multi-dimensional case, we easily get

$-\log P_{\mathrm{BIC}}(x^N \mid \mathcal{M}_T) = -\log P(x^N \mid \hat{\theta}(x^N)) + \frac{(K-1) + K \sum_{i=1}^{m}(n_i - 1)}{2} \log N. \quad (11)$

As can be seen, the BIC approximation is very quick to... |
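As a rough illustration of the BIC score discussed in this context, here is a minimal Python sketch for the one-dimensional case of a single K-valued multinomial variable; the function name and interface are ours, not the paper's:

```python
import math
from collections import Counter

def bic_score(data, K):
    """Negative log BIC score for a single K-valued multinomial variable:
    -log P(x^N | ML parameters) + (K - 1)/2 * log N.
    (Illustrative sketch; not code from the paper.)"""
    N = len(data)
    # Maximum-likelihood term: a symbol observed c times contributes
    # c * log(c / N) to the log-likelihood.
    neg_log_ml = -sum(c * math.log(c / N) for c in Counter(data).values())
    # BIC penalty: half the free-parameter count times log N.
    penalty = (K - 1) / 2 * math.log(N)
    return neg_log_ml + penalty
```

For the sample `[0, 0, 1, 1]` with K = 2 this evaluates to 5 log 2 ≈ 3.47 nats, which makes concrete why BIC is "very quick to compute": only the counts and one logarithm of N are needed.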

275 | Fisher information and stochastic complexity - Rissanen - 1996 |

Citation Context ...ed some people to incorrectly believe that MDL and BIC are equivalent. The latest instantiation of MDL discussed here is not directly related to BIC, but to a more evolved formalization described in (Rissanen, 1996). For discussions on the theoretical advantages of this approach, see e.g. (Rissanen, 1996; Barron, Rissanen, & Yu, 1998; Grünwald, 1998; Rissanen, 1999; Xie & Barron, 2000; Rissanen, 2001) and the r... |

214 | Average Case Analysis of Algorithms on Sequences - Szpankowski - 2001 |

Citation Context ...del classes than our MT, the determinant of the Fisher information is no longer a product of Dirichlet integrals, which might cause technical problems. 3.3 SZPANKOWSKI APPROXIMATION Theorem 8.32 in (Szpankowski, 2001) gives the redundancy rate for memoryless sources. The theorem is based on analytic combinatorics and generating functions, and can be used as a basis for a new NML approximation. Redundancy rate for... |

196 | Asymptotic expansions of integrals - Bleistein, Handelsman - 1986 |

176 | Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables - Chickering, Heckerman - 1997 |

136 | Universal prediction - Merhav, Feder - 1998 |

125 | Universal Sequential Coding of Single Messages - Shtar'kov - 1987 |

Citation Context ...ode MDL presented in (Rissanen, 1978) can be refined to a much more efficient coding scheme. This scheme is based on a notion of normalized maximum likelihood (NML), proposed for finite alphabets in (Shtarkov, 1987). The definition of NML is

$P_{\mathrm{NML}}(x^N \mid \mathcal{M}) = \frac{P(x^N \mid \hat{\theta}(x^N), \mathcal{M})}{\sum_{y^N} P(y^N \mid \hat{\theta}(y^N), \mathcal{M})}, \quad (1)$

where the sum goes over all the possible data matrices of length N. For discussions on the theoretical ... |
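The NML definition quoted in this context can be evaluated directly for tiny samples; the normalizer is exactly the sum with an exponential number of terms that the paper sets out to compute efficiently. A minimal brute-force Python sketch (our naming, not the paper's algorithm):

```python
import math
from collections import Counter
from itertools import product

def max_likelihood(seq):
    """P(seq | ML multinomial parameters): a symbol with count c
    gets probability c / N."""
    N = len(seq)
    return math.prod((c / N) ** c for c in Counter(seq).values())

def nml_prob(x, K):
    """Exact NML probability of x under the K-valued multinomial model,
    normalizing by enumerating all K^N sequences of the same length.
    Feasible only for tiny N: this enumeration is precisely the
    exponential sum the paper shows how to avoid."""
    norm = sum(max_likelihood(y) for y in product(range(K), repeat=len(x)))
    return max_likelihood(x) / norm
```

For example, with K = 2 and N = 2 the normalizer is 1 + 0.25 + 0.25 + 1 = 2.5, so `nml_prob((0, 0), 2)` returns 0.4; the four NML probabilities sum to one, as a normalized code-length distribution must.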

107 | Information-theoretic asymptotics of bayes methods - Clarke, Barron - 1990 |

72 | The minimum description length principle and reasoning under uncertainty - Grünwald - 1998 |

Citation Context ...ated to BIC, but to a more evolved formalization described in (Rissanen, 1996). For discussions on the theoretical advantages of this approach, see e.g. (Rissanen, 1996; Barron, Rissanen, & Yu, 1998; Grünwald, 1998; Rissanen, 1999; Xie & Barron, 2000; Rissanen, 2001) and the references therein. The most important notion of MDL is the Stochastic Complexity (SC), which is defined as the shortest description lengt... |

59 | Strong optimality of the normalized ML models as universal codes and information in data - Rissanen - 2001 |

Citation Context ...cribed in (Rissanen, 1996). For discussions on the theoretical advantages of this approach, see e.g. (Rissanen, 1996; Barron, Rissanen, & Yu, 1998; Grünwald, 1998; Rissanen, 1999; Xie & Barron, 2000; Rissanen, 2001) and the references therein. The most important notion of MDL is the Stochastic Complexity (SC), which is defined as the shortest description length of a given data relative to a model class M. Unlik... |

57 | Hypothesis selection and testing by the MDL principle. The Computer Journal 42 - Rissanen - 1999 |

38 | On predictive distributions and Bayesian networks - Kontkanen, Myllymäki, et al. - 2000 |

Citation Context ...

$-\log P_{\mathrm{RIS}}(x^N \mid \mathcal{M}_1) = -\log P(x^N \mid \hat{\theta}(x^N)) + \frac{K-1}{2} \log \frac{N}{2\pi} + \log \frac{\pi^{K/2}}{\Gamma(K/2)} + o(1), \quad (13)$

where $\Gamma(\cdot)$ is the Euler gamma function. For the multi-dimensional case, we have earlier (Kontkanen et al., 2000) derived the square root of the determinant of the Fisher information for model class $\mathcal{M}_T$:

$\sqrt{|I(\theta)|} = \prod_{k=1}^{K} \alpha_k^{\frac{1}{2}\left(\sum_{i=1}^{m}(n_i-1)-1\right)} \prod_{i=1}^{m} \prod_{k=1}^{K} \prod_{v=1}^{n_i} \theta_{ikv}^{-\frac{1}{2}}, \quad (14)$

where $\alpha_k = P(c = k)$ an... |

23 | Constructing Bayesian finite mixture models by the EM algorithm - Kontkanen, Myllymäki, et al. - 1996 |

19 | Minimum encoding approaches for predictive modeling - Grünwald, Kontkanen, et al. - 1998 |

Citation Context ...y successful in practice in mixture modeling (Kontkanen, Myllymäki, & Tirri, 1996), cluster analysis, case-based reasoning (Kontkanen, Myllymäki, Silander, & Tirri, 1998), Naive Bayes classification (Grünwald et al., 1998; Kontkanen, Myllymäki, Silander, Tirri, & Grünwald, 2000) and data visualization (Kontkanen, Lahtinen, Myllymäki, Silander, & Tirri, 2000). We now show how to compute NML for MT. Assuming c has K va... |

18 | Supervised model-based visualization of high-dimensional data - Kontkanen, Lahtinen, et al. - 2000 |

Citation Context ...

$-\log P_{\mathrm{RIS}}(x^N \mid \mathcal{M}_1) = -\log P(x^N \mid \hat{\theta}(x^N)) + \frac{K-1}{2} \log \frac{N}{2\pi} + \log \frac{\pi^{K/2}}{\Gamma(K/2)} + o(1), \quad (13)$

where $\Gamma(\cdot)$ is the Euler gamma function. For the multi-dimensional case, we have earlier (Kontkanen et al., 2000) derived the square root of the determinant of the Fisher information for model class $\mathcal{M}_T$:

$\sqrt{|I(\theta)|} = \prod_{k=1}^{K} \alpha_k^{\frac{1}{2}\left(\sum_{i=1}^{m}(n_i-1)-1\right)} \prod_{i=1}^{m} \prod_{k=1}^{K} \prod_{v=1}^{n_i} \theta_{ikv}^{-\frac{1}{2}}, \quad (14)$

where $\alpha_k = P(c = k)$ an... |

15 | The minimum description principle in coding and modeling - Barron, Rissanen, et al. - 1998 |

10 | MDL estimation for small sample sizes and its application to linear regression - Dom - 1996 |

Citation Context ... of certain length, which are obviously exponential in number. Some applications have been presented for discrete regression (Tabus, Rissanen, & Astola, 2002), linear regression (Barron et al., 1998; Dom, 1996), density estimation (Barron et al., 1998) and segmentation of binary strings (Dom, 1995). In this paper, we will present methods for removing the exponentiality of SC in several important cases invo... |

9 | Non-Informative Priors do not Exist - Bernardo - 1997 |

Citation Context ...dent on any prior distribution, it only uses the data at hand¹. This means that the objectives of the MDL approach are very similar to those behind Bayesian methods with so-called reference priors (Bernardo, 1997), but note, however, that Bernardo himself expresses doubt that a reasonably general notion of "non-informative" pri... [¹ Unlike Bayesian methods, with SC the possible subjective prior information is not...] |

9 | Classification and feature gene selection using the normalized maximum likelihood model for discrete regression - Tabus, Rissanen, et al. |

8 | MDL estimation with small sample sizes including an application to the problem of segmenting binary strings using Bernoulli models - Dom - 1997 |

Citation Context ...n presented for discrete regression (Tabus, Rissanen, & Astola, 2002), linear regression (Barron et al., 1998; Dom, 1996), density estimation (Barron et al., 1998) and segmentation of binary strings (Dom, 1995). In this paper, we will present methods for removing the exponentiality of SC in several important cases involving multinomial (discrete) data. Even these methods are, however, in some cases computa... |

3 | On the accuracy of stochastic complexity approximations - Kontkanen, Myllymäki, et al. - 1997 |

2 | Bayes factors (Tech. Rep.) - Kass - 1994 |

1 | Asymptotic expansions of integrals - Bleistein - 1975 |


1 | MDL estimation for small sample sizes and its application to linear regression (Tech. Rep.) - Dom - 1996 |

Citation Context ... of certain length, which are obviously exponential in number. Some applications have been presented for discrete regression (Tabus, Rissanen, & Astola, 2002), linear regression (Barron et al., 1998; Dom, 1996), density estimation (Barron et al., 1998) and segmentation of binary strings (Dom, 1995). In this paper, we will present methods for removing the exponentiality of SC in several important cases invo... |

1 | The minimum description length principle and reasoning under uncertainty - Grünwald - 1998 |

1 | On the accuracy of stochastic complexity approximations - Myllymäki, Silander - 1999 |