Results 1-10 of 15
NML Computation Algorithms for Tree-Structured Multinomial Bayesian Networks
, 2007
Abstract

Cited by 6 (5 self)
Typical problems in bioinformatics involve large discrete datasets. Therefore, in order to apply statistical methods in such domains, it is important to develop efficient algorithms suitable for discrete data. The minimum description length (MDL) principle is a theoretically well-founded, general framework for performing statistical inference. The mathematical formalization of MDL is based on the normalized maximum likelihood (NML) distribution, which has several desirable theoretical properties. In the case of discrete data, straightforward computation of the NML distribution requires exponential time with respect to the sample size, since the definition involves a sum over all the possible data samples of a fixed size. In this paper, we first review some existing algorithms for efficient NML computation in the case of multinomial and naive Bayes model families. Then we proceed by extending these algorithms to more complex, tree-structured Bayesian networks.
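The exponential-time sum mentioned in this abstract is concrete enough to illustrate. The sketch below (not taken from the paper; the function name and structure are my own) computes the NML normalizer for a single multinomial variable with K categories and sample size n by brute-force enumeration of all K**n sequences, which is exactly the computation the reviewed algorithms are designed to avoid:

```python
from itertools import product

def nml_normalizer_brute(K, n):
    """NML normalizer for a K-category multinomial with sample size n,
    by summing the maximized likelihood over all K**n sequences.
    Exponential in n, as the abstract notes."""
    total = 0.0
    for seq in product(range(K), repeat=n):
        counts = [seq.count(k) for k in range(K)]
        # Maximized likelihood of this sequence: prod_i (c_i / n)^{c_i},
        # with the convention 0^0 = 1 (empty categories contribute nothing).
        lik = 1.0
        for c in counts:
            if c > 0:
                lik *= (c / n) ** c
        total += lik
    return total
```

For instance, with K = 2 and n = 2 the four binary sequences contribute 1 + 0.25 + 0.25 + 1 = 2.5, already showing how the maximized likelihoods are aggregated over the whole sample space.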
Analyzing the Stochastic Complexity via Tree Polynomials
, 2005
Abstract

Cited by 6 (5 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure
Factorized Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures
, 2008
Abstract

Cited by 6 (4 self)
This paper introduces a new scoring criterion, factorized normalized maximum likelihood, for learning Bayesian network structures. The proposed scoring criterion requires no parameter tuning, and it is decomposable and asymptotically consistent. We compare the new scoring criterion to other scoring criteria and describe its practical implementation. Empirical tests confirm its good performance.
Computing the Regret Table for Multinomial Data
, 2005
Abstract

Cited by 5 (2 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. In the case
A Fast Normalized Maximum Likelihood Algorithm for Multinomial Data
In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05)
, 2005
Abstract

Cited by 5 (3 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. In the case of multinomial data, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Furthermore, in order to apply NML in practice, one often needs to compute a whole table of these exponential sums. In our previous work, we were able to compute this table by a recursive algorithm. The purpose of this paper is to significantly improve the time complexity of this algorithm. The techniques used here are based on the discrete Fourier transform and the convolution theorem.
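The recursive table computation this abstract refers to can be sketched via the convolution identity for NML normalizers: the normalizer for K1 + K2 categories at sample size n is a binomially weighted convolution of the normalizers for K1 and K2 categories. The Python below (my own sketch, not the paper's method; in particular it is the quadratic-time recursion, not the FFT-accelerated improvement the paper contributes) builds up from the trivial one-category case:

```python
from math import comb

def nml_normalizer(K, n):
    """NML normalizer for a K-category multinomial at sample size n,
    built by repeatedly convolving in one more category.
    O(K * n^2) time, polynomial rather than exponential in n."""
    C_prev = [1.0] * (n + 1)  # one category: normalizer is 1 for every m
    for _ in range(2, K + 1):
        C_new = []
        for m in range(n + 1):
            total = 0.0
            for r in range(m + 1):
                # Binomial split weight (r/m)^r * ((m-r)/m)^(m-r), 0^0 = 1.
                w = 1.0 if m == 0 else (r / m) ** r * ((m - r) / m) ** (m - r)
                # C_1(m - r) = 1, so only C_prev[r] appears in the product.
                total += comb(m, r) * w * C_prev[r]
            C_new.append(total)
        C_prev = C_new
    return C_prev[n]
```

Running the inner loop for every m from 0 to n is what produces the whole regret table mentioned in the abstract, one column per number of categories; the paper's contribution is to speed this convolution up using the discrete Fourier transform.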
Factorized Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures
Abstract

Cited by 3 (2 self)
This paper introduces a new scoring criterion, factorized normalized maximum likelihood, for learning Bayesian network structures. The proposed scoring criterion requires no parameter tuning, and it is decomposable and asymptotically consistent. We compare the new scoring criterion to other scoring criteria and describe its practical implementation. Empirical tests confirm its good performance.
The Rate Problem: Orthodox, Bayesian and Information Theoretic Statistical Inference
Abstract
The rate problem requires determining whether or not the underlying rate of some phenomenon is the same in two populations, based on finite samples from each population that count the number of 'successes' from the total number of observations. We develop Bayesian and Information Theoretic statistical criteria for making this decision, and compare their performance to the Orthodox significance testing approach at three commonly used critical levels. Two Monte Carlo evaluations, corresponding to two different realistic assumptions about the availability of data in rate problems, both show that the Bayesian and Information Theoretic criteria perform extremely similarly, and are more accurate than the Orthodox approach. The superiority of the Bayesian and Information Theoretic criteria is particularly evident when dealing with small sample sizes, corresponding to real-world situations where additional data are expensive, dangerous, or just impossible to obtain.
Computing the . . .
, 2005
Abstract

Copyright © 2005 held by the authors. NB: The HIIT Technical Reports series is intended for rapid dissemination of results produced by the HIIT researchers. Therefore, some of the results may also be later published as scientific articles elsewhere.
Efficient Computation of NML . . .
Abstract
Bayesian networks are parametric models for multidimensional domains exhibiting complex dependencies between the dimensions (domain variables). A central problem in learning such models is how to regularize the number of parameters; in other words, how to determine which dependencies are significant and which are not. The normalized maximum likelihood (NML) distribution or code offers an information-theoretic solution to this problem. Unfortunately, computing it for arbitrary Bayesian network models appears to be computationally infeasible, but we show how it can be computed efficiently for certain restricted types of Bayesian network models.
An MDL Framework for Data Clustering
Abstract
We regard clustering as a data assignment problem where the goal is to partition the data into several non-hierarchical groups of items. For solving this problem, we suggest an information-theoretic framework based on the minimum description length (MDL) principle. Intuitively, the idea is that we group together those data items that can be compressed well together, so that the total code length over all the data groups is optimized. One can argue that, as efficient compression is possible only when one has discovered underlying regularities that are common to all the members of a group, this approach produces an implicitly defined similarity metric between the data items. Formally, the global code length criterion to be optimized is defined by using the intuitively appealing universal normalized maximum likelihood code, which has been shown to produce an optimal compression rate in an explicitly defined manner. The number of groups can be assumed to be unknown, and the problem of deciding the optimal number is formalized as part of the same theoretical framework. In the empirical part of the paper we present results that demonstrate the validity of the suggested clustering framework.