Results 1-3 of 3
Efficient Computation of Stochastic Complexity
Proceedings of the Ninth International Conference on Artificial Intelligence and Statistics, 2003
Abstract

Cited by 15 (11 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. Unfortunately, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Therefore, in order to be able to apply the stochastic complexity measure in practice, in most cases it has to be approximated. In this paper, we show that for some interesting and important cases with multinomial data sets, the exponentiality can be removed without loss of accuracy. We also introduce a new computationally efficient approximation scheme based on analytic combinatorics and assess its accuracy, together with earlier approximations, by comparing them to the exact form.
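The abstract's central observation, that the exponential sum defining the NML normalizer collapses when data sets are grouped by their sufficient statistics, can be illustrated in the simplest multinomial setting. Below is a minimal sketch for a Bernoulli (two-category) model; the function name is illustrative and this is not the paper's algorithm, which covers general multinomial models.

```python
from math import comb

def nml_normalizer_bernoulli(n):
    """Exact NML parametric complexity C(n) for a Bernoulli model.

    Naively, C(n) sums the maximized likelihood over all 2^n binary
    sequences. Grouping sequences by their count of ones reduces the
    sum to n + 1 terms without any loss of accuracy.
    """
    total = 0.0
    for k in range(n + 1):
        p = k / n  # maximum-likelihood parameter for k ones out of n
        # Maximized likelihood of any single sequence with k ones,
        # using the convention 0**0 == 1 (which Python follows).
        ml = (p ** k) * ((1 - p) ** (n - k))
        total += comb(n, k) * ml
    return total
```

For example, `nml_normalizer_bernoulli(2)` evaluates three terms instead of four sequences; the gap widens exponentially as `n` grows.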
BAYDA: Software for Bayesian Classification and Feature Selection
Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998
Abstract

Cited by 13 (9 self)
BAYDA is a software package for flexible data analysis in predictive data mining tasks. The mathematical model underlying the program is based on a simple Bayesian network, the Naive Bayes classifier. It is well known that the Naive Bayes classifier performs well in predictive data mining tasks when compared to approaches using more complex models. However, the model makes strong independence assumptions that are frequently violated in practice. For this reason, the BAYDA software also provides a feature selection scheme which can be used for analyzing the problem domain, and for improving the prediction accuracy of the models constructed by BAYDA. The scheme is based on a novel Bayesian feature selection criterion introduced in this paper. The suggested criterion is inspired by the Cheeseman-Stutz approximation for computing the marginal likelihood of Bayesian networks with hidden variables. The empirical results with several widely used data sets demonstrate that the automated Bayesian...
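The Naive Bayes model underlying BAYDA can be sketched in a few lines. The class below is an illustrative frequency-count version with Laplace smoothing, not BAYDA's implementation (which uses a fully Bayesian treatment); all names are hypothetical.

```python
import math
from collections import defaultdict

class NaiveBayes:
    """Minimal Naive Bayes classifier for categorical features with
    Laplace (add-one) smoothing. Illustrative sketch only."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.n_features = len(X[0])
        self.class_count = defaultdict(int)
        # (class, feature index, feature value) -> count
        self.feat_count = defaultdict(int)
        self.feat_values = [set() for _ in range(self.n_features)]
        for row, c in zip(X, y):
            self.class_count[c] += 1
            for j, v in enumerate(row):
                self.feat_count[(c, j, v)] += 1
                self.feat_values[j].add(v)
        return self

    def predict(self, row):
        # Pick the class maximizing log P(c) + sum_j log P(x_j | c),
        # relying on the Naive Bayes independence assumption.
        best, best_lp = None, float("-inf")
        n = sum(self.class_count.values())
        for c in self.classes:
            lp = math.log((self.class_count[c] + 1) / (n + len(self.classes)))
            for j, v in enumerate(row):
                num = self.feat_count[(c, j, v)] + 1
                den = self.class_count[c] + len(self.feat_values[j])
                lp += math.log(num / den)
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

The feature selection scheme described in the abstract would then score subsets of the feature indices rather than using all of them, but that criterion is specific to the paper and is not reproduced here.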
Stochastic Complexity Based Estimation of Missing Elements in Questionnaire Data
The Annual American Educational Research Association Meeting, SIG Educational Statisticians, 1998
Abstract

Cited by 2 (0 self)
In this paper we study a new information-theoretically justified approach to missing data estimation for multivariate categorical data. The approach discussed is a model-based imputation procedure relative to a model class (i.e., a functional form for the probability distribution of the complete data matrix), which in our case is the set of multinomial models with some independence assumptions. Based on the given model class assumption, an information-theoretic criterion can be derived to select between the different complete data matrices. Intuitively, this general criterion, called stochastic complexity, represents the shortest code length needed for coding the complete data matrix relative to the model class chosen. Using this information-theoretic criterion, the missing data problem is reduced to a search problem, i.e., finding the data completion with minimal stochastic complexity. In the experimental part of the paper we present empirical results of the approach using two real data sets, and compare these results to those achieved by commonly used techniques such as case deletion and imputing sample averages.
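The reduction of imputation to a code-length minimization search can be sketched as follows. This is a toy version under the column-independence assumption, using a two-part/BIC-style approximation to the multinomial code length in place of the exact stochastic complexity; all function names are illustrative, not the paper's.

```python
import math
from collections import Counter

def column_code_length(values):
    """Approximate code length (in nats) of one categorical column under
    a multinomial model: -log(maximized likelihood) plus a (K-1)/2 * log n
    parameter cost. A stand-in for the exact stochastic complexity."""
    n = len(values)
    counts = Counter(values)
    neg_log_ml = -sum(c * math.log(c / n) for c in counts.values())
    return neg_log_ml + (len(counts) - 1) / 2 * math.log(n)

def impute(columns):
    """Fill each missing cell (None) with the observed category value that
    minimizes the column's code length. Under the independence assumption
    the total code length is a sum over columns, so each column can be
    completed separately (here greedily, cell by cell)."""
    completed = []
    for col in columns:
        candidates = {v for v in col if v is not None}
        filled = list(col)
        for i, v in enumerate(filled):
            if v is None:
                filled[i] = min(
                    candidates,
                    key=lambda cand: column_code_length(
                        [cand if x is None else x for x in filled]))
        completed.append(filled)
    return completed
```

For a single column `["a", "a", "b", None]` the completion `"a"` yields a shorter code than `"b"`, so the search picks it; with the exact NML code length and dependencies between columns, the search space is of course much richer.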