Results 1 – 5 of 5
Efficient Computation of Stochastic Complexity
 Proceedings of the Ninth International Conference on Artificial Intelligence and Statistics, 2003
Abstract

Cited by 16 (11 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. Unfortunately, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Therefore, in order to be able to apply the stochastic complexity measure in practice, in most cases it has to be approximated. In this paper, we show that for some interesting and important cases with multinomial data sets, the exponentiality can be removed without loss of accuracy. We also introduce a new computationally efficient approximation scheme based on analytic combinatorics and assess its accuracy, together with earlier approximations, by comparing them to the exact form.
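The key idea of removing the exponentiality can be illustrated with a small sketch. The following Python code (an illustration, not the paper's own algorithm) computes the exact NML code length for a Bernoulli model by grouping the 2^n binary sequences by their count of ones, collapsing the exponential sum into n + 1 terms:

```python
from math import comb, log2

def binomial_nml_regret(n):
    """log2 of the NML normalizing sum for a Bernoulli model with n
    observations. Grouping the 2^n binary sequences by their count of
    ones collapses the exponential sum into n + 1 terms -- the kind of
    collapse exploited for multinomial models."""
    total = 0.0
    for k in range(n + 1):
        p = k / n
        # Python evaluates 0**0 as 1, matching the ML convention here.
        total += comb(n, k) * p**k * (1 - p)**(n - k)
    return log2(total)

def stochastic_complexity(ones, n):
    """NML code length in bits for a binary sequence with `ones` ones."""
    loglik = 0.0
    if 0 < ones < n:
        p = ones / n
        loglik = ones * log2(p) + (n - ones) * log2(1 - p)
    return -loglik + binomial_nml_regret(n)
```

The regret term is the same for every sequence of length n, so it can be precomputed once and reused across model-selection candidates.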
BAYDA: Software for Bayesian Classification and Feature Selection
 Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998
Abstract

Cited by 13 (9 self)
BAYDA is a software package for flexible data analysis in predictive data mining tasks. The mathematical model underlying the program is based on a simple Bayesian network, the Naive Bayes classifier. It is well-known that the Naive Bayes classifier performs well in predictive data mining tasks when compared to approaches using more complex models. However, the model makes strong independence assumptions that are frequently violated in practice. For this reason, the BAYDA software also provides a feature selection scheme which can be used for analyzing the problem domain, and for improving the prediction accuracy of the models constructed by BAYDA. The scheme is based on a novel Bayesian feature selection criterion introduced in this paper. The suggested criterion is inspired by the Cheeseman-Stutz approximation for computing the marginal likelihood of Bayesian networks with hidden variables. The empirical results with several widely-used data sets demonstrate that the automated Bayesian...
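To give a rough sense of the underlying model, here is a minimal Naive Bayes classifier for categorical features with Laplace smoothing. This is an illustrative sketch only; BAYDA's actual implementation and its Bayesian feature selection criterion are not reproduced here:

```python
import math
from collections import defaultdict

class NaiveBayes:
    """Minimal Naive Bayes for categorical features with add-one smoothing.
    Illustrative only; BAYDA's model and its Bayesian feature selection
    criterion are not reproduced here."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_feats = len(X[0])
        self.class_n = {c: y.count(c) for c in self.classes}
        self.prior = {c: self.class_n[c] / len(y) for c in self.classes}
        # counts[c][j][v] = times value v appears in feature j for class c
        self.counts = {c: [defaultdict(int) for _ in range(n_feats)]
                       for c in self.classes}
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
        return self

    def predict(self, row):
        def log_posterior(c):
            lp = math.log(self.prior[c])
            for j, v in enumerate(row):
                # add-one smoothing; the constant 2 is a simplifying
                # stand-in for the per-feature value arity
                lp += math.log((self.counts[c][j][v] + 1) /
                               (self.class_n[c] + 2))
            return lp
        return max(self.classes, key=log_posterior)
```

Feature selection for such a model typically scores candidate feature subsets and keeps those that improve a model-quality criterion; the criterion used by BAYDA is the Bayesian one introduced in the paper, not shown here.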
Stochastic Complexity Based Estimation of Missing Elements in Questionnaire Data
 Proceedings of the Annual American Educational Research Association Meeting, SIG Educational Statisticians, 1998
Abstract

Cited by 2 (0 self)
In this paper we study a new information-theoretically justified approach to missing data estimation for multivariate categorical data. The approach discussed is a model-based imputation procedure relative to a model class (i.e., a functional form for the probability distribution of the complete data matrix), which in our case is the set of multinomial models with some independence assumptions. Based on the given model class assumption, an information-theoretic criterion can be derived to select between the different complete data matrices. Intuitively this general criterion, called stochastic complexity, represents the shortest code length needed for coding the complete data matrix relative to the model class chosen. Using this information-theoretic criterion, the missing data problem is reduced to a search problem, i.e., finding the data completion with minimal stochastic complexity. In the experimental part of the paper we present empirical results of the approach using two real data sets, and compare these results to those achieved by commonly used techniques such as case deletion and imputing sample averages.
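The search over data completions can be illustrated with a deliberately simplified single-column sketch. Here a plug-in (maximum-likelihood) multinomial code length stands in for the full stochastic complexity criterion, and each missing cell is filled with the candidate value that minimizes it; the helper names are hypothetical:

```python
from collections import Counter
from math import log2

def column_code_length(values):
    """Plug-in (maximum-likelihood) code length in bits for a categorical
    column -- a simplified stand-in for stochastic complexity."""
    n = len(values)
    return -sum(c * log2(c / n) for c in Counter(values).values())

def impute(values, missing=None):
    """Fill each missing entry with the candidate value that yields the
    shortest code for the completed column (single-column greedy search)."""
    observed = [v for v in values if v is not missing]
    candidates = sorted(set(observed))
    return [v if v is not missing else
            min(candidates, key=lambda c: column_code_length(observed + [c]))
            for v in values]
```

With an ML multinomial code this greedy step coincides with choosing the most frequent value; the criterion in the paper operates on the full data matrix under the chosen model class, where the search becomes genuinely combinatorial.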
Classification of binary vectors by using DSC distance to minimize stochastic complexity
, 2002
Bayesian-Game Modeling of C2 Decision Making in Submarine Battle-Space Situation Awareness
, 2004
Abstract
In a previous paper of ours [HPSZ02], we addressed the C2 decision support issues and introduced a software agent architecture for combat C2 tactical decision aids under overwhelming information inflow and uncertainty. The research described in this paper concentrates on applying a Bayesian game-theoretic approach to multi-source data fusion for achieving the situational awareness that supports C2 decision making in time- and mission-stressed settings with a significant amount of information uncertainty and inaccuracy. The Consolidated Undersea Situational Awareness System (CUSAS) provides information management and integration by applying an evolutionary game-theoretic model to state determinations and conflict resolutions in a mapping between the combat space data sets and the situational state estimations. A Bayesian probabilistic computation is conducted to evaluate sensory and environmental inputs and quantitatively rank the situational state hypotheses in terms of certainty functions. Asynchronous and intelligent agents are employed to support the prioritization, management, and coordination of the data fusion process, as well as to model adversarial and friendly behavior, providing advice to decision makers or to other software agents playing human roles. The agents with data fusion capability learn and cooperate to process overwhelming combat information more accurately, systematically, and in a well-prioritized manner.
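The Bayesian ranking of situational state hypotheses described above can be sketched as a posterior computation over discrete hypotheses, assuming observations are conditionally independent given the hypothesis. The hypothesis and observation names below are illustrative placeholders, not CUSAS internals:

```python
import math

def rank_hypotheses(priors, likelihoods, evidence):
    """Rank discrete situational-state hypotheses by posterior probability
    given fused sensor observations (assumed conditionally independent
    given the hypothesis).

    priors:      {hypothesis: P(h)}
    likelihoods: {hypothesis: {observation: P(obs | h)}}
    evidence:    list of observations
    """
    log_scores = {}
    for h, p in priors.items():
        lp = math.log(p)
        for obs in evidence:
            lp += math.log(likelihoods[h][obs])
        log_scores[h] = lp
    # normalize in log space for numerical stability, then exponentiate
    m = max(log_scores.values())
    unnorm = {h: math.exp(s - m) for h, s in log_scores.items()}
    z = sum(unnorm.values())
    posterior = {h: u / z for h, u in unnorm.items()}
    return sorted(posterior.items(), key=lambda kv: -kv[1])
```

A fusion layer would feed such a ranker with observations from multiple sensors and hand the ordered hypotheses to the agents that prioritize decision support.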