Results 1–10 of 15
Bayes not Bust! Why Simplicity is no Problem for Bayesians
, 2007
"... The advent of formal definitions of the simplicity of a theory has important implications for model selection. But what is the best way to define simplicity? Forster and Sober ([1994]) advocate the use of Akaike’s Information Criterion (AIC), a nonBayesian formalisation of the notion of simplicity. ..."
Abstract

Cited by 13 (10 self)
The advent of formal definitions of the simplicity of a theory has important implications for model selection. But what is the best way to define simplicity? Forster and Sober ([1994]) advocate the use of Akaike’s Information Criterion (AIC), a non-Bayesian formalisation of the notion of simplicity. This forms an important part of their wider attack on Bayesianism in the philosophy of science. We defend a Bayesian alternative: the simplicity of a theory is to be characterised in terms of Wallace’s Minimum Message Length (MML). We show that AIC is inadequate for many statistical problems where MML performs well. Whereas MML is always defined, AIC can be undefined. Whereas MML is not known ever to be statistically inconsistent, AIC can be. Even when defined and consistent, AIC performs worse than MML on small sample sizes. MML is statistically invariant under 1-to-1 reparametrisation, thus avoiding a common criticism of Bayesian approaches. We also show that MML provides answers to many of Forster’s objections to Bayesianism. Hence an important part of the attack on …
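The AIC score that this abstract contrasts with MML takes only a few lines to compute. The following toy example is our own illustration, not from the paper: it fits polynomials of increasing degree to noisy linear data and scores each fit with AIC = 2k − 2 ln L, counting only the polynomial coefficients as parameters (a simplification; a fuller treatment would also count the noise variance).

```python
import numpy as np

def aic(log_likelihood, k):
    # Akaike's Information Criterion: 2k - 2*ln(L); lower is better.
    return 2 * k - 2 * log_likelihood

# Hypothetical data: a linear trend plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = 2 * x + rng.normal(0, 0.1, size=x.size)

scores = {}
for degree in range(1, 5):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = np.mean(resid ** 2)
    # Gaussian log-likelihood of the residuals at the MLE of the variance
    ll = -0.5 * x.size * (np.log(2 * np.pi * sigma2) + 1)
    scores[degree] = aic(ll, degree + 1)  # degree+1 coefficients

best = min(scores, key=scores.get)  # degree with the lowest AIC
```

Note how the penalty term 2k is fixed regardless of sample size; this is one source of the small-sample weaknesses the abstract attributes to AIC.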
Hedging predictions in machine learning
 Comput. J
, 2007
"... Recent advances in machine learning make it possible to design efficient prediction algorithms for data sets with huge numbers of parameters. This article describes a new technique for ‘hedging ’ the predictions output by many such algorithms, including support vector machines, kernel ridge regressi ..."
Abstract

Cited by 10 (3 self)
Recent advances in machine learning make it possible to design efficient prediction algorithms for data sets with huge numbers of parameters. This article describes a new technique for ‘hedging’ the predictions output by many such algorithms, including support vector machines, kernel ridge regression, kernel nearest neighbours, and by many other state-of-the-art methods. The hedged predictions for the labels of new objects include quantitative measures of their own accuracy and reliability. These measures are provably valid under the assumption of randomness, traditional in machine learning: the objects and their labels are assumed to be generated independently from the same probability distribution. In particular, it becomes possible to control (up to statistical fluctuations) the number of erroneous predictions by selecting a suitable confidence level. Validity being achieved automatically, the remaining goal of hedged prediction is efficiency: taking full account of the new objects’ features and other available information to produce as accurate predictions as possible. This can be done successfully using the powerful machinery of modern machine learning.
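The construction behind such hedged predictions (conformal prediction) can be sketched with a simple 1-nearest-neighbour nonconformity score. The code below is our own minimal illustration under that assumed score, not the article's implementation: for each candidate label, the new object is provisionally added to the training set, and the label is kept in the prediction set when its p-value exceeds the significance level.

```python
import numpy as np

def conformal_set(train_x, train_y, new_x, labels, epsilon):
    """Return every label not rejected at significance level epsilon.
    Nonconformity score: distance to the nearest *other* example
    carrying the same label (a 1-NN-style score)."""
    def nonconformity(xs, ys, i):
        d = [np.linalg.norm(xs[i] - xs[j])
             for j in range(len(xs)) if j != i and ys[j] == ys[i]]
        return min(d) if d else float("inf")

    prediction = set()
    for label in labels:
        xs = list(train_x) + [new_x]
        ys = list(train_y) + [label]
        alphas = [nonconformity(xs, ys, i) for i in range(len(xs))]
        # p-value: fraction of examples at least as strange as the new one
        p = sum(a >= alphas[-1] for a in alphas) / len(alphas)
        if p > epsilon:
            prediction.add(label)
    return prediction

# Two toy clusters; the new point sits next to the label-0 cluster.
pred = conformal_set(
    [np.array([0.0]), np.array([0.1]), np.array([1.0]), np.array([1.1])],
    [0, 0, 1, 1],
    np.array([0.05]),
    labels=[0, 1],
    epsilon=0.2,
)
```

The validity guarantee the abstract mentions is exactly the property that, under exchangeability, the true label is excluded from such a set with probability at most epsilon.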
MML Inference of Oblique Decision Trees
 In Lecture Notes in Artificial Intelligence (LNAI) 3339 (Springer), Proc. 17th Australian Joint Conf. on AI
, 2004
"... Abstract. We propose a multivariate decision tree inference scheme by using the minimum message length (MML) principle (Wallace and Boulton, 1968; Wallace and Dowe, 1999). The scheme uses MML coding as an objective (goodnessoffit) function on model selection and searches with a simple evolution st ..."
Abstract

Cited by 9 (5 self)
We propose a multivariate decision tree inference scheme using the minimum message length (MML) principle (Wallace and Boulton, 1968; Wallace and Dowe, 1999). The scheme uses MML coding as an objective (goodness-of-fit) function for model selection and searches with a simple evolution strategy. We test our multivariate tree inference scheme on UCI machine learning repository data sets and compare with the decision tree programs C4.5 and C5. The preliminary results show that on average and on most datasets, MML oblique trees clearly perform better than both C4.5 and C5 on both “right”/“wrong” accuracy and probabilistic prediction, and with smaller trees, i.e., fewer leaf nodes.
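The MML objective used here as a goodness-of-fit function can be illustrated with a toy two-part cost. This is our own sketch, and the adaptive (Laplace) leaf code below is a stand-in, not the paper's exact construction: a split is worth paying for when the leaves it creates encode the class labels more cheaply than one mixed leaf, even after the extra structure bits for describing the split.

```python
import math

def leaf_code_length(counts):
    """Bits needed to transmit the class labels at a leaf under an
    adaptive Laplace (+1) code; for this code the total length depends
    only on the class counts, not on the order of arrival."""
    bits = 0.0
    seen = [1] * len(counts)   # Laplace smoothing: start each count at 1
    n_seen = len(counts)
    for cls, n in enumerate(counts):
        for _ in range(n):
            bits += -math.log2(seen[cls] / n_seen)
            seen[cls] += 1
            n_seen += 1
    return bits

# One leaf mixing 20 + 20 examples of two classes is expensive to encode...
mixed_cost = leaf_code_length([20, 20])
# ...while two pure leaves are cheap, so a split that separates the classes
# is preferred whenever its structure cost is below the saving.
split_cost = 2 * leaf_code_length([20, 0])
```

Minimising this kind of total message length, rather than raw training accuracy, is why MML trees resist the over-growing that the abstract attributes to other inference algorithms.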
Causal models as minimal descriptions of multivariate systems. http://parallel.vub.ac.be/∼jan
, 2006
"... ABSTRACT. By applying the minimality principle for model selection, one should seek the model that describes the data by a code of minimal length. Learning is viewed as data compression that exploits the regularities or qualitative properties found in the data, in order to build a model containing t ..."
Abstract

Cited by 8 (0 self)
By applying the minimality principle for model selection, one should seek the model that describes the data by a code of minimal length. Learning is viewed as data compression that exploits the regularities or qualitative properties found in the data in order to build a model containing the meaningful information. The theory of causal modeling can be interpreted by this approach. The regularities are the conditional independencies that reduce a factorization, and the v-structure regularities. In the absence of other regularities, a causal model is faithful and offers a minimal description of a probability distribution. The causal interpretation of a faithful Bayesian network is motivated by the canonical representation it offers and by faithfulness. A causal model decomposes the distribution into independent atomic blocks and is able to explain all qualitative properties found in the data. The existence of faithful models depends on the additional regularities in the data. Local structure of the conditional probability distributions allows further compression of the model. Interfering regularities, however, generate conditional independencies that do not follow from the Markov condition. These regularities have to be incorporated into an augmented model for which the inference algorithms are adapted to take their influences into account. But for other regularities, like patterns in a string, causality does not offer a modeling framework that leads to a minimal description.
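The v-structure regularity this abstract relies on can be seen in a toy simulation (our own illustration, not from the paper): in a collider A → C ← B, the parents are marginally independent, but conditioning on the common effect C makes them dependent.

```python
import random

random.seed(1)

# Hypothetical v-structure A -> C <- B: A and B are independent fair
# coins, and the collider is C = A XOR B.
samples = []
for _ in range(10000):
    a, b = random.randint(0, 1), random.randint(0, 1)
    samples.append((a, b, a ^ b))

# Marginally, B carries (almost) no information about A...
p_a = sum(a for a, b, c in samples) / len(samples)
b1 = [(a, b, c) for a, b, c in samples if b == 1]
p_a_given_b1 = sum(a for a, b, c in b1) / len(b1)

# ...but conditioning on the collider makes A and B fully dependent:
# whenever C = 0, A and B must be equal.
assert all(a == b for a, b, c in samples if c == 0)
```

It is exactly this asymmetry (independence that appears only marginally, dependence that appears only after conditioning) that lets a faithful causal model serve as a short description of the distribution.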
On causally asymmetric versions of Occam’s Razor and their relation to thermodynamics
, 2007
"... and their relation to thermodynamics ..."
MML, HYBRID BAYESIAN NETWORK GRAPHICAL MODELS, STATISTICAL CONSISTENCY, INVARIANCE AND UNIQUENESS
"... The problem of statistical — or inductive — inference pervades a large number of human activities and a large number of (human and nonhuman) actions requiring ‘intelligence’. Human and other ‘intelligent ’ activity often entails making inductive inferences, remembering and recording observations fr ..."
Abstract

Cited by 4 (3 self)
The problem of statistical — or inductive — inference pervades a large number of human activities and a large number of (human and non-human) actions requiring ‘intelligence’. Human and other ‘intelligent’ activity often entails making inductive inferences, remembering and recording observations from which one can make …
Decision forests with oblique decision trees
, 2006
"... Ensemble learning schemes have shown impressive increases in prediction accuracy over single model schemes. We introduce a new decision forest learning scheme, whose base learners are Minimum Message Length (MML) oblique decision trees. Unlike other tree inference algorithms, MML oblique decision tr ..."
Abstract

Cited by 3 (2 self)
Ensemble learning schemes have shown impressive increases in prediction accuracy over single model schemes. We introduce a new decision forest learning scheme, whose base learners are Minimum Message Length (MML) oblique decision trees. Unlike other tree inference algorithms, MML oblique decision tree learning does not overgrow the inferred trees. The resultant trees thus tend to be shallow and do not require pruning. MML decision trees are known to be resistant to overfitting and excellent at probabilistic predictions. A novel weighted averaging scheme is also proposed which takes advantage of high probabilistic prediction accuracy produced by MML oblique decision trees. The experimental results show that the new weighted averaging offers solid improvement over other averaging schemes, such as majority vote. Our MML decision forests scheme also returns favourable results compared to other ensemble learning algorithms on data sets with binary classes.
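The contrast between the proposed weighted averaging of probabilistic predictions and a plain majority vote can be sketched as follows. The function names, toy probabilities, and uniform weights are our own illustration, not the paper's exact scheme; the point is only that averaging class distributions and counting hard votes can disagree.

```python
def weighted_average_predict(tree_probs, weights):
    """Combine per-tree class-probability vectors by a weighted average;
    return (predicted class, averaged distribution)."""
    n_classes = len(tree_probs[0])
    total = sum(weights)
    avg = [sum(w * p[c] for w, p in zip(weights, tree_probs)) / total
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg

def majority_vote(tree_probs):
    """Each tree casts one hard vote for its most probable class."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in tree_probs]
    return max(set(votes), key=votes.count)

# One confident tree vs. two barely-decided trees: the average favours
# class 0 (0.6 vs 0.4), while the vote count favours class 1 (2 vs 1).
probs = [[0.9, 0.1], [0.45, 0.55], [0.45, 0.55]]
```

A vote throws away each tree's confidence; averaging keeps it, which is why the scheme benefits from base learners with accurate probabilistic predictions.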
A Preliminary MML Linear Classifier using Principal Components for Multiple Classes
"... In this paper we improve on the supervised classification method developed in Kornienko et al. (2002) by the introduction of Principal Components Analysis to the inference process. We also extend the classifier from dealing with binomial (twoclass) problems only to multinomial (multiclass) problem ..."
Abstract

Cited by 2 (2 self)
In this paper we improve on the supervised classification method developed in Kornienko et al. (2002) by introducing Principal Component Analysis into the inference process. We also extend the classifier from dealing only with binomial (two-class) problems to multinomial (multi-class) problems. The application to which the MML criterion has been applied in this paper is the classification of objects via a linear hyperplane, where the objects may come from any multi-class distribution. The inclusion of Principal Component Analysis in the original inference scheme reduces the bias present in the classifier’s search technique. Such improvements lead to a method which, when compared against three commercial Support Vector Machine (SVM) classifiers on binary data, was found to be as good as the most successful SVM tested. Furthermore, the new scheme is able to classify objects of a multi-class distribution with just one hyperplane, whereas SVMs require several hyperplanes.
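The PCA step added to the inference process can be sketched in a few lines of NumPy. This is an illustrative implementation of standard PCA via the covariance matrix's eigendecomposition, not the authors' code:

```python
import numpy as np

def pca_components(X, n_components):
    """Project centred data onto its top principal components;
    returns (projected data, component matrix)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]                # columns = components
    return Xc @ components, components

# Toy data: variance concentrated along the first axis, so the first
# principal component captures (almost) all of the spread.
X = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 0.0]])
Z, comps = pca_components(X, 2)
```

Rotating into this basis gives the search a coordinate system aligned with the data's spread, which is one way such a preprocessing step can reduce bias in a hyperplane search.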
Measuring Cognitive Abilities of Machines, Humans and Non-Human Animals in a Unified Way: towards Universal
, 2012
"... We present and develop the notion of ‘universal psychometrics ’ as a subject of study, and eventually a discipline, that focusses on the measurement of cognitive abilities for the machine kingdom, which comprises any kind of individual or collective, either artificial, biological or hybrid. Universa ..."
Abstract

Cited by 2 (2 self)
We present and develop the notion of ‘universal psychometrics’ as a subject of study, and eventually a discipline, that focusses on the measurement of cognitive abilities for the machine kingdom, which comprises any kind of individual or collective, either artificial, biological or hybrid. Universal psychometrics can be built, of course, upon the experience, techniques and methodologies from (human) psychometrics, comparative cognition and related areas. Conversely, the perspective and techniques which are being developed in the area of machine intelligence measurement using (algorithmic) information theory can be of much broader applicability and implication outside artificial intelligence. This general approach to universal psychometrics spurs the re-understanding of most (if not all) of the big issues about the measurement of cognitive abilities, and creates a new foundation for (re)defining and mathematically formalising the concept of cognitive task, evaluable subject, interface, task choice, difficulty, agent response curves, etc. We introduce the notion of a universal cognitive test and discuss whether (and when) it may be necessary for exploring the machine kingdom. On the issue of intelligence and very general abilities, we also get some results and connections with the related notions of no-free-lunch theorems and universal priors.
Inferring Phylogenetic Graphs for Natural Languages Using MML
 ARTIFICIAL INTELLIGENCE (CAEPIA)
, 2005
"... Languages, like everything around us, evolve and change over a period of time. The aim of this report is to be able to model this evolution that occurs between natural languages. We introduce the idea of inferring phylogenetic (or evolutionary) models for natural languages using the Minimum Mess ..."
Abstract

Cited by 1 (1 self)
Languages, like everything around us, evolve and change over time. The aim of this report is to model this evolution as it occurs between natural languages. We introduce the idea of inferring phylogenetic (or evolutionary) models for natural languages using the Minimum Message Length (MML) principle. Phylogenetic models show the evolutionary interrelationship among various species or other entities. We extend …