Results 11–20 of 127
Suboptimal behavior of Bayes and MDL in classification under misspecification
COLT, 2004
Cited by 19 (4 self)
We show that forms of Bayesian and MDL inference that are often applied to classification problems can be inconsistent. This means that there exists a learning problem such that for all amounts of data the generalization errors of the MDL classifier and the Bayes classifier relative to the Bayesian posterior both remain bounded away from the smallest achievable generalization error. From a Bayesian point of view, the result can be reinterpreted as saying that Bayesian inference can be inconsistent under misspecification, even for countably infinite models. We extensively discuss the result from both a Bayesian and an MDL perspective.
Bayesian Ying Yang system, best harmony learning, and Gaussian manifold based family
 Computational Intelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures. Lecture Notes in Computer Science
Causal models as minimal descriptions of multivariate systems. http://parallel.vub.ac.be/∼jan
2006
Cited by 15 (0 self)
By applying the minimality principle for model selection, one should seek the model that describes the data by a code of minimal length. Learning is viewed as data compression that exploits the regularities or qualitative properties found in the data in order to build a model containing the meaningful information. The theory of causal modeling can be interpreted by this approach. The regularities are the conditional independencies that reduce a factorization and the v-structure regularities. In the absence of other regularities, a causal model is faithful and offers a minimal description of a probability distribution. The causal interpretation of a faithful Bayesian network is motivated by the canonical representation it offers and by faithfulness. A causal model decomposes the distribution into independent atomic blocks and is able to explain all qualitative properties found in the data. The existence of faithful models depends on the additional regularities in the data. Local structure of the conditional probability distributions allows further compression of the model. Interfering regularities, however, generate conditional independencies that do not follow from the Markov condition. These regularities have to be incorporated into an augmented model for which the inference algorithms are adapted to take their influences into account. But for other regularities, like patterns in a string, causality does not offer a modeling framework that leads to a minimal description.
Compression and intelligence: social environments and communication
Cited by 13 (7 self)
Compression has been advocated as one of the principles which pervades inductive inference and prediction and, from there, it has also been recurrent in definitions and tests of intelligence. However, this connection is less explicit in new approaches to intelligence. In this paper, we advocate that the notion of compression can appear again in definitions and tests of intelligence through the concepts of ‘mindreading’ and ‘communication’ in the context of multiagent systems and social environments. Our main position is that two-part Minimum Message Length (MML) compression is not only more natural and effective for agents with limited resources, but it is also much more appropriate for agents in (cooperative) social environments than one-part compression schemes, particularly those using a posterior-weighted mixture of all available models following Solomonoff’s theory of prediction. We think that the realisation of these differences is important to avoid a naive view of ‘intelligence as compression’ in favour of a better understanding of how, why and where (one-part or two-part, lossless or lossy) compression is needed.
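The contrast between two-part MML compression and a one-part mixture code can be illustrated on a toy Bernoulli example (a hypothetical sketch, not from the paper; the uniform prior, the 6-bit parameter quantisation and the sample numbers are our own illustrative assumptions):

```python
import math

def log2_comb(n, k):
    return math.log2(math.comb(n, k))

def one_part_bits(n, k):
    # One-part mixture code with a uniform prior on the Bernoulli parameter:
    # the marginal probability of a sequence with k ones is 1/((n+1)*C(n,k)),
    # so its code length is log2(n+1) + log2 C(n,k) bits.
    return math.log2(n + 1) + log2_comb(n, k)

def two_part_bits(n, k, precision_bits=6):
    # Crude two-part code: first state the parameter quantised to
    # `precision_bits` bits, then encode the data with that parameter.
    levels = 2 ** precision_bits
    theta = (round(k / n * (levels - 1)) + 0.5) / levels  # keep theta off 0 and 1
    data_bits = -(k * math.log2(theta) + (n - k) * math.log2(1 - theta))
    return precision_bits + data_bits

n, k = 100, 30   # 100 coin flips, 30 heads
print(round(one_part_bits(n, k), 1))
print(round(two_part_bits(n, k), 1))
```

For this naive fixed-precision first part the mixture code is slightly shorter; a proper MML scheme chooses the parameter precision to balance the two parts rather than fixing it in advance.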
Discussion on Kolmogorov Complexity and Statistical Analysis
1999
Cited by 13 (0 self)
Equality (1) could be explained as follows: any object x ∈ A has a two-part description. The first part is (a description of a) program p. The second part is the number of x in the enumeration of A (the element that appears first has number 1, the next element has number 2, etc.). The first part requires K(p) bits. The second part requires at most log₂ |A| bits. (Additional O(log n) bits are needed to form a pair; we omit the details.) We are interested in ‘efficient’ two-part descriptions for which the inequality (1) is close to equality. For any string x there are many efficient descriptions. Here are two ‘extreme’ examples: (a) the set A consists of x only: A = {x}; the program p that enumerates …
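The two-part description in the snippet can be made concrete with a small example (a hypothetical illustration; the choice of the set A is ours, and the first-part cost K(p) is left abstract, since Kolmogorov complexity itself is uncomputable):

```python
import math

# Two-part description of an object x in a finite set A:
# part 1: a program p that enumerates A, costing K(p) bits (not computed here);
# part 2: x's index in that enumeration, costing at most ceil(log2 |A|) bits.

def index_part_bits(A):
    """Bits needed for the second part: the index of x within A."""
    return math.ceil(math.log2(len(A)))

# Example set: all 8-bit strings containing exactly three 1s.
A = [f"{n:08b}" for n in range(256) if f"{n:08b}".count("1") == 3]
x = "00010101"
assert x in A

print(len(A))                 # |A| = C(8,3) = 56
print(index_part_bits(A))     # ceil(log2 56) = 6 bits for the index
print(A.index(x))             # x's number in the enumeration
```

The second part is short because membership in A already captures most of x's regularity; the trade-off between the two parts is exactly what an efficient description optimises.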
Minimum Message Length Autoregressive Model Order Selection
International Conference on Intelligent Sensing and Information Processing (ICISIP), 2004
Cited by 13 (10 self)
We derive a Minimum Message Length (MML) estimator for stationary and non-stationary autoregressive models using the Wallace and Freeman (1987) approximation. The MML estimator’s model selection performance is empirically compared with AIC, AICc, BIC and HQ in a Monte Carlo experiment by uniformly sampling from the autoregressive stationarity region. Generally applicable, uniform priors are used on the coefficients, model order and log σ² for the MML estimator. The experimental results show the MML estimator to have the best overall average mean squared prediction error and the best ability to choose the true model order.
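The baseline criteria the MML estimator is compared against can be sketched as follows (a minimal illustration, not the paper's MML estimator; the conditional least-squares fit, the simulated AR(2) process and the candidate order range 1–6 are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) process: x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + e_t
n = 500
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

def ar_criteria(x, p):
    """Fit AR(p) by conditional least squares; return (AIC, AICc, BIC, HQ)."""
    y = x[p:]
    X = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    m = len(y)
    sigma2 = resid @ resid / m
    ll = -0.5 * m * (np.log(2 * np.pi * sigma2) + 1)   # Gaussian log-likelihood
    k = p + 1                                          # p coefficients + variance
    aic  = -2 * ll + 2 * k
    aicc = aic + 2 * k * (k + 1) / (m - k - 1)
    bic  = -2 * ll + k * np.log(m)
    hq   = -2 * ll + 2 * k * np.log(np.log(m))
    return aic, aicc, bic, hq

# Pick the order minimising each criterion over p = 1..6.
orders = range(1, 7)
scores = {p: ar_criteria(x, p) for p in orders}
for i, name in enumerate(["AIC", "AICc", "BIC", "HQ"]):
    best = min(orders, key=lambda p: scores[p][i])
    print(name, "selects order", best)
```

The criteria differ only in their complexity penalty (2k, the small-sample corrected 2k, k·log m, and 2k·log log m respectively), which is what drives their different over- and under-fitting tendencies in the paper's Monte Carlo comparison.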
Advances on BYY Harmony Learning: Information Theoretic Perspective, Generalized Projection Geometry, and Independent Factor Autodetermination
2004
Cited by 11 (9 self)
The nature of Bayesian Ying-Yang harmony learning is re-examined from an information theoretic perspective. Not only is its ability for model selection and regularization explained with new insights, but its relations to and differences from the studies of minimum description length (MDL), the Bayesian approach, the bits-back based MDL, the Akaike information criterion (AIC), maximum likelihood, information geometry, Helmholtz machines, and variational approximation are also discussed. Moreover, a generalized projection geometry is introduced for further understanding this new mechanism. Furthermore, new algorithms are developed for implementing Gaussian factor analysis (FA) and non-Gaussian factor analysis (NFA) such that appropriate factors are selected automatically during parameter learning.
Univariate Polynomial Inference by Monte Carlo Message Length Approximation
In Int. Conf. Machine Learning, 2002
Cited by 11 (5 self)
We apply the Message from Monte Carlo (MMC) algorithm to inference of univariate polynomials. MMC is an algorithm for point estimation from a Bayesian posterior sample.
MML Inference of Oblique Decision Trees
In Lecture Notes in Artificial Intelligence (LNAI) 3339 (Springer), Proc. 17th Australian Joint Conf. on AI, 2004
Cited by 11 (5 self)
We propose a multivariate decision tree inference scheme using the minimum message length (MML) principle (Wallace and Boulton, 1968; Wallace and Dowe, 1999). The scheme uses MML coding as an objective (goodness-of-fit) function for model selection and searches with a simple evolution strategy. We test our multivariate tree inference scheme on UCI machine learning repository data sets and compare it with the decision tree programs C4.5 and C5. The preliminary results show that on average, and on most datasets, MML oblique trees clearly perform better than both C4.5 and C5 on both “right”/“wrong” accuracy and probabilistic prediction, and with smaller trees, i.e., fewer leaf nodes.