Schwarz, Wallace, and Rissanen: Intertwining Themes in Theories of Model Selection
2000
Cited by 1 (0 self)
Investigators interested in model order estimation have tended to divide themselves into widely separated camps; this survey of the contributions of Schwarz, Wallace, Rissanen, and their coworkers attempts to build bridges between the various viewpoints, illuminating connections which may have previously gone unnoticed and clarifying misconceptions which seem to have propagated in the applied literature. Our tour begins with Schwarz's approximation of Bayesian integrals via Laplace's method. We then introduce the concepts underlying Rissanen's minimum description length principle via a Bayesian scenario with a known prior; this provides the groundwork for understanding his more complex non-Bayesian MDL, which employs a "universal" encoding of the integers. Rissanen's method of parameter truncation is contrasted with that employed in various versions of Wallace's minimum message length criteria.
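As a rough illustration of the Schwarz approximation mentioned in this abstract, the sketch below scores a model by its maximized log-likelihood minus (k/2) log n, the commonly quoted form of the criterion. The coin-flip setting and the names `bernoulli_log_lik`, `schwarz_score`, and `select_coin_model` are assumptions made for the example; they are not taken from the survey.

```python
import math


def bernoulli_log_lik(heads: int, n: int, p: float) -> float:
    """Log-likelihood of observing `heads` successes in `n` Bernoulli(p) trials."""
    tails = n - heads
    log_lik = 0.0
    if heads:
        log_lik += heads * math.log(p)
    if tails:
        log_lik += tails * math.log(1.0 - p)
    return log_lik


def schwarz_score(max_log_lik: float, k: int, n: int) -> float:
    """Maximized log-likelihood penalized by (k / 2) * log(n); larger is better."""
    return max_log_lik - 0.5 * k * math.log(n)


def select_coin_model(heads: int, n: int) -> str:
    """Compare a fair coin (no free parameters) with a coin of unknown bias (one)."""
    fair = schwarz_score(bernoulli_log_lik(heads, n, 0.5), 0, n)
    p_hat = heads / n  # maximum-likelihood estimate of the bias
    biased = schwarz_score(bernoulli_log_lik(heads, n, p_hat), 1, n)
    return "fair" if fair >= biased else "biased"
```

With 100 flips, 52 heads leaves the fair coin preferred, because the small gain in likelihood does not pay for the extra parameter, while 80 heads selects the biased coin.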
Learning Dynamic Bayesian Network Models Via Cross-Validation
Cited by 1 (0 self)
We study cross-validation as a scoring criterion for learning dynamic Bayesian network models that generalize well. We argue that cross-validation is more suitable than the Bayesian scoring criterion for one of the most common interpretations of generalization. We confirm this by carrying out an experimental comparison of cross-validation and the Bayesian scoring criterion, as implemented by the Bayesian Dirichlet metric and the Bayesian information criterion. The results show that cross-validation leads to models that generalize better for a wide range of sample sizes.
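To make the idea of cross-validation as a scoring criterion concrete, here is a minimal sketch that scores two candidate models of a binary sequence by their average held-out log-likelihood. It is not the paper's procedure: a first-order Markov chain stands in for a dynamic Bayesian network, and the function names, the add-one smoothing, and the interleaved fold assignment are all assumptions made for the example.

```python
import math


def transitions(seq):
    """Consecutive (previous, next) symbol pairs of a binary sequence."""
    return list(zip(seq[:-1], seq[1:]))


def fit(pairs, order):
    """Add-one-smoothed counts of the next symbol, one table per context."""
    counts = {}
    for prev, nxt in pairs:
        ctx = prev if order == 1 else None  # order 0 ignores the previous symbol
        counts.setdefault(ctx, [1, 1])[nxt] += 1
    return counts


def log_prob(model, order, prev, nxt):
    """Log-probability of `nxt` given `prev` under the fitted counts."""
    ctx = prev if order == 1 else None
    c = model.get(ctx, [1, 1])
    return math.log(c[nxt] / (c[0] + c[1]))


def cv_score(seq, order, k=5):
    """Average held-out log-likelihood per transition over k interleaved folds."""
    pairs = transitions(seq)
    total = 0.0
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        test = pairs[fold::k]
        model = fit(train, order)
        total += sum(log_prob(model, order, a, b) for a, b in test)
    return total / len(pairs)
```

On a strictly alternating sequence such as `[i % 2 for i in range(41)]`, the order-1 model receives the higher score, since only it can exploit the dependence on the previous symbol.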
Edward Snelson snelson@gatsby.ucl.ac.uk
 In ICML ’05: Proceedings of the 22nd international conference on Machine learning
2005
We provide a general framework for learning precise, compact, and fast representations of the Bayesian predictive distribution for a model. This framework is based on minimizing the KL divergence between the true predictive density and a suitable compact approximation. We consider various methods for doing this, both sampling-based approximations and deterministic approximations such as expectation propagation. These methods are tested on a mixture of Gaussians model for density estimation and on binary linear classification, with both synthetic data sets for visualization and several real data sets. Our results show significant reductions in prediction time and memory footprint.
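One simple instance of fitting a compact approximation by KL minimization can be sketched as follows. Assume, purely for illustration, that the predictive density p is a one-dimensional mixture of Gaussians (for example, one component per posterior sample) and that the compact approximation q is a single Gaussian. Minimizing KL(p || q) over Gaussians then reduces to matching the mean and variance of p. The direction of the divergence, the single-Gaussian family, and the name `moment_match` are assumptions, not details taken from the paper.

```python
def moment_match(weights, means, variances):
    """Mean and variance of the single Gaussian q minimizing KL(p || q),
    where p is a 1-D Gaussian mixture with the given components."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalize the mixture weights
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # E[x^2] under the mixture: each component contributes variance + mean^2.
    second_moment = sum(wi * (vi + mi * mi) for wi, mi, vi in zip(w, means, variances))
    return mean, second_moment - mean * mean
```

For two equally weighted unit-variance components centred at -1 and +1, the matched Gaussian has mean 0 and variance 2. Evaluating q needs two numbers regardless of how many components p has, which is the kind of saving in prediction time and memory the abstract refers to.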
for Learning Structure of Bayesian Networks as Classifier in Data Mining
There are two well-known categories of approach (as the basic principle of the classification process) for learning the structure of a Bayesian network (BN) in data mining (DM): scoring-based and constraint-based algorithms. Inspired by those approaches, we present a new CB* algorithm that is developed by considering four related algorithms: K2, PC, CB, and BC. The improvement obtained by our algorithm is derived from the strength of its primitives in the process of learning the structure of a BN. Specifically, the CB* algorithm is appropriate for incomplete databases (those with missing values) and requires no prior information about node ordering.
Probabilistic Classification of Image Regions using an Observation-Constrained Generative Approach
 PROC. INT. WORKSHOP ON GENERATIVE-MODEL BASED VISION
2002
In generic image understanding applications, one of the goals is to interpret the semantic context of the scene (e.g., beach, office, etc.). In this paper, we propose a probabilistic region classification scheme for natural scene images as a priming step for the problem of context interpretation. In conventional generative methods, a generative model is learnt for each class using all the available training data belonging to that class. However, if a set of newly observed data has been generated from only a subset of the model support, using the full model to assign generative probabilities can produce serious artifacts in the probability assignments. This problem arises mainly when the different classes have multimodal distributions with considerable overlap in the feature space. We propose an approach to constrain the class generative probability of a set of newly observed data by exploiting the distribution of the new data itself and using linear weighted mixing. A KL-divergence-based fast model selection procedure is also proposed for learning mixture models in a sparse feature space. The preliminary results on the natural scene images support the effectiveness of the proposed approach.
The aspect Bernoulli model: multiple causes of presences and absences
2007
We present a probabilistic multiple cause model for the analysis of binary (0–1) data. A distinctive feature of the aspect Bernoulli (AB) model is its ability to automatically detect and distinguish between "true absences" and "false absences" (both of which are coded as 0 in the data), and similarly, between "true presences" and "false presences" (both of which are coded as 1). This is accomplished by specific additive noise components which explicitly account for such non-content-bearing causes. The AB model is thus suitable for noise removal and data explanatory purposes, including omission/addition detection. An important application of AB that we demonstrate is data-driven reasoning about palaeontological recordings. Additionally, results on recovering corrupted handwritten digit images and expanding short text documents are given, and comparisons to other methods are demonstrated and discussed.
The aspect Bernoulli model: multiple causes of presences and absences
 PATTERN ANAL APPLIC
2007
We present a probabilistic multiple cause model for the analysis of binary (0–1) data. A distinctive feature of the aspect Bernoulli (AB) model is its ability to automatically detect and distinguish between "true absences" and "false absences" (both of which are coded as 0 in the data), and similarly, between "true presences" and "false presences" (both of which are coded as 1). This is accomplished by specific additive noise components which explicitly account for such non-content-bearing causes. The AB model is thus suitable for noise removal and data explanatory purposes, including omission/addition detection. An important application of AB that we demonstrate is data-driven reasoning about palaeontological recordings. Additionally, results on recovering corrupted handwritten digit images and expanding short text documents are given, and comparisons to other methods are demonstrated and discussed.
Metric-Based Methods for Adaptive Model Selection
 © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.
We present a general approach to model selection and regularization that exploits unlabeled data to adaptively control hypothesis complexity in supervised learning tasks. The idea is to impose a metric structure on hypotheses by determining the discrepancy between their predictions across the distribution of unlabeled data. We show how this metric can be used to detect untrustworthy training error estimates, and devise novel model selection strategies that exhibit theoretical guarantees against overfitting (while still avoiding underfitting). We then extend the approach to derive a general training criterion for supervised learning, yielding an adaptive regularization method that uses unlabeled data to automatically set regularization parameters. This new criterion adjusts its regularization level to the specific set of training data received, and performs well on a variety of regression and conditional density estimation tasks. The only proviso for these methods is that sufficient unlabeled training data be available.
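The abstract does not spell out how the metric exposes untrustworthy training errors. One plausible reading, assumed here, is a triangle-inequality check: if two hypotheses disagree on unlabeled inputs by more than the sum of their training errors, then at least one training error must understate the true error. The sketch below uses root-mean-square discrepancy as the metric; the function names and the toy hypotheses are hypothetical.

```python
import math


def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def train_error(h, xs, ys):
    """RMS error of hypothesis h on the labeled training pairs."""
    return rms([h(x) - y for x, y in zip(xs, ys)])


def distance(h1, h2, unlabeled_xs):
    """RMS discrepancy between two hypotheses' predictions on unlabeled inputs."""
    return rms([h1(x) - h2(x) for x in unlabeled_xs])


def violates_triangle(h1, h2, xs, ys, unlabeled_xs, tol=1e-12):
    """True if the hypotheses disagree by more than their training errors allow."""
    bound = train_error(h1, xs, ys) + train_error(h2, xs, ys)
    return distance(h1, h2, unlabeled_xs) > bound + tol


# Toy data: both hypotheses fit the two labeled points exactly.
xs, ys = [0.0, 1.0], [0.0, 1.0]
unlabeled = [0.25, 0.5, 0.75]


def smooth(x):
    return x


def wiggly(x):
    return x + 5.0 * x * (x - 1.0)
```

Here `violates_triangle(smooth, wiggly, xs, ys, unlabeled)` is true: both training errors are zero, yet the two functions differ between the labeled points, so at least one of them must be overfitting.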
Learning dynamic Bayesian network models via cross-validation
2005
We study cross-validation as a scoring criterion for learning dynamic Bayesian network models that generalize well. We argue that cross-validation is more suitable than the Bayesian scoring criterion for one of the most common interpretations of generalization. We confirm this by carrying out an experimental comparison of cross-validation and the Bayesian scoring criterion, as implemented by the Bayesian Dirichlet metric and the Bayesian information criterion. The results show that cross-validation leads to models that generalize better for a wide range of sample sizes.
The aspect Bernoulli model: multiple causes of presences
 Pattern Anal Applic (2009) 12:55–78 DOI 10.1007/s1004400700964
We present a probabilistic multiple cause model for the analysis of binary (0–1) data. A distinctive feature of the aspect Bernoulli (AB) model is its ability to automatically detect and distinguish between "true absences" and "false absences" (both of which are coded as 0 in the data), and similarly, between "true presences" and "false presences" (both of which are coded as 1). This is accomplished by specific additive noise components which explicitly account for such non-content-bearing causes. The AB model is thus suitable for noise removal and data explanatory purposes, including omission/addition detection. An important application of AB that we demonstrate is data-driven reasoning about palaeontological recordings. Additionally, results on recovering corrupted handwritten digit images and expanding short text documents are given, and comparisons to other methods are demonstrated and discussed. A part of the work of Ella Bingham was performed while visiting the