Model selection and accounting for model uncertainty in graphical models using Occam's window
, 1993
Cited by 215 (42 self)
We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P-values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximising predictive ability. However, this has not been used in practice because computing the posterior model probabilities is hard and the number of models is very large (often greater than 10^11). We argue that the standard Bayesian formalism is unsatisfactory and we propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty by averaging over a much smaller set of models. An efficient search algorithm is developed for finding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable
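As a rough illustration of the averaging idea in this abstract, the sketch below weights per-model estimates by posterior model probabilities after discarding models that are far less probable than the best one. It is a minimal sketch, not the paper's algorithm: the function name `occams_window_average`, the cutoff `ratio=20.0`, and the uniform default prior are all assumptions made for the example, and the paper's search procedure over graphical models is not reproduced.

```python
import math


def occams_window_average(log_evidences, estimates, priors=None, ratio=20.0):
    """Average per-model estimates over models kept by a simple threshold rule.

    log_evidences[k] is log p(D | M_k); estimates[k] is the quantity of
    interest under model M_k. `ratio` is a hypothetical cutoff chosen for
    illustration only.
    """
    n = len(log_evidences)
    if priors is None:
        priors = [1.0 / n] * n  # assumed uniform prior over models

    # Unnormalised log posterior model probabilities.
    log_post = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    best = max(log_post)

    # Keep only models within a factor `ratio` of the most probable model.
    kept = [k for k in range(n) if log_post[k] >= best - math.log(ratio)]

    # Renormalise over the retained models (subtract `best` for stability).
    weights = {k: math.exp(log_post[k] - best) for k in kept}
    total = sum(weights.values())
    weights = {k: w / total for k, w in weights.items()}

    averaged = sum(weights[k] * estimates[k] for k in kept)
    return averaged, weights
```

Working in log space keeps the weights stable when evidences differ by many orders of magnitude, which is the usual situation when the model space is large.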
Word-Sense Disambiguation Using Decomposable Models
- In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics
, 1994
Cited by 124 (17 self)
Most probabilistic classifiers used for word-sense disambiguation have either been based on only one contextual feature or have used a model that is simply assumed to characterize the interdependencies among multiple contextual features. In this paper, a different approach to formulating a probabilistic model is presented along with a case study of the performance of models produced in this manner for the disambiguation of the noun interest. We describe a method for formulating probabilistic models that use multiple contextual features for word-sense disambiguation, without requiring untested assumptions regarding the form of the model. Using this approach, the joint distribution of all variables is described by only the most systematic variable interactions, thereby limiting the number of parameters to be estimated, supporting computational efficiency, and providing an understanding of the data.
Improved learning of Bayesian networks
- Proc. of the Conf. on Uncertainty in Artificial Intelligence
, 2001
Cited by 33 (6 self)
Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. Therefore, the search space of Bayesian network structures may be organized in equivalence classes, where each of them represents a different set of conditional independencies. The collection of sets of conditional independencies obeys a partial order, the so-called “inclusion order.” This paper discusses in depth the role that the inclusion order plays in learning the structure of Bayesian networks. In particular, this role involves the way a learning algorithm traverses the search space. We introduce a condition for traversal operators, the inclusion boundary condition, which, when it is satisfied, guarantees that the search strategy can avoid local maxima. This is proved under the assumptions that the data is sampled from a probability distribution which is faithful to an acyclic digraph, and the length of the sample is unbounded. The previous discussion leads to the design of a new traversal operator and two new learning algorithms in the context of heuristic search and the Markov Chain Monte Carlo method. We carry out a set of experiments with synthetic and real-world data that show empirically the benefit of striving for the inclusion order when learning Bayesian networks from data.
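The notion of Markov equivalence used in this abstract can be checked with a well-known graphical criterion: two acyclic digraphs over the same nodes are Markov equivalent exactly when they share the same skeleton and the same v-structures. The sketch below applies that criterion; it is an illustration only, not code from the paper, and the function names and the edge-list representation (pairs `(parent, child)`, with no isolated nodes) are assumptions.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the graph: a set of unordered node pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Triples (a, child, b) where a -> child <- b and a, b are not adjacent."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    skel = skeleton(edges)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """True if both DAGs have the same skeleton and the same v-structures."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))
```

For example, the chains `A -> B -> C` and `A <- B <- C` are equivalent under this check, while the collider `A -> B <- C` is not equivalent to either.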
A New Approach to Word Sense Disambiguation
- In Proceedings of the ARPA Workshop on Human Language Technology
, 1994
Cited by 14 (5 self)
This paper presents and evaluates models created according to a schema that provides a description of the joint distribution of the values of sense tags and contextual features that is potentially applicable to a wide range of content words. The models are evaluated through a series of experiments, the results of which suggest that the schema is particularly well suited to nouns but that it is also applicable to words in other syntactic categories. 1. INTRODUCTION Assigning sense tags to the words in a text can be viewed as a classification problem. A probabilistic classifier assigns to each word the tag that has the highest estimated probability of having occurred in the given context. Designing a probabilistic classifier for word-sense disambiguation includes two main sub-tasks: specifying an appropriate model and estimating the parameters of that model. The former involves selecting informative contextual features (such as collocations) and describing the joint distribution of the...
Advances in Markov chain Monte Carlo methods
, 2007
Cited by 7 (3 self)
Probability distributions over many variables occur frequently in Bayesian inference, statistical physics and simulation studies. Samples from distributions give insight into their typical behavior and can allow approximation of any quantity of interest, such as expectations or normalizing constants. Markov Chain Monte Carlo (MCMC), introduced by Metropolis et al. (1953), allows sampling from distributions with intractable normalization, and remains one of the most important tools for approximate computation with probability distributions.
While not needed by MCMC, normalizers are key quantities: in Bayesian statistics marginal likelihoods are needed for model comparison; in statistical physics many physical quantities relate to the partition function. In this thesis we propose and investigate several new Monte Carlo algorithms, both for evaluating normalizing constants and for improved sampling of distributions.
Many MCMC correctness proofs rely on using reversible transition operators; this can lead to chains exploring by slow random walks. After reviewing existing MCMC algorithms, we develop a new framework for constructing non-reversible transition operators from existing reversible ones.
Next we explore and extend MCMC-based algorithms for computing normalizing constants. In particular we develop a new MCMC operator and Nested Sampling approach for the Potts model. Our results demonstrate that these approaches can be superior to finding normalizing constants by annealing methods and can obtain better posterior samples.
Finally we consider "doubly-intractable" distributions with extra unknown normalizer terms that do not cancel in standard MCMC algorithms. We propose using several deterministic approximations for the unknown terms, and investigate their interaction with sampling algorithms. We then develop novel exact-sampling-based MCMC methods, the Exchange Algorithm and Latent Histories. For the first time these algorithms do not require separate approximation before sampling begins. Moreover, the Exchange Algorithm outperforms the only alternative sampling algorithm for doubly intractable distributions.
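To make concrete why MCMC does not need the normalizer, here is a minimal random-walk Metropolis sketch. It is a generic textbook sampler, not one of the thesis's algorithms; the function name `metropolis`, the Gaussian proposal, and the default `step` and `seed` values are assumptions made for the example. The target enters only through a difference of log densities, so any additive constant in `log_unnorm` cancels.

```python
import math
import random


def metropolis(log_unnorm, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalised log density."""
    rng = random.Random(seed)
    x = x0
    lp = log_unnorm(x)
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + rng.gauss(0.0, step)
        lp_proposal = log_unnorm(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); the unknown
        # normalising constant cancels in this ratio.
        if rng.random() < math.exp(min(0.0, lp_proposal - lp)):
            x, lp = proposal, lp_proposal
        samples.append(x)
    return samples
```

Calling `metropolis(lambda x: -0.5 * x * x + 5.0, 0.0, 20000)` targets a standard normal even though the `+ 5.0` makes the density unnormalised. Estimating the normalizer itself requires separate techniques, which is the gap the abstract describes.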
BIFROST - Block recursive models Induced From Relevant knowledge, Observations, and Statistical Techniques
- Computational Statistics and Data Analysis
, 1993
Cited by 2 (0 self)
The theoretical background for a program for establishing expert systems on the basis of observations and expert knowledge is presented. Block recursive models form the basis of the statistical modelling. These models, together with various model selection methods for automatic model selection, are presented. Additionally, the connection between a block recursive model and expert systems based on causal probabilistic networks is treated. A medical example concerning diagnosis of coronary artery disease forms the basis for an evaluation of the expert systems established. Keywords: causal probabilistic networks, graphical association models, machine learning, model selection, selection criteria, selection strategies. 1 Introduction BIFROST is a program for semi-automatic knowledge acquisition and is a continuation of developments made in (Greve, Højsgaard, Skjøth and Thiesson 1990). The objective is to obtain preliminary causal models for use in the HUGIN expert system shell (Ander...
User's guide to BIFROST version 1.3
, 1998
Contents: 1 Introduction; 2 Block Recursive Models and BIFROST; 3 Starting BIFROST; 4 Specifications; 5 The Model Selection Screen; 6 Export to HUGIN; 7 Example (Survival of Breast Cancer Patients); 8 Acknowledgments; 9 Addendum to version 1.3: Case Selection; A The Datafile; B Installing BIFROST.
1 Introduction
BIFROST is a program for semi-automatic knowledge acquisition. The objective is to obtain preliminary causal models for use in the HUGIN shell. Based on a database of observations and minimal expert guidance, the program will search for a model giving a description of the structure of association among the variables. The model obtained can be saved as, and afterwards loaded as, a domain in the HUGIN shell. This domain forms the starting point for establishing a causal network. The program originates from the work done by the authors together with Jørgen Greve
MODEL SELECTION AND SIMPLIFICATION USING LATTICES
This paper shows how to cope with a problem of model selection and simplification using the principle of coherence (Gabriel (1969): A procedure involving testing a set of models ought not accept a model while rejecting a more general model). The mathematical lattice theory is used to define a partial ordering over the space of considered models. Several examples of partial ordering in large families of models are given along with a searching algorithm to determine the best model with respect to chosen criteria.
Advances in Markov chain
, 2007
I, Iain Murray, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis. Probability distributions over many variables occur frequently in Bayesian inference, statistical physics and simulation studies. Samples from distributions give insight into their typical behavior and can allow approximation of any quantity of interest, such as expectations or normalizing constants. Markov chain Monte Carlo (MCMC), introduced by Metropolis et al. (1953), allows sampling from distributions with intractable normalization, and remains one of the most important tools for approximate computation with probability distributions. While not needed by MCMC, normalizers are key quantities: in Bayesian statistics marginal likelihoods are needed for model comparison; in statistical physics many physical quantities relate to the partition function. In this thesis we propose and investigate several new Monte Carlo algorithms, both for evaluating normalizing constants and for

