Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
Machine Learning, 1997
Cited by 179 (11 self)
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naive-Bayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and Cheeseman-Stutz (CS) measures are the most accurate.
Probabilistic independence networks for hidden Markov probability models
1996
Cited by 169 (12 self)
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a self-contained review of the basic principles of PINs. It is shown that the well-known forward-backward (FB) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
A Bayesian Approach to Causal Discovery
1997
Cited by 81 (1 self)
We examine the Bayesian approach to the discovery of directed acyclic causal models and compare it to the constraint-based approach. Both approaches rely on the Causal Markov assumption, but the two differ significantly in theory and practice. An important difference between the approaches is that the constraint-based approach uses categorical information about conditional-independence constraints in the domain, whereas the Bayesian approach weighs the degree to which such constraints hold. As a result, the Bayesian approach has three distinct advantages over its constraint-based counterpart. One, conclusions derived from the Bayesian approach are not susceptible to incorrect categorical decisions about independence facts that can occur with data sets of finite size. Two, using the Bayesian approach, finer distinctions among model structures, both quantitative and qualitative, can be made. Three, information from several models can be combined to make better inferences and to better ...
A robust minimax approach to classification
Journal of Machine Learning Research, 2002
Cited by 67 (7 self)
When constructing a classifier, the probability of correct classification of future data points should be maximized. We consider a binary classification problem where the mean and covariance matrix of each class are assumed to be known. No further assumptions are made with respect to the class-conditional distributions. Misclassification probabilities are then controlled in a worst-case setting: that is, under all possible choices of class-conditional densities with given mean and covariance matrix, we minimize the worst-case (maximum) probability of misclassification of future data points. For a linear decision boundary, this desideratum is translated in a very direct way into a (convex) second-order cone optimization problem, with complexity similar to a support vector machine problem. The minimax problem can be interpreted geometrically as minimizing the maximum of the Mahalanobis distances to the two classes. We address the issue of robustness with respect to estimation errors (in the means and covariances of the ...
Bayesian Tests And Model Diagnostics In Conditionally Independent Hierarchical Models
Journal of the American Statistical Association, 1994
Cited by 16 (1 self)
Consider the conditionally independent hierarchical model (CIHM) where observations $y_i$ are independently distributed from $f(y_i \mid \theta_i)$, the parameters $\theta_i$ are independently distributed from distributions $g(\theta \mid \lambda)$, and the hyperparameters $\lambda$ are distributed according to a distribution $h(\lambda)$. The posterior distribution of all parameters of the CIHM can be efficiently simulated by Markov chain Monte Carlo (MCMC) algorithms. Although these simulation algorithms have facilitated the application of CIHMs, they generally have not addressed the problem of computing quantities useful in model selection. This paper explores how MCMC simulation algorithms and other related computational algorithms can be used to compute Bayes factors that are useful in criticizing a particular CIHM. In the case where the CIHM models a belief that the parameters are exchangeable or lie on a regression surface, the Bayes factor can measure the consistency of the data with the structural prior belief. Bayes factors can ...
Asymptotics and the theory of inference
2003
Cited by 16 (7 self)
Asymptotic analysis has always been very useful for deriving distributions in statistics in cases where the exact distribution is unavailable. More importantly, asymptotic analysis can also provide insight into the inference process itself, suggesting what information is available and how this information may be extracted. The development of likelihood inference over the past twenty-some years provides an illustration of the interplay between techniques of approximation and statistical theory.
Hierarchical Models for Employment Decisions
1997
Cited by 2 (1 self)
Federal law prohibits discrimination in employment decisions against persons in certain protected categories. The common method for measuring discrimination involves a comparison of some aggregate statistic for protected and nonprotected individuals. This approach is open to question when employment decisions are made over an extended time period. We show how to use hierarchical proportional hazards models (Cox regression models) to analyze such data. When decisions are made at one time, the proportional hazards model reduces to the familiar doubly constrained hypergeometric model. Key words: Age Discrimination; Bayesian Analysis; Hierarchical Model; Proportional Hazards Model. 1 Introduction. Federal law forbids discrimination against employees or applicants because of an employee's race, sex, religion, national origin, age (40 or older), or handicap. General discrimination law (say, discrimination by race or sex) offers two somewhat distinct legal theories. A disparate treatment cas...
An Invariant Bayesian Model Selection Principle for Gaussian Data
2004
Cited by 1 (1 self)
We develop a code length principle which is invariant to the choice of parameterization on the model distributions. An invariant approximation formula for easy computation of the marginal distribution is provided for Gaussian likelihood models. We provide invariant estimators of the model parameters and formulate conditions under which these estimators are essentially posteriori unbiased for Gaussian models. An upper bound on the coarseness of discretization on the model parameters is deduced. We introduce a discrimination measure between probability distributions and use it to construct probability distributions on model classes. The total code length is shown to be closely related to the NML code length of Rissanen when choosing Jeffreys prior distribution on the model parameters together with a uniform prior distribution on the model classes. Our model selection principle is applied to a Gaussian estimation problem for data in a wavelet representation and its performance is tested and compared to alternative wavelet-based estimation methods in numerical experiments.
The 2000 Wald Memorial Lectures: Asymptotics and the Theory of Inference
Asymptotic analysis has always been very useful for deriving distributions in statistics in cases where the exact distribution is unavailable. More importantly, asymptotic analysis can also provide insight into the inference process itself, suggesting what information is available and how this information may be extracted. The development of likelihood inference over the past twenty-some years provides an illustration of the interplay between techniques of approximation and statistical theory.