Parameter Estimation in Bayesian Networks from Incomplete Databases
, 1998
Cited by 18 (3 self)
Current methods to learn Bayesian Networks from incomplete databases share the common assumption that the unreported data are missing at random. This paper describes a method, called Bound and Collapse (bc), to learn Bayesian Networks from incomplete databases which allows the analyst to efficiently integrate information provided by the observed data and exogenous knowledge about the pattern of missing data. bc starts by bounding the set of estimates consistent with the available information and then collapses the resulting set to a point estimate via a convex combination of the extreme points, with weights depending on the assumed pattern of missing data. Experiments comparing bc to Gibbs Sampling are provided. Keywords: Bayesian Inference
Not Asked Or Not Answered: Multiple Imputation for Multiple Surveys
- Journal of the American Statistical Association
, 1998
Cited by 16 (7 self)
We present a method of analyzing a series of independent cross-sectional surveys in which some questions are not answered in some surveys and some respondents do not answer some of the questions posed. The method is also applicable to a single survey in which different questions are asked, or different sampling methods used, in different strata or clusters. Our method involves multiply-imputing the missing items and questions by adding to existing methods of imputation designed for single surveys a hierarchical regression model that allows covariates at the individual and survey levels. Information from survey weights is exploited by including in the analysis the variables on which the weights were based, and then reweighting individual responses (observed and imputed) to estimate population quantities. We also develop diagnostics for checking the fit of the imputation model based on comparing imputed to nonimputed data. We illustrate with the example that motivated this project --- a ...
Poststratification Into Many Categories Using Hierarchical Logistic Regression
, 1997
Cited by 15 (10 self)
A standard method for correcting for unequal sampling probabilities and nonresponse in sample surveys is poststratification: that is, dividing the population into several categories, estimating the distribution of responses in each category, and then counting each category in proportion to its size in the population. We consider poststratification as a general framework that includes many weighting schemes used in survey analysis (see Little, 1993). We construct a hierarchical logistic regression model for the mean of a binary response variable conditional on poststratification cells. The hierarchical model allows us to fit many more cells than is possible using classical methods, and thus to include much more population-level information, while at the same time including all the information used in standard survey sampling inferences. We are thus combining the modeling approach often used in small-area estimation with the population information used in poststratification. We apply the...
Learning Reliable Classifiers from Small or Incomplete Data Sets: the Naive Credal Classifier 2
- Journal of Machine Learning Research
, 2008
Cited by 14 (12 self)
In this paper, the naive credal classifier, which is a set-valued counterpart of naive Bayes, is extended to a general and flexible treatment of incomplete data, yielding a new classifier called naive credal classifier 2 (NCC2). The new classifier delivers classifications that are reliable even in the presence of small sample sizes and missing values. Extensive empirical evaluations show that, by issuing set-valued classifications, NCC2 is able to isolate and properly deal with instances that are hard to classify (on which naive Bayes accuracy drops considerably), and to perform as well as naive Bayes on the other instances. The experiments point to a general problem: they show that with missing values, empirical evaluations may not reliably estimate the accuracy of a traditional classifier, such as naive Bayes. This phenomenon adds even more value to the robust approach to classification implemented by NCC2.
On testing the missing at random assumption
- In Proceedings of the 17th European Conference on Machine Learning (ECML-2006)
, 2006
Cited by 12 (0 self)
Most approaches to learning from incomplete data are based on the assumption that unobserved values are missing at random (mar). While the mar assumption, as such, is not testable, it can become testable in the context of other distributional assumptions, e.g. the naive Bayes assumption. In this paper we investigate a method for testing the mar assumption in the presence of other distributional constraints. We present methods to (approximately) compute a test statistic consisting of the ratio of two profile likelihood functions. This requires the optimization of the likelihood under no assumptions on the missingness mechanism, for which we use our recently proposed AI&M algorithm. We present experimental results on synthetic data that show that our approximate test statistic is a good indicator for whether data is mar relative to the given distributional assumptions.
Collaborative filtering and the missing at random assumption
- In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence
, 2007
Cited by 12 (3 self)
Rating prediction is an important application, and a popular research topic in collaborative filtering. However, both the validity of learning algorithms, and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample of random ratings has markedly different properties than ratings of user-selected songs. When asked to report on their own rating behaviour, a large number of users indicate they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. Finally, we present experimental results showing that incorporating an explicit model of the missing data mechanism can lead to significant improvements in prediction performance on the random sample of ratings.
A Bayesian formulation of exploratory data analysis and goodness-of-fit testing
, 2003
Cited by 11 (6 self)
Exploratory data analysis (EDA) and Bayesian inference (or, more generally, complex statistical modeling), which are generally considered as unrelated statistical paradigms, can be particularly effective in combination. In this paper, we present a Bayesian framework for EDA based on posterior predictive checks. We explain how posterior predictive simulations can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis. We show how the generalization of Bayesian inference to include replicated data y and replicated parameters follows a long tradition of generalizations in Bayesian theory.
Combining information from related regressions
- Journal of Agricultural, Biological, and Environmental Statistics
, 1997
Cited by 7 (0 self)
Comparing the SAS GLM and MIXED procedures for repeated measures
, 1998
Cited by 7 (0 self)
Repeated measures analyses in the SAS GLM procedure involve the traditional univariate and multivariate approaches. The SAS MIXED procedure employs a more general covariance structure approach. This paper compares the two procedures and helps you understand their methodologies. A numerical example illustrates many of the key similarities and differences.

