Correcting for Survey Misreports Using Auxiliary Information with an Application to Estimating Turnout, 2009
Abstract
Cited by 3 (0 self)
Misreporting is a problem that plagues researchers who use survey data. In this paper, we develop a parametric model that corrects for misclassified binary responses using information on the misreporting patterns obtained from auxiliary data sources. The model is implemented within the Bayesian framework via Markov Chain Monte Carlo (MCMC) methods, and can be easily extended to address other problems exhibited by survey data, such as missing response and/or covariate values. While the model is fully general, we illustrate its application in the context of estimating models of turnout using data from the American National Election Studies.
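To make the idea of correcting for misclassified binary responses concrete, the sketch below writes down a misreport-adjusted log-likelihood. It is not the paper's model: the logistic link, the constant misreport rates `false_pos` and `false_neg`, and all function names are assumptions chosen for illustration, and the abstract's Bayesian MCMC estimation is not reproduced here.

```python
import math

def logistic(z):
    """Standard logistic function, used here as an assumed link."""
    return 1.0 / (1.0 + math.exp(-z))

def reported_prob(p_true, false_pos, false_neg):
    """Probability of *reporting* 1 when the true probability of 1 is p_true.

    false_pos = P(report 1 | truth 0), false_neg = P(report 0 | truth 1).
    Both rates are assumed known, e.g. from an auxiliary validated sample.
    """
    return (1.0 - false_neg) * p_true + false_pos * (1.0 - p_true)

def log_likelihood(beta, xs, reports, false_pos, false_neg):
    """Log-likelihood of the observed (possibly misreported) responses."""
    total = 0.0
    for x, r in zip(xs, reports):
        z = sum(b * v for b, v in zip(beta, x))
        q = reported_prob(logistic(z), false_pos, false_neg)
        total += math.log(q) if r == 1 else math.log(1.0 - q)
    return total
```

With both misreport rates set to zero, this reduces to an ordinary logistic log-likelihood, which is a convenient sanity check on any extension of the sketch.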
Three Phase Sampling for Misclassified Binary Data
Ismail Erdem
Abstract
Three measuring devices are available to classify units into one of two mutually exclusive categories. The first two devices are relatively inexpensive procedures that tend to classify sampling units incorrectly; the third device is, in general, an expensive procedure that classifies the units correctly. To estimate p, the proportion of units that belong to one of the two categories, a three phase sampling scheme is presented. At the first phase, a sample of n units is taken and fallible-2 classifications are obtained; at the second phase, a subsample of n_1 units is drawn from the first sample and fallible-1 classifications are obtained; at the third phase, a subsample of n_2 units is taken from the second sample and true classifications are obtained. Hypergeometric and multinomial variances and covariances are compared to justify the assumption that the observed frequencies, denoted n_ijk, a_jk, and x_k, are multinomially distributed. The maximum likelihood estimate of p and its asymptotic variance are derived. This variance is expressed in terms of the reliability coefficients of the fallible classifiers. The optimum values of n, n_1, and n_2 that minimize the total cost of selection and measurement for a fixed variance of estimation, and those that minimize the variance of estimation for a fixed budget, are derived. This three phase sampling is compared both to single phase sampling, in which only true measurements are taken, and to two phase sampling. Using a constructed finite population and various levels of reliability, we simulate the sampling and measurement operations to assess the correctness of our results.
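The nested sampling scheme described in this abstract can be sketched as a small simulation. The code below is only an illustration: the symmetric error rates `err1` and `err2`, the index order used for the frequency tables, and every function and parameter name are assumptions, and the thesis's maximum likelihood estimator and cost optimization are not reproduced.

```python
import random
from collections import Counter

def flip(label, error_rate, rng):
    """Return the label, flipped with probability error_rate (assumed symmetric)."""
    return 1 - label if rng.random() < error_rate else label

def three_phase_sample(population_size, p, err1, err2, n, n1, n2, seed=0):
    """Simulate nested phases: n units, then n1 of those, then n2 of those."""
    rng = random.Random(seed)
    # Constructed finite population of true binary labels.
    truth = [1 if rng.random() < p else 0 for _ in range(population_size)]
    phase1 = rng.sample(range(population_size), n)  # fallible-2 measured
    phase2 = rng.sample(phase1, n1)                 # fallible-1 also measured
    phase3 = rng.sample(phase2, n2)                 # true class also measured
    f2 = {u: flip(truth[u], err2, rng) for u in phase1}
    f1 = {u: flip(truth[u], err1, rng) for u in phase2}
    in2, in3 = set(phase2), set(phase3)
    # Assumed indexing: i = true class, j = fallible-1, k = fallible-2.
    n_ijk = Counter((truth[u], f1[u], f2[u]) for u in phase3)
    a_jk = Counter((f1[u], f2[u]) for u in phase2 if u not in in3)
    x_k = Counter(f2[u] for u in phase1 if u not in in2)
    return n_ijk, a_jk, x_k
```

For example, `three_phase_sample(1000, 0.3, 0.1, 0.2, 200, 80, 30)` returns three frequency tables whose totals are n_2, n_1 - n_2, and n - n_1 respectively, since each unit is counted only at the deepest phase it reaches.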
Size for Case-Control Genetic Association Studies in the Presence of Phenotype and/or Genotype Misclassification Errors
Abstract
It is well established that phenotype and genotype misclassification errors reduce the power to detect genetic association. Resampling a subset of the data (e.g., double-sampling) of genotype and/or phenotype with a gold standard measurement is one method to address this issue. We derive the non-centrality parameter (NCP) for the recently published Likelihood Ratio Test Allowing for Error (LRTae) in the presence of random phenotype and genotype errors. With the NCP, power and sample size can be analytically determined at any significance level. We verify analytic power with simulations using a 2^k factorial design given high and low settings of: case and control genotype frequencies, phenotype and genotype misclassification probabilities, total sample size, ratio of cases to controls, and proportions of phenotype and/or genotype double-samples. We also perform example applications of our method assuming equal costs for the LRTae method and the standard method that does not use double-sample information (LRTstd) to determine whether the power gain due to double-sampling a proportion of samples outweighs the reduction in sample size due to additional costs in obtaining double-samples. Our results showed a median difference of at most 0.01 between analytic and simulation power
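As a rough illustration of how an NCP translates into analytic power, the sketch below evaluates the upper tail of a non-central chi-square distribution. It assumes a test statistic with 2 degrees of freedom (an assumption made here so the computation has a closed form; the abstract does not state the degrees of freedom), and the function name and the `terms` truncation parameter are hypothetical. It does not compute the NCP itself, which the paper derives.

```python
import math

def power_from_ncp(ncp, alpha=0.05, terms=200):
    """Power of a 2-df chi-square test with non-centrality parameter ncp.

    Uses the Poisson-mixture form of the non-central chi-square:
    with weight Poisson(i; ncp/2), the statistic is central chi-square
    with 2 + 2i degrees of freedom, whose upper tail at c is
    exp(-c/2) * sum_{j<=i} (c/2)^j / j!.
    For 2 df the critical value is c = -2 ln(alpha), so exp(-c/2) = alpha.
    """
    crit_half = -math.log(alpha)   # c / 2
    half = ncp / 2.0
    power = 0.0
    pois = math.exp(-half)         # Poisson weight for i = 0
    term = 1.0                     # (c/2)^j / j! for j = 0
    inner = 1.0                    # running sum over j <= i
    for i in range(terms):
        power += pois * alpha * inner
        pois *= half / (i + 1)
        term *= crit_half / (i + 1)
        inner += term
    return power
```

With `ncp=0` the function returns the significance level itself, and power increases with the NCP. The series is truncated after `terms` components, which is ample for moderate NCP values but would need enlarging for very large ones.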

