Results 1–10 of 10
Assessor error in stratified evaluation
 In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM)
, 2010
Abstract

Cited by 4 (3 self)
Several important information retrieval tasks, including those in medicine, law, and patent review, have an authoritative standard of relevance, and are concerned about retrieval completeness. During the evaluation of retrieval effectiveness in these domains, assessors make errors in applying the standard of relevance, and the impact of these errors, particularly on estimates of recall, is of crucial concern. Using data from the interactive task of the TREC Legal Track, this paper investigates how reliably the yield of relevant documents can be estimated from sampled assessments in the presence of assessor error, particularly where sampling is stratified based upon the results of participating retrieval systems. We show that assessor error is in general a greater source of inaccuracy than sampling error. A process of appeal and adjudication, such as used in the interactive task, is found to be effective at locating many assessment errors; but the process is expensive if complete, and biased if incomplete. An unbiased double-sampling method for resolving assessment error is proposed, and shown on representative data to be more efficient and accurate than appeal-based adjudication.
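The double-sampling correction this abstract describes can be illustrated with a minimal sketch: a gold-standard re-assessment of a random subsample calibrates the fallible first-pass labels, and the calibrated proportion is scaled to the collection. All names, inputs, and the simple precision/fallout parametrization below are illustrative assumptions, not the paper's actual estimator.

```python
def corrected_yield(first_pass, gold_subsample, population_size):
    """Estimate the yield of relevant documents, correcting fallible
    first-pass assessments with a gold-standard double sample.

    first_pass     : list of 0/1 fallible relevance labels on a random sample
    gold_subsample : list of (fallible_label, true_label) pairs for a random
                     subsample re-judged by an authoritative assessor
    """
    # Calibration rates of the fallible assessor, from the subsample.
    judged_rel = [t for f, t in gold_subsample if f == 1]
    judged_non = [t for f, t in gold_subsample if f == 0]
    precision = sum(judged_rel) / len(judged_rel)  # P(true rel | judged rel)
    fallout = sum(judged_non) / len(judged_non)    # P(true rel | judged non-rel)

    # Correct the raw sample proportion, then scale to the population.
    n_pos = sum(first_pass)
    n_neg = len(first_pass) - n_pos
    p_true = (n_pos * precision + n_neg * fallout) / len(first_pass)
    return p_true * population_size
```

Because the subsample for both judged-relevant and judged-non-relevant documents is drawn at random, the corrected proportion is unbiased in a way that an incomplete appeal process is not.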
Bayesian Analysis of Binary Data Subject to Misclassification
 In Bayesian Analysis in Statistics and Econometrics: Essays in Honor of Arnold Zellner
, 1996
Abstract

Cited by 2 (1 self)
This paper considers estimation of success probabilities of categorical binary data subject to misclassification errors from the Bayesian point of view. It has been shown by Bross (1954) that sample proportions are in general biased estimates. This bias is a function of the amount of misclassification and can be substantial. Tenenbein (1970) proposed to eliminate the bias by subjecting a portion of the sample to both true and fallible classifiers, resulting in a 2 x 2 table, from which the misclassification rates can be estimated. The rationale is that fallible classifiers are inexpensive relative to infallible ones. Hence if only a part of the sample is measured by the infallible classifier one can obtain a more efficient estimate, for a given sampling budget, than by measuring the whole sample using the infallible classifier. In many contexts an infallible classifier is unavailable or prohibitively expensive. Bayesian methods then provide a useful approach for dealing with the conseq...
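The Bross bias and Tenenbein's double-sampling fix can be sketched in a few lines: the 2 x 2 table from the doubly classified subsample yields the fallible classifier's error rates, which are then inverted to de-bias the raw sample proportion. This is a schematic reconstruction under assumed inputs, not code from the paper.

```python
def corrected_proportion(fallible_labels, double_sample):
    """De-bias the raw success proportion from a fallible binary
    classifier using a subsample measured by both the fallible and
    the true classifier (Tenenbein-style double sampling).

    fallible_labels : 0/1 labels from the fallible classifier (full sample)
    double_sample   : (fallible, true) label pairs for the subsample
    """
    # 2 x 2 table of the subsample -> error rates of the fallible classifier.
    among_true_pos = [f for f, t in double_sample if t == 1]
    among_true_neg = [f for f, t in double_sample if t == 0]
    sens = sum(among_true_pos) / len(among_true_pos)  # P(fallible=1 | true=1)
    fpr = sum(among_true_neg) / len(among_true_neg)   # P(fallible=1 | true=0)

    # Invert p_obs = sens * p + fpr * (1 - p) to recover the true p.
    p_obs = sum(fallible_labels) / len(fallible_labels)
    return (p_obs - fpr) / (sens - fpr)
```

The inversion step makes the bias explicit: the raw proportion p_obs is a mixture of the true proportion and the two error rates, and is only unbiased when sens = 1 and fpr = 0.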
Estimating a Bernoulli Parameter from a Sample of Misclassified Responses and a Sub-Sample of Randomized Responses
, June 1975
Abstract
It appears that in the various publications on the use of the Randomized Response technique it has always been assumed that the experimenter has available to him only the sample of Randomized Responses to draw inferences from. However, in many applications, the Randomized Response technique is used when an original, usually large, sample is available. The original sample is based on misclassified responses due to some stigma in the issues under study. In this note we assume that a subsample of individuals from the original sample (with the individual misclassified responses available) is taken for application of the Randomized Response technique. Based on the simultaneous classification of the subsampled individuals according to their misclassified and randomized responses and the original total sample of misclassified responses, efficient methods for estimating the Bernoulli parameter of a stigmatizing response are discussed.
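For the randomized-response component alone, Warner's classical estimator illustrates how inference works when each respondent answers a randomly selected statement rather than the direct question; the paper's combined misclassified-plus-randomized estimator is more involved. A hedged sketch, with illustrative names:

```python
def warner_estimate(yes_count, n, p_design):
    """Warner's randomized-response estimator of a stigmatizing
    Bernoulli parameter pi.

    Each respondent is shown "I have the trait" with probability
    p_design (which must differ from 0.5) and the negation otherwise,
    so P(yes) = p_design * pi + (1 - p_design) * (1 - pi).
    """
    lam = yes_count / n  # observed "yes" proportion
    return (lam - (1.0 - p_design)) / (2.0 * p_design - 1.0)
```

The design probability trades privacy against efficiency: the closer p_design is to 0.5, the stronger the privacy protection but the larger the variance of the estimate.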
On Inference From General Categorical Data With Misclassification Errors Based on Double Sampling Schemes
Institute of Statistics Mimeo Series No. 1066, April 1976
Abstract
In order to resolve the difficulties involved in inference from a sample of categorical data obtained by using a fallible classifying mechanism (usually inexpensive), we consider the utilization of a subsample subjected to a simultaneous cross-classification of its elements by the fallible mechanism and by some true (usually expensive) classifying mechanism. The setup is general; i.e., the discussion can be applied to any multidimensional cross-classified data obtained by unrestricted random sampling. Two methodologies are presented: (i) maximum likelihood approach, (ii) least squares approach. Both methodologies are illustrated using real data. Introduction. Much has been written on the effects of misclassification errors on studies of association in 2 x 2 contingency tables. We refer the reader to Fleiss (1973, Ch. 11) for a review of that subject.
Size for Case-Control Genetic Association Studies in the Presence of Phenotype and/or Genotype Misclassification Errors
Abstract
It is well established that phenotype and genotype misclassification errors reduce the power to detect genetic association. Resampling a subset of the data (e.g., double-sampling) of genotype and/or phenotype with a gold standard measurement is one method to address this issue. We derive the noncentrality parameter (NCP) for the recently published Likelihood Ratio Test Allowing for Error (LRTae) in the presence of random phenotype and genotype errors. With the NCP, power and sample size can be analytically determined at any significance level. We verify analytic power with simulations using a 2**k factorial design given high and low settings of: case and control genotype frequencies, phenotype and genotype misclassification probabilities, total sample size, ratio of cases to controls, and proportions of phenotype and/or genotype double-samples. We also perform example applications of our method assuming equal costs for the LRTae method and the standard method that does not use double-sample information (LRTstd) to determine if power gain due to double-sampling a proportion of samples outweighs the reduction in sample size due to additional costs in obtaining double-samples. Our results showed a median difference of at most 0.01 between analytic and simulation power.
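Given an NCP, the power of a chi-square test at a chosen significance level follows from the noncentral chi-square distribution; for one degree of freedom this is easy to check by simulation, since the noncentral chi-square variate is (Z + sqrt(ncp))^2 with Z standard normal. The sketch below (alpha fixed at 0.05) is a generic illustration of that relationship, not the LRTae machinery itself.

```python
import random


def mc_power_df1(ncp, reps=200_000, seed=7):
    """Monte Carlo power of a 1-df chi-square test at alpha = 0.05,
    given noncentrality parameter ncp.

    For df = 1 the noncentral chi-square variate can be simulated as
    (Z + sqrt(ncp)) ** 2 with Z ~ N(0, 1).
    """
    crit = 3.841458820694124  # chi-square critical value, df = 1, alpha = 0.05
    rng = random.Random(seed)
    hits = sum((rng.gauss(0.0, 1.0) + ncp ** 0.5) ** 2 > crit
               for _ in range(reps))
    return hits / reps
```

At ncp = 0 the test rejects at the nominal rate alpha; as the NCP grows, power climbs toward 1, which is what makes the analytic NCP a direct route to sample-size calculations.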
The Canadian Journal of Statistics
Abstract
Semiparametric efficient estimation for the auxiliary outcome problem with the conditional mean model. Jinbo CHEN and Norman E. BRESLOW. Key words and phrases: auxiliary outcome; conditional mean model; Horvitz–Thompson estimator; missing at random; semiparametric efficient estimation. MSC 2000: primary 62D05, 62J12; secondary 62H12. Abstract: The authors consider semiparametric efficient estimation of parameters in the conditional mean model for a simple incomplete data structure in which the outcome of interest is observed only for a random subset of subjects but covariates and surrogate (auxiliary) outcomes are observed for all. They use optimal estimating function theory to derive the semiparametric efficient score in closed form. They show that when covariates and auxiliary outcomes are discrete, a Horvitz–Thompson type estimator with empirically estimated weights is semiparametric efficient. The authors give simulation studies validating the finite-sample behaviour of the semiparametric efficient estimator and its asymptotic variance; they demonstrate the efficiency of the estimator in realistic settings.
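The Horvitz–Thompson estimator with empirically estimated weights that the authors prove efficient for discrete covariates can be sketched in a few lines: each subject whose outcome was observed is weighted by the inverse of its stratum's empirical sampling fraction. The names and data layout below are assumptions for illustration.

```python
from collections import Counter


def ht_mean(outcomes, obs_strata, all_strata):
    """Horvitz-Thompson estimate of a population mean when the outcome
    is observed only for a random subset of subjects.

    outcomes   : outcome values for the subjects with observed outcomes
    obs_strata : discrete covariate/auxiliary stratum label for each
                 observed subject, aligned with `outcomes`
    all_strata : stratum label for every subject in the full sample
    """
    n_all = Counter(all_strata)
    n_obs = Counter(obs_strata)
    # Inverse of the empirical selection fraction within each stratum.
    weight = {s: n_all[s] / n_obs[s] for s in n_obs}
    total = sum(y * weight[s] for y, s in zip(outcomes, obs_strata))
    return total / len(all_strata)
```

Using the empirical fractions rather than the true sampling probabilities is what the paper's efficiency result is about: with discrete strata, the estimated-weight version is semiparametric efficient.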
A Double Sampling Scheme Model for . . .
, 1974
Abstract
A general double sampling scheme model which employs a combination of an error-free measurement process and a faulty measurement process is developed. The model allows estimation of measurement error variance and elimination of measurement process bias. The model is applied to two specific survey situations, a self-enumeration survey and an interviewer-conducted survey. Using a cost function which reflects the relative cost of the error-free measurement process and the faulty measurement process, optimum values for the sample sizes are derived and the optimum number of interviewers is indicated. For various values of the parameters the DSS model is compared to using only the faulty measurement process or only the error-free measurement process and the preferred sampling scheme is indicated.
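The cost trade-off the abstract describes can be illustrated with a brute-force search: assume the variance of the combined estimator decomposes as a/n_faulty + b/n_true and spend a fixed budget across the two measurement processes. The variance form, names, and numbers are illustrative assumptions, not the paper's DSS model.

```python
def optimal_allocation(budget, c_faulty, c_true, a, b):
    """Brute-force the (n_faulty, n_true) allocation minimizing an
    assumed variance a / n_faulty + b / n_true, subject to the cost
    constraint c_faulty * n_faulty + c_true * n_true <= budget."""
    best = None
    n_true_max = int(budget // c_true)
    for n_true in range(1, n_true_max + 1):
        # Spend the remaining budget on the cheap, faulty process.
        n_faulty = int((budget - c_true * n_true) // c_faulty)
        if n_faulty < 1:
            continue
        var = a / n_faulty + b / n_true
        if best is None or var < best[0]:
            best = (var, n_faulty, n_true)
    return best  # (variance, n_faulty, n_true)
```

The same search answers the comparison the abstract mentions: setting the budget entirely on one process corresponds to the boundary allocations, which the mixed optimum typically beats when the faulty process is much cheaper.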
Use of Screening Tests to Assess Cancer Risk and to Estimate the Risk of Adult T
Abstract
We developed methods to assess cancer risks by screening tests. These methods estimate the size of the high-risk group adjusted for the characteristics of screening tests and estimate the incidence rates of cancer among the high-risk group adjusted for the characteristics of the tests. A method was also developed for selecting the cutoff point of a screening test. Finally, the methods were applied to estimate the risk of adult T-cell leukemia/lymphoma.
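Adjusting the apparent screen-positive proportion for the test's sensitivity and specificity is the core of such methods. A standard Rogan–Gladen-style correction, shown here as an illustrative sketch rather than the paper's exact estimator, looks like:

```python
def adjusted_high_risk_size(n_positive, n_tested, sensitivity, specificity):
    """Estimate the true size of the high-risk group from the apparent
    screen-positive count, adjusting for test characteristics
    (Rogan-Gladen-style correction)."""
    apparent = n_positive / n_tested
    # Invert apparent = sens * prev + (1 - spec) * (1 - prev).
    prevalence = (apparent + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return prevalence * n_tested
```

Without this adjustment, an imperfect specificity inflates the apparent high-risk group, which in turn biases any incidence rate computed over that group.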
Foundations and Trends® in Information Retrieval
Abstract
E-discovery refers generally to the process by which one party (e.g., the plaintiff) is entitled to “discover” evidence in the form of “electronically stored information” that is held by another party (e.g., the defendant) and that is relevant to some matter that is the subject of civil litigation (i.e., what is commonly called a “lawsuit”). This survey describes the emergence of the field, identifies the information retrieval issues that arise, reviews the work to date on this topic, and summarizes major open issues.