Practical Cost-Conscious Active Learning for Data Annotation in Annotator-Initiated Environments (2013)

by R A Haertel

MOMRESP: A Bayesian Model for Multi-Annotator Document Labeling

by Paul Felt, Robbie Haertel, Eric K. Ringger
Abstract - Cited by 1 (1 self)
Data annotation in modern practice often involves multiple, imperfect human annotators. Multiple annotations can be used to infer estimates of the ground-truth labels and to estimate individual annotator error characteristics (or reliability). We introduce MOMRESP, a model that improves upon item response models to incorporate information from both natural data clusters as well as annotations from multiple annotators to infer ground-truth labels for the document classification task. We implement this model and show that MOMRESP can use unlabeled data to improve estimates of the ground-truth labels over a majority vote baseline dramatically in situations where both annotations are scarce and annotation quality is low as well as in situations where annotators disagree consistently. Correspondingly, in those same situations, estimates of annotator reliability are also stronger than the majority vote baseline. Because MOMRESP predictions are subject to label switching, we introduce a solution that finds nearly optimal predicted class reassignments in a variety of settings using only information available to the model at inference time. Although MOMRESP does not perform well in annotation-rich situations, we show evidence suggesting how this shortcoming may be overcome in future work.
Keywords: Bayesian models, corpus annotation, crowd-sourcing, identifiability
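The majority vote baseline that the abstract compares against can be illustrated with a minimal sketch. This is not MOMRESP and is not taken from the paper: the function names, the data layout, and the alphabetical tie-breaking rule are all assumptions made here for illustration. It infers one label per document from several annotators and then scores each annotator by agreement with the inferred labels.

```python
from collections import Counter


def majority_vote(annotations):
    """Pick the most frequent label per document.

    annotations: dict mapping document id -> list of (annotator, label).
    Ties are broken by taking the alphabetically smallest label
    (an arbitrary choice made for this sketch).
    """
    labels = {}
    for doc_id, votes in annotations.items():
        counts = Counter(label for _, label in votes)
        top = max(counts.values())
        labels[doc_id] = min(l for l, c in counts.items() if c == top)
    return labels


def annotator_agreement(annotations, labels):
    """Fraction of each annotator's labels that match the voted label."""
    hits, totals = Counter(), Counter()
    for doc_id, votes in annotations.items():
        for annotator, label in votes:
            totals[annotator] += 1
            hits[annotator] += int(label == labels[doc_id])
    return {a: hits[a] / totals[a] for a in totals}


# Hypothetical toy data: three documents, three annotators.
annotations = {
    "d1": [("a1", "sports"), ("a2", "sports"), ("a3", "politics")],
    "d2": [("a1", "politics"), ("a2", "politics"), ("a3", "politics")],
    "d3": [("a1", "sports"), ("a3", "politics")],
}
labels = majority_vote(annotations)
reliability = annotator_agreement(annotations, labels)
```

Document d3 shows why such a baseline can struggle when annotations are scarce: with only two conflicting votes, the outcome is decided entirely by the tie-breaking rule.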

Citation Context

...resent an MCMC inference algorithm for the model. 3.1. Model MOMRESP is inspired by Bayesian models described, but not implemented or evaluated, in previous work (Carroll et al., 2007; Carroll, 2010; Haertel, 2013). It is called MOMRESP because it adds a mixture-of-multinomials (MOM) data component to a Bayesian item-response model. The model is based on three main principles: 1. Ground-truth labels y are unob...
