Quality Management on Amazon Mechanical Turk
Cited by 23 (2 self)
Crowdsourcing services, such as Amazon Mechanical Turk, allow for easy distribution of small tasks to a large number of workers. Unfortunately, since manually verifying the quality of the submitted results is hard, malicious workers often take advantage of the verification difficulty and submit answers of low quality. Currently, most requesters rely on redundancy to identify the correct answers. However, redundancy is not a panacea. Massive redundancy is expensive and significantly increases the cost of crowdsourced solutions. Therefore, we need techniques that accurately estimate the quality of the workers, allowing for the rejection and blocking of low-performing workers and spammers. However, existing techniques cannot separate the true (unrecoverable) error rate from the (recoverable) biases that some workers exhibit. This lack of separation leads to incorrect assessments of a worker’s quality. We present algorithms that improve the existing state-of-the-art techniques, enabling the separation of bias and error. Our algorithm generates a scalar score representing the inherent quality of each worker. We illustrate how to incorporate cost-sensitive classification errors in the overall framework and how to seamlessly integrate unsupervised and supervised techniques for inferring the quality of the workers. We present experimental results demonstrating the performance of the proposed algorithm under a variety of settings.
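The distinction between recoverable bias and unrecoverable error can be illustrated with a small sketch. It is not taken from the paper: it assumes each worker is summarized by a confusion matrix, that class priors are known, and that quality is scored as the expected misclassification cost of the "soft" label obtained by Bayes' rule. The function name `expected_cost` and the uniform 0/1 cost matrix are illustrative assumptions.

```python
def expected_cost(confusion, priors, cost=None):
    """Expected cost of a worker's labels after Bayesian correction.

    confusion[i][j] = P(worker assigns label j | true class is i)  (assumed known)
    priors[i]       = P(true class is i)                           (assumed known)
    cost[i][j]      = cost of deciding j when the truth is i (default: 0/1 loss)
    """
    n = len(priors)
    if cost is None:
        cost = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    total = 0.0
    for label in range(n):
        # Probability that the worker emits this label at all.
        p_label = sum(priors[i] * confusion[i][label] for i in range(n))
        if p_label == 0.0:
            continue
        # Soft label: posterior over the true class given the worker's label.
        soft = [priors[i] * confusion[i][label] / p_label for i in range(n)]
        # Cost of the soft label: near 0 when the posterior is concentrated.
        label_cost = sum(
            soft[i] * soft[j] * cost[i][j] for i in range(n) for j in range(n)
        )
        total += p_label * label_cost
    return total


perfect = [[1.0, 0.0], [0.0, 1.0]]
inverted = [[0.0, 1.0], [1.0, 0.0]]   # always wrong, but perfectly recoverable
spammer = [[1.0, 0.0], [1.0, 0.0]]    # always answers class 0, carries no information
priors = [0.5, 0.5]
```

Under these assumptions the inverted worker has a raw error rate of 100% yet scores the same cost as the perfect worker, while the spammer, whose raw error rate is only 50%, scores worst. A raw error rate alone would rank them the other way round.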
Bayesian Knowledge Corroboration with Logical Rules and User Feedback
Cited by 3 (2 self)
Current knowledge bases suffer from either low coverage or low accuracy. The underlying hypothesis of this work is that user feedback can greatly improve the quality of automatically extracted knowledge bases. The feedback could help quantify the uncertainty associated with the stored statements and would enable mechanisms for searching, ranking and reasoning at the entity-relationship level. Most importantly, a principled model for exploiting user feedback to learn the truth values of statements in the knowledge base would be a major step forward in addressing the issue of knowledge base curation. We present a family of probabilistic graphical models that builds on user feedback and logical inference rules derived from the popular Semantic-Web formalism of RDFS [1]. Through internal inference and belief propagation, these models can learn both the truth values of the statements in the knowledge base and the reliabilities of the users who give feedback. We demonstrate the viability of our approach in extensive experiments on real-world datasets, with feedback collected from Amazon Mechanical Turk.
Iterative Learning for Reliable Crowdsourcing Systems
Cited by 3 (0 self)
Crowdsourcing systems, in which tasks are electronically distributed to numerous “information piece-workers”, have emerged as an effective paradigm for human-powered solving of large-scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way such as majority voting. In this paper, we consider a general model of such crowdsourcing tasks, and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers’ answers. We show that our algorithm significantly outperforms majority voting and, in fact, is asymptotically optimal through comparison to an oracle that knows the reliability of every worker.
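For reference, the majority-voting baseline mentioned in this abstract can be sketched as follows. This is only the baseline, not the paper's iterative algorithm; the `(task, worker, label)` tuple layout and the function name `majority_vote` are assumptions made for the example.

```python
from collections import Counter, defaultdict


def majority_vote(assignments):
    """Aggregate redundant answers by taking the most frequent label per task.

    assignments: iterable of (task_id, worker_id, label) tuples (assumed layout).
    Ties are broken in favor of the label that was seen first for that task.
    """
    votes = defaultdict(Counter)
    for task_id, _worker_id, label in assignments:
        votes[task_id][label] += 1
    return {task_id: counts.most_common(1)[0][0] for task_id, counts in votes.items()}


answers = [
    ("t1", "w1", "cat"), ("t1", "w2", "dog"), ("t1", "w3", "cat"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "cat"),
]
result = majority_vote(answers)
```

Every worker counts equally here, which is exactly the limitation that reliability-aware inference is meant to address.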
Bayesian Bias Mitigation for Crowdsourcing
Cited by 1 (0 self)
Biased labelers are a systemic problem in crowdsourcing, and a comprehensive toolbox for handling their responses is still being developed. A typical crowdsourcing application can be divided into three steps: data collection, data curation, and learning. At present these steps are often treated separately. We present Bayesian Bias Mitigation for Crowdsourcing (BBMC), a Bayesian model to unify all three. Most data curation methods account for the effects of labeler bias by modeling all labels as coming from a single latent truth. Our model captures the sources of bias by describing labelers as influenced by shared random effects. This approach can account for more complex bias patterns that arise in ambiguous or hard labeling tasks and allows us to merge data curation and learning into a single computation. Active learning integrates data collection with learning, but is commonly considered infeasible with Gibbs sampling inference. We propose a general approximation strategy for Markov chains to efficiently quantify the effect of a perturbation on the stationary distribution and specialize this approach to active learning. Experiments show BBMC to outperform many common heuristics.
Dynamically switching between synergistic workflows for crowdsourcing
- In Proceedings of the 26th AAAI Conference on Artificial Intelligence, AAAI ’12, 2012
Cited by 1 (0 self)
To ensure quality results from unreliable crowdsourced workers, task designers often construct complex workflows and aggregate worker responses from redundant runs. Frequently, they experiment with several alternative workflows to accomplish the task, and eventually deploy the one that achieves the best performance during early trials. Surprisingly, this seemingly natural design paradigm does not achieve the full potential of crowdsourcing. In particular, using a single workflow (even the best) to accomplish a task is suboptimal. We show that alternative workflows can compose synergistically to yield much higher quality output. We formalize the insight with a novel probabilistic graphical model. Based on this model, we design and implement AGENTHUNT, a POMDP-based controller that dynamically switches between these workflows to achieve higher returns on investment. Additionally, we design offline and online methods for learning model parameters. Live experiments on Amazon Mechanical Turk demonstrate the superiority of AGENTHUNT for the task of generating NLP training data, yielding up to 50% error reduction and greater net utility compared to previous methods.
Modeling Wisdom of Crowds Using Latent Mixture of Discriminative Experts
In many computational linguistic scenarios, training labels are subjective, making it necessary to acquire the opinions of multiple annotators/experts, which is referred to as “wisdom of crowds”. In this paper, we propose a new approach for modeling wisdom of crowds based on the Latent Mixture of Discriminative Experts (LMDE) model that can automatically learn the prototypical patterns and hidden dynamics among different experts. Experiments show improvement over state-of-the-art approaches on the task of listener backchannel prediction in dyadic conversations.
Uncertainty Analysis of Neural-Network-Based Aerosol Retr ...
- IEEE Transactions on Geoscience and Remote Sensing (accepted for inclusion in a future issue; content final except for pagination)
Neural networks have the ability to represent and learn complex regression functions and are very suitable for retrieval of geophysical parameters from remotely sensed data. Neural networks trained to minimize the mean square error are able to estimate the conditional expectation of target variables. In many remote sensing applications, it is also critical to provide estimates of prediction uncertainty. In this paper, we evaluate an approach that, in addition to training a neural network for retrievals, also trains a neural-network-based estimator of retrieval uncertainty. The uncertainty estimator is built under the assumption that uncertainty is a function of input variables. The methodology was evaluated on aerosol-optical-depth retrieval. The data set consists of 38 238 collocated measurements from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite instrument and Aerosol Robotic Network ground-based instruments, collected over the entire Earth during two years (2005–2006). The results indicate that a neural network ensemble is more accurate than the operational MODIS retrieval algorithm called Collection 5 and that the retrieval uncertainty of the ensemble can be estimated with satisfactory accuracy. Index Terms: regression, remote sensing, uncertainty.
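The statement about mean-square-error training can be written out explicitly. The first identity below is the standard result, with x denoting the input variables and y the target. The second is only an illustrative assumption: it supposes the uncertainty estimator g is itself fit by least squares to the squared residuals of the retrieval network, which the abstract does not state.

```latex
f^{*}(x) = \arg\min_{f}\; \mathbb{E}\big[(y - f(x))^{2}\big] = \mathbb{E}[\,y \mid x\,]

g^{*}(x) = \arg\min_{g}\; \mathbb{E}\Big[\big((y - f^{*}(x))^{2} - g(x)\big)^{2}\Big]
         = \mathbb{E}\big[(y - f^{*}(x))^{2} \mid x\big]
         = \operatorname{Var}(y \mid x)
```

Under that assumption the second regressor recovers an input-dependent variance, which matches the abstract's premise that uncertainty is a function of the input variables.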
A Robust Bayesian Truth Serum for Small Populations (Technical Report)
Peer prediction methods allow the truthful elicitation of private signals (e.g., experiences, or opinions) in regard to a true world state when this ground truth is unobservable. The original peer prediction method is incentive compatible for any finite number of agents n ≥ 2 but critically relies on a common prior, shared by all agents and the center. The Bayesian Truth Serum (BTS) relaxes this assumption. While it still assumes that the agents share a common prior, this prior need not be known by the center. However, BTS is proven to be incentive compatible only for a large enough number of agents, and this number depends on the prior and is thus unknown to the mechanism. In this paper, we present a robust BTS for the elicitation of binary information which is incentive compatible for any n ≥ 3, taking advantage of a particularity of the quadratic scoring rule. Our mechanism is the first peer prediction method that does not rely on knowledge of the common prior to provide strict incentive compatibility for any n ≥ 3. Moreover, and in contrast to the original BTS, our mechanism is numerically robust and ex post individually rational.
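The quadratic scoring rule referred to in this abstract has a property that is easy to check numerically: an agent maximizes its expected score by reporting its true belief. The sketch below shows only that property for a binary event, using the common normalization with scores in [0, 1]; it does not implement the robust BTS mechanism, and the function names are illustrative.

```python
def quadratic_score(report, outcome):
    """Binary quadratic scoring rule.

    report:  stated probability that the outcome is 1
    outcome: realized outcome, 0 or 1
    """
    if outcome == 1:
        return 2 * report - report ** 2
    return 1 - report ** 2


def expected_score(report, belief):
    """Expected score of a report when the agent believes P(outcome = 1) = belief."""
    return belief * quadratic_score(report, 1) + (1 - belief) * quadratic_score(report, 0)


def best_report(belief, steps=100):
    """Grid search for the report that maximizes the expected score."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda report: expected_score(report, belief))
```

The expected score equals 2·belief·report − report² + 1 − belief, whose derivative vanishes exactly at report = belief, so the grid search returns the grid point closest to the true belief.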
Approximating the Wisdom of the Crowd
The problem of “approximating the crowd” is that of estimating the crowd’s majority opinion by querying only a subset of it. Algorithms that approximate the crowd can intelligently stretch a limited budget for a crowdsourcing task. We present an algorithm, “CrowdSense,” that works in an online fashion to dynamically sample subsets of labelers based on an exploration/exploitation criterion. The algorithm produces a weighted combination of the labelers’ votes that approximates the crowd’s opinion.
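A weighted combination of votes, as described in the last sentence, can be sketched for binary labels as below. The weights are taken as given; how CrowdSense selects labelers and updates their weights is not reproduced here, and the names `weighted_vote`, `votes`, and `weights` are assumptions for the example.

```python
def weighted_vote(votes, weights):
    """Combine binary votes (+1 / -1) from a subset of labelers.

    votes:   dict mapping labeler id -> vote in {+1, -1} (only queried labelers)
    weights: dict mapping labeler id -> non-negative weight (assumed given)
    Returns +1 or -1; a zero weighted sum is resolved as +1 by convention.
    """
    score = sum(weights[labeler] * vote for labeler, vote in votes.items())
    return 1 if score >= 0 else -1


weights = {"a": 0.9, "b": 0.4, "c": 0.3}
decision = weighted_vote({"a": -1, "b": 1, "c": 1}, weights)
```

In this example the single high-weight labeler outvotes the two low-weight ones (score −0.9 + 0.4 + 0.3 = −0.2), which an unweighted majority would not allow.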
Managing Crowdsourced Human Computation
The proposed tutorial covers an emerging topic of wide interest: crowdsourcing. Specifically, we cover areas of crowdsourcing related to managing structured and unstructured data in web-related content. Many researchers and practitioners today see the great opportunity offered by easily available crowdsourcing platforms. However, most newcomers face the same questions: How can we manage the (noisy) crowds to generate high-quality output? How can we estimate the quality of the contributors? How can we best structure the tasks? How can we get results quickly while minimizing the necessary resources? How should we set up the incentives? How should such crowdsourcing markets be set up? The presented material will cover topics from a variety of fields, including computer science, statistics, economics, and psychology. Furthermore, the material will include real-life examples and case studies from years of experience in running and managing crowdsourcing applications in business settings. The tutorial presenters have extensive academic and systems-building experience and will provide the audience with data sets that can be used for hands-on tasks. Keywords: crowdsourcing, Mechanical Turk, workflow control, quality assurance, incentives, reputation, market design, human computation.

