Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution (2010)
Venue: In SIGIR 2010 workshop
Citations: 46 (1 self)
Citations
734 | Support vector machine active learning with applications to text classification
- Tong, Koller
Citation Context: ...important to have robust training sets. Humans are not machines, so when doing machine-learning-like tasks where we use humans as classifiers, we must apply different techniques to train them. Tong et al. [11] noted that incorporating active learning methods in training machine-learned classifiers may offer improvements to traditional methods. This result may also imply that strategies for training humans...
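Since this context appeals to active learning for training classifiers, a minimal sketch may help fix the idea. It implements pool-based uncertainty sampling with a linear SVM in the spirit of Tong and Koller's simple-margin heuristic; the synthetic data, seed-set construction, query budget, and scikit-learn calls are assumptions of this illustration, not details from the cited paper.

```python
# Minimal sketch: pool-based active learning with a linear SVM, querying the
# pool example closest to the current decision boundary (a simple-margin
# heuristic). Synthetic data and the query budget are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Seed the labeled set with a few examples of each class.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(30):  # query budget (illustrative)
    clf = SVC(kernel="linear").fit(X[labeled], y[labeled])
    margins = np.abs(clf.decision_function(X[pool]))
    query = pool.pop(int(np.argmin(margins)))  # most uncertain remaining item
    labeled.append(query)  # in a real task, a human would supply y[query]

print("labeled examples after active learning:", len(labeled))
```

When the "classifier" is a human judge, the analogue suggested by the context would be choosing which training questions to show next based on where the judge appears most uncertain or error-prone.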
339 | Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. EMNLP
- Snow, O'Connor, et al.
- 2008
Citation Context: ...throughput while ensuring judge quality. Current strategies for evaluating and ensuring quality in crowdsourced tests include measurement of agreement, qualification questions, and worker trust algorithms [10, 7, 6]. When measuring quality with agreement, either by majority vote or similar methods, it is important to consider that high agreement among multiple judges may reflect a variety of factors, particularly...
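Because this passage describes quality measurement by agreement, e.g. majority vote over redundant judgments, the short sketch below shows the bare mechanics. The judgments, item ids, and labels are invented for illustration; the cited works go further (for instance, Ipeirotis et al. estimate per-worker quality with an EM procedure rather than counting raw votes).

```python
# Minimal sketch: majority-vote aggregation and a simple agreement score for
# redundant crowd labels. All judgments below are invented for illustration.
from collections import Counter, defaultdict

# (item_id, worker_id, label) triples, e.g. relevance labels for query-result pairs
judgments = [
    ("q1-d1", "w1", "relevant"), ("q1-d1", "w2", "relevant"), ("q1-d1", "w3", "not relevant"),
    ("q1-d2", "w1", "not relevant"), ("q1-d2", "w2", "not relevant"), ("q1-d2", "w3", "not relevant"),
]

by_item = defaultdict(list)
for item, _worker, label in judgments:
    by_item[item].append(label)

for item, labels in by_item.items():
    winner, votes = Counter(labels).most_common(1)[0]
    agreement = votes / len(labels)  # fraction of judges agreeing with the majority
    print(item, winner, f"agreement={agreement:.2f}")
```

As the quoted context warns, a high agreement score alone does not prove quality; it can also reflect easy items, biased instructions, or workers copying the obvious answer.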
177 | Quality management on Amazon Mechanical Turk.
- Ipeirotis, Provost, et al.
- 2010
Citation Context: ...throughput while ensuring judge quality. Current strategies for evaluating and ensuring quality in crowdsourced tests include measurement of agreement, qualification questions, and worker trust algorithms [10, 7, 6]. When measuring quality with agreement, either by majority vote or similar methods, it is important to consider that high agreement among multiple judges may reflect a variety of factors, particularly...
97 | Crowdsourcing for relevance evaluation.
- Alonso, Rose, et al.
- 2008
Citation Context: ...Crowdsourcing is the use of large, distributed groups of people to complete microtasks or to generate information. Because traditional search relevance evaluation requiring expert assessment is a lengthy process [2, 3, 5], crowdsourcing has gained traction as an alternative solution for these types of high-volume tasks [2, 1]. In some cases, crowdsourcing may provide a better approach than a more traditional, highly-structured...
77 | Are your participants gaming the system? Screening Mechanical Turk workers.
- Downs, Holbrook, et al.
- 2010
Citation Context: ...workers are notified that only upon passing this section will they receive payment. We inform workers of their mistakes. After this training period, training data is used as periodic screening questions [4] to provide live feedback when workers err. The feedback explains what the correct answer should be and why. For every 20 query-result pairs a worker saw, they also were exposed to five training data...
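The protocol described here is concrete enough to sketch: every batch of 20 query-result pairs is accompanied by five gold-standard screening items, and a wrong answer on a gold item triggers corrective feedback. In the sketch below, only the 20:5 ratio and the feedback idea come from the quoted passage; the function names, dictionary keys, and shuffling strategy are hypothetical.

```python
# Sketch of interleaving gold screening questions with ordinary work items and
# giving live feedback on mistakes. Only the 20:5 ratio and the feedback idea
# come from the quoted passage; everything else is a hypothetical illustration.
import random

def build_batch(work_items, gold_items, work_per_batch=20, gold_per_batch=5):
    """Return one task batch: 20 ordinary items plus 5 hidden gold items."""
    batch = list(work_items[:work_per_batch]) + random.sample(gold_items, gold_per_batch)
    random.shuffle(batch)  # workers cannot tell which items are screening items
    return batch

def feedback_for(item, answer):
    """Return corrective feedback if a gold item was answered incorrectly."""
    gold = item.get("gold_label")
    if gold is not None and answer != gold:
        return f"The correct answer is '{gold}'. {item.get('explanation', '')}".strip()
    return None  # ordinary item, or a correct answer: no feedback is shown
```

A worker who repeatedly triggers feedback on gold items could then be retrained or screened out, which is the screening role the passage attributes to [4].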
56 | Here or there: Preference judgments for relevance.
- Carterette, Bennett, et al.
- 2008
Citation Context: ...Crowdsourcing is the use of large, distributed groups of people to complete microtasks or to generate information. Because traditional search relevance evaluation requiring expert assessment is a lengthy process [2, 3, 5], crowdsourcing has gained traction as an alternative solution for these types of high-volume tasks [2, 1]. In some cases, crowdsourcing may provide a better approach than a more traditional, highly-structured...
35 | Crowdsourcing document relevance assessment with Mechanical Turk.
- Grady, Lease
- 2010
Citation Context: ...throughput while ensuring judge quality. Current strategies for evaluating and ensuring quality in crowdsourced tests include measurement of agreement, qualification questions, and worker trust algorithms [10, 7, 6]. When measuring quality with agreement, either by majority vote or similar methods, it is important to consider that high agreement among multiple judges may reflect a variety of factors, particularly...
24 | Web search engine evaluation using clickthrough data and a user model
- Dupret, Murdock, et al.
- 2007
Citation Context: ...Crowdsourcing is the use of large, distributed groups of people to complete microtasks or to generate information. Because traditional search relevance evaluation requiring expert assessment is a lengthy process [2, 3, 5], crowdsourcing has gained traction as an alternative solution for these types of high-volume tasks [2, 1]. In some cases, crowdsourcing may provide a better approach than a more traditional, highly-structured...
10 | On the evaluation of the quality of relevance assessments collected through crowdsourcing
- Kazai, Milic-Frayling
- 2009
Citation Context: ...viewpoints for the same comparison. Feedback from varying viewpoints naturally captures the myriad interpretations a particular problem may have. Quality assurance is a major challenge of crowdsourcing [8, 9]. Without a rigorous quality control strategy, workers often produce an abundance of poor judgments. Poor judgments...
2 | Guidelines for designing crowdsourcing-based relevance evaluation
- Alonso
- 2009
Citation Context: ...Because traditional search relevance evaluation requiring expert assessment is a lengthy process [2, 3, 5], crowdsourcing has gained traction as an alternative solution for these types of high-volume tasks [2, 1]. In some cases, crowdsourcing may provide a better approach than a more traditional, highly-structured judgment task because it facilitates the collection of feedback from a wide variety of viewpoints...