Results 1 - 4 of 4
Leveraging transitive relations for crowdsourced joins. In SIGMOD Conference, 2013.
"... ABSTRACT The development of crowdsourced query processing systems has recently attracted a significant attention in the database community. A variety of crowdsourced queries have been investigated. In this paper, we focus on the crowdsourced join query which aims to utilize humans to find all pairs ..."
Cited by 25 (3 self)
Abstract:
The development of crowdsourced query processing systems has recently attracted significant attention in the database community, and a variety of crowdsourced queries have been investigated. In this paper, we focus on the crowdsourced join query, which aims to utilize humans to find all pairs of matching objects from two collections. As a human-only solution is expensive, we adopt a hybrid human-machine approach which first uses machines to generate a candidate set of matching pairs, and then asks humans to label the pairs in the candidate set as either matching or non-matching. Given the candidate pairs, existing approaches publish all pairs for verification to a crowdsourcing platform. However, they neglect the fact that the pairs satisfy transitive relations. For example, if o1 matches o2, and o2 matches o3, then we can deduce that o1 matches o3 without needing to crowdsource (o1, o3). To this end, we study how to leverage transitive relations for crowdsourced joins. We propose a hybrid transitive-relations and crowdsourcing labeling framework which aims to crowdsource the minimum number of pairs needed to label all the candidate pairs. We prove the optimal labeling order and devise a parallel labeling algorithm to efficiently crowdsource the pairs following that order. We evaluate our approaches both in a simulated environment and on a real crowdsourcing platform. Experimental results show that our approaches with transitive relations save substantially more money and time than existing methods, with only a small loss in result quality.
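The deduction step described in the abstract can be captured with a union-find structure over matched objects plus a set of non-match edges between clusters: two objects in the same cluster are a deduced match, and objects whose clusters are joined by a non-match edge are a deduced non-match. The sketch below is only an illustration of this transitivity-based deduction; the class and method names are hypothetical, and it omits the paper's optimal labeling order and parallel labeling algorithm.

```python
# Illustrative sketch (not the paper's implementation) of transitive-relation
# deduction for crowdsourced joins. All names here are hypothetical.

class TransitiveLabeler:
    def __init__(self):
        self.parent = {}        # union-find forest over matched objects
        self.non_match = set()  # non-match edges between cluster roots

    def find(self, x):
        """Return the cluster root of x, with path halving."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def deduce(self, a, b):
        """Return 'match'/'non-match' if the label follows transitively,
        or None if the pair still has to be crowdsourced."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return "match"
        if (ra, rb) in self.non_match or (rb, ra) in self.non_match:
            return "non-match"
        return None

    def record(self, a, b, is_match):
        """Record a label obtained from the crowd."""
        ra, rb = self.find(a), self.find(b)
        if is_match:
            self.parent[ra] = rb
            # rewrite non-match edges to point at the surviving roots
            self.non_match = {(self.find(x), self.find(y))
                              for x, y in self.non_match}
        else:
            self.non_match.add((ra, rb))

# Usage: after crowdsourcing (o1, o2) and (o2, o3) as matches,
# (o1, o3) is deduced without asking the crowd.
t = TransitiveLabeler()
t.record("o1", "o2", True)
t.record("o2", "o3", True)
print(t.deduce("o1", "o3"))  # -> 'match'
```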
Optimal Crowd-Powered Rating and Filtering Algorithms
"... We focus on crowd-powered filtering, i.e., filtering a large set of items using humans. Filtering is one of the most commonly used building blocks in crowdsourcing applications and systems. While solutions for crowd-powered filtering exist, they make a range of implicit assumptions and restrictions, ..."
Cited by 8 (3 self)
Abstract:
We focus on crowd-powered filtering, i.e., filtering a large set of items using humans. Filtering is one of the most commonly used building blocks in crowdsourcing applications and systems. While solutions for crowd-powered filtering exist, they make a range of implicit assumptions and restrictions, ultimately rendering them insufficient for real-world applications. We describe two approaches that discard these implicit assumptions and restrictions: one that carefully generalizes prior work, leading to an optimal but often intractable solution, and another that provides a novel way of reasoning about filtering strategies, leading to a sometimes suboptimal but efficiently computable solution that is asymptotically close to optimal. We demonstrate that our techniques lead to significant reductions in error, up to 30% at fixed cost over prior work, in a novel crowdsourcing application: peer evaluation in online courses.
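As a hedged illustration of what a filtering strategy looks like in this setting, the toy sketch below asks yes/no questions about one item until an answer leads by two votes or a question budget is exhausted, then returns the majority. It is not one of the paper's algorithms; the function names and the lead-by-two stopping rule are assumptions chosen for the example.

```python
import random

def filter_item(ask_human, max_questions=5):
    """Toy sequential filtering strategy for one item: stop early once one
    answer leads by two votes, otherwise use up to max_questions and take
    the majority. A tie after an even number of answers counts as reject."""
    yes = no = 0
    for _ in range(max_questions):
        if ask_human():
            yes += 1
        else:
            no += 1
        if abs(yes - no) >= 2:  # early-stopping decision boundary
            break
    return yes > no

# Usage: simulate a worker who answers this (truly passing) item
# correctly 75% of the time.
worker = lambda: random.random() < 0.75
print(filter_item(worker))
```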
Query Optimization in CrowdDB, 2012.
"... While database management systems have successfully established themselves as reliable and highly optimized tools for managing data, they still fall short when it comes to certain types of queries, such as subjective comparisons and finding missing data. Crowdsourcing databases, such as CrowdDB, off ..."
Abstract:
While database management systems have successfully established themselves as reliable and highly optimized tools for managing data, they still fall short when it comes to certain types of queries, such as subjective comparisons and finding missing data. Crowdsourcing databases, such as CrowdDB, offer a solution by harnessing the knowledge and problem-solving abilities of large groups of people to perform the tasks at which humans excel. Nevertheless, people are prone to making mistakes in their judgements and giving inaccurate answers, which can negatively impact the quality of results. The commonly adopted remedy is to pose the same question to the crowd several times. However, such redundancy significantly increases the cost of a solution, making it infeasible for crowdsourcing databases where a large number of queries need to be answered with a limited budget and, possibly, a required level of quality. In this thesis, we address the inherent trade-off between cost and quality ...
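The cost/quality trade-off of asking the same question several times can be made concrete with a small calculation: if each crowd answer is independently correct with probability p, the probability that the majority of r answers is correct is the binomial tail computed below. This is a generic illustration of redundancy-based quality, not CrowdDB's actual model.

```python
from math import comb

def majority_accuracy(p, r):
    """Probability that the majority of r independent answers is correct
    when each answer is correct with probability p (r odd, so no ties)."""
    return sum(comb(r, k) * p**k * (1 - p)**(r - k)
               for k in range(r // 2 + 1, r + 1))

# Each repetition raises quality but multiplies cost:
# with 70%-accurate workers, 1 ask -> 0.700, 3 -> 0.784, 5 -> 0.837.
for r in (1, 3, 5):
    print(r, round(majority_accuracy(0.70, r), 3))
```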