Corleone: Hands-off crowdsourcing for entity matching, 2014
Abstract - Cited by 10 (1 self)
Recent approaches to crowdsourcing entity matching (EM) are limited in that they crowdsource only parts of the EM workflow, requiring a developer to execute the remaining parts. Consequently, these approaches do not scale to the growing EM need at enterprises and crowdsourcing startups, and cannot handle scenarios where ordinary users (i.e., the masses) want to leverage crowdsourcing to match entities. In response, we propose the notion of hands-off crowdsourcing (HOC), which crowdsources the entire workflow of a task, thus requiring no developers. We show how HOC can represent a next logical direction for crowdsourcing research, scale up EM at enterprises and crowdsourcing startups, and open up crowdsourcing for the masses. We describe Corleone, a HOC solution for EM, which uses the crowd in all major steps of the EM process. Finally, we discuss the implications of our work to executing crowdsourced RDBMS joins, cleaning learning models, and soliciting complex information types from crowd workers.
How the live web feels about events - In CIKM, 2013
Abstract - Cited by 5 (2 self)
Microblogging platforms, such as Twitter, Tumblr etc., have been established as key components in the contemporary Web ecosystem. Users constantly post snippets of information regarding their actions, interests or perception of their surroundings, which is why they have been attributed the term Live Web. Nevertheless, research on such platforms has been quite limited when it comes to identifying events, but is rapidly gaining ground. Event identification is a key step to news reporting, proactive or reactive crisis management at multiple scales, efficient resource allocation, etc. In this paper, we focus on the problem of automatically identifying events as they occur, in such a user-driven, fast-paced and voluminous setting. We propose a novel and natural way to address the issue using notions from emotional theories, combined with spatiotemporal information and employ online event detection mechanisms to solve it at large scale in a distributed fashion. We present a modular framework that incorporates all of our key ideas and experimentally validate its superiority, in terms of both efficiency and effectiveness, over the state-of-the-art using real life data from the Twitter stream. We also present empirical evidence on the importance of spatiotemporal information in event detection for this setting.
Data-based Research at IIT Bombay
Abstract
of research and development in the area of databases, dating back to the early 1980s. D. B. Phatak and N. L. Sarda were among the first faculty members at IIT Bombay to work in the area of database systems. This was a period when the financial sector of India, headquartered primarily in Bombay (now renamed Mumbai), saw a spurt in computerization, and IIT Bombay faculty played a leading role as consultants for database implementations in these companies. Research in the area of databases began in the early 1980s, but increased greatly from the early 1990s, with the hiring of several faculty including S. Seshadri, S. Sudarshan, and later Krithi Ramamritham, who moved to IIT Bombay from U. Mass. Amherst in the early to mid 1990s. With the hiring of Sunita Sarawagi and Soumen Chakrabarti in the late 1990s, there was a significant broadening, with the group no longer being just a database group, but rather a much broader data management group, with interests in information retrieval and data mining. More recently Ganesh Ramakrishnan joined our group, further increasing its strengths in information retrieval and data mining. The number of PhD students increased from around 1 or 2 enrolled at a time in the early 1990s to about 12 to 15 students at a time in recent years. While this number is much better than earlier, and is increasing rapidly, it is still small by most standards. However, our master’s and bachelor’s students have compensated for the shortage of PhD students, and have made very significant contributions to our research efforts, with well over three fourths of our papers having such students as coauthors. Today, the group covers a diverse range of interests, which you can see from the different research projects showcased in this article. In the following sections, we outline the major research projects of the group. We wrap up the article with a summary of other contributions to the community by group members.
For more information about the group, please visit: