Utility of Human-Computer Interactions: Toward a Science of Preference Measurement
Cited by 6 (0 self)
The success of a computer system depends upon a user choosing it, but the field of Human-Computer Interaction has little ability to predict this user choice. We present a new method that measures user choice, and quantifies it as a measure of utility. Our method has two core features. First, it introduces an economic definition of utility, one that we can operationalize through economic experiments. Second, we employ a novel method of crowdsourcing that enables the collection of thousands of economic judgments from real users. ACM Classification: H5.m. Information interfaces and presentation: User Interfaces.
Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces
Cited by 4 (1 self)
Accurately annotating entities in video is labor intensive and expensive. As the quantity of online video grows, traditional solutions to this task are unable to scale to meet the needs of researchers with limited budgets. Current practice provides a temporary solution by paying dedicated workers to label a fraction of the total frames and otherwise settling for linear interpolation. As budgets and scale require sparser key frames, the assumption of linearity fails and labels become inaccurate. To address this problem we have created a public framework for dividing the work of labeling video data into micro-tasks that can be completed by huge labor pools available through crowdsourced marketplaces. By extracting pixel-based features from manually labeled entities, we are able to leverage more sophisticated interpolation between key frames to maximize performance given a budget. Finally, by validating the power of our framework on difficult, real-world data sets we demonstrate an inherent trade-off between the mix of human and cloud computing used vs. the accuracy and cost of the labeling.
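The linear interpolation baseline that this abstract refers to can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' framework: the box format `(x, y, width, height)` and the function name `interpolate_boxes` are hypothetical and chosen only for the example.

```python
from typing import List, Tuple

# (x, y, width, height); this box format is an assumption for illustration.
Box = Tuple[float, ...]


def interpolate_boxes(start: Box, end: Box, n_between: int) -> List[Box]:
    """Linearly interpolate boxes for the frames strictly between two key frames."""
    boxes = []
    for i in range(1, n_between + 1):
        t = i / (n_between + 1)  # fraction of the way from start to end
        boxes.append(tuple(a + t * (b - a) for a, b in zip(start, end)))
    return boxes


# Two manually labeled key frames with three unlabeled frames between them.
key_a = (0.0, 0.0, 10.0, 10.0)
key_b = (40.0, 20.0, 10.0, 10.0)
middle = interpolate_boxes(key_a, key_b, 3)
```

When an entity moves non-linearly between sparse key frames, boxes produced this way drift from the true position, which is the failure mode the abstract describes.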
Guess Again (and again and again): Measuring Password Strength by Simulating Password-Cracking Algorithms
CMU-CYLAB-11-008, 2011
Cited by 3 (2 self)
Text-based passwords remain the dominant authentication method in computer systems, despite significant advancement in attackers’ capabilities to perform password cracking. In response to this threat, password composition policies have grown increasingly complex. However, there is insufficient research defining metrics to characterize password strength and evaluating password-composition policies using these metrics. In this paper, we describe an analysis of 12,000 passwords collected under seven composition policies via an online study. We develop an efficient distributed method for calculating how effectively several heuristic password-guessing algorithms guess passwords. Leveraging this method, we investigate (a) the resistance of passwords created under different conditions to password guessing; (b) the performance of guessing algorithms under different training sets; (c) the relationship between passwords explicitly created under a given composition policy and other passwords that happen to meet the same requirements; and (d) the relationship between guessability, as measured with password-cracking algorithms, and entropy estimates. We believe our findings advance understanding of both password-composition policies and metrics for quantifying password security.
Conducting Usable Privacy & Security Studies with Amazon’s Mechanical Turk
Cited by 2 (1 self)
Being able to conduct human subjects experiments in a distributed method across the Internet is frequently desirable to support broad tests of usability. Until recently these experiments were commonly advertised in an ad-hoc fashion, using mailing lists, contest sites, and online bulletin boards. Recently Amazon’s Mechanical Turk, a service where users can complete short tasks and receive automatic payment, has become prominent in the HCI community. We describe three different usable privacy and security experiments that were conducted through Mechanical Turk, highlighting both reasons for using Amazon’s service as well as common pitfalls that we encountered. Categories and Subject Descriptors H.5.3 [Group and Organization Interfaces]: Web-based interaction; H.1.2 [Models and Principles]: User/Machine
Putting Out a HIT: Crowdsourcing Malware Installs
Cited by 2 (1 self)
Today, several actors within the Internet’s burgeoning underground economy specialize in providing services to like-minded criminals. At the same time, gray and white markets exist for services on the Internet providing reasonably similar products. In this paper we explore a hypothetical arbitrage between these two markets by purchasing “Human Intelligence” on Amazon’s Mechanical Turk service, determining the vulnerability of and cost to compromise the computers being used by the humans to provide this service, and estimating the underground value of the computers which are vulnerable to exploitation. We show that it is economically feasible for an attacker to purchase access to high value hosts via Mechanical Turk, compromise the subset with unpatched, vulnerable browser plugins, and sell access to these hosts via Pay-Per-Install programs for a tidy profit. We also present supplementary statistics gathered regarding Mechanical Turk workers’ browser security, antivirus usage, and willingness to run arbitrary programs in exchange for a small monetary reward.
Perception of Personality and Naturalness through Dialogues by Native Speakers of American English and Arabic
Cited by 1 (1 self)
Linguistic markers of personality traits have been studied extensively, but few cross-cultural studies exist. In this paper, we evaluate how native speakers of American English and Arabic perceive personality traits and naturalness of English utterances that vary along the dimensions of verbosity, hedging, lexical and syntactic alignment, and formality. The utterances are the turns within dialogue fragments that are presented as text transcripts to the workers of Amazon’s Mechanical Turk. The results of the study suggest that all four dimensions can be used as linguistic markers of all personality traits by both language communities. A further comparative analysis shows cross-cultural differences for some combinations of measures of personality traits and naturalness, the dimensions of linguistic variability and dialogue acts.
Assessing the Effect of Visualizations on Bayesian Reasoning through
Cited by 1 (1 self)
[Fig. 1. The six visualizations evaluated in our study, illustrating the classic mammography problem [21].]
People have difficulty understanding statistical information and are unaware of their wrong judgments, particularly in Bayesian reasoning. Psychology studies suggest that the way Bayesian problems are represented can impact comprehension, but few visual designs have been evaluated and only populations with a specific background have been involved. In this study, a textual and six visual representations for three classic problems were compared using a diverse subject pool through crowdsourcing. Visualizations included area-proportional Euler diagrams, glyph representations, and hybrid diagrams combining both. Our study failed to replicate previous findings in that subjects’ accuracy was remarkably lower and visualizations exhibited no measurable benefit. A second experiment confirmed that simply adding a visualization to a textual Bayesian problem is of little help, even when the text refers to the visualization, but suggests that visualizations are more effective when the text is given without numerical values. We discuss our findings and the need for more such experiments to be carried out on heterogeneous populations of non-experts.
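For readers unfamiliar with the kind of Bayesian reasoning problem this abstract discusses, the calculation a subject is asked to perform can be sketched with Bayes' rule. The numbers below (1% prevalence, 80% sensitivity, 9.6% false-positive rate) are values commonly quoted for the mammography problem and are an assumption here; they are not taken from this listing. The function name `posterior` is likewise hypothetical.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(condition | positive test) via Bayes' rule."""
    true_pos = prior * sensitivity                  # P(positive and condition)
    false_pos = (1 - prior) * false_positive_rate   # P(positive and no condition)
    return true_pos / (true_pos + false_pos)


# Assumed illustrative values, not figures from the paper.
p = posterior(prior=0.01, sensitivity=0.80, false_positive_rate=0.096)
```

With these assumed inputs the posterior is below 10%, far lower than the test's sensitivity, which is the gap between intuition and the correct answer that makes such problems hard.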
Communitysourcing: Engaging Local Crowds to Perform Expert Work Via Physical Kiosks
Cited by 1 (0 self)
Online labor markets, such as Amazon’s Mechanical Turk, have been used to crowdsource simple, short tasks like image labeling and transcription. However, expert knowledge is often lacking in such markets, making it impossible to complete certain classes of tasks. In this work we introduce an alternative mechanism for crowdsourcing tasks that require specialized knowledge or skill: communitysourcing, the use of physical kiosks to elicit work from specific populations. We investigate the potential of communitysourcing by designing, implementing and evaluating Umati: the communitysourcing vending machine. Umati allows users to earn credits by performing tasks using a touchscreen attached to the machine. Physical rewards (in this case, snacks) are dispensed through traditional vending mechanics. We evaluated whether communitysourcing can accomplish expert work by using Umati to grade Computer Science exams. We placed Umati in a university Computer Science building, targeting students with grading tasks for snacks. Over one week, 328 unique users (302 of whom were students) completed 7771 tasks (7240 by students). 80% of users had never participated in a crowdsourcing market before. We found that Umati was able to grade exams with 2% higher accuracy (at the same price) or at 33% lower cost (at equivalent accuracy) than traditional single-expert grading. Mechanical Turk workers had no success grading the same exams. These results indicate that communitysourcing can successfully elicit high-quality expert work from specific communities.
Worker Types and Personality Traits in Crowdsourcing Relevance Labels
Cited by 1 (1 self)
Crowdsourcing platforms offer unprecedented opportunities for creating evaluation benchmarks, but suffer from varied output quality from crowd workers who possess different levels of competence and aspiration. This raises new challenges for quality control and requires an in-depth understanding of how workers’ characteristics relate to the quality of their work. In this paper, we use behavioral observations (HIT completion time, fraction of useful labels, label accuracy) to define five worker types: Spammer, Sloppy, Incompetent, Competent, Diligent. Using data collected from workers engaged in the crowdsourced evaluation of the INEX 2010 Book Track Prove It task, we relate the worker types to label accuracy and personality trait information along the ‘Big Five’ personality dimensions. We expect that these new insights about the types of crowd workers and the quality of their work will inform how to design HITs to attract the best workers to a task and explain why certain HIT designs are more effective than others. Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Systems and Software - performance evaluation (efficiency and effectiveness)
Author Keywords
The ongoing rise of human computation as a means of solving computational problems has created an environment where human workers are often regarded as nameless, faceless computational resources. Some people have begun to think of online tasks as a “remote person call”. In this paper, we summarize ethical and practical labor issues surrounding online labor, and offer a set of guidelines for designing and using online labor in ways that support more positive relationships between workers and requestors, so that both can gain the most benefit from the interaction.

