Results 1 - 3 of 3
Autonomous Link Spam Detection in Purely Collaborative Environments
Cited by 5 (5 self)
Collaborative models (e.g., wikis) are an increasingly prevalent Web technology. However, the open access that defines such systems can also be utilized for nefarious purposes. In particular, this paper examines the use of collaborative functionality to add inappropriate hyperlinks to destinations outside the host environment (i.e., link spam). The collaborative encyclopedia, Wikipedia, is the basis for our analysis. Recent research has exposed vulnerabilities in Wikipedia’s link spam mitigation, finding that human editors are latent and dwindling in quantity. To this end, we propose and develop an autonomous classifier for link additions. Such a system presents unique challenges. For example, low barriers-to-entry invite a diversity of spam types, not just those with economic motivations. Moreover, issues can arise with how a link is presented (regardless of the destination). In this work, a spam corpus is extracted from over 235,000 link additions to English Wikipedia. From this, 40+ features are codified and analyzed. These indicators are computed using wiki metadata, landing site analysis, and external data sources. The resulting classifier attains 64% recall at 0.5% false-positives (ROC-AUC = 0.97). Such performance could enable egregious link additions to be blocked automatically with low false-positive rates, while prioritizing the remainder for human inspection. Finally, a live Wikipedia implementation of the technique has been developed. Categories and Subject Descriptors H.5.3 [Group and Organization Interfaces]: collaborative
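The operating point quoted in the abstract above (recall at a fixed false-positive rate) can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function name `recall_at_fpr`, the toy scores, and the labels are all hypothetical, and the listing says nothing about how the authors computed their figures.

```python
def recall_at_fpr(scores, labels, max_fpr):
    """Return (threshold, recall) for the best recall whose FPR stays <= max_fpr.

    scores: classifier outputs, higher means more spam-like (assumed convention).
    labels: 1 for spam, 0 for legitimate link additions.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    best = (float("inf"), 0.0)  # flag nothing: zero recall, zero false positives
    # Lower the threshold step by step; FPR can only grow as it drops.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / negatives > max_fpr:
            break
        best = (t, tp / positives)
    return best


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
threshold, recall = recall_at_fpr(toy_scores, toy_labels, max_fpr=0.0)
```

A stricter false-positive budget pushes the threshold up and recall down, which is the trade-off behind reporting recall at a specific false-positive rate.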
What Wikipedia Deletes: Characterizing Dangerous Collaborative Content
Cited by 2 (2 self)
Collaborative environments, such as Wikipedia, often have low barriers-to-entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions which are not simply “undone” but deleted from revision histories and public view. Such treatment is generally reserved for edits which: (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (i.e., contact information). Herein, we analyze one year of Wikipedia’s public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight about the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia’s approach is generally quite reactive, we find that copyright issues prove most problematic of those behaviors studied. Categories and Subject Descriptors H.5.3 [Group and Organization Interfaces]: collaborative
• Corpus construction • Features • Performance • Live implementation • Demonstration
Design a framework to detect link spam additions to wikis/Wikipedia, including those employing:
(1) Subtlety; aims at link persistence (status quo)
(2) Vulnerabilities of recent literature [1]
And use this tool’s functionality to:
(1) Autonomously undo obvious spam (i.e., a bot)
(2) Feed non-obvious, but questionable instances to human patrollers in a streamlined fashion
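The two uses listed above (undo obvious spam automatically, queue questionable cases for people) amount to a two-threshold triage over a spam score. The sketch below shows one way that could look. The names `triage` and `review_queue` and both threshold values are hypothetical; the listing does not describe how the actual tool is implemented.

```python
def triage(spam_score, revert_threshold=0.95, review_threshold=0.5):
    """Map a spam score in [0, 1] to an action. Thresholds are illustrative."""
    if spam_score >= revert_threshold:
        return "revert"   # obvious spam: undone autonomously, as a bot would
    if spam_score >= review_threshold:
        return "review"   # questionable: hand to human patrollers
    return "accept"       # low score: no action taken


def review_queue(scored_links):
    """Return links needing human review, most suspicious first.

    scored_links: iterable of (url, spam_score) pairs.
    """
    pending = [(url, s) for url, s in scored_links if triage(s) == "review"]
    return sorted(pending, key=lambda pair: pair[1], reverse=True)


# Invented example links and scores.
links = [("http://a.example", 0.99), ("http://b.example", 0.60),
         ("http://c.example", 0.10), ("http://d.example", 0.80)]
queue = review_queue(links)
```

Sorting the review queue by score is one reading of "streamlined": patrollers see the most suspicious additions first.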

