Results 1 - 6 of 6
Autonomous Link Spam Detection in Purely Collaborative Environments
Cited by 5 (5 self)
Abstract
Collaborative models (e.g., wikis) are an increasingly prevalent Web technology. However, the open access that defines such systems can also be exploited for nefarious purposes. In particular, this paper examines the use of collaborative functionality to add inappropriate hyperlinks to destinations outside the host environment (i.e., link spam). The collaborative encyclopedia Wikipedia is the basis for our analysis. Recent research has exposed vulnerabilities in Wikipedia’s link spam mitigation, finding that human editors are slow to respond and dwindling in number. To address this, we propose and develop an autonomous classifier for link additions. Such a system presents unique challenges. For example, low barriers-to-entry invite a diversity of spam types, not just those with economic motivations. Moreover, issues can arise with how a link is presented, regardless of its destination. In this work, a spam corpus is extracted from over 235,000 link additions to English Wikipedia. From this corpus, more than 40 features are codified and analyzed. These indicators are computed using wiki metadata, landing-site analysis, and external data sources. The resulting classifier attains 64% recall at a 0.5% false-positive rate (ROC-AUC = 0.97). Such performance could allow egregious link additions to be blocked automatically with a low false-positive rate, while the remainder are prioritized for human inspection. Finally, a live Wikipedia implementation of the technique has been developed.
Categories and Subject Descriptors: H.5.3 [Group and Organization Interfaces]: collaborative
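A figure such as "64% recall at a 0.5% false-positive rate" is read off a scored test set by lowering the decision threshold until the false-positive rate would exceed the budget. The sketch below shows that computation, assuming the classifier emits one spam score per link addition and that an addition is flagged when its score is at or above the threshold. The function name and the toy scores are hypothetical; this is not the authors' evaluation code.

```python
from typing import Sequence


def recall_at_fpr(scores: Sequence[float], labels: Sequence[int],
                  max_fpr: float) -> tuple[float, float]:
    """Return (threshold, recall) for the lowest threshold whose
    false-positive rate stays within max_fpr.

    labels: 1 = spam link addition, 0 = legitimate addition.
    An addition is flagged as spam when its score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both spam and legitimate examples")
    best = (float("inf"), 0.0)  # nothing flagged yet
    # Walk thresholds from strictest to loosest; false positives only grow.
    for t in sorted(set(scores), reverse=True):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        if fp / negatives > max_fpr:
            break  # any looser threshold would also exceed the budget
        best = (t, tp / positives)
    return best


# Hypothetical toy data (not from the paper's corpus).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
print(recall_at_fpr(scores, labels, 0.0))  # (0.9, 0.4)
print(recall_at_fpr(scores, labels, 0.2))  # (0.6, 0.8)
```

The threshold found this way is the natural cut-off for automatic blocking; everything scoring below it would be left for human review.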
Link Spamming Wikipedia for Profit
Cited by 2 (2 self)
Abstract
Collaborative functionality is an increasingly prevalent web technology. To encourage participation, these systems usually have low barriers-to-entry and permissive privileges. Unsurprisingly, ill-intentioned users try to leverage these characteristics for nefarious purposes. In this work, we examine a particular abuse, link spamming: the addition of promotional or otherwise inappropriate hyperlinks. Our analysis focuses on the wiki model and, in particular, on the collaborative encyclopedia Wikipedia. A principal goal of spammers is to maximize exposure, the number of people who view a link. Creating and analyzing the first Wikipedia link spam corpus, we find that existing spam strategies perform quite poorly in this regard. The status quo spamming model relies on link persistence to accumulate exposures, a strategy that fails given the diligence of the Wikipedia community. Instead, we propose a model that exploits the latency inherent in human anti-spam enforcement. Statistical estimation suggests that our novel model would produce significantly more link exposures than status quo techniques. More critically, the strategy could prove economically viable for perpetrators, incentivizing its exploitation. We therefore address mitigation strategies.
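The contrast between persistence and latency can be illustrated with a back-of-the-envelope exposure count. The sketch below assumes a link accrues views at its host article's constant hourly rate for as long as it stays live; this simplification, the `Placement` type, and all numbers are hypothetical and are not the paper's statistical estimation.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """One hypothetical link placement."""
    page_views_per_hour: float  # traffic of the host article
    hours_live: float           # time until the link is removed


def exposures(placements: list[Placement]) -> float:
    """Total link views, assuming a constant view rate while each link is live."""
    return sum(p.page_views_per_hour * p.hours_live for p in placements)


# Persistence: one link hidden on an obscure article that survives 30 days.
persistent = [Placement(page_views_per_hour=2, hours_live=720)]
# Latency: ten links on busy articles, each removed after half an hour.
latency = [Placement(page_views_per_hour=500, hours_live=0.5)] * 10

print(exposures(persistent))  # 1440
print(exposures(latency))     # 2500.0
```

Under these invented figures, short-lived links on high-traffic pages out-expose a long-lived link on a quiet page, which is the intuition behind exploiting enforcement latency rather than relying on persistence.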
What Wikipedia Deletes: Characterizing Dangerous Collaborative Content
Cited by 2 (2 self)
Abstract
Collaborative environments, such as Wikipedia, often have low barriers-to-entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions that are not simply “undone” but deleted from revision histories and public view. Such treatment is generally reserved for edits that (1) present a legal liability to the host (e.g., copyright issues, defamation) or (2) present privacy threats to individuals (e.g., contact information). Herein, we analyze one year of Wikipedia’s public deletion log and use brute-force strategies to learn about privately handled redactions. This provides insight into the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia’s approach is generally quite reactive, we find that copyright issues prove the most problematic of the behaviors studied.
Categories and Subject Descriptors: H.5.3 [Group and Organization Interfaces]: collaborative
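The abstract does not spell out its brute-force strategy. One plausible way to probe for privately handled redactions is sketched below, assuming revision IDs are allocated sequentially: enumerate an ID range, note which IDs cannot be retrieved publicly, and discard those explained by the public deletion log. The function name and inputs are hypothetical, and ID gaps can have other causes, so the result is a candidate list rather than proof of suppression.

```python
def privately_hidden_ids(first_id: int, last_id: int,
                         visible_ids: set[int],
                         publicly_logged_ids: set[int]) -> list[int]:
    """Return IDs in [first_id, last_id] that are neither publicly
    retrievable nor accounted for by the public deletion log.

    Assumes sequential ID allocation, so an unexplained gap is a
    candidate private redaction (not a confirmed one).
    """
    return [
        rid
        for rid in range(first_id, last_id + 1)
        if rid not in visible_ids and rid not in publicly_logged_ids
    ]


# Hypothetical probe results for a small ID range.
visible = {100, 101, 103, 104, 106, 107, 108, 110}
logged = {102}  # hidden, but explained by a public deletion-log entry
print(privately_hidden_ids(100, 110, visible, logged))  # [105, 109]
```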
• Corpus construction • Features • Performance • Live implementation • Demonstration
Abstract
Design a framework to detect link spam additions to wikis/Wikipedia, including those employing:
(1) subtlety, aiming at link persistence (the status quo);
(2) the vulnerabilities identified in recent literature [1].
Then use this tool’s functionality to:
(1) autonomously undo obvious spam (i.e., a bot);
(2) feed non-obvious but questionable instances to human patrollers in a streamlined fashion.
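The two uses above amount to a triage over classifier scores. The sketch below assumes each link addition has an edit ID and a spam score: scores at or above a revert threshold are treated as obvious spam for the bot to undo, and scores between a review threshold and the revert threshold are queued for patrollers, most suspicious first. The function name and both thresholds are hypothetical illustrations, not the framework's actual settings.

```python
import heapq
from typing import Iterable


def triage(scored: Iterable[tuple[str, float]],
           revert_at: float, review_at: float) -> tuple[list[str], list[str]]:
    """Split (edit_id, spam_score) pairs into bot reverts and a review order.

    score >= revert_at             -> undone automatically
    review_at <= score < revert_at -> queued for human patrollers
    score < review_at              -> left alone
    """
    auto_revert: list[str] = []
    queue: list[tuple[float, str]] = []
    for edit_id, score in scored:
        if score >= revert_at:
            auto_revert.append(edit_id)
        elif score >= review_at:
            # heapq is a min-heap, so negate scores to pop the highest first.
            heapq.heappush(queue, (-score, edit_id))
    review_order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
    return auto_revert, review_order


edits = [("a", 0.99), ("b", 0.2), ("c", 0.7), ("d", 0.85), ("e", 0.5)]
print(triage(edits, revert_at=0.95, review_at=0.5))  # (['a'], ['d', 'c', 'e'])
```

In a live deployment the scores would arrive as a stream of edits, so the queue would be maintained incrementally rather than built from a batch as here.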
University of Pennsylvania
Abstract
Collaborative functionality is an increasingly prevalent web technology. To encourage participation, these systems usually have low barriers-to-entry and permissive privileges. Unsurprisingly, ill-intentioned users try to leverage these characteristics for nefarious purposes. In this work, we examine a particular abuse, link spamming: the addition of promotional or otherwise inappropriate hyperlinks. Our analysis focuses on the wiki model and, in particular, on the collaborative encyclopedia Wikipedia. A principal goal of spammers is to maximize exposure, the number of people who view a link. Creating and analyzing the first Wikipedia link spam corpus, we find that existing spam strategies perform quite poorly in this regard. The status quo spamming model relies on link persistence to accumulate exposures,

