Results 1 - 10 of 15
Wikipedia vandalism detection: Combining natural language, metadata, and reputation features
- In CICLing’11: Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics, LNCS 6609, 2011
Cited by 22 (5 self)
Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith, introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatiotemporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The performance of the resulting joint system improves on the state of the art set by all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism and for the task of locating vandalism in the complete set of Wikipedia revisions.
Overview of the 2nd International Competition on Wikipedia Vandalism Detection
- In Notebook Papers of CLEF 2011 Labs and Workshops, 2011
Cited by 6 (0 self)
The paper gives an overview of the vandalism detection task of the PAN’11 competition. A new corpus is introduced which comprises about 30,000 Wikipedia edits in English, German, and Spanish, as well as the necessary crowdsourced annotations. Moreover, the performance of three vandalism detectors is evaluated and compared to that of the detectors from the PAN’10 competition.
Edit wars in Wikipedia
- In: Privacy, Security, Risk and Trust (PASSAT), 2011 IEEE Third International Conference on and 2011 IEEE Third International Conference on Social Computing (SocialCom, 2011
Cited by 5 (2 self)
We present a new, efficient method for automatically detecting severe conflicts, or 'edit wars', in Wikipedia and evaluate this method on six different language Wikipedias. We discuss how the numbers of edits and reverts in such pages deviate from those in pages that follow the general workflow, and argue that earlier work has significantly over-estimated the contentiousness of the Wikipedia editing process.
Automated Decision Support for Human Tasks in a Collaborative System: The Case of Deletion in Wikipedia
Cited by 3 (1 self)
Wikipedia’s low barriers to participation have the unintended effect of attracting a large number of articles whose topics do not meet Wikipedia’s inclusion standards. Many are quickly deleted, often causing their creators to stop contributing to the site. We collect and make available several datasets of deleted articles, heretofore inaccessible, and use them to create a model that can predict with high precision whether or not an article will be deleted. We report precision of 0.986 and recall of 0.975 in the best case, and high precision with lower, but still useful, recall in the most difficult case. We propose to deploy a system utilizing this model on Wikipedia as a set of decision-support tools to help article creators evaluate and improve their articles before posting, and to help new article patrollers make more informed decisions about which articles to delete and which to improve.
On the evolution of quality flaws and the effectiveness of cleanup tags in the English Wikipedia
- In Wikipedia Academy 2012 (WPAC, 2012
Cited by 2 (2 self)
The improvement of information quality is a major task for the free online encyclopedia Wikipedia. Recent studies targeted the analysis and detection of specific quality flaws in Wikipedia articles. To date, quality flaws have been exclusively investigated in current Wikipedia articles, based on a snapshot representing the state of Wikipedia at a certain time. This paper goes further, and provides the first comprehensive breakdown of the evolution of quality flaws in Wikipedia. We utilize cleanup tags to analyze the quality flaws that have been tagged by the Wikipedia community in the English Wikipedia, from its launch in 2001 until 2011. This leads to interesting findings regarding (1) the development of Wikipedia’s quality flaw structure and (2) the usage and the effectiveness of cleanup tags. Specifically, we show that inline tags are more effective than tag boxes, and provide statistics about the considerable volume of rare and non-specific cleanup tags. We expect that this work will support the Wikipedia community in making quality assurance activities more efficient.
Cross language prediction of vandalism on Wikipedia using article views and revisions
- Longbing Cao, Hiroshi Motoda, and Guandong Xu, editors, Advances in Knowledge Discovery and Data Mining, 2013
Reputation Features
Historical structured data extraction and vandalism detection from the Wikipedia edit history
- Lang Resources & Evaluation, DOI 10.1007/s10579-013-9232-5
Trust Evaluation Mechanisms for Wikipedia
Wikipedia is the most successful and most popular free encyclopedia, developed collaboratively by many editors. It provides a multitude of opportunities for large-scale online knowledge sharing between virtual communities by letting viewers create and edit articles directly in the web browser. Information on Wikipedia is growing rapidly, but the increase in quantity is not matched by an increase in the quality of the content. A casual observer of Wikipedia may not be able to distinguish good-quality content from bad. Despite the success of Wikipedia, trust in Wikipedia content is still questioned
Combining Natural Language, Metadata, and Reputation Features
Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith, introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatiotemporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The performance of the resulting joint system improves on the state of the art set by all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism and for the task of locating