Results 1 - 10 of 53
A content-driven reputation system for the Wikipedia
- In Proceedings of the 16th International World Wide Web Conference, 2007
Cited by 168 (11 self)
On-line forums for the collaborative creation of bodies of information are a phenomenon of rising importance; the Wikipedia is one of the best-known examples. The open nature of such forums could benefit from a notion of reputation for its authors. Author reputation could be used to flag new contributions from low-reputation authors, and it could be used to allow only authors with good reputation to contribute to controversial or critical pages. A reputation system for the Wikipedia would also provide an incentive to give high-quality contributions. We present in this paper a novel type of content-driven reputation system for Wikipedia authors. In our system, authors gain reputation when the edits and text additions they perform to Wikipedia articles are long-lived, and they lose reputation when their changes are undone in short order. We have implemented the proposed system, and we have used it to analyze the entire Italian and French Wikipedias, consisting of a total of 691,551 pages and 5,587,523 revisions. Our results show that our notion of reputation has good predictive value: changes performed by low-reputation authors have a significantly larger than average probability of having poor quality, and of being undone.
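The update rule the abstract describes (reward long-lived contributions, penalize quickly undone ones) can be sketched roughly as follows. The revision log, the weights, and the flagging threshold below are invented for illustration; the paper's actual algorithm computes reputation from the measured longevity of text and edits.

```python
from collections import defaultdict

# Hypothetical revision log: (author, survived), where `survived` means the
# edit outlived a review window without being undone.
revisions = [
    ("alice", True), ("alice", True), ("bob", False),
    ("bob", False), ("alice", False), ("carol", True),
]

GAIN, LOSS = 1.0, 2.0  # assumption: a quick revert costs more than a survival earns

reputation = defaultdict(float)
for author, survived in revisions:
    reputation[author] += GAIN if survived else -LOSS

# Flag contributions from low-reputation authors for closer review.
flagged = {a for a, r in reputation.items() if r < 0}
```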
Measuring Article Quality in Wikipedia: Models and Evaluation
, 2007
Cited by 63 (5 self)
Wikipedia has grown to be the world's largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in the collaborative authoring of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors derived from the article edit history. Our Basic model is designed based on the mutual dependency between article quality and author authority. The PeerReview model introduces review behavior into measuring article quality. Finally, our ProbReview models extend PeerReview with partial reviewership of contributors as they edit various portions of the articles. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate the effectiveness of our quality measurement models in resembling human judgement.
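The Basic model's mutual dependency between article quality and author authority is reminiscent of HITS-style mutual reinforcement. The toy sketch below iterates that idea to a fixed point; the edit data, update rule, and normalization are assumptions, not the paper's exact formulation.

```python
import math

# Hypothetical edit history: article -> set of contributing authors.
edits = {
    "A1": {"u1", "u2"},
    "A2": {"u2", "u3"},
    "A3": {"u1"},
}
authors = {u for us in edits.values() for u in us}

quality = {a: 1.0 for a in edits}      # article quality scores
authority = {u: 1.0 for u in authors}  # author authority scores

for _ in range(20):  # iterate the mutual dependency to a fixed point
    quality = {a: sum(authority[u] for u in us) for a, us in edits.items()}
    authority = {u: sum(q for a, q in quality.items() if u in edits[a])
                 for u in authors}
    # L2-normalize both vectors so the iteration stays bounded
    qn = math.sqrt(sum(q * q for q in quality.values()))
    an = math.sqrt(sum(v * v for v in authority.values()))
    quality = {a: q / qn for a, q in quality.items()}
    authority = {u: v / an for u, v in authority.items()}
```

Author u2, who contributes to both multi-author articles in the toy data, ends up with the highest authority.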
Size Matters: Word Count as a Measure of Quality on Wikipedia
- Computing, 2008
Cited by 56 (0 self)
Wikipedia, “the free encyclopedia”, now contains over two million English articles, and is widely regarded as a high-quality, authoritative encyclopedia. Some Wikipedia articles, however, are of questionable quality, and it is not always apparent to the visitor which articles are good and which are bad. We propose a simple metric – word count – for measuring article quality. In spite of its striking simplicity, we show that this metric significantly outperforms the more complex methods described in related work.
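The metric itself is almost one line of code. A minimal sketch, assuming a single word-count cutoff; the threshold value below is illustrative, not one reported in the paper.

```python
# Assumed cutoff separating "good" from "bad" articles; illustrative only.
WORD_THRESHOLD = 2000

def predict_quality(article_text: str) -> str:
    """Classify an article as high or low quality by raw word count alone."""
    return "high" if len(article_text.split()) >= WORD_THRESHOLD else "low"
```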
Knowing what to believe (when you already know something).
- In COLING, 2010
Cited by 44 (5 self)
Although much work in NLP has focused on simply determining what a document means, we also must know whether or not to believe it. Fact-finding algorithms attempt to identify the "truth" among competing claims in a corpus, but fail to take advantage of the user's prior knowledge and presume that truth itself is universal and objective rather than subjective. We introduce a framework for incorporating prior knowledge into any fact-finding algorithm, expressing both general "common-sense" reasoning and specific facts already known to the user as first-order logic and translating this into a tractable linear program. As our results show, this approach scales well to even large problems, both reducing error and allowing the system to determine truth respective to the user rather than the majority. Additionally, we introduce three new fact-finding algorithms capable of outperforming existing fact-finders in many of our experiments.
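The fact-finding setting this framework builds on can be illustrated with a generic Sums-style iteration, in which source trust and claim belief reinforce each other. The claims, sources, and normalization below are invented, and the paper's prior-knowledge linear program is not shown.

```python
# Hypothetical corpus: claim -> sources asserting it.
claims = {
    "Rome is the capital of Italy": {"s1", "s2", "s3"},
    "Milan is the capital of Italy": {"s4"},
}
sources = {s for ss in claims.values() for s in ss}

trust = {s: 1.0 for s in sources}
belief = {}
for _ in range(10):
    # a claim is believable when trusted sources assert it, and vice versa
    belief = {c: sum(trust[s] for s in ss) for c, ss in claims.items()}
    trust = {s: sum(b for c, b in belief.items() if s in claims[c])
             for s in sources}
    m = max(trust.values())
    trust = {s: t / m for s, t in trust.items()}  # keep trust bounded

best = max(belief, key=belief.get)  # without priors, the majority wins
```

Without prior knowledge the iteration simply sides with the better-supported claim, which is exactly the limitation the paper's framework addresses.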
Can You Ever Trust a Wiki? Impacting Perceived Trustworthiness in Wikipedia
Cited by 36 (3 self)
Wikipedia has become one of the most important information resources on the Web by promoting peer collaboration and enabling virtually anyone to edit anything. However, this mutability also leads many to distrust it as a reliable source of information. Although there have been many attempts at developing metrics to help users judge the trustworthiness of content, it is unknown how much impact such measures can have on a system that is perceived as inherently unstable. Here we examine whether a visualization that exposes hidden article information can impact readers' perceptions of trustworthiness in a wiki environment. Our results suggest that surfacing information relevant to the stability of the article and the patterns of editor behavior can have a significant impact on users' trust across a variety of page types.
Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata
, 2010
Cited by 36 (16 self)
Blatantly unproductive edits undermine the quality of the collaboratively-edited encyclopedia, Wikipedia. They not only disseminate dishonest and offensive content, but force editors to waste time undoing such acts of vandalism. Language-processing has been applied to combat these malicious edits, but as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features effective in mitigating email spam, while being lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with non-offending edits in numerous dimensions. Crucially, none of these features require inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits a second) and has been used to locate over 5,000 manually-confirmed incidents of vandalism outside our labeled set.
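A rough sketch of what metadata-only features might look like; the record fields and the particular features below are assumptions for illustration, not the paper's actual feature set.

```python
from datetime import datetime

def metadata_features(rev: dict) -> dict:
    """Derive vandalism signals from revision metadata alone (no article text)."""
    return {
        "is_anonymous": rev["user"] is None,        # anonymous (IP) editor
        "edit_hour": rev["timestamp"].hour,         # temporal pattern of the edit
        "comment_missing": not rev.get("comment"),  # no edit summary given
        "bytes_changed": abs(rev["new_size"] - rev["old_size"]),
    }

# A hypothetical metadata record for one revision.
rev = {"user": None, "timestamp": datetime(2010, 3, 5, 2, 14),
       "comment": "", "old_size": 5400, "new_size": 120}
feats = metadata_features(rev)
```

Feature vectors like these would then feed a standard classifier trained on rollback-tagged edits.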
On Ranking Controversies in Wikipedia: Models and Evaluation
, 2008
Cited by 34 (2 self)
Wikipedia is a very large and successful Web 2.0 example. As the number of Wikipedia articles and contributors grows at a very fast pace, there are also increasing disputes occurring among the contributors. Disputes often happen in articles with controversial content. They also occur frequently among contributors who are “aggressive” or controversial in their personalities. In this paper, we aim to identify controversial articles in Wikipedia. We propose three models, namely the Basic model and two Controversy Rank (CR) models. These models draw clues from collaboration and edit history instead of interpreting the actual articles or edited content. While the Basic model only considers the amount of disputes within an article, the two Controversy Rank models extend the former by considering the relationships between articles and contributors. We also derived enhanced versions of these models by considering the age of articles. Our experiments on a collection of 19,456 Wikipedia articles show that the Controversy Rank models can more effectively determine controversial articles compared to the Basic and other baseline models.
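The Controversy Rank idea, that articles and contributors lend each other controversy through their disputes, can be sketched as a dispute-weighted mutual reinforcement. The data and update rule below are invented toy assumptions, not the paper's models.

```python
# Hypothetical dispute counts: (article, contributor) -> reverts involving them.
disputes = {
    ("Abortion", "u1"): 5, ("Abortion", "u2"): 4,
    ("Kittens", "u3"): 0, ("Kittens", "u1"): 1,
}
articles = {a for a, _ in disputes}
users = {u for _, u in disputes}

art_c = {a: 1.0 for a in articles}  # article controversy
usr_c = {u: 1.0 for u in users}     # contributor controversy

for _ in range(10):
    art_c = {a: sum(d * usr_c[u] for (a2, u), d in disputes.items() if a2 == a)
             for a in articles}
    usr_c = {u: sum(d * art_c[a] for (a, u2), d in disputes.items() if u2 == u)
             for u in users}
    m = max(art_c.values())
    art_c = {a: c / m for a, c in art_c.items()}  # rescale for stability
    m = max(usr_c.values())
    usr_c = {u: c / m for u, c in usr_c.items()}
```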
PML2: A Modular Explanation Interlingua
- In Proceedings of the AAAI 2007 Workshop on Explanation-aware Computing
Cited by 26 (16 self)
In the past five years, we have designed and evolved an interlingua for sharing explanations generated by various automated systems such as hybrid web-based question answering systems, text analytics, theorem proving, task processing, web services execution, rule engines, and machine learning components. In this paper, we present our recent major updates including: (i) splitting the interlingua into three modules (i.e. provenance, information manipulation or justifications, and trust) to reduce maintenance and reuse costs and to support various modularity requirements; (ii) providing representation primitives capable of representing four critical types of justifications identified in past work. We also discuss some examples of how this work can be and is being used in a variety of distributed application settings.
Wikipedia vandalism detection: Combining natural language, metadata, and reputation features
- In CICLing’11: Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics, LNCS 6609, 2011
Cited by 22 (5 self)
Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith, introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The performance of the resulting joint system improves the state-of-the-art from all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism, and for the task of locating vandalism in the complete set of Wikipedia revisions.
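At its simplest, the integration described above amounts to training one classifier over the union of the three feature spaces. The feature names and values below are invented placeholders, not the systems' real outputs.

```python
# Invented placeholder feature dicts from each subsystem.
stiki_feats = {"is_anonymous": 1.0, "edit_hour": 2.0}    # metadata (STiki)
wikitrust_feats = {"author_reputation": 0.1}             # reputation (WikiTrust)
nlp_feats = {"profanity_ratio": 0.3, "caps_ratio": 0.6}  # language features

# One joint feature vector per edit; a single classifier is trained over it.
joint = {**stiki_feats, **wikitrust_feats, **nlp_feats}
```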
Assessing the quality of Wikipedia articles with lifecycle based metrics
- In Proc. WikiSym, 2009
Cited by 22 (0 self)
The main feature of the free online encyclopedia Wikipedia is the wiki tool, which allows viewers to edit articles directly in the web browser. A weakness of this openness is that manipulation and vandalism cannot be ruled out, so the quality of any given Wikipedia article is not guaranteed. Automatic quality assessment has therefore become a highly active research field. In this paper we offer new metrics for efficient quality measurement. The metrics are based on the lifecycles of low- and high-quality articles, that is, the changes in their persistent and transient contributions throughout the entire life span.
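A minimal illustration of the persistence signal that lifecycle metrics build on: the fraction of a revision's added words that survive into the latest revision. The revision history and the word-overlap measure are simplifying assumptions, not the paper's metrics.

```python
# Hypothetical article history; the second revision adds vandalism that is removed.
revisions = [
    "Cats are mammals.",
    "Cats are mammals. CATS RULE!!!",
    "Cats are mammals. They are popular pets.",
]
final_words = set(revisions[-1].split())

def persistence(added_text: str) -> float:
    """Fraction of the added words still present in the latest revision."""
    words = added_text.split()
    return sum(w in final_words for w in words) / len(words) if words else 1.0

transient = persistence("CATS RULE!!!")        # vandalism: nothing survives
persistent = persistence("Cats are mammals.")  # good content: everything survives
```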