Results 1 - 10 of 99
Computing semantic relatedness using Wikipedia-based explicit semantic analysis
In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007
Cited by 172 (7 self)
Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weighted vector of Wikipedia-based concepts. Assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics (e.g., cosine). Compared with the previous state of the art, using ESA results in substantial improvements in correlation of computed relatedness scores with human judgments: from r = 0.56 to 0.75 for individual words and from r = 0.60 to 0.72 for texts. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
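The abstract describes ESA as mapping a text to a weighted vector of Wikipedia concepts and scoring relatedness with a conventional metric such as cosine. The sketch below illustrates only that comparison step, under loud assumptions: the weights in the hypothetical `CONCEPT_INDEX` are invented toy numbers standing in for the word-to-concept weights ESA derives from Wikipedia, and a text's vector is taken to be the frequency-weighted sum of its words' concept vectors.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy index: word -> {Wikipedia concept: weight}.
# Real ESA derives these weights from Wikipedia articles; these numbers are invented.
CONCEPT_INDEX = {
    "bank": {"Bank": 0.9, "River": 0.3},
    "money": {"Bank": 0.7, "Currency": 0.8},
    "river": {"River": 0.9},
    "water": {"River": 0.6, "Ocean": 0.5},
}

def interpret(text):
    """Map a text to a weighted concept vector by summing its words' vectors."""
    vector = defaultdict(float)
    for word, count in Counter(text.lower().split()).items():
        # Words missing from the index contribute nothing.
        for concept, weight in CONCEPT_INDEX.get(word, {}).items():
            vector[concept] += count * weight
    return dict(vector)

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def relatedness(text_a, text_b):
    """Relatedness of two texts as the cosine of their concept vectors."""
    return cosine(interpret(text_a), interpret(text_b))
```

With these toy weights, "river" and "water" share the `River` concept and score higher than "money" and "water", which share no concept and score 0.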
Designing Games With A Purpose
2008
Cited by 157 (1 self)
Data generated as a side effect of game play also solves computational problems and trains AI algorithms.
A Content-Driven Reputation System for the Wikipedia
Cited by 66 (7 self)
On-line forums for the collaborative creation of bodies of information are a phenomenon of rising importance; the Wikipedia is one of the best-known examples. The open nature of such forums could benefit from a notion of reputation for its authors. Author reputation could be used to flag new contributions from low-reputation authors, and it could be used to allow only authors with good reputation to contribute to controversial or critical pages. A reputation system for the Wikipedia would also provide an incentive to give high-quality contributions. We present in this paper a novel type of content-driven reputation system for Wikipedia authors. In our system, authors gain reputation when the edits and text additions they perform to Wikipedia articles are long-lived, and they lose reputation when their changes are undone in short order. We have implemented the proposed system, and we have used it to analyze the entire Italian and French Wikipedias, consisting of a total of 691,551 pages and 5,587,523 revisions. Our results show that our notion of reputation has good predictive value: changes performed by low-reputation authors have a significantly larger than average probability of having poor quality, and of being undone.
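A minimal sketch of the content-driven idea for a single article, assuming a simplified rule that is not the paper's actual algorithm: the words an author adds are checked a fixed number of revisions later (the hypothetical `horizon`), and the author gains reputation if at least `keep_ratio` of them survive and loses it otherwise. The parameters `gain` and `penalty`, and treating each revision as a set of words, are illustrative assumptions.

```python
from collections import defaultdict

def reputation(revisions, horizon=2, gain=1.0, penalty=1.0, keep_ratio=0.5):
    """Score authors from one article's chronological list of (author, text).

    Hypothetical rule: words an author adds are long-lived if at least
    `keep_ratio` of them still appear `horizon` revisions later.
    """
    rep = defaultdict(float)
    for i in range(1, len(revisions)):
        author, text = revisions[i]
        # Words present now that were absent in the previous revision.
        added = set(text.split()) - set(revisions[i - 1][1].split())
        later = i + horizon
        if not added or later >= len(revisions):
            continue  # nothing added, or not enough later history to judge
        survivors = added & set(revisions[later][1].split())
        if len(survivors) / len(added) >= keep_ratio:
            rep[author] += gain      # contribution was long-lived
        else:
            rep[author] -= penalty   # contribution was undone in short order
    return dict(rep)
```

In a history where one author adds text that stays and another adds a word that is reverted soon after, the first ends with positive and the second with negative reputation. A fuller system would also track edits that rearrange or delete text, which this word-set sketch ignores.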
Wikify!: linking documents to encyclopedic knowledge
In CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, 2007
Cited by 57 (3 self)
This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system show that the automatic annotations are reliable and hardly distinguishable from manual annotations. ... providing the users a quick way of accessing additional information. Wikipedia contributors perform these annotations by hand following a Wikipedia “manual of style,” which gives guidelines concerning the selection of important concepts in a text, as well as the assignment of links to appropriate related articles. For instance, Figure 1 shows an example of a Wikipedia page, including the definition for one of the meanings of the word “plant.”
He says, she says: conflict and coordination in wikipedia
In Proc. SIGCHI Conf. Human factors in computing systems, 2007
Cited by 37 (5 self)
Wikipedia, a wiki-based encyclopedia, has become one of the most successful experiments in collaborative knowledge building on the Internet. As Wikipedia continues to grow, the potential for conflict and the need for coordination increase as well. This article examines the growth of such non-direct work and describes the development of tools to characterize conflict and coordination costs in Wikipedia. The results may inform the design of new collaborative knowledge systems.
Author Keywords: Wikipedia, wiki, collaboration, conflict, user model, Web-based interaction, visualization.
Harnessing the Wisdom of Crowds in Wikipedia: Quality Through Coordination
Cited by 32 (6 self)
Wikipedia’s success is often attributed to involving large numbers of contributors who improve the accuracy, completeness and clarity of articles while reducing bias. However, because of the high coordination needed to collaboratively write an article, increasing the number of contributors is costly. We examined how the number of editors in Wikipedia and the coordination methods they use affect article quality. We distinguish between explicit coordination, in which editors plan the article through communication, and implicit coordination, in which a subset of editors set direction by doing the majority of the work. Adding more editors to an article improved article quality only when they used appropriate coordination techniques and was harmful when they did not. Implicit coordination through concentrating the work was more helpful when many editors contributed, but explicit coordination through communication was not. Both types of coordination improved quality more when an article was in a formative stage. These results demonstrate the critical importance of coordination in effectively harnessing the “wisdom of the crowd” in online production environments.
Temporal Analysis of the Wikigraph
In Proc. of Web Intelligence, Hong Kong, 2006
Cited by 23 (0 self)
Wikipedia (www.wikipedia.org) is an online encyclopedia, available in more than 100 languages and comprising over 1 million articles in its English version. If we consider each Wikipedia article as a node and each hyperlink between articles as an arc, we have a “Wikigraph”, a graph that represents the link structure of Wikipedia. The Wikigraph differs from other Web graphs studied in the literature in that timestamps are associated with each node. The timestamps indicate the creation and update dates of each page, and this allows us to do a detailed analysis of Wikipedia's evolution over time. In the first part of this study we characterize this evolution in terms of users, editions and articles; in the second part, we depict the temporal evolution of several topological properties of the Wikigraph. The insights obtained from the Wikigraphs can be applied to large Web graphs for which temporal data is usually not available.
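The Wikigraph model in this abstract (articles as nodes, hyperlinks as arcs, timestamps on nodes) lends itself to a sequence of dated snapshots. The sketch below is an illustration under assumptions: only node creation dates are used, and an arc is counted once both of its endpoint articles exist, since the abstract does not say whether links carry timestamps of their own. `snapshot` and `stats_over_time` are hypothetical helper names.

```python
from datetime import date

def snapshot(created, links, when):
    """Nodes and arcs of the Wikigraph as of `when`.

    created: dict article -> creation date; links: iterable of (src, dst).
    Assumption: an arc exists once both endpoint articles have been created.
    """
    nodes = {article for article, day in created.items() if day <= when}
    arcs = {(src, dst) for src, dst in links if src in nodes and dst in nodes}
    return nodes, arcs

def stats_over_time(created, links, dates):
    """One row per date: (date, node count, arc count, mean out-degree)."""
    rows = []
    for when in dates:
        nodes, arcs = snapshot(created, links, when)
        mean_out_degree = len(arcs) / len(nodes) if nodes else 0.0
        rows.append((when, len(nodes), len(arcs), mean_out_degree))
    return rows
```

Tracking simple topological properties per snapshot in this way is one concrete reading of "temporal evolution of several topological properties"; richer measures (connected components, degree distributions) would be computed on the same snapshots.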
Information extraction from Wikipedia: Moving down the long tail
Proceedings of KDD08, 2008
Cited by 21 (7 self)
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper presents three novel techniques for increasing recall from Wikipedia’s long tail of sparse classes: (1) shrinkage over an automatically-learned subsumption taxonomy, (2) a retraining technique for improving the training data, and (3) supplementing results by extracting from the broader Web. Our experiments compare design variations and show that, used in concert, these techniques increase recall by a factor of 1.76 to 8.71 while maintaining or increasing precision.
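The abstract names shrinkage over a subsumption taxonomy but does not say how it is computed. As a generic illustration of shrinkage only, not the paper's procedure, the sketch below pulls a sparse class's estimate of how often an infobox attribute appears toward its parent class's estimate, in the style of an m-estimate. The class names, counts, and prior strength `m` are hypothetical.

```python
def shrunk_rate(cls, counts, parent, m=5.0):
    """Shrinkage estimate of how often an attribute appears for class `cls`.

    counts: class -> (articles_with_attribute, total_articles)
    parent: class -> parent class in a subsumption taxonomy (root -> None)
    m: hypothetical prior strength; classes with few articles lean on ancestors.
    """
    hits, total = counts.get(cls, (0, 0))
    up = parent.get(cls)
    if up is None:
        # Root of the taxonomy: use the raw rate (0.0 if no data at all).
        return hits / total if total else 0.0
    prior = shrunk_rate(up, counts, parent, m)
    # Blend the class's own evidence with the parent's (already shrunk) rate.
    return (hits + m * prior) / (total + m)
```

For example, a `performer` class seen in only two articles gets an estimate pulled toward the better-populated `person` class instead of relying on its two examples alone; a class with no data inherits its parent's estimate.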
The Transformation of the Web: How Emerging Communities Shape the Information we Consume
Journal of Universal Computer Science, 2006
Cited by 19 (0 self)
To date, one of the main aims of the World Wide Web has been to provide users with information. In addition to private homepages, large professional information providers, including news services, companies, and other organisations have set up web-sites. With the development and advance of recent technologies such as wikis, blogs, podcasting and file sharing this model is challenged and community-driven services are gaining influence rapidly. These new paradigms obliterate the clear distinction between information providers and consumers. The lines between producers and consumers are blurred even more by services such as Wikipedia, where every reader can become an author, instantly. This paper presents an overview of a broad selection of current technologies and services: blogs, wikis including Wikipedia and Wikinews, social networks such as Friendster and Orkut as well as related social services like del.icio.us, file sharing tools such as Flickr, and podcasting. These services enable user participation on the Web and manage to recruit a large number of users as authors of new content. It is argued that the transformations the Web is subject to are not driven by new technologies but by a fundamental mind shift that encourages individuals to take part in developing new structures and content. The evolving services and technologies encourage ordinary users to make their knowledge explicit and help a collective intelligence to develop.
A semantic wiki for mathematical knowledge management
Proceedings of the 1st Workshop on Semantic Wikis, European Semantic Web Conference 2006, Budva, Montenegro, 2006. CEUR Workshop Proceedings. To appear, provisional online version at http://www.eswc2006.org/technologies/usb/proceedings-workshops/eswc200, 2007
Cited by 19 (6 self)
SWIM is a semantic wiki for collaboratively building, editing and browsing mathematical knowledge represented in the structural markup language OMDOC. It has been designed to enable groups of scientists to develop new mathematical theories in OMDOC and to enable scholars to browse such a corpus. After a short introduction to semantic wikis and their usefulness for mathematical knowledge, this article presents the architecture and the user interface of the current SWIM prototype and outlines the plans for developing its successor, an ontology-based platform for semantic scientific services that exploit the knowledge and make it accessible to the user.

