Results 1 - 10 of 193
Linked Data -- The story so far
Cited by 136 (7 self)
The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article we present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. We describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
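To make "publishing and connecting structured data" concrete, here is a minimal sketch that writes two RDF statements in the N-Triples serialization: a human-readable label for a resource and an `owl:sameAs` link to another dataset's identifier for the same thing. The `example.org` URI is a hypothetical local identifier and the helper functions are illustrative only; the RDFS and OWL property URIs are the standard vocabulary terms.

```python
# Standard vocabulary URIs (RDFS and OWL).
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def escape_literal(text: str) -> str:
    """Escape characters that N-Triples does not allow raw inside a string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def uri_triple(subject: str, predicate: str, obj: str) -> str:
    """A statement whose object is another resource, i.e. an RDF link."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def literal_triple(subject: str, predicate: str, value: str, lang: str = "") -> str:
    """A statement whose object is an (optionally language-tagged) string."""
    tag = f"@{lang}" if lang else ""
    return f'<{subject}> <{predicate}> "{escape_literal(value)}"{tag} .'


# Hypothetical local identifier, linked to DBpedia's identifier for the same city.
local = "http://example.org/resource/Berlin"
document = "\n".join([
    literal_triple(local, RDFS_LABEL, "Berlin", lang="en"),
    uri_triple(local, OWL_SAME_AS, "http://dbpedia.org/resource/Berlin"),
])
print(document)
```

Only the second statement connects two datasets; following its object URI is what lets a crawler or browser move from one data source to another.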
DBpedia -- A Crystallization Point for the Web of Data
, 2009
Cited by 70 (11 self)
The DBpedia project is a community effort to extract structured information from Wikipedia and to make this information accessible on the Web. The resulting DBpedia knowledge base currently describes over 2.6 million entities. For each of these entities, DBpedia defines a globally unique identifier that can be dereferenced over the Web into a rich RDF description of the entity, including human-readable definitions in 30 languages, relationships to other resources, classifications in four concept hierarchies, various facts as well as data-level links to other Web data sources describing the entity. Over the last year, an increasing number of data publishers have begun to set data-level links to DBpedia resources, making DBpedia a central interlinking hub for the emerging Web of data. Currently, the Web of interlinked data sources around DBpedia provides approximately 4.7 billion pieces of information and covers domains such as geographic information, people, companies, films, music, genes, drugs, books, and scientific publications. This article describes the extraction of the DBpedia knowledge base, the current status of interlinking DBpedia with other data sources on the Web, and gives an overview of applications that facilitate the Web of Data around DBpedia.
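The sketch below, under stated assumptions, shows two steps behind the dereferenceable identifiers described above: deriving a resource URI from a Wikipedia article title, and answering a lookup of that URI with an HTTP 303 redirect to either an RDF description or an HTML page, depending on the `Accept` header. It assumes that only spaces need rewriting (real titles may also require percent-encoding) and it ignores `q`-values; the `/data/` and `/page/` paths follow the layout DBpedia has used but should be checked against the live service.

```python
from dataclasses import dataclass

DBPEDIA = "http://dbpedia.org"
# Media types treated as requests for RDF; q-values are ignored in this sketch.
RDF_MEDIA_TYPES = {"application/rdf+xml", "text/turtle", "application/n-triples"}


@dataclass(frozen=True)
class Redirect:
    status: int
    location: str


def resource_uri(wikipedia_title: str) -> str:
    """Derive an identifier from an article title (assumption: only spaces change)."""
    return f"{DBPEDIA}/resource/{wikipedia_title.strip().replace(' ', '_')}"


def dereference(uri: str, accept: str) -> Redirect:
    """Answer a lookup of a resource URI with a 303 See Other redirect.

    The URI names the entity itself, so the server points the client at a
    document about it: RDF for machines, HTML for browsers.
    """
    name = uri.split("/resource/", 1)[1]
    requested = {part.split(";")[0].strip().lower() for part in accept.split(",")}
    if requested & RDF_MEDIA_TYPES:
        return Redirect(303, f"{DBPEDIA}/data/{name}")
    return Redirect(303, f"{DBPEDIA}/page/{name}")


uri = resource_uri("Tim Berners-Lee")
print(uri)
print(dereference(uri, "text/turtle;q=0.9, */*;q=0.1"))
print(dereference(uri, "text/html,application/xhtml+xml"))
```

A production server would rely on a full content-negotiation implementation that honours quality values rather than this set intersection.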
Learning to link with Wikipedia
, 2008
Cited by 66 (5 self)
This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify significant terms within unstructured text, and enrich it with links to the appropriate Wikipedia articles. The resulting link detector and disambiguator performs very well, with recall and precision of almost 75%. This performance is constant whether the system is evaluated on Wikipedia articles or “real-world” documents. This work has implications far beyond enriching documents with explanatory links. It can provide structured knowledge about any unstructured fragment of text. Any task that is currently addressed with bags of words (indexing, clustering, retrieval, and summarization, to name a few) could use the techniques described here to draw on a vast network of concepts and semantics.
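One statistic of the kind such a learned link detector can use as a feature is *link probability*: how often a phrase appears as anchor text in Wikipedia relative to how often it appears at all. A related statistic, *commonness*, ranks the candidate target articles for an ambiguous anchor. The sketch below computes both from hypothetical counts; it is an illustration of the features, not the paper's trained classifier, and the phrase and numbers are invented.

```python
from collections import Counter


def link_probability(times_as_anchor: int, times_seen: int) -> float:
    """Fraction of a phrase's occurrences in which editors made it a link."""
    return times_as_anchor / times_seen if times_seen else 0.0


def commonness(link_targets: Counter) -> dict[str, float]:
    """Share of the phrase's links that point to each candidate article."""
    total = sum(link_targets.values())
    return {article: n / total for article, n in link_targets.items()}


# Hypothetical statistics for the phrase "tree", as if mined from a Wikipedia dump.
seen, as_anchor = 1000, 120
targets = Counter({"Tree": 90, "Tree (data structure)": 25, "Tree (graph theory)": 5})

lp = link_probability(as_anchor, seen)
senses = commonness(targets)
most_common_sense = max(senses, key=senses.get)
print(f"link probability = {lp:.2f}, most common sense = {most_common_sense}")
```

Picking the most common sense is only a baseline; a trained disambiguator would combine such features with context, for example how related each candidate article is to other terms in the same document.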
Automatically Refining the Wikipedia Infobox Ontology
, 2008
Cited by 43 (7 self)
The combined efforts of human volunteers have recently extracted numerous facts from Wikipedia, storing them as machine-harvestable object-attribute-value triples in Wikipedia infoboxes. Machine learning systems, such as Kylin, use these infoboxes as training data, accurately extracting even more semantic knowledge from natural language text. But in order to realize the full power of this information, it must be situated in a cleanly structured ontology. This paper introduces KOG, an autonomous system for refining Wikipedia’s infobox-class ontology towards this end. We cast the problem of ontology refinement as a machine learning problem and solve it using both SVMs and a more powerful joint-inference approach expressed in Markov Logic Networks. We present experiments demonstrating the superiority of the joint-inference approach and evaluating other aspects of our system. Using these techniques, we build a rich ontology, integrating Wikipedia’s infobox-class schemata with WordNet. We demonstrate how the resulting ontology may be used to enhance Wikipedia with improved query processing and other features.
Yago: A Large Ontology from Wikipedia and WordNet
, 2007
Cited by 43 (11 self)
This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic Is-A hierarchy as well as semantic relations between entities. The facts for YAGO have been extracted from the category system and the infoboxes of Wikipedia and have been combined with taxonomic relations from WordNet. Type checking techniques help us keep YAGO’s precision at 95%, as confirmed by an extensive evaluation study. YAGO is based on a clean logical model whose consistency is decidable. Furthermore, it allows representing n-ary relations in a natural way while maintaining compatibility with RDFS. A powerful query model facilitates access to YAGO’s data.
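Type checking of the kind mentioned above can be sketched as a domain/range test over a class hierarchy: a candidate fact is kept only if its subject and object belong, directly or through superclasses, to the classes the relation expects. The taxonomy, entity types, and the `bornIn` signature below are hypothetical toy data, and this is a simplified illustration rather than YAGO's actual procedure.

```python
from typing import Iterator, Optional

# Hypothetical miniature taxonomy: class -> direct superclass.
SUBCLASS_OF = {"city": "location", "location": "entity",
               "singer": "person", "person": "entity"}
# Hypothetical entity types, as might be derived from Wikipedia categories.
TYPES = {"Elvis_Presley": {"singer"}, "Tupelo": {"city"}}
# Hypothetical relation signatures: (domain, range).
SIGNATURES = {"bornIn": ("person", "location")}


def superclasses(cls: Optional[str]) -> Iterator[str]:
    """Yield cls and every class above it in the taxonomy."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def has_type(entity: str, cls: str) -> bool:
    """True if any of the entity's types is cls or a subclass of cls."""
    return any(cls in superclasses(t) for t in TYPES.get(entity, ()))


def type_check(subject: str, relation: str, obj: str) -> bool:
    """Accept a candidate fact only if both arguments fit the relation's signature."""
    domain, range_ = SIGNATURES[relation]
    return has_type(subject, domain) and has_type(obj, range_)


print(type_check("Elvis_Presley", "bornIn", "Tupelo"))  # plausible candidate
print(type_check("Tupelo", "bornIn", "Elvis_Presley"))  # arguments swapped
```

A check like this cannot confirm a fact is true, only that it is not nonsensical, which is why it helps precision without guaranteeing it.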
Meaning Of A Tag: A collaborative approach to bridge the gap between tagging and Linked Data (Laublet P.)
- Proceedings of the Linked Data on the Web (LDOW2008) workshop at WWW2008
, 2008
Cited by 31 (3 self)
This paper introduces MOAT, a lightweight Semantic Web framework that provides a collaborative way for Web 2.0 content producers to give meanings to their tags in a machine-readable way. To achieve this goal, the approach relies on Linked Data principles, using URIs of existing resources to define these meanings. That way, users can create interlinked RDF data and let their content enter the Semantic Web, while addressing some limitations of free tagging at the same time.
Automatic Interlinking of Music Datasets on the Semantic Web
Cited by 23 (3 self)
In this paper, we describe current efforts towards interlinking music-related datasets on the Web. We first explain some initial interlinking experiences, and the poor results obtained by taking a naïve approach. We then detail a particular interlinking algorithm, taking into account both the similarities of web resources and of their neighbours. We detail the application of this algorithm in two contexts: to link a Creative Commons music dataset to an editorial one, and to link a personal music collection to corresponding web identifiers. The latter provides a user with personally meaningful entry points for exploring the web of data, and we conclude by describing some concrete tools built to generate and use such links.
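Why neighbours matter can be shown with a deliberately simplified score: two artists with the same name are indistinguishable by label alone (the naive approach), but their neighbours in the graph, such as their records, differ. The sketch below blends label similarity with a best-match similarity over neighbour labels. It is not the paper's algorithm; `difflib.SequenceMatcher` as the string measure, the weight `alpha`, and all data are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# A resource is modelled as (label, labels of its neighbours in the graph).
Resource = tuple[str, list[str]]


def label_sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def neighbour_sim(mine: list[str], theirs: list[str]) -> float:
    """Mean, over my neighbours, of the best-matching neighbour on the other side."""
    if not mine or not theirs:
        return 0.0
    return sum(max(label_sim(x, y) for y in theirs) for x in mine) / len(mine)


def score(a: Resource, b: Resource, alpha: float = 0.5) -> float:
    """Blend the resources' own similarity with that of their neighbourhoods."""
    return alpha * label_sim(a[0], b[0]) + (1 - alpha) * neighbour_sim(a[1], b[1])


# Hypothetical data: one artist in a personal collection, two same-named candidates.
local: Resource = ("Madonna", ["Like a Prayer", "Ray of Light"])
candidates: dict[str, Resource] = {
    "artist/1": ("Madonna", ["Ray of Light", "Like a Prayer", "Music"]),
    "artist/2": ("Madonna", ["Chapel Hill Sessions"]),
}
best = max(candidates, key=lambda key: score(local, candidates[key]))
print(best)
```

An iterative version would feed the neighbours' own match scores back into each resource's score until the assignments stabilise, rather than comparing neighbour labels once.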
Triplify -- Light-Weight Linked Data Publication from Relational Databases
, 2009
Cited by 23 (6 self)
In this paper we present Triplify, a simple but effective approach to publishing Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relational database queries. Triplify transforms the resulting relations into RDF statements and publishes the data on the Web in various RDF serializations, in particular as Linked Data. The rationale for developing Triplify is that most of the information on the Web is already stored in structured form, often in relational databases, but is usually published by Web applications only as HTML that mixes structure, layout and content. In order to reveal the pure structured information behind the current Web, we have implemented Triplify as a light-weight software component, which can be easily integrated into and deployed by the numerous, widely installed Web applications. Our approach includes a method for publishing update logs to enable incremental crawling of linked data sources. Triplify is complemented by a library of configurations for common relational schemata and a REST-enabled data source registry. Triplify configurations containing mappings are provided for many popular Web applications, including osCommerce, WordPress, Drupal, Gallery, and phpBB. We will show that despite its light-weight architecture Triplify can be used to publish very large datasets, such as 160GB of geo data from the OpenStreetMap project.
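The core idea of mapping an HTTP request path onto a database query and turning result rows into RDF statements can be sketched with Python's built-in `sqlite3`. Triplify itself is a separate tool with its own configuration format; here the `QUERIES` mapping, the base URI, the convention that the first column identifies the row while every other column name becomes a property, and the `posts` table are all hypothetical.

```python
import sqlite3

BASE = "http://example.org/triplify/"  # hypothetical base URI

# Hypothetical mapping: URL path segment -> SQL query whose first column is the row id.
QUERIES = {"post": "SELECT id, title, author FROM posts"}


def escape(value: object) -> str:
    """Render a column value as the body of an N-Triples string literal."""
    return (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def triplify(conn: sqlite3.Connection, path: str) -> list[str]:
    """Map a request path such as 'post' or 'post/2' onto SQL and emit N-Triples."""
    parts = path.strip("/").split("/")
    sql, params = QUERIES[parts[0]], ()
    if len(parts) > 1:
        # Restrict the mapped query to one row; assumes integer ids.
        sql, params = f"SELECT * FROM ({sql}) WHERE id = ?", (int(parts[1]),)
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        subject = f"<{BASE}{parts[0]}/{row[0]}>"
        # Every remaining column becomes a property named after the column.
        for name, value in zip(columns[1:], row[1:]):
            if value is not None:
                triples.append(f'{subject} <{BASE}vocab/{name}> "{escape(value)}" .')
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(1, "Hello", "alice"), (2, "Linked Data", "bob")])
print("\n".join(triplify(conn, "post/2")))
```

Keeping the mapping as plain SQL is what makes this style of publishing light-weight: supporting another application means writing new queries, not new code.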
SOFIE: A Self-Organizing Framework for Information Extraction
- WWW 2009 Madrid, Track: Semantic/Data Web, Session: Linked Data
, 2009
Cited by 22 (5 self)
This paper presents SOFIE, a system for automated ontology extension. SOFIE can parse natural language documents, extract ontological facts from them and link the facts into an ontology. SOFIE uses logical reasoning on the existing knowledge and on the new knowledge in order to disambiguate words to their most probable meaning, to reason on the meaning of text patterns and to take into account world knowledge axioms. This allows SOFIE to check the plausibility of hypotheses and to avoid inconsistencies with the ontology. The framework of SOFIE unites the paradigms of pattern matching, word sense disambiguation and ontological reasoning in one unified model. Our experiments show that SOFIE delivers high-quality output, even from unstructured Internet documents.
LinkedGeoData -- Adding a Spatial Dimension to the Web of Data
- Submitted to International Semantic Web Conference 2009
, 2009
Cited by 16 (3 self)
In order to employ the Web as a medium for data and information integration, comprehensive datasets and vocabularies are required as they enable the disambiguation and alignment of other data and information. Many real-life information integration and aggregation tasks are impossible without comprehensive background knowledge related to spatial features of the ways, structures and landscapes surrounding us. In this paper we contribute to the generation of a spatial dimension for the Data Web by elaborating on how the collaboratively collected OpenStreetMap data can be transformed and represented according to the RDF data model, how this data can be interlinked with other spatial data sets, and how it can be made accessible to machines according to the Linked Data paradigm and to humans by means of a faceted geo-data browser.
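As a rough illustration of the transformation step, the sketch below describes one OpenStreetMap node (a point with free-form key/value tags) as N-Triples, using the W3C WGS84 vocabulary for coordinates and turning each tag into a property. The node and property namespaces, the node id, and the tag data are hypothetical; LinkedGeoData's actual URI scheme and vocabulary may differ.

```python
from urllib.parse import quote

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # W3C WGS84 vocabulary
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
NODE_NS = "http://example.org/lgd/node/"          # hypothetical namespace
PROPERTY_NS = "http://example.org/lgd/property/"  # hypothetical namespace


def escape(text: str) -> str:
    """Escape characters that N-Triples does not allow raw inside a string literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def node_to_ntriples(node_id: int, lat: float, lon: float,
                     tags: dict[str, str]) -> list[str]:
    """Describe one OpenStreetMap node as N-Triples."""
    subject = f"<{NODE_NS}{node_id}>"
    # Seven decimal places, roughly the precision OSM stores coordinates with.
    triples = [
        f'{subject} <{GEO}lat> "{lat:.7f}"^^<{XSD_DECIMAL}> .',
        f'{subject} <{GEO}long> "{lon:.7f}"^^<{XSD_DECIMAL}> .',
    ]
    for key, value in sorted(tags.items()):
        # Tag keys such as "addr:street" are percent-encoded to form a valid URI.
        triples.append(f'{subject} <{PROPERTY_NS}{quote(key)}> "{escape(value)}" .')
    return triples


# Hypothetical node id with approximate coordinates and example tags.
for line in node_to_ntriples(1, 52.5163, 13.3777,
                             {"name": "Brandenburger Tor", "tourism": "attraction"}):
    print(line)
```

Because OSM tags are open-ended, a real mapping would likely map frequent keys to curated properties and classes, and only fall back to generated properties like these for the long tail.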

