Executing SPARQL queries over the web of linked data
, 2009
Cited by 22 (2 self)
Abstract. The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the remaining challenges.
Using web data provenance for quality assessment
- In: Proc. of the Workshop on Semantic Web and Provenance Management at ISWC
, 2009
Cited by 12 (3 self)
Abstract. The Web of Data cannot be a trustworthy data source unless an approach for evaluating the quality of data on the Web is established and integrated as part of the data publication and access process. In this paper, we propose an approach that uses provenance information about the data on the Web to assess their quality and trustworthiness. Our contributions include a model for Web data provenance and an assessment method that can be adapted for specific quality criteria. We demonstrate how this method can be used to evaluate the timeliness of data on the Web, that is, how up-to-date the data is. We also propose a possible solution to deal with missing provenance information by associating certainty values with calculated quality values.
Publishing and Consuming Provenance Metadata on the Web of Linked Data
- In: Proc. of 3rd Int. Provenance and Annotation Workshop
, 2010
Cited by 7 (3 self)
Abstract. The World Wide Web evolves into a Web of Data, a huge, globally distributed dataspace that contains a rich body of machine-processable information from a virtually unbounded set of providers covering a wide range of topics. However, due to the openness of the Web, little is known about who created the data and how. The fact that a large amount of the data on the Web is derived by replication, query processing, modification, or merging raises concerns about information quality. Poor quality data may propagate quickly and contaminate the Web of Data. Provenance information about who created and published the data, and how, provides the means for quality assessment. This paper takes a first step towards creating a quality-aware Web of Data: we present approaches to integrate provenance information into the Web of Data and we illustrate how this information can be consumed. In particular, we introduce a vocabulary to describe provenance of Web data as metadata and we discuss possibilities to make such provenance metadata accessible as part of the Web of Data. Furthermore, we describe how this metadata can be queried and consumed to identify outdated information.
Linked Data is Merely More Data
- In: AAAI Spring Symposium “Linked Data Meets Artificial Intelligence”, AAAI
, 2010
Cited by 6 (1 self)
In this position paper, we argue that the Linked Open Data (LoD) Cloud, in its current form, is only of limited value for furthering the Semantic Web vision. Being merely a weakly linked “triple collection,” it will only be of very limited benefit for the AI or Semantic Web communities. We describe the corresponding problems with the LoD Cloud and give directions for research to remedy the situation.
Where We Are
The recent emergence of the “Linked Data” approach for publishing data represents a major step forward in realizing Berners-Lee, Hendler and Lassila’s original vision of a web that can “understand and satisfy the requests of people and machines to use the web content”, i.e. the Semantic Web (Berners-Lee et al. 2001). This new approach has resulted in the Linked Open Data (LoD) Cloud (Bizer et al. 2007), which includes more than 70 large datasets contributed by experts belonging to diverse communities such as geography, entertainment, and life sciences (Bizer, Heath, and Berners-Lee 2009). Table 1 lists some of the datasets available as a part of LoD Cloud. The interlinking of these diverse datasets promises a “Web of Data” that will enable users to easily navigate between these datasets in a manner analogous to how users currently navigate from one webpage to another in the “Web of Documents.” Moreover, the LoD Cloud can significantly benefit both the AI and Semantic Web communities by enabling new classes of applications and enhancing existing tasks such as querying, reasoning, and knowledge discovery. To exemplify, a scientist interested in exploring the relationship between the presence of the spider “Agelenopsis emertoni” and weather patterns can do so easily with the help of the LoD Cloud, as the Geospecies dataset gives information about the spider “Agelenopsis emertoni,” and the interlinking of Geospecies with Geonames makes it easy to
Toward the Next Wave of Services: Linked Services for the Web of Data
Cited by 6 (4 self)
Abstract: It has often been argued that Web services would have a tremendous impact on the Web, as a core enabling technology supporting a highly efficient service-based economy at a global scale. However, despite the outstanding progress in the area, we have yet to witness the application of Web services in any significant numbers on the Web. In this paper, we analyse the state of the art, highlighting the main reasons we believe have hampered their uptake. Based on this analysis, we further discuss current trends and developments within other fields such as the Semantic Web and Web 2.0 and argue that the recent evolution provides the missing ingredients that will lead to a new wave of services, Linked Services, that will ultimately witness a significant uptake on a Web scale. Throughout the presentation of this vision we outline the main principles that should underpin the development of Linked Services and we illustrate how they can be implemented using a number of technologies and tools we have developed and are in the process of extending.
Searching RDF Graphs with SPARQL and Keywords
Cited by 5 (2 self)
The proliferation of knowledge-sharing communities like Wikipedia and the advances in automated information extraction from Web pages enable the construction of large knowledge bases with facts about entities and their relationships. The facts can be represented in the RDF data model, as so-called subject-property-object triples, and can thus be queried by structured query languages like SPARQL. In principle, this allows precise querying in the database spirit. However, RDF data may be highly diverse and queries may return far too many results, so that ranking by informativeness measures is crucial to avoid overwhelming users. Moreover, as facts are extracted from textual contexts or have community-provided annotations, it can be beneficial to also consider keywords for formulating search requests. This paper gives an overview of recent and ongoing work on ranked retrieval of RDF data with keyword-augmented structured queries. The ranking method is based on statistical language models, the state-of-the-art paradigm in information retrieval. The paper develops a novel form of language models for the structured, but schema-less setting of RDF triples and extended SPARQL queries.
DBpedia Spotlight: Shedding Light on the Web of Documents
- In Proceedings of the 7th International Conference on Semantic Systems (I-Semantics)
, 2011
Cited by 5 (0 self)
Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.
An agenda for the next generation gazetteer: Geographic information contribution and retrieval
- In International Conference on Advances in Geographic Information Systems 2009 (ACM SIGSPATIAL GIS 2009)
, 2009
Cited by 4 (4 self)
Gazetteers are key components of georeferenced information systems, including applications such as Web-based mapping services. Existing gazetteers lack the capabilities to fully integrate user-contributed and vernacular geographic information, as well as to support complex queries. To address these issues, a next generation gazetteer should leverage formal semantics, harvesting of implicit geographic information – such as geotagged photos – as well as models of trust for contributors. In this paper, we discuss these requirements in detail. We elucidate how existing standards can be integrated to realize a gazetteer infrastructure allowing for bottom-up contribution as well as information exchange between different gazetteers. We show how to ensure the quality of user-contributed information and demonstrate how to improve querying and navigation using semantics-based information retrieval.
DSNotify: Handling Broken Links in the Web of Data
- In 19th International WWW Conference (WWW2010)
Cited by 4 (2 self)
The Web of Data has emerged as a way of exposing structured linked data on the Web. It builds on the central building blocks of the Web (URIs, HTTP) and benefits from its simplicity and widespread adoption. It does, however, also inherit unresolved issues such as the broken link problem. Broken links constitute a major challenge for actors consuming Linked Data as they require them to deal with reduced accessibility of data. We believe that the broken link problem is a major threat to the whole Web of Data idea and that both Linked Data consumers and providers will require solutions that deal with this problem. Since no general solutions for fixing such links in the Web of Data have emerged, we make three contributions in this direction: first, we provide a concise definition of the broken link problem and a comprehensive analysis of existing approaches. Second, we present DSNotify, a generic framework able to assist human and machine actors in fixing broken links. It uses heuristic feature comparison and employs a time-interval-based blocking technique for the underlying instance matching problem. Third, we derived benchmark datasets from knowledge bases such as DBpedia and evaluated the effectiveness of our approach with respect to the broken link problem. Our results show the feasibility of a time-interval-based blocking approach for systems that aim at detecting and fixing broken links in the Web of Data.

