Results 1 - 10 of 130
Linked Data – The story so far
"... The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertion ..."
Abstract
-
Cited by 739 (15 self)
- Add to MetaCart
The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions – the Web of Data. In this article we present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. We describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
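The "technical principles" mentioned here boil down to naming things with HTTP URIs that, when dereferenced, return useful RDF descriptions. A minimal sketch of that dereferencing step, using only the Python standard library (the DBpedia URI is purely an illustrative example):

```python
# Minimal sketch of Linked Data dereferencing: ask an HTTP URI for a
# machine-readable (RDF) representation via content negotiation.
# The URI below is illustrative; any Linked Data URI works the same way.
import urllib.request

def dereference(uri, rdf_format="text/turtle"):
    """Fetch the RDF representation that a Linked Data URI serves."""
    req = urllib.request.Request(uri, headers={"Accept": rdf_format})
    with urllib.request.urlopen(req) as resp:   # follows 303 redirects
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    turtle = dereference("http://dbpedia.org/resource/Berlin")
    print(turtle[:500])                         # first few triples about the resource
```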
Executing SPARQL queries over the web of linked data
, 2009
"... Abstract. The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research ..."
Abstract
-
Cited by 104 (10 self)
- Add to MetaCart
(Show Context)
The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.
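A rough sketch of the link-traversal idea described in this abstract, heavily simplified from the paper's iterator-based pipeline (rdflib is assumed as the RDF library; all names are illustrative):

```python
# Rough sketch of link-traversal query execution (a simplification of the
# paper's iterator-based pipeline). URIs appearing in the query and in
# intermediate solutions are dereferenced over HTTP, the retrieved RDF is
# added to a local graph, and later triple patterns are evaluated against
# that growing graph. rdflib is assumed; all names are illustrative.
from rdflib import Graph, URIRef, Variable

def link_traversal_eval(patterns, seed_uris, max_lookups=50):
    graph, seen = Graph(), set()

    def lookup(uri):
        if uri not in seen and len(seen) < max_lookups:
            seen.add(uri)
            try:
                graph.parse(uri)              # HTTP GET + RDF parse
            except Exception:
                pass                          # skip unreachable/non-RDF sources

    for uri in seed_uris:
        lookup(uri)

    solutions = [{}]                          # list of variable-binding dicts
    for pattern in patterns:                  # conjunctive (BGP-style) query
        new_solutions = []
        for mu in solutions:
            # substitute bound variables, leave unbound ones as wildcards
            s, p, o = (mu.get(t) if isinstance(t, Variable) else t
                       for t in pattern)
            for ts, tp, to in list(graph.triples((s, p, o))):
                ext = dict(mu)
                for var, val in zip(pattern, (ts, tp, to)):
                    if isinstance(var, Variable):
                        ext[var] = val
                new_solutions.append(ext)
                for term in (ts, to):         # follow links in partial results;
                    if isinstance(term, URIRef):
                        lookup(str(term))     # fetched data feeds later patterns
        solutions = new_solutions
    return solutions
```

The paper's non-blocking iterator extension, which hides HTTP latency during execution, is not modeled in this sketch.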
Data Summaries for On-Demand Queries over Linked Data
- In: Proceedings of the 19th International World Wide Web Conference
, 2010
"... Typical approaches for querying structured Web Data collect (crawl) and pre-process (index) large amounts of data in a central data repository before allowing for query answering. However, this time-consuming pre-processing phase however leverages the benefits of Linked Data – where structured data ..."
Abstract
-
Cited by 84 (15 self)
- Add to MetaCart
(Show Context)
Typical approaches for querying structured Web Data collect (crawl) and pre-process (index) large amounts of data in a central data repository before allowing for query answering. However, this time-consuming pre-processing phase leverages the benefits of Linked Data – where structured data is accessible live and up-to-date at distributed Web resources that may change constantly – only to a limited degree, as query results can never be current. An ideal query answering system for Linked Data should return current answers in a reasonable amount of time, even on corpora as large as the Web. Query processors evaluating queries directly on the live sources require knowledge of the contents of data sources. In this paper, we develop and evaluate an approximate index structure summarising graph-structured content of sources adhering to Linked Data principles, provide an algorithm for answering conjunctive queries over Linked Data on the Web exploiting the source summary, and evaluate the system using synthetically generated queries. The experimental results show that our lightweight index structure enables complete and up-to-date query results over Linked Data, while keeping the overhead for querying low and providing a satisfying source ranking at no additional cost.
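As a much-simplified illustration of what a source summary buys you: the paper describes an approximate index structure over source contents, whereas the hash-set sketch below is only a stand-in showing how such a summary routes triple patterns to candidate sources.

```python
# Much-simplified sketch of a source summary: each source is described by
# hashes of the RDF terms it mentions, so a query processor can restrict a
# triple pattern to candidate sources without indexing their full content.
# This hash-set summary is only a stand-in for the paper's approximate index.
import hashlib

def h(term, buckets=1024):
    """Hash an RDF term into a small fixed range (lossy, hence 'approximate')."""
    return int(hashlib.sha1(term.encode("utf-8")).hexdigest(), 16) % buckets

class SourceSummary:
    def __init__(self):
        self.index = {}                        # hashed term -> set of source URIs

    def add_source(self, source, triples):
        for s, p, o in triples:
            for term in (s, p, o):
                self.index.setdefault(h(term), set()).add(source)

    def relevant_sources(self, pattern):
        """Candidate sources for a pattern; None marks a wildcard position."""
        candidates = None
        for term in pattern:
            if term is None:
                continue
            srcs = self.index.get(h(term), set())
            candidates = srcs if candidates is None else candidates & srcs
        return candidates if candidates is not None else set()

# Example: only sources whose summary mentions both constants are contacted.
summary = SourceSummary()
summary.add_source("http://example.org/alice.rdf",
                   [("ex:alice", "foaf:knows", "ex:bob")])
print(summary.relevant_sources(("ex:alice", "foaf:knows", None)))
```

Because the summary is lossy (hash collisions), it may return false positives, but never misses a source that actually contains matching data.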
Provenance Information in the Web of Data
, 2009
"... The openness of the Web and the ease to combine linked data from different sources creates new challenges. Systems that consume linked data must evaluate quality and trustworthiness of the data. A common approach for data quality assessment is the analysis of provenance information. For this reason, ..."
Abstract
-
Cited by 57 (5 self)
- Add to MetaCart
The openness of the Web and the ease of combining linked data from different sources create new challenges. Systems that consume linked data must evaluate the quality and trustworthiness of the data. A common approach for data quality assessment is the analysis of provenance information. For this reason, this paper discusses provenance of data on the Web and proposes a suitable provenance model. While traditional provenance research usually addresses the creation of data, our provenance model also represents data access, a dimension of provenance that is particularly relevant in the context of Web data. Based on our model we identify options to obtain provenance information and we raise open questions concerning the publication of provenance-related metadata for linked data on the Web.
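One way to picture the two provenance dimensions distinguished here, data creation and data access, is as a simple record structure. The field names below are assumptions for illustration, not the paper's vocabulary:

```python
# Illustrative sketch of the two provenance dimensions the abstract
# distinguishes: how a piece of Web data was created and how it was
# accessed. Field names are assumptions, not the paper's model.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class DataCreation:
    creator: str                        # agent that produced the data
    method: str                         # e.g. manual editing, automated extraction
    source_data: List[str] = field(default_factory=list)   # inputs used

@dataclass
class DataAccess:
    uri: str                            # URI that was dereferenced
    retrieved_at: datetime              # time of the HTTP lookup
    provider: Optional[str] = None      # party operating the server

@dataclass
class ProvenanceRecord:
    data_item: str                      # e.g. an RDF graph or triple identifier
    creation: Optional[DataCreation] = None
    access: Optional[DataAccess] = None
```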
Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine
, 2011
"... In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional sea ..."
Abstract
-
Cited by 49 (15 self)
- Add to MetaCart
In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data – loosely also known as Linked Data – which implies unique challenges for the system design, architecture, algorithms, implementation and user interface. In particular, many challenges exist in adopting Semantic Web technologies for Web data: the unique challenges of the Web – in terms of scale, unreliability, inconsistency and noise – are largely overlooked by the current Semantic Web standards. Herein, we describe the current SWSE system, initially detailing the architecture and later elaborating upon the function, design, implementation and performance of each individual component. In so doing, we also give an insight into how current Semantic Web standards can be tailored, in a best-effort manner, for use on Web data. Throughout, we offer evaluation and complementary argumentation to support our design choices, and also offer discussion on future directions and open research questions. Later, we also provide candid discussion relating to the difficulties currently faced in bringing such a search engine into the mainstream, and lessons learnt from roughly six years working on the Semantic Web Search Engine project.
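To make the entity-centric search idea concrete, here is a toy sketch of indexing crawled RDF descriptions by keyword and returning matching entity URIs; it illustrates the general crawl/index/search flow the abstract names, not SWSE's actual components:

```python
# Toy sketch of entity-centric indexing over crawled RDF: triples are grouped
# by subject, text literals feed an inverted keyword index, and a keyword
# query returns matching entity URIs. Illustration only, not SWSE's design.
from collections import defaultdict

def build_entity_index(triples):
    """triples: iterable of (subject, predicate, object) strings."""
    inverted = defaultdict(set)                 # keyword -> entity URIs
    for s, _, o in triples:
        if not o.startswith("http"):            # treat non-URIs as literals
            for token in o.lower().split():
                inverted[token].add(s)
    return inverted

def search(inverted, query):
    """Return entities whose literals contain every query keyword."""
    sets = [inverted.get(tok, set()) for tok in query.lower().split()]
    return set.intersection(*sets) if sets else set()

# Example:
idx = build_entity_index([
    ("http://example.org/Tim", "foaf:name", "Tim Berners-Lee"),
    ("http://example.org/Tim", "foaf:topic", "Linked Data"),
])
print(search(idx, "linked data"))               # {'http://example.org/Tim'}
```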
Sig.ma: Live views on the Web of data
- In Proc. WWW-2010
, 2010
"... Abstract. We demonstrate Sig.ma, both a service and an end user application to browse and perform tasks leveraging data coming from dozens of distributed and unrelated sources on the Web of Data. 1 ..."
Abstract
-
Cited by 47 (1 self)
- Add to MetaCart
(Show Context)
We demonstrate Sig.ma, both a service and an end user application to browse and perform tasks leveraging data coming from dozens of distributed and unrelated sources on the Web of Data.
Ad-hoc object retrieval in the web of data.
- In Proceedings of the 19th International Conference on World Wide Web
, 2010
"... ABSTRACT Semantic Search refers to a loose set of concepts, challenges and techniques having to do with harnessing the information of the growing Web of Data (WoD) for Web search. Here we propose a formal model of one specific semantic search task: ad-hoc object retrieval. We show that this task pr ..."
Abstract
-
Cited by 42 (8 self)
- Add to MetaCart
(Show Context)
Semantic Search refers to a loose set of concepts, challenges and techniques having to do with harnessing the information of the growing Web of Data (WoD) for Web search. Here we propose a formal model of one specific semantic search task: ad-hoc object retrieval. We show that this task provides a solid framework to study some of the semantic search problems currently tackled by commercial Web search engines. We connect this task to traditional ad-hoc document retrieval and discuss appropriate evaluation metrics. Finally, we carry out a realistic evaluation of this task in the context of a Web search application.
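The task formalized here maps a keyword query to a ranked list of object (entity) URIs, which can then be scored against relevance judgments with graded-relevance IR metrics. Below is a standard NDCG computation as one such metric, shown for illustration; the code and example data are not taken from the paper:

```python
# Standard NDCG over a ranked list of entity URIs, as one example of the
# kind of graded-relevance metric used for ad-hoc object retrieval.
import math

def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_uris, judgments, k=10):
    """judgments: dict mapping URI -> graded relevance (0 = irrelevant)."""
    gains = [judgments.get(uri, 0) for uri in ranked_uris[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# Example: a hypothetical system ranking for the query "barack obama".
judged = {"ex:Barack_Obama": 3, "ex:Michelle_Obama": 1}
print(ndcg(["ex:Barack_Obama", "ex:White_House", "ex:Michelle_Obama"], judged))
```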
Mind the data skew: Distributed inferencing by speeddating in elastic regions
- In Proc. of the WWW
, 2010
"... Semantic Web data exhibits very skewed frequency distributions among terms. Efficient large-scale distributed reasoning methods should maintain load-balance in the face of such highly skewed distribution of input data. We show that term-based partitioning, used by most distributed reasoning approach ..."
Abstract
-
Cited by 35 (11 self)
- Add to MetaCart
(Show Context)
Semantic Web data exhibits very skewed frequency distributions among terms. Efficient large-scale distributed reasoning methods should maintain load-balance in the face of such highly skewed distribution of input data. We show that term-based partitioning, used by most distributed reasoning approaches, has limited scalability due to load-balancing problems. We address this problem with a method for data distribution based on clustering in elastic regions. Instead of assigning data to fixed peers, data flows semi-randomly in the network. Data items “speed-date” while being temporarily collocated in the same peer. We introduce a bias in the routing to allow semantically clustered neighborhoods to emerge. Our approach is self-organising, efficient and does not require any central coordination. We have implemented this method on the MaRVIN platform and have performed experiments on large real-world datasets, using a cluster of up to 64 nodes. We compute the RDFS closure over different datasets and show that our clustering algorithm drastically reduces computation time, calculating the RDFS closure of 200 million triples in 7.2 minutes.
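The load-balancing problem that motivates the speed-dating approach can be seen in a few lines: with term-based partitioning, a very frequent term such as rdf:type sends nearly all of its triples to a single peer. The sketch below only illustrates that skew, not the paper's clustering solution, and the data and peer count are made up:

```python
# Small illustration of the skew problem with term-based partitioning:
# each triple goes to the peer responsible for one of its terms (here the
# predicate), so frequent terms like rdf:type overload a single peer.
from collections import Counter
import hashlib

def peer_for(term, n_peers):
    return int(hashlib.sha1(term.encode()).hexdigest(), 16) % n_peers

def partition_by_predicate(triples, n_peers):
    load = Counter()
    for _, p, _ in triples:
        load[peer_for(p, n_peers)] += 1
    return load

triples = [("ex:s%d" % i, "rdf:type", "ex:C") for i in range(9000)] + \
          [("ex:s%d" % i, "ex:p%d" % (i % 10), "ex:o") for i in range(1000)]
print(partition_by_predicate(triples, n_peers=8))
# one peer receives ~9000 triples (all rdf:type); the rest share ~1000
```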
Hermes: Data web search on a pay-as-you-go integration infrastructure
- In Web Semantics: Science, Services and Agents on the World Wide Web 7, 189–203 (2009)
"... karlsruhe.de The Web as a global information space is developing from a Web of documents to a Web of data. This development opens new ways for addressing complex information needs. Search is no longer lim-ited to matching keywords against documents, but instead complex information needs can be expre ..."
Abstract
-
Cited by 29 (4 self)
- Add to MetaCart
(Show Context)
The Web as a global information space is developing from a Web of documents to a Web of data. This development opens new ways for addressing complex information needs. Search is no longer limited to matching keywords against documents, but instead complex information needs can be expressed in a structured way, with precise answers as results. In this paper, we present SearchWebDB, an infrastructure for data web search that addresses a number of challenges involved in realizing search on the data web: To provide an end-user oriented interface, we support expressive keyword search by translating user information needs into structured queries. We integrate heterogeneous web data sources with automatically computed mappings. Schema-level mappings are exploited in constructing structured queries against the integrated schema. These structured queries are decomposed into queries against the local web data sources, which are then processed in a distributed way. Finally, heterogeneous result sets are combined using an algorithm called map join, making use of data-level mappings. In evaluation experiments with real life data sets from the data web we show the practicability and scalability of the SearchWebDB infrastructure.
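A rough sketch of the map-join idea mentioned at the end of this abstract: result sets from different sources are joined on a shared variable, with two URIs treated as equal when a data-level mapping identifies them. The data structures and example below are assumptions for illustration, not the paper's algorithm:

```python
# Rough sketch of a map join: bindings from two sources are joined on a
# shared variable, with URIs considered equal when a data-level mapping
# (e.g. an owl:sameAs-style link) identifies them. Mappings assumed acyclic.
def canonical(uri, mappings):
    """Follow data-level mappings to a canonical representative."""
    while uri in mappings:
        uri = mappings[uri]
    return uri

def map_join(results_a, results_b, var, mappings):
    """results_*: lists of dicts mapping query variables to values."""
    by_key = {}
    for row in results_a:
        by_key.setdefault(canonical(row[var], mappings), []).append(row)
    joined = []
    for row in results_b:
        key = canonical(row[var], mappings)
        for match in by_key.get(key, []):
            joined.append({**match, **row, var: key})
    return joined

# Example: two sources use different URIs for the same city.
mappings = {"geo:Berlin": "dbpedia:Berlin"}
population = [{"?city": "dbpedia:Berlin", "?pop": "3.6M"}]
coordinates = [{"?city": "geo:Berlin", "?lat": "52.52"}]
print(map_join(population, coordinates, "?city", mappings))
# [{'?city': 'dbpedia:Berlin', '?pop': '3.6M', '?lat': '52.52'}]
```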
Context dependent reasoning for semantic documents in Sindice
- In 4th Int'l Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2008)
"... Abstract. The Sindice Semantic Web index provides search capabilities over today more than 30 million documents. A scalable reasoning mechanism for real-world web data is important in order to increase the precision and recall of the Sindice index by inferring useful information (e.g. RDF Schema fea ..."
Abstract
-
Cited by 21 (10 self)
- Add to MetaCart
(Show Context)
The Sindice Semantic Web index today provides search capabilities over more than 30 million documents. A scalable reasoning mechanism for real-world web data is important in order to increase the precision and recall of the Sindice index by inferring useful information (e.g. RDF Schema features, equality, property characteristics such as inverse functional properties, or annotation properties from OWL). In this paper, we introduce our notion of context dependent reasoning for RDF documents published on the Web according to the linked data principles. We then illustrate an efficient methodology to perform context dependent RDFS and partial OWL inference based on a persistent TBox composed of a network of web ontologies. Finally we report preliminary evaluation results of our implementation underlying the Sindice web data index.
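A minimal sketch of the context-dependent idea: each document is reasoned over only with the ontologies (the TBox) that its own terms reference, rather than against one global schema. The code below shows per-document RDFS subclass inference under that reading; ontology fetching is stubbed out and all names are illustrative:

```python
# Sketch of context-dependent reasoning: build a per-document TBox from the
# vocabularies the document uses, then apply minimal RDFS subclass inference
# within that context only. Illustrative simplification, not Sindice's code.
def tbox_for(document_triples, fetch_ontology):
    """Collect schema triples from the vocabularies a document's terms use."""
    namespaces = {term.rsplit("#", 1)[0] for _, p, o in document_triples
                  for term in (p, o) if "#" in term}
    tbox = set()
    for ns in namespaces:
        tbox |= fetch_ontology(ns)          # dereference the vocabulary (stub)
    return tbox

def rdfs_subclass_closure(abox, tbox):
    """Infer rdf:type triples via rdfs:subClassOf, within this context only."""
    sub = {(s, o) for s, p, o in tbox if p == "rdfs:subClassOf"}
    inferred = set(abox)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p == "rdf:type":
                for c, sup in sub:
                    if c == o and (s, "rdf:type", sup) not in inferred:
                        inferred.add((s, "rdf:type", sup))
                        changed = True
    return inferred

# Example: the subclass axiom lets us infer that alice is also a Person.
tbox = {("ex:Student", "rdfs:subClassOf", "ex:Person")}
abox = {("ex:alice", "rdf:type", "ex:Student")}
print(rdfs_subclass_closure(abox, tbox))
```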