W.: A scalable approach for efficiently generating structured dataset topic profiles
, 2014
Cited by 10 (3 self)
Abstract. The increasing adoption of Linked Data principles has led to an abundance of datasets on the Web. However, take-up and reuse is hindered by the lack of descriptive information about the nature of the data, such as their topic coverage, dynamics or evolution. To address this issue, we propose an approach for creating linked dataset profiles. A profile consists of structured dataset metadata describing topics and their relevance. Profiles are generated through the configuration of techniques for resource sampling from datasets, topic extraction from reference datasets and their ranking based on graphical models. To enable a good trade-off between scalability and accuracy of generated profiles, appropriate parameters are determined experimentally. Our evaluation considers topic profiles for all accessible datasets from the Linked Open Data cloud. The results show that our approach generates accurate profiles even with comparably small sample sizes (10%) and outperforms established topic modelling approaches.
RDFS & OWL Reasoning for Linked Data
Cited by 9 (6 self)
Abstract. Linked Data promises that a large portion of Web Data will be usable as one big interlinked RDF database against which structured queries can be answered. In this lecture we will show how reasoning – using RDF Schema (RDFS) and the Web Ontology Language (OWL) – can help to obtain more complete answers for such queries over Linked Data. We first look at the extent to which RDFS and OWL features are being adopted on the Web. We then introduce two high-level architectures for query answering over Linked Data and outline how these can be enriched by (lightweight) RDFS and OWL reasoning, enumerating the main challenges faced and discussing reasoning methods that make practical and theoretical trade-offs to address these challenges. In the end, we also ask whether or not RDFS and OWL are enough and discuss numeric reasoning methods that are beyond the scope of these standards but that are often important when integrating Linked Data from several, heterogeneous sources.
Change-a-LOD: Does the Schema on the Linked Data Cloud Change or Not?
Cited by 3 (2 self)
Abstract. Recent work analyzing changes on the Linked Open Data (LOD) cloud on fine-grained weekly snapshots shows that vocabularies published on the cloud are highly static. While this result is quite expected, there is another kind of schematic information that can be observed on the LOD cloud: the use of the vocabularies in the cloud. With use, we mean the combinations of sets of properties and sets of types to describe the resources in a specific domain. Current literature does not tackle this question sufficiently. In order to gain insight into how the use of vocabularies on the LOD cloud changes over time, we present illustrating examples and a formalization of the research question. Subsequently, we present early results of experiments applied on weekly snapshots that show that the use of vocabularies indeed changes quite a lot over time.
Fedra: Query Processing for SPARQL Federations with Divergence
Cited by 3 (3 self)
Abstract. Data replication and deployment of local SPARQL endpoints improve scalability and availability of public SPARQL endpoints, making the consumption of Linked Data a reality. This solution requires synchronization and specific query processing strategies to take advantage of replication. However, existing replication aware techniques in federations of SPARQL endpoints do not consider data dynamicity. We propose Fedra, an approach for querying federations of endpoints that benefits from replication. Participants in Fedra federations can copy fragments of data from several datasets, and describe them using provenance and views. These descriptions enable Fedra to reduce the number of selected endpoints while satisfying user divergence requirements. Experiments on real-world datasets suggest savings of up to three orders of magnitude.
Link Traversal Querying for a Diverse Web of Data
, 2014
Cited by 2 (0 self)
Traditional approaches for querying the Web of Data often involve centralised warehouses that replicate remote data. Conversely, Linked Data principles allow for answering queries live over the Web by dereferencing URIs to traverse remote data sources at runtime. A number of authors have looked at answering SPARQL queries in such a manner; these link-traversal based query execution (LTBQE) approaches for Linked Data offer up-to-date results and decentralised (i.e., client-side) execution, but must operate over incomplete dereferenceable knowledge available in remote documents, thus affecting response times and “recall” for query answers. In this paper, we study the recall and effectiveness of LTBQE, in practice, for the Web of Data. Furthermore, to integrate data from diverse sources, we propose lightweight reasoning extensions to help find additional answers. From the state-of-the-art which (1) considers only dereferenceable information and (2) follows rdfs:seeAlso links, we propose extensions to consider (3) owl:sameAs links and reasoning, and (4) lightweight RDFS reasoning. We then estimate the recall of link-traversal query techniques in practice: we analyse a large crawl of the Web of Data (the BTC’11 dataset), looking at the ratio of raw data contained in dereferenceable documents vs. the corpus as a whole and determining how much more raw data our extensions make available for query answering. We then stress-test LTBQE (and our extensions) in real-world settings using the FedBench and DBpedia SPARQL Benchmark frameworks, and propose a novel benchmark called QWalk based on random
Types of Property Pairs and Alignment on Linked Datasets – A Preliminary Analysis
Cited by 1 (1 self)
Abstract. Dataset publication on the Web has been greatly influenced by the Linked Open Data (LOD) project. Many interlinked datasets have become freely available on the Web creating a structured and distributed knowledge representation. Analysis and aligning of concepts and instances in these interconnected datasets have received a lot of attention in the recent past compared to properties. We identify three different categories of property pairs found in the alignment process and study their relative distribution among well known LOD datasets. We also provide comparative analysis of state-of-the-art techniques with regard to different categories, highlighting their capabilities. This could lead to more realistic and useful alignment of properties in LOD and similar datasets.
Generating Synthetic RDF Data with Connected Blank Nodes for Benchmarking
Cited by 1 (1 self)
Abstract. Generators for synthetic RDF datasets are very important for testing and benchmarking various semantic data management tasks (e.g. querying, storage, update, compare, integrate). However, the current generators do not support sufficiently (or totally ignore) blank node connectivity issues. Blank nodes are used for various purposes (e.g. for describing complex attributes), and a significant percentage of resources is currently represented with blank nodes. Moreover, several semantic data management tasks, like isomorphism checking (useful for checking equivalence), and blank node matching (useful in comparison, versioning, synchronization, and in semantic similarity functions), not only have to deal with blank nodes, but their complexity and optimality depends on the connectivity of blank nodes. To enable the comparative evaluation of the various techniques for carrying out these tasks, in this paper we present the design and implementation of a generator, called BGen, which allows building datasets containing blank nodes with the desired complexity, controllable through various features (morphology, size, diameter, density and clustering coefficient). Finally, the paper reports experimental results concerning the efficiency of the generator, as well as results from using the generated datasets, that demonstrate the value of the generator.
From Changes to Dynamics: Dynamics Analysis of Linked Open Data Sources
Cited by 1 (1 self)
Abstract. The Linked Open Data (LOD) cloud changes frequently. Recent approaches focus mainly on quantifying the changes that occur in the LOD cloud by comparing two snapshots of a linked dataset captured at two different points in time. These change metrics are able to measure absolute changes between these two snapshots. However, they cannot determine the dynamics of a dataset over a period of time, i.e., the intensity of how the data evolved in this period. In this paper, we present a general framework to analyze the dynamics of linked datasets within a given time interval. We propose a function to measure the dynamics of a LOD dataset, which is defined as the aggregation of absolute, infinitesimal changes, provided by change metrics. Our method can be parametrized to incorporate and make use of existing change metrics. Furthermore, our framework enables the use of different decay functions within the dynamics computation for different weights on changes depending on when they occurred in the observed time interval. We apply our framework to conduct an investigation on the dynamics of selected LOD datasets. We apply our analysis on a large-scale LOD dataset that is obtained from the LOD cloud by weekly crawls over more than a year. Finally, we discuss the benefits and potential applications of our dynamics function in a real world scenario.
Of Sampling and Smoothing: Approximating Distributions over Linked Open Data
- In: PROFILES’14: Proceedings of the Workshop on Dataset Profiling and Federated Search for Linked Data
, 2014
Cited by 1 (0 self)
Abstract. Knowledge about the distribution of data provides the basis for various tasks in the context of Linked Open Data, e.g. for estimating the result set size of a query, for the purpose of statistical schema induction or for using information theoretic metrics to detect patterns. In this paper I investigate the potential of obtaining estimates for such distributions from samples of Linked Data. Therefore, I consider three sampling methods applicable to public RDF data on the Web as well as smoothing techniques to overcome the problem of unseen events in the sample space of a distribution. In a systematic empirical evaluation I look into the effects of these techniques on the quality of distributions approximated from samples. The obtained insights help to assess the quality impact of combinations of sampling and smoothing techniques on five prototypical structures over which distributions are estimated. Furthermore, the results demonstrate the potential and the limitations of these techniques, motivating further work in the direction of sampling from Linked Open Data.
Qualitative Analysis of Vocabulary Evolution on the Linked Open Data Cloud
Abstract. We analyse the evolution of vocabularies on the Linked Open Data cloud. Based on the recent statistics of the LOD cloud, we have selected the twelve most dominant vocabularies in terms of their use in different pay-level domains. The number of versions we found for these vocabularies ranges from 2 to 11. While some ontologies have existed for more than 10 years (e.g., FOAF), others have been online for only a few years (like DCAT). Our analysis shows that many changes occurred on annotation properties. This reflects a need for more clarification of the terms, especially at early versions of the vocabularies. The majority of changes in the vocabularies are due to changes in other, imported vocabularies. Thus, there is a co-evolution of different vocabularies. This insight has practical implications for ontology engineers. They not only need to consider the evolution of the vocabularies they directly use, but also those they import and indirectly depend on.