Observing Linked Data Dynamics
In ESWC, 2013. Cited by 20 (3 self).
"... Abstract. In this paper, we present the design and first results of the Dynamic Linked Data Observatory: a long-term experiment to monitor the two-hop neighbourhood of a core set of eighty thousand diverse Linked Data documents on a weekly basis. We present the methodology used for sampling the URIs ..."
Abstract
-
Cited by 20 (3 self)
- Add to MetaCart
Abstract. In this paper, we present the design and first results of the Dynamic Linked Data Observatory: a long-term experiment to monitor the two-hop neighbourhood of a core set of eighty thousand diverse Linked Data documents on a weekly basis. We present the methodology used for sampling the URIs to monitor, retrieving the documents, and further crawling part of the two-hop neighbourhood. Having now run this experiment for six months, we analyse the dynamics of the monitored documents over the data collected thus far. We look at the estimated lifespan of the core documents, how often they go online or offline, and how often they change; we further investigate domain-level trends. Next we look at changes within the RDF content of the core documents across the weekly snapshots, examining the elements (i.e., triples, subjects, predicates, objects, classes) that are most frequently added or removed. Thereafter, we look at how the links between dereferenceable documents evolve over time in the two-hop neighbourhood.
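A minimal sketch of the snapshot-diff idea the abstract describes, assuming the rdflib library and two hypothetical local Turtle files for consecutive weekly snapshots (not the authors' actual tooling):

from collections import Counter
from rdflib import Graph

old, new = Graph(), Graph()
old.parse("snapshot_week1.ttl", format="turtle")   # hypothetical filenames
new.parse("snapshot_week2.ttl", format="turtle")

# Iterating a Graph yields (subject, predicate, object) triples.
# Note: blank nodes get fresh identifiers per parse, so a robust diff
# would need to treat them separately (e.g., by skolemisation).
old_triples, new_triples = set(old), set(new)
added = new_triples - old_triples
removed = old_triples - new_triples
print(f"added: {len(added)}, removed: {len(removed)}")

# Which predicates change most often, as in the per-element analysis.
changed_predicates = Counter(p for _, p, _ in added | removed)
print(changed_predicates.most_common(5))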
On the diversity and availability of temporal information in Linked Open Data
In ISWC, 2012.
"... ..."
(Show Context)
Freshening up while staying fast: Towards hybrid SPARQL queries
In EKAW, 2012. Cited by 5 (3 self).
"... Abstract. Querying over cached indexes of Linked Data often suffers from stale or missing results due to infrequent updates and partial coverage of sources. Conversely, live decentralised approaches offer fresh results directly from the Web, but exhibit slow response times due to accessing numerous ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
(Show Context)
Abstract. Querying over cached indexes of Linked Data often suffers from stale or missing results due to infrequent updates and partial coverage of sources. Conversely, live decentralised approaches offer fresh results directly from the Web, but exhibit slow response times due to accessing numerous remote sources at runtime. We thus propose a hybrid query approach that improves upon both paradigms, offering fresher results from a broader range of sources than Linked Data caches while offering faster results than live querying. Our hybrid query engine takes a cached and live query engine as black boxes, where a hybrid query planner splits an input query and delegates the appropriate sub-queries to each interface. In this paper, we discuss query planning alternatives and their main strengths and weaknesses. We also present coherence measures to quantify the coverage and freshness of cached indexes of Linked Data, and show how these measures can be used for hybrid query planning to optimise the trade-off between fresh results and fast runtimes.
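A rough sketch of the splitting idea described here, with hypothetical per-predicate coherence scores and threshold (the paper derives coherence from measured coverage and freshness of the cached index, not from hard-coded values):

# Sketch: delegate triple patterns to the cached or the live engine by coherence.
# The coherence values and threshold below are made up for illustration.
COHERENCE = {
    "foaf:name": 0.95,        # rarely changes: the cache is fresh enough
    "dbo:population": 0.40,   # changes often: fetch live
}
THRESHOLD = 0.8

def split_patterns(patterns):
    """Split (s, p, o) patterns into (cached, live) sub-queries."""
    cached, live = [], []
    for s, p, o in patterns:
        bucket = cached if COHERENCE.get(p, 0.0) >= THRESHOLD else live
        bucket.append((s, p, o))
    return cached, live

query = [("?x", "foaf:name", "?name"), ("?x", "dbo:population", "?pop")]
print(split_patterns(query))   # joining the two partial results happens afterwards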
Of Sampling and Smoothing: Approximating Distributions over Linked Open Data
In PROFILES'14: Proceedings of the Workshop on Dataset Profiling and Federated Search for Linked Data, 2014. Cited by 1 (0 self).
"... Abstract Knowledge about the distribution of data provides the basis for various tasks in the context of Linked Open Data, e.g. for estimating the result set size of a query, for the purpose of statistical schema induction or for using informa-tion theoretic metrics to detect patterns. In this paper ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract. Knowledge about the distribution of data provides the basis for various tasks in the context of Linked Open Data, e.g. for estimating the result set size of a query, for the purpose of statistical schema induction, or for using information-theoretic metrics to detect patterns. In this paper I investigate the potential of obtaining estimates for such distributions from samples of Linked Data. Therefore, I consider three sampling methods applicable to public RDF data on the Web as well as smoothing techniques to overcome the problem of unseen events in the sample space of a distribution. In a systematic empirical evaluation I look into the effects of these techniques on the quality of distributions approximated from samples. The obtained insights help to assess the quality impact of combinations of sampling and smoothing techniques on five prototypical structures over which distributions are estimated. Furthermore, the results demonstrate the potential and the limitations of these techniques, motivating further work in the direction of sampling from Linked Open Data.
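As a toy illustration of the smoothing step, a sketch of add-one (Laplace) smoothing over a sampled distribution of predicates; the counts and vocabulary are made up, and the paper evaluates several smoothing techniques rather than only this one:

from collections import Counter

# Hypothetical sample of predicates observed while sampling Linked Data.
sample = ["rdf:type", "rdfs:label", "rdf:type", "foaf:name"]
# The event space also contains a predicate never seen in the sample.
vocabulary = {"rdf:type", "rdfs:label", "foaf:name", "dcterms:created"}

counts = Counter(sample)
N, V = len(sample), len(vocabulary)

# Add-one smoothing: unseen events get probability 1/(N+V) instead of zero.
smoothed = {term: (counts[term] + 1) / (N + V) for term in vocabulary}
print(smoothed)
assert abs(sum(smoothed.values()) - 1.0) < 1e-9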
Paths towards the Sustainable Consumption of Semantic Data on the Web
"... Abstract. Based on recent results, we argue that the right method for Web clients to access relevant information from Linked Datasets has not yet been found. We propose that something is needed between (i) Linked Data dereferencing, which is simple and reliable but too vaguely defined; (ii) data dum ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Abstract. Based on recent results, we argue that the right method for Web clients to access relevant information from Linked Datasets has not yet been found. We propose that something is needed between (i) Linked Data dereferencing, which is simple and reliable but too vaguely defined; (ii) data dumps, which are simple and reliable but too coarse-grained; and (iii) SPARQL querying, which is powerful and fine-grained but too unreliable. We argue that new protocols and query languages need to be investigated and define eight desiderata that an access method should meet in order to be considered sustainable for a mature Web of Data.

A reader familiar with the Linked Data literature will already know about the hundreds of Linked Datasets published on the Web: they are mentioned in the introduction to almost every research paper written in the area. They will likewise be familiar with the ubiquitous LOD Cloud, often cited as the realisation of a new emerging “Web of Data”. They may even have read about hundreds of public SPARQL endpoints on the Web waiting to be queried or to be federated.
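For concreteness on option (i): dereferencing amounts to an HTTP GET on the entity's URI with content negotiation for an RDF serialisation. A minimal sketch using the requests and rdflib libraries; the DBpedia URI is only an example, and the exact triples returned depend on the server:

import requests
from rdflib import Graph

# Linked Data dereferencing: GET the URI, asking for Turtle via content negotiation.
uri = "http://dbpedia.org/resource/Berlin"   # example of a dereferenceable URI
resp = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=30)
resp.raise_for_status()

g = Graph()
g.parse(data=resp.text, format="turtle")
print(f"{len(g)} triples returned for {uri}")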
Requirements on Linked Data Consumption Platform
"... ABSTRACT The publication of data as Linked Open Data (LOD) gains traction. There are lots of different datasets published, more vocabularies are becoming W3C Recommendations and with the introduction of DCAT-AP v1.1 and the emergence of the European data portal and a multitude of national open data ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract. The publication of data as Linked Open Data (LOD) is gaining traction. Many different datasets are published, more vocabularies are becoming W3C Recommendations, and with the introduction of DCAT-AP v1.1 and the emergence of the European Data Portal and a multitude of national open data portals, many datasets are discoverable and accessible using their DCAT-AP metadata in RDF. Yet, the consumption of LOD still lacks comfortable, readily available tools that would exploit the benefits of LOD and allow users to discover, access, integrate and reuse LOD easily, as promised by the promoters of LOD and supposedly paid for by the additional effort publishers put into 5-star data publication. Compared to the consumption of 3-star CSV and XML files, the consumption of LOD is still quite complicated, and the LOD benefits are neither exploited enough nor visible enough to justify the effort for many publishers. In this paper we identify 40 requirements which a Linked Data Consumption Platform (LDCP) should satisfy in order to exploit the LOD benefits in a way that eases LOD consumption and justifies the additional effort put into LOD publication. We survey 8 relevant and currently available tools based on their coverage of the identified requirements.
Exploring the Dynamics of Linked Data
"... Abstract. Little is known about the dynamics of Linked Data, primarily because there have been few, if any, suitable collections of data made available for analysis of how Linked Data documents evolve over time. We aim to address this issue. We propose the Dynamic Linked Data Observatory, which prov ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract. Little is known about the dynamics of Linked Data, primarily because there have been few, if any, suitable collections of data made available for analysis of how Linked Data documents evolve over time. We aim to address this issue. We propose the Dynamic Linked Data Observatory, which provides the community with such a collection, monitoring a fixed set of Linked Data documents at weekly intervals. We have now collected eight months of raw data comprising weekly snapshots of eighty thousand Linked Data documents. Having published results characterising the high-level dynamics of Linked Data, we now turn to dissemination: we wish to investigate how results from our experiment might benefit the community and what online services and statistics (relating to Linked Data dynamics) would be most useful for us to provide.

Summary. The Web of (Linked) Data is dynamic. Knowledge about Linked Data dynamics is important for a wide range of applications and can help to answer a wide
Everything You Always Wanted to Know About Blank Nodes
"... In this paper we thoroughly cover the issue of blank nodes, which have been defined in RDF as ‘existential variables’. We first introduce the theoretical precedent for existential blank nodes from first order logic and incomplete information in database theory. We then cover the different (and somet ..."
Abstract
- Add to MetaCart
(Show Context)
In this paper we thoroughly cover the issue of blank nodes, which have been defined in RDF as ‘existential variables’. We first introduce the theoretical precedent for existential blank nodes from first order logic and incomplete information in database theory. We then cover the different (and sometimes incompatible) treatment of blank nodes across the W3C stack of RDF-related standards. We present an empirical survey of the blank nodes present in a large sample of RDF data published on the Web (the BTC-2012 dataset), where we find that 25.7% of unique RDF terms are blank nodes, that 44.9% of documents and 66.2% of domains featured use of at least one blank node, and that aside from one Linked Data domain whose RDF data contains many “blank node cycles”, the vast majority of blank nodes form tree structures over which simple entailment is efficient to compute. With respect to the RDF-merge of the full data, we show that 6.1% of blank nodes are redundant under simple entailment. The vast majority of non-lean cases are isomorphisms resulting from multiple blank nodes with no discriminating information being given within an RDF document, or from documents being duplicated in multiple Web locations. Although simple entailment is NP-complete and leanness-checking is coNP-complete, in computing this latter result we demonstrate that, in practice, real-world RDF graphs are sufficiently “rich” in ground information for problematic cases to be avoided by non-naive algorithms.
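A small sketch of the kind of counting behind the 25.7% figure, applied to a single local RDF file (hypothetical filename) instead of the full BTC-2012 crawl, using rdflib:

from rdflib import Graph, BNode

g = Graph()
g.parse("document.ttl", format="turtle")   # hypothetical input document

# Collect the unique RDF terms appearing in any triple position.
terms = set()
for s, p, o in g:
    terms.update((s, p, o))

blank = sum(1 for t in terms if isinstance(t, BNode))
if terms:
    print(f"{blank}/{len(terms)} unique terms are blank nodes "
          f"({100 * blank / len(terms):.1f}%)")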
Optimizing SPARQL Query Processing On Dynamic and Static Data Based on Query Response Requirements Using Materialization
"... Abstract. To integrate various Linked Datasets, the data warehousing and the live query processing approaches provide two extremes for the optimized response time and quality respectively. The first approach pro-vides very fast responses but su↵ers from providing low-quality responses because change ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract. To integrate various Linked Datasets, the data warehousing and the live query processing approaches provide two extremes, optimised for response time and for quality respectively. The first approach provides very fast responses but suffers from low-quality responses, because changes to the original data are not immediately reflected in the materialized data. The second approach provides accurate responses but is notorious for long response times. A hybrid SPARQL query processor provides a middle ground between these two extremes by splitting the triple patterns of a SPARQL query between live and local processors based on a predetermined coherence threshold specified by the administrator. However, considering quality requirements while splitting the SPARQL query enables the processor to eliminate unnecessary live execution and release resources for other queries; this is the main focus of my work. It requires estimating the quality of the response provided with the current materialized data, comparing it with user requirements, and determining the most selective sub-queries which can boost the response quality up to the specified level with the least computational complexity. In this work, we discuss preliminary results for estimating the freshness of materialized data, as one dimension of quality, by extending cardinality estimation techniques, and explain the future plan.
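As a toy illustration of the freshness dimension, the sketch below scores a materialized graph by the fraction of its triples still present in the live source; a real system would estimate this from cardinalities and change rates rather than by re-fetching everything as done naively here, and the helper names in the usage comment are placeholders:

from rdflib import Graph

def freshness(materialized: Graph, live: Graph) -> float:
    """Fraction of materialized triples still present in the live data (1.0 = fully fresh)."""
    if len(materialized) == 0:
        return 1.0
    still_valid = sum(1 for triple in materialized if triple in live)
    return still_valid / len(materialized)

# Hypothetical usage in a hybrid planner: only go live if the cache is too stale.
# cached, current = load_cached_copy(), fetch_live_snapshot()   # placeholder loaders
# if freshness(cached, current) >= required_freshness:
#     answer_from_cache()
# else:
#     delegate_to_live_engine()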