Dynamic Querying of Mass-Storage RDF Data with Rule-Based Entailment Regimes
Cited by 5 (1 self)
RDF Schema (RDFS) as a lightweight ontology language is gaining popularity and, consequently, tools for scalable RDFS inference and querying are needed. SPARQL has recently become a W3C standard for querying RDF data, but it mostly provides means for querying simple RDF graphs only, whereas querying with respect to RDFS or other entailment regimes is left outside the current specification. In this paper, we show that SPARQL faces certain unwanted ramifications when querying ontologies in conjunction with RDF datasets that comprise multiple named graphs, and we provide an extension for SPARQL that remedies these effects. Moreover, since RDFS inference has a close relationship with logic rules, we generalize our approach to select a custom ruleset for specifying inferences to be taken into account in a SPARQL query. We show that our extensions are technically feasible by providing benchmark results for RDFS querying in our prototype system GiaBATA, which uses Datalog coupled with a persistent relational database as a back-end for implementing SPARQL with dynamic rule-based inference. By employing different optimization techniques such as magic set rewriting, our system remains competitive with state-of-the-art RDFS querying systems.
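To make the idea of choosing a ruleset per query concrete, here is a minimal sketch of naive forward chaining over two standard RDFS entailment rules (rdfs9 and rdfs11). It is an illustration only: the function names and the `ex:` data are assumptions, and it does not reproduce GiaBATA's Datalog engine, its relational back-end, or magic set rewriting.

```python
# Illustrative only: naive forward chaining over two standard RDFS rules.
# This is NOT GiaBATA's engine; all names and data below are hypothetical.

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs11(triples):
    """rdfs11: rdfs:subClassOf is transitive."""
    return {(a, SUBCLASS, c)
            for (a, p1, b) in triples if p1 == SUBCLASS
            for (b2, p2, c) in triples if p2 == SUBCLASS and b2 == b}


def rdfs9(triples):
    """rdfs9: an instance of a subclass is an instance of the superclass."""
    return {(x, TYPE, d)
            for (x, p1, c) in triples if p1 == TYPE
            for (c2, p2, d) in triples if p2 == SUBCLASS and c2 == c}


def closure(triples, rules):
    """Apply the selected rules repeatedly until no new triple is derived."""
    facts = set(triples)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(facts)
        if derived <= facts:
            return facts
        facts |= derived


data = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", TYPE, "ex:Student"),
}

simple = closure(data, rules=[])              # simple entailment: no rules
rdfs = closure(data, rules=[rdfs9, rdfs11])   # a small RDFS-style ruleset
```

With the empty ruleset the graph is returned unchanged, whereas the two-rule set additionally derives that `ex:alice` is an `ex:Person` and an `ex:Agent`. Passing the rules as a list is one simple way to model a query-selected entailment regime.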
SemaPlorer -- Interactive Semantic Exploration of Data and Media based on a Federated Cloud Infrastructure
Cited by 4 (2 self)
SemaPlorer is an easy-to-use application that allows end users to interactively explore and visualize a very large, mixed-quality and semantically heterogeneous distributed semantic data set in real time. Its purpose is to let users acquaint themselves with a city, tourist area, or other area of interest. By visualizing the data using a map, media, and different context views, we clearly go beyond simple storage and retrieval of large numbers of triples. The interaction with the large data set is driven by the user. SemaPlorer leverages different semantic data sources such as DBpedia, GeoNames, WordNet, and personal FOAF files. These make up a significant portion of the data provided for the Billion Triple Challenge. It intriguingly connects with a large Flickr data set converted to RDF. SemaPlorer’s storage infrastructure is based on Amazon’s Elastic Compute Cloud (EC2) and Simple Storage Service. We apply NetworkedGraphs as an additional layer on top of EC2, acting as a large, federated data infrastructure for semantically heterogeneous data sources from within and outside of the cloud. Therefore, the application is scalable with respect to the number of distributed components working together as well as the number of triples managed overall. Hence, SemaPlorer is flexible enough to leverage almost arbitrary additional data sources for exploration that might be added in the future.
Ad-hoc Object Retrieval in the Web of Data
Cited by 4 (0 self)
Semantic Search refers to a loose set of concepts, challenges and techniques having to do with harnessing the information of the growing Web of Data (WoD) for Web search. Here we propose a formal model of one specific semantic search task: ad-hoc object retrieval. We show that this task provides a solid framework to study some of the semantic search problems currently tackled by commercial Web search engines. We connect this task to the traditional ad-hoc document retrieval and discuss appropriate evaluation metrics. Finally, we carry out a realistic evaluation of this task in the context of a Web search application.
An algebra for basic graph patterns
- In Proc. of the Workshop on Logic in Databases (LID), 2008
Cited by 4 (2 self)
Motivated by recent developments in the dataspaces, web, and personal information management communities, we outline research directions on query processing for SPARQL, the W3C recommendation language for querying RDF triple stores. The core of each SPARQL query is a basic graph pattern (BGP). BGP is a little logic for extracting subsets of related nodes in an RDF graph. In this paper we undertake a formal study of BGP with an eye towards efficient SPARQL query evaluation. Our main contributions are (1) an algebraization of BGP, and (2) first steps towards a framework for the design of structural indexes to accelerate processing of queries in this algebra.
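As a rough illustration of what evaluating a basic graph pattern involves, the sketch below joins triple patterns against a small in-memory graph by extending variable bindings one pattern at a time. The function names and the sample data are assumptions made for this example; it is a naive nested-loop evaluator, not the algebra or the structural indexes proposed in the paper.

```python
# Illustrative only: naive BGP evaluation by nested-loop joins.
# Terms starting with "?" are variables; everything else is a constant.

def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check consistency with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate_bgp(patterns, graph):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


graph = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "worksAt", "acme"),
]
patterns = [("?x", "knows", "?y"), ("?y", "worksAt", "?z")]
results = evaluate_bgp(patterns, graph)
```

Here the shared variable `?y` acts as the join condition between the two patterns. Every pattern scans the whole graph, which is the kind of cost that index structures for BGP evaluation aim to reduce.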
SWSE: Objects before documents!
Cited by 3 (0 self)
Web search engines are immensely useful for locating documents online. However, with more and more structured data being published online, the restriction to the hyperdocument model impairs the usefulness for searching and browsing. In contrast, an object-orientated model provides means to firstly integrate data about the same object from multiple sources, and secondly enable expressive queries over the integrated information space. We present SWSE, a search engine over 1.1 billion statements published on the Semantic Web. The system provides an easy-to-use end-user interface through which users can find and navigate an object-orientated information space. In addition, the system exposes the data via a full SPARQL REST service which is open for application developers to query and integrate data in their own applications.
Scalable Indexing of RDF Graphs for Efficient Join Processing
Cited by 3 (0 self)
Current approaches to RDF graph indexing suffer from weak data locality, i.e., information regarding a piece of data appears in multiple locations, spanning multiple data structures. Weak data locality negatively impacts storage and query processing costs. Towards stronger data locality, we propose a Three-way Triple Tree (TripleT) secondary memory indexing technique to facilitate flexible and efficient join evaluation on RDF data. The novelty of TripleT is that the index is built over the atoms occurring in the data set, rather than at a coarser granularity, such as whole triples occurring in the data set; and, the atoms are indexed regardless of the roles (i.e., subjects, predicates, or objects) they play in the triples of the data set. We show through extensive empirical evaluation that TripleT exhibits multiple orders of magnitude improvement over the state-of-the-art, in terms of both storage and query processing costs.
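The core idea of indexing atoms regardless of their role can be sketched with a single in-memory dictionary. This is only an assumption-laden illustration: the bucket layout, the function name, and the sample triples are hypothetical, and the actual TripleT structure is a secondary-memory index whose details are not given in the abstract.

```python
# Illustrative only: one index keyed by atom, with per-role buckets.
# The real TripleT index lives in secondary memory; this layout is assumed.
from collections import defaultdict


def build_atom_index(triples):
    """Map each atom to the triples it occurs in, grouped by role (s, p, o)."""
    index = defaultdict(lambda: {"s": [], "p": [], "o": []})
    for triple in triples:
        for role, atom in zip("spo", triple):
            index[atom][role].append(triple)
    return index


triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "worksAt", "acme"),
]
index = build_atom_index(triples)

# A subject-object join on the atom "bob" needs a single index lookup:
incoming = index["bob"]["o"]   # triples where "bob" is the object
outgoing = index["bob"]["s"]   # triples where "bob" is the subject
joined = [(t1, t2) for t1 in incoming for t2 in outgoing]
```

Because everything known about an atom sits under one key, joins on a shared atom do not have to consult separate per-role structures, which is the data-locality benefit the abstract describes.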
Signal/Collect: Graph Algorithms for the (Semantic) Web
Cited by 2 (0 self)
The Semantic Web graph is growing at an incredible pace, enabling opportunities to discover new knowledge by interlinking and analyzing previously unconnected data sets. This confronts researchers with a conundrum: whilst the data is available, the programming models that facilitate scalability and the infrastructure to run various algorithms on the graph are missing. Some use MapReduce, a good solution for many problems. However, even some simple iterative graph algorithms do not map nicely to that programming model, requiring programmers to shoehorn their problem into the MapReduce model. This paper presents the Signal/Collect programming model for synchronous and asynchronous graph algorithms. We demonstrate that this abstraction can capture the essence of many algorithms on graphs in a concise and elegant way by giving Signal/Collect adaptations of various relevant algorithms. Furthermore, we built and evaluated a prototype Signal/Collect framework that executes algorithms in our programming model. We empirically show that this prototype transparently scales and that guiding computations by scoring as well as asynchronicity can greatly improve the convergence of some example algorithms. We released the framework under the Apache License 2.0 (at
A Node Indexing Scheme for Web Entity Retrieval
Cited by 2 (0 self)
Now also motivated by the partial support of major search engines, hundreds of millions of documents are being published on the web embedding semi-structured data in RDF, RDFa and Microformats. This scenario calls for novel information search systems which provide effective means of retrieving relevant semi-structured information. In this paper, we present an “entity retrieval system” designed to provide entity search capabilities over datasets as large as the entire Web of Data. Our system supports full-text search, semi-structural queries and top-k query results while exhibiting a concise index and efficient incremental updates. We advocate the use of a node indexing scheme and show that it offers a good compromise between query expressiveness, query processing time and update complexity in comparison to three other indexing techniques. We then demonstrate how such a system can effectively answer queries over 10 billion triples on a single commodity machine.
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
Cited by 2 (0 self)
Subgraph matching is a key operation on graph data. Social network (SN) providers may want to find all subgraphs within their social network that “match” certain query graph patterns. Unfortunately, subgraph matching is NP-complete, making its application to massive SNs a major challenge. Past work has shown how to implement subgraph matching on a single processor when the graph has 10-25M edges. In this paper, we show how to use cloud computing in conjunction with such existing single processor methods to efficiently match complex subgraphs on graphs as large as 778M edges. A cloud consists of one “master” compute node and k “slave” compute nodes. We first develop a probabilistic method to estimate probabilities that a vertex will be retrieved by a random query and that a pair of vertices will be successively retrieved by a random query. We use these probability estimates to define edge weights in an SN and to compute minimal edge cuts to partition the graph amongst k slave nodes. We develop algorithms for both master and slave nodes that try to minimize communication overhead. The resulting COSI system can answer complex queries over real-world SN data containing over 778M edges very efficiently.
Using Reformulation Trees to Optimize Queries over Distributed Heterogeneous Sources
Cited by 2 (0 self)
In order to effectively and quickly answer queries in environments with distributed RDF/OWL, we present a query optimization algorithm to identify the potentially relevant Semantic Web data sources using structural query features and a term index. This algorithm is based on the observation that the join selectivity of a pair of query triple patterns is often higher than the overall selectivity of these two patterns treated independently. Given a rule goal tree that expresses the reformulation of a conjunctive query, our algorithm uses a bottom-up approach to estimate the selectivity of each node. It then prioritizes loading of selective nodes and uses the information from these sources to further constrain other nodes. Finally, we use an OWL reasoner to answer queries over the selected sources and their corresponding ontologies. We have evaluated our system using both a synthetic data set and a subset of the real-world Billion Triple Challenge data.

