Database Architecture Evolution: Mammals Flourished long before Dinosaurs became Extinct
Cited by 9 (2 self)
The holy grail for database architecture research is to find a solution that is Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters, Stable & Secure, to service a broad user community, Small & Simple, to be comprehensible to a small team of programmers, Self-managing, to let it run out-of-the-box without hassle. In this paper, we provide a trip report on this quest, covering past experiences, ongoing research on hardware-conscious algorithms, and novel ways towards self-management, specifically focused on column store solutions.
An Experimental Comparison of RDF Data Management Approaches in a SPARQL Benchmark Scenario
- In Proceedings of the 7th International Semantic Web Conference (ISWC), 2008
Cited by 6 (1 self)
Efficient RDF data management is one of the cornerstones in realizing the Semantic Web vision. In the past, different RDF storage strategies have been proposed, ranging from simple triple stores to more advanced techniques like clustering or vertical partitioning on the predicates. We present an experimental comparison of existing storage strategies on top of the SP2Bench SPARQL performance benchmark suite and put the results into context by comparing them to a purely relational model of the benchmark scenario. We observe that (1) in terms of performance and scalability, a simple triple store built on top of a column-store DBMS is competitive with the vertically partitioned approach when choosing a physical (predicate, subject, object) sort order, (2) in our scenario with real-world queries, none of the approaches scales to documents containing tens of millions of RDF triples, and (3) none of the approaches can compete with a purely relational model. We conclude that future research is necessary to further advance RDF data management.
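The two storage layouts compared in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy triples and all function names are assumptions made for illustration. It only shows why a triple table sorted by (predicate, subject, object) resembles vertical partitioning, since each predicate's rows end up contiguous.

```python
from collections import defaultdict

# Hypothetical toy data; identifiers are illustrative only.
triples = [
    ("paper1", "creator", "alice"),
    ("paper2", "creator", "bob"),
    ("paper1", "title", "RDF Stores"),
    ("paper2", "title", "SPARQL Benchmarks"),
]


def build_pso_triple_table(rows):
    """Single triple table, physically sorted by (predicate, subject, object)."""
    return sorted(rows, key=lambda t: (t[1], t[0], t[2]))


def build_vertical_partitions(rows):
    """One (subject, object) table per predicate, each sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in rows:
        tables[p].append((s, o))
    return {p: sorted(pairs) for p, pairs in tables.items()}


def scan_predicate(pso_table, predicate):
    """Return the (subject, object) pairs of one predicate from the PSO table."""
    return [(s, o) for s, p, o in pso_table if p == predicate]
```

In this sketch, scanning one predicate in the sorted triple table yields the same rows, in the same order, as the corresponding per-predicate table.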
Improving the Performance of Semantic Web Applications with SPARQL Query Result Caching
Cited by 4 (2 self)
The performance of triple stores is one of the major obstacles for the deployment of semantic technologies in many usage scenarios. In particular, Semantic Web applications, which use triple stores as persistence backends, trade performance for the advantage of flexibility with regard to information structuring. In order to get closer to the performance of relational database-backed Web applications, we developed an approach for improving the performance of triple stores by caching query results and even complete application objects. The selective invalidation of cache objects, following updates of the underlying knowledge bases, is based on analysing the graph patterns of cached SPARQL queries in order to obtain information about which updates will change the query result. We evaluated our approach by extending the BSBM triple store benchmark with an update dimension as well as in typical Semantic Web application scenarios.
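The idea of selective invalidation can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: queries are reduced to lists of triple patterns, variables are strings starting with "?", and a cached result is treated as stale whenever an updated triple matches any of its patterns. The class and function names are hypothetical.

```python
def pattern_matches(pattern, triple):
    """A pattern term matches if it is a variable ("?x") or equals the triple term."""
    return all(
        term.startswith("?") or term == value
        for term, value in zip(pattern, triple)
    )


class QueryResultCache:
    """Toy cache keyed by query id; each entry keeps its triple patterns."""

    def __init__(self):
        self._entries = {}  # query_id -> (patterns, result)

    def put(self, query_id, patterns, result):
        self._entries[query_id] = (list(patterns), result)

    def get(self, query_id):
        entry = self._entries.get(query_id)
        return None if entry is None else entry[1]

    def invalidate_for_update(self, triple):
        """Drop only the entries whose patterns the updated triple matches."""
        stale = [
            query_id
            for query_id, (patterns, _) in self._entries.items()
            if any(pattern_matches(p, triple) for p in patterns)
        ]
        for query_id in stale:
            del self._entries[query_id]
        return stale
```

With this matching rule, an update that adds a new `rdf:type` triple invalidates only cached queries that contain a compatible type pattern; unrelated cached results stay available.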
RDF on Cloud Number Nine
Cited by 2 (0 self)
We examine whether the existing ’Database in the Cloud’ service SimpleDB can be used as a back end to quickly and reliably store RDF data for massive parallel access. Towards this end we have implemented ’Stratustore’, an RDF store which acts as a back end for the Jena Semantic Web framework and stores its data within SimpleDB. We used the Berlin SPARQL Benchmark to evaluate our solution and compare it to state-of-the-art triple stores. Our results show that for certain simple queries and many parallel accesses such a solution can have a higher throughput than state-of-the-art triple stores. However, due to the very limited expressiveness of SimpleDB’s query language, more complex queries run multiple orders of magnitude slower than the state of the art and would require special indexes. Our results point to the need for more complex database services as well as the need for robust, possibly query-dependent index techniques for RDF.
View Selection in Semantic Web Databases
Cited by 2 (2 self)
We consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, we address the problem of selecting a set of views to be materialized in the database, minimizing a combination of query processing, view storage, and view maintenance costs. Starting from an existing relational view selection method, we devise new algorithms for recommending view sets, and show that they scale significantly beyond the existing relational ones when adapted to the RDF context. To account for implicit triples in query answers, we propose a novel RDF query reformulation algorithm and an innovative way of incorporating it into view selection in order to avoid a combinatorial explosion in the complexity of the selection process. The interest of our techniques is demonstrated through a set of experiments.
Using Reformulation Trees to Optimize Queries over Distributed Heterogeneous Sources
Cited by 2 (0 self)
In order to effectively and quickly answer queries in environments with distributed RDF/OWL, we present a query optimization algorithm to identify the potentially relevant Semantic Web data sources using structural query features and a term index. This algorithm is based on the observation that the join selectivity of a pair of query triple patterns is often higher than the overall selectivity of these two patterns treated independently. Given a rule goal tree that expresses the reformulation of a conjunctive query, our algorithm uses a bottom-up approach to estimate the selectivity of each node. It then prioritizes loading of selective nodes and uses the information from these sources to further constrain other nodes. Finally, we use an OWL reasoner to answer queries over the selected sources and their corresponding ontologies. We have evaluated our system using both a synthetic data set and a subset of the real-world Billion Triple Challenge data.
Relational Processing of RDF Queries: A Survey
Cited by 1 (0 self)
The Resource Description Framework (RDF) is a flexible model for representing information about resources on the Web. With the increasing amount of RDF data becoming available, efficient and scalable management of RDF data has become a fundamental challenge in achieving the Semantic Web vision. The RDF model has attracted the attention of the database community, and many researchers have proposed different solutions to store and query RDF data efficiently. This survey focuses on using relational query processors to store and query RDF data. We provide an overview of the different approaches and classify them according to their storage and query evaluation strategies.
Ultrawrap: SPARQL Execution on Relational Data
Cited by 1 (0 self)
The Semantic Web’s promise to achieve web-wide data integration requires the inclusion of legacy relational data as RDF, which, in turn, requires the execution of SPARQL queries on the legacy relational database. In this paper we explore a hypothesis: existing commercial relational databases already subsume the algorithms and optimizations needed to support effective SPARQL execution on existing relationally stored data. The experiment, embodied in a system called Ultrawrap, comprises encoding a logical representation of the database as a graph using SQL views and a simple syntactic translation of SPARQL queries to SQL queries on those views. Thus, in the course of executing a SPARQL query, the SQL optimizer both instantiates a mapping of relational data to RDF and optimizes its execution. Other approaches typically implement aspects of query optimization and execution outside the SQL environment. Ultrawrap is evaluated using two benchmarks across the three major relational database management systems. We identify two important optimizations: detection of unsatisfiable conditions and self-join elimination, such that, when applied, SPARQL queries execute at nearly the same speed as semantically equivalent native SQL queries, providing strong evidence of the validity of the hypothesis.
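The general idea of exposing relational data as triples through a SQL view can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not Ultrawrap itself: the `person` table, the `triples` view, the subject naming scheme, and the `names_of` helper are all hypothetical, and the SPARQL-to-SQL translation is written by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical legacy relational table.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO person VALUES (1, 'Ada'), (2, 'Grace');

    -- A view that presents each row as a (subject, predicate, object) triple.
    CREATE VIEW triples AS
        SELECT 'person/' || id AS s, 'name' AS p, name AS o FROM person;
    """
)


def names_of(subject):
    """Hand-translated form of: SELECT ?o WHERE { <subject> <name> ?o }."""
    rows = conn.execute(
        "SELECT o FROM triples WHERE s = ? AND p = 'name'", (subject,)
    )
    return [o for (o,) in rows]
```

Because the query targets a view, the database engine's own optimizer decides how to evaluate it against the underlying table, which is the property the abstract's hypothesis relies on.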
A Confluence of Column Stores and Search Engines: Opportunities and Challenges
IR and DB integration has been a long-standing research challenge. Most of the work trying to integrate the two fields is motivated by specific application scenarios. In this paper we approach this problem from another perspective. Instead of focusing on IR and DB as whole fields, we restrict the focus to search engines and column stores. We present observations of similarities in the two technologies, and aggregate information on parallel developments in the two fields. We argue that these developments point towards a confluence of column stores and search engines, and one may in fact argue that this confluence has already started. We evaluate the potential in developing an engine capable of handling the workloads traditionally supported by the different systems, namely decision support and search workloads, by identifying potential opportunities and challenges. The opportunities include potential areas for technology transfer and more efficient support for features. The identified challenges outline areas for future work whose success will help decide whether a confluence of column stores and search engines is feasible.
Age
Joint work with Johannes Gehrke and Øystein Torbjørnsen. Work done while visiting Cornell University. Motivation: Our paper investigates the technical similarities and differences between column stores and search engines. Both column stores and inverted indexes in search engines are column-oriented.

