Executing Queries over Schemaless RDF Databases
Citations
1938 | Data clustering: A review
- Jain, Murty, et al.
- 1999
Citation Context ... hierarchical clustering works is aligned with our clustering objectives as clusters are merged one pair at a time until a global objective is achieved. This is not true for centroid-based clustering [34] or spectral clustering [34]. Second, other algorithms such as k-means [34] require the final number of clusters to be known in advance, which is not possible in our case. As an assumption that genera... |
738 | Linked data - the story so far - Bizer, Heath, et al. |
558 | Answering queries using views: A survey
- Halevy
Citation Context ...al schema to describe the group-by-query representation that can be used to efficiently generate valid query plans, which relational systems rely on heavily for query plan generation and optimization [21]. Without addressing these two challenges, query plan generation and optimization can easily become a bottleneck. Consequently, in addition to introducing the group-by-query representation, we make the... |
418 | Resource description framework (RDF): Concepts and abstract syntax, February 2004. Available from: http://www.w3.org/TR/rdf-concepts/ [cited 28
- Klyne, Carroll
- 2006
Citation Context ...gorithm over workload-oblivious techniques employed by other RDF data management systems (Section VI). II. RELATED WORK RDF is composed of subject-predicate-object (s, p, o) statements called triples [22]. Each triple describes an aspect of a web resource. The subject of the triple denotes the resource that is described, the predicate denotes a feature of that resource, and the object stores the value... |
345 | An algorithm for subgraph isomorphism
- Ullmann
- 1976
Citation Context ...use graphs to represent both RDF data and the conjunctive fragment of SPARQL queries [28], which is called basic graph patterns (BGPs); and we model query evaluation as a subgraph isomorphism problem [29]. Therefore, for the most part, we rely on the standard formalization of SPARQL [30], and introduce only the concepts necessary to capture subgraph isomorphism as it is used in evaluating BGPs over RD... |
277 | Semantics and complexity of SPARQL
- Pérez, Arenas, et al.
- 2006
Citation Context ... [28], which is called basic graph patterns (BGPs); and we model query evaluation as a subgraph isomorphism problem [29]. Therefore, for the most part, we rely on the standard formalization of SPARQL [30], and introduce only the concepts necessary to capture subgraph isomorphism as it is used in evaluating BGPs over RDF graphs. In an extended version of our paper [31], we prove the equivalence between... |
261 | Jena: implementing the semantic web recommendations
- Carroll, Dickinson, et al.
- 2004
Citation Context ...sets such as those in the Linked Open Data (LOD) cloud [1], the demand for high-performance RDF data management systems is increasing. While multiple RDF data management approaches have been proposed [2]–[10], systems are still unable to achieve consistently good performance [11]. A major problem is that workloads that these systems service are becoming far more diverse [12]–[14] and far more dynamic... |
188 | Hexastore: sextuple indexing for semantic web data management
- Weiss, Karras, et al.
- 2008
Citation Context ...ributes: s, p and o [2]. As a slight variation of this representation, another option is to maintain multiple copies of the table, where each table has an index that implements a different sort-order [4], [7], [23]. It has also been argued that for different workloads, grouping data can provide performance benefits [3], [9], [24]. Therefore, two other representations were developed: (i) grouping by p... |
117 | The RDF-3X engine for scalable management of RDF data.
- Neumann, Weikum
- 2010
Citation Context ...sues in RDF data management systems [11]. We generated 100 million RDF triples using the WatDiv data generator and measured the performance of five popular RDF data management systems, namely, RDF-3x [7], MonetDB [17], 4Store [18] and Virtuoso Open Source (VOS) versions 6.1 [19] and 7.1 [8]. In our evaluations, we used the WatDiv stress testing tool to generate a diverse workload of 12500 unique SPAR... |
108 | Optimized index structures for querying RDF from the web.
- Harth, Decker
- 2005
Citation Context ..., p and o [2]. As a slight variation of this representation, another option is to maintain multiple copies of the table, where each table has an index that implements a different sort-order [4], [7], [23]. It has also been argued that for different workloads, grouping data can provide performance benefits [3], [9], [24]. Therefore, two other representations were developed: (i) grouping by predicates, ... |
84 | Scalable join processing on very large RDF graphs
- Neumann, Weikum
- 2009
Citation Context ...tween relevant and irrelevant triples, which generates unnecessary intermediate result tuples (cf., Fig. 3e–3i). In this case, reordering the join operations or applying sideways information passing [33] to early-prune some of the tuples in T1–T3 would not eliminate the problem. For example, while tuples (v1, v5) and (v4, v11) can be eliminated from T2 as soon as v2 is identified as the only join val... |
78 | A study on tolerable waiting time: how long are web users willing to wait?
- Nah
- 2004
Citation Context ...act, Table I illustrates that none of the systems that we benchmarked have amortized (i.e., per query) execution times of less than six seconds, which is unacceptable for interactive web applications [20]. In earlier work, we identified 2 reasons why existing systems run into the aforementioned problems [16]: First, these systems rely on a fixed, workload-oblivious physical representation. Second, non... |
76 | SPARQL basic graph pattern optimization using selectivity estimation.
- Stocker, Seaborne, et al.
- 2008
Citation Context ...ed in Fig. 5. Consider a baseline SE expression: $Q_1\lfloor\sigma_{Q_1}(P)\rfloor \bowtie Q_2\lfloor\sigma_{Q_2}(P)\rfloor \bowtie Q_3\lfloor\sigma_{Q_3}(P)\rfloor$ (Fig. 5a). First, the joins in the baseline expression are reordered according to their estimated selectivities [36]. Second, by using generic equivalence rules, the expression is transformed into a canonical form. An SE expression is in canonical form if it consists of the union of a set of sub-expressions T1∪· · ... |
73 | Column-store support for RDF data management: not all swans are white.
- Sidirourgos, Goncalves, et al.
- 2008
Citation Context ...le, where each table has an index that implements a different sort-order [4], [7], [23]. It has also been argued that for different workloads, grouping data can provide performance benefits [3], [9], [24]. Therefore, two other representations were developed: (i) grouping by predicates, where the RDF database is partitioned into 2-column tables (one table per predicate) with the tables being stored in ... |
72 | SW-Store: A vertically partitioned DBMS for semantic web data management.
- Abadi, Marcus, et al.
- 2009
Citation Context ...o other representations were developed: (i) grouping by predicates, where the RDF database is partitioned into 2-column tables (one table per predicate) with the tables being stored in a column-store [5]; and (ii) grouping by entities, where implicit relationships within the data are determined (either as a manual or automated process) to compute a relational schema, and data are mapped to an instant... |
71 | Scalable SPARQL querying of large RDF graphs.
- Huang, Abadi, et al.
- 2011
Citation Context ...mated process) to compute a relational schema, and data are mapped to an instantiation of this schema [3], [9]. Another alternative is to rely on the native graph structure of the RDF data [6], [10], [25], [26]. In this case, grouping by graph vertices, whereby edges in the RDF graph are grouped based on their incidence on a vertex, is a feasible representation. RDF data management systems, whether si... |
65 | SPARQL query language for RDF (http://www.w3.org/tr/rdf-sparql-query
- Prud'hommeaux, Seaborne
- 2007
Citation Context ... workload-driven group-by-query representation [16] (Fig. 2a). III. BACKGROUND AND PRELIMINARIES In this paper, we use graphs to represent both RDF data and the conjunctive fragment of SPARQL queries [28], which is called basic graph patterns (BGPs); and we model query evaluation as a subgraph isomorphism problem [29]. Therefore, for the most part, we rely on the standard formalization of SPARQL [30],... |
53 | 4store: The design and implementation of a clustered RDF store.
- Harris, Lamb, et al.
- 2009
Citation Context ...t systems [11]. We generated 100 million RDF triples using the WatDiv data generator and measured the performance of five popular RDF data management systems, namely, RDF-3x [7], MonetDB [17], 4Store [18] and Virtuoso Open Source (VOS) versions 6.1 [19] and 7.1 [8]. In our evaluations, we used the WatDiv stress testing tool to generate a diverse workload of 12500 unique SPARQL queries. Our observation... |
52 | Foundations of SPARQL Query Optimization
- Schmidt, Meier, et al.
Citation Context ...to capture subgraph isomorphism as it is used in evaluating BGPs over RDF graphs. In an extended version of our paper [31], we prove the equivalence between the standard formalization of SPARQL [30], [32], and our framework. Assume two disjoint, countably infinite sets U (URIs) and L (literals) (we ignore blank nodes in our discussions). URIs uniquely denote Web resources or features of Web resources.... |
35 | Self-organizing Tuple Reconstruction in Column-Stores.
- Idreos, Kersten, et al.
- 2009
Citation Context ...ne the clusters irrelevant to a query, we employ another index, called the cluster index (cf., Fig. 2b). The cluster index is also constructed in a lazy fashion, which is similar to database cracking [37]. Before any query is evaluated, the cluster index consists of a doubly-linked list of pointers to all of the clusters. Initially, the index does not assume anything about the contents within each clu... |
33 | Apples and oranges: a comparison of RDF benchmarks and real RDF datasets.
- Duan, Kementsietsidis, et al.
- 2011
Citation Context ...es have been proposed [2]–[10], systems are still unable to achieve consistently good performance [11]. A major problem is that workloads that these systems service are becoming far more diverse [12]–[14] and far more dynamic [15] than what the systems have been designed to support [16]. To make matters worse, this deficiency has not been revealed in performance studies because benchmark workloads do ... |
32 | An empirical study of real-world SPARQL queries
- Arias, Fernández, et al.
Citation Context ...roaches have been proposed [2]–[10], systems are still unable to achieve consistently good performance [11]. A major problem is that workloads that these systems service are becoming far more diverse [12]–[14] and far more dynamic [15] than what the systems have been designed to support [16]. To make matters worse, this deficiency has not been revealed in performance studies because benchmark workload... |
31 | MonetDB: Two decades of research in column-oriented database architectures
- Idreos, Groffen, et al.
Citation Context ...representations. Therefore, we compare chameleon-db (CDB) [31], which is our prototype implementation of the group-by-query clustering approach, with five popular systems, namely, RDF-3x [7], MonetDB [17], 4Store [18] and Virtuoso Open Source (VOS) versions 6.1 [19] and 7.1 [8]. RDF-3x follows the single-table approach and creates multiple indexes; MonetDB is a column-store, where RDF data are represe... |
26 | Querying semantic web data with SPARQL.
- Arenas, Perez
- 2011
Citation Context ...Pm represent sets of RDF graphs). i ∈ {1, ..., m}). Table II lists the equivalence rules. Rules 1–3 are specific to the clustered-match operation, whereas rules 4–9 are derived from SPARQL algebra [35]. Rules that are marked with an asterisk (*) are conditional. Observe that the expansion rule relies on a condition that is independent of the way the graph is clustered. In other words, for any clust... |
22 | DBpedia SPARQL benchmark: Performance assessment with real queries on real data.
- Morsey, Lehmann, et al.
- 2011
Citation Context ...er the latter problem. Consequently, techniques can be developed as part of future work to decide how clusters should be serialized. We repeat our evaluations also on a crawl of the DBpedia dataset [38]. We focus on BGPs in this paper, therefore, we utilize the query logs provided by the benchmark to extract 14 BGP templates. Note that most queries in the DBpedia query logs are repetitive and a lar... |
20 | Building an efficient RDF store over a relational database.
- Bornea, Dolby, et al.
- 2013
Citation Context ...e table, where each table has an index that implements a different sort-order [4], [7], [23]. It has also been argued that for different workloads, grouping data can provide performance benefits [3], [9], [24]. Therefore, two other representations were developed: (i) grouping by predicates, where the RDF database is partitioned into 2-column tables (one table per predicate) with the tables being stor... |
14 | DOGMA: A disk-oriented graph matching algorithm for RDF databases.
- Brocheler, Pugliese, et al.
- 2009
Citation Context ...ual or automated process) to compute a relational schema, and data are mapped to an instantiation of this schema [3], [9]. Another alternative is to rely on the native graph structure of the RDF data [6], [10], [25], [26]. In this case, grouping by graph vertices, whereby edges in the RDF graph are grouped based on their incidence on a vertex, is a feasible representation. RDF data management systems... |
11 | Diversified stress testing of RDF data management systems.
- Aluç, Hartig, et al.
- 2014
Citation Context ...r our evaluations, we primarily use the Waterloo SPARQL Diversity Test Suite (WatDiv) because it facilitates the generation of test cases that are far more diverse than any of the existing benchmarks [11]. In this regard, we use the WatDiv data generator to create two datasets: one with 10 million RDF triples and another with 100 million RDF triples (we observe that systems under test (SUT) load data ... |
7 | Web of linked data. A global public data space on the Web
- Bizer
- 2010
Citation Context ...thout being hindered by the lack of a fixed physical schema. I. INTRODUCTION With the proliferation of very large, web-scale distributed RDF datasets such as those in the Linked Open Data (LOD) cloud [1], the demand for high-performance RDF data management systems is increasing. While multiple RDF data management approaches have been proposed [2]–[10], systems are still unable to achieve consistently... |
6 | Virtuoso, a Hybrid RDBMS/Graph Column Store - Erling - 2012 |
6 | From linked data to relevant data – time is the essence
- Kirchberg, Ko, et al.
Citation Context ...[10], systems are still unable to achieve consistently good performance [11]. A major problem is that workloads that these systems service are becoming far more diverse [12]–[14] and far more dynamic [15] than what the systems have been designed to support [16]. To make matters worse, this deficiency has not been revealed in performance studies because benchmark workloads do not truly capture this div... |
4 | gStore: a graph-based SPARQL query engine
- Zou, Ozsu, et al.
- 2014
Citation Context ... such as those in the Linked Open Data (LOD) cloud [1], the demand for high-performance RDF data management systems is increasing. While multiple RDF data management approaches have been proposed [2]–[10], systems are still unable to achieve consistently good performance [11]. A major problem is that workloads that these systems service are becoming far more diverse [12]–[14] and far more dynamic [15]... |
4 | Workload Matters: Why RDF Databases Need a New Design.
- Aluc, Ozsu, et al.
- 2014
Citation Context ...od performance [11]. A major problem is that workloads that these systems service are becoming far more diverse [12]–[14] and far more dynamic [15] than what the systems have been designed to support [16]. To make matters worse, this deficiency has not been revealed in performance studies because benchmark workloads do not truly capture this diversity and dynamism [11]. To demonstrate the issue, we co... |
3 | WARP: Workload-Aware Replication and Partitioning for RDF
- Hose, Schenkel
- 2013
Citation Context ...process) to compute a relational schema, and data are mapped to an instantiation of this schema [3], [9]. Another alternative is to rely on the native graph structure of the RDF data [6], [10], [25], [26]. In this case, grouping by graph vertices, whereby edges in the RDF graph are grouped based on their incidence on a vertex, is a feasible representation. RDF data management systems, whether single n... |
3 | RDF in the clouds: a survey
- Kaoudi, Manolescu
- 2015
Citation Context ...raph vertices, whereby edges in the RDF graph are grouped based on their incidence on a vertex, is a feasible representation. RDF data management systems, whether single node [2]–[10] or distributed [27], rely on one of the above physical representations. Our studies have demonstrated that with increasing diversity and dynamism in SPARQL workloads, all of these existing physical representations run i... |
3 | chameleon-db: a workload-aware robust RDF data management system
- Aluç, Ozsu, et al.
- 2013
Citation Context ... standard formalization of SPARQL [30], and introduce only the concepts necessary to capture subgraph isomorphism as it is used in evaluating BGPs over RDF graphs. In an extended version of our paper [31], we prove the equivalence between the standard formalization of SPARQL [30], [32], and our framework. Assume two disjoint, countably infinite sets U (URIs) and L (literals) (we ignore blank nodes in ... |
1 | Jena property table implementation, HP-Labs
- Wilkinson
- 2006
Citation Context ...of the table, where each table has an index that implements a different sort-order [4], [7], [23]. It has also been argued that for different workloads, grouping data can provide performance benefits [3], [9], [24]. Therefore, two other representations were developed: (i) grouping by predicates, where the RDF database is partitioned into 2-column tables (one table per predicate) with the tables being... |