YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia
- Commun. ACM
Cited by 6 (2 self)
We are grateful for input from various people’s work: Edwin Lewis-Kelham for implementing the YAGO2 user interface, Gerard de Melo for his help on integrating his Universal WordNet, and Erdal Kuzey for his work on named events and time facts in Wikipedia. We would also like to thank the people who helped evaluate the quality of YAGO2 by manual assessment, most notably, Ndapandula Nakashole, Stephan Seufert, Erdal Kuzey, and …
We present YAGO2, an extension of the YAGO knowledge base in which entities, facts, and events are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet. It contains 80 million facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95% for the facts in YAGO2. In this paper, we present the extraction methodology, the integration of the spatio-temporal dimension, and our knowledge representation SPOTL, an extension of the original SPO-triple …
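To make the idea of anchoring facts in time and space concrete, here is a minimal Python sketch. It assumes SPOTL means subject, predicate, object plus a Time interval and a Location; the `SpotlFact` layout, the helper names `overlaps` and `near`, and the sample facts are illustrative only and are not YAGO2's actual schema or query interface.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt
from typing import Optional, Tuple

@dataclass(frozen=True)
class SpotlFact:
    """Hypothetical layout: an SPO triple plus optional Time and Location."""
    subject: str
    predicate: str
    obj: str
    time: Optional[Tuple[date, date]] = None        # validity interval [begin, end]
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude) in degrees

def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, Earth radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def overlaps(fact: SpotlFact, start: date, end: date) -> bool:
    """True if the fact's validity interval intersects [start, end]."""
    if fact.time is None:
        return False
    begin, finish = fact.time
    return begin <= end and start <= finish

def near(fact: SpotlFact, point: Tuple[float, float], radius_km: float) -> bool:
    """True if the fact is anchored within radius_km of point."""
    return fact.location is not None and haversine_km(fact.location, point) <= radius_km

facts = [
    SpotlFact("Max_Planck", "wasBornIn", "Kiel",
              (date(1858, 4, 23), date(1858, 4, 23)), (54.32, 10.14)),
    SpotlFact("Albert_Einstein", "wasBornIn", "Ulm",
              (date(1879, 3, 14), date(1879, 3, 14)), (48.40, 9.99)),
]

# "Which facts hold between 1850 and 1870 within 100 km of Hamburg?"
hamburg = (53.55, 9.99)
hits = [f for f in facts
        if overlaps(f, date(1850, 1, 1), date(1870, 12, 31)) and near(f, hamburg, 100.0)]
print([f.subject for f in hits])  # prints ['Max_Planck']
```

The time and space filters compose with ordinary triple matching, which reflects the abstract's point that the extra dimensions extend the SPO model instead of replacing it.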
x-RDF-3X: Fast Querying, High Update Rates, and Consistency for RDF Databases
Cited by 6 (1 self)
The RDF data model is gaining importance for applications in computational biology, knowledge sharing, and social communities. Recent work on RDF engines has focused on scalable performance for querying, and has largely disregarded updates. In addition to incremental bulk loading, applications also require online updates with flexible control over multi-user isolation levels and data consistency. The challenge lies in meeting these requirements while retaining the capability for fast querying. This paper presents a comprehensive solution that is based on an extended deferred-indexing method with integrated versioning. The version store enables time-travel queries that are efficiently processed without adversely affecting queries on the current data. For flexible consistency, transactional concurrency control is provided with options for either snapshot isolation or full serializability. All methods are integrated in an extension of the RDF-3X system, and their very good performance for both queries and updates is demonstrated by measurements of multi-user workloads with real-life data as well as stress-test synthetic loads.
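The following toy sketch illustrates two ideas from the abstract: buffering updates and merging them later (deferred indexing), and keeping old versions so a query can run "as of" an earlier timestamp (time travel). The `VersionedTripleStore` class and its methods are hypothetical. x-RDF-3X's real differential indexes, concurrency control, and isolation levels are considerably more involved.

```python
import itertools
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

class VersionedTripleStore:
    """Toy version store: every triple version records [created, deleted) timestamps."""

    def __init__(self) -> None:
        self._clock = itertools.count(1)
        self.main: List[list] = []     # stands in for the merged, indexed data
        self.pending: List[list] = []  # deferred updates, not yet merged into main
        self.now = 0

    def _tick(self) -> int:
        self.now = next(self._clock)
        return self.now

    def insert(self, triple: Triple) -> None:
        self.pending.append([triple, self._tick(), None])

    def delete(self, triple: Triple) -> None:
        ts = self._tick()
        for entry in self.main + self.pending:  # same list objects, so marking is in place
            if entry[0] == triple and entry[2] is None:
                entry[2] = ts                    # keep the old version for time travel

    def merge(self) -> None:
        """Deferred indexing: fold buffered updates into the main store in one batch."""
        self.main.extend(self.pending)
        self.pending = []

    def as_of(self, ts: int) -> Set[Triple]:
        """Time-travel query: triples visible at timestamp ts."""
        return {e[0] for e in self.main + self.pending
                if e[1] <= ts and (e[2] is None or ts < e[2])}

store = VersionedTripleStore()
store.insert(("alice", "knows", "bob"))    # ts 1
store.insert(("alice", "knows", "carol"))  # ts 2
store.merge()
store.delete(("alice", "knows", "bob"))    # ts 3
print(sorted(store.as_of(2)))  # both triples
print(sorted(store.as_of(3)))  # only ("alice", "knows", "carol")
```

A reader pinned to timestamp 2 keeps a stable view even after the delete at timestamp 3. That property is what makes versioning a natural basis for snapshot isolation.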
Fast and accurate estimation of shortest paths in large graphs
- In Proceedings of the Conference on Information and Knowledge Management (CIKM), 2010
Cited by 5 (0 self)
Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large disk-resident instances of graph data. While a number of techniques exist for answering reachability queries and approximating node distances efficiently, determining actual shortest paths (i.e., the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs, it is often essential to find many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketch-based index structure that not only supports estimation of node distances, but also computes the corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to near-exact shortest-path approximations in real-world graphs. We evaluate our techniques, implemented within a fully functional RDF graph database system, over large real-world social and biological networks ranging in size from tens of thousands to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times providing several orders of magnitude speedup over traditional path computations while keeping the estimation errors between 0% and 1% on average.
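The sketch below shows why keeping paths, and not just distances, helps the estimate. It uses a landmark scheme similar in spirit to a sketch-based index, not the paper's exact seed-set construction. Each landmark stores a BFS tree. A u-to-v path is formed as u → landmark → v, and any loop in the concatenated path is then cut out. The function names and the choice of landmarks are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[int, List[int]]

def bfs_tree(graph: Graph, root: int) -> Dict[int, Optional[int]]:
    """BFS parent pointers from root; following them gives a shortest path to root."""
    parent: Dict[int, Optional[int]] = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def path_to_root(parent: Dict[int, Optional[int]], node: int) -> List[int]:
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path  # [node, ..., root]

def remove_cycles(path: List[int]) -> List[int]:
    """Cut loops: when a node reappears, drop everything after its first occurrence."""
    out: List[int] = []
    pos: Dict[int, int] = {}
    for n in path:
        if n in pos:
            for m in out[pos[n] + 1:]:
                del pos[m]
            out = out[:pos[n] + 1]
        else:
            pos[n] = len(out)
            out.append(n)
    return out

def build_sketches(graph: Graph, landmarks: List[int]):
    """Offline step: one BFS tree per (hypothetically chosen) landmark node."""
    return {lm: bfs_tree(graph, lm) for lm in landmarks}

def estimate_path(sketches, u: int, v: int) -> Optional[List[int]]:
    """Shortest candidate u -> landmark -> v over all landmarks that reach both ends."""
    best = None
    for parent in sketches.values():
        if u in parent and v in parent:
            via = path_to_root(parent, u) + path_to_root(parent, v)[::-1][1:]
            candidate = remove_cycles(via)
            if best is None or len(candidate) < len(best):
                best = candidate
    return best

# A path graph 0-1-2-3-4-5 with a single landmark at node 5.
graph: Graph = {i: [] for i in range(6)}
for i in range(5):
    graph[i].append(i + 1)
    graph[i + 1].append(i)
sketches = build_sketches(graph, [5])
print(estimate_path(sketches, 0, 2))  # prints [0, 1, 2]
```

A distance-only estimate through landmark 5 would give d(0,5) + d(5,2) = 5 + 3 = 8. Cutting the loop in the explicit path recovers the exact two-hop answer, which is the kind of accuracy gain the abstract attributes to generating path information.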
HyPer: A hybrid OLTP&OLAP Main Memory Database System based on Virtual Memory Snapshots
- In ICDE, 2011
Cited by 3 (3 self)
The two areas of online transaction processing (OLTP) and online analytical processing (OLAP) present different challenges for database architectures. Currently, customers with high rates of mission-critical transactions have split their data into two separate systems, one database for OLTP and one so-called data warehouse for OLAP. While allowing for decent transaction rates, this separation has many disadvantages, including data freshness issues due to the delay caused by only periodically initiating Extract-Transform-Load (ETL) data staging, and excessive resource consumption due to maintaining two separate information systems. We present an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data. HyPer is a main-memory database system that guarantees the ACID properties of OLTP transactions and executes OLAP query sessions (multiple queries) on the same, arbitrarily current and consistent snapshot. Utilizing the processor's inherent support for virtual memory management (address translation, caching, copy on update) yields two benefits at the same time: unprecedentedly high transaction rates, as high as 100,000 per second, and very fast OLAP query response times on a single system executing both workloads in parallel. The performance analysis is based on a combined TPC-C and TPC-H benchmark.
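The core trick of a virtual-memory snapshot can be shown with a POSIX `fork()`. The OS gives the child a copy-on-write view of the parent's memory as it was at fork time. An analytical query in the child therefore sees a consistent snapshot while the parent keeps applying transactions. This is a minimal sketch on POSIX systems only; the `accounts` dict stands in for HyPer's in-memory tables, and none of this is HyPer's actual code.

```python
import os

# Hypothetical OLTP state held in the parent process's memory.
accounts = {"alice": 100, "bob": 50}

def start_olap_snapshot_query():
    """Fork a child that sums balances over a copy-on-write snapshot (POSIX only)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                       # child: sees memory exactly as it was at fork()
        os.close(read_fd)
        total = sum(accounts.values())
        os.write(write_fd, str(total).encode())
        os.close(write_fd)
        os._exit(0)                    # leave without running parent-side cleanup
    os.close(write_fd)                 # parent: keep only the read end
    return pid, read_fd

pid, read_fd = start_olap_snapshot_query()
accounts["alice"] += 1000              # parent keeps running transactions meanwhile
with os.fdopen(read_fd) as pipe:
    snapshot_total = int(pipe.read())  # blocks until the child closes its end
os.waitpid(pid, 0)
print(snapshot_total, sum(accounts.values()))  # prints 150 1150
```

The parent's update after the fork never becomes visible to the child, whatever the timing. Pages are copied only when one side writes to them, which is why the snapshot is cheap to take.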
Ultrawrap: SPARQL Execution on Relational Data
Cited by 1 (0 self)
The Semantic Web’s promise to achieve web-wide data integration requires the inclusion of legacy relational data as RDF, which, in turn, requires the execution of SPARQL queries on the legacy relational database. In this paper we explore a hypothesis: existing commercial relational databases already subsume the algorithms and optimizations needed to support effective SPARQL execution on existing relationally stored data. The experiment, embodied in a system called Ultrawrap, comprises encoding a logical representation of the database as a graph using SQL views and a simple syntactic translation of SPARQL queries to SQL queries on those views. Thus, in the course of executing a SPARQL query, the SQL optimizer both instantiates a mapping of relational data to RDF and optimizes its execution. Other approaches typically implement aspects of query optimization and execution outside the SQL environment. Ultrawrap is evaluated using two benchmarks across the three major relational database management systems. We identify two important optimizations, detection of unsatisfiable conditions and self-join elimination, such that, when applied, SPARQL queries execute at nearly the same speed as semantically equivalent native SQL queries, providing strong evidence for the validity of the hypothesis.
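The view-based approach can be tried in SQLite using Python's standard `sqlite3` module. In this sketch, a relational table is exposed as a triple view. A two-pattern SPARQL query is then translated syntactically into a self-join on that view, with one view alias per triple pattern. The `emp` table, the `tripleview` layout, and the hand-written translation are all hypothetical. Ultrawrap's actual views and translator differ, and no claim is made here about which optimizations SQLite itself applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
INSERT INTO emp VALUES (1, 'Ann', 'Sales'), (2, 'Bob', 'R&D'), (3, 'Cid', 'Sales');
-- Hypothetical triple view: each non-key column becomes one predicate.
CREATE VIEW tripleview AS
  SELECT 'emp/' || id AS s, 'name' AS p, name AS o FROM emp
  UNION ALL
  SELECT 'emp/' || id, 'dept', dept FROM emp;
""")

# SPARQL: SELECT ?n WHERE { ?e :name ?n . ?e :dept "Sales" }
# Syntactic translation: one view alias per triple pattern, joined on ?e.
translated = """
SELECT t1.o FROM tripleview t1 JOIN tripleview t2 ON t1.s = t2.s
WHERE t1.p = 'name' AND t2.p = 'dept' AND t2.o = 'Sales'
ORDER BY t1.o
"""
native = "SELECT name FROM emp WHERE dept = 'Sales' ORDER BY name"

via_view = [row[0] for row in conn.execute(translated)]
direct = [row[0] for row in conn.execute(native)]
print(via_view, direct)  # prints ['Ann', 'Cid'] ['Ann', 'Cid']
```

The translated query returns the same rows as the native one. The optimizations named in the abstract work toward making it run as fast: the branches whose constant `p` value contradicts the `WHERE` clause are unsatisfiable and can be pruned, and the self-join on the same `emp` key can be collapsed into a single scan.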
Semantic Web Architecture
The Semantic Web extends the existing Web, adding a multitude of language standards and software components to give humans and machines direct access to data. The chapter starts with deriving the architecture of the Semantic Web as a whole from first principles, followed by a presentation of Web standards underpinning the Semantic Web that are used for data publishing, querying and reasoning. Further, the chapter identifies functional software components required to implement capabilities and behaviour in applications that publish and consume Semantic Web content.
XML-Based RDF Data Management for Efficient Query Processing
The Semantic Web, which represents a web of knowledge, offers new opportunities to search for knowledge and information. Harvesting such search power requires robust and scalable data repositories that can store RDF data and support efficient evaluation of SPARQL queries. Most existing RDF storage techniques rely on the relational model and relational database technologies for these tasks. They either keep the RDF data as triples or decompose it into multiple relations. The mismatch between the graph model of RDF data and the rigid 2D tables of the relational model jeopardizes the scalability of such repositories and frequently renders a repository inefficient for some types of data and queries. We propose to decompose the RDF graph into a forest of semantically correlated XML trees, store them in an XML repository, and rewrite SPARQL queries into XPath/XQuery queries to be evaluated in the XML repository. In this paper, we discuss the basic idea of RDF-to-XML decomposition and the criteria for such decomposition in terms of correctness, redundancy, and query efficiency, and then propose two RDF-to-XML decomposition algorithms based on these criteria. Our experimental evaluation results illustrate that our approach is capable of improving both storage efficiency and query processing efficiency compared to existing RDF techniques.
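This sketch shows the decompose-then-rewrite idea in its simplest form, using Python's `xml.etree.ElementTree`. Triples are grouped by subject into one XML subtree each, and a star-shaped SPARQL pattern becomes an XPath step with a predicate. Grouping by subject is a swapped-in, simplest-possible decomposition. It is not either of the paper's two algorithms, and the sketch assumes that predicates are valid XML element names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

triples = [
    ("emp1", "name", "Ann"), ("emp1", "dept", "Sales"),
    ("emp2", "name", "Bob"), ("emp2", "dept", "R&D"),
    ("emp3", "name", "Cid"), ("emp3", "dept", "Sales"),
]

def decompose(triples):
    """Hypothetical decomposition: one XML subtree per RDF subject (a star around it)."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))
    root = ET.Element("rdf")
    for s, props in by_subject.items():
        node = ET.SubElement(root, "resource", about=s)
        for p, o in props:
            ET.SubElement(node, p).text = o  # predicate name used as element tag
    return root

doc = decompose(triples)
# SPARQL { ?e :name ?n . ?e :dept "Sales" } rewritten as an XPath step with a predicate.
names = [e.text for e in doc.findall("resource[dept='Sales']/name")]
print(names)  # prints ['Ann', 'Cid']
```

Because each subject's properties sit in one subtree, the join on `?e` disappears from the rewritten query. That saving is the kind the paper's decomposition criteria trade off against redundancy.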
Continuous Query Optimization and Evaluation over Unified Linked Stream Data and Linked Open Data
- 2010
In this report we address the problem of scalable query processing over Linked Stream Data integrated with Linked Open Data. Linked Stream Data consists of data generated by stream sources, e.g., sensors, enriched with semantic descriptions, following the standards proposed for Linked Data. This will enable the easy integration of sensor data with the quickly growing amount of Linked Open Data and facilitate the use of the large body of existing software along with a wide range of novel applications. However, the highly dynamic nature of sensor data requires new approaches for data management and processing that existing systems do not support. To remedy this, we present our Continuous Query Evaluation over Linked Streams (CQELS) approach, which provides a scalable query processing model for unified Linked Stream Data and Linked Open Data. Scalability in CQELS is achieved by applying state-of-the-art techniques for efficient data storage and query pre-processing, combined with a new adaptive cost-based query optimization algorithm for dynamic data sources, such as sensor streams. In traditional Database Management Systems (DBMS), query optimizers use pre-computed selectivity values for the data to decide on the best execution plan, whereas with continuous queries over stream data, the data, and consequently its …
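The following toy sketch illustrates the adaptive point at the end of the abstract. Window contents change constantly, so the join plan between the stream window and static Linked Data is chosen again at every evaluation from current sizes, not from pre-computed selectivities. `SlidingWindow`, `evaluate`, the sensor-to-room data, and the "smaller side drives the join" cost rule are all hypothetical simplifications, not CQELS's operators or cost model.

```python
from collections import defaultdict, deque

class SlidingWindow:
    """Time-based window holding stream items with now - range < timestamp <= now."""

    def __init__(self, range_secs):
        self.range = range_secs
        self.items = deque()                 # (timestamp, sensor, value) in arrival order
        self.by_sensor = defaultdict(deque)  # index maintained incrementally on push/evict

    def push(self, ts, sensor, value):
        # assumes items arrive in non-decreasing timestamp order
        self.items.append((ts, sensor, value))
        self.by_sensor[sensor].append(value)

    def evict(self, now):
        while self.items and self.items[0][0] <= now - self.range:
            _, sensor, _ = self.items.popleft()
            self.by_sensor[sensor].popleft()  # oldest value of that sensor is the expired one

# Static Linked Data (hypothetical): sensor -> room, plus a room index.
located_in = {"s1": "kitchen", "s2": "hall", "s3": "lab"}
sensors_by_room = defaultdict(list)
for sensor, room in located_in.items():
    sensors_by_room[room].append(sensor)

def evaluate(window, room, now):
    """Re-run the continuous query; choose the join's driving side from current sizes."""
    window.evict(now)
    static_side = sensors_by_room.get(room, [])
    if len(window.items) <= len(static_side):
        plan = "window-first"  # scan the window, probe the static mapping
        rows = [(s, v) for _, s, v in window.items if located_in.get(s) == room]
    else:
        plan = "static-first"  # scan matching sensors, probe the window index
        rows = [(s, v) for s in static_side for v in window.by_sensor.get(s, ())]
    return plan, sorted(rows)

w = SlidingWindow(range_secs=10)
for ts, s, v in [(1, "s1", 20), (2, "s2", 21), (3, "s1", 22), (12, "s3", 30)]:
    w.push(ts, s, v)
first = evaluate(w, "kitchen", now=12)   # ('static-first', [('s1', 22)])
second = evaluate(w, "kitchen", now=14)  # ('window-first', [])
print(first, second)
```

The same continuous query switches plans between two evaluations only because the window shrank. A one-time static optimization cannot capture that behavior.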
Scalable SPARQL Querying of Large RDF Graphs
The generation of RDF data has accelerated to the point where many data sets need to be partitioned across multiple machines in order to achieve reasonable performance when querying the data. Although tremendous progress has been made in the Semantic Web community for achieving high performance data management on a single node, current solutions that allow the data to be partitioned across multiple machines are highly inefficient. In this paper, we introduce a scalable RDF data management system that is up to three orders of magnitude more efficient than popular multi-node RDF data management systems. In so doing, we introduce techniques for (1) leveraging state-of-the-art single-node RDF-store technology, (2) partitioning the data across nodes in a manner that helps accelerate query processing through locality optimizations, and (3) decomposing SPARQL queries into high-performance fragments that take advantage of how data is partitioned in a cluster.
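This sketch shows the locality argument behind points (2) and (3). If every triple of a given subject lives on one node, a star query, where all patterns share one subject variable, can run on each node independently, and the partial answers simply union. Hash partitioning on the subject is a swapped-in stand-in for the paper's partitioning scheme, and `partition`, `is_star`, and `distributed_star` are hypothetical names.

```python
from zlib import crc32

def partition(triples, n):
    """Hash-partition by subject, so all triples of a subject land on one worker."""
    parts = [[] for _ in range(n)]
    for s, p, o in triples:
        parts[crc32(s.encode()) % n].append((s, p, o))  # deterministic across runs
    return parts

def is_star(patterns):
    """A star query: every triple pattern shares the same subject variable."""
    subjects = {s for s, _, _ in patterns}
    return len(subjects) == 1 and next(iter(subjects)).startswith("?")

def eval_star(triples, patterns):
    """Subjects matching every (?x, p, o) pattern, with p and o constants."""
    facts = set(triples)
    return {s for s, _, _ in triples
            if all((s, p, o) in facts for _, p, o in patterns)}

def distributed_star(parts, patterns):
    if not is_star(patterns):
        raise ValueError("only star queries are partition-local in this sketch")
    result = set()
    for part in parts:  # each worker could run this independently, with no data exchange
        result |= eval_star(part, patterns)
    return result

triples = [
    ("ann", "type", "Person"), ("ann", "livesIn", "Berlin"),
    ("bob", "type", "Person"), ("bob", "livesIn", "Paris"),
    ("cid", "type", "Person"), ("cid", "livesIn", "Berlin"),
    ("berlin", "type", "City"),
]
query = [("?x", "type", "Person"), ("?x", "livesIn", "Berlin")]
print(sorted(distributed_star(partition(triples, 3), query)))  # prints ['ann', 'cid']
```

Queries that are not stars would need a decomposition into such local fragments, followed by a cross-node join. That is where point (3) of the abstract comes in.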

